DML Analysis Part 2

Understanding gene enrichment

TL;DR: I got confused a lot, but I think I’m on the right track. This will take a while.

I switched over to an R Markdown file and tried working with topGO for gene enrichment. I wanted to use an R package over DAVID to improve the reproducibility of my work.

I set up some code and reformatted my DML-mRNA overlap file in the script. topGO requires GO terms as inputs as well, so I tried using org.Hs.eg.db to match gene IDs to GO terms. However, I ran into my main problem: I don’t have Entrez Gene IDs (official NCBI IDs) for my C. virginica genes. Without these, I cannot use any sort of gene enrichment R package, let alone convert gene IDs to GO terms! Steven suggested I blastx the C. virginica genome against the UniProt database. This would give me UniProt accession codes and GO terms. I can then find a way to convert UniProt accession codes to Entrez Gene IDs to use topGO. He also suggested I use DAVID for gene enrichment and compare the results.
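To make the blastx step a bit more concrete, here is a minimal sketch of how the output could be reduced to one UniProt accession per gene. It is written in Python rather than in my R Markdown file, and it rests on several assumptions: that blastx is run with tabular output (-outfmt 6, default columns), that the subject IDs follow the Swiss-Prot "sp|ACCESSION|ENTRY_NAME" pattern, and that an e-value cutoff of 1e-10 is reasonable. The function names, the cutoff, and the example rows are all hypothetical, not results from my data.

```python
import csv
import io

# Default column order for BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def uniprot_accession(sseqid):
    """Return the accession from an ID shaped like 'sp|ACCESSION|ENTRY_NAME'.

    IDs that do not match that shape are returned unchanged.
    """
    parts = sseqid.split("|")
    return parts[1] if len(parts) >= 3 else sseqid


def best_hits(handle, max_evalue=1e-10):
    """Map each query gene to the accession of its lowest e-value hit.

    max_evalue is an assumed cutoff; hits above it are ignored.
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue
        gene = row["qseqid"]
        # Keep the hit with the smallest e-value seen so far for this gene.
        if gene not in best or evalue < best[gene][1]:
            best[gene] = (uniprot_accession(row["sseqid"]), evalue)
    return {gene: accession for gene, (accession, _) in best.items()}


# Made-up rows for illustration only; gene names and accessions are placeholders.
EXAMPLE = (
    "gene1\tsp|ACC0001|AAA_HUMAN\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "gene1\tsp|ACC0002|BBB_MOUSE\t45.0\t100\t55\t0\t1\t300\t1\t100\t1e-20\t100\n"
    "gene2\tsp|ACC0003|CCC_RAT\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t30\n"
)

hits = best_hits(io.StringIO(EXAMPLE))
print(hits)
```

With these example rows, gene1 keeps only its stronger hit and gene2 is dropped because its only hit falls above the cutoff. The resulting gene-to-accession table is the piece that would then need GO terms (and, for topGO as I currently understand it, Entrez Gene IDs) attached.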

I rearranged my R Markdown file and started blastx. We’ll see how this goes…

Want to know more about my thought process for this analysis? I started using Wordpress to document intermediate thoughts! Click on the “Feed” link in the top right corner of this webpage, or navigate to these links:

Written on June 15, 2018