DML Analysis Part 2
Understanding gene enrichment
TL;DR: I got confused a lot, but I think I’m on the right track. This will take a while.
I switched over to an R Markdown file and tried working with topGO for gene enrichment. I wanted to use an R package over DAVID to improve the reproducibility of my work.
I set up some code and reformatted my DML-mRNA overlap file in the script. topGO requires GO terms as inputs as well, so I tried using org.Hs.eg.db to match gene IDs to GO terms. However, I ran into my main problem: I don’t have Entrez Gene IDs (official NCBI IDs) for my C. virginica genes. Without these, I cannot use any sort of gene enrichment R package, let alone convert gene IDs to GO terms! Steven suggested I blastx the C. virginica genome against the UniProt database. This would give me UniProt accession codes and GO terms. I can then find a way to convert UniProt accession codes to Entrez Gene IDs for use with topGO. He also suggested I use DAVID for gene enrichment and compare the results.
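Here is a minimal sketch of how the blastx results could be turned into a gene-to-UniProt lookup once the search finishes. It is not the code from my R Markdown file. It assumes blastx was run with tabular output (`-outfmt 6`, the default 12 columns) and that subject IDs look like `sp|ACCESSION|NAME`, as in UniProt FASTA headers. The function name `best_uniprot_hits`, the `1e-10` e-value cutoff, and the example rows are all made up for illustration.

```python
import csv
import io


def best_uniprot_hits(handle, max_evalue=1e-10):
    """Keep the lowest e-value hit per query from BLAST tabular output.

    Assumes the default -outfmt 6 column order, where column 1 is the
    query ID, column 2 is the subject ID, and column 11 is the e-value.
    The max_evalue cutoff is an arbitrary illustrative choice.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # hit is too weak to trust
        # UniProt-style IDs look like "sp|ACCESSION|NAME"; fall back to
        # the raw subject ID if the format is different.
        parts = subject.split("|")
        accession = parts[1] if len(parts) >= 3 else subject
        if query not in best or evalue < best[query][1]:
            best[query] = (accession, evalue)
    return best


# Placeholder rows (not real results): two hits for gene1, one weak hit for gene2.
example = io.StringIO(
    "gene1\tsp|P11111|FAKE1_HUMAN\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "gene1\tsp|P22222|FAKE2_MOUSE\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-20\t100\n"
    "gene2\tsp|P33333|FAKE3_HUMAN\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t30\n"
)
hits = best_uniprot_hits(example)
for gene, (accession, evalue) in hits.items():
    print(gene, accession, evalue)
```

Genes with no hit under the cutoff simply drop out of the table, so they would need to be tracked separately before any enrichment step.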
I rearranged my R Markdown file and started blastx. We’ll see how this goes…
Want to know more about my thought process for this analysis? I started using Wordpress to document intermediate thoughts! Click on the “Feed” link in the top right corner of this webpage, or navigate to these links: