DML Analysis Part 38
Gene enrichment and description
Now that I have annotated DML tables, I want to try a gene enrichment and find a good way to describe the functions of the genes in my annotations. Ideally, I can use the Uniprot Accession codes to get GOterms to do gene enrichment and to describe functions.
Obtaining GOterms
Turns out I had this exact issue last year (have I really not progressed…? shudders). After some back-and-forth with Sam and Shelly, I realized I could download the Uniprot-SwissProt database with additional Gene Ontology columns. I went to this website and added columns of interest. I then downloaded the database as a tab-delimited file. I initially tried downloading it as a FASTA, but Sam pointed out that I needed to download it as a .txt file if I wanted to keep the additional columns. My file had the following columns:
- Entry (Uniprot-Accession)
- Entry Name (Uniprot-ID)
- Status (reviewed)
- Protein names
- Gene names
- Organism
- Length
- Gene ontology IDs
- Gene ontology (GO)
- Gene ontology (biological processes)
- Gene ontology (cellular component)
- Gene ontology (molecular function)
I then imported the file in this R Markdown file. I skimmed some of the columns off, so my final annotation tables (found in this folder) now include GO-ID, GO-BP, GO-CC, and GO-MF.
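As a rough illustration of that import-and-trim step (in Python, not the R Markdown used here), the sketch below reads a tab-delimited export, keeps only the four GO columns, and joins them onto annotation rows by accession. The function names, the `ACC0001` accession, the example GO values, and the `Uniprot-Accession` and `evalue` column names in the annotation rows are all made up for the example.

```python
import csv
import io

# Source column in the export -> short name used in the annotation tables.
GO_COLUMNS = {
    "Gene ontology IDs": "GO-ID",
    "Gene ontology (biological processes)": "GO-BP",
    "Gene ontology (cellular component)": "GO-CC",
    "Gene ontology (molecular function)": "GO-MF",
}


def load_go_columns(handle):
    """Index the tab-delimited export by accession, keeping only GO columns."""
    go_by_accession = {}
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        go_by_accession[row["Entry"]] = {
            short: row[source] for source, short in GO_COLUMNS.items()
        }
    return go_by_accession


def add_go_columns(annotation_rows, go_by_accession):
    """Left-join GO columns onto annotation rows; unmatched rows get blanks."""
    empty = {short: "" for short in GO_COLUMNS.values()}
    return [
        {**row, **go_by_accession.get(row["Uniprot-Accession"], empty)}
        for row in annotation_rows
    ]


# Made-up stand-in for the downloaded file (one entry, empty GO-MF field).
header = "\t".join(["Entry", *GO_COLUMNS])
example_export = io.StringIO(
    header + "\n"
    "ACC0001\tGO:0000001; GO:0000002\tprocess one [GO:0000001]\t"
    "part one [GO:0000002]\t\n"
)

# Made-up annotation rows; the second accession is absent from the export.
example_annotations = [
    {"Uniprot-Accession": "ACC0001", "evalue": "1e-20"},
    {"Uniprot-Accession": "ACC9999", "evalue": "1e-12"},
]

annotated = add_go_columns(example_annotations, load_go_columns(example_export))
```

Rows with no match keep their original columns and get empty GO fields, so nothing is silently dropped from the annotation table.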
Functional description
Now that I had GOterms assigned to genes, I could try grouping GOterms together to describe genes. For each Uniprot Accession code, I have three different GOterm categories: biological processes, cellular component, and molecular function. For my DML-exon and DML-intron annotations, I isolated the first three GO-BP codes for each Uniprot Accession code with an e-value no larger than 10^-10. I used count in the dplyr package to create summary tables, found here:
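- DML-exons: 746 total GOterms, 261 categories
- DML-exons (hypermethylated): 172 total GOterms, 151 categories
- DML-exons (hypomethylated): 574 total GOterms, 139 categories
- DML-introns: 86 total GOterms, 154 categories
- DML-introns (hypermethylated): 48 total GOterms, 78 categories
- DML-introns (hypomethylated): 38 total GOterms, 80 categories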
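The filter-and-count step behind these tables can be sketched as follows, in Python rather than dplyr. It assumes the GO-BP field is a "; "-separated list of terms (the layout of the UniProt export as I understand it) and that each row carries an `evalue` column; both the column names and the example rows are hypothetical, and the counting is done per row rather than per unique accession.

```python
from collections import Counter

EVALUE_CUTOFF = 1e-10  # keep matches with an e-value no larger than this


def first_bp_terms(go_bp, n=3):
    """Return the first n biological-process terms from a '; '-separated field."""
    terms = [term.strip() for term in go_bp.split(";") if term.strip()]
    return terms[:n]


def count_bp_terms(rows, cutoff=EVALUE_CUTOFF, n=3):
    """Tally the first n GO-BP terms of every row that passes the cutoff."""
    counts = Counter()
    for row in rows:
        if float(row["evalue"]) > cutoff:
            continue  # match too weak to trust the annotation
        counts.update(first_bp_terms(row["GO-BP"], n))
    return counts


# Made-up rows: one strong match with four terms, one weak match,
# one match exactly at the cutoff, and one strong match without GO-BP terms.
example_rows = [
    {"evalue": "1e-20", "GO-BP": "t1 [GO:1]; t2 [GO:2]; t3 [GO:3]; t4 [GO:4]"},
    {"evalue": "1e-5", "GO-BP": "t2 [GO:2]"},
    {"evalue": "1e-10", "GO-BP": "t1 [GO:1]"},
    {"evalue": "1e-30", "GO-BP": ""},
]

bp_counts = count_bp_terms(example_rows)
total_terms = sum(bp_counts.values())  # analogous to "total GOterms"
n_categories = len(bp_counts)          # analogous to "categories"
```

Here `total_terms` plays the role of the total GOterm count and `n_categories` the number of distinct terms, which is how I read the two numbers in each list entry.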
It’s really good information, but I’m not sure how to include such long tables in a paper. I think I’ll need to map the GOterms to parent (or grandparent) GOterms similar to what Shelly did. I’ll review her lab notebook to see how she did that.
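One possible way to collapse the long tables, sketched in Python under heavy assumptions: walk each term up a child-to-parent map and add its count to any ancestor that belongs to a short list of summary terms. This is not necessarily how Shelly did it. The `PARENTS` map, the `SUMMARY_TERMS` set, and the `term_*` names are toy placeholders, not real GO identifiers; a real version would read the hierarchy from the Gene Ontology itself.

```python
from collections import Counter

# Toy hierarchy (child -> list of parents); placeholders, not real GO terms.
PARENTS = {
    "term_C": ["term_B"],
    "term_B": ["term_A"],
    "term_D": ["term_A"],
    "term_A": [],
}

# Hypothetical short list of broader terms to report in a paper.
SUMMARY_TERMS = {"term_B", "term_D"}


def ancestors(term, parents):
    """Collect every parent, grandparent, and so on for one term."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def roll_up(counts, parents, summary_terms):
    """Add each term's count to the summary terms it is, or descends from."""
    rolled = Counter()
    for term, n in counts.items():
        for hit in ({term} | ancestors(term, parents)) & summary_terms:
            rolled[hit] += n
    return rolled


example_counts = Counter({"term_C": 3, "term_D": 2, "term_A": 5})
rolled_counts = roll_up(example_counts, PARENTS, SUMMARY_TERMS)
```

A term with several summary ancestors is counted under each of them, and a term with none (like `term_A` here) drops out, so the rolled-up totals need not match the original totals.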
Gene enrichment with DAVID
I know one method of obtaining GOterms from Uniprot Accession codes is to use DAVID. Before I could do this, I needed to match Uniprot Accession codes to my gene background file. I returned to this Jupyter notebook to use intersectBed and characterize DML background and mRNA overlaps. In my R Markdown file, I matched Uniprot Accession codes to DMLBackground-mRNA overlaps.
I then performed a gene enrichment with DAVID and put the output here. No surprise: nothing was significantly enriched.
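The matching step that comes before the DAVID run can be sketched roughly as below (Python, not the R Markdown used here). It assumes the intersectBed output has already been reduced to lists of mRNA IDs and that a lookup from mRNA ID to Uniprot Accession is available; the function name and all IDs are hypothetical. The sketch also checks that every gene in the DML list appears in the background, since an enrichment test is only meaningful when the gene list is a subset of the background.

```python
def accession_lists(mrna_to_accession, dml_mrnas, background_mrnas):
    """Build de-duplicated accession lists for the gene list and background."""

    def to_accessions(mrna_ids):
        # mRNAs without a Uniprot match are skipped; duplicates collapse.
        return sorted(
            {mrna_to_accession[m] for m in mrna_ids if m in mrna_to_accession}
        )

    gene_list = to_accessions(dml_mrnas)
    background = to_accessions(background_mrnas)

    missing = set(gene_list) - set(background)
    if missing:
        raise ValueError(f"Not in background: {sorted(missing)}")
    return gene_list, background


# Made-up lookup and overlap lists; mrna4 has no Uniprot match.
example_lookup = {"mrna1": "ACC0001", "mrna2": "ACC0002", "mrna3": "ACC0001"}
example_dml = ["mrna1", "mrna3", "mrna4"]
example_background = ["mrna1", "mrna2", "mrna3", "mrna4"]

gene_list, background = accession_lists(
    example_lookup, example_dml, example_background
)
```

The two returned lists are what would be pasted or uploaded as the gene list and the background, one accession per line.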
A new approach
During our NSF E20 meeting, Steven suggested I focus on enrichment instead of description. Katie pointed out a different gene enrichment tool, GO-MWU. I’ll tackle this next.
Going forward
- Perform gene enrichment with GO-MWU
- Work through gene-level analysis
- Update methods and results
- Update paper repository
- Outline the discussion
- Write the discussion
- Write the introduction
- Revise my abstract
- Share the draft with collaborators and get feedback
- Post the paper on bioRxiv
- Prepare the manuscript for publication