DML Analysis Part 38

Gene enrichment and description

Now that I have annotated DML tables, I want to try a gene enrichment and find a good way to describe the functions of the genes in my annotations. Ideally, I can use the Uniprot Accession codes to get GOterms to do gene enrichment and to describe functions.

Obtaining GOterms

It turns out I had this exact issue last year (have I really not progressed…? shudders). After some back-and-forth with Sam and Shelly, I realized I could download the Uniprot-SwissProt database with additional Gene Ontology columns. I went to this website and added columns of interest, then downloaded the database as a tab-delimited file. I initially tried downloading it as a FASTA, but Sam pointed out that I needed to download it as a .txt file if I wanted to keep the additional columns. My file had the following columns:

  • Entry (Uniprot-Accession)
  • Entry Name (Uniprot-ID)
  • Status (reviewed)
  • Protein names
  • Gene names
  • Organism
  • Length
  • Gene ontology IDs
  • Gene ontology (GO)
  • Gene ontology (biological processes)
  • Gene ontology (cellular component)
  • Gene ontology (molecular function)

I then imported the file into this R Markdown file. I dropped some of the columns, so my final annotation tables (found in this folder) now include GO-ID, GO-BP, GO-CC, and GO-MF.
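As a rough illustration of the import-and-trim step, here is a minimal sketch in Python rather than the R Markdown I actually used. It assumes the header names match the column list above (the real file's headers may differ slightly), and the single data row is a made-up placeholder, not a real annotation.

```python
import csv
import io

# Map the (assumed) header names in the export to the short names
# used in the final annotation tables.
KEEP = {
    "Entry": "Uniprot-Accession",
    "Gene ontology IDs": "GO-ID",
    "Gene ontology (biological processes)": "GO-BP",
    "Gene ontology (cellular component)": "GO-CC",
    "Gene ontology (molecular function)": "GO-MF",
}


def read_go_columns(handle):
    """Read a tab-delimited export and keep only the renamed GO columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{new: row.get(old, "") for old, new in KEEP.items()} for row in reader]


# Hypothetical one-row stand-in for the downloaded .txt file.
example = io.StringIO(
    "Entry\tProtein names\tLength\tGene ontology IDs\t"
    "Gene ontology (biological processes)\tGene ontology (cellular component)\t"
    "Gene ontology (molecular function)\n"
    "X00001\tplaceholder protein\t100\tGO:0000001; GO:0000002\t"
    "process A [GO:0000001]\tcomponent B [GO:0000002]\t\n"
)

rows = read_go_columns(example)
print(rows[0]["Uniprot-Accession"], rows[0]["GO-ID"])
```

To run this on a real download, open the .txt file and pass the file handle in place of the `io.StringIO` stand-in.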

Functional description

Now that I had GOterms assigned to genes, I could try grouping GOterms together to describe genes. For each Uniprot Accession code, I have three different GOterm categories: biological processes, cellular component, and molecular function. For my DML-exon and DML-intron annotations, I isolated the first three GO-BP codes for each Uniprot Accession code with an e-value no larger than 10^-10. I used count from the dplyr package to create summary tables, found here:

It’s really good information, but I’m not sure how to include such long tables in a paper. I think I’ll need to map the GOterms to parent (or grandparent) GOterms similar to what Shelly did. I’ll review her lab notebook to see how she did that.
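For reference, the filter-and-count step behind those summary tables could be sketched like this. It is a Python stand-in for the dplyr workflow, not the code I ran. The field names (`accession`, `evalue`, `GO-BP`) and the three records are hypothetical, and I am assuming GO-BP terms within a cell are separated by semicolons.

```python
from collections import Counter

EVALUE_CUTOFF = 1e-10  # keep matches with an e-value no larger than this


def count_first_bp_terms(annotations, n_terms=3, cutoff=EVALUE_CUTOFF):
    """Tally the first n_terms GO-BP terms of each record that passes the cutoff."""
    counts = Counter()
    for record in annotations:
        if record["evalue"] > cutoff:
            continue  # match too weak to trust the annotation
        terms = [t.strip() for t in record["GO-BP"].split(";") if t.strip()]
        counts.update(terms[:n_terms])
    return counts


# Made-up records standing in for the DML-exon or DML-intron annotations.
annotations = [
    {"accession": "X00001", "evalue": 1e-50,
     "GO-BP": "process A; process B; process C; process D"},
    {"accession": "X00002", "evalue": 1e-12, "GO-BP": "process A; process C"},
    {"accession": "X00003", "evalue": 1e-5, "GO-BP": "process A"},
]

bp_counts = count_first_bp_terms(annotations)
for term, n in bp_counts.most_common():
    print(term, n)
```

Taking only the first three terms keeps the table shorter, but it depends on whatever order the terms happen to be listed in, which is one more reason to map to parent terms instead.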

Gene enrichment with DAVID

I know one method of obtaining GOterms from Uniprot Accession codes is to use DAVID. Before I could do this, I needed to match Uniprot Accession codes to my gene background file. I returned to this Jupyter notebook to use intersectBed and characterize DML background and mRNA overlaps. In my R Markdown file, I matched Uniprot Accession codes to DMLBackground-mRNA overlaps.
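The matching step is essentially a lookup from the gene IDs in the overlap file to Uniprot Accession codes. Here is a minimal sketch of that idea in Python (the real matching was done in R). The gene IDs, accession codes, and the `match_accessions` helper are all hypothetical, and I am assuming each gene maps to at most one accession.

```python
def match_accessions(overlap_gene_ids, gene_to_accession):
    """Return (sorted unique accessions, gene IDs that have no accession)."""
    accessions = set()
    unmatched = []
    for gene_id in overlap_gene_ids:
        accession = gene_to_accession.get(gene_id)
        if accession is None:
            unmatched.append(gene_id)  # gene overlaps background but is unannotated
        else:
            accessions.add(accession)  # set removes duplicates from repeated overlaps
    return sorted(accessions), unmatched


# Made-up inputs: gene IDs from the background-mRNA overlaps (a gene can
# appear once per overlapping locus) and an annotation lookup table.
overlap_gene_ids = ["gene1", "gene2", "gene2", "gene3"]
gene_to_accession = {"gene1": "X00001", "gene2": "X00002"}

background, unmatched = match_accessions(overlap_gene_ids, gene_to_accession)
print(background)
print(unmatched)
```

Deduplicating matters here because a gene with many background loci would otherwise appear many times in the background list.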

I then performed a gene enrichment with DAVID and put the output here. No surprise: nothing was significantly enriched.
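To make "enriched" concrete: an enrichment test asks whether a GOterm shows up in the gene list more often than expected given the background. The sketch below is a plain one-sided hypergeometric test in Python. It is not DAVID's exact calculation (DAVID reports its own modified Fisher's exact statistic), it leaves out multiple-testing correction, and the numbers in the example are invented.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k term-carrying genes in a list of n,
    drawn without replacement from N background genes of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Invented example: 10 background genes, 3 carry the term,
# and 2 of the 4 genes in my list carry it.
p = hypergeom_enrichment_p(k=2, n=4, K=3, N=10)
print(round(p, 4))
```

With a small gene list, even a term carried by half the list can produce an unremarkable p-value, which is consistent with nothing coming out as significant.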

A new approach

During our NSF E20 meeting, Steven suggested I focus on enrichment instead of description. Katie pointed out a different gene enrichment tool, GO-MWU. I’ll tackle this next.

Going forward

  1. Perform gene enrichment with GO-MWU
  2. Work through gene-level analysis
  3. Update methods and results
  4. Update paper repository
  5. Outline the discussion
  6. Write the discussion
  7. Write the introduction
  8. Revise my abstract
  9. Share the draft with collaborators and get feedback
  10. Post the paper on bioRxiv
  11. Prepare the manuscript for publication
Written on June 27, 2019