DMG Analysis Part 3
Enrichment and description of differentially methylated genes
I talked to Shelly about my non-significant results. She said I could use uncorrected p-values in my results and argue that this is an exploratory study, just like she did with her crab metabolomics paper. I returned to my R Markdown script to do just that.
Methylation differences and annotation
I wanted to use GO-MWU again. To use signed log p-values, I needed the direction of the methylation difference between treatments so I could assign a sign to each p-value. To get this, I subset methylation information for ambient and treatment samples separately, then calculated median percent methylation by gene:
ambientSamples <- subset(fullPercentMethExpanded, subset = fullPercentMethExpanded$treatment == "Ambient") #Subset ambient samples
ambientSamplesPercentMeth <- aggregate(percentMeth ~ geneID, data = ambientSamples, FUN = median) #Calculate median methylation by gene for ambient samples
treatmentSamples <- subset(fullPercentMethExpanded, subset = fullPercentMethExpanded$treatment == "Treatment") #Subset treatment samples
treatmentSamplesPercentMeth <- aggregate(percentMeth ~ geneID, data = treatmentSamples, FUN = median) #Calculate median methylation by gene for treatment samples
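The two median tables still have to be combined into one per-gene difference. Here is a minimal sketch of one way to do that, using toy data in place of fullPercentMethExpanded. The toy values, the merged column names, the methDifferences object, and the treatment-minus-ambient direction are all assumptions for illustration, not taken from my actual script:

```r
# Toy stand-in for fullPercentMethExpanded (hypothetical values)
fullPercentMethExpanded <- data.frame(
  geneID = rep(c("gene1", "gene2"), each = 4),
  treatment = rep(c("Ambient", "Ambient", "Treatment", "Treatment"), times = 2),
  percentMeth = c(10, 20, 30, 50, 80, 90, 60, 70)
)

# Median percent methylation by gene, separately for each treatment
ambientSamples <- subset(fullPercentMethExpanded, treatment == "Ambient")
ambientSamplesPercentMeth <- aggregate(percentMeth ~ geneID, data = ambientSamples, FUN = median)
treatmentSamples <- subset(fullPercentMethExpanded, treatment == "Treatment")
treatmentSamplesPercentMeth <- aggregate(percentMeth ~ geneID, data = treatmentSamples, FUN = median)

# Join the two tables on geneID; the suffixes keep the two median columns apart
methDifferences <- merge(ambientSamplesPercentMeth, treatmentSamplesPercentMeth,
                         by = "geneID", suffixes = c(".ambient", ".treatment"))

# Assumed direction: treatment minus ambient, so positive = higher methylation in treatment
methDifferences$methDiff <- methDifferences$percentMeth.treatment - methDifferences$percentMeth.ambient
```

Only the sign of methDiff matters for signed log p-values, so flipping the direction would simply flip every sign.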
Returning to the Kruskal-Wallis output, I subset genes with significant uncorrected p-values: these are my DMG. I then annotated the DMG with UniProt accession codes, GO terms, and methylation differences, and repeated the process for DMG with DML. There are 223 DMG and 6 DMG with DML! I saved the annotated DMG here and annotated DMG with DML here.
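As a rough sketch of the subset-and-annotate step, assuming the Kruskal-Wallis output and the annotation table share a geneID column: the table names, column names, placeholder annotation values, and the 0.05 cutoff below are all hypothetical, since the cutoff and table layouts are not spelled out in this post.

```r
# Hypothetical Kruskal-Wallis output: one uncorrected p-value per gene
kwResults <- data.frame(
  geneID = c("gene1", "gene2", "gene3"),
  pvalue = c(0.01, 0.20, 0.04),
  stringsAsFactors = FALSE
)

# Hypothetical annotation table; accession and GO values are placeholders
annotations <- data.frame(
  geneID = c("gene1", "gene2"),
  uniprotAccession = c("ACC_A", "ACC_B"),
  GOterms = c("GO:termA; GO:termB", "GO:termC"),
  stringsAsFactors = FALSE
)

# Hypothetical per-gene methylation differences
methDifferences <- data.frame(
  geneID = c("gene1", "gene2", "gene3"),
  methDiff = c(25, -20, 5),
  stringsAsFactors = FALSE
)

alpha <- 0.05 # Assumed significance cutoff for uncorrected p-values

# Keep genes with uncorrected p-values below the cutoff
DMG <- kwResults[kwResults$pvalue < alpha, ]

# Left joins (all.x = TRUE) keep DMG that lack annotations, filling them with NA
annotatedDMG <- merge(DMG, annotations, by = "geneID", all.x = TRUE)
annotatedDMG <- merge(annotatedDMG, methDifferences, by = "geneID", all.x = TRUE)
```

Using all.x = TRUE matters here: a plain merge would silently drop any DMG without an annotation, which would change the DMG count.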
Gene enrichment with GO-MWU
Gene enrichment time! I generated a GO annotation table and a table of significance measures using signed log p-values from the Kruskal-Wallis tests. All genes were used, not just the DMG. I noticed that half of the genes in the GO annotation table did not have associated GO terms. Unsurprisingly, there were no significantly enriched parent GOslim terms in the biological process, cellular component, or molecular function categories. I counted the frequency of parent GOslim terms for all genes tested and for DMG.
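Both tabulation steps can be sketched in a few lines. This assumes a signed log p-value is the negative log10 p-value multiplied by the sign of the methylation difference, which is one common definition and may not match my script exactly. The toy tables, column names, and GOslim labels are hypothetical:

```r
# Hypothetical per-gene p-values and methylation differences
geneStats <- data.frame(
  geneID = c("gene1", "gene2", "gene3"),
  pvalue = c(0.01, 0.1, 1),
  methDiff = c(25, -20, 5),
  stringsAsFactors = FALSE
)

# Signed log p-value: magnitude from the p-value, sign from the methylation difference
geneStats$signedLogP <- -log10(geneStats$pvalue) * sign(geneStats$methDiff)

# Two-column table of significance measures covering all genes, not only DMG
measureTable <- geneStats[, c("geneID", "signedLogP")]

# Hypothetical long-format GOslim table: one row per gene-term pair
goslimTable <- data.frame(
  geneID = c("gene1", "gene1", "gene2", "gene3"),
  GOslim = c("cell adhesion", "signal transduction", "signal transduction", "transport"),
  stringsAsFactors = FALSE
)

# GOslim term frequencies for all genes tested
allGeneCounts <- table(goslimTable$GOslim)

# GOslim term frequencies for DMG only (assumed cutoff of 0.05)
dmgIDs <- geneStats$geneID[geneStats$pvalue < 0.05]
dmgCounts <- table(goslimTable$GOslim[goslimTable$geneID %in% dmgIDs])
```

Genes with a methylation difference of exactly zero get a signed value of zero under this definition, so they carry no direction in the enrichment input.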
Alright, now it’s time to write (and maybe make some new figures, who knows).
- Update methods and results
- Finalize figures
- Update paper repository
- Outline the discussion
- Write the discussion
- Write the introduction
- Revise my abstract
- Share the draft with collaborators and get feedback
- Post the paper on bioRxiv
- Prepare the manuscript for publication