Hawaii Gigas Methylation Analysis Part 23
Revising figures and numbers in text
Welp…back at this paper again! We have a special issue deadline of April 18, so this time I actually have to finish things up. I started by picking up where I left off in this poorly written lab notebook post.
Hunting down numbers and contingency tests
The first thing I wanted to do was update the manuscript with numbers from my methylKit analyses. I referred to this Jupyter notebook to get the total number of DML found for pH and ploidy contrasts, and the number of DML that overlapped with various genome features. In this process, I realized I needed to remake some figures. I remade the figure showing the number of 5x CpGs across the genome in various methylation categories in this R Markdown script.
Figure 1. Distribution of all 5x CpGs, highly methylated CpGs, moderately methylated CpGs, and lowly methylated CpGs across various genome features.
I then made a similar figure showing the distribution of ploidy-DML and pH-DML across various genome features in this R Markdown script.
Figure 2. Distribution of ploidy-DML and pH-DML in various genome features
In the same R Markdown script, I used a contingency test to determine whether the distribution of ploidy-DML or pH-DML was significantly different from that of all 5x CpGs in the C. gigas genome. Interestingly, only the distribution of pH-DML in intergenic regions was significantly different. The distribution of ploidy-DML was not significantly different for any genomic feature.
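To make the contingency test concrete, here is a minimal Python sketch of one way such a comparison could be set up for a single genome feature. It is not the code from the R Markdown script: the function names, the choice of a 2x2 Pearson chi-squared test, the choice to compare DML against the remaining non-DML CpGs, and the counts in the example are all assumptions made for illustration.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be nonzero")
    n = row1 + row2
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def feature_enrichment(dml_in, dml_total, cpg_in, cpg_total):
    """Test whether DML fall in a feature at a different rate than the
    remaining (non-DML) 5x CpGs. DML are assumed to be a subset of the CpGs."""
    table = (
        dml_in,                                       # DML in the feature
        dml_total - dml_in,                           # DML elsewhere
        cpg_in - dml_in,                              # non-DML CpGs in the feature
        (cpg_total - cpg_in) - (dml_total - dml_in),  # non-DML CpGs elsewhere
    )
    if min(table) < 0:
        raise ValueError("DML counts must not exceed the CpG counts")
    return chi2_2x2(*table)


# Hypothetical counts for illustration only; these are not results from the analysis.
stat, p = feature_enrichment(dml_in=30, dml_total=100, cpg_in=2000, cpg_total=10000)
print(f"chi-squared = {stat:.3f}, p = {p:.4f}")
```

One caveat if comparing against R output: chisq.test applies Yates' continuity correction to 2x2 tables by default, so its statistic and p-value will differ slightly from this uncorrected version.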
Heatmaps
The last thing I wanted to do was remake heatmaps for my DML. I opened this R Markdown script to make heatmaps using pheatmap.
Figures 3-4. Heatmaps of ploidy-DML and pH-DML.
So…….these don’t inspire any confidence in the DML identification. I think there are two things at play here:
- Using a minimum of eight samples/locus
- Holding pH as a covariate when examining ploidy-DML, and vice versa
I created this issue and asked Steven to examine the DML in IGV. If we don’t have any confidence, then I think the next step is to redo methylKit without the covariate to see if that improves our confidence in the DML.
Going forward
- Review methylKit results
  - Examine distribution across chromosomes
- Add Rajan’s comments to the Google Doc
- Update methods
- Update results
- Revise discussion
- Revise introduction
- Transfer scripts used to a nextflow workflow