Hawaii Gigas Methylation Analysis Part 23

Revising figures and numbers in text

Welp…back at this paper again! We have a special issue deadline of April 18, so this time I actually have to finish things up. I started by picking up where I left off in this poorly written lab notebook post.

Hunting down numbers and contingency tests

The first thing I wanted to do was update the manuscript with numbers from my methylKit analyses. I referred to this Jupyter notebook to get the total number of DML found for the pH and ploidy contrasts, and the number of DML that overlapped with various genome features. In the process, I realized I needed to remake some figures. I remade the figure showing the number of 5x CpGs across the genome in various methylation categories in this R Markdown script.

Image

Figure 1. Distribution of all 5x CpGs, highly methylated CpGs, moderately methylated CpGs, and lowly methylated CpGs across various genome features.

I then made a similar figure showing the distribution of ploidy-DML and pH-DML across various genome features in this R Markdown script.

Image

Figure 2. Distribution of ploidy-DML and pH-DML across various genome features.

In the same R Markdown script, I used a contingency test to understand whether the distribution of ploidy-DML or pH-DML across genome features was significantly different from the distribution of all 5x CpGs in the C. gigas genome. Interestingly, only the distribution of pH-DML in intergenic regions was significantly different. The distribution of ploidy-DML was not significantly different for any genome feature.
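For reference, here is a minimal sketch of what a contingency test like this can look like for a single genome feature. It is not the code from the R Markdown script. It assumes a 2x2 table (DML inside vs. outside the feature, compared with background CpGs inside vs. outside the feature) and a Pearson chi-squared test without continuity correction. The function name and all of the counts are hypothetical and only there to make the example run.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (chi-squared statistic, p-value) with 1 df."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts for illustration only (not values from the analysis)
dml_in_feature, dml_elsewhere = 30, 70
background_in_feature, background_elsewhere = 2000, 8000

chi2, p_value = chi2_2x2(
    dml_in_feature, dml_elsewhere, background_in_feature, background_elsewhere
)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
```

Running one test per feature means several tests per contrast, so a multiple-testing correction may be worth considering when interpreting the p-values.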

Heatmaps

The last thing I wanted to do was remake heatmaps for my DML. I opened this R Markdown script to make heatmaps using pheatmap.

Image

Image

Figures 3-4. Heatmaps of ploidy-DML and pH-DML.

So…these don’t inspire any confidence in the DML identification. I think there are two things at play here:

  • Using a minimum of eight samples/locus
  • Holding pH as a covariate when examining ploidy-DML, and vice versa
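To make the first point concrete, here is a small sketch of one way to read a "minimum of eight samples/locus" threshold: a locus is kept only if at least eight samples have data at sufficient depth (5x here, matching the 5x CpGs above). This is an assumption about what the filter means, not the methylKit call itself. The function name, locus names, sample count, and read depths are all hypothetical.

```python
def loci_with_enough_samples(coverage, min_samples=8, min_coverage=5):
    """Return the locus IDs that have at least `min_coverage` reads
    in at least `min_samples` samples. `None` means no data for that sample."""
    kept = []
    for locus, depths in coverage.items():
        n_covered = sum(
            1 for depth in depths if depth is not None and depth >= min_coverage
        )
        if n_covered >= min_samples:
            kept.append(locus)
    return kept


# Hypothetical read depths for three loci across ten samples
coverage = {
    "chr1:100": [12, 9, 7, 5, 6, 8, 10, 11, 3, None],  # 8 samples at >= 5x
    "chr1:250": [5, 5, 5, 5, 5, 5, 5, 4, 2, None],     # 7 samples at >= 5x
    "chr2:40": [20, 18, 15, 9, 7, 6, 5, 5, 5, 5],      # 10 samples at >= 5x
}

kept_loci = loci_with_enough_samples(coverage)
print(kept_loci)
```

A threshold like this keeps loci where a few samples are missing, which could contribute to noisy-looking heatmaps if the missing samples are unevenly spread across treatments.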

I created this issue and asked Steven to examine the DML in IGV. If the DML still don’t look convincing in IGV, then I think the next step is to redo methylKit without the covariate to see if that improves our confidence in the DML.

Going forward

  1. Review methylKit results
  2. Examine distribution across chromosomes
  3. Add Rajan’s comments to the Google Doc
  4. Update methods
  5. Update results
  6. Revise discussion
  7. Revise introduction
  8. Transfer scripts used to a Nextflow workflow

Written on February 18, 2025