Hawaii Gigas Methylation Analysis Part 24

Reboot: The final countdown (for the Hawaii paper)

This summer I am determined to finish this paper (third time is the charm, right?)! Not just determined: I’ve actually set aside time to work on this paper and am doing the Roberts Lab pubathon. So yeah, it’s going to get done. My goal for the summer is to have an updated draft that can go out to any other co-authors. Bonus points if I submit for publication, but I’m also happy to do that in fall quarter. Here’s what stands between me and that goal (besides just revising different parts of the paper):

  • Alas, enough time has passed that there is a new C. gigas genome! At first I couldn’t find the files I needed, so I thought I had hallucinated the whole thing and posted a GitHub issue. Obviously, once I posted the issue I found files indicating that yes, there was a new genome and Steven had run bismark with it. Sam confirmed my findings and found all the associated output on gannet. So now I have to incorporate that into the analysis.
  • A few methodological novelties I want to try with this dataset include a randomization test with methylKit, integration with C. gigas ATAC-Seq, csRNA-Seq, and 5’-GRO data, and KOG-MWU for DML comparison with other Crassostrea spp. epigenetic studies.
  • Figure out if we can get the pH and water quality data. Probably a long-shot this far out…
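For the randomization test in the list above, the general idea is to shuffle treatment labels many times and ask how often the shuffled data look as extreme as the real data. Here is a minimal sketch of that idea for a single CpG, in plain Python rather than methylKit. The function name, the group names, and the methylation values are all hypothetical and only there for illustration; the real test would rerun the DML identification on each shuffle.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=42):
    """Two-sided label-shuffling test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values is equivalent to shuffling the labels
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical percent methylation at one CpG in two treatment groups
treatment_a = [82.0, 85.5, 79.0, 88.0]
treatment_b = [41.0, 38.5, 45.0, 40.0]
print(permutation_p_value(treatment_a, treatment_b))
```

With only four samples per group there are just 70 distinct ways to split the labels, so the smallest achievable p-value is limited by sample size no matter how many shuffles are run.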

Genome comparison for bismark results

I decided to start by just recapping the bismark results from the previous genome, and comparing that with the new genome.

Table 1. Comparison of mapping statistics across two genomes.

| Metric | Roslin Genome | xbMagGiga1 Genome |
|---|---|---|
| Alignment score | -0.6 | -0.8 |
| Alignments (# reads) | 3.1 x 10^7 | 3.2 x 10^7 |
| Alignments (%) | 61 | 63 |
| CpG methylation (%) | 10.4 | 10.6 |

One thing to note here is that Steven and I used different mapping parameters. Steven used --score-min L,0,-0.8 while I used --score-min L,0,-0.6. Steven's setting accepts lower-scoring alignments than mine, so it is the more permissive (less stringent) of the two, which could explain the higher number and percent of alignments with the new genome. I would prefer to use the more stringent alignment parameters, so I would need to talk to Steven about redoing the bismark alignments for the new genome with --score-min L,0,-0.6. However, CpG methylation percentages are comparable across the two genomes despite the differing alignment parameters.
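To make the difference between the two settings concrete: the L,intercept,slope form of --score-min defines a linear function of read length, so the minimum valid alignment score is intercept + slope x read length. A small sketch of that arithmetic is below. The 100 bp read length and the example alignment score of -70 are assumptions for illustration, not values from this dataset, and the function names are my own.

```python
def min_alignment_score(read_length, intercept, slope):
    """Linear --score-min function: intercept + slope * read_length."""
    return intercept + slope * read_length


def is_valid_alignment(score, read_length, intercept, slope):
    """An alignment is kept only if its score meets the minimum threshold."""
    return score >= min_alignment_score(read_length, intercept, slope)


read_length = 100     # hypothetical read length (bp)
example_score = -70   # hypothetical alignment score for one read

for label, slope in [("L,0,-0.6", -0.6), ("L,0,-0.8", -0.8)]:
    threshold = min_alignment_score(read_length, 0, slope)
    kept = is_valid_alignment(example_score, read_length, 0, slope)
    print(f"--score-min {label}: threshold {threshold:.0f}, read kept: {kept}")
```

For a 100 bp read the thresholds work out to -60 and -80, so a read scoring -70 is rejected under L,0,-0.6 but kept under L,0,-0.8. That is the sense in which -0.6 is the more stringent setting.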

I also found that there was a different folder with bismark output from the new genome here. This folder seems to only contain the coverage files and bedgraphs that are created after alignment.

Genome comparison for methylKit results (or an attempt)

Now that I had a rough understanding of the differences in alignment between the genomes, I wanted to understand if that translated to a difference in methylation identification. The first thing I did was open this R Markdown script (side note: I also updated R and RStudio since it’s that time of year!). I also loaded the saved RData I had for methylKit to avoid having to run unite again.

Anyways, I needed to 1) confirm the number of DML I previously counted and 2) identify DML using the new alignments. The first step is important because when I tried looking at the data a year ago, the heatmaps did not inspire confidence in the DML identification. When reviewing my script I noticed that I processed my files using ploidy treatment information specifically. I wonder if this impacts downstream methylKit analysis. So perhaps I would need to run things from scratch at some point after all.

While I had my R Markdown script all loaded up, I remembered the biggest issue with running methylKit on a laptop: memory! R needs to process so much data that on a standard device the memory limit gets exhausted.

I posted this issue to request access to Gannet and Raven so I can run methylKit with the new alignments and compare the results.

Going forward

  1. Revise bismark methods and results
  2. Revise methylKit methods and results
  3. methylKit randomization test
  4. ATAC-Seq data integration
  5. KOG-MWU for Crassostrea methylation comparison
  6. Revise discussion
  7. Revise introduction
  8. Transfer scripts used to a nextflow workflow
Written on July 2, 2026