Hawaii Gigas Methylation Analysis Part 15

Investigating methylation data

Quick notebook post to update my progress on characterizing this dataset! I followed the workflows I established with the Manchester data to characterize the general methylation landscape and understand where DML are located.

Characterizing methylation landscapes

I used this Jupyter notebook to identify methylated, sparsely methylated, and unmethylated CpG loci with at least 5x coverage in my sequencing data. As with the Manchester dataset, I created a union BEDgraph to combine percent methylation across samples, then used this union file and the individual sequence files for downstream analyses. I used intersectBed to identify methylated, sparsely methylated, and unmethylated loci for 25 files, then determined the genomic locations of the loci in those output files, and along the way I realized I was producing a lot of intermediate output files! When I visualize this data, I need an easy way to take file line counts and turn them into a data table. But that’s a problem for future Yaamini.
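One possible way to turn all those intermediate files into a table is sketched below. It is a minimal sketch, not the workflow used in the notebook: it assumes each intersectBed output file has one locus per line and that file names follow a hypothetical `<sample>_<category>.bed` pattern, so `parse_name` would need to be adapted to the real naming scheme.

```python
"""Tally line counts of intersectBed output files into a sample-by-category table.

Sketch under assumptions: one locus per line in each file, and a hypothetical
<sample>_<category>.<ext> file naming scheme (adjust parse_name as needed).
"""
import csv
import glob
import os
import sys


def count_lines(path):
    """Return the number of lines (loci) in a text file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def parse_name(path):
    """Split a file name like 'sample01_Meth.bed' into ('sample01', 'Meth')."""
    stem = os.path.splitext(os.path.basename(path))[0]
    sample, _, category = stem.partition("_")
    return sample, category


def build_table(paths):
    """Return {sample: {category: line_count}} for the given files."""
    table = {}
    for path in sorted(paths):
        sample, category = parse_name(path)
        table.setdefault(sample, {})[category] = count_lines(path)
    return table


def write_tsv(table, out):
    """Write the nested dict as a TSV with one row per sample, one column per category."""
    categories = sorted({c for row in table.values() for c in row})
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["sample"] + categories)
    for sample in sorted(table):
        # Missing sample/category combinations are written as 0.
        writer.writerow([sample] + [table[sample].get(c, 0) for c in categories])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_table.py "output-dir/*.bed" > counts.tsv
    write_tsv(build_table(glob.glob(sys.argv[1])), sys.stdout)
```

The resulting TSV can be read straight into R or pandas for plotting, which avoids running `wc -l` on each file by hand.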

DML locations

Now to DML locations! Based on my methylKit results, I decided to use a 25% methylation difference to define differential methylation. I first converted the DML lists into BED files in this Jupyter notebook and visualized the DML and 5x sample BEDgraphs in this IGV session. I color-coded the samples so that light blue = 2N + high pH, light purple = 3N + high pH, dark blue = 2N + low pH, and dark purple = 3N + low pH. I looked at a few of the DML and saw that the patterns matched pretty well with what I expected a pH- or ploidy-DML to look like.

After this quick look, I used intersectBed to find overlaps between pH- and ploidy-DML and to determine the genomic locations of the DML. Interestingly, only two DML overlapped between treatments! As expected, a majority of DML were found in genic regions. I also looked at overlaps between DML and the C/T SNPs I filtered in this Jupyter notebook. Again, I need to find an efficient way to take line counts from my intersectBed output files and format them in a table!
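To make the DML steps concrete, here is a hedged Python sketch of applying the 25% cutoff, converting DML to BED intervals, and counting pH/ploidy overlaps. It is not the notebook code. It assumes the methylKit results were exported as tab-separated text with columns `chr`, `start`, `end`, `strand`, `pvalue`, `qvalue`, and `meth.diff`, and that coordinates are 1-based inclusive; the q-value cutoff of 0.01 is a hypothetical choice, not one stated in this post.

```python
"""Filter DML at a 25% methylation difference, convert to BED, and find overlaps.

Sketch under assumptions: tab-separated methylKit export with columns
chr, start, end, strand, pvalue, qvalue, meth.diff; 1-based inclusive
coordinates; the 0.01 q-value cutoff is hypothetical.
"""
import csv


def read_table(path):
    """Read a tab-separated table into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def filter_dml(rows, min_diff=25.0, max_q=0.01):
    """Keep rows whose absolute meth.diff and q-value pass the cutoffs."""
    return [
        r for r in rows
        if abs(float(r["meth.diff"])) >= min_diff and float(r["qvalue"]) <= max_q
    ]


def to_bed(rows):
    """Convert 1-based inclusive rows to sorted 0-based half-open BED tuples."""
    return sorted((r["chr"], int(r["start"]) - 1, int(r["end"])) for r in rows)


def write_bed(intervals, path):
    """Write (chrom, start, end) tuples as a three-column BED file."""
    with open(path, "w") as handle:
        for chrom, start, end in intervals:
            handle.write(f"{chrom}\t{start}\t{end}\n")


def overlapping(a, b):
    """Return intervals in a that overlap any interval in b, similar in spirit
    to intersectBed -u. Uses a simple per-chromosome scan, fine for DML-sized lists."""
    by_chrom = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [
        (chrom, start, end) for chrom, start, end in a
        if any(s < end and start < e for s, e in by_chrom.get(chrom, []))
    ]
```

With the pH and ploidy DML as BED tuples, `len(overlapping(ph_dml, ploidy_dml))` gives the number of shared DML, the kind of count that could feed into the line-count table above.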

Going forward

  1. Test-run DSS and ramwas
  2. Try EpiDiverse/snp for SNP extraction from WGBS data
  3. Run methylKit randomization test on mox
  4. Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
  5. Transfer scripts used to a nextflow workflow
  6. Update methods
  7. Update results
  8. Create figures
Written on May 17, 2021