Hawaii Gigas Methylation Analysis Part 15

Investigating methylation data

Quick notebook post to update my progress characterizing this dataset! I followed workflows I established with the Manchester data to characterize the general methylation landscape and understand DML locations.

Characterizing methylation landscapes

I used this Jupyter notebook to obtain methylated, sparsely methylated, and unmethylated CpG loci with at least 5x coverage in my sequencing data. Similar to the Manchester dataset, I created a union BEDgraph to concatenate percent methylation across samples, then used this union file and the individual sample files for downstream analyses. I used intersectBed to identify methylated, sparsely methylated, and unmethylated loci for 25 files, then determined the genomic locations of the loci in those output files. Along the way, I realized I was producing a lot of intermediate output files! When I go to visualize these data, I need to find a way to easily take file line counts and turn them into a data table. But that’s a problem for future Yaamini.
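One possible way to handle the line-count problem is a small Python helper like the sketch below. It is not part of the notebook linked above. The function names, the `*.bed` pattern, and the output column names are all hypothetical placeholders, and it assumes each intersectBed output file has one locus per line with no header.

```python
from pathlib import Path
import csv


def count_lines(path):
    """Return the number of lines in a file (one locus per line is assumed)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def line_count_table(directory, pattern="*.bed"):
    """Build (file name, line count) rows for every file matching the pattern.

    The default pattern is a placeholder; adjust it to the real file names.
    """
    files = sorted(Path(directory).glob(pattern))
    return [(f.name, count_lines(f)) for f in files]


def write_table(rows, out_path):
    """Write the rows to a tab-delimited file with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["file", "lines"])
        writer.writerows(rows)
```

Calling `line_count_table()` on the directory of intersectBed output and passing the result to `write_table()` would give one tab-delimited file that can be read straight into R or pandas for plotting.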

DML locations

Now to DML locations! Based on my methylKit results, I decided to use a 25% methylation difference to define differential methylation. I first converted the DML lists into BEDfiles in this Jupyter notebook, and visualized the DML and 5x sample BEDgraphs in this IGV session. I color-coded the samples so that light blue = 2N + high pH, light purple = 3N + high pH, dark blue = 2N + low pH, and dark purple = 3N + low pH. I looked at a couple of the DML and saw that the patterns matched pretty well with what I thought a pH- or ploidy-DML should look like. After having a quick look, I used intersectBed to find overlaps between pH- and ploidy-DML, and to determine the genomic locations of the DML. Interestingly, only two DML overlapped between treatments! As expected, a majority of DML were found in genic regions. I also looked at overlaps between DML and the C/T SNPs I filtered in this Jupyter notebook. Again, I need to find an efficient way to take line counts from my intersectBed output files and format them in a table!
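The DML-to-BED conversion with a 25% cutoff could look roughly like the sketch below. This is an illustration, not the code in the notebook. It assumes the DML list is a tab-delimited export with a header containing `chr`, `start`, `end`, and `meth.diff` columns, as methylKit's `getData()` produces for a `methylDiff` object. It also assumes 1-based positions that need shifting to BED's 0-based start, which may not match what the notebook does. The function name `dml_to_bed` is hypothetical.

```python
import csv


def dml_to_bed(dml_path, bed_path, min_diff=25.0):
    """Write DML with |meth.diff| >= min_diff as BED lines; return how many were kept.

    Assumes tab-delimited input with a header containing chr, start, end, and
    meth.diff columns (an assumption about the exported DML list).
    """
    kept = 0
    with open(dml_path, newline="") as src, open(bed_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            # Keep both hyper- and hypomethylated loci that pass the cutoff.
            if abs(float(row["meth.diff"])) < min_diff:
                continue
            # Assumed shift from a 1-based position to a 0-based BED start.
            bed_start = int(row["start"]) - 1
            writer.writerow([row["chr"], bed_start, row["end"], row["meth.diff"]])
            kept += 1
    return kept
```

The resulting four-column file (chromosome, start, end, methylation difference) can be loaded in IGV or passed to intersectBed. If the DML list was already filtered in methylKit, setting `min_diff=0` keeps every row.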

Going forward

Test-run DSS and ramwas
Try EpiDiverse/snp for SNP extraction from WGBS data
Run methylKit randomization test on mox
Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
Transfer the scripts I used to a Nextflow workflow
Update methods
Update results
Create figures

Written on May 17, 2021