Hawaii Gigas Methylation Analysis Part 15
Investigating methylation data
Quick notebook post to update my progress characterizing this dataset! I followed workflows I established with the Manchester data to characterize the general methylation landscape and understand DML locations.
Characterizing methylation landscapes
I used this Jupyter notebook to obtain methylated, sparsely methylated, and unmethylated CpG loci with at least 5x coverage in my sequencing data. Similar to the Manchester dataset, I created a union BEDgraph to concatenate percent methylation across samples, then used this union file and individual sequence files for downstream analyses. As I used intersectBed to identify methylated, sparsely methylated, and unmethylated loci for 25 files, then determined the genomic location of the loci in those output files, I realized I was producing a lot of intermediate output files! When I go to visualize this data, I need to find a way to easily take file line counts and turn them into a data table. But that’s a problem for future Yaamini.
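One way future Yaamini could handle the line-count bookkeeping is a small Python helper. Below is a minimal sketch. It assumes the intersectBed outputs sit in a single directory and end in `.bed`; the directory, file pattern, and function names are hypothetical. It counts the lines in each file and writes a two-column CSV that can be loaded into a plotting notebook.

```python
import csv
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines in a text file; a final line without a trailing newline still counts."""
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def line_count_table(directory: str, pattern: str = "*.bed") -> list:
    """Return one {"file", "lines"} row per file matching `pattern`, sorted by name."""
    return [
        {"file": path.name, "lines": count_lines(path)}
        for path in sorted(Path(directory).glob(pattern))
    ]


def write_table(rows: list, out_path) -> None:
    """Write the line-count rows to a CSV with a header row."""
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["file", "lines"])
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_table(line_count_table("intersectBed-output"), "line-counts.csv")` uses a hypothetical output folder. Sample, treatment, or genomic-feature labels could then be parsed from the file names, depending on how the outputs are named.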
DML locations
Now to DML locations! Based on my methylKit results, I decided to use a 25% methylation difference to define differential methylation. I first converted the DML lists into BED files in this Jupyter notebook, and visualized the DML and 5x sample BEDgraphs in this IGV session. I color-coded the samples so that light blue = 2N + high pH, light purple = 3N + high pH, dark blue = 2N + low pH, and dark purple = 3N + low pH. I looked at a couple of the DML and saw the patterns matched pretty well with what I thought a pH- or ploidy-DML should look like. After having a quick look, I used intersectBed to find overlaps between pH- and ploidy-DML and to determine the genomic location of DML. Interestingly, only two DML overlapped between treatments! As expected, a majority of DML were found in genic regions. I also looked at overlaps between DML and the C/T SNPs I filtered in this Jupyter notebook. Again, I need to find an efficient way to take line counts from my intersectBed output files and format them in a table!
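The DML-to-BED conversion step can be sketched in Python. This is a minimal version that makes two assumptions not taken from the notebook. First, the methylKit results were exported as a tab-delimited file with `chr`, `start`, `end`, and `meth.diff` columns. Second, those coordinates are 1-based and inclusive. It keeps loci whose absolute methylation difference is at least 25% and shifts `start` to BED's 0-based, half-open convention. Within R, methylKit's `getMethylDiff()` can apply the same cutoff through its `difference` argument. The q-value cutoff the notebook used is not stated here, so this sketch does not filter on it.

```python
import csv
from typing import Iterable, Iterator, Tuple

DIFF_THRESHOLD = 25.0  # percent methylation difference used to call DML


def dml_to_bed(rows: Iterable[dict], threshold: float = DIFF_THRESHOLD) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, start, end, meth.diff) BED records for loci passing the cutoff."""
    for row in rows:
        diff = float(row["meth.diff"])
        # Hypo- and hypermethylated loci both count, so compare the absolute difference.
        if abs(diff) >= threshold:
            start = int(row["start"]) - 1  # assumed 1-based inclusive -> BED 0-based
            end = int(row["end"])
            yield row["chr"], start, end, diff


def convert(in_path: str, out_path: str) -> int:
    """Convert a tab-delimited methylKit export (hypothetical layout) to BED; return DML count."""
    n = 0
    with open(in_path, newline="") as src, open(out_path, "w") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        for chrom, start, end, diff in dml_to_bed(reader):
            dst.write(f"{chrom}\t{start}\t{end}\t{diff}\n")
            n += 1
    return n
```

Keeping `meth.diff` as a fourth column lets the BED file double as an IGV track where direction of change is still visible. Whether the start shift is needed depends on how the original lists were written, so it is worth checking one locus against the IGV session.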
Going forward
- Test-run DSS and ramwas
- Try EpiDiverse/snp for SNP extraction from WGBS data
- Run methylKit randomization test on mox
- Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
- Transfer the scripts I used to a Nextflow workflow
- Update methods
- Update results
- Create figures