Hawaii Gigas Methylation Analysis Part 5
Evaluating bismark
output
I’ve been sitting on the Hawaii bismark
data for a while, so it’s time I actually look at it! I referred to the MethCompare repository to remind myself what Shelly did to evaluate bismark
output quality. I found this script and realized all I needed to do was 1) get multiqc
information for bismark
output, and 2) look at the data in IGV!
multiqc
When I looked at my script, I realized I never ran multiqc
on the output! I navigated to where I stored the output on gannet
and ran multiqc *
to get the reports. From this command, I got a HTML report with relevant summary statistics, as well as individual text files with the raw data (found in this folder). Looking at the general statistics, sample alignment averaged at 60%, which feels pretty good! About 12% of reads per sample aligned ambiguously, and 26% did not align at all. I was more interested in duplicated read alignment. Since I’m comparing diploid and triploid oysters, sequence duplication seems like it would be a bigger issue if there were duplicates removed from triploid oysters. Percent duplicated reads removed ranged from 10-12% for each treatment combination (ploidy x pH), so there doesn’t seem to be a treatment effect (at least at first glance). I did notice that sample 7 had more reads than any other sample by a long shot.
Figure 1. Reads aligned for reach sample
IGV
My next step was looking at the coverage files in IGV! I created coverage files that merged CpG locations (*merged_CpG_evidence.cov
), filtered the data to 5x coverage, then created bedgraphs (*5x.bedgraph
). I wanted to look at these bedgraphs in IGV with respect to genome feature files.
The first thing I did was download the lateset version of IGV (v 2.9.2) from this link. I found the links to the C. gigas genome and genome feature files in the Genomic Resources wiki. I created this IGV file using the eagle
and gannet
links, and colored samples based on treatment (light blue (1-6): 2N high pH; light red (7-12), 2N low pH; dark blue (13-18): 3N high pH; dark red (19-24): 3N low pH). From my quick browse, most methylation seems to be in the gene regions (makes sense because invertebrates), but I did find some instances of transposable element methylation.
Figure 2-3. IGV tracks
Now that I’ve looked at the data, my next steps are to load data into R to explore with methylKit
. While I won’t be able to conduct a full analysis, I can obtain sample methylataion histograms and PCAs. To identify DML, I’m going to look into DSS
and ramwas
. Addtionally, Steven brought up that there’s a revised C. gigas genome! While I work through DML identification with this dataset, I should also repeat bismark
with the new genome.
Going forward
- Align samples to the revised C. gigas genome
- Obtain preliminary methylation assessment from
methylKit
- Test-run DSS and ramwas
- Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
- Transfer scripts used to a nextflow workflow
- Update methods
- Update results