Hawaii Gigas Methylation Analysis Part 5

Evaluating bismark output

I’ve been sitting on the Hawaii bismark data for a while, so it’s time I actually look at it! I referred to the MethCompare repository to remind myself what Shelly did to evaluate bismark output quality. I found this script and realized all I needed to do was 1) get multiqc information for bismark output, and 2) look at the data in IGV!

multiqc

When I looked at my script, I realized I never ran multiqc on the output! I navigated to where I stored the output on gannet and ran multiqc * to get the reports. From this command, I got a HTML report with relevant summary statistics, as well as individual text files with the raw data (found in this folder). Looking at the general statistics, sample alignment averaged at 60%, which feels pretty good! About 12% of reads per sample aligned ambiguously, and 26% did not align at all. I was more interested in duplicated read alignment. Since I’m comparing diploid and triploid oysters, sequence duplication seems like it would be a bigger issue if there were duplicates removed from triploid oysters. Percent duplicated reads removed ranged from 10-12% for each treatment combination (ploidy x pH), so there doesn’t seem to be a treatment effect (at least at first glance). I did notice that sample 7 had more reads than any other sample by a long shot.

Screen Shot 2021-02-21 at 3 38 45 PM

Figure 1. Reads aligned for reach sample

IGV

My next step was looking at the coverage files in IGV! I created coverage files that merged CpG locations (*merged_CpG_evidence.cov), filtered the data to 5x coverage, then created bedgraphs (*5x.bedgraph). I wanted to look at these bedgraphs in IGV with respect to genome feature files.

The first thing I did was download the lateset version of IGV (v 2.9.2) from this link. I found the links to the C. gigas genome and genome feature files in the Genomic Resources wiki. I created this IGV file using the eagle and gannet links, and colored samples based on treatment (light blue (1-6): 2N high pH; light red (7-12), 2N low pH; dark blue (13-18): 3N high pH; dark red (19-24): 3N low pH). From my quick browse, most methylation seems to be in the gene regions (makes sense because invertebrates), but I did find some instances of transposable element methylation.

Screen Shot 2021-02-21 at 4 16 59 PM

Screen Shot 2021-02-21 at 4 17 39 PM

Figure 2-3. IGV tracks

Now that I’ve looked at the data, my next steps are to load data into R to explore with methylKit. While I won’t be able to conduct a full analysis, I can obtain sample methylataion histograms and PCAs. To identify DML, I’m going to look into DSS and ramwas. Addtionally, Steven brought up that there’s a revised C. gigas genome! While I work through DML identification with this dataset, I should also repeat bismark with the new genome.

Going forward

  1. Align samples to the revised C. gigas genome
  2. Obtain preliminary methylation assessment from methylKit
  3. Test-run DSS and ramwas
  4. Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
  5. Transfer scripts used to a nextflow workflow
  6. Update methods
  7. Update results
Written on February 17, 2021