Gonad Methylation Analysis Part 8

What are those?!


What did I generate using bismark alignment and bismark2report? Time to find out.


I thought I’d start with the HTML report since it doesn’t require any additional software finagling. These reports are really cool! They provide basic statistics like the number of sequences analyzed, the types of alignment, cytosine methylation in each context, and alignment to the individual bisulfite strands. I think the alignment information is the most important at this stage, so I summarized it for each file pair:

zr2096_1_s1_R1.fastq.gz and zr2096_1_s1_R2.fastq.gz

  • Multiple alignments: 11%
  • Unique alignments: 30-32%
  • No alignment: 56-58%

zr2096_2_s1_R1.fastq.gz and zr2096_2_s1_R2.fastq.gz

  • Multiple alignments: 19%
  • Unique alignments: 54%
  • No alignment: 25%

zr2096_3_s1_R1.fastq.gz and zr2096_3_s1_R2.fastq.gz

  • Multiple alignments: 24%
  • Unique alignments: 62-64%
  • No alignment: 11-12%

zr2096_4_s1_R1.fastq.gz and zr2096_4_s1_R2.fastq.gz

  • Multiple alignments: 22-23%
  • Unique alignments: 61%
  • No alignment: 15%

zr2096_5_s1_R1.fastq.gz and zr2096_5_s1_R2.fastq.gz

  • Multiple alignments: 22%
  • Unique alignments: 61-62%
  • No alignment: 15-16%

zr2096_6_s1_R1.fastq.gz and zr2096_6_s1_R2.fastq.gz

  • Multiple alignments: 23%
  • Unique alignments: 63-64%
  • No alignment: 12-13%

zr2096_7_s1_R1.fastq.gz and zr2096_7_s1_R2.fastq.gz

  • Multiple alignments: 21%
  • Unique alignments: 63-64%
  • No alignment: 14%

zr2096_8_s1_R1.fastq.gz and zr2096_8_s1_R2.fastq.gz

  • Multiple alignments: 19-20%
  • Unique alignments: 56-57%
  • No alignment: 23%

zr2096_9_s1_R1.fastq.gz and zr2096_9_s1_R2.fastq.gz

  • Multiple alignments: 22-23%
  • Unique alignments: 61-63%
  • No alignment: 13-16%

zr2096_10_s1_R1.fastq.gz and zr2096_10_s1_R2.fastq.gz

  • Multiple alignments: 22-23%
  • Unique alignments: 63-65%
  • No alignment: 12-13%
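To compare the pairs at a glance, the percentages above can go into a small table and be screened for low unique-alignment rates. This is a minimal sketch: the numbers are copied from the list above, each range is reduced to its midpoint, and the 50% cutoff is an arbitrary assumption for illustration, not a Bismark recommendation.

```python
# Alignment percentages copied from the list above, as (low, high) ranges.
# Single reported values are stored as (x, x).
REPORTED = {
    "zr2096_1": {"multiple": (11, 11), "unique": (30, 32), "none": (56, 58)},
    "zr2096_2": {"multiple": (19, 19), "unique": (54, 54), "none": (25, 25)},
    "zr2096_3": {"multiple": (24, 24), "unique": (62, 64), "none": (11, 12)},
    "zr2096_4": {"multiple": (22, 23), "unique": (61, 61), "none": (15, 15)},
    "zr2096_5": {"multiple": (22, 22), "unique": (61, 62), "none": (15, 16)},
    "zr2096_6": {"multiple": (23, 23), "unique": (63, 64), "none": (12, 13)},
    "zr2096_7": {"multiple": (21, 21), "unique": (63, 64), "none": (14, 14)},
    "zr2096_8": {"multiple": (19, 20), "unique": (56, 57), "none": (23, 23)},
    "zr2096_9": {"multiple": (22, 23), "unique": (61, 63), "none": (13, 16)},
    "zr2096_10": {"multiple": (22, 23), "unique": (63, 65), "none": (12, 13)},
}


def midpoint(rng):
    """Collapse a (low, high) percentage range to its midpoint."""
    low, high = rng
    return (low + high) / 2


def flag_low_unique(reported, threshold=50.0):
    """Return sample names whose midpoint unique-alignment rate is below threshold.

    The default threshold of 50% is an illustrative assumption.
    """
    return [name for name, stats in reported.items()
            if midpoint(stats["unique"]) < threshold]


if __name__ == "__main__":
    for name, stats in REPORTED.items():
        print(f"{name}: unique ~{midpoint(stats['unique']):.1f}%")
    print("Low unique alignment:", flag_low_unique(REPORTED))
```

With these numbers, only the first file pair falls well below the others, which matches what stands out when reading the list.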


All of the above information is nicely summarized in the Bismark Project Summary Report!


Figure 1. Collated alignment information for all sequence data. Sample 1 refers to zr2096_1_s1_R1.fastq.gz, sample 2 is zr2096_1_s1_R2.fastq.gz, …, sample 10 is zr2096_10_s1_R2.fastq.gz


Figure 2. Percent of calls with CpG methylation for all sequence data.
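As I understand it, the methylation percentages in the Bismark reports are the share of methylated calls among all calls in a given context (CpG here). The sketch below shows that arithmetic; the counts in the usage note are made up for illustration, not taken from my data.

```python
def percent_methylated(methylated: int, unmethylated: int) -> float:
    """Percent methylation in one context: methylated / (methylated + unmethylated) * 100."""
    total = methylated + unmethylated
    if total == 0:
        # No calls in this context, so a percentage is undefined.
        raise ValueError("no cytosine calls in this context")
    return 100.0 * methylated / total
```

For example, a hypothetical 30 methylated and 70 unmethylated CpG calls would give `percent_methylated(30, 70)`, or 30%.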

bismark alignment

The Bismark User Guide suggests using a genome viewer to visualize the output SAM files. I’m a bit rusty on my IGV skills, but hopefully previous exposure in Steven’s 2016 Bioinformatics class will help!

I downloaded the latest version of the Integrative Genomics Viewer (IGV 2.4) from this website, then loaded the C. virginica genome I had downloaded for bismark into the viewer. IGV wouldn’t recognize my original .fa file, so I duplicated it and changed the extension to .fasta. With that change, IGV accepted the genome.
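The duplicate-and-rename workaround can be scripted so the original .fa stays untouched for bismark. This is a minimal sketch; the file name in the usage note is hypothetical.

```python
import shutil
from pathlib import Path


def fasta_path_for(fa_path: str) -> Path:
    """Return the sibling path with the last extension swapped to .fasta."""
    return Path(fa_path).with_suffix(".fasta")


def duplicate_as_fasta(fa_path: str) -> Path:
    """Copy a .fa genome to a .fasta file next to it; skip if the copy already exists."""
    src = Path(fa_path)
    dst = fasta_path_for(fa_path)
    if not dst.exists():
        shutil.copyfile(src, dst)  # the original .fa is left in place
    return dst
```

Calling `duplicate_as_fasta("genome.fa")` (a hypothetical name) would create `genome.fasta` in the same folder. A symbolic link (`os.symlink`) would avoid storing a second copy of the genome, though I haven't checked whether IGV follows links.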


Figure 3. C. virginica genome in IGV.

Next, I needed to add my alignment files. When I tried loading one of them, I got the following message:


Figure 4. Path to index file request.

I don’t think bismark generated any .bai files, so I’m not sure how to proceed. I saved my IGV file here, then posted this issue to get to the bottom of it.
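IGV expects BAM files to be coordinate-sorted and to have an index (.bai) next to them. One common route is `samtools sort` followed by `samtools index`. The sketch below assumes samtools is installed and on the PATH, and the input file name in the test is hypothetical. It writes a separate `.sorted.bam` rather than overwriting the bismark output, since my understanding is that the downstream bismark tools expect the original read order.

```python
import shutil
import subprocess
from pathlib import Path


def sort_and_index_commands(alignment: str) -> list:
    """Build samtools commands that coordinate-sort a SAM/BAM and index the result.

    The sorted copy is written as <stem>.sorted.bam; `samtools index` then
    creates <stem>.sorted.bam.bai beside it.
    """
    src = Path(alignment)
    sorted_bam = src.with_name(src.stem + ".sorted.bam")
    return [
        ["samtools", "sort", "-O", "bam", "-o", str(sorted_bam), str(src)],
        ["samtools", "index", str(sorted_bam)],
    ]


def run_commands(commands) -> None:
    """Run each command in order, stopping on the first failure."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes it easy to print and check the commands before touching any files.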

Written on May 7, 2018