DML Analysis Part 27

Resolving DML and DMR visualization issues

TL;DR Certain things are a dumpster fire so it’s time to put out the dumpster fire.

Quick overview

I looked at two different chromosomes to see what was happening. Here are some things I learned:

  1. The gene background is hot flaming garbage.

1-genebackground

2-genebackground

The gene background should contain only CG motifs with at least 3x coverage, but it also includes multiple loci that aren’t cytosines.
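One way to sanity-check a background file is to look up each locus in the reference sequence and flag anything that is not part of a CG motif. The snippet below is a minimal sketch of that idea, not the code used to build the tracks: the function name, the toy reference string, and the 0-based positions are all assumptions made for illustration.

```python
def classify_loci(sequence, positions):
    """Split 0-based positions into CG-motif loci and everything else.

    A locus counts as part of a CG motif if it is the C of a "CG" on the
    forward strand, or the G of a "CG" (the C on the reverse strand).
    """
    seq = sequence.upper()
    in_cg, not_cg = [], []
    for pos in positions:
        forward_c = seq[pos:pos + 2] == "CG"
        reverse_c = pos > 0 and seq[pos - 1:pos + 1] == "CG"
        if forward_c or reverse_c:
            in_cg.append(pos)
        else:
            not_cg.append(pos)
    return in_cg, not_cg


# Toy reference and hypothetical background loci (0-based, as in BED starts).
reference = "ATCGGATTCGA"
background = [2, 3, 5, 8, 10]

cg_loci, other_loci = classify_loci(reference, background)
print("In CG motifs:", cg_loci)
print("Not in CG motifs:", other_loci)
```

Any loci reported in the second list would be candidates for the non-cytosine entries seen in the background track.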

  2. The DML overlap CG motifs better.

3-DML

4-DML

I mildly trust the DML track.

  3. The 100 bp DMR are inconsistent.

7-DMR

9-DMR

Sometimes DMR include multiple CG motifs or DML, but sometimes they don’t. I think using a step size of 100 bp may be influencing this (more on that below).
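To illustrate why the step size could matter, here is a small sketch of window tiling. It is not methylKit's implementation. It assumes windows start at position 1 and uses made-up CpG positions. With a 100 bp window and a 100 bp step, a tight cluster of CpGs that straddles a window boundary is split across two tiles, while a smaller step produces a window that captures the whole cluster.

```python
def tile_counts(cpg_positions, win_size, step_size, region_end):
    """Count CpGs in each window; windows start at 1 and are step_size apart."""
    tiles = []
    for start in range(1, region_end + 1, step_size):
        end = start + win_size - 1
        n_cpgs = sum(start <= pos <= end for pos in cpg_positions)
        tiles.append((start, end, n_cpgs))
    return tiles


# Hypothetical CpG positions: a cluster around 100 bp plus one isolated CpG.
cpgs = [95, 98, 102, 105, 260]

non_overlapping = tile_counts(cpgs, win_size=100, step_size=100, region_end=300)
overlapping = tile_counts(cpgs, win_size=100, step_size=50, region_end=300)

print(non_overlapping)  # cluster is split between the first two tiles
print(overlapping)      # the 51-150 window contains the whole cluster
```

In this toy case no non-overlapping tile holds more than two of the four clustered CpGs, which is one way a region could end up with or without several CG motifs depending only on where the tile boundaries fall.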

  4. The 1000 bp DMR make no sense.

8-100-1000-DMR

10-DMR

The 1000 bp DMR are heavily influenced by regions where one sample is hypermethylated but the other samples aren’t methylated (or have no data). I did find an instance or two where multiple samples were hypermethylated in a 1000 bp DMR, but these were rare over the two chromosomes I looked at.
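A quick way to flag DMR that are driven by a single sample is to count how many samples actually have usable data in each region. The sketch below assumes a hypothetical data layout (sample name mapped to a list of coverage and percent-methylation pairs for CpGs in the region) and a 3x minimum coverage. Neither comes from the actual analysis files.

```python
def sample_support(region_calls, min_cov=3):
    """Return samples with at least one CpG at >= min_cov in the region.

    region_calls maps sample name -> list of (coverage, percent_methylation).
    """
    return sorted(
        sample
        for sample, calls in region_calls.items()
        if any(cov >= min_cov for cov, _ in calls)
    )


# Hypothetical calls for one 1000 bp region.
region = {
    "sample_1": [(12, 90.0), (8, 85.0)],
    "sample_2": [],              # no data in this region
    "sample_3": [(1, 0.0)],      # below the coverage cutoff
    "sample_4": [(2, 50.0)],     # below the coverage cutoff
}

supported = sample_support(region)
single_sample_driven = len(supported) <= 1
print(supported, single_sample_driven)
```

Regions where only one sample passes the cutoff could then be set aside before interpreting the DMR track.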

Generate 5x and 10x DML tracks

At this point, I mildly trust DML. We’re currently using 3x coverage for analyses, but previous papers have used 5x coverage. We decided to look at 5x and 10x DML tracks as well.

In this R Markdown file, I created 5x and 10x DML BED files (found here). Steven pointed out that he used destrand = TRUE in his unite command, and I did not. My current stranded output includes + or - to indicate forward and reverse strands, but methylated loci are normally discussed without reference to strandedness. According to the methylKit manual, destrand = TRUE merges reads from both strands of a CpG dinucleotide, which provides better coverage for CpG methylation. I created 5x and 10x coverage tracks using both destrand = FALSE, the default, and destrand = TRUE. To visualize everything in my IGV session, I also created 5x sample coverage tracks, found here.
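The effect of destranding on coverage can be shown with a small, self-contained sketch. This is not methylKit code. It assumes 1-based positions, that the reverse-strand C of a CpG sits one base after the forward-strand C, and that counts from both strands are simply summed at the forward-strand position. The records and function names are invented for illustration.

```python
from collections import defaultdict


def destrand(records):
    """Merge per-strand CpG counts onto the forward-strand C position.

    records: iterable of (chrom, pos, strand, coverage, num_cs), 1-based pos.
    Returns {(chrom, pos): (coverage, num_cs)}.
    """
    merged = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, cov, num_cs in records:
        # Shift reverse-strand calls back by one base to the forward C.
        key = (chrom, pos if strand == "+" else pos - 1)
        merged[key][0] += cov
        merged[key][1] += num_cs
    return {key: tuple(vals) for key, vals in merged.items()}


def passing_loci(counts_by_locus, min_cov):
    """Return the sorted loci whose coverage is at least min_cov."""
    return sorted(k for k, (cov, _) in counts_by_locus.items() if cov >= min_cov)


# Hypothetical calls: one CpG covered on both strands, one on a single strand.
records = [
    ("chr1", 100, "+", 4, 3),
    ("chr1", 101, "-", 3, 2),
    ("chr1", 250, "+", 6, 1),
]

stranded = {(c, p, s): (cov, n) for c, p, s, cov, n in records}
destranded = destrand(records)

print(passing_loci(stranded, 5))    # only the chr1:250 locus passes 5x
print(passing_loci(destranded, 5))  # chr1:100 now passes with 4 + 3 = 7 reads
```

In this toy example the CpG at chr1:100 fails a 5x filter on either strand alone but passes once the strands are merged, which is the kind of coverage gain the manual describes.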

  1. Some DML were retained through all coverage types.

1-all3

I didn’t see much loss going from 3x to 5x DML tracks, but more DML were lost going from 5x to 10x.
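Instead of eyeballing tracks in IGV, the same comparison could be made by treating each DML track as a set of loci. The sketch below uses made-up loci and hypothetical track names. It only shows the set operations, not how the real BED files were read.

```python
def compare_tracks(lower, higher):
    """Summarize how DML change between a lower and a higher coverage cutoff."""
    return {
        "retained": sorted(lower & higher),
        "lost": sorted(lower - higher),
        "gained": sorted(higher - lower),
    }


# Hypothetical DML loci as (chrom, position) pairs for each coverage cutoff.
dml_3x = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr2", 900)}
dml_5x = {("chr1", 100), ("chr1", 250), ("chr2", 40)}
dml_10x = {("chr1", 100)}

from_3x_to_5x = compare_tracks(dml_3x, dml_5x)
from_5x_to_10x = compare_tracks(dml_5x, dml_10x)
shared_by_all = sorted(dml_3x & dml_5x & dml_10x)

print(len(from_3x_to_5x["lost"]), len(from_5x_to_10x["lost"]))
print(shared_by_all)
```

The "gained" entry is kept because a higher cutoff changes which loci are tested, so a track at a stricter cutoff is not guaranteed to be a subset of the looser one.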

  2. Some DML were only present in destranded 5x or 10x tracks.

3-weird

This is probably a result of the increased coverage in the destranded tracks.

  3. I also found something really weird.

4-weird

This CpG had a DML on the forward strand in the 3x, 5x, and destranded tracks, but on the reverse strand in the 10x track. Not sure how this can happen…

Going forward

  1. Figure out which DML track to use for remaining analyses and characterize DML locations in that track
  2. Describe methylation irrespective of treatment
  3. Work through gene-level analysis
  4. Figure out what’s going on with DMR
  5. Figure out what’s going on with the gene background
Written on March 13, 2019