DML Analysis Part 11

Testing methylKit parameters

But first, a quick side story about bismark:


I wanted to run bismark on my full samples in this Jupyter notebook. However, when I tried the methylation extractor step, I ran into this issue. It turns out the hummingbird hard drive was maxed out. I spent the past few days moving large files to this gannet directory, then removed them from my local repository (and from hummingbird). There are two takeaways from this process:

  1. All large files should live on an external server. There is no point keeping them in a local repository if they cannot be synced with GitHub.
  2. I should run bismark on Mox

Before starting in on Mox and armed with more storage, I thought I could play around with methylKit on hummingbird. I finished the bismark pipeline in this notebook.


There are two things I need to test in methylKit:

  1. Different coverage metrics: Previously, I used a minimum coverage of 1. That is not good! Steven said to try 3x coverage, and after my PCSGA talk, Mac suggested I try 5x coverage. I’ll try 1x, 3x, and 5x coverage on the data aligned with the new bismark --score_min parameter and see how it affects clustering.

  2. Tiling window analysis: methylKit has a built-in tiling window analysis, described in its user guide, that can pick out differentially methylated regions (DMRs). So far, I’ve only picked out differentially methylated loci. Having DMRs will improve our exploratory analysis.
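To make the tiling idea concrete, here is a minimal Python sketch of what a tiling window analysis does with per-CpG counts: it sums methylated and total reads inside fixed, non-overlapping windows so that each window can later be tested like a single locus. This is not methylKit's code or API. The function name `tile_counts`, the tuple layout, and the toy data are all made up for illustration, and the 1000 bp window is only an assumption chosen to match what I understand to be methylKit's default window size.

```python
from collections import defaultdict


def tile_counts(sites, win_size=1000):
    """Sum methylated and total read counts per non-overlapping window.

    sites: iterable of (chrom, pos, n_methylated, n_total), 1-based positions.
    Returns {(chrom, window_start, window_end): (n_methylated, n_total)}.
    """
    tiles = defaultdict(lambda: [0, 0])
    for chrom, pos, n_meth, n_total in sites:
        # Window containing this position, e.g. 1-1000, 1001-2000, ...
        start = ((pos - 1) // win_size) * win_size + 1
        key = (chrom, start, start + win_size - 1)
        tiles[key][0] += n_meth
        tiles[key][1] += n_total
    return {key: tuple(counts) for key, counts in tiles.items()}


# Hypothetical toy data, not real oyster loci.
toy_sites = [
    ("chr1", 10, 3, 4),
    ("chr1", 250, 1, 5),
    ("chr1", 1200, 0, 3),
    ("chr2", 999, 2, 2),
]

tiles = tile_counts(toy_sites, win_size=1000)
for (chrom, start, end), (n_meth, n_total) in sorted(tiles.items()):
    print(chrom, start, end, f"{100 * n_meth / n_total:.1f}% methylated")
```

Pooling counts this way means a window can reach a usable read depth even when its individual CpGs are thinly covered, which is part of why regions may be easier to call than single loci in low-coverage data.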

Different coverage metrics

I started testing different coverage thresholds in this R Markdown file. Plots depicting percent CpG methylation and coverage for each file can be found in this folder. I think it could be useful to create a multipanel plot to display this in the future.

Because the subsets do not have enough data, I couldn’t test 5x coverage. I also couldn’t get any further than creating methylation and coverage plots.
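The reason the subsets run out of data at 5x is easy to see with a toy example. The Python sketch below keeps only CpG sites whose total read count meets a minimum coverage and counts how many survive at 1x, 3x, and 5x. It is an illustration of the filtering logic only, not methylKit code: the function name `filter_by_coverage` and the toy sites are hypothetical. In methylKit itself, my understanding is that the threshold is set with the `mincov` argument when reading files, or afterwards with `filterByCoverage`.

```python
def filter_by_coverage(sites, min_cov):
    """Keep sites whose total read count is at least min_cov.

    sites: list of (chrom, pos, n_methylated, n_total).
    """
    return [site for site in sites if site[3] >= min_cov]


# Hypothetical toy data with total coverages of 1, 2, 3, 3, 5, and 8 reads.
toy_cpgs = [
    ("chr1", 10, 1, 1),
    ("chr1", 25, 0, 2),
    ("chr1", 40, 2, 3),
    ("chr1", 77, 3, 3),
    ("chr1", 90, 4, 5),
    ("chr1", 130, 2, 8),
]

# Number of sites retained at each minimum coverage threshold.
kept_counts = {
    min_cov: len(filter_by_coverage(toy_cpgs, min_cov))
    for min_cov in (1, 3, 5)
}
for min_cov, n_kept in kept_counts.items():
    print(f"{min_cov}x: {n_kept} of {len(toy_cpgs)} sites kept")
```

With shallow data, each step up in the threshold discards a large share of sites, so a subset that looks fine at 1x can have too few loci left at 5x to cluster.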

Going forward

Here are my priorities:

  1. Become a registered user on Mox
  2. Create a script to run bismark on the full samples
  3. Find other data aligned with --score_min L,0,-1.2 to play around with for methylKit parameter testing
  4. Figure out code for some multipanel plots
  5. Figure out tiling
Written on October 11, 2018