DML Analysis Part 11
Testing methylKit parameters
But first, a quick side story about bismark:
bismark
I wanted to run bismark on my full samples in this Jupyter notebook. However, when I tried the methylation extractor step, I ran into this issue. It turns out the hummingbird hard drive was maxed out. I spent the past few days moving large files to this gannet directory, then removed them from my local repository (and hummingbird). There are two takeaways from this process:
- All large files should live on an external server. There is no point keeping them in a local repository if they cannot be synced with GitHub.
- I should run bismark on Mox
Before starting in on Mox and armed with more storage, I thought I could play around with methylKit on hummingbird. I finished the bismark pipeline in this notebook.
methylKit
There are two things I need to test in methylKit:
- Different coverage metrics: Previously, I used a minimum coverage of 1. That is not good! Steven said to try 3x coverage, and after my PCSGA talk, Mac suggested I try 5x coverage. I’ll try 1x, 3x, and 5x coverage on the data aligned with the new --score_min parameter and see how it affects clustering.
- Tiling window analysis: methylKit has a built-in tiling window analysis, described in the methylKit user guide, that can pick out differentially methylated regions (DMRs). So far, I’ve only picked out differentially methylated loci (DMLs). Having DMRs will improve our exploratory analysis.
Different coverage metrics
I started in on testing different coverage in this R Markdown file. Plots depicting Percent CpG methylation and coverage for each file can be found in this folder. I think it could be useful to create a multipanel plot to display this in the future.
Because the subsets do not have enough data, I couldn’t test 5x coverage. I also couldn’t get any farther than creating methylation and coverage plots.
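To make the coverage problem concrete, here is a minimal Python sketch of what a minimum-coverage filter does to the number of CpGs that survive. This is not my methylKit code (methylKit is an R package, and I believe the equivalent step there is filterByCoverage with its lo.count argument). The CpG tuple, the filter_by_coverage helper, and the toy read counts are all made up for illustration.

```python
from collections import namedtuple

# One CpG site: chromosome, position, reads supporting methylation, total reads.
# The field names are hypothetical, chosen only for this sketch.
CpG = namedtuple("CpG", ["chrom", "pos", "num_methylated", "coverage"])


def filter_by_coverage(sites, min_coverage):
    """Keep only CpG sites covered by at least min_coverage reads."""
    return [site for site in sites if site.coverage >= min_coverage]


# Made-up toy data standing in for a small subset; not real results.
toy_sites = [
    CpG("chr1", 101, 1, 1),
    CpG("chr1", 150, 2, 3),
    CpG("chr1", 212, 0, 4),
    CpG("chr1", 305, 5, 6),
    CpG("chr2", 40, 9, 10),
]

# Count how many CpGs remain at each of the thresholds I want to compare.
retained = {t: len(filter_by_coverage(toy_sites, t)) for t in (1, 3, 5)}

for threshold, n_sites in retained.items():
    print(f"{threshold}x coverage: {n_sites} of {len(toy_sites)} CpGs retained")
```

With a small subset, each step up in the threshold throws away more sites, which is why 5x left me with too little to work with.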
Going forward
Here are my priorities:
- Become a registered user on Mox
- Create a script to run bismark on the full samples
- Find other --score_min L,0,-1.2 data to play around with for methylKit parameter testing
- Figure out code for some multipanel plots
- Figure out tiling
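As a starting point for the tiling item, here is a rough Python sketch of the idea behind a tiling window analysis: sum the methylated and total read counts for every CpG that falls in the same fixed-size, non-overlapping window, so each window can later be tested as a region. In methylKit itself I expect to use tileMethylCounts (with win.size and step.size), so this only illustrates the logic. The tile_methylation_counts function, the 1000 bp default window, and the toy data are assumptions for this sketch.

```python
from collections import defaultdict


def tile_methylation_counts(sites, win_size=1000):
    """Sum read counts within non-overlapping windows along each chromosome.

    sites: iterable of (chrom, pos, num_methylated, coverage), pos is 1-based.
    Returns {(chrom, window_start, window_end): (num_methylated, coverage)}.
    """
    tiles = defaultdict(lambda: [0, 0])
    for chrom, pos, num_methylated, coverage in sites:
        # Windows are 1-based and inclusive: 1-1000, 1001-2000, and so on.
        start = ((pos - 1) // win_size) * win_size + 1
        key = (chrom, start, start + win_size - 1)
        tiles[key][0] += num_methylated
        tiles[key][1] += coverage
    return {key: tuple(counts) for key, counts in tiles.items()}


# Made-up toy data; not real results.
toy_sites = [
    ("chr1", 101, 1, 1),
    ("chr1", 150, 2, 3),
    ("chr1", 999, 0, 4),
    ("chr1", 1001, 5, 6),
    ("chr2", 40, 9, 10),
]

tiles = tile_methylation_counts(toy_sites, win_size=1000)

for (chrom, start, end), (num_methylated, coverage) in sorted(tiles.items()):
    percent = 100.0 * num_methylated / coverage
    print(f"{chrom}:{start}-{end}\t{num_methylated}/{coverage} reads\t{percent:.1f}% methylated")
```

Pooling counts per window also means a region can reach a usable coverage even when its individual CpGs are thinly covered, which might help with the low-coverage subsets.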