DML Analysis Part 11
But first, a quick side story.
I wanted to run bismark on my full samples in this Jupyter notebook. However, when I tried the methylation extractor step, I ran into this issue. Turns out the hummingbird hard drive was maxed out. I spent the past few days moving large files to this gannet directory. I then removed the files from my local repository (and hummingbird). There are two takeaways from this process:
- All large files should live on an external server. No point keeping them in a local repository if they cannot be synced with GitHub.
- I should run
Before starting in on Mox and armed with more storage, I thought I could play around with methylKit on hummingbird. I finished the bismark pipeline in this notebook.
There are two things I need to test:
- Different coverage metrics: Previously, I used a minimum coverage of 1. That is not good! Steven said to try 3x coverage, and after my PCSGA talk, Mac suggested I try 5x coverage. I’ll try 1x, 3x, and 5x coverage with the new --score_min parameter and see how it affects clustering.
- Tiling window analysis: methylKit has a built-in tiling window analysis, described in the methylKit user guide, that can pick out differentially methylated regions (DMRs). So far, I’ve only picked differentially methylated loci. Having DMRs will improve our exploratory analysis.
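To make the tiling idea concrete, here is a minimal sketch that pools methylated and unmethylated counts into fixed, non-overlapping windows. It is not methylKit's implementation, only an awk illustration of the same kind of summary. It assumes a tab-separated Bismark coverage file (chr, start, end, percent methylation, count methylated, count unmethylated); the function name tile_counts and the 1000 bp default window are my own placeholders.

```shell
#!/usr/bin/env bash
# Sum methylated/unmethylated counts per fixed window (illustration only).
# Assumed input columns (tab-separated): chr, start, end, % methylation,
# count methylated, count unmethylated.
tile_counts() {
  local cov_file="$1" win="${2:-1000}"
  awk -F'\t' -v OFS='\t' -v win="$win" '
    {
      # 1-based start of the window containing this position
      start = int(($2 - 1) / win) * win + 1
      key = $1 OFS start OFS (start + win - 1)
      meth[key]   += $5
      unmeth[key] += $6
    }
    END { for (k in meth) print k, meth[k], unmeth[k] }
  ' "$cov_file" | sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Usage (hypothetical file name):
#   tile_counts sample.cov 1000
```

Each output row is one window: chromosome, window start, window end, pooled methylated count, pooled unmethylated count.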
Different coverage metrics
I started in on testing different coverage in this R Markdown file. Plots depicting Percent CpG methylation and coverage for each file can be found in this folder. I think it could be useful to create a multipanel plot to display this in the future.
Because the subsets do not have enough data, I couldn’t test 5x coverage. I also couldn’t get any further than creating methylation and coverage plots.
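One quick way to see how much data survives each threshold, outside of methylKit, is to count loci directly from a coverage file. This is a rough sketch, assuming the tab-separated Bismark coverage format (chr, start, end, percent methylation, count methylated, count unmethylated). The function name count_loci_at_cov and the demo data are made up for illustration.

```shell
#!/usr/bin/env bash
# Count loci in a Bismark coverage file with at least a given read depth.
# Assumed columns (tab-separated): chr, start, end, % methylation,
# count methylated, count unmethylated.
count_loci_at_cov() {
  local cov_file="$1" min_cov="$2"
  awk -F'\t' -v min="$min_cov" \
    '($5 + $6) >= min { n++ } END { print n + 0 }' "$cov_file"
}

# Made-up demo data: five loci with total coverage 1, 3, 4, 6, and 10.
demo="$(mktemp)"
printf 'chr1\t100\t100\t100\t1\t0\n'   >  "$demo"
printf 'chr1\t150\t150\t66.67\t2\t1\n' >> "$demo"
printf 'chr1\t200\t200\t25\t1\t3\n'    >> "$demo"
printf 'chr1\t250\t250\t66.67\t4\t2\n' >> "$demo"
printf 'chr1\t300\t300\t50\t5\t5\n'    >> "$demo"

# Compare the three thresholds under consideration.
for cov in 1 3 5; do
  echo "${cov}x: $(count_loci_at_cov "$demo" "$cov") loci"
done
rm -f "$demo"
```

On the demo data this prints 5, 4, and 2 loci for 1x, 3x, and 5x. Running the same function on a real coverage file would show how quickly a small subset runs out of loci at 5x.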
Here are my priorities:
- Become a registered user on Mox
- Create a script to run bismark on the full samples
- Find other --score_min L,0,-1.2 data to play around with for
- Figure out code for some multipanel plots
- Figure out tiling
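For the bismark script, a possible starting point is a loop that builds one alignment command per sample. This is only a sketch with several assumptions: paired-end reads, an _R1/_R2 file naming scheme, and placeholder paths, none of which come from my actual directory layout. The function name run_bismark is also a placeholder. It defaults to a dry run that prints commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: one bismark alignment per paired-end sample.
# ASSUMPTIONS: paired-end data, mates named *_R1* / *_R2*, placeholder paths.

run_bismark() {
  local genome_dir="$1" r1="$2" out_dir="$3"
  local r2 runner
  # Derive the mate file name from the assumed _R1/_R2 naming scheme.
  r2="$(printf '%s' "$r1" | sed 's/_R1/_R2/')"
  # DRY_RUN=1 (default) prints the command; DRY_RUN=0 executes it.
  if [ "${DRY_RUN:-1}" = "1" ]; then runner=echo; else runner=; fi
  $runner bismark --genome "$genome_dir" --score_min L,0,-1.2 \
    -1 "$r1" -2 "$r2" -o "$out_dir"
}

# Placeholder locations; override through the environment.
GENOME_DIR="${GENOME_DIR:-/path/to/genome}"
READS_DIR="${READS_DIR:-/path/to/reads}"
OUT_DIR="${OUT_DIR:-/path/to/output}"

for r1 in "$READS_DIR"/*_R1*.fq.gz; do
  [ -e "$r1" ] || continue   # nothing matched the glob
  run_bismark "$GENOME_DIR" "$r1" "$OUT_DIR"
done
```

Checking the printed commands first, then rerunning with DRY_RUN=0, seems safer than launching alignments blind. Deduplication, methylation extraction, and any scheduler directives for Mox would still need to be added.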