Gonad Methylation Analysis Part 12
Reproducibility rodeo
My goal was to run my samples through Steven’s methylKit R code. The first thing I did was copy his code into this R script. I then added more descriptive names for the objects, and commented out each line of code so I understood what was happening.
I was able to run processBismarkAln
on my deduplicated.sorted.bam files! However, I could not unite
these files. That’s when I posted a Github issue. After some back-and-forth, Steven suggested I download some of his dedup.sorted.bam and run those files through the R script. By using his files, I could figure out if the unite
issue stemmed from my files, or from software differences on the Mac mini.
I created a new R script and downloaded dedup.sorted.bam files 1 and 10 (i.e. one from each treatment). I was able to successfully unite
these files and go through the entire script! While I couldn’t identify any DMRs between the two files, I did come up with two reasons why Steven’s files worked and mine did not:
- Steven’s files are trimmed, but mine are not
- Steven’s files used the first 1,000,000 as a subset, but mine used the first 10,000
Either way, Steven said I should move on to the full pipeline and validate everything. Sam and Steven both pointed me to this directory, which has the trimmed files. He also said that I should remove the -score_min
argument. This will make bismark
run the default setting for alignment mismatches, and allow us to compare the alignment results. Time to start a computer process that will take several days to run!