Gonad Methylation Analysis Part 12

Reproducibility rodeo

My goal was to run my samples through Steven’s methylKit R code. The first thing I did was copy his code into this R script. I then added more descriptive names for the objects, and commented out each line of code so I understood what was happening.

I was able to run processBismarkAln on my deduplicated.sorted.bam files! However, I could not unite these files. That’s when I posted a Github issue. After some back-and-forth, Steven suggested I download some of his dedup.sorted.bam and run those files through the R script. By using his files, I could figure out if the unite issue stemmed from my files, or from software differences on the Mac mini.

I created a new R script and downloaded dedup.sorted.bam files 1 and 10 (i.e. one from each treatment). I was able to successfully unite these files and go through the entire script! While I couldn’t identify any DMRs between the two files, I did come up with two reasons why Steven’s files worked and mine did not:

  1. Steven’s files are trimmed, but mine are not
  2. Steven’s files used the first 1,000,000 as a subset, but mine used the first 10,000

Either way, Steven said I should move on to the full pipeline and validate everything. Sam and Steven both pointed me to this directory, which has the trimmed files. He also said that I should remove the -score_min argument. This will make bismark run the default setting for alignment mismatches, and allow us to compare the alignment results. Time to start a computer process that will take several days to run!

Written on May 9, 2018