Gonad Methylation Analysis Part 13

The whole enchilada (but better this time)

Now that I’ve tested the entire bismark-to-methylKit pipeline on a 10,000-read subset of each C. virginica sequencing sample, I can begin running the full pipeline. Steven already ran the full samples through this pipeline in this notebook. My job is to reproduce his workflow.

I opened my full-pipeline Jupyter notebook. Since I’m using trimmed files from this directory, the first thing I did was test my find and xargs commands to make sure they selected the right files.


Figure 1. Testing find and xargs on a new set of filenames.
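The same sanity check can be written out as a small Python sketch. It is not the command from Figure 1: it assumes Trim Galore-style paired-end names ending in "_R1_val_1.fq.gz" and "_R2_val_2.fq.gz" (hypothetical suffixes; adjust them to the real filenames). It strips the suffix the way `basename -s` would and refuses to continue if any sample is missing its mate.

```python
from pathlib import Path

# Hypothetical suffixes for trimmed paired-end reads; change to match the real files.
R1_SUFFIX = "_R1_val_1.fq.gz"
R2_SUFFIX = "_R2_val_2.fq.gz"


def sample_pairs(names):
    """Return sorted (sample, R1 name, R2 name) tuples from a list of filenames."""
    names = list(names)
    # Strip the suffix to get the sample prefix, like `basename -s` in the shell.
    r1 = {n[: -len(R1_SUFFIX)] for n in names if n.endswith(R1_SUFFIX)}
    r2 = {n[: -len(R2_SUFFIX)] for n in names if n.endswith(R2_SUFFIX)}
    unpaired = r1 ^ r2  # samples that have only one of the two read files
    if unpaired:
        raise ValueError(f"unpaired samples: {sorted(unpaired)}")
    return [(s, s + R1_SUFFIX, s + R2_SUFFIX) for s in sorted(r1)]


def pairs_in_dir(directory):
    """Apply sample_pairs to the regular files in one directory (no recursion)."""
    return sample_pairs(p.name for p in Path(directory).iterdir() if p.is_file())
```

Printing `pairs_in_dir(...)` before launching anything gives the same kind of dry run as Figure 1: every sample should appear exactly once, and unrelated files are ignored.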

Then, I piped the find and xargs output into bismark. Unlike Steven, I did not set a value for --score_min. In our conversation today, we decided that I should run the alignment with the default so we can compare our results. The default is L,0,-0.2, which is more stringent than the L,0,-1.2 Steven used.


Figure 2. Description of each line of code, along with the actual command.
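A --score_min function of the form L,a,b is linear in read length x. It gives the minimum alignment score a + b·x that Bowtie2 (which Bismark runs underneath) will accept. The sketch below does that arithmetic. The 100 bp read length is an illustrative assumption, not our actual read length.

```python
def min_alignment_score(func: str, read_length: int) -> float:
    """Evaluate a linear Bowtie2-style score function 'L,a,b' as a + b * read_length."""
    kind, const, coef = func.split(",")
    if kind != "L":
        raise ValueError("this sketch only handles linear ('L') functions")
    return float(const) + float(coef) * read_length


read_length = 100  # assumed, for illustration only
default = min_alignment_score("L,0,-0.2", read_length)   # Bismark's default
steven = min_alignment_score("L,0,-1.2", read_length)    # value Steven used
print(f"default: {default}, Steven's: {steven}")
```

For a 100 bp read this works out to a floor of about -20 with the default versus about -120 with L,0,-1.2. With Bowtie2's default maximum mismatch penalty of 6, the default leaves room for only around three high-quality mismatches, so it should report fewer alignments than Steven's run.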

It took me a bit of finagling to get that code to work because of some typos! I’m going to jot down a few reminders for myself:

  • Always check that each line of a multi-line command ends with a “\”
  • Use tab complete to ensure lack of typos and the correct path to files
  • Verify that you’re using the correct command
  • Test small bits of code before putting everything together
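The first reminder is easy to check automatically. Below is a minimal sketch with a hypothetical helper, `missing_continuations`, that flags every line except the last that does not end with a backslash. In bash, a backslash followed by a trailing space no longer continues the line, so the sketch deliberately does not strip trailing whitespace before checking.

```python
def missing_continuations(command: str) -> list[int]:
    """Return 1-based numbers of non-final lines that do not end with a backslash."""
    lines = command.strip("\n").splitlines()
    # The last line ends the command, so it needs no continuation.
    return [i for i, line in enumerate(lines[:-1], start=1) if not line.endswith("\\")]


# Illustrative command text only; these are not the arguments used in Figure 2.
cmd = "bismark \\\n--genome ref_dir \\ \n-1 a.fq.gz\n-2 b.fq.gz"
print(missing_continuations(cmd))
```

Here line 2 is flagged because of the space after its backslash, and line 3 because it has no backslash at all.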

I’ll check on the alignment progress tomorrow, but my guess is that this will take 2-3 days to complete. Once the alignment is finished, I will deduplicate, sort, and index the resulting .bam files.
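The order of those post-alignment steps matters. deduplicate_bismark expects paired-end reads in the order Bismark wrote them (mates next to each other), so coordinate sorting should come after deduplication, not before. The sketch below only builds the command lines for one BAM file and does not run them. The "-p"/"-s" and "--bam" flags and the ".deduplicated.bam" output name follow Bismark's documentation as I understand it, so check them against the installed version. The input filename is hypothetical.

```python
from pathlib import Path


def post_alignment_commands(bam: str, paired: bool = True) -> list[list[str]]:
    """Build (without running) the dedup -> sort -> index commands for one BAM."""
    p = Path(bam)
    dedup = p.with_name(p.stem + ".deduplicated.bam")  # assumed dedup output name
    sorted_bam = p.with_name(p.stem + ".deduplicated.sorted.bam")
    return [
        # 1. Remove PCR duplicates from the unsorted Bismark output.
        ["deduplicate_bismark", "-p" if paired else "-s", "--bam", str(p)],
        # 2. Coordinate-sort the deduplicated reads.
        ["samtools", "sort", "-o", str(sorted_bam), str(dedup)],
        # 3. Index the sorted BAM.
        ["samtools", "index", str(sorted_bam)],
    ]


for cmd in post_alignment_commands("sample1_R1_val_1_bismark_bt2_pe.bam"):
    print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once the names have been confirmed against the real alignment output.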

Written on May 9, 2018