DML Analysis Part 17

methylKit and bedtools with Mox samples

I took my samples off of Mox, identified DML and DMR, and characterized their location! :tada:

bismark

I finished my bismark pipeline on Mox! It did, however, take me a bit of tweaking. I ended up needing a few different scripts because I kept making mistakes in my code:

  1. I used this script for my alignment. I could not complete deduplication because I did not include --samtools_path.
  2. I used this script for deduplication. I was able to get bismark reports as well, but again, I couldn’t complete the methylation extraction because I did not include --samtools_path.
  3. I used this script for methylation extraction and report creation. Everything worked!

I moved all of my scripts to this folder for defunct scripts, and created a master script with all revisions that can be found here.

To move my files off of Mox, I initially thought I should create a checksum file and then use rsync. When I tried creating a checksum file on the login node, I got an error message saying I was overloading the CPU. I posted this issue, and learned that I could either create checksums from an interactive node, or just rsync and create the checksum file later, since rsync already verifies checksums as it transfers the files. I went with the second option. All of the files from Mox are now on this gannet folder. I created the checksums with shasum.
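For reference, here is a minimal Python sketch of what a checksum file like this contains. It is not the command I ran (I used shasum), and it assumes shasum's default algorithm, SHA-1. The function names and the `checksums.sha1` output name are hypothetical, chosen just for illustration.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it in chunks
    so that large alignment files never need to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(directory, output_name="checksums.sha1"):
    """Write one '<digest>  <relative path>' line per file in `directory`.

    The output name is a hypothetical choice; the line layout mimics
    the two-space format that shasum prints.
    """
    directory = Path(directory)
    output_path = directory / output_name
    lines = []
    for path in sorted(directory.rglob("*")):
        # Skip subdirectories and the checksum file itself.
        if path.is_file() and path != output_path:
            relative = path.relative_to(directory).as_posix()
            lines.append(f"{sha1_of_file(path)}  {relative}")
    output_path.write_text("\n".join(lines) + "\n")
    return lines
```

Running `write_checksums` on both the source and destination copies and comparing the two lists would be one way to confirm a transfer after the fact.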

methylKit

This part was easy since I already had an R Markdown file ready. I changed the path for the filenames, and cranked away. Output from DML identification is here, and output from DMR identification can be found here. Nothing changed between this run and my previous methylKit runs using files generated on genefish.

bedtools

I characterized the location of DML and DMR in this Jupyter notebook. Based on this issue, I added sections to find overlaps between DML, DMR, and transposable elements. There are two different transposable element files. According to Sam, “C_virginica-3.0_TE-all.gff used all species that exist in the database and C_virginica-3.0_TE-Cg.gff only used Crassostrea virginica database” to identify transposable elements in the C. virginica genome. C_virginica-3.0_TE-all.gff had more transposable elements, so I got different results for each file when I used intersectBed. I also looked at overlaps between transposable elements and exons, introns, and mRNA coding regions. I generated a lot of data! All of the files with overlap locations can be found here.
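To make the overlap idea concrete, here is a small Python sketch of what an intersection between two feature sets means. It is a simplified illustration, not how bedtools is implemented and not the code in my notebook. It assumes features are `(chromosome, start, end)` tuples with 0-based, half-open coordinates as in BED files (GFF coordinates are 1-based and inclusive, so they would need converting first). The function name and the example coordinates are made up.

```python
from collections import defaultdict


def intersect(features_a, features_b):
    """Return the shared interval for every overlapping pair of features.

    Each feature is a (chromosome, start, end) tuple with 0-based,
    half-open coordinates. This compares every A feature against every
    B feature on the same chromosome, which is fine for a small example
    but far slower than bedtools on real data.
    """
    b_by_chrom = defaultdict(list)
    for chrom, start, end in features_b:
        b_by_chrom[chrom].append((start, end))

    overlaps = []
    for chrom, a_start, a_end in features_a:
        for b_start, b_end in b_by_chrom.get(chrom, []):
            # Half-open intervals overlap only if each starts before the other ends.
            if a_start < b_end and b_start < a_end:
                overlaps.append((chrom, max(a_start, b_start), min(a_end, b_end)))
    return overlaps


# Hypothetical DMR and transposable element coordinates.
dmr = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
te = [("chr1", 150, 300), ("chr1", 600, 700), ("chr2", 10, 60)]

print(intersect(dmr, te))
# [('chr1', 150, 200), ('chr2', 50, 60)]
```

The second DMR ends exactly where a transposable element begins, so with half-open coordinates the two do not overlap. A larger transposable element file gives each DMR more candidates to overlap with, which is why the two files produce different results.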

Going forward

Steven said to focus on DMR characterization since he found those more interesting in the Olympia oyster data he’s working with, so I’ll develop all of the analysis code for DMR first. I can then move on to DML if I want.

  1. Conduct flanking analysis for DMR
  2. Start gene enrichment for DMR
  3. Update methods and results in my draft manuscript

Written on November 7, 2018