Killifish Hypoxia RRBS Part 7

Extracting methylation information

The next BAT module is methylation calling, which is similar to the methylation extraction step in bismark. Methylation calls are extracted for all samples, then filtered based on user-set criteria.

Testing BAT_calling and BAT_filtering

The first things I did was run code interactively to see if I could get the code to work with one sample:

BAT_calling \
-d $GENOME \
-q /yaaminiv/killifish-hypoxia-RRBS/output/03-mapping/mapped/*.bam \
-o /scratch/04-calling/called
INFO]	Tue Apr 19, 11:25:46, 2022	BAT_calling v0.1 started
[INFO]	Tue Apr 19, 11:25:46, 2022	Checking flags
[INFO]	Tue Apr 19, 11:25:46, 2022	Convert BAM to gzipped SAM
[INFO]	Tue Apr 19, 11:36:7, 2022	Build query index
[INFO]	Tue Apr 19, 13:52:2, 2022	Start methylation calling (time intense step)

At some point after the calls started being extracted, I stopped the command from continuing and started filtering one sample. Based on conversations I had with Neel, I filtered out CG methylation that had 10-100 reads and a mininum methylation rate of 0.1.

BAT_filter_vcf \
--vcf /scratch/04-calling/called/190626_I114_FCH7TVNBBXY_L2_20-N4.vcf.gz \
--out /scratch/04-calling/filtered/190626_I114_FCH7TVNBBXY_L2_20-N4_CG.vcf.gz \
--context CG \
--MDP_min 10 --MDP_max 100 \
--MR_min 0.1
[INFO]	Tue Apr 19, 16:6:59, 2022	BAT_filter_vcf v0.1 started
[INFO]	Tue Apr 19, 16:6:59, 2022	Checking flags
[INFO]	Tue Apr 19, 16:6:59, 2022	Checking input file
[INFO]	Tue Apr 19, 16:6:59, 2022	Reading input /scratch/04-calling/called/190626_I114_FCH7TVNBBXY_L2_20-N4.vcf.gz
[INFO]	Tue Apr 19, 16:8:50, 2022	Plot statistics to /scratch/04-calling/filtered/190626_I114_FCH7TVNBBXY_L2_20-N4_CG.pdf

This step was very quick! The command produced a bedgraph of filtered loci, a pdf of filtering information, and a final filtered file.

Setting up a SLURM script

While my test code was running, I started crafting the SLURM script I intended to run. Since I wanted to refer to prefixes of my mapped files, I needed to use the following array:

#Populate FASTQ array with prefixes of files to process
FASTQ=(190626_I114_FCH7TVNBBXY_L2_20-N4 190626_I114_FCH7TVNBBXY_L2_20-S1 190626_I114_FCH7TVNBBXY_L2_20-S3 190626_I114_FCH7TVNBBXY_L2_20-S4 190626_I114_FCH7TVNBBXY_L2_5-N1 190626_I114_FCH7TVNBBXY_L2_5-N2 190626_I114_FCH7TVNBBXY_L2_5-S3 190626_I114_FCH7TVNBBXY_L2_5-S4 190626_I114_FCH7TVNBBXY_L2_OC-S1 190626_I114_FCH7TVNBBXY_L3_20-N2 190626_I114_FCH7TVNBBXY_L3_5-N3 190626_I114_FCH7TVNBBXY_L3_5-S2 190626_I114_FCH7TVNBBXY_L3_OC-N1 190626_I114_FCH7TVNBBXY_L3_OC-N2 190626_I114_FCH7TVNBBXY_L3_OC-N4 190626_I114_FCH7TVNBBXY_L3_OC-N5 190626_I114_FCH7TVNBBXY_L3_OC-S2 190626_I114_FCH7TVNBBXY_L3_OC-S3 190626_I114_FCH7TVNBBXY_L4_20-N1 190626_I114_FCH7TVNBBXY_L4_20-S2 190626_I114_FCH7TVNBBXY_L4_5-S1 190626_I114_FCH7TVNBBXY_L4_OC-S5)

Based on my previous experience, I knew I needed to create a SLURM script that called a bash script when creating the singularity container. I created this SLURM script and used it to call this BAT_calling script and this BAT_filtering script. I also created this environmental variable file to define directories for mapped, called, and filtered files. Once I drafted the for loops I wanted to use for each command, I tested them using the interactive node:

for f in "${FASTQ[@]}"
do
  BAT_calling \
  -d $GENOME \
  -q ${MAPPED}/${f}.bam \
  -o ${CALLED}
done

for f in "${FASTQ[@]}"
do
  BAT_filter_vcf \
  --vcf ${CALLED}/${f}.vcf.gz \
  --out ${FILTERED}/${f}_CG.vcf.gz \
  --context CG \
  --MDP_min 10 --MDP_max 100 \
  --MR_min 0.1
done

The BAT_filtering code ran with no issue, but my BAT_calling loop failed because I incorrectly specified the location of the mapped files in $MAPPED! I fixed that in the variable file, and the loop proceeded.

I stopped all commands and exited the interactive node, and removed all files generated by my code testing. I then submitted the job to the scheduler. Let’s see how much of this will run in 24 hours! I could ask for unlimited time again, but I kept having to restart the script because it timed out after 24 hours no matter what. Seems like a hassle to go through IS for now.

Going forward

  1. Review BAT_calling and BAT_filtering results
  2. Review next BAT module
  3. Run BAT_analysis
Written on April 19, 2022