Killifish Hypoxia RRBS Part 7
Extracting methylation information
The next BAT module is methylation calling, which is similar to the methylation extraction step in bismark
. Methylation calls are extracted for all samples, then filtered based on user-set criteria.
Testing BAT_calling
and BAT_filtering
The first things I did was run code interactively to see if I could get the code to work with one sample:
BAT_calling \
-d $GENOME \
-q /yaaminiv/killifish-hypoxia-RRBS/output/03-mapping/mapped/*.bam \
-o /scratch/04-calling/called
INFO] Tue Apr 19, 11:25:46, 2022 BAT_calling v0.1 started
[INFO] Tue Apr 19, 11:25:46, 2022 Checking flags
[INFO] Tue Apr 19, 11:25:46, 2022 Convert BAM to gzipped SAM
[INFO] Tue Apr 19, 11:36:7, 2022 Build query index
[INFO] Tue Apr 19, 13:52:2, 2022 Start methylation calling (time intense step)
At some point after the calls started being extracted, I stopped the command from continuing and started filtering one sample. Based on conversations I had with Neel, I filtered out CG methylation that had 10-100 reads and a mininum methylation rate of 0.1.
BAT_filter_vcf \
--vcf /scratch/04-calling/called/190626_I114_FCH7TVNBBXY_L2_20-N4.vcf.gz \
--out /scratch/04-calling/filtered/190626_I114_FCH7TVNBBXY_L2_20-N4_CG.vcf.gz \
--context CG \
--MDP_min 10 --MDP_max 100 \
--MR_min 0.1
[INFO] Tue Apr 19, 16:6:59, 2022 BAT_filter_vcf v0.1 started
[INFO] Tue Apr 19, 16:6:59, 2022 Checking flags
[INFO] Tue Apr 19, 16:6:59, 2022 Checking input file
[INFO] Tue Apr 19, 16:6:59, 2022 Reading input /scratch/04-calling/called/190626_I114_FCH7TVNBBXY_L2_20-N4.vcf.gz
[INFO] Tue Apr 19, 16:8:50, 2022 Plot statistics to /scratch/04-calling/filtered/190626_I114_FCH7TVNBBXY_L2_20-N4_CG.pdf
This step was very quick! The command produced a bedgraph of filtered loci, a pdf of filtering information, and a final filtered file.
Setting up a SLURM script
While my test code was running, I started crafting the SLURM script I intended to run. Since I wanted to refer to prefixes of my mapped files, I needed to use the following array:
#Populate FASTQ array with prefixes of files to process
FASTQ=(190626_I114_FCH7TVNBBXY_L2_20-N4 190626_I114_FCH7TVNBBXY_L2_20-S1 190626_I114_FCH7TVNBBXY_L2_20-S3 190626_I114_FCH7TVNBBXY_L2_20-S4 190626_I114_FCH7TVNBBXY_L2_5-N1 190626_I114_FCH7TVNBBXY_L2_5-N2 190626_I114_FCH7TVNBBXY_L2_5-S3 190626_I114_FCH7TVNBBXY_L2_5-S4 190626_I114_FCH7TVNBBXY_L2_OC-S1 190626_I114_FCH7TVNBBXY_L3_20-N2 190626_I114_FCH7TVNBBXY_L3_5-N3 190626_I114_FCH7TVNBBXY_L3_5-S2 190626_I114_FCH7TVNBBXY_L3_OC-N1 190626_I114_FCH7TVNBBXY_L3_OC-N2 190626_I114_FCH7TVNBBXY_L3_OC-N4 190626_I114_FCH7TVNBBXY_L3_OC-N5 190626_I114_FCH7TVNBBXY_L3_OC-S2 190626_I114_FCH7TVNBBXY_L3_OC-S3 190626_I114_FCH7TVNBBXY_L4_20-N1 190626_I114_FCH7TVNBBXY_L4_20-S2 190626_I114_FCH7TVNBBXY_L4_5-S1 190626_I114_FCH7TVNBBXY_L4_OC-S5)
Based on my previous experience, I knew I needed to create a SLURM script that called a bash
script when creating the singularity
container. I created this SLURM script and used it to call this BAT_calling
script and this BAT_filtering
script. I also created this environmental variable file to define directories for mapped, called, and filtered files. Once I drafted the for loops I wanted to use for each command, I tested them using the interactive node:
for f in "${FASTQ[@]}"
do
BAT_calling \
-d $GENOME \
-q ${MAPPED}/${f}.bam \
-o ${CALLED}
done
for f in "${FASTQ[@]}"
do
BAT_filter_vcf \
--vcf ${CALLED}/${f}.vcf.gz \
--out ${FILTERED}/${f}_CG.vcf.gz \
--context CG \
--MDP_min 10 --MDP_max 100 \
--MR_min 0.1
done
The BAT_filtering
code ran with no issue, but my BAT_calling
loop failed because I incorrectly specified the location of the mapped files in $MAPPED
! I fixed that in the variable file, and the loop proceeded.
I stopped all commands and exited the interactive node, and removed all files generated by my code testing. I then submitted the job to the scheduler. Let’s see how much of this will run in 24 hours! I could ask for unlimited time again, but I kept having to restart the script because it timed out after 24 hours no matter what. Seems like a hassle to go through IS for now.
Going forward
- Review
BAT_calling
andBAT_filtering
results - Review next BAT module
- Run
BAT_analysis