Killifish Hypoxia RRBS Part 3
Mapping RRBS data
I still don’t know what’s going on with my potentially truncated data, but I decided to take the samples I already trimmed and test mapping code. I’m using the Bisulfite Analysis Toolkit, and their mapping procedure has two steps: aligning reads and getting the mapping statistics.
Understanding singularity
Based on the BAT_mapping usage page and examples, I put together this code. When I ran the shell script, it failed almost immediately!
/cm/local/apps/slurm/var/spool/job2370991/slurm_script: line 25: BAT_mapping: command not found
This was weird, because I was opening a singularity container that had BAT_mapping in the previous command. A quick Google search showed me that I should use singularity exec instead of singularity run. singularity exec takes the command to run right after the container image path:
singularity exec /vortexfs1/home/naluru/bat_latest.sif \
BAT_mapping \
-g /vortexfs1/home/naluru/Killifish/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.dna.toplevel.fa.gz \
-q /vortexfs1/scratch/yaamini.venkataraman/02-trimgalore/190626_I114_FCH7TVNBBXY_L2_20-N4_1_val_1.fq.gz \
-p /vortexfs1/scratch/yaamini.venkataraman/02-trimgalore/190626_I114_FCH7TVNBBXY_L2_20-N4_2_val_2.fq.gz \
-i /vortexfs1/home/naluru/Killifish/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.dna.toplevel \
-o /vortexfs1/scratch/yaamini.venkataraman/03-mapping/190626_I114_FCH7TVNBBXY_L2_20-N4_nondirectional \
-t 16 \
-F 2
Using singularity exec solved my issue of having BAT_mapping available, but then I ran into a new issue:
mkdir /vortexfs1/scratch: Read-only file system at /usr/local/bin/BAT_mapping line 106.
I shot off a quick email to Neel with the error to ask what was happening. Neel suggested there was an issue with the output folder, and when I ran the code interactively the error went away once I used a folder in my home directory for the output prefix instead of the scratch directory. But of course, there was another issue:
##### AN ERROR has occurred: required option -g missing or nonexistent
The program couldn’t find my genome file! I knew it existed, but when I tried to find it within the singularity container I couldn’t. Another quick search led me to this page. Essentially, I access a singularity container by “swapping” file systems with my host operating system, so I can’t access anything in my host system unless I bind it to my container. I was able to bind Neel’s home directory (with the genome files) and my scratch directory (where all my files are located and where I want to put the output) to my container:
singularity run --bind /vortexfs1/home/naluru/:/naluru,/vortexfs1/scratch/yaamini.venkataraman:/scratch /vortexfs1/home/naluru/bat_latest.sif
Once I loaded the container, I ran the following code to test mapping parameters on one sample:
BAT_mapping \
-g /naluru/Killifish/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.dna.toplevel.fa.gz \
-q /scratch/02-trimgalore/190626_I114_FCH7TVNBBXY_L2_20-N4_1_val_1.fq.gz \
-p /scratch/02-trimgalore/190626_I114_FCH7TVNBBXY_L2_20-N4_2_val_2.fq.gz \
-i /naluru/Killifish/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.dna.toplevel \
-o /scratch/03-mapping/190626_I114_FCH7TVNBBXY_L2_20-N4_nondirectional \
-t 16 \
-F 2
I wanted to test one sample so I could build the genome indices and not have to go through that time-intensive step when I go through this process for all samples.
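Because the container only sees bound directories, every host path handed to BAT_mapping has to be rewritten to its container-side equivalent, which is easy to get wrong by hand. Here is a minimal sketch of that translation, assuming the same two bind pairs as above (with the trailing slash on the first host directory dropped for simplicity). The function name to_container_path and the BINDS variable are hypothetical helpers, not part of Singularity or BAT:

```shell
#!/usr/bin/env bash
# Bind pairs in the same host:container form passed to --bind.
BINDS="/vortexfs1/home/naluru:/naluru,/vortexfs1/scratch/yaamini.venkataraman:/scratch"

# Hypothetical helper: print the container-side path for a host path,
# or return 1 if the path is not under any bound directory.
to_container_path() {
  local host_path="$1"
  local pair src dst
  local IFS=','                 # split BINDS on commas
  for pair in $BINDS; do
    src="${pair%%:*}"           # host side of the pair
    dst="${pair#*:}"            # container side of the pair
    case "$host_path" in
      "$src")   printf '%s\n' "$dst"; return 0 ;;
      "$src"/*) printf '%s\n' "$dst${host_path#"$src"}"; return 0 ;;
    esac
  done
  return 1
}

# prints /scratch/03-mapping
to_container_path /vortexfs1/scratch/yaamini.venkataraman/03-mapping
```

A path outside both bound directories makes the helper return a non-zero status, which flags the "file exists on the host but not in the container" situation before BAT_mapping reports a missing option.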
Testing mapping parameters
Once I built the genome indices, I returned to my SBATCH script to modify paths based on where I’m mounting directories. I also had to figure out one more singularity issue: how to open a module within my script and run the commands. I settled on a loop with singularity exec:
# Assuming non-directional (-F 2)
for f in $FASTQ
do
  singularity exec --bind /vortexfs1/home/naluru/:/naluru,/vortexfs1/scratch/yaamini.venkataraman:/scratch /vortexfs1/home/naluru/bat_latest.sif \
    BAT_mapping \
    -g "$GENOME" \
    -q "${f}_1_val_1.fq.gz" \
    -p "${f}_2_val_2.fq.gz" \
    -i "$INDICES" \
    -o "${MAPPED}/${f}" \
    -t 16 \
    -F 2
done
This should work when I’m ready to process all samples in a list $FASTQ. I can use the same technique when running samples individually:
#Test sample 1
singularity exec --bind /vortexfs1/home/naluru/:/naluru,/vortexfs1/scratch/yaamini.venkataraman:/scratch /vortexfs1/home/naluru/bat_latest.sif \
BAT_mapping \
-g /naluru/Killifish/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.dna.toplevel.fa.gz \
-q /scratch/02-trimgalore/190626_I114_FCH7TVNBBXY_L2_20-N4_1_val_1.fq.gz \
-p /scratch/02-trimgalore/190626_I114_FCH7TVNBBXY_L2_20-N4_2_val_2.fq.gz \
-i /naluru/Killifish/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.dna.toplevel \
-o /scratch/03-mapping/190626_I114_FCH7TVNBBXY_L2_20-N4_nondirectional \
-t 16 \
-F 2
I started running the script with 2 test samples for directional and non-directional mapping. Fingers crossed this gives good information about mapping parameters and whether the libraries were directional!
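The loop over $FASTQ assumes the list holds one prefix per sample, with the _1_val_1.fq.gz and _2_val_2.fq.gz suffixes left off. A minimal sketch of one way to build that list follows, assuming the TrimGalore output naming shown above. The helper name list_sample_prefixes and the TRIMMED variable are illustrative and not from my script. It prints bare sample names, so whether a directory needs to be prepended depends on how the other paths in the script are set up:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one sample name per line for every
# read-1 file in a directory that also has a matching read-2 file.
list_sample_prefixes() {
  local dir="$1"
  local r1 prefix
  for r1 in "$dir"/*_1_val_1.fq.gz; do
    [ -e "$r1" ] || continue              # glob matched nothing
    prefix="${r1##*/}"                    # drop the directory
    prefix="${prefix%_1_val_1.fq.gz}"     # drop the read-1 suffix
    if [ -e "$dir/${prefix}_2_val_2.fq.gz" ]; then
      printf '%s\n' "$prefix"
    else
      printf 'Skipping %s: no read-2 file\n' "$prefix" >&2
    fi
  done
}

# Host-side directory of trimmed reads; the listing runs outside the container.
TRIMMED=/vortexfs1/scratch/yaamini.venkataraman/02-trimgalore
FASTQ=$(list_sample_prefixes "$TRIMMED")
```

Samples with a missing mate are reported on standard error and left out of the list, so an incomplete pair never reaches BAT_mapping.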
Going forward
- Get mapping statistics for test samples
- Figure out what’s happening with sample 190626_I114_FCH7TVNBBXY_L4_OC-N3_1.fq.gz
- Trim sample 190626_I114_FCH7TVNBBXY_L4_OC-N3_1.fq.gz appropriately
- Start alignment with all samples