WGBS Analysis Part 11
trimgalore output and fastqc
Last week, I started trimgalore. My mox script finished running, so I wanted to check the output before I started bismark.
trimgalore
I checked the mox directories to start transferring files onto gannet. The trimming worked successfully, but there was no fastqc output! This was weird because the script I used for these samples was the same as what I used for the Hawaii samples. Confused, I started this discussion with my scripts and slurm output to determine why I didn’t get any fastqc output. Sam looked at the slurm output and saw that I had an error associated with my path:
>>> Now running FastQC on the validated data zr3616_8_R1_val_1.fq.gz<<<
Can't exec "fastqc": No such file or directory at /gscratch/srlab/programs/TrimGalore-0.6.6/trim_galore line 1525, <IN2> line 5536487816.
>>> Now running FastQC on the validated data zr3616_8_R2_val_2.fq.gz<<<
Can't exec "fastqc": No such file or directory at /gscratch/srlab/programs/TrimGalore-0.6.6/trim_galore line 1535, <IN2> line 5536487816.
Deleting both intermediate output files zr3616_8_R1_trimmed.fq.gz and zr3616_8_R2_trimmed.fq.gz
I’m not sure why fastqc would disappear from my path after a few weeks. In any case, I used rsync to transfer all the output to this gannet folder, organized into various subfolders. Then, I followed Sam’s advice to run fastqc separately to determine if it was truly a path issue.
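A check like the following at the top of a job script would flag a missing program before any trimming starts. This is a minimal sketch, not something my trimgalore script did: `require_program` is a hypothetical helper name, and it relies only on the shell's built-in `command -v`.

```shell
# Hypothetical helper: return 0 if a program is found on the PATH, 1 otherwise
require_program() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "ERROR: $1 not found on PATH" >&2
  return 1
}

# Check every program the script calls by name before doing any work.
# fastqc is the one that went missing in my slurm output.
missing=0
for prog in fastqc; do
  require_program "${prog}" || missing=$((missing + 1))
done

# In a real job script, this is where I would exit if missing is greater than 0
echo "Programs missing from PATH: ${missing}"
```

Adding more program names to the `for` list would extend the check to anything else the script expects to find on the PATH.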
[fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
I created this script to run fastqc on all my trimmed samples. In the script, I specified the fastqc and multiqc paths, then used the variables throughout the script:
# Paths to programs
fastqc=/gscratch/srlab/programs/fastqc_v0.11.9/fastqc
multiqc=/gscratch/srlab/programs/anaconda3/bin/multiqc
To run fastqc, I first specified files to analyze by including the absolute path to the directory. I changed the directory path for each trimming iteration:
# Populate array with FastQ files
fastq_array=(/gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore/*.fq.gz)
# Pass array contents to new variable
fastqc_list=$(echo "${fastq_array[*]}")
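Since I edited the directory path by hand for each trimming iteration, a typo would leave the `*.fq.gz` pattern unmatched, and by default the shell then passes the literal pattern along instead of a list of files. A guard along these lines could catch that before fastqc runs. It is a sketch under assumptions: `count_fastq`, `trim_dir`, and `n_files` are hypothetical names that were not in my script.

```shell
# Hypothetical helper: count the files in a directory that match *.fq.gz
count_fastq() {
  count=0
  for f in "$1"/*.fq.gz; do
    # When nothing matches, the loop sees the literal pattern,
    # so only count entries that really exist
    if [ -e "${f}" ]; then
      count=$((count + 1))
    fi
  done
  echo "${count}"
}

trim_dir=/gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore
n_files=$(count_fastq "${trim_dir}")

if [ "${n_files}" -eq 0 ]; then
  echo "No .fq.gz files found in ${trim_dir}; skipping FastQC" >&2
else
  echo "Found ${n_files} files to pass to FastQC"
fi
```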
When running fastqc, I also specified the outdir so the output would be written to the same folder as the trimgalore output.
# Run FastQC
# NOTE: Do NOT quote ${fastqc_list}
${fastqc} \
--threads ${threads} \
--outdir /gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore \
${fastqc_list}
Finally, I created new multiqc reports:
#MultiQC
${multiqc} \
/gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore/.
Unfortunately, I didn’t include the --outdir argument, so the reports were written to the same directory as the slurm file. Next time! Once the script finished running, I moved all the fastqc and multiqc output files to gannet, then added the HTML reports to this output subdirectory and my class repository. Tomorrow, I’ll review the output to make sure the trimming went well.
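For next time, the multiqc call with an output directory might look like the sketch below. The `run_multiqc` wrapper and the `trim_dir` variable are hypothetical names I'm using for illustration. The `--outdir` option (short form `-o`) is part of MultiQC's command line, and the paths are the ones from my script.

```shell
# Paths from my script
multiqc=/gscratch/srlab/programs/anaconda3/bin/multiqc
trim_dir=/gscratch/scrubbed/yaaminiv/Manchester/analyses/trimgalore

# Hypothetical wrapper: scan a directory and write the reports into that same directory
# $1 = multiqc program, $2 = directory holding the FastQC output
run_multiqc() {
  "$1" "$2" --outdir "$2"
}

# Only run when the program exists at the expected location
if [ -x "${multiqc}" ]; then
  run_multiqc "${multiqc}" "${trim_dir}"
else
  echo "multiqc not found at ${multiqc}; nothing run" >&2
fi
```

Writing the reports next to the fastqc output means there is nothing to move out of the slurm directory afterwards.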
Going forward
- Update the repository README files
- Check trimming output
- Start bismark
- Write methods
- Write results
- Identify DML
- Determine if RNA should be extracted
- Determine if larval DNA/RNA should be extracted