WGBS Analysis Part 17
Identifying SNPs with BS-Snper
Now that I know how to use BS-Snper
, Iām going to identify merged and individual SNPs in the Manchester data! I set up this Jupyter notebook to start the process.
The first thing I needed to do was merge BAM files with samtools merge
:
%%bash
samtools merge \
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/merged-sorted-deduplicated.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_5_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_6_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_7_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_8_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
Once I had the merged BAM, I looked at the header with samtools view
:
I used default parameters, except for --mincov 5
, with the merged BAM to identify SNP variants in my dataset. Again, I saved all of my output to gannet
since it was large!
#Defaults: --minhetfreq 0.1 --minhomfreq 0.85 --minquali 15 --maxcover 1000 --minread2 2 --errorate 0.02 --mapvalue 20
#Saving output to gannet
!perl /Users/Shared/Apps/BS-Snper-master/BS-Snper.pl \
--fa /Volumes/web/spartina/project-oyster-oa/data/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa \
--input /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/merged-sorted-deduplicated.bam \
--output /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/SNP-candidates.txt \
--methcg /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CpG-meth-info.tab \
--methchg /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CHG-meth-info.tab \
--methchh /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CHH-meth-info.tab \
--mincover 5 \
> /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/SNP-results.vcf \
2> /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/merged.ERR.log
I used a loop to run BS-Snper
on individual samples, making sure to set any variables necessary for the loop within the same cell. Before running the loop, I moved to this directory where the BAM files were located. After the loop finished running, I moved all the files this directory for BS-Snper
output.
%%bash
FILES=$(ls *deduplicated.sorted.bam)
echo ${FILES}
for file in ${FILES}
do
NAME=$(echo ${file} | awk -F "." '{print $1}')
echo ${NAME}
perl /Users/Shared/Apps/BS-Snper-master/BS-Snper.pl \
--fa /Volumes/web/spartina/project-oyster-oa/data/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa \
--input ${NAME}.deduplicated.sorted.bam \
--output ${NAME}.SNP-candidates.txt \
--methcg ${NAME}.CpG-meth-info.tab \
--methchg ${NAME}.CHG-meth-info.tab \
--methchh ${NAME}.CHH-meth-info.tab \
--mincover 5 \
> ${NAME}.SNP-results.vcf 2> ${NAME}.ERR.log
done
mv *SNP-candidates.txt *meth-info.tab *SNP-results.vcf *ERR.log ../BS-Snper/
Going forward
- Write methods
- Obtain relatedness matrix and SNPs with EpiDiverse/snp
- Revise
methylKit
study design: separate tests for indeterminate and female oysters - Run R script on
mox
- Write results
- Identify genomic location of DML
- Determine if RNA should be extracted
- Determine if larval DNA/RNA should be extracted