WGBS Analysis Part 17

Identifying SNPs with BS-Snper

Now that I know how to use BS-Snper, Iā€™m going to identify merged and individual SNPs in the Manchester data! I set up this Jupyter notebook to start the process.

The first thing I needed to do was merge BAM files with samtools merge:

%%bash

samtools merge \
/Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/merged-sorted-deduplicated.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_5_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_6_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_7_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/zr3616_8_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam

Once I had the merged BAM, I looked at the header with samtools view:

Screen Shot 2021-04-07 at 9 18 28 AM

I used default parameters, except for --mincov 5, with the merged BAM to identify SNP variants in my dataset. Again, I saved all of my output to gannet since it was large!

#Defaults: --minhetfreq 0.1 --minhomfreq 0.85 --minquali 15 --maxcover 1000 --minread2 2 --errorate 0.02 --mapvalue 20
#Saving output to gannet
!perl /Users/Shared/Apps/BS-Snper-master/BS-Snper.pl \
--fa /Volumes/web/spartina/project-oyster-oa/data/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa \
--input /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/merged-sorted-deduplicated.bam \
--output /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/SNP-candidates.txt \
--methcg /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CpG-meth-info.tab \
--methchg /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CHG-meth-info.tab \
--methchh /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/CHH-meth-info.tab \
--mincover 5 \
> /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/SNP-results.vcf \
2> /Volumes/web/spartina/project-gigas-oa-meth/output/BS-Snper/merged.ERR.log

I used a loop to run BS-Snper on individual samples, making sure to set any variables necessary for the loop within the same cell. Before running the loop, I moved to this directory where the BAM files were located. After the loop finished running, I moved all the files this directory for BS-Snper output.

%%bash

FILES=$(ls *deduplicated.sorted.bam)
echo ${FILES}

for file in ${FILES}
do
    NAME=$(echo ${file} | awk -F "." '{print $1}')
    echo ${NAME}

    perl /Users/Shared/Apps/BS-Snper-master/BS-Snper.pl \
    --fa /Volumes/web/spartina/project-oyster-oa/data/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa \
    --input ${NAME}.deduplicated.sorted.bam \
    --output ${NAME}.SNP-candidates.txt \
    --methcg ${NAME}.CpG-meth-info.tab \
    --methchg ${NAME}.CHG-meth-info.tab \
    --methchh ${NAME}.CHH-meth-info.tab \
    --mincover 5 \
    > ${NAME}.SNP-results.vcf 2> ${NAME}.ERR.log

done

mv *SNP-candidates.txt *meth-info.tab *SNP-results.vcf *ERR.log ../BS-Snper/

Going forward

  1. Write methods
  2. Obtain relatedness matrix and SNPs with EpiDiverse/snp
  3. Revise methylKit study design: separate tests for indeterminate and female oysters
  4. Run R script on mox
  5. Write results
  6. Identify genomic location of DML
  7. Determine if RNA should be extracted
  8. Determine if larval DNA/RNA should be extracted
Written on April 6, 2021