Hawaii Gigas Methylation Analysis Part 11

Merging BAM files

The plan

Based on Pubathon conversations, my plan is to call SNPs two different ways. First, I’ll call SNPs for each individual sample. If any sample has a SNP, that will certainly affect DML calls. Katherine also suggested I merge samples and call SNPs. Loci that are SNPs across samples but not present in an individual sample can still change the meaning of a DML (that DML could be more associated with genetic differences than treatment). Both Mac and Katherine implied that default parameters would be more than enough to identify C/T SNPs for my goals, so I won’t mess around with them.

Headaches with samtools installation

Which returns me to the original problem I had trying to merge BAM files: library versions. Sam suggested I update the htslib library based on these instructions, and I was able to do that successfully! But after updating this library, I still had the same samtools installation error!

config.status: creating config.mk
dyld: Library not loaded: /usr/local/opt/mpfr/lib/libmpfr.4.dylib
  Referenced from: /usr/local/bin/gawk
  Reason: image not found
./config.status: line 879: 83237 Done(141)               eval sed \"\$ac_sed_extra\" "$ac_file_inputs"
     83238 Abort trap: 6           | $AWK -f "$ac_tmp/subs.awk" > $ac_tmp/out
config.status: error: could not create config.mk

Since the installation was referencing a library from gawk, Sam suggested I update gawk. I ran brew update gawk and got an error that suggested I don’t have gawk installed on this machine:

Screen Shot 2021-03-30 at 11 47 14 AM

I decided to reinstall Homebrew just to be safe, and then ran brew install gawk. Once I had everything installed, I tried once more with my samtools installation and STILL got the same error!

Screen Shot 2021-03-30 at 1 32 40 PM

At this point, I started digging around for specific solutions for Homebrew potentially interfering with samtools installations. I found this issue that suggested running brew install xz prior to configuration. I still encountered the same error! I realized installing xz was specific to the error message the user got after trying to configure their program after reading this Stack Overflow question linked in the issue. Based on one of the suggestions, I ran brew search libmpfr (the library that wasn’t loaded in any error above): Error: No formulae or casks found for "libmpfr".

Then, I ran brew search mpfr:

Warning: mpfr 4.1.0 is already installed and up-to-date.
To reinstall 4.1.0, run:
  brew reinstall mpfr

I reinstalled mpfr with brew reinstall mpfr and see if that resolves the issue (and obviously it did not). Since mpfr is part of Homebrew, I realized I should just try installing samtools with brew install samtools! AND THAT FINALLY WORKED!

Screen Shot 2021-03-30 at 2 00 09 PM

If anything, graduate school has taught me how to conduct very specific Google searches until you find someone else who solves your problem for you. Now that I had samtools installed, I could get to work.

Merging BAM files

Before I could identify SNPs across all samples, I needed to merge my BAM files! Based on the samtools merge documentation, I set up the following code in my Jupyter notebook:

%%bash

samtools merge \
/Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/merged-sorted-deduplicated.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_1_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_2_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_3_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_4_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_5_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_6_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_7_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_8_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_9_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_10_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_11_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_12_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_13_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_14_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_15_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_16_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_17_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_18_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_19_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_20_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_21_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_22_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_23_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
/Volumes/web/spartina/project-oyster-oa/Haws/bismark-2/zr3644_24_R1_val_1_val_1_val_1_bismark_bt2_pe.deduplicated.sorted.bam

I decided to save the output file (merged-sorted-deduplicated.bam) to gannet because there was no way that file would be small enough for Github. I contemplated adding the -r to add an @RG tag for each alignment, but I thought I’d run the code with no options to see what happens first.

After a few hours (that’s what I get for not including a -threads specification), I had a merged file (found here). I looked at the output and saw there was no @RG tag (which makes sense because I didn’t specify where it should pull that information from). My next step is to run that file through BS-Snper to see if 1) it works and 2) the output is usable.

Going forward

  1. Try BS-Snper and EpiDiverse/snp for SNP extraction from WGBS data
  2. Run methylKit script on mox
  3. Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
  4. Test-run DSS and ramwas
  5. Transfer scripts used to a nextflow workflow
  6. Update methods
  7. Update results
Written on March 30, 2021