Killifish Hypoxia RRBS

Starting a new project!

I’m working with Neel to analyze RRBS and RNA-Seq data from 6-month-old killifish liver tissue from 2018. These fish are from two populations: contaminant-susceptible and contaminant-resistant. The hypothesis is that hypoxia tolerance may be linked to contaminant resistance, with the resistant population being unable to respond to a secondary stressor. Neel finished the RNA-Seq analysis, but since there’s a newer genome assembly, now’s a great time to reanalyze both datasets.

Setting up

Before I could begin, I needed to install programs on WHOI’s HPC, Poseidon. When I logged in, I tried navigating to my scratch directory, but found it didn’t exist and I was unable to create one for myself!

After checking modules with module avail, I decided to install fastqc, cutadapt, multiqc, and trimgalore. To do this, I figured it would be easiest to install miniconda in my home directory, then install the remaining packages. I downloaded the Python 3.9 Linux 64-bit installer locally, then used rsync to move the file to Poseidon. I followed these instructions to install the program, then used conda activate to initialize the installation.

I then configured my environment by following these instructions from bioconda:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

I followed these instructions for cutadapt installation:

conda create -n cutadaptenv cutadapt #Create a new environment to install cutadapt
conda activate cutadaptenv #Needs to be done every time I open a new shell before I use cutadapt
cutadapt --version #Check the installation completed

Used these instructions for multiqc installation:

conda install -c bioconda -c conda-forge multiqc

After I did this, I realized I could have installed cutadapt the same way, so I wouldn’t need to activate a separate environment every time I wanted to use the program. I also used this method for the fastqc installation instead of downloading the zip file from the FastQC website and unzipping it for use:

conda install -c bioconda -c conda-forge cutadapt #Wanted to not have a separate environment

conda install -c bioconda -c conda-forge fastqc #Worked without the zip file!!
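
With several tools installed in one go, it can help to record which versions actually ended up on the PATH. Below is a minimal sketch, assuming each tool accepts a --version flag (fastqc, cutadapt, and multiqc do). The function name report_versions is hypothetical and not part of any of these packages.

```shell
# Hypothetical helper: print "tool<TAB>version" for each tool name given,
# or "tool<TAB>NOT FOUND" when the tool is not on the PATH.
report_versions() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Keep only the first line, since some tools print multi-line banners
            printf '%s\t%s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
        else
            printf '%s\tNOT FOUND\n' "$tool"
        fi
    done
}
```

Calling report_versions fastqc cutadapt multiqc after installation, and saving the output alongside the project notes, gives a quick record to compare against if results change later.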

Finally, I installed trimgalore based on these Github instructions:

curl -fsSL -o trim_galore.tar.gz
tar xvzf trim_galore.tar.gz

Creating the script

I modified my script based on my previous one from my Manchester repository. I reviewed the script and ensured that I had the correct output directory for my work, the directory with the raw data, program paths, and python module. I saved my script here in my new project repository.
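
The checks described above (output directory, raw data directory, program paths) can also be automated before submitting a job. This is a rough sketch rather than the script in my repository. The function name preflight and its argument order are assumptions made for illustration.

```shell
# Hypothetical pre-submission check.
# Usage: preflight OUT_DIR RAW_DIR PROGRAM [PROGRAM...]
# Returns non-zero if any directory is missing or any program cannot be found.
preflight() {
    out_dir=$1
    raw_dir=$2
    shift 2
    status=0
    for dir in "$out_dir" "$raw_dir"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            status=1
        fi
    done
    for prog in "$@"; do
        # command -v accepts either a bare name on the PATH or a full path
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "program not found: $prog" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running something like preflight "$OUT_DIR" "$RAW_DIR" fastqc multiqc before sbatch (the variable names here are placeholders) reports every missing piece at once instead of failing partway through a job.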

When I ran the script, I got the following error:

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

The account error made me think that it may be related to my lack of a scratch directory, so I created a ticket with WHOI IS for these issues.
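
For this kind of error, one way to see which account and partition combinations SLURM has on record for a user is sacctmgr. The sketch below is a hedged example: the function name show_slurm_accounts is hypothetical, and whether regular users can query associations depends on how the cluster is configured.

```shell
# Hypothetical helper: list the SLURM account/partition pairs for the current user.
# Returns 1 if sacctmgr is not available (for example, off the cluster).
show_slurm_accounts() {
    if ! command -v sacctmgr >/dev/null 2>&1; then
        echo "sacctmgr not found; run this on a SLURM login node" >&2
        return 1
    fi
    # An empty listing suggests the user has no account association yet
    sacctmgr show associations user="${USER:-$(id -un)}" format=Account,Partition,QOS
}
```

If the listing comes back empty, the submission itself is probably fine and the account still needs to be enabled by the cluster administrators.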

Running fastqc and multiqc

Turns out having a Poseidon account isn’t enough to run scripts: IS had to activate account permissions for me so I could submit scripts to SLURM. I then hit a few problems with the script, which I traced to having removed the fastqc and multiqc program paths. Once I added them back in, I was able to run fastqc on all samples. However, the script failed before creating checksums! This meant that multiqc didn’t run either. I manually created checksums for the fastqc output and ran multiqc on the samples.
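
Creating checksums by hand can be wrapped in a small function. This is a minimal sketch, assuming MD5 checksums via GNU md5sum; the checksum tool is not named above, so that choice is an assumption, as are the function name write_checksums and the output file name checksums.md5.

```shell
# Hypothetical helper: write MD5 checksums for every regular file directly
# inside a directory (no recursion) to DIR/checksums.md5.
write_checksums() {
    dir=$1
    (
        cd "$dir" || exit 1
        # Exclude the checksum file itself so re-runs stay consistent
        find . -maxdepth 1 -type f ! -name checksums.md5 \
            -exec md5sum {} + > checksums.md5
    )
}
```

After transferring the files elsewhere, running md5sum -c checksums.md5 from inside the copied directory confirms that nothing changed in transit.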

Now came the really annoying part…transferring files off of Poseidon. I tried using rsync from my computer to move files off of Poseidon, but I kept getting the error that zsh did not recognize yaamini.venkataraman@poseidon:/path/. I moved all my files to my home directory, then mounted it on my computer, and used rsync to transfer files that way. All output can be found in this subdirectory.

Going forward

  1. Trim adapters with TrimGalore! and re-run FastQC
  2. Start alignment with Bisulfite Analysis Toolkit

Written on February 8, 2022