Killifish Hypoxia RRBS
Starting a new project!
I’m working with Neel to analyze RRBS and RNA-Seq data from 6-month-old killifish liver tissue from 2018. These fish are from two populations: contaminant-susceptible and contaminant-resistant. The hypothesis is that hypoxia tolerance may be linked to contaminant resistance, with the resistant population being unable to respond to a secondary stressor. Neel finished the RNA-Seq analysis, but since there’s a newer genome assembly, now’s a great time to reanalyze both datasets.
Setting up
Before I could begin, I needed to install programs on WHOI’s HPC, Poseidon. When I logged in, I tried navigating to my scratch directory, but found it didn’t exist and I was unable to create one for myself!
After checking modules with module avail, I decided to install fastqc, cutadapt, multiqc, and trimgalore. To do this, I figured it would be easiest to install miniconda in my home directory, then install the remaining packages. I downloaded the Python 3.9 Linux 64-bit installer locally, then used rsync to move the file to Poseidon. I followed these instructions to install the program, then used conda activate to initialize the installation.
I then configured my environment by following these instructions from bioconda:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
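Each conda config --add call puts the new channel at the top of the list, so the order of the three commands sets the channel priority. A minimal sketch for checking the result follows; the function name show_channels is my own placeholder, not part of conda.

```shell
# Print the configured channel list, highest priority first.
# show_channels is a hypothetical helper name used for illustration.
show_channels() {
    if command -v conda >/dev/null 2>&1; then
        conda config --show channels
    else
        echo "conda not found on PATH" >&2
        return 1
    fi
}

# Example (after the three "conda config --add channels" commands):
# show_channels
```

Assuming no other channels were configured beforehand, the list should read conda-forge, bioconda, defaults from top to bottom.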
I followed these instructions for cutadapt installation:
conda create -n cutadaptenv cutadapt #Create a new environment to install cutadapt
conda activate cutadaptenv #Needs to be done every time I open a new shell before I use cutadapt
cutadapt --version #Check that the installation completed
I used these instructions for multiqc installation:
conda install -c bioconda -c conda-forge multiqc
After I did this, I realized I could have just done the same thing with cutadapt, so I wouldn’t need to activate a separate environment every time I wanted to use the program. I used the same method for fastqc installation instead of downloading the zip file from the FastQC website and unzipping it for use:
conda install -c bioconda -c conda-forge cutadapt #Wanted to not have a separate environment
conda install -c bioconda -c conda-forge fastqc #Worked without the zip file!!
Finally, I installed trimgalore based on these GitHub instructions:
curl -fsSL https://github.com/FelixKrueger/TrimGalore/archive/0.6.6.tar.gz -o trim_galore.tar.gz
tar xvzf trim_galore.tar.gz
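Trim Galore is a wrapper around cutadapt and FastQC, so it only works if both are reachable on PATH. Here is a rough sketch of a pre-flight check; check_tools is a name I made up for illustration, not something that ships with any of these programs.

```shell
# Report which of the named programs are missing from PATH.
# Prints one line per tool and returns non-zero if any are missing.
# check_tools is a hypothetical helper, not part of Trim Galore.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            echo "found: $ct_tool"
        else
            echo "missing: $ct_tool"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}

# Example: confirm the two programs trim_galore relies on are installed.
# check_tools cutadapt fastqc
```

If both tools are found, the extracted trim_galore script (in the TrimGalore-0.6.6 directory created by tar) should be ready to run.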
Creating the script
I modified my script based on my previous one from my Manchester repository. I reviewed the script and ensured that I had the correct output directory for my work, the directory with the raw data, program paths, and Python module. I saved my script here in my new project repository.
When I ran the script, I got the following error:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
The account error made me think that it may be related to my lack of a scratch directory, so I created a ticket with WHOI IS for these issues.
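For an "Invalid account or account/partition combination" error, one thing worth checking is which account and partition pairs SLURM has on record for the user. The sketch below assumes sacctmgr is available on the login node; show_slurm_access is a hypothetical wrapper name, and this is not necessarily how WHOI IS diagnosed the problem.

```shell
# List the account/partition associations SLURM has on record for a user.
# An empty table suggests the account has not been enabled for job submission.
# show_slurm_access is a hypothetical helper name used for illustration.
show_slurm_access() {
    if ! command -v sacctmgr >/dev/null 2>&1; then
        echo "sacctmgr not found; run this on a cluster login node" >&2
        return 1
    fi
    # Defaults to the current user when no username is given
    sacctmgr show associations user="${1:-$USER}" format=Account,Partition,User
}

# Example (on the cluster):
# show_slurm_access
# sinfo -s    # summary of the partitions that exist
```

Any account and partition listed there can then be named explicitly in the script with the sbatch --account and --partition options.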
Running fastqc and multiqc
Turns out having a Poseidon account isn’t enough to run scripts: IS had to activate account permissions for me so I could submit scripts to SLURM. I then ran into a few problems with the script before realizing I had removed the fastqc and multiqc program paths. Once I added them back in, I was able to run fastqc on all samples. However, the script failed before creating checksums! This meant that multiqc didn’t run either. I manually created checksums for the fastqc output and ran multiqc on the samples.
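The manual checksum step can be sketched roughly as follows. This assumes MD5 checksums via md5sum and uses checksums.md5 as a placeholder file name; the function names are hypothetical too, and my actual commands may have differed.

```shell
# Write MD5 checksums for every FastQC output file in a directory.
# checksums.md5 is a placeholder name; the function names are hypothetical.
write_checksums() {
    (
        cd "$1" || exit 1
        # FastQC writes one .html report and one .zip archive per input file
        md5sum -- *_fastqc.html *_fastqc.zip > checksums.md5
    )
}

# Re-check the files against the recorded checksums.
# Returns non-zero if any file is missing or has changed.
verify_checksums() {
    ( cd "$1" && md5sum -c checksums.md5 )
}

# Example:
# write_checksums /path/to/fastqc_output
# verify_checksums /path/to/fastqc_output
```

Running multiqc on the same directory afterwards (multiqc followed by the directory path) collects the FastQC reports into a single summary.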
Now came the really annoying part…transferring files off of Poseidon. I tried using rsync from my computer to move files off of Poseidon, but I kept getting the error that zsh did not recognize yaamini.venkataraman@poseidon:/path/. I moved all my files to my home directory, then mounted it on my computer, and used rsync to transfer files that way. All output can be found in this subdirectory.
Going forward
- Trim adapters with TrimGalore! and re-run FastQC
- Start alignment with Bisulfite Analysis Toolkit