WGBS Analysis Part 21
Running R scripts on mox
Alright, I have R installed, which may be a moot point since I couldn’t get methylKit installed. Let’s see if I can actually run an R SLURM script today.
Installing packages (round 2)
To install methylKit, I decided to use an older version of R. I first loaded the module:
module load r_3.6.0 #Load R version 3.6.0
R #Start running R
Once I had the older R version, I was able to install devtools:
install.packages("devtools", lib = "/gscratch/srlab/rpackages") #Install devtools to the specified folder
require(devtools) #Load devtools
My next step was installing Bioconductor. I followed the installation instructions from the Bioconductor website:
install.packages("BiocManager", lib = "/gscratch/srlab/rpackages") #Install BiocManager to the specified folder
BiocManager::install(version = "3.10") #Install Bioconductor release 3.10, which is compatible with R 3.6
Turns out each Bioconductor release is tied to a specific R version! I used this Bioconductor release guide to determine which release I needed to install with BiocManager. Since I was using R 3.6.0, I could use Bioconductor 3.9 or 3.10. I figured I’d use 3.10.
Finally, I installed methylKit:
BiocManager::install("methylKit") #Install methylKit
The package started installing! However, I got a warning that I was using too much of the CPU. That’s when I realized I wasn’t on a build node! I stopped the package installation, quit R, and interrupted my mox session. I then started a build node:
srun -p build --time=4:00:00 --mem=10G --pty /bin/bash #Request a build node for four hours
I loaded the R module again, then installed methylKit:
require(BiocManager) #Load package
BiocManager::install("methylKit") #Install methylKit
require(methylKit) #Load package
It worked! The last package I needed (and almost forgot about) was dplyr. I ran require(dplyr) just to see what happened:
The package was already installed! I closed the Terminal window, logged in and requested another build node, and ran require(methylKit) to ensure I wouldn’t have to install the package again in my SLURM script:
Since that worked too, I tried running sessionInfo(). Hopefully this information would be saved into my slurm-out file.
I exited R and my build node to finish up my preparation.
File paths on mox
When working in RStudio, it’s a lot easier for me to save files to various places, or source the data from a different folder, since I can set the working directory in a chunk. For the purpose of the R SLURM script, I think it’s easier to have all the data and output files in the same folder. I created a /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit folder to house all relevant files. Then, I navigated to that folder and copied the merged CpG coverage files from gannet to mox:
rsync --archive --progress --verbose yaamini@172.25.149.226:/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/*merged_CpG_evidence.cov .
The next thing I wanted to do was create a subdirectory structure that mirrored where I saved output files in this R Markdown script. I usually do this within the script itself since I can switch between bash and R, but I will not be able to do that in a SLURM script. I created:
- /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/general-stats for individual-sample and comparative analysis plots
- /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/DML for DML lists
- /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/rand-test for randomization test output
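Since a SLURM script can’t switch between bash and R the way an R Markdown script can, the output folders have to exist before the job starts. Here is a minimal sketch of how the three subdirectories could be created in one step. The function name make_methylkit_dirs is hypothetical (it is not from my original workflow), and it assumes the base folder is passed as the first argument.

```shell
# Hypothetical helper: create the three output subdirectories under a base folder.
# mkdir -p also creates the base folder if it does not exist yet.
make_methylkit_dirs() {
  base_dir=$1
  for sub in general-stats DML rand-test; do
    mkdir -p "$base_dir/$sub" || return 1
  done
}
```

On mox, this would be called as make_methylkit_dirs /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit before submitting the job.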
Running the R SLURM script
All that was left to do was create the SLURM script! I copied my R Markdown script into this SLURM script. Then, I ran the script. When I checked the queue (squeue | grep "srlab"), I found that my script wasn’t running! When I looked at the SLURM information at the top of the script, I saw #SBATCH --mem=500G. I changed it to #SBATCH --mem=100G, and ran the script again. Unfortunately, it timed out immediately!
When I looked at the slurm.out file, I saw the following error:
I then posted in this discussion to see where I should specify --save, --no-save, or --vanilla. Sam responded and said my shebang should be #!/gscratch/srlab/programs/R-3.6.2/bin/Rscript, and not #!/gscratch/srlab/programs/R-3.6.2/bin/R! I changed the shebang and ran the script again.
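Because the fix came down to the first line of the file, a quick check before submitting could catch the same mistake. This is only a sketch: the function name uses_rscript_shebang is hypothetical, and it only confirms that the shebang ends in Rscript (optionally followed by arguments), not that the interpreter actually exists at that path.

```shell
# Hypothetical helper (not from the original post): succeed only if the
# first line of a script is a shebang whose interpreter is Rscript.
uses_rscript_shebang() {
  first_line=$(head -n 1 "$1")
  case "$first_line" in
    '#!'*/Rscript | '#!'*/Rscript' '*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use, with a placeholder file name:
#   uses_rscript_shebang my-script.sh || echo "Shebang does not point to Rscript"
```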
Obviously, my script timed out again. Looking through the slurm.out, I confirmed a few things. First, any head() command does print to the slurm.out. Second, I got an error that dplyr was not available when I ran require(dplyr). Additionally, there were some packages attached to methylKit that didn’t load. I opened another build node to install dplyr:
install.packages("dplyr", lib = "/gscratch/srlab/rpackages") #Install dplyr to the specified folder
require(devtools) #Load devtools, which provides install_github
require(BiocManager) #Load BiocManager
install_github("al2na/methylKit", build_vignettes = FALSE, repos = BiocManager::repositories(), dependencies = TRUE) #Install methylKit from GitHub along with its dependencies
require(methylKit) #Check that all associated packages load
I then modified the script to load several packages at the top:
# Load packages
require(devtools)
require(BiocManager)
require(methylKit)
require(dplyr)
sessionInfo()
Once I ran this revised script, I ran into the same error! Based on the error messages, I think R was unable to find my specified packages.
I know I installed these packages, so I think they’re not being loaded from their actual location. BiocManager, devtools, and dplyr are in the /gscratch/srlab/rpackages/ directory:
methylKit is installed in /gscratch/home/yaaminiv/R/x86_64-pc-linux-gnu-library/3.6/:
I posted this discussion to see if there was a way to reference library locations in require(). Why I posted this discussion before actually Googling I don’t know, but Sam and I arrived at the same conclusion: include lib.loc in require() to specify the library location. This is especially important because I have packages installed in two separate locations! I modified my script and ran it again, and encountered a new error:
Interestingly, when I loaded packages in the SLURM script, R was unable to find dependencies, even when they were installed (like usethis). I confirmed that these errors were precluding me from loading packages by running sessionInfo():
This began a series of installing packages, running my R script, and finding out I needed to explicitly install another dependency:
…so this is where I quit for now.
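One alternative I have not tested on mox: instead of passing lib.loc in every require() call, R can be told about both library folders through the R_LIBS environment variable, which takes a colon-separated list of directories that R adds to its library search path at startup. The sketch below uses the two locations mentioned above. It assumes the variable is exported in the shell before the job is submitted and that the job inherits that environment, which I have not confirmed.

```shell
# The two library locations described in this post
SHARED_LIB="/gscratch/srlab/rpackages"
HOME_LIB="/gscratch/home/yaaminiv/R/x86_64-pc-linux-gnu-library/3.6"

# R_LIBS is a colon-separated list of library directories.
# Assumption: the SLURM job sees this variable when the script runs.
export R_LIBS="${SHARED_LIB}:${HOME_LIB}"

echo "R_LIBS=${R_LIBS}"
```

If this works, running .libPaths() in R should list both folders, and dependencies like usethis may resolve without being named explicitly.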
Going forward
- Try different methods to run R script on mox
- Write methods
- Obtain relatedness matrix and SNPs with EpiDiverse/snp
- Write results
- Identify genomic location of DML
- Determine if RNA should be extracted
- Determine if larval DNA/RNA should be extracted