WGBS Analysis Part 22
Running R scripts on mox
(for real this time)
So, clearly things aren’t going well. I tried running an R script on mox, but landed in a seemingly endless loop of installing dependencies, running the script, having it fail, and trying to install yet another dependency. My theory was that the mechanism of loading R packages in a SLURM script is different than loading packages in an R module. Time to test it out.
Sanity check with the build node
Since I was able to load packages in the build node, I thought I would see if I could run part of my code interactively as a sanity check. First, I opened a build node for four hours:
srun -p build --time=4:00:00 --mem=10G --pty /bin/bash
module load r_3.6.0 #Load R module
R #Open R
Then, I loaded the devtools, methylKit, and dplyr packages, confirmed the packages were loaded, and set my working directory to the folder with my bismark output:
require(devtools)
require(methylKit)
require(dplyr)
sessionInfo() #Confirm packages are loaded
getwd() #Confirm I am in my home directory
setwd("/gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit") #Change directory to where bismark output is
I then started running code to confirm that methylKit and dplyr R commands would work as long as the packages were loaded. I was able to quickly read files into R with methRead:
I then successfully ran the following code chunk to process bismark alignments and normalize coverage between samples!
processedFilteredFilesCov5 <- methylKit::filterByCoverage(processedFiles,
                                                          lo.count = 5, lo.perc = NULL,
                                                          high.count = NULL, high.perc = 99.9) %>%
  methylKit::normalizeCoverage(.)
At this point, I saved my R data and knew that as long as I could reference my packages correctly, I could run my code.
Calling an R Script in a SLURM script
Clearly calling an R module worked better than changing the shebang and running an R script directly on mox! I wanted to try another method: calling an R script within a SLURM script. First, I needed to put my R code in a separate script. I copied and pasted my code and created this R script. Based on the hyak documentation, I needed to create a SLURM script to call the R script. For this SLURM script, I requested a 10-day walltime and a node with 100 GB of memory. Hopefully I won’t need more than that! Within the script, I needed two lines of code:
module load r_3.6.0 #Load R version 3.6.0
Rscript /gscratch/home/yaaminiv/06-methylKit.R > output.txt 2>&1 #Run my R script (/gscratch/home/yaaminiv/06-methylKit.R) and write both standard output and standard error to output.txt
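For reference, here is a minimal sketch of what a complete SLURM script around those two lines could look like, written as a small shell helper that generates the file. The helper name write_methylkit_job, the job name, and the account and partition arguments are my own placeholders, not values from my actual script; the walltime, memory, module name, and R script path are the ones described above.

```shell
#!/bin/bash
# Sketch: write a SLURM batch script that loads the R module and runs an R script.
# The account and partition are hypothetical arguments; substitute your group's values.

write_methylkit_job() {
  job_file="$1"        # where to write the batch script
  job_account="$2"     # hypothetical: SLURM account to charge
  job_partition="$3"   # hypothetical: SLURM partition to run on

  cat > "$job_file" <<EOF
#!/bin/bash
#SBATCH --job-name=methylKit
#SBATCH --account=${job_account}
#SBATCH --partition=${job_partition}
#SBATCH --nodes=1
#SBATCH --time=10-00:00:00
#SBATCH --mem=100G

# Load R version 3.6.0
module load r_3.6.0
# Run the R script; send standard output and standard error to output.txt
Rscript /gscratch/home/yaaminiv/06-methylKit.R > output.txt 2>&1
EOF
}
```

Calling write_methylkit_job 06-methylKit.sh with an account and partition writes the file, which could then be submitted with sbatch 06-methylKit.sh.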
I then ran my SLURM script. It ended after 20 minutes (so…a bit longer than the 18-minute run I was used to previously!) due to more package struggles. Thankfully they were different package problems than before! I was unable to load devtools or dplyr because R could not find the correct versions of dependency packages. However, methylKit loaded with no issues:
I fiddled with how I loaded packages in my R script and decided to run require(devtools) with no lib.loc argument. When I loaded packages in the build node, I never specified where the packages were found and did not encounter any errors. I also tried require(tidyverse, lib.loc = "/gscratch/srlab/rpackages") to see if loading tidyverse would bypass the issues I had loading dplyr. I was able to load devtools with no problems, but still ran into issues with dplyr and tidyverse!
Since devtools loaded without specifying a library location, I figured I could do the same for dplyr. The final configuration of loading R packages that worked went as follows:
require(devtools) #Load devtools
require(methylKit, lib.loc = "/gscratch/home/yaaminiv/R/x86_64-pc-linux-gnu-library/3.6/") #Load methylKit. I was able to load with no issues including library location, so I didn't change it
require(dplyr) #Load dplyr
sessionInfo() #Confirm packages are loaded
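One way to test the theory that R finds packages differently in a SLURM job than in an interactive session is to print the library search path and the location of each package from the command line, then compare the output from the build node with the output from a batch job. This is only a sketch: report_r_packages is a helper name I made up, and it assumes Rscript is on the PATH (for example after module load r_3.6.0).

```shell
#!/bin/bash
# Sketch: show which library trees R searches and where it finds each package.
# report_r_packages is a hypothetical helper, not part of R or mox.

report_r_packages() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found; load the R module first" >&2
    return 1
  fi
  # Package names are passed through to R as trailing command-line arguments
  Rscript -e '
    cat("Library search path:\n")
    cat(.libPaths(), sep = "\n")
    for (pkg in commandArgs(trailingOnly = TRUE)) {
      # find.package() returns an empty result when quiet = TRUE and the package is missing
      cat(pkg, ":", find.package(pkg, quiet = TRUE), "\n")
    }
  ' "$@"
}
```

Running report_r_packages devtools methylKit dplyr in both settings should make any difference in library locations visible before committing to a long job.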
At this point, my script truly ran…for 30 minutes! I didn’t properly reference my covariate matrix in my calculateDiffMeth command, but once I did that the command ran without any issues (so far). Guess we’ll wait and see if I can indeed run calculateDiffMeth with a covariate matrix and overdispersion correction on mox!
Going forward
- Write methods
- Write results
- Update mox handbook with R information
- Obtain relatedness matrix and SNPs with EpiDiverse/snp
- Identify genomic location of DML
- Determine if RNA should be extracted
- Determine if larval DNA/RNA should be extracted