WGBS Analysis Part 21
Running R scripts on
Alright, I have R installed, which is maybe a moot point but I couldn’t get
methylKit installed. Let’s see if I can actually run an R SLURM script today.
Installing packages (round 2)
To install methylKit, I decided to use an older version of R. I first loaded the module:
module load r_3.6.0 #Load R version 3.6.0
R #Start running R
Once I had the older R version, I was able to run the installation:
install.packages("devtools", lib = "/gscratch/srlab/rpackages") #Install devtools to the specified folder
require(devtools) #Load devtools
My next step was installing Bioconductor. I followed the installation instructions from the Bioconductor website:
install.packages("BiocManager", lib = "/gscratch/srlab/rpackages") #Install BiocManager to the specified folder
BiocManager::install(version = "3.10") #Install the Bioconductor version that matches the R version used
Turns out there are specific Bioconductor versions for each R version! I used this Bioconductor release guide to determine which version I needed to install. Since I was using R 3.6.0, I could use Bioconductor version 3.9 or 3.10. I figured I’d use 3.10.
Finally, I installed methylKit:
BiocManager::install("methylKit") #Install methylKit
The package started installing! However, I got a warning that I was using too much of the CPU. That’s when I realized I wasn’t on a build node! I stopped the package installation, quit R, and interrupted my mox session. I then started a build node:
srun -p build --time=4:00:00 --mem=10G --pty /bin/bash #Request a build node for four hours
I loaded the R module again, then installed methylKit:
require(BiocManager) #Load package
BiocManager::install("methylKit") #Install methylKit
require(methylKit) #Load package
It worked! The last package I needed (and almost forgot about) was dplyr. I ran require(dplyr) just to see what happened:
The package was already installed! I closed the Terminal window, logged in and requested another build node, and ran require(methylKit) to ensure I wouldn’t have to install the package again in my SLURM script:
Since that worked too, I tried running sessionInfo(). Hopefully this information would be saved into my slurm-out file.
I exited R and my build node to finish up my preparation.
File paths on
When working in RStudio, it’s a lot easier for me to save files to various places, or source the data from a different folder, since I can set the working directory in a chunk. For the purpose of the R SLURM script, I think it’s easier to have all the data and output files in the same folder. I created a /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit folder to house all relevant files. Then, I navigated to that folder and copied the merged CpG coverage files from
rsync --archive --progress --verbose firstname.lastname@example.org:/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/*merged_CpG_evidence.cov .
The next thing I wanted to do was create a subdirectory structure that mirrored where I saved output files in this R Markdown script. I usually do this within the script itself since I can switch between bash and R, but I will not be able to do that in a SLURM script. I created:
- /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/general-stats for individual-sample and comparative analysis plots
- /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/DML for DML lists
- /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/rand-test for randomization test output
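Since the SLURM script can’t switch to bash partway through, the three output folders listed above could be created in one step before submitting the job. This is only a minimal sketch: the helper name `make_methylkit_dirs` is hypothetical, and the base folder is passed in as an argument rather than hard-coded.

```shell
#!/bin/bash
# Sketch: create the output subdirectories the R script writes to.
# make_methylkit_dirs is a hypothetical helper name, not part of any tool.
make_methylkit_dirs() {
  local base="$1"
  # -p creates parent folders as needed and does not fail if they already exist
  mkdir -p "${base}/general-stats" "${base}/DML" "${base}/rand-test"
}

# Example call (substitute the methylKit analysis folder):
# make_methylkit_dirs /path/to/analyses/methylKit
```

Because `mkdir -p` is safe to re-run, the same call could sit at the top of a bash wrapper without clobbering existing output.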
Running the R SLURM script
All that was left to do was create the SLURM script! I copied my R Markdown script into this SLURM script. Then, I ran the script. When I checked the queue (squeue | grep "srlab"), I found that my script wasn’t running! When I looked at the SLURM information at the top of the script, I saw #SBATCH --mem=500G. I changed it to #SBATCH --mem=100G, and ran the script again. Unfortunately, it timed out immediately!
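A quick way to double-check the memory request before resubmitting is to pull it out of the script header. The sketch below assumes the directive is written as `#SBATCH --mem=...` at the start of a line; `requested_mem` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the value of the first "#SBATCH --mem=..." line in a job script.
# requested_mem is a hypothetical helper name.
requested_mem() {
  sed -n 's/^#SBATCH[[:space:]]\{1,\}--mem=\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Example call (substitute the SLURM script name):
# requested_mem my-script.R
```

The printed value can then be compared with the memory that the nodes in the partition actually offer, which is likely why a 500G request sat in the queue.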
When I looked at the slurm.out file, I saw the following error:
I then posted in this discussion to see where I should specify --vanilla. Sam responded and said my shebang should be #!/gscratch/srlab/programs/R-3.6.2/bin/Rscript, and not #!/gscratch/srlab/programs/R-3.6.2/bin/R! I changed the shebang and ran the script again.
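The shebang mix-up is easy to miss, so a small pre-submission check could catch it. This is a sketch, not something from the discussion: `check_rscript_shebang` is a hypothetical helper that only looks at the first line of the file and accepts it if it points at an `Rscript` executable.

```shell
#!/bin/bash
# Sketch: confirm a script's first line is an Rscript shebang.
# check_rscript_shebang is a hypothetical helper name.
check_rscript_shebang() {
  local first=""
  # Read only the first line; "|| true" covers files without a trailing newline
  IFS= read -r first < "$1" || true
  case "$first" in
    '#!'*/Rscript | '#!'*/Rscript' '* | '#!'*' Rscript')
      echo "OK: $first" ;;
    *)
      echo "Not an Rscript shebang: $first" >&2
      return 1 ;;
  esac
}

# Example call (substitute the SLURM script name):
# check_rscript_shebang my-script.R
```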
Obviously, my script timed out again. Looking through the slurm.out, I confirmed a few things. First, any head() command does print to the slurm.out. Second, I got an error that dplyr was not available when I ran require(dplyr). Additionally, there were some packages attached to methylKit that didn’t load. I opened another build node to install them:
install.packages("dplyr", lib = "/gscratch/srlab/rpackages") #Install dplyr
require(BiocManager) #Load BiocManager
devtools::install_github("al2na/methylKit", build_vignettes = FALSE, repos = BiocManager::repositories(), dependencies = TRUE) #Install more methylKit options
require(methylKit) #Check that all associated packages load
I then modified the script to load several packages at the top:
# Load packages
require(devtools)
require(BiocManager)
require(methylKit)
require(dplyr)
sessionInfo()
Once I ran this revised script, I ran into the same error! Based on the error messages, I think R was unable to find my specified packages.
I know I installed these packages, so I think they’re not being loaded from their actual location.
dplyr are in the
methylKit is installed in
I posted this discussion to see if there was a way to reference library locations in require(). Why I posted this discussion before actually Googling I don’t know, but Sam and I arrived at the same conclusion: include lib.loc in require() to specify the library location. This is important especially because I have packages installed in two separate locations! I modified my script and ran it again and encountered a new error:
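An alternative to passing lib.loc in every require() call, which is not what I did here, is to tell R about both library folders through the R_LIBS environment variable before the script starts. R reads R_LIBS as a colon-separated list of library folders on startup. In the sketch below, `join_lib_paths` is a hypothetical helper and the second path is a placeholder for the other install location.

```shell
#!/bin/bash
# Sketch: build a colon-separated R_LIBS value from several library folders.
# join_lib_paths is a hypothetical helper name.
join_lib_paths() {
  local IFS=":"
  # "$*" joins all arguments using the first character of IFS
  printf '%s\n' "$*"
}

# The second path is a placeholder for the other library location.
R_LIBS="$(join_lib_paths /gscratch/srlab/rpackages /path/to/second/library)"
export R_LIBS

# To confirm which folders R will search:
# Rscript -e '.libPaths()'
```

Folders that don’t exist are dropped from the search path, so checking .libPaths() is a quick way to confirm both locations were picked up.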
Interestingly, when I loaded packages in the SLURM script, R was unable to find dependencies, even when they were installed (like usethis). I confirmed that these errors were precluding me from loading packages by running
This began a series of installing packages, running my R script, and finding out I needed to explicitly install another dependency:
…so this is where I quit for now.
- Try different methods to run R script on
- Write methods
- Obtain relatedness matrix and SNPs with EpiDiverse/snp
- Write results
- Identify genomic location of DML
- Determine if RNA should be extracted
- Determine if larval DNA/RNA should be extracted