WGBS Analysis Part 21

Running R scripts on mox

Alright, I have R installed, which may be a moot point since I couldn’t get methylKit installed. Let’s see if I can actually run an R SLURM script today.

Installing packages (round 2)

To install methylKit, I decided to use an older version of R. I first loaded the module:

module load r_3.6.0 #Load R version 3.6.0
R #Start running R

Once I had the older R version running, I was able to install devtools:

install.packages("devtools", lib = "/gscratch/srlab/rpackages") #Install devtools to the specified folder
require(devtools) #Load devtools

My next step was installing Bioconductor. I followed the installation instructions from the Bioconductor website:

install.packages("BiocManager", lib = "/gscratch/srlab/rpackages") #Install BiocManager to the specified folder
BiocManager::install(version = "3.10") #Install the Bioconductor release that matches the R version used

Turns out there are specific Bioconductor releases for each R version! I used this Bioconductor release guide to determine which release I needed to install. Since I was using R 3.6.0, I could use Bioconductor 3.9 or 3.10. I figured I’d use 3.10.
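The version matching can be written down as a small lookup. This is only a sketch: the function name bioc_for_r is made up for illustration, and it covers just three rows of the Bioconductor release table (R 3.5, 3.6, and 4.0), so anything else should be checked against the release guide itself.

```shell
# Hypothetical helper: map an R version to the Bioconductor releases built for it.
# Only three rows of the release table are included here.
bioc_for_r() {
  r_minor=$(printf '%s' "$1" | cut -d. -f1,2)  # "3.6.0" -> "3.6"
  case "$r_minor" in
    3.5) echo "3.7 3.8" ;;
    3.6) echo "3.9 3.10" ;;
    4.0) echo "3.11 3.12" ;;
    *)   echo "unknown"; return 1 ;;
  esac
}

bioc_for_r 3.6.0   # prints: 3.9 3.10
```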

Finally, I installed methylKit:

BiocManager::install("methylKit") #Install methylKit

The package started installing! However, I got a warning that I was using too much of the CPU. That’s when I realized I wasn’t on a build node! I stopped the package installation, quit R, and interrupted my mox session. I then started a build node:

srun -p build --time=4:00:00 --mem=10G --pty /bin/bash #Request a build node for four hours

I loaded the R module again, then installed methylKit:

require(BiocManager) #Load package
BiocManager::install("methylKit") #Install methylKit
require(methylKit) #Load package

It worked! The last package I needed (and almost forgot about) was dplyr. I ran require(dplyr) just to see what happened:

Screen Shot 2021-04-21 at 10 34 00 AM

The package was already installed! I closed the Terminal window, logged in and requested another build node, and ran require(methylKit) to ensure I wouldn’t have to install the package again in my SLURM script:

Screen Shot 2021-04-21 at 10 36 27 AM

Since that worked too, I tried running sessionInfo(). Hopefully this information would be saved into my slurm-out file.

Screen Shot 2021-04-21 at 10 37 28 AM

I exited R and my build node to finish up my preparation.

File paths on mox

When working in RStudio, it’s a lot easier for me to save files to various places, or source the data from a different folder, since I can set the working directory in a chunk. For the purpose of the R SLURM script, I think it’s easier to have all the data and output files in the same folder. I created a /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit folder to house all relevant files. Then, I navigated to that folder and copied the merged CpG coverage files from gannet to mox:

rsync --archive --progress --verbose yaamini@172.25.149.226:/Volumes/web/spartina/project-gigas-oa-meth/output/bismark-roslin/*merged_CpG_evidence.cov .

The next thing I wanted to do was create a subdirectory structure that mirrored where I saved output files in this R Markdown script. I usually do this within the script itself since I can switch between bash and R, but I will not be able to do that in a SLURM script. I created:

  • /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/general-stats for individual-sample and comparative analysis plots
  • /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/DML for DML lists
  • /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit/rand-test for randomization test output
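The three subdirectories can be created in one step with mkdir -p. This is a minimal sketch: the function name make_output_dirs is made up for illustration, and the base folder is passed in as an argument rather than hard-coded.

```shell
# Hypothetical helper: create the output subdirectories under a base analysis folder.
# mkdir -p creates missing parent folders and does not fail if a folder already exists.
make_output_dirs() {
  base="$1"
  mkdir -p "$base/general-stats" "$base/DML" "$base/rand-test"
}

# On mox this would be called with the methylKit analysis folder:
# make_output_dirs /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit
```

Because the SLURM script can’t switch to bash partway through, running something like this once before submitting the job keeps the folder setup out of the R code.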

Running the R SLURM script

All that was left to do was create the SLURM script! I copied my R Markdown script into this SLURM script. Then, I ran the script. When I checked the queue (squeue | grep "srlab"), I found that my script wasn’t running! When I looked at the SLURM information at the top of the script, I saw #SBATCH --mem=500G. I changed it to #SBATCH --mem=100G, and ran the script again. Unfortunately, it timed out immediately!

When I looked at the slurm.out file, I saw the following error:

Screen Shot 2021-04-21 at 9 27 29 PM

I then posted in this discussion to see where I should specify --save, --no-save, or --vanilla. Sam responded and said my shebang should be #!/gscratch/srlab/programs/R-3.6.2/bin/Rscript, and not #!/gscratch/srlab/programs/R-3.6.2/bin/R! I changed the shebang and ran the script again.
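A quick check before submitting can catch this kind of shebang mix-up. The sketch below assumes the shebang is an absolute path ending in /Rscript with no extra arguments. The function name uses_rscript_shebang and the script filename in the usage comment are made up for illustration.

```shell
# Hypothetical helper: succeed only if the first line of a script is a shebang
# whose path ends in "/Rscript" (for example .../R-3.6.2/bin/Rscript).
uses_rscript_shebang() {
  first_line=$(head -n 1 "$1")
  case "$first_line" in
    '#!'*/Rscript) return 0 ;;
    *)             return 1 ;;
  esac
}

# Example use (placeholder filename):
# uses_rscript_shebang my-methylkit-script.sh || echo "Shebang does not point at Rscript"
```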

Unsurprisingly, my script timed out again. Looking through the slurm.out, I confirmed a few things. First, any head() command does print to the slurm.out. Second, I got an error that dplyr was not available when I ran require(dplyr). Third, there were some packages attached to methylKit that didn’t load. I opened another build node to install dplyr and reinstall methylKit with its dependencies:

install.packages("dplyr", lib = "/gscratch/srlab/rpackages") #Install dplyr
require(BiocManager) #Load BiocManager
devtools::install_github("al2na/methylKit", build_vignettes = FALSE, repos = BiocManager::repositories(), dependencies = TRUE) #Install methylKit from GitHub along with its dependencies
require(methylKit) #Check that all associated packages load

I then modified the script to load several packages at the top:

# Load packages

require(devtools)
require(BiocManager)
require(methylKit)
require(dplyr)
sessionInfo()

Once I ran this revised script, I ran into the same error! Based on the error messages, I think R was unable to find my specified packages:

Screen Shot 2021-04-22 at 9 43 51 AM

Screen Shot 2021-04-22 at 9 42 41 AM

I know I installed these packages, so I think they’re not being loaded from their actual location. BiocManager, devtools, and dplyr are in the /gscratch/srlab/rpackages/ directory:

Screen Shot 2021-04-22 at 9 47 59 AM

methylKit is installed in /gscratch/home/yaaminiv/R/x86_64-pc-linux-gnu-library/3.6/:

Screen Shot 2021-04-22 at 9 50 32 AM
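Checking by hand which library holds which package can also be scripted. An installed R package is a folder inside a library directory that contains a DESCRIPTION file, so a minimal sketch only needs to look for that file. The function name find_r_package is made up for illustration.

```shell
# Hypothetical helper: print every library directory (the arguments after the
# package name) that contains an installed copy of the package.
# Returns non-zero if the package is in none of them.
find_r_package() {
  pkg="$1"
  shift
  status=1
  for lib in "$@"; do
    if [ -f "$lib/$pkg/DESCRIPTION" ]; then
      echo "$lib"
      status=0
    fi
  done
  return "$status"
}

# On mox, the two library locations from this post would be checked like so:
# find_r_package methylKit /gscratch/srlab/rpackages /gscratch/home/yaaminiv/R/x86_64-pc-linux-gnu-library/3.6
```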

I posted this discussion to see if there was a way to reference library locations in require(). Why I posted this discussion before actually Googling I don’t know, but Sam and I arrived at the same conclusion: include lib.loc in require to specify the library location. This is important especially because I have packages installed in two separate locations! I modified my script and ran it again and encountered a new error:

Screen Shot 2021-04-23 at 10 51 03 AM

Screen Shot 2021-04-23 at 10 51 17 AM
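A different way to point R at both library locations, instead of passing lib.loc in every require() call, would be the R_LIBS environment variable. R reads it at startup as a colon-separated list of library directories. This is a sketch of that alternative, not what I actually ran. It assumes the variable is exported in the shell that submits the job and reaches the job’s environment. The function name join_libs is made up for illustration.

```shell
# Hypothetical helper: join library directories into the colon-separated form
# that R expects in R_LIBS. The subshell keeps the IFS change local.
join_libs() {
  ( IFS=:; printf '%s\n' "$*" )
}

# The two library locations from this post:
R_LIBS=$(join_libs /gscratch/srlab/rpackages /gscratch/home/yaaminiv/R/x86_64-pc-linux-gnu-library/3.6)
export R_LIBS
echo "$R_LIBS"
```

R only adds directories from R_LIBS that exist when it starts, so a typo in either path would silently drop that library from the search path.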

Interestingly, when I loaded packages in the SLURM script, R was unable to find dependencies, even when they were installed (like usethis). I confirmed that these errors were precluding me from loading packages by running sessionInfo():

Screen Shot 2021-04-23 at 10 58 34 AM

This began a series of installing packages, running my R script, and finding out I needed to explicitly install another dependency:

Screen Shot 2021-04-23 at 11 31 20 AM

Screen Shot 2021-04-23 at 11 32 12 AM

Screen Shot 2021-04-26 at 9 30 36 AM

Screen Shot 2021-04-26 at 9 30 59 AM

…so this is where I quit for now.

Going forward

  1. Try different methods to run R script on mox
  2. Write methods
  3. Obtain relatedness matrix and SNPs with EpiDiverse/snp
  4. Write results
  5. Identify genomic location of DML
  6. Determine if RNA should be extracted
  7. Determine if larval DNA/RNA should be extracted
Written on April 20, 2021