WGBS Analysis Part 23

methylKit with R Studio server on mox

TL;DR I can run calculateDiffMeth and get DML! Please enjoy this saga of troubleshooting and not reading errors properly.

Alas, after running for five hours, my R SLURM script failed (cries). When I looked at the SLURM output, I saw the following error:

Screen Shot 2021-04-27 at 9 33 51 AM

calculateDiffMeth was giving me problems again! I posted this discussion to get input from Sam and Steven about what to do next.

Revisiting the build node

In the meantime, I decided to revisit the build node. I requested a build node and loaded R:

srun -p build --time=4:00:00 --mem=10G --pty /bin/bash #Request a build node for four hours
module load r_3.6.0 #Load R version 3.6.0
R #Start R

I loaded my R data and ran calculateDiffMeth without a covariate matrix or overdispersion correction. After it finished running, I ran warnings(). They were all glm.fit warnings, which I’ve encountered before. I was able to produce a methylDiff object, so that’s a good sign!

Screen Shot 2021-04-27 at 1 00 47 PM

Screen Shot 2021-04-27 at 1 01 23 PM

I then tried running calculateDiffMeth with the covariate matrix and overdispersion correction. Around the four-hour mark, my command timed out. I initially thought it was related to calculateDiffMeth, but then I realized it was probably because the four-hour reservation ended! In any case, it was time for me to move on to a different option: R Studio server.
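For anyone else caught out by a reservation ending mid-command, here is a minimal sketch of checking how much time is left on an interactive node before starting something long. It assumes the session was started with srun (so SLURM sets SLURM_JOB_ID) and that squeue is on the PATH. The helper name slurm_time_to_seconds is made up for illustration and is not part of SLURM.

```shell
# Hypothetical helper: convert a SLURM time string ([D-]HH:MM:SS or MM:SS) to seconds.
slurm_time_to_seconds() {
    echo "$1" | awk -F'[-:]' '
        NF == 4 { print $1*86400 + $2*3600 + $3*60 + $4; next }  # D-HH:MM:SS
        NF == 3 { print $1*3600 + $2*60 + $3; next }             # HH:MM:SS
        NF == 2 { print $1*60 + $2; next }                       # MM:SS
        { exit 1 }                                               # anything else
    '
}

# Only query SLURM when we are actually inside a job and squeue is available.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v squeue >/dev/null 2>&1; then
    left=$(squeue -h -j "$SLURM_JOB_ID" -o "%L")   # %L = time left for the job
    echo "Seconds left on this node: $(slurm_time_to_seconds "$left")"
fi
```

If the number printed is smaller than the expected run time of the command, it is probably safer to request a longer reservation first.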

Troubleshooting R Studio server

When I started running calculateDiffMeth on the build node, I also started working through the R Studio server Sam set up on mox. I followed the instructions he laid out in this discussion. To run R Studio server, I needed to change my R library directory information in ~/.Renviron, then run a SLURM script to get R Studio login credentials. I could then tunnel into mox in a separate Terminal window to run R Studio.

When I went to change my ~/.Renviron library location, I realized I didn’t have a ~/.Renviron file to begin with! Sam said I should make one, so I did.

Screen Shot 2021-04-28 at 1 28 24 PM
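A rough sketch of creating a ~/.Renviron from the command line is below. It assumes the library location is set with R_LIBS_USER, which is a standard R environment variable but may not be the exact variable used in Sam’s instructions. The function name set_r_libs_user and any library path passed to it are placeholders.

```shell
# Hypothetical helper: add an R_LIBS_USER line to an .Renviron file
# unless one is already there.
# Usage: set_r_libs_user <renviron-file> <library-dir>
set_r_libs_user() {
    renviron_file=$1
    lib_dir=$2
    mkdir -p "$lib_dir"    # make sure the library directory exists
    if ! grep -q '^R_LIBS_USER=' "$renviron_file" 2>/dev/null; then
        printf 'R_LIBS_USER=%s\n' "$lib_dir" >> "$renviron_file"
    fi
}
```

For example, set_r_libs_user ~/.Renviron /path/to/R/library (with a real path substituted in) would create the file if it is missing. After restarting R, .libPaths() should list that directory.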

I created this script to get R Studio login credentials. After running it, I was unable to tunnel into mox and access the R Studio server! I posted this discussion with the specific “connection refused” error I got:

Screen Shot 2021-04-27 at 1 02 18 PM

Turns out I misinterpreted Sam’s original script, and didn’t include any of the important things that started the R Studio server session! I added the script sections I needed to, then ran it again and was able to access R Studio server. I found my R Script and started running calculateDiffMeth. According to Sam, the session should still run even if I close the window.
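For reference, the tunneling step generally looks like an ssh port forward. The sketch below only prints the command rather than running it, and every value in it (ports, node name, login host) is a placeholder, not the real mox setup. The actual values come from the output of the SLURM script.

```shell
# Hypothetical helper: print the ssh command that forwards a local port
# to the compute node running R Studio server.
# Usage: build_tunnel_cmd <local-port> <node> <remote-port> <user@login-host>
build_tunnel_cmd() {
    local_port=$1
    node=$2
    remote_port=$3
    login=$4
    # -N: no remote command, just forwarding; -L: local port forward
    printf 'ssh -N -L %s:%s:%s %s\n' "$local_port" "$node" "$remote_port" "$login"
}

# Placeholder values for illustration only.
build_tunnel_cmd 8787 n0000 8787 user@login.example.edu
```

With the tunnel open, the server would be reachable at localhost on the local port in a browser. A “connection refused” error usually means nothing is listening on that node and port, which fits a script that never started the server.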

Later at night, I decided to test that theory. When I closed the R Studio server browser on my local machine, I got an error in my Terminal window. When I logged back into R Studio server on genefish, I saw the following error:

Screen Shot 2021-04-27 at 10 34 46 PM

It seemed like calculateDiffMeth finished running, but there was an “error writing to connection” that I didn’t understand. I set up calculateDiffMeth to run through the night. I saw that I was logged out after 60 minutes due to inactivity. When I logged back in, I saw a slew of error messages:

Screen Shot 2021-04-28 at 11 26 07 AM

  1. There were 50 or more warnings (use warnings() to see the first 50): I checked warnings(), and they’re glm.fit warnings (not related to the issue at hand). At least I know calculateDiffMeth ran!

Screen Shot 2021-04-28 at 11 26 30 AM

  2. Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'head': object 'differentialMethylationStatsTreatment' not found: My fault, I didn’t reference the correct data frame. When I referenced the correct calculateDiffMeth output, the command worked!

  3. The same “error writing to connection” error! However, my calculateDiffMeth command DID finish running, and my global environment was loaded again when I logged back into the R Studio server.

Sam investigated it further and saw that my home directory may be full, leading to errors saving any R Studio output, like ~/.rstudio or ~/.RData, in that directory. I was saving my R data in /gscratch/scrubbed/yaaminiv/Manchester/analyses/methylKit, but it’s possible that the large methylKit objects were resulting in a ~/.rstudio file. Interestingly, my global environment was loading each time I logged into the R Studio server. I checked my login node quota and ruled out that it was a storage issue on my end! Since the R Studio server was still working for me and I was able to save output to my scrubbed directory, I kept going.
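A quick way to sanity-check this kind of storage question is to look at how much space the R Studio files take up in the home directory. The sketch below uses only du and df. The helper name report_usage is made up, and it does not replace whatever quota command the cluster provides.

```shell
# Hypothetical helper: print the size of each path that exists,
# and flag the ones that do not.
report_usage() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            du -sh "$path"
        else
            echo "missing: $path"
        fi
    done
}

# Files R Studio may write to the home directory
report_usage "$HOME/.rstudio" "$HOME/.RData"

# Overall usage of the filesystem that holds the home directory
df -h "$HOME"
```

If ~/.rstudio turns out to be large compared with the space available, that would support the full-home-directory explanation. Small numbers here point elsewhere.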

The bottom line

The forced termination likely happened because the build node timed out: I had only requested it for four hours. The head command error was because I was calling the wrong data frame. Thanks to R Studio server, I could confirm that this was the case and ensure that the warnings were not devastating! I updated my “connection refused” discussion with explanations of my error messages, and posted an update in my original discussion about calculateDiffMeth.

But the best part was that I ran getMethylDiff AND I HAVE DML! I’ll post about that in a separate notebook entry.

So really, it was a typo all along.

Going forward

  1. Finish running methylKit on R Studio server
  2. Write methods
  3. Write results
  4. Update mox handbook with R information
  5. Obtain relatedness matrix and SNPs with EpiDiverse/snp
  6. Identify genomic location of DML
  7. Determine if RNA should be extracted
  8. Determine if larval DNA/RNA should be extracted
Written on April 27, 2021