Hawaii Gigas Methylation Analysis Part 13

Comparative analysis with `methylKit`

It’s time to (finally) run methylKit with the Hawaii samples!

Setting up R Studio server

I opened my R Markdown script and edited it based on my experiences with the Manchester samples. First, I removed the randomization test code, since I decided to run it separately without the R Studio server. I also updated my code to reshuffle treatments based on my new understanding of reorganize.

Once I had my code updated, I created this SLURM script to generate the R Studio session. Sam suggested I update the code that calls the Singularity container to the following to remove any timeouts, automatic sign-outs, and keep me signed in for 30 days:

# This example bind mounts the /gscratch/scrubbed/${USER} directory on the host into the Singularity container.
# By default the only host file systems mounted within the container are $HOME, /tmp, /proc, /sys, and /dev.
singularity exec \
--bind="$TMPDIR/var/lib:/var/lib/rstudio-server" \
--bind="$TMPDIR/var/run:/var/run/rstudio-server" \
--bind="$TMPDIR/tmp:/tmp" \
--bind=/gscratch/scrubbed/${USER} \
${container_path}/${container} \
rserver --www-port ${PORT} --auth-none=0 --auth-pam-helper-path=pam-helper --auth-stay-signed-in-days=30 --auth-timeout-minutes=0

I updated that code chunk, copied the coverage files from gannet to mox, and ran the SLURM script! I immediately encountered an error with the R Studio tunnel:

Screen Shot 2021-05-07 at 11 24 15 AM

I realized there was an error in my SLURM output that killed the script!

Screen Shot 2021-05-07 at 11 23 51 AM

I commented on this discussion to see if Sam could help. When I compared my Manchester R Studio SLURM script with this one, I realized that I was using {container}: a variable that did not exist in my script! I replaced ${container_path}/${container} with my actual container path and successfully established the tunnel.

Setting root directory

With my R markdown script on the R Studio server, I started with the processing the coverage files. When I tried running methRead, I realized my root directory was set to be where my R Markdown file was located: my home directory/login node. Since I was using an R Markdown file, I couldn’t just use setwd() to change my working directory to /gscratch/scrubbed/yaaminiv/Hawes/analyses/methylKit. After a quick Google Search, I learned I could change the root directory in my knitr set up chunk:

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = "/gscratch/scrubbed/yaaminiv/Hawes/analyses/methylKit/") #Set root directory

Once I had the root directory set, I could process the coverage file with methRead. This change also made my downstream code much cleaner, since I didn’t need a /gscratch/scrubbed/yaaminiv/Hawes/analyses/methylKit prefix before each file path!

Sample-specific descriptive statistics and comparative analysis

Before uniting the data, I examined percent CpG coverage and methylation in each sample (figures here). There are more differences in coverage between samples, but I didn’t see any samples that looked wildly different. Methylation distribution looked uniform for all samples.

Next, I united the data and created a correlation plot, clustering diagram, and PCA.

Screen Shot 2021-05-15 at 1 10 16 PM

Figure 1. Correlation plot

Screen Shot 2021-05-15 at 1 10 46 PM

Figure 2. Clustering diagram

Screen Shot 2021-05-15 at 1 10 58 PM

Figure 3. PCA

Correlations between samples were > 86% across the board, even between diploid and triploid oysters, and there wasn’t a clear separation of samples by pH treatment or ploidy when looking at the clustering diagram. There was a fair amount of chaining on the clustering diagram, but 3H-2 separated out early from the rest of the samples. When I looked at the PCA, 3H-2 looked like an outlier! 2H-3, 3H-3, and 2H-1 were also farther apart from the tight cluster of other samples. 3H-2 and 3H-3 separated from the main cluster along PC1, while 2H-1 and 2H-3 separated along PC2. I wonder if I can expect more variability in oysters from the high pH (control) treatment because there wasn’t a stressor present, but if that variation could be different depending on ploidy status. Each ploidy x pH combination had six samples, so a third of the samples from 2H and 3H seeming like outliers is interesting! I reviewed my bismark output to see if there were any mapping variables that potentially impacted the PCA. I noticed that 2H-1 had the lowest % mCpG, and 3H-2 and 3H-3 had similar % mCpG. However, I couldn’t pick out any differences in M C’s, % duplicates, or % aligned.

Obtaining DML (and more R Studio server issues)

I ran the following code chunk to calculate methylation differences at each locus…

differentialMethylationStatsPloidy <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5,
                                                                   covariates = covariatepH,
                                                                   overdispersion = "MN", test = "Chisq") #Calculate differential methylation statistics based on treatment indication from methRead. Include pH as a covariate. Use 40 cores.
head(differentialMethylationStatsPloidy) #Look at differential methylation statistics

and it worked! But then R Studio froze before I could save the data or extract DML information. The saga continues.

I tried closing the SSH tunnel, reopening it, and loading R Studio. I also tried cancelling my SLURM script, running it again, opening a new SSH tunnel, and opening R Studio. Each time I got the same result! My R Markdown script loaded and after a few seconds of not being able to interact with the session, I got the following error:

Screen Shot 2021-05-08 at 1 54 33 PM

I’ve tried pressing “Wait”, “Exit Page”, and doing nothing. No matter what, I’m kicked out of the session:

Screen Shot 2021-05-08 at 1 54 59 PM

This is similar to something I mentioned with the emu and roadrunner R studio servers in this discussion. So obviously I posted this discussion to record yet another struggle for posterity.

I tried again, and immediately closed my R Markdown script. After the script was closed, I was able to interact with the R Studio session! I deleted the R Markdown script and added it again to mox with rsync, and got the same “Page Unresponsive error”. When I created an analagous R script and opened it, I was able to interact with the R Studio session! I think I need a way to clear the R Studio server cache associated with the R Markdown file. Steven mentioned in local R Studio, you can change settings to not open prior scripts. I’ll investigate that option once I figure out if my code (now running on the R Studio server in my regular R script) runs, if I’m able to save my R data, and if I can interact with the server afterwards!

Going forward

Try EpiDiverse/snp for SNP extraction from WGBS data
Run methylKit script on mox
Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
Test-run DSS and ramwas
Transfer scripts used to a nextflow workflow
Update methods
Update results

Written on May 7, 2021