Hawaii Gigas Methylation Analysis Part 13
Comparative analysis with methylKit
It’s time to (finally) run methylKit
with the Hawaii samples!
Setting up R Studio server
I opened my R Markdown script and edited it based on my experiences with the Manchester samples. First, I removed the randomization test code, since I decided to run it separately without the R Studio server. I also updated my code to reshuffle treatments based on my new understanding of reorganize
.
Once I had my code updated, I created this SLURM script to generate the R Studio session. Sam suggested I update the code that calls the Singularity container to the following to remove any timeouts, automatic sign-outs, and keep me signed in for 30 days:
# This example bind mounts the /gscratch/scrubbed/${USER} directory on the host into the Singularity container.
# By default the only host file systems mounted within the container are $HOME, /tmp, /proc, /sys, and /dev.
singularity exec \
--bind="$TMPDIR/var/lib:/var/lib/rstudio-server" \
--bind="$TMPDIR/var/run:/var/run/rstudio-server" \
--bind="$TMPDIR/tmp:/tmp" \
--bind=/gscratch/scrubbed/${USER} \
${container_path}/${container} \
rserver --www-port ${PORT} --auth-none=0 --auth-pam-helper-path=pam-helper --auth-stay-signed-in-days=30 --auth-timeout-minutes=0
I updated that code chunk, copied the coverage files from gannet
to mox
, and ran the SLURM script! I immediately encountered an error with the R Studio tunnel:
I realized there was an error in my SLURM output that killed the script!
I commented on this discussion to see if Sam could help. When I compared my Manchester R Studio SLURM script with this one, I realized that I was using {container}
: a variable that did not exist in my script! I replaced ${container_path}/${container}
with my actual container path and successfully established the tunnel.
Setting root directory
With my R markdown script on the R Studio server, I started with the processing the coverage files. When I tried running methRead
, I realized my root directory was set to be where my R Markdown file was located: my home directory/login node. Since I was using an R Markdown file, I couldn’t just use setwd()
to change my working directory to /gscratch/scrubbed/yaaminiv/Hawes/analyses/methylKit
. After a quick Google Search, I learned I could change the root directory in my knitr
set up chunk:
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = "/gscratch/scrubbed/yaaminiv/Hawes/analyses/methylKit/") #Set root directory
Once I had the root directory set, I could process the coverage file with methRead
. This change also made my downstream code much cleaner, since I didn’t need a /gscratch/scrubbed/yaaminiv/Hawes/analyses/methylKit
prefix before each file path!
Sample-specific descriptive statistics and comparative analysis
Before uniting the data, I examined percent CpG coverage and methylation in each sample (figures here). There are more differences in coverage between samples, but I didn’t see any samples that looked wildly different. Methylation distribution looked uniform for all samples.
Next, I united the data and created a correlation plot, clustering diagram, and PCA.
Figure 1. Correlation plot
Figure 2. Clustering diagram
Figure 3. PCA
Correlations between samples were > 86% across the board, even between diploid and triploid oysters, and there wasn’t a clear separation of samples by pH treatment or ploidy when looking at the clustering diagram. There was a fair amount of chaining on the clustering diagram, but 3H-2 separated out early from the rest of the samples. When I looked at the PCA, 3H-2 looked like an outlier! 2H-3, 3H-3, and 2H-1 were also farther apart from the tight cluster of other samples. 3H-2 and 3H-3 separated from the main cluster along PC1, while 2H-1 and 2H-3 separated along PC2. I wonder if I can expect more variability in oysters from the high pH (control) treatment because there wasn’t a stressor present, but if that variation could be different depending on ploidy status. Each ploidy x pH combination had six samples, so a third of the samples from 2H and 3H seeming like outliers is interesting! I reviewed my bismark
output to see if there were any mapping variables that potentially impacted the PCA. I noticed that 2H-1 had the lowest % mCpG, and 3H-2 and 3H-3 had similar % mCpG. However, I couldn’t pick out any differences in M C’s, % duplicates, or % aligned.
Obtaining DML (and more R Studio server issues)
I ran the following code chunk to calculate methylation differences at each locus…
differentialMethylationStatsPloidy <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5,
covariates = covariatepH,
overdispersion = "MN", test = "Chisq") #Calculate differential methylation statistics based on treatment indication from methRead. Include pH as a covariate. Use 40 cores.
head(differentialMethylationStatsPloidy) #Look at differential methylation statistics
and it worked! But then R Studio froze before I could save the data or extract DML information. The saga continues.
I tried closing the SSH tunnel, reopening it, and loading R Studio. I also tried cancelling my SLURM script, running it again, opening a new SSH tunnel, and opening R Studio. Each time I got the same result! My R Markdown script loaded and after a few seconds of not being able to interact with the session, I got the following error:
I’ve tried pressing “Wait”, “Exit Page”, and doing nothing. No matter what, I’m kicked out of the session:
This is similar to something I mentioned with the emu and roadrunner R studio servers in this discussion. So obviously I posted this discussion to record yet another struggle for posterity.
I tried again, and immediately closed my R Markdown script. After the script was closed, I was able to interact with the R Studio session! I deleted the R Markdown script and added it again to mox
with rsync
, and got the same “Page Unresponsive error”. When I created an analagous R script and opened it, I was able to interact with the R Studio session! I think I need a way to clear the R Studio server cache associated with the R Markdown file. Steven mentioned in local R Studio, you can change settings to not open prior scripts. I’ll investigate that option once I figure out if my code (now running on the R Studio server in my regular R script) runs, if I’m able to save my R data, and if I can interact with the server afterwards!
Going forward
- Try EpiDiverse/snp for SNP extraction from WGBS data
- Run
methylKit
script onmox
- Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
- Test-run DSS and ramwas
- Transfer scripts used to a nextflow workflow
- Update methods
- Update results