Hawaii Gigas Methylation Analysis Part 14

Messing around with `methylKit` DML settings

The R Studio server works!

Last time I mentioned how the R Studio server wasn’t playing well with my R Markdown script, and that I started running an R script for the analysis instead. Turns out making the switch is what I needed to run my code! When I was working on the Manchester analysis, I did end up using an R script instead of an R Markdown file. This makes me think that there’s some sort of issue using R Markdown on R Studio server, specifically with calculateDiffMeth. Something to investigate at a later date.

Evaluating `methylKit` DML output

After unite, I had 1,984,530 CpG loci with data in all 24 samples. Using this object with calculateDiffMeth (including covariates and an overdispersion correction), I didn’t obtain many ploidy- or pH-DML:

Table 1. Number of DML obtained from various methylation difference thresholds.

Contrast	25%	50%	75%
Ploidy	1	0	0
pH	4	1	0

Table 2. DML obtained at various percent methylation differences for ploidy or pH contrasts.

Contrast	chr	start	end	pvalue	qvalue	meth.diff
ploidy	NC_047566.1	46447078	46447080	8.71E-13	1.62E-06	37.3155448
pH	NC_047561.1	7843128	7843130	5.50E-08	0.0044294	-26.315789
pH	NC_047565.1	4762558	4762560	3.96E-08	0.00364724	-26.731667
pH	NC_047565.1	44578741	44578743	1.23E-07	0.00671761	-26.789645
pH	NC_047567.1	9113140	9113142	9.18E-08	0.00610252	50.1223071
pH	NC_047567.1	9113140	9113142	9.18E-08	0.00610252	50.1223071

I wasn’t expecting many DML from this analysis because of the covariate information, but I’m surprised I don’t have at least a hundred for one of the contrasts. It’s also interesting that all of the DML are found in one chromosome. I should look into that further when I have my final DML list.

Changing `min.per.group` within `unite`

Steven suggested I change the min.per.group parameter in the unite command to allow for loci without data in all samples to be included in the analysis. I started by allowing a CpG to be present in 9/12 samples per treatment (75%):

methylationInformationFilteredCov5 <- methylKit::unite(processedFilteredFilesCov5,
                                                       destrand = FALSE,
                                                       min.per.group = 9L) #Combine all processed files into a single table. Use destrand = TRUE to not destrand. Based with data in at least 9/12 samples peer treatment will be included

With this change, I had 4,557,452 CpG loci with data in at least 9 samples per treatment. I got more DML this time, but still less than 100! Looking at my reference script from another paper, it seemed like they tried out several min.per.group options. For reference, I have between 5,173,369 and 8,057,460 CpG loci with data with at least 5x coverage, depending on the sample. Since I’m getting 4,557,452 after unite with min.per.group = 9L, I’m not sure if it’s worth doing something similar to 1) count the number of CpG loci with data and 2) count the number of DML obtained with multiple settings. However, I decided to try min.per.group = 8L, which gave me 5,103,729 CpGs.

Table 3. Number of DML obtained from various methylation difference thresholds using min.per.group = 9L or min.per.group = 8L.

Contrast	min.per.group	25%	50%
Ploidy	9	19	0
Ploidy	8	29	1
pH	9	32	2
pH	8	42	3

Based on Table 3, I think I need to use a 25% threshold for calling DML. Again, I’m pretty sure my use of covariates is affecting the P- and q-values I get from methylKit. I think I’ll also use the min.per.group = 8L output, since I get marginally more DML.

Redoing comparative analysis with `min.per.group = 8L`

My previous comparative analysis used the default settings: CpGs must have data in all samples to be included in downstream analysis. I re-did the comparative analysis with the min.per.group = 8L parameter.

corr-plot

Figure 1. Correlation plot

clustering

Figure 2. Clustering diagram

PCA

Figure 3. PCA

The PCA and clustering diagram don’t look very different, but the correlations between samples are lower 0.01. Overall, nothing that rocks the boat.

Going forward

Test-run DSS and ramwas
Look at DML in IGV
Try EpiDiverse/snp for SNP extraction from WGBS data
Run methylKit randomization test on mox
Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
Characterize the general methylation landscape
Identify genomic locations of DML
Transfer scripts used to a nextflow workflow
Update methods
Update results

Written on May 12, 2021

Hawaii Gigas Methylation Analysis Part 14

Messing around with methylKit DML settings

The R Studio server works!

Evaluating methylKit DML output

Changing min.per.group within unite

Redoing comparative analysis with min.per.group = 8L

Going forward

Messing around with `methylKit` DML settings

Evaluating `methylKit` DML output

Changing `min.per.group` within `unite`

Redoing comparative analysis with `min.per.group = 8L`