Hawaii Gigas Methylation Analysis Part 14
Messing around with
methylKit DML settings
The R Studio server works!
Last time I mentioned how the R Studio server wasn’t playing well with my R Markdown script, and that I started running an R script for the analysis instead. Turns out making the switch is what I needed to run my code! When I was working on the Manchester analysis, I did end up using an R script instead of an R Markdown file. This makes me think that there’s some sort of issue using R Markdown on R Studio server, specifically with
calculateDiffMeth. Something to investigate at a later date.
methylKit DML output
unite, I had 1,984,530 CpG loci with data in all 24 samples. Using this object with
calculateDiffMeth (including covariates and an overdispersion correction), I didn’t obtain many ploidy- or pH-DML:
Table 1. Number of DML obtained from various methylation difference thresholds.
Table 2. DML obtained at various percent methylation differences for ploidy or pH contrasts.
I wasn’t expecting many DML from this analysis because of the covariate information, but I’m surprised I don’t have at least a hundred for one of the contrasts. It’s also interesting that all of the DML are found in one chromosome. I should look into that further when I have my final DML list.
Steven suggested I change the
min.per.group parameter in the
unite command to allow for loci without data in all samples to be included in the analysis. I started by allowing a CpG to be present in 9/12 samples per treatment (75%):
methylationInformationFilteredCov5 <- methylKit::unite(processedFilteredFilesCov5, destrand = FALSE, min.per.group = 9L) #Combine all processed files into a single table. Use destrand = TRUE to not destrand. Based with data in at least 9/12 samples peer treatment will be included
With this change, I had 4,557,452 CpG loci with data in at least 9 samples per treatment. I got more DML this time, but still less than 100! Looking at my reference script from another paper, it seemed like they tried out several
min.per.group options. For reference, I have between 5,173,369 and 8,057,460 CpG loci with data with at least 5x coverage, depending on the sample. Since I’m getting 4,557,452 after
min.per.group = 9L, I’m not sure if it’s worth doing something similar to 1) count the number of CpG loci with data and 2) count the number of DML obtained with multiple settings. However, I decided to try
min.per.group = 8L, which gave me 5,103,729 CpGs.
Table 3. Number of DML obtained from various methylation difference thresholds using
min.per.group = 9L or
min.per.group = 8L.
Based on Table 3, I think I need to use a 25% threshold for calling DML. Again, I’m pretty sure my use of covariates is affecting the P- and q-values I get from
methylKit. I think I’ll also use the
min.per.group = 8L output, since I get marginally more DML.
Redoing comparative analysis with
min.per.group = 8L
My previous comparative analysis used the default settings: CpGs must have data in all samples to be included in downstream analysis. I re-did the comparative analysis with the
min.per.group = 8L parameter.
Figure 1. Correlation plot
Figure 2. Clustering diagram
Figure 3. PCA
The PCA and clustering diagram don’t look very different, but the correlations between samples are lower 0.01. Overall, nothing that rocks the boat.
- Test-run DSS and ramwas
- Look at DML in IGV
- Try EpiDiverse/snp for SNP extraction from WGBS data
methylKitrandomization test on
- Investigate comparison mechanisms for samples with different ploidy in oysters and other taxa
- Characterize the general methylation landscape
- Identify genomic locations of DML
- Transfer scripts used to a nextflow workflow
- Update methods
- Update results