DML Analysis Part 32
Revised finalized general methylation trends
Based on feedback from Katie, Steven, and Alan, I went back to my code and revised C. virginica gonad methylation trends.
Revised timeline
But first…let’s revisit my grand proclamation: “I’m going to finish this paper by the end of the month.” (lol) I think I can finish my analyses by the end of this week and have some sort of draft discussion. Before the next E2O meeting, I’ll definitely have a draft paper ready.
Characterizing loci locations
In this Jupyter notebook, I characterized the location of all 5x CpGs I had data for, sparsely methylated loci, and unmethylated loci. Katie suggested I make a table with this information for statistical analyses, so I did.
Table 1. Locations of 5x CpGs enriched by MBD treatment, methylated loci, sparsely methylated loci, and unmethylated loci. “Other” refers to any loci that did not overlap with exons, introns, transposable elements (all), or putative promoters.
Category | All 5x CpGs | Methylated | Sparsely methylated | Unmethylated |
---|---|---|---|---|
Total | 4,304,257 | 3,181,904 | 481,788 | 640,565 |
Unique genes | 54,619 | 44,505 | 47,243 | 47,584 |
mRNA coding regions | 3,140,744 | 2,437,901 | 303,890 | 398,953 |
Exons | 1,366,779 | 1,013,691 | 105,871 | 247,217 |
Introns | 1,811,271 | 1,448,786 | 201,553 | 160,932 |
Transposable elements (all) | 1,011,883 | 755,222 | 155,293 | 101,368 |
Transposable elements (Cg) | 767,604 | 610,208 | 108,858 | 48,538 |
Putative promoters | 203,376 | 134,534 | 27,443 | 41,399 |
Other | 627,257 | 386,003 | 86,923 | 154,331 |
Chi-squared tests
For this round of chi-squared tests, I updated my overlap proportions file. I used this code to conduct chi-squared tests for various groupings. Every single time, I found that my distributions were significantly different. However, I’m now doubting how effective chi-squared tests are as a statistical approach. Looking at the observed, expected, and residual values from `chisq.test` for my test comparing all CG motifs with those enriched by MBD, I notice that my expected values are not what I want. In comparing these distributions, I want the background proportions (in this case, all CG motifs) to serve as the expected values, but that’s not the case.
I posted this issue to get some clarification on my methods.
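For reference, here is a minimal sketch of what “background proportions as expected values” could look like as a goodness-of-fit calculation. It is not the code linked above. It takes the “All 5x CpGs” column of Table 1 as the background and the “Methylated” column as the observed counts, and the function name `goodness_of_fit` is made up for illustration. Two caveats: the feature categories in Table 1 can overlap, so the counts are not mutually exclusive and the resulting statistic is illustrative only, and this sketch stops at the statistic rather than computing a p-value. In R, passing a vector of counts with the `p` argument to `chisq.test` runs the same kind of goodness-of-fit test, whereas passing a matrix runs a test of independence with expected values derived from the row and column totals.

```python
# Counts copied from Table 1. Feature categories can overlap, so treat this
# as an illustration of the calculation, not a finished analysis.
features = ["Exons", "Introns", "Transposable elements (all)",
            "Putative promoters", "Other"]
background = [1366779, 1811271, 1011883, 203376, 627257]  # All 5x CpGs
observed = [1013691, 1448786, 755222, 134534, 386003]     # Methylated


def goodness_of_fit(observed, background):
    """Chi-squared goodness-of-fit with expected counts taken from a background.

    Hypothetical helper: expected counts are the background proportions
    scaled to the total number of observed loci.
    """
    total_background = sum(background)
    total_observed = sum(observed)
    proportions = [b / total_background for b in background]
    expected = [p * total_observed for p in proportions]
    # Pearson residuals: (observed - expected) / sqrt(expected)
    residuals = [(o - e) / e ** 0.5 for o, e in zip(observed, expected)]
    statistic = sum(r * r for r in residuals)
    return expected, residuals, statistic


expected, residuals, statistic = goodness_of_fit(observed, background)
for name, obs, exp, res in zip(features, observed, expected, residuals):
    print(f"{name}: observed={obs}, expected={exp:.1f}, residual={res:.2f}")
print(f"chi-squared statistic = {statistic:.1f} with {len(observed) - 1} degrees of freedom")
```

With counts in the millions, even tiny differences in proportions will produce a very large statistic, so the residuals are likely more informative than the significance call itself.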
Revised figures
In the meantime, I revised my figures!
Figure 1. Distribution of CpGs enriched by MBD versus all CpGs in the C. virginica genome.
Figure 2. Distribution of methylated CpGs versus those enriched by MBD.
Figure 3. Distribution of methylated, sparsely methylated, and unmethylated CpGs versus CpGs enriched by MBD.
Figure 4. Distribution of DML versus methylated CpGs.
Once I sort out my statistical tests, I can actually add significance levels to my figures.
Going forward
- Sort out statistical tests
- Describe (somehow) genes with DML in them
- Figure out what’s going on with the gene background
- Figure out what’s going on with DMR
- Work through gene-level analysis
- Update paper repository
- Start writing the discussion