DML Analysis Part 20
Proportion test results
So far, I’ve used `bedtools` to find overlaps between DML, DMR, the gene background, and various genome features (exons, introns, mRNA coding regions, and transposable elements). I calculated overlap proportions between DML, DMR, and genome features in this Jupyter notebook, and overlap proportions between the gene background and genome features in this Jupyter notebook. The gene background refers to the output from `methylKit` (see my `methylKit` script for more information).
My next step was to see if these proportions were significantly different from each other using `prop.test` in this R Markdown file. I pulled the number of overlaps from my Jupyter notebooks and used that as the number of successes. The line counts for each genome feature file were used as totals. I compared all three proportions, but also did pairwise comparisons between the gene background and either DML or DMR. My `prop.test` output can be found in this file and in Table 1 below.
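To make the test concrete, here is a minimal Python sketch of a chi-squared test of equal proportions. It is not the R code from my R Markdown file. The function names and the counts in the usage example are hypothetical, and the continuity correction for the two-group case is my understanding of the `prop.test` default, so treat it as an assumption.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-squared probability (closed forms for df = 1 and df = 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def prop_test(successes, totals, correct=True):
    """Pearson chi-squared test that k groups share one underlying proportion.

    successes: number of overlaps per group (hypothetical inputs here)
    totals:    number of lines per group
    Returns (statistic, df, p_value).
    """
    k = len(successes)
    pooled = sum(successes) / sum(totals)

    # Assumption: continuity correction is applied only when comparing two groups,
    # and is capped so it never exceeds the observed deviation.
    yates = 0.0
    if correct and k == 2:
        yates = min(0.5, abs(successes[0] - totals[0] * pooled))

    stat = 0.0
    for x, n in zip(successes, totals):
        # Compare observed and expected counts for successes and failures
        for observed, expected in ((x, n * pooled), (n - x, n * (1 - pooled))):
            stat += (abs(observed - expected) - yates) ** 2 / expected

    df = k - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts, for illustration only (not my real overlap numbers)
all_three = prop_test([30, 50, 40], [100, 100, 100])  # like the "All" comparison
pairwise = prop_test([30, 50], [100, 100])            # like "DML-GB" or "DMR-GB"
```

The three-group call corresponds to the "All" comparisons (df = 2) and the two-group call to the pairwise comparisons (df = 1).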
Table 1. Results from `prop.test` in R. Test results are organized first by the genomic feature overlaps being tested (e.g. exons), then by the comparisons included. “All” refers to DML, DMR, and gene background proportions, “DML-GB” to only DML and gene background proportions, and “DMR-GB” to only DMR and gene background proportions. Significant p-values at the 0.05 level are bolded.
| Genome Feature | Comparison | X-squared | df | p-value |
|---|---|---|---|---|
| Transposable Elements (All) | All | 26.13 | 2 | **2.12e-06** |
| Transposable Elements (All) | DML-GB | 0.67 | 1 | 0.41 |
| Transposable Elements (All) | DMR-GB | 24.15 | 1 | **8.90e-07** |
| Transposable Elements (Cg) | All | 14.62 | 2 | **0.0007** |
| Transposable Elements (Cg) | DML-GB | 8.18 | 1 | **0.004** |
| Transposable Elements (Cg) | DMR-GB | 5.48 | 1 | **0.02** |
When comparing all three proportions, the test was significant for every genome feature, meaning at least one proportion differed from the others. For the DML-GB tests, all comparisons were significant except for mRNA and transposable elements (all) overlaps. It was interesting that the overlap proportions were significantly different for exons and introns, but not mRNA. All DMR-GB comparisons were significant. The differences in significance between DML-GB and DMR-GB could be attributed to the way I calculated overlaps. Each overlapping region is listed as one line entry by `bedtools`. DMR overlapping regions can be multiple base pairs long because each DMR is 100 bp. However, DML and gene background overlapping regions can only be one base pair because DML and the gene background are each listed locus by locus. It will be interesting to calculate the actual length of each DMR overlap, then use that in a proportion test.
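As a rough idea of how overlap lengths could be tallied, the Python sketch below sums the base pairs covered by a set of overlap entries instead of counting lines. It assumes each line is tab-delimited with the overlapping region's start and end in the second and third columns (BED-style, 0-based, half-open coordinates), which is what I would expect from default `bedtools intersect` output. The function name and the example lines are hypothetical.

```python
def total_overlap_bp(bed_lines):
    """Sum the lengths (in bp) of BED-style intervals, one interval per line."""
    total = 0
    for line in bed_lines:
        # Skip blank lines and common header/comment lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        # Half-open coordinates: length is simply end minus start
        total += end - start
    return total


# Hypothetical overlap entries: one 100 bp DMR overlap and one single-locus overlap
example = ["chr1\t100\t200\n", "chr1\t250\t251\n"]
overlap_bp = total_overlap_bp(example)
```

With this approach the two entries count as 101 bp rather than two lines. For a proportion test on base pairs, the totals would also need to be expressed in base pairs (for example, the summed length of each genome feature file) rather than line counts.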
For now, I can conclude that DML and DMR locations are different from the gene background’s location. That will be interesting to interpret in my paper!
- See how `min_cov`, alignment stringency, or SNPs affect clustering
- Determine if a formal gene enrichment is necessary
- If necessary, select the most appropriate gene enrichment method
- Describe functions of most interesting genes with DML and DMR