DML Analysis Part 36

Reworking DMR

Changing methylKit parameters

One thing Mac mentioned to me at FROGER was the cov.bases argument in tileMethylCounts. The cov.bases argument sets the minimum number of bases that must be covered in a window (tile) for that window to be kept. Looking at Mac’s salmon paper, I saw that she set cov.bases to 1 rather than the default of 0. In my R Markdown file, I also set cov.bases to 1 and identified DMR using 100 bp, 500 bp, and 1000 bp windows. All of the data and figures I generated are tagged with the date “2019-06-05” and can be found here.
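To make the window settings concrete, here is a simplified Python sketch of non-overlapping tiling (step size equal to window size) with a minimum-coverage filter in the spirit of cov.bases. It is an illustration, not methylKit's implementation: the function name tile_counts, the tuple layout, and the 1-based window boundaries are assumptions.

```python
from collections import defaultdict


def tile_counts(cpgs, win_size=100, cov_bases=1):
    """Sum per-CpG counts into non-overlapping windows (step size == window size).

    cpgs: iterable of (chrom, pos, num_cs, num_ts) with 1-based positions.
    Returns {(chrom, start, end): (covered_bases, num_cs, num_ts)}, keeping only
    tiles with at least cov_bases covered CpGs (a hypothetical cov.bases analogue).
    """
    tiles = defaultdict(lambda: [0, 0, 0])
    for chrom, pos, num_cs, num_ts in cpgs:
        start = ((pos - 1) // win_size) * win_size + 1  # 1-based window start
        tile = tiles[(chrom, start, start + win_size - 1)]
        tile[0] += 1       # number of covered CpGs in this window
        tile[1] += num_cs  # methylated read counts
        tile[2] += num_ts  # unmethylated read counts
    return {key: tuple(v) for key, v in tiles.items() if v[0] >= cov_bases}


# Hypothetical CpG records: (chromosome, position, numCs, numTs)
cpgs = [("chr1", 5, 3, 7), ("chr1", 50, 2, 8), ("chr1", 150, 9, 1)]
for size in (100, 500, 1000):
    print(size, tile_counts(cpgs, win_size=size, cov_bases=1))
```

With these three hypothetical records, the 100 bp run produces two tiles, while the 500 bp and 1000 bp runs merge all three CpGs into a single tile.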

Table 1. Number of DMR identified using different window sizes. Step size and window size were equal.

Window Size (bp)    Number of DMR
100                 71
500                 12
1000                5

Visualizing DMR in IGV

My gut feeling was to go with the 100 bp DMR, simply because they give me a larger dataset to work with. Obviously, gut feelings aren’t enough, so I visualized the DMR from each window size in IGV.

Figures 1-3. 100 bp, 500 bp, and 1000 bp DMR tracks in IGV.

I found that the 100 bp DMR matched the locations of DML across chromosomes more consistently than the larger windows did (Figures 1-3). For example, some genomic regions had a 500 bp DMR but no DML. When I looked closely at these DMR, I found that they contained only one or two CpG loci, with data for only a few samples. Some chromosomes had DML but no DMR in the 500 bp or 1000 bp tracks. After looking at the data in IGV, I trust the 100 bp DMR more, so I’ll use them for the remaining analyses.

I quickly generated separate BED files for hypermethylated and hypomethylated DMR so I could compare them with my breakdown of hyper- and hypomethylated DML. Of the 71 total DMR, 37 are hypermethylated and 34 are hypomethylated.
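Splitting DMR by the sign of the methylation difference can be sketched as below. It assumes each DMR is a (chrom, start, end, meth_diff) record with 1-based, inclusive coordinates, and that a positive meth_diff means hypermethylated, as with methylKit's meth.diff column. BED files are 0-based and half-open, so only the start coordinate shifts. The function name split_dmr_to_bed is hypothetical.

```python
def split_dmr_to_bed(dmrs):
    """Split DMR into hyper- and hypomethylated BED lines.

    dmrs: iterable of (chrom, start, end, meth_diff), 1-based and inclusive.
    Returns two lists of tab-separated BED lines (0-based start, same end).
    """
    hyper, hypo = [], []
    for chrom, start, end, meth_diff in dmrs:
        line = f"{chrom}\t{start - 1}\t{end}"  # convert start to 0-based for BED
        if meth_diff > 0:
            hyper.append(line)
        elif meth_diff < 0:
            hypo.append(line)
    return hyper, hypo


# Hypothetical DMR records
hyper, hypo = split_dmr_to_bed(
    [("chr1", 101, 200, 30.5), ("chr1", 301, 400, -27.0), ("chr2", 1, 100, 40.0)]
)
print(len(hyper), "hypermethylated;", len(hypo), "hypomethylated")
```

Each list can then be written to its own file, for example with "\n".join(hyper).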

Characterizing overlaps with DMR

I returned to this Jupyter notebook to characterize DMR overlaps with various genome feature tracks. I looked at overlaps for all DMR, as well as hyper- and hypomethylated DMR separately.

Table 2. Overlaps between DMR and various genome feature tracks.

Feature                                 Hypermethylated DMR   Hypomethylated DMR   All DMR
Genes                                   33                    33                   66
Unique Genes                            33                    33                   65
Exons                                   19                    19                   38
Introns                                 27                    24                   51
Transposable Elements (All)             3                     8                    11
Transposable Elements (C. gigas only)   3                     6                    9
Putative promoters                      1                     7                    8
Other                                   2                     0                    2
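The overlap counts in Table 2 can be sketched as a brute-force interval intersection in Python. One plausible reading, which is an assumption, is that the Genes row counts DMR overlapping any gene, while Unique Genes counts distinct genes hit, so two DMR falling in the same gene would explain 66 versus 65. Intervals are assumed to be 0-based, half-open BED coordinates, and the function name is hypothetical; for real data, a tool such as bedtools intersect is the usual choice.

```python
def count_overlaps(regions, features):
    """Count regions overlapping at least one feature, and distinct features hit.

    regions: list of (chrom, start, end); features: list of (chrom, start, end, name).
    All intervals are 0-based and half-open. Brute force is fine for ~100 DMR.
    """
    regions_hit = 0
    features_hit = set()
    for r_chrom, r_start, r_end in regions:
        hit = False
        for f_chrom, f_start, f_end, name in features:
            # Half-open intervals overlap if each starts before the other ends
            if r_chrom == f_chrom and r_start < f_end and f_start < r_end:
                hit = True
                features_hit.add(name)
        if hit:
            regions_hit += 1
    return regions_hit, len(features_hit)


# Hypothetical example: two DMR in geneA, one DMR outside any gene
dmr = [("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 1000, 1100)]
genes = [("chr1", 180, 300, "geneA"), ("chr1", 0, 50, "geneB")]
print(count_overlaps(dmr, genes))
```

Here two DMR overlap genes but only one distinct gene is hit, mirroring the difference between the Genes and Unique Genes rows under this reading.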

Correcting DML chi-squared tests

Before creating DMR figures, I took a quick DML detour to address a comment from Steven. When I initially conducted chi-squared tests with DML, I used the methylated CpGs as the background. While this is an interesting comparison, the methylated CpGs are not the appropriate background, since methylKit identifies DML from MBD-enriched loci. In this R Markdown file, I conducted a chi-squared test comparing the distributions of MBD-enriched loci and DML across genomic features and found that they were significantly different (chi-squared statistic = 342.69, df = 4, p-value < 2.2e-16). I also created a figure for this comparison.
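The chi-squared test of homogeneity behind these comparisons can be sketched in pure Python for a 2 x k table of feature counts, with one row per group. This is a minimal sketch: the function names are hypothetical, it applies no continuity correction (R's chisq.test also applies none to tables larger than 2 x 2), and the p-value uses the closed-form chi-squared upper tail for integer degrees of freedom. The counts in the usage line are made up, since the MBD-enriched and DML counts are not reproduced here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution (integer df, closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2*(1 - Phi(sqrt x)) + 2*phi(sqrt x) * sum_j x^(j - 1/2) / (1*3*...*(2j - 1))
    root = math.sqrt(x)
    term, total = root, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * phi * total


def chi2_homogeneity(group_a, group_b):
    """Pearson chi-squared test on a 2 x k table of category counts."""
    col_totals = [a + b for a, b in zip(group_a, group_b)]
    row_totals = [sum(group_a), sum(group_b)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip((group_a, group_b), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # expected count under homogeneity
            stat += (observed - expected) ** 2 / expected
    df = len(group_a) - 1  # (2 - 1) * (k - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for illustration only (not the MBD-enriched/DML data)
stat, df, p = chi2_homogeneity([30, 10, 10], [10, 30, 10])
print(f"X-squared = {stat:.2f}, df = {df}, p-value = {p:.3g}")
# prints: X-squared = 20.00, df = 2, p-value = 4.54e-05
```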

Figure 4. Comparing overlap proportions between MBD-enriched loci and DML.

DMR overlap figures

Since DMR are 100 bp and loci are, well, 1 bp, comparing the distribution of loci with the distribution of DMR did not make sense. To do a chi-squared test, I’d need to use the appropriate background: all the tiles generated by methylKit in the sliding window analysis. These 100 bp windows are all the possible DMR. I exported all the tiles from methylKit in this R Markdown file. I then returned to this Jupyter notebook to characterize the locations of the DMR background.
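A toy calculation illustrates why 1 bp loci and 100 bp tiles cannot share a background: an interval of length L overlaps a feature of length F from F + L - 1 start positions, versus F positions for a single base. The genome length, feature coordinates, and function name below are hypothetical.

```python
def overlap_placements(genome_len, feature, interval_len):
    """Count placements of an interval that overlap one feature in a toy genome.

    feature: (start, end), 0-based and half-open.
    interval_len: 1 for a single CpG locus, 100 for a 100 bp tile.
    Returns (overlapping placements, total placements).
    """
    f_start, f_end = feature
    total = genome_len - interval_len + 1  # every start position that fits
    hits = sum(1 for s in range(total) if s < f_end and f_start < s + interval_len)
    return hits, total


for length in (1, 100):
    hits, total = overlap_placements(1000, (400, 600), length)
    print(f"{length:>3} bp: {hits}/{total} = {hits / total:.2f}")
```

Here a single base overlaps the 200 bp feature at 200 of 1000 positions (20%), while a 100 bp window overlaps it at 299 of 901 positions (about 33%), so tiles overlap features more often simply because they are longer.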

Table 3. Overlaps between DMR background and various genome feature tracks. There were 152,226 possible tiles.

Feature                                 DMR Background
Genes                                   142,153
Unique Genes                            11,578
Exons                                   92,552
Introns                                 93,707
Transposable Elements (All)             25,117
Transposable Elements (C. gigas only)   20,228
Putative promoters                      8,238
Other                                   4,649

I added the background overlap and DMR overlap counts to this table. The distributions of the DMR background and the DMR themselves were not significantly different (chi-squared statistic = 5.8078, df = 4, p-value = 0.214). I did, however, get a warning that the chi-squared approximation may be incorrect.
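The DMR-versus-background test can be re-run from the counts in Tables 2 and 3 to see where the warning comes from. Which rows entered the test is an assumption here: exons, introns, all transposable elements, putative promoters, and other (five categories, matching df = 4). R's chisq.test warns that the chi-squared approximation may be incorrect whenever an expected count is below 5, and this sketch lists the DMR categories where that happens.

```python
import math

# Counts from Table 2 (All DMR) and Table 3 (DMR background); the choice of
# these five categories is an assumption about which rows entered the test.
categories = ["Exons", "Introns", "TEs (all)", "Putative promoters", "Other"]
dmr = [38, 51, 11, 8, 2]
background = [92552, 93707, 25117, 8238, 4649]


def expected_counts(row_a, row_b):
    """Expected counts under homogeneity for a 2 x k table."""
    n = sum(row_a) + sum(row_b)
    cols = [a + b for a, b in zip(row_a, row_b)]
    return [[sum(row) * c / n for c in cols] for row in (row_a, row_b)]


expected = expected_counts(dmr, background)
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip((dmr, background), expected)
    for o, e in zip(obs_row, exp_row)
)
# df = (2 - 1) * (5 - 1) = 4, and for df = 4 the upper tail is exp(-x/2) * (1 + x/2)
p_value = math.exp(-stat / 2) * (1 + stat / 2)
small = [name for name, e in zip(categories, expected[0]) if e < 5]
print(f"X-squared = {stat:.4f}, p-value = {p_value:.3f}")
print("Expected DMR counts below 5:", small)
```

With these five categories the statistic comes out at about 5.81 (p of about 0.21), in line with the reported result, and the expected DMR counts for putative promoters and other fall below 5, which would trigger the warning. Given such small expected counts, a simulated p-value (for example, chisq.test(..., simulate.p.value = TRUE) in R) would be a reasonable cross-check.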

While Mac didn’t do a chi-squared test with her salmon DMR, she did create plots comparing the proportion of DMR in various genomic features with the corresponding proportion of the DMR background. I decided to follow her precedent and do the same in this R Markdown file.
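The proportions behind a plot like Figure 5 can be computed directly from the overlap counts in Tables 2 and 3. This sketch assumes the denominators are the 71 DMR and the 152,226 background tiles and uses a subset of rows; because features overlap (exons and introns fall within genes), the proportions are not expected to sum to 1. The function name is hypothetical.

```python
def feature_proportions(names, counts, total):
    """Fraction of intervals overlapping each feature (features may overlap)."""
    return {name: count / total for name, count in zip(names, counts)}


features = ["Genes", "Exons", "Introns", "TEs (all)", "Putative promoters", "Other"]
dmr_props = feature_proportions(features, [66, 38, 51, 11, 8, 2], 71)
bg_props = feature_proportions(
    features, [142153, 92552, 93707, 25117, 8238, 4649], 152226
)

print(f"{'Feature':<20}{'DMR':>8}{'Background':>12}")
for name in features:
    print(f"{name:<20}{dmr_props[name]:>8.3f}{bg_props[name]:>12.3f}")
```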

Figure 5. Comparing overlap proportions between the DMR background and DMR. The distributions were not significantly different.

Going forward

  1. Create an annotated table of DML and DMR
  2. Conduct a gene enrichment analysis for DML and DMR
  3. Work through gene-level analysis
  4. Update methods and results
  5. Update paper repository
  6. Outline the discussion
  7. Share draft paper at the next Eastern Oyster Project Meeting
  8. Write the discussion
  9. Write the introduction
  10. Revise my abstract
  11. Share the draft with collaborators and get feedback
  12. Post the paper on bioRxiv
  13. Prepare the manuscript for publication
Written on June 11, 2019