# Hawaii Gigas Methylation Analysis Part 20

## Figures and statistical analyses

It’s your favorite time and mine: figures and stats! These are the last things I need to finalize my existing methods and results sections. Afterwards, I’ll work on an enrichment analysis and dive into the discussion.

### `methylKit`

I opened this R Markdown script to make figures from the `methylKit` data. First, I imported coverage information I made in this Jupyter notebook. Then, I calculated the number of loci with higher or lower coverage in triploids:

```r
sum(as.numeric(as.character(ploidyCoverage$diploid)) < as.numeric(as.character(ploidyCoverage$triploid)), na.rm = TRUE) #5395386 loci with higher coverage in triploids
sum(as.numeric(as.character(ploidyCoverage$diploid)) >= as.numeric(as.character(ploidyCoverage$triploid)), na.rm = TRUE) #6956658 loci with lower or equal coverage in triploids
```
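The same comparison can be wrapped in a small helper so the factor-to-numeric conversion happens in one place. This is a minimal sketch on a toy data frame: the `to_numeric()` and `compare_ploidy()` helpers and all values are hypothetical, and only the `diploid` and `triploid` column names come from the code above.

```r
# Toy stand-in for ploidyCoverage (hypothetical values); imported columns may be factors
ploidyCoverage <- data.frame(
  diploid  = factor(c("10", "5", "8", NA, "3")),
  triploid = factor(c("12", "5", "2", "7", "9"))
)

# Go through character first so factor level codes are not mistaken for values
to_numeric <- function(x) as.numeric(as.character(x))

# Hypothetical helper: count loci that are higher vs. lower-or-equal in triploids
compare_ploidy <- function(df) {
  d <- to_numeric(df$diploid)
  t <- to_numeric(df$triploid)
  c(higher_in_triploid         = sum(d < t, na.rm = TRUE),
    lower_or_equal_in_triploid = sum(d >= t, na.rm = TRUE))
}

coverage_counts <- compare_ploidy(ploidyCoverage)
print(coverage_counts)
```

Loci with a missing value in either column are dropped from both counts, which matches the `na.rm = TRUE` behavior above.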

I then created histograms of the coverage distributions for diploids and triploids. The plots themselves aren’t very informative because most of the loci fall within the first histogram bin. In the future, I need to create a gapped axis.

I did something similar for methylation and calculated the number of loci with higher or lower methylation in triploids:

```r
sum(as.numeric(as.character(ploidyMethylation$diploid)) < as.numeric(as.character(ploidyMethylation$triploid)), na.rm = TRUE) #2735188 loci with higher methylation in triploids
sum(as.numeric(as.character(ploidyMethylation$diploid)) >= as.numeric(as.character(ploidyMethylation$triploid)), na.rm = TRUE) #6605916 loci with lower or equal methylation in triploids
```
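For the coverage histograms mentioned earlier, where most loci pile into the first bin, one possible alternative to a gapped axis is to bin on a log scale. The sketch below uses simulated, right-skewed coverage values (not the real data), and the `+ 1` offset is an assumption to keep zero-coverage loci defined.

```r
set.seed(1)

# Simulated coverage: most loci are low, a small tail is very high (hypothetical data)
coverage <- c(rpois(950, lambda = 5), rpois(50, lambda = 500))

# Transform to log10 so the high-coverage tail spreads over its own bins
log_cov <- log10(coverage + 1)
cov_hist <- hist(log_cov, breaks = 30, plot = FALSE)

plot(cov_hist,
     main = "Coverage distribution (log10 scale)",
     xlab = "log10(coverage + 1)",
     ylab = "Number of loci")
```

A log axis changes how bin widths should be read, so a gapped axis may still be the better choice for a figure that has to be read in raw coverage units.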

I created frequency distributions for diploids and triploids, made a PCA of global methylation information, then created a multipanel plot:

Figure 1. Average percent methylation and coverage for diploids and triploids, and PCA of global methylation profiles
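For reference, the PCA step could look something like the following. `methylKit` provides `PCASamples()` for its own objects. This sketch instead uses base R `prcomp()` on a simulated samples-by-loci matrix so it runs without the real data. The sample names, the group sizes, and the choice to center and scale each locus are all assumptions.

```r
set.seed(42)
n_loci <- 200

# Simulated percent methylation: rows are samples, columns are loci (hypothetical data)
ploidy <- rep(c("diploid", "triploid"), each = 4)
pct_meth <- matrix(runif(8 * n_loci, min = 0, max = 100), nrow = 8,
                   dimnames = list(paste0(ploidy, "_", 1:4), NULL))

# Center and scale each locus, then run the PCA
pca <- prcomp(pct_meth, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2],
     col = ifelse(ploidy == "diploid", "blue", "red"), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = c("diploid", "triploid"),
       col = c("blue", "red"), pch = 19)
```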

### `DSS`

Next, I opened this R Markdown script to make heatmaps for each DML category. I merged the DML location information with the union 1x bedgraph to get sample percent methylation for each DML, then used that in my heatmaps. When I looked at the figures, I didn’t see any clear differential methylation signal between treatments or ploidy. Something to look into…
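The merge-then-heatmap step can be sketched roughly as below. Both tables are toy data: the column names (`chr`, `start`, `end`), the sample names, and the use of base `merge()` and `heatmap()` are assumptions for illustration, not necessarily what the script does.

```r
# Hypothetical DML locations
dml <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100, 250, 40),
  end   = c(102, 252, 42)
)

# Hypothetical union bedgraph: one row per CpG, one percent-methylation column per sample
union_bg <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  start  = c(100, 180, 250, 40, 90),
  end    = c(102, 182, 252, 42, 92),
  dip_1  = c(80, 10, 35, 60, 5),
  dip_2  = c(75, 12, 40, 55, 8),
  trip_1 = c(60, 11, 70, 58, 6),
  trip_2 = c(65, 9, 72, 62, 7)
)

# Keep only the CpGs that are DML, carrying along per-sample percent methylation
dml_meth <- merge(dml, union_bg, by = c("chr", "start", "end"))

sample_cols <- c("dip_1", "dip_2", "trip_1", "trip_2")
meth_matrix <- as.matrix(dml_meth[, sample_cols])
rownames(meth_matrix) <- paste0(dml_meth$chr, ":", dml_meth$start)

# Row scaling shows each DML relative to its own mean; columns keep their given order
heatmap(meth_matrix, scale = "row", Colv = NA, margins = c(6, 8))
```

Setting `Colv = NA` keeps samples in their original order, which makes it easier to see whether groups separate without the column clustering rearranging them.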

I also created a figure showing the distribution of DML across chromosomes, normalized by the number of CpGs in each chromosome, with a line tracking the number of genes in each chromosome. I saved all of the individual plots in this folder. I combined this figure with the three heatmaps in an InDesign document to create a multipanel plot:

Figure 2. Heatmaps for different DML categories, and chromosomal distribution of DML
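The CpG normalization behind the chromosome panel amounts to dividing DML counts by CpG counts per chromosome. A minimal sketch with made-up counts follows. The `chr_info` table and its column names are hypothetical, and rescaling gene counts onto the bar axis is just one simple way to overlay the line.

```r
# Hypothetical per-chromosome counts
chr_info <- data.frame(
  chr    = paste0("chr", 1:4),
  n_cpg  = c(1000, 2500, 500, 2000),
  n_gene = c(40, 90, 15, 70),
  n_dml  = c(10, 50, 5, 10)
)

# DML per 100 CpGs, so chromosomes with more CpGs do not dominate
chr_info$dml_per_cpg_pct <- 100 * chr_info$n_dml / chr_info$n_cpg

bar_mid <- barplot(chr_info$dml_per_cpg_pct,
                   names.arg = chr_info$chr,
                   ylab = "DML per 100 CpGs")

# Rescale gene counts to the bar axis so the line can share the same panel
gene_scaled <- chr_info$n_gene / max(chr_info$n_gene) * max(chr_info$dml_per_cpg_pct)
lines(bar_mid, gene_scaled, type = "b", pch = 19)
```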

### Methylation landscape

I delved into the methylation landscape in this R Markdown document. I performed a chi-squared test to compare how CpGs are distributed among genome features for all CpGs in the genome versus highly methylated CpGs, and found that the proportion of CpGs in exon UTR was significantly different between the two sets. I created a stacked barplot to visualize the distributions of all CpGs in the genome, highly methylated CpGs, moderately methylated CpGs, and lowly methylated CpGs.

Figure 3. Proportion of CpGs in various genome features
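A chi-squared comparison of this kind can be sketched with base R `chisq.test()` on a contingency table of feature counts. Everything here is illustrative: the counts and feature names other than exon UTR are invented, and the per-feature "feature vs. everything else" follow-up tests with a Benjamini-Hochberg correction are an assumption about how a single feature might be singled out.

```r
# Hypothetical CpG counts per genome feature (rows: CpG set, columns: feature)
feature_counts <- matrix(
  c(5000, 20000, 45000, 30000,
     900,  4000,  6000,  2100),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("all_CpGs", "highly_methylated"),
                  c("exon_UTR", "CDS", "intron", "intergenic"))
)

# Overall test: do the two sets differ in how CpGs are spread across features?
overall <- chisq.test(feature_counts)

# Follow-up: one 2x2 test per feature (that feature vs. all other features)
per_feature_p <- sapply(colnames(feature_counts), function(f) {
  tab <- cbind(feature = feature_counts[, f],
               other   = rowSums(feature_counts) - feature_counts[, f])
  chisq.test(tab)$p.value
})

# Adjust for testing several features
adjusted_p <- p.adjust(per_feature_p, method = "BH")
print(adjusted_p)
```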

### Genomic location of DML

I did something similar with the location of DML in the genome in this R Markdown script! I performed chi-squared tests for each DML category (ploidy, pH, and ploidy:pH) against highly methylated CpGs and saved the output. I also created a stacked barplot with information for each DML category and highly methylated CpGs.

Figure 4. Proportion of DML in various genome features
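A stacked barplot like this can be built by converting counts to within-category percentages first. The sketch below assumes a features-by-categories count matrix. The counts and the feature names are made up, and only the category labels come from the text above.

```r
# Hypothetical counts: rows are genome features, columns are DML categories
dml_counts <- matrix(
  c(12, 30,  8, 20,
    40, 25, 35, 50,
    30, 20, 10, 45,
    18, 25,  7, 35),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("exon_UTR", "CDS", "intron", "intergenic"),
                  c("ploidy", "pH", "ploidy:pH", "highly_methylated"))
)

# Convert to percentages within each column so every bar sums to 100
props <- prop.table(dml_counts, margin = 2) * 100

# Extra x-axis room on the right keeps the legend off the bars
barplot(props,
        ylab = "Proportion of loci (%)",
        legend.text = TRUE,
        xlim = c(0, ncol(props) + 3))
```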

Next steps: finishing analyses and updating the manuscript.

### Going forward

1. Perform enrichment
2. Update methods
3. Update results
4. Outline discussion
5. Write discussion
6. Write introduction
7. Conduct randomization test with `DSS`
8. Try EpiDiverse/snp for SNP extraction from WGBS data
9. Transfer scripts used to a nextflow workflow

Written on June 7, 2021