CEABiGR Part 3
Methylation landscape analysis
We’re updating foundational methods and results for CEABiGR! I’m working on the methylation analysis section, and decided to do my standard methylation landscape characterization for the data we have. I’m going to characterize the methylation landscape for male and female samples separately, since we’re seeing sex-specific methylation and gene expression patterns.
Revising genome feature tracks
But first… I revised the C. virginica genome feature tracks. I made the original genome feature tracks in 2018, but I didn’t ensure that each feature was assigned to only one category. For example, the original flanking regions and intergenic regions overlap, and I think I only used Gnomon annotations. I created this Jupyter notebook to update how I generate C. virginica genome feature tracks. I also pulled the RepeatMasker output directly from NCBI instead of using the version Sam created. Finally, I re-created the CG motif track so that all feature tracks were generated in one notebook, and counted the overlaps between CG motifs and every genome feature track.
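Making each feature belong to only one category comes down to interval subtraction. Introns are gene regions minus exons, flanks are flanking windows minus genes, and intergenic regions are whatever remains. The pure-Python sketch below shows that idea. It assumes half-open, BED-style coordinates on a single chromosome and ignores strand. The function names, the priority order, and the toy coordinates are illustrative assumptions, not what the notebook actually does. At genome scale, `bedtools merge` and `bedtools subtract` are the usual tools for these steps.

```python
from typing import List, Tuple

# Half-open [start, end) interval on one chromosome, as in BED files.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the parts of `a` not covered by any interval in `b`."""
    b = merge(b)
    result: List[Interval] = []
    for start, end in merge(a):
        cursor = start  # leftmost position of this interval not yet handled
        for b_start, b_end in b:
            if b_end <= cursor or b_start >= end:
                continue  # no overlap with the remaining part of this interval
            if b_start > cursor:
                result.append((cursor, b_start))  # uncovered gap before `b`
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


if __name__ == "__main__":
    # Hypothetical toy chromosome: one gene with two exons and 50 bp flanks.
    chrom = [(0, 1000)]
    genes = [(100, 400)]
    exons = [(100, 150), (300, 400)]
    flank_windows = [(50, 100), (400, 450)]

    introns = subtract(genes, exons)
    flanks = subtract(flank_windows, genes)
    intergenic = subtract(chrom, genes + flanks)
    print("introns:", introns)
    print("flanks:", flanks)
    print("intergenic:", intergenic)
```

With these toy coordinates, introns come out as `[(150, 300)]` and intergenic regions as `[(0, 50), (450, 1000)]`. As a result, no base is assigned to both a flank and an intergenic region.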
Table 1. Number of genome features and overlaps with CG motifs
Feature | Number of Unique Features | Overlaps with CG Motifs |
---|---|---|
CG Motifs | 14,458,736 | N/A |
Genes | 38,838 | 7,778,105 |
CDS | 645,368 | 1,728,303 |
Exon | 731,916 | 2,334,303 |
mRNA | 60,201 | 7,507,167 |
lncRNA | 4,750 | 281,715 |
Non-CDS | 337,305 | 12,138,514 |
Intron | 311,341 | 5,497,597 |
Exon UTR | 183,389 | 606,308 |
Upstream Flanks | 34,817 | 694,265 |
Downstream Flanks | 35,224 | 616,684 |
Intergenic Regions | 23,949 | 5,417,334 |
TE | 344,267 | 611,471 |
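Counting CG motifs per feature track takes two steps. First, locate every CG dinucleotide. Then, test each motif against the feature intervals. The sketch below assumes 0-based, half-open coordinates. It treats a motif as overlapping a feature if the two share at least one base, which is the default overlap rule in `bedtools intersect`. The function names and toy sequence are hypothetical. Each motif is counted once per track, whereas counting motif-feature pairs would give larger numbers. The brute-force check is for illustration only and is not meant for a genome with over 14 million motifs.

```python
import re
from typing import List, Tuple


def find_cg_motifs(seq: str) -> List[int]:
    """Return 0-based start positions of every CG dinucleotide; each motif spans [pos, pos + 2)."""
    # CG cannot overlap itself, so non-overlapping regex matches find every motif.
    return [m.start() for m in re.finditer("CG", seq.upper())]


def count_overlapping_motifs(motifs: List[int], features: List[Tuple[int, int]]) -> int:
    """Count motifs sharing at least one base with any half-open feature interval.

    Each motif is counted once even if it overlaps several features.
    """
    return sum(
        1
        for pos in motifs
        if any(pos < end and pos + 2 > start for start, end in features)
    )


if __name__ == "__main__":
    seq = "ACGTTCGAACGCG"          # hypothetical toy sequence
    features = [(0, 3), (10, 12)]  # hypothetical feature track
    motifs = find_cg_motifs(seq)
    print("CG motifs at:", motifs)
    print("motifs overlapping features:", count_overlapping_motifs(motifs, features))
```

Here the motifs start at positions 1, 5, 9, and 11, and three of them overlap the toy features. The motif at position 9 counts because it shares one base with the feature starting at 10.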
Methylation landscape
In this Jupyter notebook, I created union bedGraphs for males and females separately. I kept my code for the all-sample union bedGraph since I can’t remember if Katherine used it for any of her analyses. Once I had the union bedGraphs, I counted the number of highly, moderately, and lowly methylated CpGs in each sample. I also counted the CpGs present in each genomic feature. I then wrote this R Markdown script to create figures and perform chi-squared tests comparing the distribution of all CpGs in the C. virginica genome across genomic features with the distribution of highly methylated CpGs. As expected, the distributions were significantly different. All output can be found in this gannet folder, and the relevant count files, statistical output, and figures can be found on GitHub.
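A union bedGraph lines up every CpG observed in any sample and leaves a gap wherever a sample has no data. Methylation categories are then counted per sample. The sketch below assumes each sample's bedGraph has already been filtered to CpGs with at least 10x coverage and parsed into a dictionary of percent methylation (0 to 100). The cutoffs are assumed for illustration rather than taken from this post: ≥50% for highly methylated, >10% to <50% for moderately methylated, and ≤10% for lowly methylated. The function names are hypothetical. At genome scale, `bedtools unionbedg` handles the merging step. Running the sketch once per sex mirrors building separate female and male union bedGraphs.

```python
from typing import Dict, List, Optional, Tuple

CpG = Tuple[str, int]  # (chromosome, 0-based start)


def union_bedgraph(
    samples: Dict[str, Dict[CpG, float]]
) -> Tuple[List[str], Dict[CpG, List[Optional[float]]]]:
    """Combine per-sample CpG calls; None marks a sample with no data at that CpG."""
    names = sorted(samples)
    all_cpgs = sorted(set().union(*(calls.keys() for calls in samples.values())))
    table = {cpg: [samples[name].get(cpg) for name in names] for cpg in all_cpgs}
    return names, table


def classify(percent: float, high: float = 50.0, low: float = 10.0) -> str:
    """Assign a methylation category using assumed cutoffs (percent methylation, 0-100)."""
    if percent >= high:
        return "high"
    if percent > low:
        return "moderate"
    return "low"


def count_categories(
    names: List[str], table: Dict[CpG, List[Optional[float]]]
) -> Dict[str, Dict[str, int]]:
    """Count highly, moderately, and lowly methylated CpGs in each sample."""
    counts = {name: {"high": 0, "moderate": 0, "low": 0} for name in names}
    for values in table.values():
        for name, value in zip(names, values):
            if value is not None:  # skip CpGs this sample has no data for
                counts[name][classify(value)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy data for two female samples.
    female = {
        "F1": {("chr1", 10): 80.0, ("chr1", 20): 5.0},
        "F2": {("chr1", 10): 30.0, ("chr1", 35): 10.0},
    }
    names, table = union_bedgraph(female)
    for cpg, values in table.items():
        print(cpg, values)
    print(count_categories(names, table))
```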
Figures 1-3. Genome feature overlaps for all 10x CpGs with data in at least one sample, and for highly, moderately, and lowly methylated CpGs, in the female and male union bedGraphs
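The chi-squared comparison can be framed as a test of independence on a contingency table. Each row is a CpG set (all CpGs versus highly methylated CpGs), and each column is a genomic feature. This is a stdlib-only Python sketch, not the R Markdown script itself. Whether that script used a contingency-table test or a goodness-of-fit test is an assumption here, and the function names are hypothetical. Unlike R's `chisq.test()`, this version never applies a continuity correction; R applies Yates' correction only to 2 × 2 tables. The p-value comes from the closed-form chi-squared survival function for integer degrees of freedom.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared distribution with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-style sum.
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 case (erfc term) and add the recurrence terms.
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(y ** (i + 0.5) / math.gamma(i + 1.5) for i in range((df - 1) // 2))
    return tail


def chi_squared_test(table: List[List[float]]) -> Tuple[float, int, float]:
    """Pearson chi-squared test of independence; returns (statistic, df, p-value).

    Assumes no row or column of the table sums to zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df)


if __name__ == "__main__":
    # Hypothetical toy counts (not CEABiGR data); columns: genes, intergenic, TE.
    counts = [
        [500, 300, 200],  # all CpGs
        [700, 100, 200],  # highly methylated CpGs
    ]
    stat, df, p = chi_squared_test(counts)
    print(f"X-squared = {stat:.2f}, df = {df}, p-value = {p:.3g}")
```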
Going forward
- Update foundation methods
- Update foundation results
- Revise with new expression data from Ariana
- Tune sPLS parameters
- Run sex-specific sPLS
- Identify drivers