Killifish Hypoxia RRBS Part 22
Global methylation differences by population
Previously, I showed that methylation levels are lower in Scorton Creek killifish. I wanted to revisit this analysis to 1) make sure I used appropriate methods and 2) discern why there was less methylation in the Scorton Creek fish.
Revising methods
I returned to this Jupyter notebook for the analysis. One thing I noticed was that the number of CpGs with 10x coverage (5,413,381) was much greater than the number of CpGs in the output file for all populations (14,896). I think this is because BAT_summarize only uses data for CpGs with coverage across all samples. I want to see if global methylation estimates differ when I allow for missing data.
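To make the suspected discrepancy concrete, here is a small Python sketch contrasting CpGs covered in every sample (what I think BAT_summarize keeps) with CpGs covered in at least one sample (what a union keeps). The sample sets are made-up toy values and the helper names are hypothetical; this is not BAT's code.

```python
# Hypothetical per-sample CpG sets: (chromosome, start) keys for CpGs with >=10x coverage.
# These toy values are illustrative only, not data from this analysis.
samples = {
    "N_5-N1": {("chr1", 100), ("chr1", 250), ("chr2", 40)},
    "N_5-N2": {("chr1", 100), ("chr2", 40), ("chr3", 7)},
    "S_5-S1": {("chr1", 100), ("chr2", 40)},
}


def cpgs_in_all(sample_sets):
    """CpGs covered in every sample (complete-case, as assumed for BAT_summarize)."""
    return set.intersection(*sample_sets.values())


def cpgs_in_any(sample_sets):
    """CpGs covered in at least one sample (what a union bedGraph keeps)."""
    return set.union(*sample_sets.values())


common = cpgs_in_all(samples)
union = cpgs_in_any(samples)
print(len(common), len(union))  # 2 4
```

With many samples, a single sample lacking coverage is enough to drop a CpG from the intersection, which is one way the common set could shrink to a small fraction of the covered CpGs.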
To do this, I first created a union bedGraph:
```
# Create a union bedGraph
# Use N/A when there is no data for a CpG in a sample
# Define sample IDs (-names must list one name per file, in the same order as the files matched by -i)
# Use sorted bedGraphs
!{bedtoolsDirectory}unionBedGraphs \
-header \
-filler N/A \
-names N_20-N4 N_5-N1 N_5-N2 N_20-N2 N_5-N3 N_20-N1 N_OC-N5 N_OC-N1 N_OC-N2 N_OC-N4 S_20-S1 S_20-S3 S_20-S4 S_5-S3 S_5-S4 S_5-S2 S_20-S2 S_5-S1 S_OC-S1 S_OC-S2 S_OC-S3 S_OC-S5 \
-i ../../04-calling/filtered/*sort.bedgraph \
> union_10x.bedgraph
# Count the number of lines (CpGs) with data; the -header line adds one extra line
!wc -l union_10x.bedgraph
```
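For downstream calculations, the union file needs to be read with N/A treated as missing rather than as a value. A minimal standard-library sketch, assuming a tab-delimited header of chrom, start, end followed by the sample names, and that every later column holds a methylation value; `read_union_bedgraph` is a hypothetical helper and the toy input is made up.

```python
import csv
import io


def read_union_bedgraph(handle, missing="N/A"):
    """Parse a union bedGraph with a header line; missing values become None."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # chrom, start, end, then one name per sample (assumed)
    names = header[3:]
    rows = []
    for fields in reader:
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        values = [None if v == missing else float(v) for v in fields[3:]]
        rows.append((chrom, start, end, dict(zip(names, values))))
    return names, rows


# Toy input standing in for union_10x.bedgraph (values are percent methylation)
toy = io.StringIO(
    "chrom\tstart\tend\tN_5-N1\tS_5-S1\n"
    "chr1\t100\t102\t80.0\tN/A\n"
    "chr1\t250\t252\t0.0\t10.0\n"
)
names, rows = read_union_bedgraph(toy)
print(len(rows))  # 2 CpGs; the header line is not counted
```

A raw `wc -l` count of a file written with `-header` includes the header line, so it is one more than the number of CpGs.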
Of 5,413,382 CpGs, 4,339,834 had non-zero methylation. I used pandas to calculate the average percent methylation at each locus, then the overall average across all loci. Averaged across all loci, methylation was 58.9%. This is much higher than the 20.7% average methylation calculated using only CpGs with data in all samples! I don't know how valid the 58.9% figure is, since many loci only had coverage in one sample, so I'll stick with the 20.7% figure.
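The two averages being compared can be sketched in plain Python (the notebook itself used pandas). This is an illustration under assumptions: toy values, hypothetical names, `None` standing in for N/A, and "non-zero" taken to mean a per-locus mean above 0. It shows how a single-sample locus with a high value can pull the union average up relative to the complete-case average.

```python
from statistics import mean

# Hypothetical per-CpG values across three samples; None marks N/A (no coverage).
loci = {
    ("chr1", 100): [80.0, None, None],  # covered in a single sample
    ("chr1", 250): [0.0, 10.0, 20.0],
    ("chr2", 40): [30.0, 30.0, 30.0],
}


def locus_mean(values):
    """Average methylation at one CpG, skipping samples without data."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else None


# Union approach: every CpG with data in at least one sample
per_locus = {k: locus_mean(v) for k, v in loci.items()}
overall_any = mean(m for m in per_locus.values() if m is not None)
nonzero = sum(1 for m in per_locus.values() if m is not None and m > 0)

# Complete-case approach: only CpGs with data in every sample
complete = [locus_mean(v) for v in loci.values() if None not in v]
overall_complete = mean(complete)
print(round(overall_any, 2), round(overall_complete, 2), nonzero)  # 40.0 20.0 3
```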
I then revised the percent methylation calculations for each population. Instead of using only CpGs with data in all samples, I used CpGs with data in all samples from a given population, either New Bedford (NBH) or Scorton Creek (SC).
Table 1. Revised methylation landscape information calculated using all common CpGs for each specific contrast.
Contrast | Methylated CpGs, n (%) | Unmethylated CpGs, n (%) | Average methylation |
---|---|---|---|
All samples | 7275 (48.8%) | 7620 (51.2%) | 20.7% |
All NBH | 116821 (66.3%) | 59465 (33.7%) | 21.2% |
All SC | 57338 (69.7%) | 24967 (30.3%) | 16.3% |
Hypoxic NBH | 96857 (54.9%) | 79429 (45.1%) | 22.1% |
Normoxic NBH | 93895 (53.3%) | 82391 (46.7%) | 20.4% |
Hypoxic SC | 22694 (27.6%) | 59611 (72.4%) | 16.4% |
Normoxic SC | 45661 (55.5%) | 36644 (44.5%) | 16.2% |
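A per-contrast summary like the rows in Table 1 could be sketched as below. This is not the workflow that produced the table and does not reproduce its numbers. Assumptions: CpGs are kept only if every selected sample has data, a CpG counts as "methylated" when its mean exceeds a `threshold` of 0 (the actual cutoff is not stated here), and `summarize_contrast` and the toy data are hypothetical.

```python
from statistics import mean


def summarize_contrast(loci, columns, threshold=0.0):
    """Summarize CpGs with data in every sample listed in `columns`.

    `threshold` is an assumed cutoff: a CpG counts as methylated when its
    mean methylation across the selected samples is above it.
    """
    means = []
    for values in loci.values():
        subset = [values[c] for c in columns]
        if None in subset:
            continue  # drop CpGs missing in any selected sample
        means.append(mean(subset))
    if not means:
        raise ValueError("no CpGs with complete data for these samples")
    n = len(means)
    methylated = sum(1 for m in means if m > threshold)
    return {
        "methylated": methylated,
        "unmethylated": n - methylated,
        "pct_methylated": 100 * methylated / n,
        "average": mean(means),
    }


# Hypothetical sample-to-value mapping per CpG (None = N/A)
loci = {
    ("chr1", 100): {"N_5-N1": 0.0, "N_5-N2": 0.0, "S_5-S1": None},
    ("chr1", 250): {"N_5-N1": 40.0, "N_5-N2": 60.0, "S_5-S1": 5.0},
    ("chr2", 40): {"N_5-N1": 10.0, "N_5-N2": None, "S_5-S1": 0.0},
}
print(summarize_contrast(loci, ["N_5-N1", "N_5-N2"]))
```

Because each contrast keeps its own set of complete CpGs, the denominators differ between rows, which is why the population rows in Table 1 cover far more CpGs than the all-samples row.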
Methylation differences
Based on this table, I think it's interesting that there are fewer methylated CpGs in hypoxic Scorton Creek samples (27.6% vs. 55.5% in normoxic samples), but the average methylation stays about the same (16.4% vs. 16.2%). Since fewer methylated CpGs would otherwise pull the average down, this makes me think that the CpGs that are methylated in hypoxic conditions are more highly methylated. I wonder if CpGs with intermediate methylation in normoxia are the ones that are more dynamic in hypoxic conditions.
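One way to probe the intermediate-methylation idea is to bin CpGs by their normoxic methylation and compare how much each bin changes under hypoxia. A hedged sketch: the 20% and 80% cutoffs are assumptions, the group means are toy values, and the function names are hypothetical.

```python
from statistics import mean


def bin_label(value, low=20.0, high=80.0):
    """Assign a CpG to a methylation class; the cutoffs are assumptions."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "intermediate"


def mean_abs_change_by_bin(normoxic, hypoxic, low=20.0, high=80.0):
    """Mean |hypoxic - normoxic| per normoxic methylation class.

    `normoxic` and `hypoxic` map CpG keys to mean percent methylation.
    Only CpGs present in both dictionaries are compared.
    """
    changes = {}
    for cpg in normoxic.keys() & hypoxic.keys():
        label = bin_label(normoxic[cpg], low, high)
        changes.setdefault(label, []).append(abs(hypoxic[cpg] - normoxic[cpg]))
    return {label: mean(diffs) for label, diffs in changes.items()}


# Hypothetical group means (not real data)
normoxic = {"a": 5.0, "b": 50.0, "c": 45.0, "d": 95.0}
hypoxic = {"a": 6.0, "b": 90.0, "c": 5.0, "d": 94.0}
print(mean_abs_change_by_bin(normoxic, hypoxic))
```

Using the absolute difference measures how dynamic a bin is regardless of direction; the signed difference (hypoxic minus normoxic) is what a methylation-difference density plot would show.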
I decided to plot methylation differences by population and treatment in this R Markdown script. I made density plots showing percent methylation and methylation difference for each population. I used scale_color_manual to add a legend based on advice from this post, and I used cowplot to create an inset based on this post.
Figure 1. Density plot of percent methylation for each population x treatment combination
Figure 2. Density plot of methylation difference between treatments for each population
I then thought it would be good to have a circos plot showing methylation across some chromosomes for each population and treatment. I picked the two chromosomes with the most methylated positions for each population and used this link to learn how to use circlize and reformat the plot. However, I ran into an issue where some points were plotted outside of the sector for their chromosome. I looked at the circlize manual, an old GitHub issue, and another issue, but couldn't find a solution that worked for my data, so I posted my own issue. I was at least able to start creating a circos plot, and hopefully my issue gets addressed soon!
Figure 3. Circos plot of DNA methylation at four different chromosomes.
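The chromosome selection step, plus one sanity check worth running before plotting, can be sketched in Python. Assumptions: the input is a list of (chromosome, position) pairs already classified as methylated, and `sector_limits` stands in for whatever start/end each circos sector was initialized with. Points outside those limits are one possible cause of out-of-sector plotting, not a confirmed diagnosis of the circlize issue. All names and values are hypothetical.

```python
from collections import Counter


def top_chromosomes(methylated_cpgs, k=2):
    """Return the k chromosomes with the most methylated CpG positions."""
    counts = Counter(chrom for chrom, _ in methylated_cpgs)
    return [chrom for chrom, _ in counts.most_common(k)]


def out_of_range(cpgs, sector_limits):
    """List CpGs whose position falls outside its chromosome's (start, end)."""
    bad = []
    for chrom, pos in cpgs:
        start, end = sector_limits[chrom]
        if not start <= pos <= end:
            bad.append((chrom, pos))
    return bad


# Hypothetical methylated positions for one population
nbh = [("chr1", 100), ("chr1", 250), ("chr2", 40),
       ("chr3", 7), ("chr3", 90), ("chr3", 120)]
top = top_chromosomes(nbh)
print(top)  # ['chr3', 'chr1']

selected = [(c, p) for c, p in nbh if c in top]
limits = {"chr1": (0, 200), "chr3": (0, 100)}  # hypothetical sector ranges
print(out_of_range(selected, limits))  # [('chr1', 250), ('chr3', 120)]
```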
Going forward
- Conduct pathway analysis for RNA-Seq data by population
- Identify known SNP/DMR overlaps
- Update methods and results
- Figure out why there is such low methylation with the new genome
- Continue BAT workflow with new genome
- Create OSF repository for all intermediate files