MethCompare Part 18
Formatting individual sample information
In this issue we discussed how to revise our statistical analyses of CpG methylation status and genomic location. We want to compare proportions to test whether sequencing method affects the proportion of CpGs in each methylation status or genomic location. I posted some suggestions, but in the meantime I thought I could obtain individual-level proportion data.
Thankfully for me, most of the pipeline was already set up! In this Jupyter notebook I counted CpGs for each methylation status and in various genomic features. The only things I needed to modify were ensuring I used bedtools -u, adding code for upstream and downstream flank overlaps, and adding the path to the explicit intragenic region tracks. I took the output files (line counts) and used them in this R Markdown script to create summary tables:
M. capitata:
- Methylation status table
- Methylation status exploratory figure
- Genomic location counts
- Genomic location percents
P. acuta:
- Methylation status table
- Methylation status exploratory figure
- Genomic location counts
- Genomic location percents
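Going from line counts to the percent tables above comes down to dividing each category's count by the sample total. Here is a minimal Python sketch of that step, not the actual R Markdown code: the function name, the category labels, and the counts are all made up for illustration.

```python
from typing import Dict


def counts_to_percents(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-category CpG counts for one sample into percentages."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when a sample has no counted CpGs
        raise ValueError("no CpGs counted for this sample")
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical line counts for a single sample (not real data);
# the category names are placeholders for the methylation statuses.
example_counts = {
    "methylated": 150,
    "sparsely_methylated": 250,
    "unmethylated": 600,
}

example_percents = counts_to_percents(example_counts)

for category, percent in example_percents.items():
    print(f"{category}\t{percent:.1f}")
```

The same function would work for genomic location counts, as long as each CpG is counted in exactly one category so that the percentages sum to 100.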
Going forward
- Conduct statistical analysis
- Locate TE tracks
- Characterize intersections between data and TE, and create summary tables
- Look into program for mCpG identification