DML Analysis
Finding overlaps between DMLs and other things
In this Jupyter notebook, I used intersectBed from bedtools to find overlaps between the DMLs I identified in methylKit and several genome feature tracks. I found the Genome Feature Tracks on the Genomic Resources wiki page. There were four feature tracks I could use:
- Exons: The coding regions
- Introns: Regions that are spliced out of the transcript
- mRNA: The transcripts that are translated into proteins!
- CG motifs: Regions with CGs where methylation can occur
I used the following command to count the number of overlaps between my DMLs and the various feature tracks:
# -u: report each DML once if it overlaps at least one feature; -a: path to DML bed file
# -b: path to feature track ({} is a placeholder); wc -l counts the lines
! /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
-u \
-a ../2018-05-29-MethylKit-Full-Samples/2018-05-30-DML-Locations.bed \
-b {} \
| wc -l
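To make the counting logic concrete, here is a minimal pure-Python sketch of what -u followed by wc -l reports: each DML is counted once if it overlaps at least one feature. The function names and the toy intervals are made up for illustration; they are not my real data or anything from bedtools itself.

```python
def overlaps(a, b):
    """True if two BED intervals (chrom, start, end) share at least one base.

    BED coordinates are 0-based and half-open, so end is exclusive.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_with_any_overlap(dmls, features):
    """Count DMLs that overlap at least one feature, each DML counted once."""
    return sum(1 for d in dmls if any(overlaps(d, f) for f in features))


# Hypothetical toy intervals (not real data)
dmls = [("scaffold1", 100, 101), ("scaffold1", 500, 501), ("scaffold2", 50, 51)]
exons = [("scaffold1", 90, 200), ("scaffold1", 95, 150), ("scaffold2", 60, 80)]

n_overlapping = count_with_any_overlap(dmls, exons)
print(n_overlapping)
```

In this toy example the first DML sits inside two exons but is still counted only once, which is the behavior -u is meant to give.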
I then used a similar command to write the overlaps out to a .txt file:
# -wb: write the entry information from File B; -a: path to DML bed file
# -b: path to feature track ({} is a placeholder); > creates the new .txt file
! /Users/Shared/bioinformatics/bedtools2/bin/intersectBed \
-wb \
-a ../2018-05-29-MethylKit-Full-Samples/2018-05-30-DML-Locations.bed \
-b {} \
> 2018-06-11-DML-{}.txt
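As a rough illustration of what ends up in those .txt files, the sketch below mimics -wb in plain Python: one row per overlapping pair, with the overlapping part of the DML followed by the full feature entry. The function name and the intervals are hypothetical, and real bedtools output is tab-delimited text rather than tuples.

```python
def intersect_wb(dmls, features):
    """Return one row per (DML, feature) overlap: shared interval + feature entry."""
    rows = []
    for chrom, a_start, a_end in dmls:
        for feature in features:
            f_chrom, f_start, f_end = feature
            # Half-open BED intervals overlap when each starts before the other ends
            if chrom == f_chrom and a_start < f_end and f_start < a_end:
                shared = (chrom, max(a_start, f_start), min(a_end, f_end))
                rows.append(shared + feature)
    return rows


# Hypothetical toy intervals (not real data)
dmls = [("scaffold1", 100, 101), ("scaffold1", 500, 501)]
exons = [("scaffold1", 90, 200), ("scaffold1", 95, 150)]

rows = intersect_wb(dmls, exons)
for row in rows:
    print("\t".join(str(field) for field in row))
```

Because a row is written for every overlapping pair, a single DML that falls inside two features produces two rows, so the line count of a -wb file can be larger than a count made with -u.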
I also found and counted overlaps between CG motifs and exons, introns, and mRNAs! I did this because if most of the CG motifs are in the exon regions, then we’d expect to find most of the methylation in the exons. All of my .txt files that weren’t too big for GitHub can be found in this folder.
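One way to use the CG motif counts as a baseline is to compare the fraction of DMLs in a feature with the fraction of CG motifs in that same feature. The sketch below is only an illustration of that arithmetic: the function name is hypothetical and every count in it is a made-up placeholder, not a result from my analysis.

```python
def enrichment(dml_in_feature, dml_total, cg_in_feature, cg_total):
    """Ratio of the DML fraction in a feature to the CG motif fraction in it.

    A value near 1 means DMLs fall in the feature about as often as CG motifs do.
    """
    dml_fraction = dml_in_feature / dml_total
    cg_fraction = cg_in_feature / cg_total
    return dml_fraction / cg_fraction


# Made-up placeholder counts, for illustration only
ratio = enrichment(dml_in_feature=50, dml_total=100,
                   cg_in_feature=250, cg_total=1000)
print(ratio)
```

With these placeholder numbers, half of the DMLs but only a quarter of the CG motifs are in the feature, so the ratio comes out to 2.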
One thing I noticed is that I found more overlaps than Steven did for all categories. My guess is that the default score_min option I used led to more overlaps. It will be interesting to look into the differences. Another important thing to mention is that there are more exon overlaps than mRNA overlaps. This does not make sense to me, since exons are the parts of a gene that make up the mRNA. I’ll post an issue with this question.
I think my next step has to do with gene ontology, but I’m not entirely sure what that entails. Guess I’ll post another issue.