CEABiGR

Genomic locations of DML

I did a first pass on DML genomic location in this notebook a while back, but at the end of a CEABiGR meeting I realized I needed to remove SNPs from the DML. To do this, I first redownloaded the BS-SNPer output from gannet in this notebook. In this notebook, I then concatenated all C->T SNPs and identified unique SNPs to create a C->T SNP genome feature track. I used subtractBed to remove C->T SNPs from the putative female and male DML in this notebook, then determined the genomic locations of the remaining DML. I used the DML genome location information to run chi-squared tests and determine whether the number of DML in each genome feature differed significantly from the distribution of 10x CpGs.
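The SNP-removal logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the notebook code: the actual analysis used subtractBed on BED files, and the record layout, function names, and chromosome names below are made up for the example.

```python
# Toy stand-in for the concatenate -> unique -> subtract steps.
# All records are HYPOTHETICAL; the real inputs were BS-SNPer output and DML BED files.

def unique_ct_snps(per_sample_snps):
    """Pool SNP calls from all samples and keep unique C->T positions."""
    unique = set()
    for sample in per_sample_snps:
        for chrom, pos, ref, alt in sample:
            if ref == "C" and alt == "T":
                unique.add((chrom, pos))
    return unique


def remove_snps(dml, snps):
    """Drop every DML whose (chrom, pos) coincides with a C->T SNP."""
    return [locus for locus in dml if (locus[0], locus[1]) not in snps]


# Two hypothetical samples; the chr1:100 SNP is shared and should be counted once.
samples = [
    [("chr1", 100, "C", "T"), ("chr1", 250, "A", "G")],
    [("chr1", 100, "C", "T"), ("chr2", 40, "C", "T")],
]
# Hypothetical putative DML positions.
dml = [("chr1", 100), ("chr1", 300), ("chr2", 40), ("chr2", 75)]

snps = unique_ct_snps(samples)
filtered = remove_snps(dml, snps)
print(filtered)  # [('chr1', 300), ('chr2', 75)]
```

Since each DML is a single CpG position, dropping exact position matches has the same effect on this toy data as subtracting the SNP track from the DML track.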

Table 1. DML-feature overlaps and associated chi-squared P-values

| Feature | Female DML | Female DML Chi-Squared P-value | Male DML | Male DML Chi-Squared P-value |
|---|---|---|---|---|
| Total | 89 | N/A | 2916 | N/A |
| Gene | 74 (83.1%) | N/A | 2322 (79.6%) | N/A |
| Exon UTR | 8 (9.0%) | 0.106115408445601 | 190 (6.5%) | 4.93391020603575e-08 |
| CDS | 29 (32.6%) | 2.13354244672506e-08 | 815 (27.9%) | 1.96810864720674e-131 |
| Intron | 37 (41.6%) | 0.607674107438834 | 1332 (45.7%) | 1.15727576627379e-15 |
| Upstream Flanks | 4 (4.5%) | 0.999999999982003 | 60 (2.1%) | 2.01744854185524e-12 |
| Downstream Flanks | 4 (4.5%) | 0.999999999982003 | 163 (5.6%) | 0.00297688587400269 |
| lncRNA | 1 (1.1%) | 0.999999999982003 | 52 (1.8%) | 0.772437558180899 |
| TE | 1 (1.1%) | 0.49408763658884 | 144 (4.9%) | 0.186665654830967 |
| Intergenic | 7 (7.9%) | 6.03052947450674e-07 | 279 (9.6%) | 7.93581496923608e-147 |

It’s interesting that for females, only the numbers of DML in CDS and intergenic regions differed significantly from 10x CpGs, while for males, the numbers of DML in exon UTR, CDS, introns, upstream flanks, downstream flanks, and intergenic regions all differed significantly from 10x CpGs.
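For reference, here is a rough sketch of what one of these per-feature comparisons might look like as a 2x2 contingency test (DML in vs. out of a feature, against 10x CpGs in vs. out of that feature). It assumes a Pearson chi-squared test with one degree of freedom and no continuity correction, which may not match the settings or any multiple-testing adjustment used in the notebook. The 10x CpG background counts are placeholders, not the real values; only the female CDS counts (29 of 89) come from Table 1.

```python
import math


def chi_squared_2x2(dml_in, dml_out, bg_in, bg_out):
    """Pearson chi-squared test (1 df, no continuity correction) on a 2x2 table.

    Rows: DML vs. background CpGs. Columns: inside vs. outside the feature.
    Assumes no row or column total is zero.
    """
    n = dml_in + dml_out + bg_in + bg_out
    row_dml, row_bg = dml_in + dml_out, bg_in + bg_out
    col_in, col_out = dml_in + bg_in, dml_out + bg_out
    chi2 = n * (dml_in * bg_out - dml_out * bg_in) ** 2 / (
        row_dml * row_bg * col_in * col_out
    )
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Female DML in CDS, from Table 1.
female_cds, female_total = 29, 89
# HYPOTHETICAL background: 10x CpGs inside / outside CDS (placeholder numbers).
bg_cds, bg_other = 1000, 9000

chi2, p = chi_squared_2x2(female_cds, female_total - female_cds, bg_cds, bg_other)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```

With real background counts in place of the placeholders, the same function could be run once per feature and sex.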

Going forward

  1. Update methods and results
  2. Quantify methylation level of each gene for various treatment:sex combinations
  3. Read gene expression variability papers
  4. Continue with gene expression variability analysis
  5. Tune sPLS parameters
  6. Run sex-specific sPLS
  7. Identify drivers
  8. Revise with new expression data from Ariana
Written on July 18, 2022