Killifish Hypoxia RRBS Part 28
DMR-DEG overlaps
Picking up again! The next thing I want to do is examine overlaps and correlations between DMR and DEG. I’ll start by using the output from closestBed.
But first, a sob story. I tried to use bash to join two files and it went horribly so instead I created this R Markdown script. I used to be so much better at bash. Sigh.
The first thing I did was import the DMR and DEG data. A couple of notes:
- There were no DMR identified in any Scorton Creek contrast, so I’m only looking at New Bedford Harbor DMR between different oxygen conditions
- Neel did gene expression analysis with the hypoxia and normoxia conditions vs. outside control for both populations, but he didn’t send similar edgeR output looking at DEG between hypoxia and normoxia. Since I have DMR for this contrast in New Bedford Harbor, it may be nice to still get that DEG information.
As I imported the data, I made some minor changes to formatting:
library(dplyr) #Provides %>%, mutate(), select(), and inner_join()
library(tidyr) #Provides separate_wider_delim()
NBH20v0CDMR <- read.delim("20_OC_N_DMR.closestGene.bed", header = FALSE,
                          col.names = c("chr", "DMR.start", "DMR.end", "DMR", "direction", "gene.chr", "Gnomon", "gene", "gene.start", "gene.end", "V11", "strands", "V13", "V14", "distance")) %>%
  separate_wider_delim(., cols = "V14", delim = ";Dbxref", names = c("gene.name", "trash")) %>%
  mutate(., gene.name = gsub(x = gene.name, pattern = "ID=gene-", replacement = "")) %>%
  dplyr::select(chr, DMR.start, DMR.end, DMR, direction, gene.name, gene.start, gene.end, strands, distance)
#Import DMR data and add column names. Separate column with gene name by a specific delimiter, then remove characters before gene name. Retain only necessary columns.
NBH20vOCDEG <- read.csv("../../../data/NEW-n.NBHvso.NBH.csv") #Import DEG data
colnames(NBH20vOCDEG)[1] <- "gene.name"
head(NBH20vOCDEG)
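One thing to keep in mind with the gene name cleanup above: by default, separate_wider_delim stops with an error if any row has too few or too many pieces after splitting on the delimiter. As a minimal sketch of an alternative, the same cleanup can be done with two regular expressions in base R. The attribute strings below are made up for illustration and are not from the real BED file, and this version assumes the ID field always comes first in the attribute column.

```r
# Toy attribute strings (made up, not from the real closestBed output)
attrs <- c("ID=gene-abc1;Dbxref=GeneID:1;Name=abc1;gbkey=Gene",
           "ID=gene-LOC0002;Dbxref=GeneID:2;Name=LOC0002",
           "ID=gene-xyz3;Name=xyz3") # No ";Dbxref" in this one

# Drop everything from the first ";" onward, then strip the "ID=gene-" prefix
gene.name <- sub(";.*$", "", attrs)
gene.name <- sub("^ID=gene-", "", gene.name)
gene.name
```

Unlike splitting on ";Dbxref", this cuts at the first semicolon, so it would only give the same result when the ID field is immediately followed by the rest of the attributes.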
I then used inner_join to identify if there was accompanying edgeR output for my genes of interest:
NBH20v0CDMRDEG <- inner_join(NBH20v0CDMR, NBH20vOCDEG, by = "gene.name") %>%
  mutate(contrast = "20_OC_N") #Use inner join to keep DMR whose closest gene has edgeR output, then label the contrast.
In total, I identified 16 genes across two contrasts that had accompanying edgeR output. Each contrast had one gene that contained a DMR. The remaining genes are from the New Bedford Harbor 5 vs. outside control contrast. While they came up as being the closest gene to a DMR, they do not overlap with the DMR. None of these 16 genes are differentially expressed (or even close). The output can be found here.
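The distinction between "closest gene" and "gene containing a DMR" can be read off the distance column, since closestBed reports a distance of 0 for overlapping features. Here is a rough sketch of how both checks could be flagged in one table. The data frame is a made-up stand-in for the joined table, the logFC and FDR column names assume standard edgeR topTags output (the actual CSV columns may differ), and the 0.05 cutoff is a hypothetical threshold.

```r
# Toy stand-in for the joined DMR-DEG table (all values made up)
toy <- data.frame(
  gene.name = c("geneA", "geneB", "geneC"),
  distance  = c(0, 1520, -340), # closestBed distance; 0 means the DMR overlaps the gene
  logFC     = c(0.2, -0.1, 1.4),
  FDR       = c(0.91, 0.78, 0.03)
)

fdr.cutoff <- 0.05 # Hypothetical significance threshold

toy$overlaps.DMR <- toy$distance == 0 # TRUE when the gene contains the DMR
toy$is.DEG <- toy$FDR < fdr.cutoff # TRUE when the gene passes the FDR cutoff

# Cross-tabulate overlap status against DEG status
table(overlap = toy$overlaps.DMR, DEG = toy$is.DEG)
```

Negative distances would only show up if closestBed was run with signed distances, so the check against 0 works either way.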
Going forward
- Update methods and results
- Try BAT_correlating
- Create DMR figure
- Identify known SNP/DMR overlaps
- Conduct pathway analysis for RNA-Seq data by population
- Update OSF repository for all intermediate files