Killifish Hypoxia RRBS Part 23
Pathway analysis for RNA-Seq data
I’m continuing to dig into the RNA-Seq data so I can understand the results and create figures for the manuscript! For each population, I’m looking at differences between normoxia and hypoxia.
First things first, I wanted to visualize the DEG! Unfortunately, I couldn’t find a file with logCPM information for all the samples. The closest I got was one with 5 samples from each population. I figured this was good enough to mess with
pheatmap code. I created output pdfs just to ensure the code worked, but it’s not worth looking at since I’m pretty sure it’s just samples for hypoxia for each population.
The first thing I wanted to do was determine if there were any common DEG between New Bedford Harbor and Scorton Creek. In this R Markdown document, I used
inner_join to get common gene IDs and logFC for both contrasts. I saved the list of common DEG here and created a heatmap to visualize if there were common DEG that were differentially expressed in opposite directions.
Figure 1. Heatmap of DEG common to both New Bedford Harbor and Scorton Creek fish exposed to hypoxia. Higher logFC indicates higher expression in fish exposed to hypoxia.
There aren’t any genes differentially expressed in opposite directions. Based on the colors in the heatmap, I picked genes with the largest differences in FC:
- Funhe2EKm021334: NADH dehydrogenase subunit
- Funhe2EKm006719: RING finger protein
- Funhe2EKm002577: Class E basic helix loop helix protein 40
- Funhe2EKm039163: neuronal cell death-inducible putative kinase
- Funhe2EKm027081: uncharacterized protein
Before launching into a formal pathway analysis, I wanted to see if any genes from the HIF (related to treatment) or AHR (where the mutations that separate NBH and SC occured) pathways were differentially expressed.
Within the Scorton Creek population, I saw differential expression of HIF-PH1 (Funhe2EKm035892, upregulated) and HIF-PH4 (Funhe2EKm035088, downregulated), p300/CBP-associated factor (Funhe2EKm029335, downregulated), and HIF-1 responsive proteins (Funhe2EKm038583, upregulated). Prolyl hydroxylases are responsible for the first step of HIF degradation and p300/CBP-associated factor is a transcriptional co-activator, indicating regulation of the HIF pathway in low oxygen conditions for sensitive fish. Interestingly, no genes in the HIF pathway were differentially expressed in the tolerant New Bedford Harbor fish.
AHR is important for toxicant response, but also for hypoxia signaling! So I looked into the genes in the AHR pathway to see if there would be any differences between tolerant and sensitive fish. HSP90 (Funhe2EKm038380), CYP1A (Funhe2EKm038981), and TIPARP (Funhe2EKm033539) were all upregulated in sensitive fish exposed to low oxygen. HSP90 is a molecular chaperone that stabilizes HIF1A, and there’s some indication that it’s involved in regulating cell proliferation during hypoxia. TIPARP is important for turning off hypoxia-inducible factor. Again, no genes in the AHR pathway were DEG for New Bedford Harbor fish.
Setting up a pathway analysis
The next step was to set up a pathway analysis. I found the
rWikiPathways package which not only does a gene ontology analysis, but a pathway analysis! Since it requires Ensembl IDs and the killifish genome has those IDs, I thought it would be a good package to try. My first hurdle was installing the packages . The way the documentation is laid out has
rWikiPathways installed prior to
GSEABase, but there are some dependencies for
rWikiPathways block installation of
GSEABase. So I installed
GSEABase separately first, then went through the rest of the installation code in the manual.
The second hurdle (and where I’m currently stuck) is mapping the transcript IDs from RNA-Seq data to Ensembl gene IDs. My RNA-Seq data has transcript IDs that start with “Funhe2EKm”, but the genome files have Ensembl IDs that do not map to those transcript IDs! Neel suggested I use biomart to get Ensembl IDs. However, this tool would give me the Ensembl IDs I already have without a way to map them to the transcripts! I wondered if these IDs were created by STAR when Neel did the mapping. I looked in the Dropbox folder to see if I could find a mapping file, but no luck. I’ll email Neel when he gets back from vacation.
- Map STAR transcript names to Ensembl IDs from genome
- Conduct pathway analysis for RNA-Seq data by population
- Identify known SNP/DMR overlaps
- Update methods and results
- Figure out why there is such low methylation with the new genome
- Continue BAT workflow with new genome
- Create OSF repository for all intermediate files