Green Crab Experiment Part 30

(Somewhat) finalized pathway analysis methods

After attending SEB and talking to Ariana, I had a much better idea of how to approach my metabolomics pathway analysis. The goal was to do enough programmatic analytical things so I could have a bit more guidance before digging into various metabolite functions. All analyses were conducted in this R Markdown script.

VIP in modules

My first instinct was to determine which VIP were present in the WCNA modules. I figured I already had metabolites sorted into modules by abundance, so perhaps VIP in the same module would be involved in similar pathways:

VIP_module0 <- rbind(t.test_13v30_day22 %>%
                       mutate(., comparison = "13v30_day22") %>%
                       filter(., p.adj < 0.05) %>%
                       filter(., metabolite %in% module0_metabolites$Metabolite),
                     t.test_13v5_day22 %>%
                       mutate(., comparison = "13v5_day22") %>%
                       filter(., p.adj < 0.05) %>%
                       filter(., metabolite %in% module0_metabolites$Metabolite),
                     t.test_5_day3v22 %>%
                       mutate(., comparison = "5_day3v22") %>%
                       filter(., p.adj < 0.05) %>%
                       filter(., metabolite %in% module0_metabolites$Metabolite),
                     t.test_30_day3v22 %>%
                       mutate(., comparison = "30_day3v22") %>%
                       filter(., p.adj < 0.05) %>%
                       filter(., metabolite %in% module0_metabolites$Metabolite)) %>%
  mutate(module = "module0") #Identify VIP present in module 0 across all pairwise comparisons

Initially, I thought I would identify VIP from all pairwise comparisons in each module. However, certain modules had significant correlations with various treatment x day conditions. I then decided to only include VIP from pairwise comparisons that reflected these significant correlations:

  • Module 0: 5 and 30 at day 22
  • Module 2: 5 at day 3 and 30 at day 22
  • Module 3: 5 and 30 at day 22
  • Module 4: 30 at a day 22
  • Module 5: 5 at day 22, 30 at day 22, 30 at day 3
  • Module 6: 13 at day 22
  • Module 7: 13 and 5 at day 22

Since modules 1 and 8 were not significantly correlated with any treatment x day conditions, I decided not to check them for VIP. The list of VIP in each module can be found here. My next step was to make heatmaps for the VIP in each module:

#pdf("metaboanalyst/figures/module0_VIP.pdf", height = 8.5, width = 11)
pheatmap((transMetabData %>%
            dplyr::select(., c(crab.ID, any_of(VIP_module0$metabolite))) %>%
            column_to_rownames(var = "crab.ID") %>%
            t(.) %>% as.data.frame(.) %>% log(.)), cluster_row = TRUE, show_rownames = TRUE, cluster_cols = FALSE, show_colnames = FALSE, color = heatmapColors, annotation_col = metabolomicsAnnotation, annotation_colors = metabolomicsColors, name = "Abundance")
#dev.off()

Uh…..so these were raelly messy. I then looked at Ariana’s code from her coral metabolomics project and realized she created heatmaps with metabolite information aggregated by life stage. I decided to do something similar by averaging metabolite abundance across each treatment x day condition. Through this process, I learned how to use any_of within dplyr::select from this link.

#pdf("metaboanalyst/figures/module0_VIP-average.pdf", height = 8.5, width = 11)
pheatmap((transMetabData %>%
            dplyr::select(treatment_day, any_of(VIP_module0$metabolite)) %>%
            group_by(., treatment_day) %>%
            summarize_all(funs(mean(., na.rm = TRUE))) %>%
            column_to_rownames(var = "treatment_day") %>%
            t(.) %>% log(.)), cluster_row = TRUE, show_rownames = TRUE, cluster_cols = TRUE, show_colnames = FALSE, scale = "row", color = heatmapColors, annotation_col = metabolomicsAvgAnnotation, annotation_colors = metabolomicsColors, name = "Abundance")
#dev.off()

This looked much better! I dug through Ariana’s code more and saw that she plotted z-score in her heatmaps. To do something similar, I log transformed then scaled all values, using across with mutate (s/o to this link) to perform these operations on numerical columns:

#pdf("metaboanalyst/figures/module0_VIP-zscore.pdf", height = 8.5, width = 11)
pheatmap((transMetabData %>%
            dplyr::select(treatment_day, any_of(VIP_module0$metabolite)) %>%
            group_by(., treatment_day) %>%
            mutate(across(where(is.numeric), log)) %>%
            mutate(across(where(is.numeric), scale)) %>%
            group_by(., treatment_day) %>%
            summarize_all(funs(mean(., na.rm = TRUE))) %>%
            column_to_rownames(var = "treatment_day") %>%
            t(.)), cluster_row = TRUE, show_rownames = TRUE, cluster_cols = FALSE, show_colnames = FALSE, scale = "row", color = heatmapColors, annotation_col = metabolomicsAvgAnnotation, annotation_colors = metabolomicsColors, name = "Z-Score")
#dev.off()

All of the module-specific heatmaps can be found in this folder. I think the z-score plots look the cleanest. Clustering columns doesn’t make sense if I’m trying to compare heatmaps from different modules. I don’t think the module heatmaps are very useful in the text, but having these lists will be nice for when I am writing.

All VIP

When I met with Ariana, she mentioned that she had shifted to fully parsing through VIP instead of using any WCNA or MetaboAnalyst output. I think there’s a happy medium to both approaches, and I set off to find it.

The first thing I did was make heatmaps with my VIP:

Screenshot 2024-08-29 at 3 23 17 PM

Figure 1. Average metabolite abundance across all samples for VIP

Screenshot 2024-08-29 at 3 22 35 PM

Figure 2. Heatmap of z-scores for all VIP

Once again, I found the plot with z-scores averaged across treatment x day combination to be much easier to interpret. After making the figures, I decided to retry MetaboAnalyst. At SEB, everyone who was doing metabolomics analysis was using MetaboAnalyst. They liked the MetaboAnalyst workflow becuase the GUI was easy to navigate. None of them ran into issues with metabolite names not being recognized, but I wonder if this is because they were all doing targeted metabolomics. Additionally, I got a response to my OmicsForum post stating that I was attempting to use an enrichment analysis tool designed primarily for humans. I decided to dig a bit more into these methods to figure out if there was something useful I could get from MetaboAnalyst. I settled on the following workflow:

  • Uploaded list of metabolite compound names for significant VIP in the Pathway Analysis module
  • Matched to KEGG pathways from Strongylocentrotus purpuratus

Since the Pathway Analysis module integrates “enrichment analysis and pathway topology analysis,” I did not conduct any other analyses. A few compounds did not have matching annotations (name map here). The following four pathways had significant FDR:

  1. Valine, leucine, and isoleucine biosynthesis
  2. Arginine biosynthesis
  3. Glutathione metabolism
  4. Alaine, aspartate, and glutamate metabolism

The statistical output is in this table. Columns include pathway impact, which is the output of the pathway topology analysis, and p-values associated with enrichment. The pathway impact takes into account the position of metabolites in a given pathway (ie. node, bottleneck, or terminal metabolite). Maximum importance of a pathway can be 1. I manually calculated the enrichment ratio as well. I could also download pathway diagrams for the significant pathways. In the diagrams (found here), the VIP are in red. I then set out to make some figures akin to what you would get from the Pathway and Enrichment Analysis modules.

Screenshot 2024-08-29 at 3 24 44 PM

Figure 3. Pathway impact vs. -log10 enrichment p-value

Screenshot 2024-08-29 at 3 24 53 PM

Screenshot 2024-08-29 at 3 25 48 PM

Figures 4-5. Various figures showing enrichment ratio and FDR for the top 25 pathways

This version of the MetaboAnalyst analysis gives me a couple of pathways to consider when taking the whole dataset into account. I identified VIP in any of those pathways are present in WCNA modules and saved the output here.

TL;DR, I now liberally use %in% and I found the way MetaboAnalyst makes sense.

Going forward

  1. Update methods
  2. Update results
  3. Repeat lipidomics WCNA with temperature and day separately
  4. Repeat lipidomics WCNA with all lipid data
  5. Try lipid-specific enrichment with LION
  6. Determine if I can add physiology data to mixOmics analyses
  7. Integrate metabolomics and lipidomics data
Written on August 29, 2024