Green Crab Experiment Part 33
Lipidomics WCNA
I previously determined that temperature was the only factor significantly altering lipid profiles in my samples. Based on this information, I needed to fix my WCNA.
Data formatting
The first thing I did in this R Markdown script was reformat my data. I needed samples in columns and the 415 known lipids in rows:
knownTransLipidData %>%
dplyr::select(-c(treatment:treatment)) %>%
column_to_rownames(var = "crab.ID") %>%
t(.) %>% as.data.frame(.) %>%
log(.) %>%
mutate_if(is.numeric, list(~na_if(., -Inf))) %>%
replace(is.na(.), 0) #Take transposed lipid data for known lipids. Remove metadata and set crab ID as row names. Transpose and convert into a dataframe such that samples are columns and lipids are rows. Log-transform data. Convert -Inf to NA, then NA to 0.
I then performed pOverA
filtering. I used a pOverA
of 0.17. This is because I have have 69 samples with a minimum of n = 12 samples per temperature. Since I expect different lipids by temperature (s/o to the PLS-DA), I will accept lipids present in 12/69 = 0.17 of the samples. I also set the minimum value of lipids to 0.01 (median normalized), such that 17% of the samples must have a non-zero normalized metabolite presence in order for the metabolite to remain in the data set. With this filtering, I could keep all 415 known lipid features.
The last formatting step was to check for outliers. I created a clustering dendogram to visually inspect for outliers for lipids:
sampleTree <- hclust(dist(rowwiseLipidData), method = "average") #Perform clustering analysis
plot(sampleTree, main = "Sample clustering to detect outliers", sub = "", xlab = "", cex.lab = 0.8, cex.axis = 0.8, cex.main = 2) #Plot cluster dendogram
Figures 1-2. Dendograms for lipids with and without potential outlier
There is one lipid, PC 34:1 Isomer A, that looks like an outlier. Even though there are a lot of lipids, it’s evident that the clustering patterns don’t change with the removal of the potential outlier. Since the outlier doesn’t impact the other lipid relationships, I opted to keep it in the data.
I did something similar with the samples:
rowwiseLipidData <- t(rowwiseLipidData) %>% as.data.frame(.) #Transpose such that samples are in rows and lipids are in columns.
sampleTree <- hclust(dist(rowwiseLipidData), method = "average") #Perform clustering analysis
#pdf("WCNA/figures/sample-outliers.pdf", height = 8.5, width = 11)
plot(sampleTree, main = "Sample clustering to detect outliers", sub = "", xlab = "", cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)
#dev.off()
Figure 3. Sample clustering
304, 303, and 606 looked like they could be outliers (all from 30ºC treatment at day 3). I did this analysis with and without those samples and it did not change the sample clustering, so I kept them in the analysis.
WCNA
With my data formatted, I could proceed to actually running the WCNA. I found that my soft thresholding power did not change at all with the newly formatted data. There were 16 resulting modules. Lipid module assignment can be found here.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 23 136 52 37 25 23 22 20 17 11 10 9 9 8 7 6
After a bunch of data reformatting and correlation calculations, I plotted my module-eigengene correlations.
Figure 4. Module-eigengene correlations
Wow, this is MUCH easier to intepret that the earlier version! Lipid profiles are vastly different at 30ºC than at 5ºC or 13ºC, which matches really well with the PLS-DA. There are some modules that are significantly correlated with 5ºC and seem to have different directionality than 13ºC, so that is an indication of slight differences there. It will be interesting to see if the pairwise 5ºC VIP lipids are in those modules.
Module eigengene abundance patterns
One of the last thing I did for this part of the analysis was plot module eigengene abundance patterns. I formatted and saved a table of module eigengene abundance along with metadata here. I took this table and made a boxplot of eigengene abundance and a dotplot:
mME_meta %>%
ggplot(aes(x = treatment, y = value, colour = treatment)) +
facet_wrap(~ Module) +
ylab("Module Expression") +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey")+
geom_boxplot(width = 0.5, position = position_dodge(width = 0.5), alpha = 0.7) +
geom_point(pch = 21, position = position_dodge(width = 0.5)) +
scale_fill_manual(name = "Temperature (ºC)",
values = rev(plotColors),
labels = c("5ºC", "13ºC", "30ºC")) +
scale_colour_manual(name = "Temperature (ºC)",
values = rev(plotColors),
labels = c("5ºC", "13ºC", "30ºC")) +
xlab("") + #Axis titles
theme_classic(base_size = 15) + theme(axis.text.x = element_text(color = "white"),
axis.ticks.x = element_blank())
mME_meta %>%
ggplot(aes(x = treatment, y = value)) +
facet_grid(~Module) +
ylab("Eigengene Expression") +
geom_hline(yintercept = 0, linetype ="dashed", color = "grey")+
geom_point(aes(fill = treatment, color = treatment), size = 3, pch = 21, position = position_dodge(width = 0.5)) +
geom_smooth(aes(group = 1), se = TRUE, show.legend = NA, color="black") +
scale_fill_manual(name = "Temperature (ºC)",
values = rev(plotColors),
labels = c("5ºC", "13ºC", "30ºC")) +
scale_colour_manual(name = "Temperature (ºC)",
values = rev(plotColors),
labels = c("5ºC", "13ºC", "30ºC")) +
xlab("") + #Axis titles
theme_classic(base_size = 15) + theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
legend.position = "bottom",
legend.text = element_text(color = "black", size = 12),
strip.text.x = element_text(size = 14, color = "black", face="bold"))
Figures 5-6. Boxplot and dotplot of module eigengene abundance for lipids.
I think the temperature-based trends are clearer in the dot plots, but they’ll end up being supplementary figures if anything.
I also created a table of relative lipid abundance in relation to 13ºC in case that information is useful, and saved the table here.
Going forward
- Lipidomics pathway enrichment
- Try lipid-specific enrichment with LION
- Update lipidomics methods
- Update lipidomics results
- Determine if I can add physiology data to
mixOmics
analyses - Integrate metabolomics and lipidomics data