rCCA: Nutrimouse

Usage in mixOmics

The nutrimouse dataset ships with mixOmics as the object nutrimouse, a list that contains the following:

  • nutrimouse$gene: data frame with 40 samples and 120 gene expression variables.
  • nutrimouse$lipid: data frame with 40 samples and 21 lipids.
  • nutrimouse$diet: factor of 5 levels containing 40 labels for the diet factor.
  • nutrimouse$genotype: factor of 2 levels containing 40 labels for the genotype factor.
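
Before running any analysis, it can help to confirm that the loaded object matches the description above. The following is a minimal sketch, assuming the mixOmics package is installed; it uses only base R functions (dim, levels, table) on the components listed above.

```r
# Assumes the mixOmics package is installed (it is distributed via Bioconductor).
library(mixOmics)
data(nutrimouse)

# Both data frames describe the same 40 samples (rows).
dim(nutrimouse$gene)    # samples x gene expression variables
dim(nutrimouse$lipid)   # samples x lipids

# The two factors describe the experimental design.
levels(nutrimouse$diet)
levels(nutrimouse$genotype)

# Cross-tabulate the design to see how samples spread over diet and genotype.
table(nutrimouse$diet, nutrimouse$genotype)
```

If the dimensions differ from those listed above, the installed version of the dataset may not be the one this tutorial describes.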

The following sections show how to analyze nutrimouse with rCCA (regularized canonical correlation analysis):

Correlation analysis

Canonical correlation analysis aims to highlight correlations between two data sets. A first look at the correlation matrices is often informative. mixOmics offers two ways to display them: either as three separate correlation matrices (X with X, Y with Y, and X with Y), or as a single correlation matrix computed on the concatenated data set of X and Y, as done with the following code:

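library(mixOmics)   # provides the nutrimouse data and imgCor()

data(nutrimouse)
X <- nutrimouse$lipid   # 40 samples x 21 lipids
Y <- nutrimouse$gene    # 40 samples x 120 genes

# Correlation image of the concatenated data, without variable labels.
# Argument names are kept as in the original tutorial; some mixOmics releases
# may name them differently (for example X.var.names / Y.var.names), see ?imgCor.
imgCor(X, Y, X.names = FALSE, Y.names = FALSE)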

This graph provides more insight into the correlation structure of the two data sets, X and Y, both together and separately, provided that p and q (the numbers of variables in X and Y, here 21 and 120) are not too large. Correlation values are colored from blue (negative correlation) to red (positive correlation).
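
To make the two representations concrete, the sketch below computes the underlying correlation matrices directly with base R's cor(). It is an illustration of what such a plot summarizes, not necessarily how imgCor computes it internally; the object names cor_all, cor_xx, cor_yy and cor_xy are introduced here for illustration only. The final call uses the type argument of imgCor to request the three-matrix view.

```r
# Assumes the mixOmics package is installed.
library(mixOmics)
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene

# Single matrix on the concatenated data: (p + q) x (p + q).
cor_all <- cor(cbind(X, Y), use = "pairwise.complete.obs")

# The three separate blocks: within X, within Y, and between X and Y.
cor_xx <- cor(X, use = "pairwise.complete.obs")      # p x p
cor_yy <- cor(Y, use = "pairwise.complete.obs")      # q x q
cor_xy <- cor(X, Y, use = "pairwise.complete.obs")   # p x q

dim(cor_all)
dim(cor_xy)

# Three-matrix display drawn by mixOmics.
imgCor(X, Y, type = "separate")
```

The between-set block cor_xy is the part that canonical correlation analysis focuses on, while cor_xx and cor_yy show how strongly variables are correlated within each data set.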
