Methods

I want to explore one single data set (e.g. microarray data):

  • I would like to identify trends or patterns in my data, detect experimental bias, or see whether my samples ‘naturally’ cluster according to the biological conditions: Principal Component Analysis (PCA)
  • In addition to the above, I would like to select the variables that contribute the most to the variance in the data set: sparse Principal Component Analysis (sPCA)
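
As a rough illustration of what PCA computes, the sketch below centres a data matrix and takes its singular value decomposition with NumPy. The function name `pca_scores`, the simulated data and the choice of two components are assumptions made for this example only; this is not the implementation used by any particular package, and it does not perform the variable selection of sPCA.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the first principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                   # variable weights
    explained = s**2 / np.sum(s**2)                  # proportion of variance
    return scores, loadings, explained[:n_components]

# Hypothetical data: 10 samples, 50 variables, two simulated conditions
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
X[:5, :10] += 3.0   # shift the first five samples on ten variables

scores, loadings, explained = pca_scores(X)
```

Plotting the two columns of `scores` against each other is the usual way to check whether samples group by condition; the largest absolute values in `loadings` point to the variables that drive each component.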

I want to unravel the information contained in two data sets, where two types of variables are measured on the same samples (e.g. metabolomics and transcriptomics data):

  • I would like to know if I can extract common information from the two data sets (or highlight the correlation between the two data sets):
    • The total number of variables is less than the number of samples: Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS) canonical mode
    • The total number of variables is greater than the number of samples: regularized Canonical Correlation Analysis (rCCA) or Partial Least Squares (PLS) canonical mode
  • I would like to model a uni-directional relationship between the two data sets, i.e. I would like to predict the expression of the metabolites (Y) given the expression of transcripts (X): Partial Least Squares (PLS), classic or regression mode
  • In addition to the above, I would like to select the variables from both data sets that covary (i.e. ‘change together’) across the different conditions: sparse Partial Least Squares (sPLS) with appropriate mode
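
The choices listed above for two data sets can be written down as a small helper. This is only a sketch of the decision logic: the function name and its arguments are hypothetical, the mapping of sPLS to regression or canonical mode is one reading of "appropriate mode", and the case where the number of variables equals the number of samples is not covered by the guide, so it is assigned to rCCA here as an assumption.

```python
def choose_integration_method(n_samples, p_x, q_y,
                              directional=False, select_variables=False):
    """Suggest a method for two data sets measured on the same samples.

    p_x and q_y are the numbers of variables in X and Y.
    """
    if select_variables:
        # Assumption: regression mode when predicting Y from X,
        # canonical mode otherwise.
        mode = "regression" if directional else "canonical"
        return f"sPLS ({mode} mode)"
    if directional:
        return "PLS (classic or regression mode)"
    if p_x + q_y < n_samples:
        return "CCA or PLS (canonical mode)"
    # Assumption: equality is treated like the high-dimensional case.
    return "rCCA or PLS (canonical mode)"

few_variables = choose_integration_method(100, 10, 20)
many_variables = choose_integration_method(20, 500, 300)
prediction = choose_integration_method(20, 500, 300, directional=True)
selection = choose_integration_method(20, 500, 300, select_variables=True)
```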

I have one single data set (e.g. microarray data) and I am interested in classifying my samples into known classes:

Here X = expression data and Y = vector indicating the classes of the samples

  • I would like to know how well my data can classify my samples, and to predict the class of new samples: PLS-Discriminant Analysis (PLS-DA)
  • In addition to the above, I would like to select the variables that help classify the samples: sparse PLS-DA (sPLS-DA)
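
PLS-DA is commonly set up by recoding the class vector Y as an indicator (dummy) matrix with one column per class, so that a PLS model can be fitted to it. The sketch below shows only that recoding step; the function name and the sample labels are made up for the example.

```python
def to_indicator_matrix(classes):
    """Recode a vector of class labels as a 0/1 matrix, one column per class."""
    levels = sorted(set(classes))   # column order: sorted class labels
    matrix = [[1 if c == level else 0 for level in levels] for c in classes]
    return levels, matrix

# Hypothetical class vector for five samples
labels = ["tumour", "normal", "tumour", "normal", "normal"]
levels, Y = to_indicator_matrix(labels)
```

Each row of `Y` contains a single 1, in the column of the class that the sample belongs to.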

I have one single data set (e.g. microarray data) and I have one continuous response variable or outcome for each sample. I would like to predict the response with my data:

Here X = expression data and Y = response vector

  • I would like to model a uni-directional relationship between my data and the response vector, and assess how informative my data are for predicting that response: PLS-regression mode
  • In addition to the above, I would like to select the variables that best predict the response: sparse PLS-regression mode
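
To make the idea of PLS regression concrete, here is a minimal single-component sketch in NumPy for one continuous response. It assumes centred data and one latent component, uses simulated data with more variables than samples, and does no variable selection; the function name `pls1_one_component` is hypothetical and the code is not meant to reproduce any package's implementation.

```python
import numpy as np

def pls1_one_component(X, y):
    """Fit a one-component PLS regression of y on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean      # centre predictors and response
    w = Xc.T @ yc                        # covariance of each variable with y
    w = w / np.linalg.norm(w)            # unit-norm weight vector
    t = Xc @ w                           # latent component (sample scores)
    b = (t @ yc) / (t @ t)               # regress y on the component

    def predict(X_new):
        X_new = np.asarray(X_new, dtype=float)
        return y_mean + ((X_new - x_mean) @ w) * b

    return w, predict

# Hypothetical data: 30 samples, 100 variables, response driven by variable 0
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

w, predict = pls1_one_component(X, y)
fitted = predict(X)
```

The weight vector `w` shows how strongly each variable contributes to the component. A sparse variant would additionally set small weights to zero, which is the variable selection step mentioned above.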