sPLS-DA: srbct

Overview

The Small Round Blue Cell Tumors (SRBCT) dataset from Khan et al. (2001) contains the expression levels of 2308 genes measured on 63 samples. The samples are distributed across four tumor classes: 8 Burkitt Lymphoma (BL), 23 Ewing Sarcoma (EWS), 12 neuroblastoma (NB), and 20 rhabdomyosarcoma (RMS).

Usage in mixOmics

The SRBCT dataset is available in mixOmics as the object srbct, which contains the following:

  • srbct$gene: data frame with 63 rows and 2308 columns, giving the expression measure of the 2308 genes for each of the 63 subjects.
  • srbct$class: class vector containing the tumor class of each sample (4 classes in total).
  • srbct$gene.name: data frame with 2308 rows and 2 columns containing further information on the genes.
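
As a quick check of the structure described above, the dataset can be loaded and inspected as in the minimal sketch below. It assumes that the mixOmics package is already installed; the names X and Y are illustrative choices made here, not something the dataset requires.

```r
# Assumes mixOmics is installed (the package is distributed via Bioconductor)
library(mixOmics)

# Load the SRBCT dataset shipped with the package
data(srbct)

# X and Y are illustrative names for the predictors and the outcome
X <- srbct$gene    # gene expression: samples in rows, genes in columns
Y <- srbct$class   # tumor class of each sample

dim(X)                 # number of samples and number of genes
table(Y)               # number of samples in each tumor class
dim(srbct$gene.name)   # one row of annotation per gene
```

If the description above holds, dim(X) should report 63 rows and 2308 columns, and table(Y) should show the four classes with the counts listed in the overview.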

We will now analyze srbct with sPLS-DA. The aim of this analysis is to select the genes that help predict the tumor class of the samples.

Next: Preliminary analysis with PCA