Usage in mixOmics
The multidrug dataset is implemented in mixOmics via multidrug, and contains the following:
multidrug$ABC.trans: data matrix with 60 rows and 48 columns. The expression of the 48 human ABC transporters for the 60 cell lines.
multidrug$compound: data matrix with 60 rows and 1429 columns. The activity of 1429 drugs for the 60 cell lines.
multidrug$comp.name: character vector. The names or the NSC No. of the 1429 compounds.
multidrug$cell.line: a list containing two character vector components: Sample, the names of the 60 cell lines which were analysed, and Class, the phenotypes of the 60 cell lines.
Now, let’s see how to analyse one of the data sets from multidrug using PCA:
library(mixOmics)
data(multidrug)
X <- multidrug$ABC.trans  # Data set to explore
dim(X)
# [1] 60 48 -> 60 samples and 48 transporters
sum(is.na(X))
## [1] 1 -> one missing value, the PCA will use NIPALS instead of SVD decomposition

## Let's determine how many dimensions ncomp should be. If p is large,
## this may take some time to compute
pcatune(X, center = TRUE, scale. = FALSE)

## output obtained:
Eigenvalues for the first 10 principal components:
      PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8       PC9
25.178264 19.431011 17.495782 14.916417 12.650136  9.397698  8.783808  8.156746  6.695802
     PC10
 6.285582

Estimated proportion of explained variance for the first 10 principal components:
       PC1        PC2        PC3        PC4        PC5        PC6        PC7        PC8
0.14178774 0.10942291 0.09852496 0.08399964 0.07123740 0.05292177 0.04946474 0.04593353
       PC9       PC10
0.03770644 0.03539634

Estimated cumulative proportion explained variance for the first 10 principal components:
      PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8       PC9
0.1417877 0.2512107 0.3497356 0.4337352 0.5049726 0.5578944 0.6073592 0.6532927 0.6909991
     PC10
0.7263955
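To give a rough idea of what NIPALS does when a value is missing, here is a minimal base R sketch. It is an illustration only, not the mixOmics implementation: the function name nipals_pca, its arguments, and the toy matrix are all hypothetical. Each component is obtained by alternating regressions that skip the missing cells, which is why the data do not need to be imputed beforehand.

```r
# Minimal NIPALS PCA sketch in base R (illustrative; NOT the mixOmics code).
# nipals_pca and its arguments are hypothetical names chosen for this example.
nipals_pca <- function(X, ncomp = 2, max_iter = 500, tol = 1e-12) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # column means ignore NAs
  obs <- (!is.na(X)) * 1      # 1 where a value is observed, 0 where it is missing
  scores <- matrix(0, nrow(X), ncomp)
  loadings <- matrix(0, ncol(X), ncomp)
  for (h in seq_len(ncomp)) {
    X0 <- X
    X0[obs == 0] <- 0         # missing cells contribute nothing to the sums
    t_h <- X0[, which.max(colSums(X0^2))]  # start from the most variable column
    for (iter in seq_len(max_iter)) {
      # Regress each column on the scores, using observed cells only
      p_h <- drop(crossprod(X0, t_h) / crossprod(obs, t_h^2))
      p_h <- p_h / sqrt(sum(p_h^2))        # unit-norm loading vector
      # Regress each row on the loadings, using observed cells only
      t_new <- drop((X0 %*% p_h) / (obs %*% p_h^2))
      converged <- sum((t_new - t_h)^2) < tol * sum(t_new^2)
      t_h <- t_new
      if (converged) break
    }
    scores[, h] <- t_h
    loadings[, h] <- p_h
    X <- X - tcrossprod(t_h, p_h)          # deflate; NA cells stay NA
  }
  list(scores = scores, loadings = loadings)
}

# Toy data (assumption): 40 samples, 5 variables, one missing value
set.seed(1)
toy <- matrix(rnorm(40 * 5), 40, 5) %*% diag(c(5, 3, 1, 0.5, 0.2))
toy[3, 2] <- NA
fit <- nipals_pca(toy, ncomp = 2)
dim(fit$scores)
```

On complete data this iteration converges to the same components as an SVD-based PCA, up to the sign of each component.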
The pcatune function also generates a scree plot showing the variance explained by each principal component. Note that when the data contain missing values, the maximum number of principal components used for tuning is min(p, n); this is what the scree plot represents.
In this specific case, the barplot suggests that the amount of explained variance drops after 5 principal components. However, it is up to the user to choose the number of principal components ncomp. For ease of interpretation, we set ncomp = 3 in the remaining analysis.
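As a rough aid to this choice, the reported proportions can be inspected directly in base R. The sketch below copies the ten proportions of explained variance from the tuning output shown earlier; the 50% threshold and the variable names are arbitrary assumptions for illustration, not a mixOmics rule.

```r
# Proportions of explained variance copied from the tuning output
prop_var <- c(0.14178774, 0.10942291, 0.09852496, 0.08399964, 0.07123740,
              0.05292177, 0.04946474, 0.04593353, 0.03770644, 0.03539634)
names(prop_var) <- paste0("PC", seq_along(prop_var))

# Cumulative proportion of explained variance
cum_var <- cumsum(prop_var)

# Decrease in explained variance from each component to the next
drops <- -diff(prop_var)
names(drops) <- paste0(names(prop_var)[-length(prop_var)], "->",
                       names(prop_var)[-1])

# Smallest number of components reaching the (arbitrary) 50% threshold
n_half <- which(cum_var >= 0.5)[1]

print(round(cum_var, 4))
print(round(drops, 4))
print(n_half)
```

With these values, five components are needed to pass 50% of the variance, and the largest decrease after the first component occurs between PC5 and PC6, which is consistent with the drop seen on the barplot. Keeping ncomp = 3 retains about 35% of the variance.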