Usage in mixOmics
The multidrug dataset is available in mixOmics as multidrug and contains the following:
multidrug$ABC.trans: data matrix with 60 rows and 48 columns. The expression of the 48 human ABC transporters for the 60 cell lines.
multidrug$compound: data matrix with 60 rows and 1429 columns. The activity of 1429 drugs for the 60 cell lines.
multidrug$comp.name: character vector. The names or the NSC No. of the 1429 compounds.
multidrug$cell.line: a list containing two character vector components: Sample, the names of the 60 cell lines which were analysed, and Class, the phenotypes of the 60 cell lines.
Now, let’s see how to analyse one of the data sets from multidrug using PCA:
library(mixOmics) # Load the package that provides the data and pcatune
data(multidrug)
X <- multidrug$ABC.trans # Data set to explore
dim(X) # [1] 60 48 -> 60 samples and 48 transporters
sum(is.na(X))
## [1] 1 -> one missing value, so the PCA will use NIPALS instead of SVD decomposition
## Let's determine how many dimensions ncomp should be. If p is large,
## this may take some time to compute
pcatune(X, center = TRUE, scale. = FALSE)
## output obtained:
Eigenvalues for the first 10 principal components:
      PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8       PC9
25.178264 19.431011 17.495782 14.916417 12.650136  9.397698  8.783808  8.156746  6.695802
     PC10
 6.285582
Estimated proportion of explained variance for the first 10 principal components:
       PC1        PC2        PC3        PC4        PC5        PC6        PC7        PC8
0.14178774 0.10942291 0.09852496 0.08399964 0.07123740 0.05292177 0.04946474 0.04593353
       PC9       PC10
0.03770644 0.03539634
Estimated cumulative proportion explained variance for the first 10 principal components:
      PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8       PC9
0.1417877 0.2512107 0.3497356 0.4337352 0.5049726 0.5578944 0.6073592 0.6532927 0.6909991
     PC10
0.7263955

The pcatune function also generates a scree plot showing the explained variance for each principal component. Note that when the data contain missing values, the maximum number of principal components used for tuning is min(p, n); this is what the scree plot displays.
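The code above notes that NIPALS replaces the SVD when X contains a missing value. As a rough illustration of why NIPALS copes with missing entries, here is a minimal base R sketch of the first NIPALS component. The function nipals_pc1 and its arguments are hypothetical and this is not the mixOmics implementation; it only shows the idea of leaving missing cells out of each regression step, and it uses the built-in iris data rather than multidrug so that it runs without any package.

```r
# Hypothetical helper (not part of mixOmics): first principal component
# by NIPALS, leaving missing entries out of every sum.
nipals_pc1 <- function(X, max.iter = 500, tol = 1e-09) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # column means ignore NAs
  obs <- 1 * !is.na(X)        # 1 where a value was observed, 0 where missing
  X0 <- X
  X0[is.na(X)] <- 0           # zero-fill so missing cells add nothing to sums
  t_vec <- X0[, which.max(colSums(X0^2))]   # start from the most variable column
  for (iter in seq_len(max.iter)) {
    # Loadings: regress each column of X on the scores, observed rows only
    p_vec <- crossprod(X0, t_vec) / crossprod(obs, t_vec^2)
    p_vec <- p_vec / sqrt(sum(p_vec^2))     # normalise to unit length
    # Scores: regress each row of X on the loadings, observed columns only
    t_new <- (X0 %*% p_vec) / (obs %*% p_vec^2)
    converged <- sum((t_new - t_vec)^2) < tol
    t_vec <- t_new
    if (converged) break
  }
  list(scores = drop(t_vec), loadings = drop(p_vec), iterations = iter)
}

# Usage on a small built-in data set with one value removed
X_demo <- as.matrix(iris[, 1:4])
X_demo[3, 2] <- NA
fit <- nipals_pc1(X_demo)
print(round(fit$loadings, 3))
```

On complete data this iteration converges to the same first loading vector as an SVD-based PCA (up to sign). With missing values, each sum simply runs over the observed cells, so no imputation step is needed beforehand.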
In this specific case, the barplot seems to indicate that the amount of explained variance drops after 5 principal components. However, the choice of the number of principal components ncomp is up to the user. For ease of interpretation, we set ncomp = 3 in the remainder of the analysis.
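To make this reading of the output concrete, the following base R sketch re-derives the cumulative proportions from the reported proportions and computes the drop in explained variance between consecutive components. The numbers are copied from the pcatune output above; the variable names are illustrative only.

```r
# Proportions of explained variance copied from the pcatune output above
prop <- c(PC1 = 0.14178774, PC2 = 0.10942291, PC3 = 0.09852496,
          PC4 = 0.08399964, PC5 = 0.07123740, PC6 = 0.05292177,
          PC7 = 0.04946474, PC8 = 0.04593353, PC9 = 0.03770644,
          PC10 = 0.03539634)

# Cumulative proportion: running sum of the individual proportions
cum_prop <- cumsum(prop)
print(round(cum_prop, 7))

# Drop in explained variance from each component to the next
drops <- -diff(prop)
names(drops) <- paste(names(prop)[-10], names(prop)[-1], sep = "->")
print(round(drops, 4))

# Variance retained by the ncomp = 3 components kept for the analysis
print(round(cum_prop["PC3"], 4))
```

With these values, the drop from PC5 to PC6 (about 0.018) is larger than the drops on either side of it, which matches the reading of the barplot, while the first three components retain about 35% of the variance.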