PCA:Multidrug

Usage in mixOmics

The multidrug data set is available in mixOmics as multidrug and contains the following:

  • multidrug$ABC.trans: data matrix with 60 rows and 48 columns. The expression of the 48 human ABC transporters in the 60 cell lines.
  • multidrug$compound: data matrix with 60 rows and 1429 columns. The activity of 1429 drugs in the 60 cell lines.
  • multidrug$comp.name: character vector. The names or the NSC numbers of the 1429 compounds.
  • multidrug$cell.line: a list with two character vector components: Sample, the names of the 60 cell lines that were analysed, and Class, the phenotypes of the 60 cell lines.

Now, let’s see how to analyse one of the data sets from multidrug using PCA:

library(mixOmics) # provides the multidrug data set and the PCA functions
data(multidrug)
X <- multidrug$ABC.trans # data set to explore
dim(X)
## [1] 60 48 -> 60 samples and 48 transporters
sum(is.na(X))
## [1] 1 -> one missing value, so the PCA will use NIPALS instead of SVD decomposition

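## Let's determine how many components (ncomp) to retain. If p is large,
## this may take some time to compute.
## Note: in recent mixOmics releases this function is named tune.pca() and
## the scaling argument is spelled scale (no trailing dot).
pcatune(X, center = TRUE, scale. = FALSE)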
## output obtained:
Eigenvalues for the first  10 principal components:
      PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8       PC9
25.178264 19.431011 17.495782 14.916417 12.650136  9.397698  8.783808  8.156746  6.695802
     PC10
 6.285582 

Estimated proportion of explained variance for the first  10 principal components:
       PC1        PC2        PC3        PC4        PC5        PC6        PC7        PC8
0.14178774 0.10942291 0.09852496 0.08399964 0.07123740 0.05292177 0.04946474 0.04593353
       PC9       PC10
0.03770644 0.03539634 

Estimated cumulative proportion explained variance for the first  10 principal components:
      PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8       PC9
0.1417877 0.2512107 0.3497356 0.4337352 0.5049726 0.5578944 0.6073592 0.6532927 0.6909991
     PC10
0.7263955
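
The three tables above are linked by simple arithmetic: each proportion is an eigenvalue divided by a common denominator (interpreted here as the total variance of X), and the cumulative table is the running sum of the proportions. The short base-R sketch below checks this on the values copied from the output. The variable names are illustrative only and are not part of mixOmics.

```r
# Values copied from the output above (first 10 components only)
eig <- c(25.178264, 19.431011, 17.495782, 14.916417, 12.650136,
         9.397698, 8.783808, 8.156746, 6.695802, 6.285582)
prop <- c(0.14178774, 0.10942291, 0.09852496, 0.08399964, 0.07123740,
          0.05292177, 0.04946474, 0.04593353, 0.03770644, 0.03539634)

# Every eigenvalue / proportion ratio recovers (almost) the same number,
# so proportion = eigenvalue / total variance
total_var <- eig / prop
range(total_var)

# The cumulative table is the running sum of the proportions
cum_prop <- cumsum(prop)
round(cum_prop, 7)
```

Because only the first 10 components are printed, the cumulative proportion stops at about 0.73 rather than 1.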

The pcatune function also generates a scree plot showing the variance explained by each principal component. Note that when the data contain missing values, the maximum number of principal components used for tuning is min(p, n); this is what the scree plot represents.
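
To give some intuition for why a single missing value changes the decomposition: SVD needs a complete matrix, whereas NIPALS estimates one component at a time through alternating regressions that can simply skip missing entries. The following is a minimal base-R sketch of that idea for the first component only, run on simulated data. It is not the mixOmics implementation, which may differ in its details; nipals_pc1 and its arguments are hypothetical names chosen for illustration.

```r
# Hypothetical helper: first principal component via NIPALS, skipping NAs
nipals_pc1 <- function(X, max_iter = 500, tol = 1e-9) {
  X <- scale(X, center = TRUE, scale = FALSE)   # column means ignore NAs
  obs <- matrix(as.numeric(!is.na(X)), nrow = nrow(X))  # 1 = observed, 0 = missing
  X0 <- X
  X0[is.na(X)] <- 0                             # zeros drop out of the sums below
  t_old <- X0[, which.max(colSums(X0^2))]       # start from the most variable column
  for (iter in seq_len(max_iter)) {
    # Loadings: regress each column on the scores, using observed rows only
    p <- crossprod(X0, t_old) / crossprod(obs, t_old^2)
    p <- p / sqrt(sum(p^2))                     # normalise to unit length
    # Scores: regress each row on the loadings, using observed columns only
    t_new <- (X0 %*% p) / (obs %*% p^2)
    if (sum((t_new - t_old)^2) < tol) break
    t_old <- t_new
  }
  list(scores = drop(t_new), loadings = drop(p), iterations = iter)
}

# Simulated data with one dominant direction plus noise (not the multidrug data)
set.seed(1)
n <- 30
true_dir <- c(3, 2, 1, 0.5) / sqrt(sum(c(3, 2, 1, 0.5)^2))
X <- rnorm(n, sd = 3) %*% t(true_dir) + matrix(rnorm(n * 4, sd = 0.3), n, 4)
X_miss <- X
X_miss[5, 2] <- NA                              # a single missing value

fit_full <- nipals_pc1(X)
fit_miss <- nipals_pc1(X_miss)

# On complete data, NIPALS should agree with SVD-based PCA up to sign
ref <- prcomp(X, center = TRUE, scale. = FALSE)$rotation[, 1]
abs(sum(fit_full$loadings * ref))   # close to 1
abs(sum(fit_miss$loadings * ref))   # still close to 1 despite the NA
```

The design point is that the sums in both regression steps run over observed entries only, so no imputation is needed before fitting.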

In this specific case, the barplot suggests a drop in the amount of explained variance after 5 principal components. However, the choice of the number of principal components ncomp is up to the user. For ease of interpretation, we set ncomp = 3 in the remaining analysis.
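
The reading of the barplot can be complemented with a few numbers. The base-R sketch below uses the proportions printed earlier to list the drop between consecutive components and to find the smallest number of components reaching a cumulative threshold. The 50% threshold and the variable names are arbitrary illustrations, not mixOmics defaults.

```r
# Proportions of explained variance copied from the tuning output
prop <- c(0.14178774, 0.10942291, 0.09852496, 0.08399964, 0.07123740,
          0.05292177, 0.04946474, 0.04593353, 0.03770644, 0.03539634)
cum_prop <- cumsum(prop)

# Drop in explained variance from each component to the next
drops <- -diff(prop)
names(drops) <- paste0("PC", 1:9, "->PC", 2:10)
round(drops, 4)

# Smallest number of components explaining at least half of the variance
ncomp_50 <- which(cum_prop >= 0.5)[1]
ncomp_50

# Share of variance retained with ncomp = 3
cum_prop[3]
```

With these values, five components are the fewest that explain at least half of the variance, while ncomp = 3 retains roughly 35%, a trade-off accepted here for ease of interpretation.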
