Usage in mixOmics
library(mixOmics)
# simulated data set with repeated measurements
data(data.simu)
X.simu <- data.simu$X
stimulation <- data.simu$stimu
repeat.simu <- data.simu$sample
# one-factor multilevel sPLS-DA: 3 components, 200 variables selected on each
result.1level <- multilevel(X.simu, cond = stimulation, sample = repeat.simu, ncomp = 3,
                            keepX = c(200, 200, 200), tab.prob.gene = NULL, method = 'splsda')
# 3D plot of the samples, coloured by stimulation
plot3dIndiv(result.1level, col = as.numeric(data.simu$stimu), cex = 0.6)
# heat map of the multilevel result, annotated by sample and by stimulation
pheatmap.multilevel(result.1level, col_sample = as.numeric(repeat.simu),
                    col_stimulation = unique(as.numeric(stimulation)), label_annotation = NULL,
                    border = FALSE, clustering_method = "ward", show_colnames = FALSE,
                    show_rownames = TRUE, fontsize_row = 2)
Tuning
A tuning function (‘tune.multilevel’) is provided to choose the number of variables to select on each component, using either:
- leave-one-out cross-validation, for a one-factor sPLS-DA analysis; or
- the correlation between the latent variables, maximised on the whole data set, for a two-factor sPLS-DA analysis or for sPLS (this criterion applies when there are too many conditions and too few samples for cross-validation).
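To make the first criterion concrete, the sketch below estimates a leave-one-out error rate: each sample is held out in turn, a classifier is fitted on the remaining samples, and the error rate is the proportion of held-out samples that are misclassified. This is an illustration only, not the internal code of tune.multilevel. As an assumption for the sake of a short, self-contained example, a nearest-centroid classifier with Euclidean distance stands in for sPLS-DA (the one-factor example below uses dist = 'mahalanobis.dist' instead), and the helper name loo_error is hypothetical.

```r
# Generic leave-one-out (LOO) error estimate; an illustration only,
# NOT the internal code of tune.multilevel.
# Assumption: a nearest-centroid classifier stands in for sPLS-DA.
loo_error <- function(X, y) {
  y <- as.factor(y)
  n <- nrow(X)
  wrong <- 0
  for (i in seq_len(n)) {
    # "Train" on every sample except i: one centroid (column means) per class
    train_X <- X[-i, , drop = FALSE]
    train_y <- y[-i]
    centroids <- apply(train_X, 2, function(col) tapply(col, train_y, mean))
    # Predict the held-out sample as the class with the closest centroid
    d <- apply(centroids, 1, function(cen) sum((X[i, ] - cen)^2))
    pred <- names(which.min(d))
    if (pred != as.character(y[i])) wrong <- wrong + 1
  }
  # LOO error rate = proportion of held-out samples that were misclassified
  wrong / n
}

# Toy data: two well-separated groups of four samples each
X <- rbind(cbind(c(0, 1, 0, 1), c(0, 0, 1, 1)),
           cbind(c(10, 11, 10, 11), c(10, 10, 11, 11)))
y <- rep(c("A", "B"), each = 4)
loo_error(X, y)  # 0: every held-out sample is assigned to its own group
```

In tuning, an estimate of this kind would be computed once per candidate number of variables, and the candidate with the lowest error rate kept.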
# tuning parameters: the number of variables to select
# ---- for splsda - one factor: with the simu data
result.tune <- tune.multilevel(X.simu, cond = stimulation, sample = repeat.simu, ncomp = 2,
                               test.keepX = c(5, 10, 15), already.tested.X = c(50),
                               method = 'splsda', dist = 'mahalanobis.dist', validation = 'loo')
## For a one-factor analysis, the tuning criterion is based on leave-one-out cross
## Number of variables selected on the first 1 component(s) was 50
result.tune
## $error
##      var5     var10     var15
## 0.1875000 0.2291667 0.1875000
In the example above, 50 variables had already been tuned and selected for the first component (argument already.tested.X). For the second component, one would choose the number of variables with the lowest estimated error rate (here 5 or 15 variables, which tie at 0.1875).
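The choice can also be read off the printed output programmatically. The sketch below is plain base R working on the error rates copied from the output above; it does not call mixOmics. Two points are assumptions, not statements from the original: among tied candidates the sparser one (fewer variables) is preferred, and the resulting vector (hypothetically named keepX here) would then be passed, one entry per component, as the keepX argument of a final multilevel call.

```r
# Error rates printed by tune.multilevel in the one-factor example above
err <- c(var5 = 0.1875000, var10 = 0.2291667, var15 = 0.1875000)

# Keep every candidate that attains the minimum error
# (which.min would silently return only the first of the tied values)
best <- names(err)[err == min(err)]
best_keepX <- as.numeric(sub("var", "", best, fixed = TRUE))

# Assumption: prefer the sparsest candidate among ties, then prepend
# the 50 variables already chosen for component 1
keepX <- c(50, min(best_keepX))
best   # "var5" "var15"
keepX  # 50 5
```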
Below is an example of sequential tuning (one component at a time) for a two-factor sPLS-DA analysis (maximising the correlation on the whole data set):
data(liver.toxicity)
X.rat = as.matrix(liver.toxicity$gene)
# identifier of the individual each sample belongs to (repeated measures)
repeat.indiv = c(1, 2, 1, 2, 1, 2, 1, 2, 3, 3, 4, 3, 4, 3, 4, 4,
                 5, 6, 5, 5, 6, 5, 6, 7, 7, 8, 6, 7, 8, 7, 8, 8,
                 9, 10, 9, 10, 11, 9, 9, 10, 11, 12, 12, 10, 11, 12, 11, 12,
                 13, 14, 13, 14, 13, 14, 13, 14, 15, 16, 15, 16, 15, 16, 15, 16)
dose = liver.toxicity$treatment$Dose.Group
time = liver.toxicity$treatment$Time.Group
# two factors: dose and time
dose.time = cbind(dose, time)
result.tune = tune.multilevel(X.rat, cond = dose.time, sample = repeat.indiv, ncomp = 2,
                              test.keepX = c(5, 10, 15), already.tested.X = c(50),
                              method = 'splsda')
result.tune
## $cor.value
##      var5     var10     var15
## 0.9997513 0.9998078 0.9997054
In the example above, 50 variables had already been tuned and selected for the first component. For the second component, one would choose the number of variables with the highest estimated correlation (here 10 variables, with a correlation of 0.9998078).
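The same selection for the correlation criterion can be sketched in base R on the values copied from the printed cor.value output. This is not a mixOmics function, and the variable names are hypothetical; as before, the resulting vector is assumed to be what one would pass as keepX to a final two-component fit.

```r
# Correlations printed by tune.multilevel in the two-factor example above
cor.value <- c(var5 = 0.9997513, var10 = 0.9998078, var15 = 0.9997054)

# Candidate with the highest correlation between the latent variables
best <- names(which.max(cor.value))
best_keepX <- as.numeric(sub("var", "", best, fixed = TRUE))

# Prepend the 50 variables already chosen for component 1
keepX <- c(50, best_keepX)
best   # "var10"
keepX  # 50 10
```

All three correlations lie between 0.9997 and 0.9998, so in this example the criterion separates the candidates only weakly.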
See also the Multilevel:Liver Toxicity case study.
References
Liquet, B., Lê Cao, K.-A., Hocini, H. and Thiebaut, R. A novel approach for biomarker selection and the integration of repeated measures experiments from two platforms. Submitted.