Publications

Papers:

  • De Castro, Y., Gadat, S., Marteau, C. and Maugis-Rabusseau, C. (2020).
    Supermix: sparse regularization for mixtures.
    Annals of Statistics, to appear.
    [Hal-02190117]
  • Godichon-Baggioni, A., Maugis-Rabusseau, C. and Rau, A. (2020)
    Multi-view cluster aggregation and splitting with an application to multi-omic breast cancer data.
    Annals of Applied Statistics, 14(2), 752-767. 
    [Hal-01916941] [maskmeans]
  • Gadat, S., Kahn, J., Marteau, C. and Maugis-Rabusseau, C. (2019).
    Parameter recovery in two-component contamination mixtures: the L2 strategy.
    Annales de l’IHP, 56 (2), 1391-1418.
    [arXiv:1604.00306] [Hal-01713035]
  • Godichon-Baggioni, A., Maugis-Rabusseau, C. and Rau, A. (2019)
    Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data.
    Journal of Applied Statistics, 46, 47-65.
    [arXiv:1704.06150] [Hal-01511574] [coseq]
  • Celeux, G., Maugis-Rabusseau, C. and Sedki, M. (2019).
    Variable selection in model-based clustering and discriminant analysis with regularization approach.
    Advances in Data Analysis and Classification, 13(1), 259-278.
    [Hal-01053784]
  • Laurent, B., Marteau, C. and Maugis-Rabusseau, C. (2018).
    Multidimensional two-component Gaussian mixtures detection.
    Annales de l’IHP (série B), 54(2), 842-865. 
    [arXiv:1509.09129] [Hal-01207072]
  • Rau, A. and Maugis-Rabusseau, C. (2018).
    Transformation and model choice for RNA-seq co-expression analysis. 
    Briefings in Bioinformatics, 19(3), 425-436.
    [bioRxiv, doi: http://dx.doi.org/10.1101/065607] [coseq]
  • Rigaill, G., Balzergue, S., Brunaud, V., Blondet, E., Rau, A., Rogier, O., Caius, J., Maugis-Rabusseau, C., Soubigou-Taconnat, L., Aubourg, S., Lurin, C., Martin-Magniette, M.-L. and Delannoy, E. (2018).
    Synthetic datasets for the identification of key ingredients for RNA-seq differential analysis.
    Briefings in Bioinformatics, 19(1), 65-76.
  • Laurent, B., Marteau, C. and Maugis-Rabusseau, C. (2016).
    Non asymptotic detection of two-component mixtures with unknown means.
    Bernoulli, 22(1), 242-274.
    [arXiv:1304.6924]
  • Papastamoulis, P., Martin-Magniette, M.-L. and Maugis-Rabusseau, C. (2016).
    On the estimation of mixtures of Poisson regression models with large numbers of components.
    Computational Statistics and Data Analysis, 93, 97-106.
  • Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L. and Celeux, G. (2015).
    Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models.
    Bioinformatics, 31(9), 1420-1427.
    [Hal-INRIA, RR-7786] [R package HTSCluster]
  • Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. and Raftery, A. E. (2014).
    Comparing model selection and regularization approaches to variable selection in model-based clustering.
    Journal de la SFdS, 155(2), 57-71.
    [Article]
  • Maugis-Rabusseau, C. and Michel, B. (2013). 
    Adaptive density estimation using finite Gaussian mixtures. 
    ESAIM: P&S, 17, 698-724.
    [Article] [arXiv:1103.4253]
  • Maugis-Rabusseau, C., Martin-Magniette, M.-L. and Pelletier, S. (2012). 
    SelvarClustMV: Variable selection approach in model-based clustering allowing for missing values. 
    Journal de la SFdS, 153(2), 21-36.
    [Article]
  • Baudry, J.-P., Maugis, C. and Michel, B. (2012). 
    Slope Heuristics: overview and implementation. 
    Statistics and Computing, 22(2), 455-470.
    [Article] [Hal-INRIA, RR-7223] [package CAPUSHE]
  • Maugis, C., Celeux, G. and Martin-Magniette, M.-L. (2011).
    Variable selection in model-based discriminant analysis. 
    Journal of Multivariate Analysis, 102, 1374-1387.
    [Article] [Hal-INRIA, RR-7290]
  • Celeux, G., Martin-Magniette, M.-L., Maugis, C. and Raftery, A. E. (2011).
    Letter to the Editor.
    Journal of the American Statistical Association, 106, 383.
  • Maugis, C. and Michel, B. (2011). 
    Data-driven penalty calibration: A case study for Gaussian mixture model selection. 
    ESAIM: P&S, 15, 320-339.
    [Article] [Hal-INRIA, RR-6550]
  • Maugis, C. and Michel, B. (2011).
    A non asymptotic penalized criterion for Gaussian mixture model selection.
    ESAIM: P&S, 15, 41-68.
    [Article] [Hal-INRIA, RR-6549] [Erratum]
  • Maugis, C., Martin-Magniette, M.-L., Tamby, J.-P., Renou, J.-P., Lecharny, A., Aubourg, S. and Celeux, G. (2009). 
    Sélection de variables pour la classification par mélanges gaussiens pour prédire la fonction des gènes orphelins [Variable selection for Gaussian mixture clustering to predict the function of orphan genes].
    MODULAD, 40.
    [Article]
  • Maugis, C., Celeux, G. and Martin-Magniette, M.-L. (2009). 
    Variable selection in model-based clustering: A general variable role modeling. 
    Computational Statistics and Data Analysis, 53, 3872-3882.
    [Article] [Hal-INRIA, RR-6744]
  • Maugis, C., Celeux, G. and Martin-Magniette, M.-L. (2009). 
    Variable selection for clustering with Gaussian mixture models.
    Biometrics, 65, 701-709.
    [Article][Hal-INRIA, RR-6211]

Books:

  • Biernacki, C. and Maugis-Rabusseau, C. (2017).
    Chapter 9: "High-dimensional clustering".
    In Model choice and model aggregation, edited by F. Bertrand, J.-J. Droesbeke, G. Saporta and C. Thomas-Agnan. Editions Technip, September 2017.
  • Martin-Magniette, M.-L., Maugis-Rabusseau, C. and Rau, A. (2017).
    Chapter 10: "Clustering of co-expressed genes".
    In Model choice and model aggregation, edited by F. Bertrand, J.-J. Droesbeke, G. Saporta and C. Thomas-Agnan. Editions Technip, September 2017.

Preprints:

  • Arabaciyan, S., Saint-Antoine, M., Maugis-Rabusseau, C., Francois, J.-M., Singh, A., Parrou, J.-L. and Capp, J.-P. (2020).
    Insights on the control of yeast single-cell growth variability by members of the Trehalose Phosphate Synthase (TPS) complex.
    Submitted.
  • Meynet, C. and Maugis-Rabusseau, C. (2012).
    A sparse variable selection procedure in model-based clustering.
    [Hal-00734316]

PhD thesis:

I defended my PhD thesis on November 21, 2008, at University Paris-Sud 11. My PhD manuscript is available here.

  • Title: Variable selection for model-based clustering. Application to transcriptome data analysis.
  • Advisors: Gilles Celeux (INRIA Saclay Ile-de-France) and Marie-Laure Martin-Magniette (AgroParisTech and URGV, CR1 INRA)
  • Referees: Yannick Baraud (University of Nice Sophia-Antipolis, France) and Mark van der Laan (University of California at Berkeley)
  • Committee: Christophe Ambroise, Sébastien Aubourg, Yannick Baraud, Gilles Celeux, Marie-Laure Martin-Magniette and Pascal Massart
  • Abstract:
    We are interested in variable selection for clustering with Gaussian mixture models. This research is motivated in particular by the clustering of genes described by transcriptome datasets. In both parts of the thesis, this problem is treated as a model selection problem within a model-based cluster analysis framework.
    In the first part, the proposed model, which generalizes that of Raftery and Dean (2006), specifies the role of each variable in the clustering process. Irrelevant clustering variables may depend on a subset of the relevant variables. Models are compared with a BIC-like criterion. Model identifiability is established, and the consistency of the criterion is proved under regularity conditions. In practice, the variable roles are obtained by an algorithm embedding two backward stepwise algorithms: one for variable selection in clustering and one for variable selection in linear regression. The interest of this procedure is illustrated in particular on a transcriptome dataset. An improvement of the variable role modelling, which partitions the irrelevant variables according to their dependence on or independence from some relevant clustering variables, is suggested to avoid over-penalizing some models. Finally, since DNA microarray technology generates many missing values, an extension of our variable selection procedure that takes missing entries into account is proposed. It avoids the missing-value imputation usually performed during preprocessing.
    In the second part, specific Gaussian mixtures are considered, and a non-asymptotic penalized criterion is proposed to select the number of mixture components and the subset of relevant clustering variables. A general model selection theorem for maximum likelihood estimation, due to Massart (2007), is used to obtain the form of the penalty function. This theorem requires controlling the bracketing entropy of the Gaussian mixture families under study. Since the criterion depends on unknown constants, the "slope heuristics" method is used to make it applicable in practice.
  • Keywords: Variable Selection, Model-based Clustering, Gaussian Mixtures, Transcriptome Data.
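
The "slope heuristics" calibration step mentioned in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration of the data-driven slope estimation idea, not the code used in the thesis or in the CAPUSHE package. It assumes that the penalty shape of each model is its number of free parameters, estimates the slope of the maximized log-likelihood against that shape by ordinary least squares on the most complex models (the fraction `frac` is an arbitrary assumption), and then penalizes with twice the estimated slope, following the heuristic that the optimal penalty is twice the minimal one. The function name `slope_heuristics` is illustrative only.

```python
from typing import List, Tuple


def slope_heuristics(dims: List[float], loglik: List[float],
                     frac: float = 0.5) -> Tuple[int, float]:
    """Hypothetical sketch of slope heuristics via data-driven slope estimation.

    dims   -- penalty shape of each model (assumed: number of free parameters)
    loglik -- maximized log-likelihood of each model
    frac   -- fraction of the most complex models used to estimate the slope
    Returns the index of the selected model and the estimated slope kappa.
    """
    n = len(dims)
    # Keep the most complex models, where loglik is assumed to grow
    # roughly linearly with the penalty shape.
    order = sorted(range(n), key=lambda i: dims[i])
    k = min(n, max(2, int(n * frac)))
    tail = order[-k:]
    xs = [dims[i] for i in tail]
    ys = [loglik[i] for i in tail]

    # Ordinary least-squares slope of loglik against dims on those models.
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the most complex models must have distinct dims")
    kappa = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

    # The slope estimates the minimal penalty constant; penalize with twice it.
    crit = [-loglik[i] + 2 * kappa * dims[i] for i in range(n)]
    best = min(range(n), key=lambda i: crit[i])
    return best, kappa
```

For example, with dims = 1, ..., 10 and loglik = 10*min(d, 3) + 0.5*d (a made-up profile whose fit stops improving substantially after dimension 3), the function returns index 2, i.e. the model of dimension 3, with kappa = 0.5. Least squares is used here only for simplicity; a robust regression would be less sensitive to irregular log-likelihood values.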