Assessment of discriminant models in infrared imaging using constrained repeated random sampling - Cross validation

David Perez Guaita, Julia Kuligowski, Bernhard Lendl, Bayden R Wood, Guillermo Quintás

Research output: Contribution to journal › Article › Research › peer-review

7 Citations (Scopus)


Infrared (IR) imaging is an emerging and powerful approach for studying the molecular composition of cells and tissues. It is a non-destructive, label-free phenotypic technique that combines the molecule-specific information provided by IR spectroscopy with spatial resolution, offering great potential in biochemical and biomedical research and in routine applications. Applying multivariate discriminant analysis with bilinear models such as Partial Least Squares-Discriminant Analysis (PLS-DA) to IR images requires unfolding the spatial directions into a two-way matrix, which results in a loss of spatial information and structure. In this article, we first show that internal validation methods such as repeated k-fold cross-validation (CV) can be overly optimistic when the pixel size of the image is smaller than the lateral spatial resolution. Second, we propose a new approach for the unbiased internal evaluation of model performance, named COnstrained Repeated Random Subsampling-Cross Validation (CORRS-CV). The method is based on generating q training and test subsets by constrained random sampling of n training pixels without replacement. It circumvents the overly optimistic estimates caused by oversampling, providing more accurate and robust images. The approach can be applied in IR microscopy to develop discriminant models for analysing the underlying biochemical differences associated with anatomical and histopathological features in cells and tissues.
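The subsampling step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the constraint, so the sketch assumes one plausible form: test pixels must lie at least `min_dist` pixels (Euclidean distance) from every training pixel, so that oversampled neighbours of a training pixel never enter the test set. The function name `corrs_cv_splits` and the parameters `shape`, `min_dist` and `seed` are hypothetical; only q (number of subsets) and n (number of training pixels) come from the text. This is not the authors' implementation.

```python
import numpy as np

def corrs_cv_splits(shape, q, n, min_dist, seed=0):
    """Yield q (train, test) pairs of flat pixel indices for an image.

    ASSUMPTION (not stated in the abstract): the constraint is an
    exclusion zone, i.e. test pixels are kept only if they are at
    least `min_dist` pixels away from every training pixel.
    """
    rng = np.random.default_rng(seed)
    rows, cols = shape
    n_pixels = rows * cols
    # Row and column coordinates of every pixel in the unfolded image.
    rr, cc = np.divmod(np.arange(n_pixels), cols)
    for _ in range(q):
        # Draw n training pixels at random, without replacement.
        train = rng.choice(n_pixels, size=n, replace=False)
        # Distance from each pixel to its nearest training pixel.
        # This builds an (n_pixels x n) array, fine for small images only.
        dist = np.hypot(rr[:, None] - rr[train][None, :],
                        cc[:, None] - cc[train][None, :]).min(axis=1)
        # Keep as test pixels only those outside the exclusion zone.
        test = np.flatnonzero(dist >= min_dist)
        yield train, test

# Example: 5 repetitions on a 20 x 20 image with 10 training pixels each.
for train_idx, test_idx in corrs_cv_splits((20, 20), q=5, n=10, min_dist=3.0):
    print(len(train_idx), len(test_idx))
```

In practice, `min_dist` would be chosen relative to the ratio between the lateral spatial resolution and the pixel size, and the indices would select rows of the unfolded spectral matrix used to fit and evaluate the PLS-DA model in each repetition.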
Original language: English
Pages (from-to): 156-164
Number of pages: 9
Journal: Analytica Chimica Acta
Publication status: Published - 2018


  • Infrared hyperspectral imaging
  • Constrained repeated random sampling - cross validation
  • Partial least squares-discriminant analysis
  • Oversampling
  • Cross validation
