Optimal machine learning models for robust materials classification using ToF-SIMS data

Robert M.T. Madiona, David A. Winkler, Benjamin W. Muir, Paul J. Pigram

Research output: Contribution to journal, Article, Research, peer-review

25 Citations (Scopus)


Surface interactions largely control how biomaterials interact with biology, and how other materials function in industrial applications. Surface analysis methods are therefore very important in understanding the molecular properties of materials surfaces, and in establishing mechanisms and design rules for new materials. Surface analysis instrumentation is developing at a rapid rate, generating data of unprecedented accuracy and quantity. However, computational methods for extracting knowledge from these data are lagging far behind, with simple, linear principal component analysis (PCA) methods being used most commonly. Here we show how nonlinear machine learning methods can be used to analyse large and complex surface science (ToF-SIMS) data sets effectively and rapidly, and how the parameters used to generate these nonlinear classification models can be optimized. We show that coarse-grained representations of mass spectra coupled with relatively small self-organizing map (SOM) sizes provide surprisingly good performance in analysing spectra of closely related materials. Although finer-grained mass spectral representations perform better, they only do so with larger map sizes, because the data matrices used to train the machine learning models then contain more noise and less relevant signals. These methods promise faster, easier, and more accurate analysis of the increasingly large and complex surface science data sets that are appearing at an accelerating rate.
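The abstract does not give implementation details, so the following is only a minimal sketch of the two parameters it discusses: the granularity of the mass spectral representation and the size of the self-organizing map. It assumes fixed-width m/z binning, normalisation to total counts, and a basic online SOM with a Gaussian neighbourhood. The function names (`bin_spectrum`, `train_som`, `best_matching_unit`) and all default values are hypothetical and are not taken from the paper.

```python
import numpy as np


def bin_spectrum(mz, intensity, mz_min, mz_max, bin_width):
    """Sum peak intensities into fixed-width m/z bins, then normalise to total counts.

    A larger bin_width gives a coarser-grained representation (fewer features).
    """
    n_bins = int(round((mz_max - mz_min) / bin_width))
    edges = np.linspace(mz_min, mz_max, n_bins + 1)
    binned, _ = np.histogram(mz, bins=edges, weights=intensity)
    total = binned.sum()
    return binned / total if total > 0 else binned


def best_matching_unit(weights, x):
    """Return the (row, col) of the map node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a rows x cols SOM on data (n_samples x n_features) with online updates.

    Assumed schedule (illustrative only): learning rate decays linearly to zero,
    neighbourhood radius shrinks linearly from half the map size to 0.5.
    """
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1])) * data.max()
    # Grid coordinates of every node, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 + (0.5 - sigma0) * frac
        x = data[rng.integers(len(data))]
        bmu = best_matching_unit(weights, x)
        # Gaussian neighbourhood centred on the best matching unit.
        d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2.0 * sigma**2))
        weights += lr * h[..., None] * (x - weights)
    return weights
```

Under these assumptions, the trade-off described above could be explored by varying `bin_width` (spectral granularity) and `rows`/`cols` (map size), then checking whether spectra from different materials land on different best matching units.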

Original language: English
Pages (from-to): 773-783
Number of pages: 11
Journal: Applied Surface Science
Publication status: Published - 1 Sept 2019


Keywords

  • Materials informatics
  • Multivariate analysis (MVA)
  • Self-organising maps (SOMs)
  • Time-of-flight secondary ion mass spectrometry (ToF-SIMS)
