Surface interactions largely control how biomaterials interact with biology and how other materials function in industrial applications. Surface analysis methods are therefore essential for understanding the molecular properties of material surfaces and for establishing mechanisms and design rules for new materials. Surface analysis instrumentation is developing rapidly, generating data of unprecedented accuracy and quantity. However, computational methods for extracting knowledge from these data lag far behind, with simple linear methods such as principal component analysis (PCA) still the most commonly used. Here we show how nonlinear machine learning methods can be used to analyse large and complex surface science (ToF-SIMS) data sets effectively and rapidly, and how the parameters used to generate these nonlinear classification models can be optimized. We show that coarse-grained representations of mass spectra coupled with relatively small self-organising map (SOM) sizes provide surprisingly good performance in analysing spectra of closely related materials. Although finer-grained mass spectral representations perform better, they do so only with larger map sizes, because finer representations introduce more noise or less relevant signals into the data matrices used to train the machine learning models. These methods promise faster, easier, and more accurate analysis of the increasingly large and complex surface science data sets that are appearing at an accelerating rate.
- Materials informatics
- Multivariate analysis (MVA)
- Self-organising maps (SOMs)
- Time-of-flight secondary ion mass spectrometry (ToF-SIMS)
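
The two parameters discussed above, the granularity of the mass spectral representation and the size of the self-organising map, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain NumPy online SOM under simple assumptions (fixed-width m/z bins, total-intensity normalisation, linearly decaying learning rate and neighbourhood width). The function names `bin_spectrum`, `best_matching_unit`, and `train_som`, and all default values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def bin_spectrum(mz, intensity, mz_min, mz_max, bin_width):
    """Sum peak intensities into fixed-width m/z bins and normalise to unit total.

    A larger bin_width gives a coarser-grained representation (fewer features).
    """
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))
    edges = mz_min + bin_width * np.arange(n_bins + 1)
    binned, _ = np.histogram(mz, bins=edges, weights=intensity)
    total = binned.sum()
    return binned / total if total > 0 else binned

def best_matching_unit(weights, x):
    """Return the (row, col) index of the map unit closest to sample x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows, cols, n_iter=500, lr0=0.5, seed=0):
    """Train a rows x cols SOM on data of shape (n_samples, n_features).

    Assumed schedule (illustrative only): the learning rate decays linearly
    to zero and the Gaussian neighbourhood width shrinks linearly to 0.5.
    """
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 + (0.5 - sigma0) * frac
        x = data[rng.integers(len(data))]
        bmu = best_matching_unit(weights, x)
        # Gaussian neighbourhood centred on the best-matching unit.
        d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2.0 * sigma**2))
        weights += lr * h[..., None] * (x - weights)
    return weights

# Toy peak list (made-up values, not measured data).
mz = np.array([1.2, 1.7, 2.1, 49.9])
intensity = np.array([10.0, 5.0, 5.0, 20.0])
fine = bin_spectrum(mz, intensity, 0.0, 50.0, 1.0)     # 50 features
coarse = bin_spectrum(mz, intensity, 0.0, 50.0, 10.0)  # 5 features

# Train a small map on a few synthetic coarse-grained "spectra".
rng = np.random.default_rng(1)
spectra = rng.random((20, coarse.size))
som = train_som(spectra, rows=3, cols=3)
unit = best_matching_unit(som, spectra[0])
```

In this sketch, `bin_width` controls how coarse the representation is and `rows` and `cols` control the map size, so the trade-off described above corresponds to varying these arguments together and comparing how well the resulting maps separate the materials.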