Uncovering structure-property relationships of materials by subgroup discovery

Bryan R. Goldsmith, Mario Boley, Jilles Vreeken, Matthias Scheffler, Luca M. Ghiringhelli

Research output: Contribution to journal › Article › Research › peer-review

69 Citations (Scopus)


Subgroup discovery (SGD) is presented here as a data-mining approach to help find interpretable local patterns, correlations, and descriptors of a target property in materials-science data. Specifically, we focus on data generated by density-functional theory calculations. First, we demonstrate that SGD can identify physically meaningful models that classify the crystal structures of 82 octet binary (OB) semiconductors as either rocksalt or zincblende. SGD identifies an interpretable two-dimensional model, derived from only the atomic radii of valence s and p orbitals, that correctly classifies the crystal structures of 79 of the 82 OB semiconductors. The SGD framework is subsequently applied to 24 400 configurations of neutral gas-phase gold clusters with 5-14 atoms to discern general patterns between geometrical and physicochemical properties. For example, SGD helps find that van der Waals interactions within gold clusters are linearly correlated with their radius of gyration and are weaker for planar clusters than for nonplanar clusters. In addition, a descriptor that predicts a local linear correlation between the chemical hardness and the cluster isomer stability is found for the even-sized gold clusters.
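To make the idea of subgroup discovery concrete, the following is a minimal sketch in Python, not the authors' implementation. It exhaustively searches conjunctions of simple threshold propositions and scores each candidate subgroup with weighted relative accuracy (coverage times the gain in target share), a common textbook quality function that may differ from the one used in the paper. The feature names `a` and `b`, the helper functions, and the toy data are all hypothetical and are not taken from the article.

```python
from itertools import combinations

# Hypothetical toy data (NOT from the paper): (features, binary target label).
ROWS = [
    ({"a": 0.10, "b": 2.0}, 1),
    ({"a": 0.20, "b": 1.5}, 1),
    ({"a": 0.30, "b": 1.8}, 1),
    ({"a": 0.25, "b": 0.5}, 0),
    ({"a": 0.60, "b": 1.9}, 0),
    ({"a": 0.70, "b": 0.4}, 0),
    ({"a": 0.80, "b": 1.6}, 0),
    ({"a": 0.90, "b": 0.3}, 0),
]


def candidate_selectors(rows):
    """Build threshold propositions 'feature <= t' and 'feature > t'."""
    selectors = []
    for feature in sorted(rows[0][0]):
        values = sorted({x[feature] for x, _ in rows})
        # Midpoints between consecutive distinct values serve as thresholds.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            selectors.append((feature, "<=", t))
            selectors.append((feature, ">", t))
    return selectors


def covers(selector, x):
    """Return True if data point x satisfies a single proposition."""
    feature, op, t = selector
    return x[feature] <= t if op == "<=" else x[feature] > t


def quality(rows, subgroup):
    """Weighted relative accuracy: coverage * (share in subgroup - overall share)."""
    members = [y for x, y in rows if all(covers(s, x) for s in subgroup)]
    if not members:
        return 0.0
    coverage = len(members) / len(rows)
    share_subgroup = sum(members) / len(members)
    share_overall = sum(y for _, y in rows) / len(rows)
    return coverage * (share_subgroup - share_overall)


def best_subgroup(rows, max_depth=2):
    """Exhaustively score all conjunctions of up to max_depth propositions."""
    selectors = candidate_selectors(rows)
    best, best_q = (), 0.0
    for depth in range(1, max_depth + 1):
        for subgroup in combinations(selectors, depth):
            q = quality(rows, subgroup)
            if q > best_q:
                best, best_q = subgroup, q
    return best, best_q


if __name__ == "__main__":
    subgroup, q = best_subgroup(ROWS)
    print(subgroup, q)
```

On this toy data the search returns a conjunction of one proposition on `a` and one on `b`, which is the kind of interpretable, low-dimensional rule the abstract describes. Exhaustive enumeration is used only for brevity; it scales poorly, and practical subgroup-discovery tools rely on pruning or heuristic search instead.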

Original language: English
Article number: 013031
Number of pages: 14
Journal: New Journal of Physics
Publication status: Published - Jan 2017
Externally published: Yes


  • big-data analytics
  • data mining
  • gold clusters
  • machine learning
  • octet binary semiconductors
  • pattern discovery
