Comparison of linear and nonlinear classification algorithms for the prediction of drug and chemical metabolism by human UDP-glucuronosyltransferase isoforms

Michael J Sorich, John O Miners, Ross A McKinnon, David A Winkler, Frank R Burden, Paul A Smith

Research output: Contribution to journal, Article, Research, peer-review

77 Citations (Scopus)


Partial least squares discriminant analysis (PLSDA), Bayesian regularized artificial neural network (BRANN), and support vector machine (SVM) methodologies were compared by their ability to classify substrates and nonsubstrates of 12 isoforms of human UDP-glucuronosyltransferase (UGT), an enzyme "superfamily" involved in the metabolism of drugs, nondrug xenobiotics, and endogenous compounds. Simple two-dimensional descriptors were used to capture chemical information. For each data set, 70% of the data were used for training, and the remainder were used to assess the generalization performance. In general, the SVM methodology was able to produce models with the best predictive performance, followed by BRANN and then PLSDA. However, a small number of data sets showed either equivalent or better predictability using PLSDA, which may indicate relatively linear relationships in these data sets. All SVM models showed predictive ability (>60% of test set predicted correctly) and five out of the 12 test sets showed excellent prediction (>80% prediction accuracy). These models represent the first use of pattern recognition methods to discriminate between substrates and nonsubstrates of human drug metabolizing enzymes and the first thorough assessment of three classification algorithms using multiple metabolic data sets.
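The evaluation protocol in the abstract (a 70% training split, then test-set accuracy judged against the 60% and 80% marks) can be sketched as follows. This is a minimal illustration, not the authors' code. The function names, the random shuffle, the fixed seed, and the "not predictive" label are all assumptions, and the abstract does not say how the 70% was selected.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle records and return (training 70%, test 30%).

    Assumption: a seeded random split; the abstract does not state
    how the training subset was chosen.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of test compounds whose class was predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("cannot score an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def rate(acc):
    """Map an accuracy to the abstract's bands.

    >80% is 'excellent' and >60% is 'predictive' (both from the abstract);
    'not predictive' is a hypothetical label for everything else.
    """
    if acc > 0.80:
        return "excellent"
    if acc > 0.60:
        return "predictive"
    return "not predictive"
```

Any classifier (PLSDA, BRANN, SVM, or another) would be trained on the first part of the split; its predictions for the second part are then passed to `accuracy` and `rate`.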

Original language: English
Pages (from-to): 2019-2024
Number of pages: 6
Journal: Journal of Chemical Information and Computer Sciences
Issue number: 6
Publication status: Published - 2003
