Critical evaluation of bioinformatics tools for the prediction of protein crystallization propensity

Huilin Wang, Liubin Feng, Geoffrey I. Webb, Lukasz Kurgan, Jiangning Song, Donghai Lin

Research output: Contribution to journal › Article › Research › peer-review

Abstract

X-ray crystallography is the main tool for structural determination of proteins. Yet, the underlying crystallization process is costly, has a high attrition rate and involves a series of trial-and-error attempts to obtain diffraction-quality crystals. The Structural Genomics Consortium aims to systematically solve representative structures of major protein-fold classes using primarily high-throughput X-ray crystallography. The attrition rate of these efforts can be improved by selecting proteins that are potentially easier to crystallize. In this context, bioinformatics approaches have been developed to predict crystallization propensities based on protein sequences. These approaches are used to facilitate prioritization of the most promising target proteins, search for alternative structural orthologues of the target proteins and suggest designs of constructs capable of potentially enhancing the likelihood of successful crystallization. We reviewed and compared nine predictors of protein crystallization propensity. Moreover, we demonstrated that integrating selected outputs from multiple predictors as candidate input features to build a predictive model yields significantly higher predictive performance than using these predictors individually. We also introduced a new and accurate predictor of protein crystallization propensity, Crysf, which uses functional features extracted from UniProt as inputs. This comprehensive review will assist structural biologists in selecting the most appropriate predictor, and will also help bioinformaticians develop a new generation of predictive algorithms.

Original language: English
Pages (from-to): 838-852
Number of pages: 15
Journal: Briefings in Bioinformatics
Volume: 19
Issue number: 5
DOI: 10.1093/bib/bbx018
Publication status: Published - 28 Sep 2018

Keywords

  • structural genomics
  • protein crystallization propensity
  • target selection
  • bioinformatics
  • sequence analysis
  • machine learning

Cite this

@article{ff1b959f7da34a15a1acfdf996a61f37,
title = "Critical evaluation of bioinformatics tools for the prediction of protein crystallization propensity",
abstract = "X-ray crystallography is the main tool for structural determination of proteins. Yet, the underlying crystallization process is costly, has a high attrition rate and involves a series of trial-and-error attempts to obtain diffraction-quality crystals. The Structural Genomics Consortium aims to systematically solve representative structures of major protein-fold classes using primarily high-throughput X-ray crystallography. The attrition rate of these efforts can be improved by selection of proteins that are potentially easier to be crystallized. In this context, bioinformatics approaches have been developed to predict crystallization propensities based on protein sequences. These approaches are used to facilitate prioritization of the most promising target proteins, search for alternative structural orthologues of the target proteins and suggest designs of constructs capable of potentially enhancing the likelihood of successful crystallization. We reviewed and compared nine predictors of protein crystallization propensity. Moreover, we demonstrated that integrating selected outputs from multiple predictors as candidate input features to build the predictive model results in a significantly higher predictive performance when compared to using these predictors individually. Furthermore, we also introduced a new and accurate predictor of protein crystallization propensity, Crysf, which uses functional features extracted from UniProt as inputs. This comprehensive review will assist structural biologists in selecting the most appropriate predictor, and is also beneficial for bioinformaticians to develop a new generation of predictive algorithms.",
keywords = "structural genomics, protein crystallization propensity, target selection, bioinformatics, sequence analysis, machine learning",
author = "Huilin Wang and Liubin Feng and Webb, {Geoffrey I.} and Lukasz Kurgan and Jiangning Song and Donghai Lin",
year = "2018",
month = "9",
day = "28",
doi = "10.1093/bib/bbx018",
language = "English",
volume = "19",
pages = "838--852",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "Oxford Univ Press",
number = "5",

}

Critical evaluation of bioinformatics tools for the prediction of protein crystallization propensity. / Wang, Huilin; Feng, Liubin; Webb, Geoffrey I.; Kurgan, Lukasz; Song, Jiangning; Lin, Donghai.

In: Briefings in Bioinformatics, Vol. 19, No. 5, 28.09.2018, p. 838-852.

TY - JOUR
T1 - Critical evaluation of bioinformatics tools for the prediction of protein crystallization propensity
AU - Wang, Huilin
AU - Feng, Liubin
AU - Webb, Geoffrey I.
AU - Kurgan, Lukasz
AU - Song, Jiangning
AU - Lin, Donghai
PY - 2018/9/28
Y1 - 2018/9/28

AB - X-ray crystallography is the main tool for structural determination of proteins. Yet, the underlying crystallization process is costly, has a high attrition rate and involves a series of trial-and-error attempts to obtain diffraction-quality crystals. The Structural Genomics Consortium aims to systematically solve representative structures of major protein-fold classes using primarily high-throughput X-ray crystallography. The attrition rate of these efforts can be improved by selection of proteins that are potentially easier to be crystallized. In this context, bioinformatics approaches have been developed to predict crystallization propensities based on protein sequences. These approaches are used to facilitate prioritization of the most promising target proteins, search for alternative structural orthologues of the target proteins and suggest designs of constructs capable of potentially enhancing the likelihood of successful crystallization. We reviewed and compared nine predictors of protein crystallization propensity. Moreover, we demonstrated that integrating selected outputs from multiple predictors as candidate input features to build the predictive model results in a significantly higher predictive performance when compared to using these predictors individually. Furthermore, we also introduced a new and accurate predictor of protein crystallization propensity, Crysf, which uses functional features extracted from UniProt as inputs. This comprehensive review will assist structural biologists in selecting the most appropriate predictor, and is also beneficial for bioinformaticians to develop a new generation of predictive algorithms.

KW - structural genomics
KW - protein crystallization propensity
KW - target selection
KW - bioinformatics
KW - sequence analysis
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85039870567&partnerID=8YFLogxK
U2 - 10.1093/bib/bbx018
DO - 10.1093/bib/bbx018
M3 - Article
VL - 19
SP - 838
EP - 852
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
SN - 1467-5463
IS - 5
ER -