Periscope: Quantitative prediction of soluble protein expression in the periplasm of Escherichia coli

Catherine Ching Han Chang, Chen Li, Geoff I. Webb, BengTi Tey, Jiangning Song, Ramakrishnan Nagasundara Ramanan

Research output: Contribution to journal › Article › Research › peer-review

8 Citations (Scopus)

Abstract

Periplasmic expression of soluble proteins in Escherichia coli not only offers a much-simplified downstream purification process, but also enhances the probability of obtaining correctly folded and biologically active proteins. Different combinations of signal peptides and target proteins lead to different soluble protein expression levels, ranging from negligible to several grams per litre. Accurate algorithms for the rational selection of promising candidates can serve as a powerful tool to complement current trial-and-error approaches, so that proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture to predict the real-valued expression level of a target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier determines which second-stage support vector regression (SVR) model is used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson’s correlation coefficient (PCC) of 0.77. We further illustrate the relative importance of various features with respect to the different models. The results indicate that the occurrence of the dipeptide formed by glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/.
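
The two-stage routing described in the abstract, together with the two reported metrics (classification accuracy and Pearson's correlation coefficient), can be illustrated with a minimal Python sketch. This is not the Periscope implementation. The class labels "low" and "high", the one-dimensional feature vector, and the toy stand-in models are all hypothetical and replace the trained SVM and SVR models, which are not reproduced here.

```python
from math import sqrt
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]


def two_stage_predict(
    x: Features,
    classify: Callable[[Features], str],
    regressors: Dict[str, Callable[[Features], float]],
) -> Tuple[str, float]:
    """Stage one picks a class; stage two applies that class's regressor."""
    label = classify(x)
    return label, regressors[label](x)


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of samples whose predicted class matches the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (spread_x * spread_y)


# Hypothetical stand-ins for the trained models (assumptions, not from the paper).
def toy_classifier(x: Features) -> str:
    return "high" if x[0] > 0.5 else "low"


toy_regressors: Dict[str, Callable[[Features], float]] = {
    "low": lambda x: 0.1 * x[0],         # stand-in for the low-expression SVR
    "high": lambda x: 1.0 + 2.0 * x[0],  # stand-in for the high-expression SVR
}

label, level = two_stage_predict([0.8], toy_classifier, toy_regressors)
```

In this sketch the classifier output is used only as a dictionary key, so each regressor can be fitted on the subset of samples belonging to its own class.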
Original language: English
Article number: 21844
Number of pages: 11
Journal: Scientific Reports
Volume: 6
DOI: 10.1038/srep21844
Publication status: Published - 2 Mar 2016

Keywords

  • computational models
  • proteins

Cite this

@article{39d66da238b148a99d1f6fcb350ccbca,
title = "Periscope: Quantitative prediction of soluble protein expression in the periplasm of Escherichia coli",
abstract = "Periplasmic expression of soluble proteins in Escherichia coli not only offers a much-simplified downstream purification process, but also enhances the probability of obtaining correctly folded and biologically active proteins. Different combinations of signal peptides and target proteins lead to different soluble protein expression levels, ranging from negligible to several grams per litre. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement with current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture, to predict the real-valued expression level of target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier determines which second-stage support vector regression (SVR) classifier to be used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78{\%} and a Pearson’s correlation coefficient (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models. The results indicate that the occurrence of dipeptide glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/.",
keywords = "computational models, proteins",
author = "Chang, {Catherine Ching Han} and Chen Li and Webb, {Geoff I.} and BengTi Tey and Jiangning Song and Ramanan, {Ramakrishnan Nagasundara}",
year = "2016",
month = "3",
day = "2",
doi = "10.1038/srep21844",
language = "English",
volume = "6",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",

}

Periscope: Quantitative prediction of soluble protein expression in the periplasm of Escherichia coli. / Chang, Catherine Ching Han; Li, Chen; Webb, Geoff I.; Tey, BengTi; Song, Jiangning; Ramanan, Ramakrishnan Nagasundara.

In: Scientific Reports, Vol. 6, 21844, 02.03.2016.

TY - JOUR

T1 - Periscope

T2 - Quantitative prediction of soluble protein expression in the periplasm of Escherichia coli

AU - Chang, Catherine Ching Han

AU - Li, Chen

AU - Webb, Geoff I.

AU - Tey, BengTi

AU - Song, Jiangning

AU - Ramanan, Ramakrishnan Nagasundara

PY - 2016/3/2

Y1 - 2016/3/2

AB - Periplasmic expression of soluble proteins in Escherichia coli not only offers a much-simplified downstream purification process, but also enhances the probability of obtaining correctly folded and biologically active proteins. Different combinations of signal peptides and target proteins lead to different soluble protein expression levels, ranging from negligible to several grams per litre. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement with current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture, to predict the real-valued expression level of target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier determines which second-stage support vector regression (SVR) classifier to be used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson’s correlation coefficient (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models. The results indicate that the occurrence of dipeptide glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/.

KW - computational models

KW - proteins

U2 - 10.1038/srep21844

DO - 10.1038/srep21844

M3 - Article

VL - 6

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

M1 - 21844

ER -