MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters

Meng Zhang, Fuyi Li, Tatiana T. Marquez-Lago, Andre Leier, Cunshuo Fan, Chee Keong Kwoh, Kuo-Chen Chou, Jiangning Song, Cangzhi Jia

Research output: Contribution to journal › Article › Research › peer-review

13 Citations (Scopus)

Abstract

MOTIVATION: Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions.

RESULTS: In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era.

AVAILABILITY AND IMPLEMENTATION: The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
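
Two of the techniques named in the abstract, k-tuple nucleotide composition and F-score feature selection, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the MULTiPly implementation: the function names (kmer_composition, f_scores, rank_features), the lexicographic ACGT ordering, the skipping of windows with ambiguous bases, and the use of the common two-class F-score (squared distance of each class mean from the overall mean, divided by the sum of the two within-class sample variances) are all assumptions. The abstract does not state the k values, classifier, or selection thresholds MULTiPly used.

```python
from itertools import product


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequencies of all 4**k k-tuples, listed in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide an overlapping window of width k across the sequence.
    for start in range(len(seq) - k + 1):
        idx = index.get(seq[start:start + k])
        if idx is not None:  # assumption: windows containing N or other symbols are skipped
            counts[idx] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def f_scores(pos: list[list[float]], neg: list[list[float]]) -> list[float]:
    """Two-class F-score of each feature column (assumed definition, see lead-in).

    Each class needs at least 2 rows so the sample variances are defined.
    """
    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        q = [row[j] for row in neg]
        mean_p = sum(p) / len(p)
        mean_q = sum(q) / len(q)
        mean_all = (sum(p) + sum(q)) / (len(p) + len(q))
        numer = (mean_p - mean_all) ** 2 + (mean_q - mean_all) ** 2
        var_p = sum((x - mean_p) ** 2 for x in p) / (len(p) - 1)
        var_q = sum((x - mean_q) ** 2 for x in q) / (len(q) - 1)
        denom = var_p + var_q
        if denom > 0:
            scores.append(numer / denom)
        elif numer > 0:
            scores.append(float("inf"))  # classes separated with zero within-class spread
        else:
            scores.append(0.0)  # constant feature carries no class information
    return scores


def rank_features(scores: list[float]) -> list[int]:
    """Feature indices sorted from highest to lowest F-score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

For example, kmer_composition("TATAATGC", 1) returns [0.375, 0.125, 0.125, 0.375] for A, C, G and T. In an incremental scheme of the kind the abstract describes, features would be added to a classifier in the order given by rank_features; whether MULTiPly ranked individual k-tuples or whole feature types this way is not stated in this record.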

Original language: English
Article number: btz016
Pages (from-to): 2957-2965
Number of pages: 9
Journal: Bioinformatics
Volume: 35
Issue number: 17
DOI: https://doi.org/10.1093/bioinformatics/btz016
Publication status: Published - 1 Sep 2019

Cite this

Zhang, M., Li, F., Marquez-Lago, T. T., Leier, A., Fan, C., Kwoh, C. K., ... Jia, C. (2019). MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics, 35(17), 2957-2965. [btz016]. https://doi.org/10.1093/bioinformatics/btz016
Zhang, Meng; Li, Fuyi; Marquez-Lago, Tatiana T.; Leier, Andre; Fan, Cunshuo; Kwoh, Chee Keong; Chou, Kuo-Chen; Song, Jiangning; Jia, Cangzhi. / MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. In: Bioinformatics. 2019; Vol. 35, No. 17. pp. 2957-2965.
@article{b3d785a344f747e29aa897fdbfe3c798,
title = "MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters",
abstract = "MOTIVATION: Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS: In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION: The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.",
author = "Meng Zhang and Fuyi Li and Marquez-Lago, {Tatiana T.} and Andre Leier and Cunshuo Fan and Kwoh, {Chee Keong} and Kuo-Chen Chou and Jiangning Song and Cangzhi Jia",
year = "2019",
month = "9",
day = "1",
doi = "10.1093/bioinformatics/btz016",
language = "English",
volume = "35",
pages = "2957--2965",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press, USA",
number = "17",

}

Zhang, M, Li, F, Marquez-Lago, TT, Leier, A, Fan, C, Kwoh, CK, Chou, K-C, Song, J & Jia, C 2019, 'MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters', Bioinformatics, vol. 35, no. 17, btz016, pp. 2957-2965. https://doi.org/10.1093/bioinformatics/btz016

MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. / Zhang, Meng; Li, Fuyi; Marquez-Lago, Tatiana T.; Leier, Andre; Fan, Cunshuo; Kwoh, Chee Keong; Chou, Kuo-Chen; Song, Jiangning; Jia, Cangzhi.

In: Bioinformatics, Vol. 35, No. 17, btz016, 01.09.2019, p. 2957-2965.

TY - JOUR

T1 - MULTiPly

T2 - a novel multi-layer predictor for discovering general and specific types of promoters

AU - Zhang, Meng

AU - Li, Fuyi

AU - Marquez-Lago, Tatiana T.

AU - Leier, Andre

AU - Fan, Cunshuo

AU - Kwoh, Chee Keong

AU - Chou, Kuo-Chen

AU - Song, Jiangning

AU - Jia, Cangzhi

PY - 2019/9/1

Y1 - 2019/9/1

N2 - MOTIVATION: Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS: In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION: The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

AB - MOTIVATION: Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS: In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION: The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=85062570672&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btz016

DO - 10.1093/bioinformatics/btz016

M3 - Article

C2 - 30649179

AN - SCOPUS:85062570672

VL - 35

SP - 2957

EP - 2965

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 17

M1 - btz016

ER -

Zhang M, Li F, Marquez-Lago TT, Leier A, Fan C, Kwoh CK et al. MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics. 2019 Sep 1;35(17):2957-2965. btz016. https://doi.org/10.1093/bioinformatics/btz016