TY - JOUR
T1 - Global comparison of multiple-segmented viruses in 12-dimensional genome space
AU - Huang, Hsin Hsiung
AU - Yu, Chenglong
AU - Zheng, Hui
AU - Hernandez, Troy
AU - Yau, Shek Chung
AU - He, Rong Lucy
AU - Yang, Jie
AU - Yau, Stephen S.T.
PY - 2014/12/1
Y1 - 2014/12/1
AB - We have recently developed a computational approach in a vector space for genome-based virus classification. This alignment-free approach, called the "Natural Vector (NV) representation", allows us to classify single-segmented viruses with high speed and accuracy. For multiple-segmented viruses, phylogenetic trees are typically reconstructed for each segment to infer viral phylogeny. Consensus tree methods may be used to combine the phylogenetic trees based on different segments. However, consensus tree methods were not developed for cases where the viruses have different numbers of segments or where their segments do not match well. We propose a novel approach for comparing multiple-segmented viruses globally, even when viruses contain different numbers of segments. Using our method, each virus is represented by a set of vectors in ℝ^12. The Hausdorff distance is then used to compare different sets of vectors, and phylogenetic trees can be reconstructed based on this distance. The proposed method is used for predicting classification labels of viruses with n segments (n ≥ 1). The correctness rates of our predictions based on cross-validation are as high as 96.5%, 95.4%, 99.7%, and 95.6% for Baltimore class, family, subfamily, and genus, respectively, which are comparable to the rates for single-segmented viruses only. Our method is not affected by the number or order of segments. We also demonstrate that the natural graphical representation based on the Hausdorff distance is more reasonable than the consensus tree for a recent public health threat, the influenza A (H7N9) viruses.
KW - Natural graphical representation
KW - Natural vector
KW - Nucleotide sequence
KW - Phylogeny
UR - http://www.scopus.com/inward/record.url?scp=84925283361&partnerID=8YFLogxK
U2 - 10.1016/j.ympev.2014.08.003
DO - 10.1016/j.ympev.2014.08.003
M3 - Article
C2 - 25172357
AN - SCOPUS:84925283361
SN - 1055-7903
VL - 81
SP - 29
EP - 36
JO - Molecular Phylogenetics and Evolution
JF - Molecular Phylogenetics and Evolution
ER -