Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint

Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh, Dinh Phung

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

Abstract

When learning sequence representations, traditional pattern-based methods often suffer from data sparsity and high dimensionality, while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these weaknesses, we propose an unsupervised method, Sqn2Vec, which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. As a result, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec on several machine learning tasks, including sequence classification, clustering, and visualization.
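
As a rough illustration of the first stage described in the abstract, the following Python sketch mines sequential patterns under a gap constraint and turns each sequence into a bag of pattern tokens. It is not the authors' implementation. The function names (`occurs_with_gap`, `frequent_patterns`, `to_document`), the parameters `min_support`, `max_gap`, and `max_len`, the brute-force candidate enumeration, and the reading of "gap" as the number of symbols skipped between two consecutive matched symbols are all assumptions made for this example.

```python
from itertools import product


def occurs_with_gap(pattern, sequence, max_gap):
    """Return True if `pattern` occurs in `sequence` as a subsequence with at
    most `max_gap` skipped symbols between consecutive matched symbols.
    (This definition of the gap is an assumption for the sketch.)"""
    def match_from(p_idx, s_idx):
        # pattern[p_idx - 1] was matched at position s_idx of the sequence.
        if p_idx == len(pattern):
            return True
        # The next symbol may sit at most max_gap positions beyond s_idx + 1.
        stop = min(len(sequence), s_idx + 2 + max_gap)
        for j in range(s_idx + 1, stop):
            if sequence[j] == pattern[p_idx] and match_from(p_idx + 1, j):
                return True
        return False

    if not pattern:
        return True
    return any(
        sequence[i] == pattern[0] and match_from(1, i)
        for i in range(len(sequence))
    )


def frequent_patterns(sequences, min_support, max_gap, max_len=2):
    """Brute-force search for patterns of length <= max_len whose relative
    support (fraction of sequences containing them) is >= min_support."""
    if not sequences:
        return {}
    alphabet = sorted({symbol for seq in sequences for symbol in seq})
    found = {}
    for length in range(1, max_len + 1):
        for candidate in product(alphabet, repeat=length):
            support = sum(
                occurs_with_gap(candidate, seq, max_gap) for seq in sequences
            )
            if support / len(sequences) >= min_support:
                found[candidate] = support
    return found


def to_document(sequence, patterns, max_gap):
    """Represent a sequence by the mined patterns it contains, so that each
    pattern acts as one token of an enlarged vocabulary."""
    return [p for p in patterns if occurs_with_gap(p, sequence, max_gap)]
```

With a larger `max_gap`, more candidate patterns match and the token vocabulary grows; with `max_gap=0` only contiguous patterns are kept. The token lists returned by `to_document` could then be passed to a document embedding model to obtain one vector per sequence, which corresponds to the second stage mentioned in the abstract; that step is left out here.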

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II
Editors: Michele Berlingerio, Francesco Bonchi, Thomas Gärtner, Neil Hurley, Georgiana Ifrim
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 569-584
Number of pages: 16
ISBN (Electronic): 9783030109288
ISBN (Print): 9783030109271
DOIs: https://doi.org/10.1007/978-3-030-10928-8_34
Publication status: Published - 2019
Event: European Conference on Machine Learning and European Conference on Principles and Practice of Knowledge Discovery in Databases: ECML-PKDD 2018 - Dublin, Ireland
Duration: 10 Sep 2018 to 14 Sep 2018
http://www.ecmlpkdd2018.org/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 11052
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and European Conference on Principles and Practice of Knowledge Discovery in Databases: ECML-PKDD 2018
Abbreviated title: ECML-PKDD 2018
Country: Ireland
City: Dublin
Period: 10/09/18 to 14/09/18

Cite this

Nguyen, D., Luo, W., Nguyen, T. D., Venkatesh, S., & Phung, D. (2019). Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint. In M. Berlingerio, F. Bonchi, T. Gärtner, N. Hurley, & G. Ifrim (Eds.), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II (pp. 569-584). (Lecture Notes in Computer Science; Vol. 11052). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-10928-8_34
Nguyen, Dang; Luo, Wei; Nguyen, Tu Dinh; Venkatesh, Svetha; Phung, Dinh. / Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint. Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II. editor / Michele Berlingerio; Francesco Bonchi; Thomas Gärtner; Neil Hurley; Georgiana Ifrim. Cham, Switzerland: Springer, 2019. pp. 569-584 (Lecture Notes in Computer Science).
@inproceedings{ffe58fc093a64f2bb6f52cbd6e4b2ab3,
title = "Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint",
abstract = "When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization.",
author = "Dang Nguyen and Wei Luo and Nguyen, {Tu Dinh} and Svetha Venkatesh and Dinh Phung",
year = "2019",
doi = "10.1007/978-3-030-10928-8_34",
language = "English",
isbn = "9783030109271",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "569--584",
editor = "Michele Berlingerio and Francesco Bonchi and Thomas G{\"a}rtner and Neil Hurley and Georgiana Ifrim",
booktitle = "Machine Learning and Knowledge Discovery in Databases",
}

Nguyen, D, Luo, W, Nguyen, TD, Venkatesh, S & Phung, D 2019, Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint. in M Berlingerio, F Bonchi, T Gärtner, N Hurley & G Ifrim (eds), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II. Lecture Notes in Computer Science, vol. 11052, Springer, Cham, Switzerland, pp. 569-584, European Conference on Machine Learning and European Conference on Principles and Practice of Knowledge Discovery in Databases: ECML-PKDD 2018, Dublin, Ireland, 10/09/18. https://doi.org/10.1007/978-3-030-10928-8_34

Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint. / Nguyen, Dang; Luo, Wei; Nguyen, Tu Dinh; Venkatesh, Svetha; Phung, Dinh.

Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II. ed. / Michele Berlingerio; Francesco Bonchi; Thomas Gärtner; Neil Hurley; Georgiana Ifrim. Cham, Switzerland: Springer, 2019. p. 569-584 (Lecture Notes in Computer Science; Vol. 11052).

TY - GEN
T1 - Sqn2Vec
T2 - learning sequence representation via sequential patterns with a gap constraint
AU - Nguyen, Dang
AU - Luo, Wei
AU - Nguyen, Tu Dinh
AU - Venkatesh, Svetha
AU - Phung, Dinh
PY - 2019
Y1 - 2019
N2 - When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization.

UR - http://www.scopus.com/inward/record.url?scp=85061131741&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-10928-8_34
DO - 10.1007/978-3-030-10928-8_34
M3 - Conference Paper
SN - 9783030109271
T3 - Lecture Notes in Computer Science
SP - 569
EP - 584
BT - Machine Learning and Knowledge Discovery in Databases
A2 - Berlingerio, Michele
A2 - Bonchi, Francesco
A2 - Gärtner, Thomas
A2 - Hurley, Neil
A2 - Ifrim, Georgiana
PB - Springer
CY - Cham Switzerland
ER -

Nguyen D, Luo W, Nguyen TD, Venkatesh S, Phung D. Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint. In Berlingerio M, Bonchi F, Gärtner T, Hurley N, Ifrim G, editors, Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II. Cham, Switzerland: Springer. 2019. p. 569-584. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-030-10928-8_34