Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint

Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh, Dinh Phung

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

2 Citations (Scopus)

Abstract

When learning sequence representations, traditional pattern-based methods often suffer from data sparsity and high dimensionality, while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method, Sqn2Vec, which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks, including sequence classification, clustering, and visualization.
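The first step described in the abstract, mining sequential patterns under a gap constraint and using them to enlarge the vocabulary, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the "gap" is the number of symbols skipped between two consecutive matched symbols of a pattern, and it uses a simple level-wise enumeration suited only to toy data. The names `occurs_with_gap`, `mine_patterns`, `to_document`, and the `SP:` token prefix are hypothetical. The neural embedding step is omitted.

```python
def occurs_with_gap(pattern, sequence, max_gap):
    """True if `pattern` is a subsequence of `sequence` in which consecutive
    matched symbols skip at most `max_gap` symbols (assumed gap definition)."""
    def match_from(p_idx, last_pos):
        if p_idx == len(pattern):
            return True
        if last_pos is None:
            # The first pattern symbol may match anywhere in the sequence.
            candidates = range(len(sequence))
        else:
            # Later symbols must fall within max_gap skipped positions.
            stop = min(len(sequence), last_pos + max_gap + 2)
            candidates = range(last_pos + 1, stop)
        return any(
            sequence[pos] == pattern[p_idx] and match_from(p_idx + 1, pos)
            for pos in candidates
        )
    return match_from(0, None)


def mine_patterns(sequences, min_support, max_gap, max_len=3):
    """Level-wise enumeration of patterns found in >= min_support sequences.
    A pattern is extended only if it is itself frequent (prefix pruning)."""
    alphabet = sorted({symbol for seq in sequences for symbol in seq})
    frequent = []
    current = [(a,) for a in alphabet]
    for _ in range(max_len):
        survivors = [
            pat for pat in current
            if sum(occurs_with_gap(pat, seq, max_gap) for seq in sequences)
            >= min_support
        ]
        frequent.extend(survivors)
        current = [pat + (a,) for pat in survivors for a in alphabet]
        if not current:
            break
    return frequent


def to_document(sequence, patterns, max_gap):
    """Symbols of the sequence plus one token per multi-symbol pattern it contains."""
    tokens = list(sequence)
    tokens += [
        "SP:" + "-".join(map(str, pat))
        for pat in patterns
        if len(pat) > 1 and occurs_with_gap(pat, sequence, max_gap)
    ]
    return tokens


if __name__ == "__main__":
    toy = ["abcab", "acb", "bca"]  # toy data, not from the paper
    patterns = mine_patterns(toy, min_support=2, max_gap=0)
    for seq in toy:
        print(seq, to_document(seq, patterns, max_gap=0))
```

In this sketch each sequence becomes a bag of tokens drawn from a larger vocabulary than the raw symbol set. Those tokens could then be passed to any document-embedding model to obtain a low-dimensional vector per sequence.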

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II
Editors: Michele Berlingerio, Francesco Bonchi, Thomas Gärtner, Neil Hurley, Georgiana Ifrim
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 569-584
Number of pages: 16
ISBN (Electronic): 9783030109288
ISBN (Print): 9783030109271
DOIs: https://doi.org/10.1007/978-3-030-10928-8_34
Publication status: Published - 2019
Event: European Conference on Machine Learning European Conference on Principles and Practice of Knowledge Discovery in Databases 2018 - Dublin, Ireland
Duration: 10 Sep 2018 to 14 Sep 2018
http://www.ecmlpkdd2018.org/
https://link.springer.com/book/10.1007/978-3-030-10925-7 (Proceedings)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 11052
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning European Conference on Principles and Practice of Knowledge Discovery in Databases 2018
Abbreviated title: ECML-PKDD 2018
Country: Ireland
City: Dublin
Period: 10/09/18 to 14/09/18

Cite this

Nguyen, D., Luo, W., Nguyen, T. D., Venkatesh, S., & Phung, D. (2019). Sqn2Vec: learning sequence representation via sequential patterns with a gap constraint. In M. Berlingerio, F. Bonchi, T. Gärtner, N. Hurley, & G. Ifrim (Eds.), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II (pp. 569-584). (Lecture Notes in Computer Science; Vol. 11052). Springer. https://doi.org/10.1007/978-3-030-10928-8_34