DSCo

A language modeling approach for time series classification

Daoyuan Li, Li Li, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

3 Citations (Scopus)

Abstract

Time series data are abundant in various domains and are often characterized as large in size and high in dimensionality, leading to storage and processing challenges. Symbolic representation of time series, which transforms numeric time series data into texts, is a promising technique to address these challenges. However, such techniques are essentially lossy compression functions, and information is partially lost during transformation. To that end, we propose a novel approach named Domain Series Corpus (DSCo), which builds per-class language models from the symbolized texts. To classify unlabeled samples, we compute the fitness of each symbolized sample against all per-class models and choose the class represented by the model with the best fitness score. Our work innovatively takes advantage of mature techniques from both the time series mining and NLP communities. Through extensive experiments on an open dataset archive, we demonstrate that DSCo performs similarly to approaches working with the original uncompressed numeric data.
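The pipeline described in the abstract (symbolize each numeric series into text, build one language model per class, then label a sample by the model with the best fitness score) can be illustrated with a minimal, hypothetical sketch. The equal-width symbolization and add-one-smoothed bigram models below are illustrative stand-ins, not necessarily the paper's exact symbolization scheme or model-fitting procedure.

```python
import math
from collections import Counter, defaultdict

def symbolize(series, alphabet="abcd"):
    """Map each z-normalized value to a letter via equal-width bins."""
    mean = sum(series) / len(series)
    std = (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5 or 1.0
    z = [(x - mean) / std for x in series]
    lo, hi = min(z), max(z)
    width = (hi - lo) / len(alphabet) or 1.0
    return "".join(alphabet[min(int((v - lo) / width), len(alphabet) - 1)]
                   for v in z)

def train(labeled_series):
    """Build one bigram count model per class from symbolized texts."""
    models = defaultdict(Counter)
    for label, series in labeled_series:
        text = symbolize(series)
        models[label].update(zip(text, text[1:]))
    return models

def classify(series, models, alphabet="abcd"):
    """Return the class whose model best fits the symbolized sample
    (fitness = add-one-smoothed bigram log-likelihood)."""
    text = symbolize(series)
    best_label, best_score = None, -math.inf
    for label, bigrams in models.items():
        total = sum(bigrams.values()) + len(alphabet) ** 2  # add-one smoothing
        score = sum(math.log((bigrams[bg] + 1) / total)
                    for bg in zip(text, text[1:]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because the symbolization z-normalizes first, two series that differ only by scale and offset map to the same text, so the per-class models capture shape rather than magnitude.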

Original language: English
Title of host publication: Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings
Publisher: Springer-Verlag London Ltd.
Pages: 294-310
Number of pages: 17
Volume: 9729
ISBN (Print): 9783319419190
DOI: 10.1007/978-3-319-41920-6_22
Publication status: Published - 2016
Externally published: Yes
Event: 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016 - New York, United States of America
Duration: 16 Jul 2016 - 21 Jul 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9729
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016
Country: United States of America
City: New York
Period: 16/07/16 - 21/07/16

Cite this

APA

Li, D., Li, L., Bissyandé, T. F., Klein, J., & Traon, Y. L. (2016). DSCo: A language modeling approach for time series classification. In Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings (Vol. 9729, pp. 294-310). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9729). Springer-Verlag London Ltd. https://doi.org/10.1007/978-3-319-41920-6_22
Author

Li, Daoyuan ; Li, Li ; Bissyandé, Tegawendé F. ; Klein, Jacques ; Traon, Yves Le. / DSCo: A language modeling approach for time series classification. Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings. Vol. 9729 Springer-Verlag London Ltd., 2016. pp. 294-310 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
BibTeX

@inproceedings{43f59510002447f4a517b32d1c48be91,
title = "DSCo: A language modeling approach for time series classification",
abstract = "Time series data are abundant in various domains and are often characterized as large in size and high in dimensionality, leading to storage and processing challenges. Symbolic representation of time series, which transforms numeric time series data into texts, is a promising technique to address these challenges. However, such techniques are essentially lossy compression functions, and information is partially lost during transformation. To that end, we propose a novel approach named Domain Series Corpus (DSCo), which builds per-class language models from the symbolized texts. To classify unlabeled samples, we compute the fitness of each symbolized sample against all per-class models and choose the class represented by the model with the best fitness score. Our work innovatively takes advantage of mature techniques from both the time series mining and NLP communities. Through extensive experiments on an open dataset archive, we demonstrate that DSCo performs similarly to approaches working with the original uncompressed numeric data.",
author = "Daoyuan Li and Li Li and Bissyand{\'e}, {Tegawend{\'e} F.} and Jacques Klein and Traon, {Yves Le}",
year = "2016",
doi = "10.1007/978-3-319-41920-6_22",
language = "English",
isbn = "9783319419190",
volume = "9729",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag London Ltd.",
pages = "294--310",
booktitle = "Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings",
address = "Germany",
}

Harvard

Li, D, Li, L, Bissyandé, TF, Klein, J & Traon, YL 2016, DSCo: A language modeling approach for time series classification. in Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings. vol. 9729, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9729, Springer-Verlag London Ltd., pp. 294-310, 12th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2016, New York, United States of America, 16/07/16. https://doi.org/10.1007/978-3-319-41920-6_22

Standard

DSCo: A language modeling approach for time series classification. / Li, Daoyuan; Li, Li; Bissyandé, Tegawendé F.; Klein, Jacques; Traon, Yves Le.

Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings. Vol. 9729 Springer-Verlag London Ltd., 2016. p. 294-310 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9729).


RIS

TY - GEN

T1 - DSCo

T2 - A language modeling approach for time series classification

AU - Li, Daoyuan

AU - Li, Li

AU - Bissyandé, Tegawendé F.

AU - Klein, Jacques

AU - Traon, Yves Le

PY - 2016

Y1 - 2016

N2 - Time series data are abundant in various domains and are often characterized as large in size and high in dimensionality, leading to storage and processing challenges. Symbolic representation of time series, which transforms numeric time series data into texts, is a promising technique to address these challenges. However, such techniques are essentially lossy compression functions, and information is partially lost during transformation. To that end, we propose a novel approach named Domain Series Corpus (DSCo), which builds per-class language models from the symbolized texts. To classify unlabeled samples, we compute the fitness of each symbolized sample against all per-class models and choose the class represented by the model with the best fitness score. Our work innovatively takes advantage of mature techniques from both the time series mining and NLP communities. Through extensive experiments on an open dataset archive, we demonstrate that DSCo performs similarly to approaches working with the original uncompressed numeric data.

AB - Time series data are abundant in various domains and are often characterized as large in size and high in dimensionality, leading to storage and processing challenges. Symbolic representation of time series, which transforms numeric time series data into texts, is a promising technique to address these challenges. However, such techniques are essentially lossy compression functions, and information is partially lost during transformation. To that end, we propose a novel approach named Domain Series Corpus (DSCo), which builds per-class language models from the symbolized texts. To classify unlabeled samples, we compute the fitness of each symbolized sample against all per-class models and choose the class represented by the model with the best fitness score. Our work innovatively takes advantage of mature techniques from both the time series mining and NLP communities. Through extensive experiments on an open dataset archive, we demonstrate that DSCo performs similarly to approaches working with the original uncompressed numeric data.

UR - http://www.scopus.com/inward/record.url?scp=84978976230&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-41920-6_22

DO - 10.1007/978-3-319-41920-6_22

M3 - Conference Paper

SN - 9783319419190

VL - 9729

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 294

EP - 310

BT - Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings

PB - Springer-Verlag London Ltd.

ER -

Vancouver

Li D, Li L, Bissyandé TF, Klein J, Traon YL. DSCo: A language modeling approach for time series classification. In Machine Learning and Data Mining in Pattern Recognition - 12th International Conference, MLDM 2016, Proceedings. Vol. 9729. Springer-Verlag London Ltd. 2016. p. 294-310. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-41920-6_22