DSCo: a language modeling approach for time series classification

Daoyuan Li, Li Li, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

4 Citations (Scopus)


Time series data are abundant in various domains and are often characterized by large size and high dimensionality, leading to storage and processing challenges. Symbolic representation of time series, which transforms numeric time series data into texts, is a promising technique to address these challenges. However, such techniques are essentially lossy compression functions, and information is partially lost during transformation. To address this, we propose a novel approach named Domain Series Corpus (DSCo), which builds per-class language models from the symbolized texts. To classify unlabeled samples, we compute the fitness of each symbolized sample against all per-class models and choose the class represented by the model with the best fitness score. Our work innovatively takes advantage of mature techniques from both the time series mining and NLP communities. Through extensive experiments on an open dataset archive, we demonstrate that DSCo performs comparably to approaches that work on the original uncompressed numeric data.
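The classification scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rank-based quantile symbolization, the bigram model order, the Laplace smoothing, and the alphabet size of 4 are all assumptions made for the sketch; DSCo's actual symbolization method and language models may differ.

```python
# Hypothetical sketch of DSCo-style classification:
# 1) numeric series are symbolized into strings (here: rank-based quantile binning),
# 2) a per-class bigram language model is built from the symbolized training texts,
# 3) an unlabeled sample is assigned to the class whose model gives it the best
#    fitness (log-likelihood) score.
from collections import Counter, defaultdict
import math

ALPHABET = "abcd"  # assumed alphabet size of 4

def symbolize(series):
    """Map a numeric series to a string by binning values on their rank."""
    ranked = sorted(range(len(series)), key=lambda i: series[i])
    bins = [0] * len(series)
    for rank, i in enumerate(ranked):
        bins[i] = min(rank * len(ALPHABET) // len(series), len(ALPHABET) - 1)
    return "".join(ALPHABET[b] for b in bins)

class BigramModel:
    """Per-class bigram language model with Laplace smoothing."""
    def __init__(self):
        self.bigrams = Counter()
        self.unigrams = Counter()

    def train(self, text):
        for a, b in zip(text, text[1:]):
            self.bigrams[(a, b)] += 1
            self.unigrams[a] += 1

    def log_fitness(self, text):
        """Smoothed log-likelihood of a symbolized sample under this model."""
        score = 0.0
        for a, b in zip(text, text[1:]):
            num = self.bigrams[(a, b)] + 1
            den = self.unigrams[a] + len(ALPHABET)
            score += math.log(num / den)
        return score

def classify(sample, models):
    """Pick the class whose model best fits the symbolized sample."""
    text = symbolize(sample)
    return max(models, key=lambda c: models[c].log_fitness(text))

# Toy usage: one "rising" class and one "falling" class.
models = defaultdict(BigramModel)
models["up"].train(symbolize([1, 2, 3, 4, 5, 6, 7, 8]))
models["down"].train(symbolize([8, 7, 6, 5, 4, 3, 2, 1]))
print(classify([0, 1, 2, 3, 4, 5, 6, 7], models))  # → up
```

Because the language models are compact compared to the raw training series, the symbolized representation doubles as compression, which is the storage benefit the abstract alludes to.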

Original language: English
Title of host publication: Machine Learning and Data Mining in Pattern Recognition
Subtitle of host publication: 12th International Conference, MLDM 2016, New York, NY, USA, July 16–21, 2016, Proceedings
Editors: Petra Perner
Place of publication: Cham, Switzerland
Number of pages: 17
ISBN (Electronic): 9783319419206
ISBN (Print): 9783319419190
Publication status: Published - 2016
Externally published: Yes
Event: International Conference on Machine Learning and Data Mining in Pattern Recognition 2016 - New York, United States of America
Duration: 16 Jul 2016 – 21 Jul 2016
Conference number: 12th

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: International Conference on Machine Learning and Data Mining in Pattern Recognition 2016
Abbreviated title: MLDM 2016
Country/Territory: United States of America
City: New York


Keywords
  • Language Model
  • Time Series Data
  • Symbolic Representation
  • Dynamic Time Warping
  • Alphabet Size
