Indexing and classifying gigabytes of time series under time warping

    Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

    Abstract

    Time series classification maps time series to labels. The nearest neighbour algorithm (NN) using the Dynamic Time Warping (DTW) similarity measure is a leading algorithm for this task. NN compares each time series to be classified to every time series in the training database. With a training database of N time series of lengths L, each classification requires ν(N · L 2 ) computations. The databases used in almost all prior research have been relatively small (with less than 10; 000 samples) and much of the research has focused on making DTW's complexity linear with L, leading to a runtime complexity of O(N · L). As we demonstrate with an example in remote sensing, real-world time series databases are now reaching the million-to-billion scale. This wealth of training data brings the promise of higher accuracy, but raises a significant challenge because N is becoming the limiting factor. As DTW is not a metric, indexing objects induced by its space is extremely challenging. We tackle this task in this paper. We develop TSI, a novel algorithm for Time Series Indexing which combines a hierarchy of K-means clustering with DTW-based lower-bounding. We show that, on large databases, TSI makes it possible to classify time series orders of magnitude faster than the state of the art.
    Original languageEnglish
    Title of host publicationProceedings of the 17th SIAM International Conference on Data Mining
    Subtitle of host publicationHouston, Texas, USA, 27 – 29 April , 2017
    EditorsNitesh Chawla, Wei Wang
    Place of PublicationPhiladelphia, PA
    PublisherSociety for Industrial and Applied Mathematics SIAM Publications
    Pages282-290
    Number of pages9
    ISBN (Electronic)9781611974874, 9781611974881
    DOIs
    Publication statusPublished - 1 Jan 2017
    EventSIAM International Conference on Data Mining 2017 - Houston, United States of America
    Duration: 27 Apr 201729 Apr 2017
    Conference number: 17th

    Conference

    ConferenceSIAM International Conference on Data Mining 2017
    Abbreviated titleSDM 2017
    CountryUnited States of America
    CityHouston
    Period27/04/1729/04/17

    Keywords

    • Dynamic time warping
    • Time series classification
    • Time series indexing

    Cite this

    Tan, C. W., Webb, G. I., & Petitjean, F. (2017). Indexing and classifying gigabytes of time series under time warping. In N. Chawla, & W. Wang (Eds.), Proceedings of the 17th SIAM International Conference on Data Mining: Houston, Texas, USA, 27 – 29 April , 2017 (pp. 282-290). Philadelphia, PA: Society for Industrial and Applied Mathematics SIAM Publications. https://doi.org/10.1137/1.9781611974973.32
    Tan, Chang Wei ; Webb, Geoffrey I. ; Petitjean, Francois. / Indexing and classifying gigabytes of time series under time warping. Proceedings of the 17th SIAM International Conference on Data Mining: Houston, Texas, USA, 27 – 29 April , 2017. editor / Nitesh Chawla ; Wei Wang. Philadelphia, PA : Society for Industrial and Applied Mathematics SIAM Publications, 2017. pp. 282-290
    @inproceedings{6a178c488b6b4a77b2486f0c7446b469,
    title = "Indexing and classifying gigabytes of time series under time warping",
    abstract = "Time series classification maps time series to labels. The nearest neighbour algorithm (NN) using the Dynamic Time Warping (DTW) similarity measure is a leading algorithm for this task. NN compares each time series to be classified to every time series in the training database. With a training database of N time series of lengths L, each classification requires ν(N · L 2 ) computations. The databases used in almost all prior research have been relatively small (with less than 10; 000 samples) and much of the research has focused on making DTW's complexity linear with L, leading to a runtime complexity of O(N · L). As we demonstrate with an example in remote sensing, real-world time series databases are now reaching the million-to-billion scale. This wealth of training data brings the promise of higher accuracy, but raises a significant challenge because N is becoming the limiting factor. As DTW is not a metric, indexing objects induced by its space is extremely challenging. We tackle this task in this paper. We develop TSI, a novel algorithm for Time Series Indexing which combines a hierarchy of K-means clustering with DTW-based lower-bounding. We show that, on large databases, TSI makes it possible to classify time series orders of magnitude faster than the state of the art.",
    keywords = "Dynamic time warping, Time series classification, Time series indexing",
    author = "Tan, {Chang Wei} and Webb, {Geoffrey I.} and Francois Petitjean",
    year = "2017",
    month = "1",
    day = "1",
    doi = "10.1137/1.9781611974973.32",
    language = "English",
    pages = "282--290",
    editor = "Chawla, {Nitesh } and Wang, {Wei }",
    booktitle = "Proceedings of the 17th SIAM International Conference on Data Mining",
    publisher = "Society for Industrial and Applied Mathematics SIAM Publications",

    }

    Tan, CW, Webb, GI & Petitjean, F 2017, Indexing and classifying gigabytes of time series under time warping. in N Chawla & W Wang (eds), Proceedings of the 17th SIAM International Conference on Data Mining: Houston, Texas, USA, 27 – 29 April , 2017. Society for Industrial and Applied Mathematics SIAM Publications, Philadelphia, PA, pp. 282-290, SIAM International Conference on Data Mining 2017, Houston, United States of America, 27/04/17. https://doi.org/10.1137/1.9781611974973.32

    Indexing and classifying gigabytes of time series under time warping. / Tan, Chang Wei; Webb, Geoffrey I.; Petitjean, Francois.

    Proceedings of the 17th SIAM International Conference on Data Mining: Houston, Texas, USA, 27 – 29 April , 2017. ed. / Nitesh Chawla; Wei Wang. Philadelphia, PA : Society for Industrial and Applied Mathematics SIAM Publications, 2017. p. 282-290.

    Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

    TY - GEN

    T1 - Indexing and classifying gigabytes of time series under time warping

    AU - Tan, Chang Wei

    AU - Webb, Geoffrey I.

    AU - Petitjean, Francois

    PY - 2017/1/1

    Y1 - 2017/1/1

    N2 - Time series classification maps time series to labels. The nearest neighbour algorithm (NN) using the Dynamic Time Warping (DTW) similarity measure is a leading algorithm for this task. NN compares each time series to be classified to every time series in the training database. With a training database of N time series of lengths L, each classification requires ν(N · L 2 ) computations. The databases used in almost all prior research have been relatively small (with less than 10; 000 samples) and much of the research has focused on making DTW's complexity linear with L, leading to a runtime complexity of O(N · L). As we demonstrate with an example in remote sensing, real-world time series databases are now reaching the million-to-billion scale. This wealth of training data brings the promise of higher accuracy, but raises a significant challenge because N is becoming the limiting factor. As DTW is not a metric, indexing objects induced by its space is extremely challenging. We tackle this task in this paper. We develop TSI, a novel algorithm for Time Series Indexing which combines a hierarchy of K-means clustering with DTW-based lower-bounding. We show that, on large databases, TSI makes it possible to classify time series orders of magnitude faster than the state of the art.

    AB - Time series classification maps time series to labels. The nearest neighbour algorithm (NN) using the Dynamic Time Warping (DTW) similarity measure is a leading algorithm for this task. NN compares each time series to be classified to every time series in the training database. With a training database of N time series of lengths L, each classification requires ν(N · L 2 ) computations. The databases used in almost all prior research have been relatively small (with less than 10; 000 samples) and much of the research has focused on making DTW's complexity linear with L, leading to a runtime complexity of O(N · L). As we demonstrate with an example in remote sensing, real-world time series databases are now reaching the million-to-billion scale. This wealth of training data brings the promise of higher accuracy, but raises a significant challenge because N is becoming the limiting factor. As DTW is not a metric, indexing objects induced by its space is extremely challenging. We tackle this task in this paper. We develop TSI, a novel algorithm for Time Series Indexing which combines a hierarchy of K-means clustering with DTW-based lower-bounding. We show that, on large databases, TSI makes it possible to classify time series orders of magnitude faster than the state of the art.

    KW - Dynamic time warping

    KW - Time series classification

    KW - Time series indexing

    UR - http://www.scopus.com/inward/record.url?scp=85027845117&partnerID=8YFLogxK

    U2 - 10.1137/1.9781611974973.32

    DO - 10.1137/1.9781611974973.32

    M3 - Conference Paper

    SP - 282

    EP - 290

    BT - Proceedings of the 17th SIAM International Conference on Data Mining

    A2 - Chawla, Nitesh

    A2 - Wang, Wei

    PB - Society for Industrial and Applied Mathematics SIAM Publications

    CY - Philadelphia, PA

    ER -

    Tan CW, Webb GI, Petitjean F. Indexing and classifying gigabytes of time series under time warping. In Chawla N, Wang W, editors, Proceedings of the 17th SIAM International Conference on Data Mining: Houston, Texas, USA, 27 – 29 April , 2017. Philadelphia, PA: Society for Industrial and Applied Mathematics SIAM Publications. 2017. p. 282-290 https://doi.org/10.1137/1.9781611974973.32