Generating synthetic time series to augment sparse datasets

Germain Forestier, Francois Petitjean, Hoang Anh Dau, Geoffrey I. Webb, Eamonn Keogh

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

13 Citations (Scopus)

Abstract

In machine learning, data augmentation is the process of creating synthetic examples in order to augment a dataset used to learn a model. One motivation for data augmentation is to reduce the variance of a classifier, thereby reducing error. In this paper, we propose new data augmentation techniques specifically designed for time series classification, where the space in which the time series are embedded is induced by Dynamic Time Warping (DTW). The main idea of our approach is to average a set of time series and use the average time series as a new synthetic example. The proposed methods rely on an extension of DTW Barycentric Averaging (DBA), an averaging technique developed specifically for DTW. We extend DBA so that it can calculate a weighted average of time series under DTW: instead of each time series contributing equally to the final average, some can contribute more than others. This extension allows us to generate an infinite number of new examples from any set of given time series. To this end, we propose three methods for choosing the weights associated with the time series of the dataset. We carry out experiments on the 85 datasets of the UCR archive and demonstrate that our method is particularly useful when the number of available examples is limited (e.g., 2 to 6 examples per class) using a 1-NN DTW classifier. Furthermore, we show that augmenting full datasets is beneficial in most cases: we observed an increase in accuracy on 56 datasets, no effect on 7, and a slight decrease on only 22.
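
The weighted averaging idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes univariate series, a squared-difference DTW cost with no warping window, a fixed number of refinement iterations, and initialization from the highest-weighted series. The function names (dtw_path, weighted_dba) and the example weights are hypothetical; the abstract does not say how the three proposed methods choose their weights.

```python
import numpy as np


def dtw_path(a, b):
    """Return one optimal DTW alignment path between 1-D series a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the last cell to the first to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = (cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
        k = int(np.argmin(steps))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


def weighted_dba(series, weights, n_iter=10):
    """Weighted barycentric average under DTW (illustrative sketch).

    Each iteration aligns every series to the current average and replaces
    each point of the average by the weighted mean of the points aligned to it.
    """
    series = [np.asarray(s, dtype=float) for s in series]
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Assumption: start from the series with the largest weight.
    avg = series[int(np.argmax(weights))].copy()
    for _ in range(n_iter):
        num = np.zeros_like(avg)
        den = np.zeros_like(avg)
        for s, w in zip(series, weights):
            for i, j in dtw_path(avg, s):
                num[i] += w * s[j]
                den[i] += w
        # Every point of avg is aligned to at least one point of each series,
        # so den is positive wherever at least one weight is positive.
        avg = num / den
    return avg


# Usage: three phase-shifted sine waves and arbitrary illustrative weights.
rng = np.random.default_rng(0)
train = [np.sin(np.linspace(0, 2 * np.pi, 30) + p) for p in (0.0, 0.3, 0.6)]
weights = rng.dirichlet(np.ones(len(train)))
synthetic = weighted_dba(train, weights)
```

Drawing a fresh weight vector and calling weighted_dba again yields a different synthetic series, which is how a weighted average can produce arbitrarily many new examples from a fixed set of inputs.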

Original language: English
Title of host publication: Proceedings
Subtitle of host publication: 17th IEEE International Conference on Data Mining
Editors: Vijay Raghavan, Srinivas Aluru, George Karypis, Lucio Miele, Xindong Wu
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 865-870
Number of pages: 6
ISBN (Print): 9781538638347
DOIs: https://doi.org/10.1109/ICDM.2017.106
Publication status: Published - 15 Dec 2017
Event: IEEE International Conference on Data Mining 2017 - New Orleans, United States of America
Duration: 18 Nov 2017 to 21 Nov 2017
Conference number: 17th
http://icdm2017.bigke.org/

Conference

Conference: IEEE International Conference on Data Mining 2017
Abbreviated title: ICDM 2017
Country: United States of America
City: New Orleans
Period: 18/11/17 to 21/11/17
Internet address: http://icdm2017.bigke.org/

Keywords

  • Data augmentation
  • Dynamic time warping
  • Time series classification

Cite this

Forestier, G., Petitjean, F., Dau, H. A., Webb, G. I., & Keogh, E. (2017). Generating synthetic time series to augment sparse datasets. In V. Raghavan, S. Aluru, G. Karypis, L. Miele, & X. Wu (Eds.), Proceedings: 17th IEEE International Conference on Data Mining (pp. 865-870). Piscataway NJ USA: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICDM.2017.106
Forestier, Germain ; Petitjean, Francois ; Dau, Hoang Anh ; Webb, Geoffrey I. ; Keogh, Eamonn. / Generating synthetic time series to augment sparse datasets. Proceedings: 17th IEEE International Conference on Data Mining. editor / Vijay Raghavan ; Srinivas Aluru ; George Karypis ; Lucio Miele ; Xindong Wu. Piscataway NJ USA : IEEE, Institute of Electrical and Electronics Engineers, 2017. pp. 865-870
@inproceedings{1a75ebdeed544bc6b0626c99d77b888e,
title = "Generating synthetic time series to augment sparse datasets",
keywords = "Data augmentation, Dynamic time warping, Time series classification",
author = "Germain Forestier and Francois Petitjean and Dau, {Hoang Anh} and Webb, {Geoffrey I.} and Eamonn Keogh",
year = "2017",
month = "12",
day = "15",
doi = "10.1109/ICDM.2017.106",
language = "English",
isbn = "9781538638347",
pages = "865--870",
editor = "Vijay Raghavan and Srinivas Aluru and George Karypis and Lucio Miele and Xindong Wu",
booktitle = "Proceedings",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
address = "United States of America",

}

Forestier, G, Petitjean, F, Dau, HA, Webb, GI & Keogh, E 2017, Generating synthetic time series to augment sparse datasets. in V Raghavan, S Aluru, G Karypis, L Miele & X Wu (eds), Proceedings: 17th IEEE International Conference on Data Mining. IEEE, Institute of Electrical and Electronics Engineers, Piscataway NJ USA, pp. 865-870, IEEE International Conference on Data Mining 2017, New Orleans, United States of America, 18/11/17. https://doi.org/10.1109/ICDM.2017.106

Generating synthetic time series to augment sparse datasets. / Forestier, Germain; Petitjean, Francois; Dau, Hoang Anh; Webb, Geoffrey I.; Keogh, Eamonn.

Proceedings: 17th IEEE International Conference on Data Mining. ed. / Vijay Raghavan; Srinivas Aluru; George Karypis; Lucio Miele; Xindong Wu. Piscataway NJ USA : IEEE, Institute of Electrical and Electronics Engineers, 2017. p. 865-870.

TY - GEN

T1 - Generating synthetic time series to augment sparse datasets

AU - Forestier, Germain

AU - Petitjean, Francois

AU - Dau, Hoang Anh

AU - Webb, Geoffrey I.

AU - Keogh, Eamonn

PY - 2017/12/15

Y1 - 2017/12/15

KW - Data augmentation

KW - Dynamic time warping

KW - Time series classification

UR - http://www.scopus.com/inward/record.url?scp=85044004839&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2017.106

DO - 10.1109/ICDM.2017.106

M3 - Conference Paper

AN - SCOPUS:85044004839

SN - 9781538638347

SP - 865

EP - 870

BT - Proceedings

A2 - Raghavan, Vijay

A2 - Aluru, Srinivas

A2 - Karypis, George

A2 - Miele, Lucio

A2 - Wu, Xindong

PB - IEEE, Institute of Electrical and Electronics Engineers

CY - Piscataway NJ USA

ER -

Forestier G, Petitjean F, Dau HA, Webb GI, Keogh E. Generating synthetic time series to augment sparse datasets. In Raghavan V, Aluru S, Karypis G, Miele L, Wu X, editors, Proceedings: 17th IEEE International Conference on Data Mining. Piscataway NJ USA: IEEE, Institute of Electrical and Electronics Engineers. 2017. p. 865-870 https://doi.org/10.1109/ICDM.2017.106