On the use of cross-validation for time series predictor evaluation

Christoph Bergmeir, José M. Benítez

Research output: Contribution to journal › Article › Research › peer-review

Abstract

In time series predictor evaluation, we observe a gap, with respect to the model selection procedure, between the evaluation of traditional forecasting procedures on the one hand and the evaluation of machine learning techniques on the other. In traditional forecasting, it is common practice to reserve a part from the end of each time series for testing and to use the rest of the series for training. Thus, full use is not made of the data, but theoretical problems with respect to temporal evolutionary effects and dependencies within the data, as well as practical problems regarding missing values, are eliminated. On the other hand, when evaluating machine learning and other regression methods used for time series forecasting, cross-validation is often used for evaluation, paying little attention to the fact that those theoretical problems invalidate the fundamental assumptions of cross-validation. To close this gap and examine the consequences of different model selection procedures in practice, we have developed a rigorous and extensive empirical study. Six different model selection procedures, based on (i) cross-validation and (ii) evaluation using the series' last part, are used to assess the performance of four machine learning and other regression techniques on synthetic and real-world time series. No practical consequences of the theoretical flaws were found during our study, but the use of cross-validation techniques led to more robust model selection. To retain the "best of both worlds", we suggest that a blocked form of cross-validation become the standard procedure for time series evaluation, thereby using all available information and circumventing the theoretical problems.
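To make the abstract's central idea concrete, here is a minimal sketch in Python/NumPy of blocked cross-validation compared with conventional last-block evaluation. This is not the paper's implementation: the study evaluates four machine learning and regression techniques, whereas ordinary least squares on a lagged embedding stands in here, and the helpers `embed` and `blocked_cv_indices` (including the optional `gap` used to weaken serial dependence across the split) are illustrative names of our own.

import numpy as np

def embed(series, order):
    # Turn a univariate series into a lagged regression matrix:
    # row j has inputs (y[j], ..., y[j+order-1]) and target y[j+order].
    X = np.column_stack([series[i:len(series) - order + i] for i in range(order)])
    return X, series[order:]

def blocked_cv_indices(n, k, gap=0):
    # Split rows 0..n-1 into k contiguous blocks; each block serves as the
    # test set once. `gap` rows on either side of the test block are dropped
    # from training to reduce dependence between training and test data
    # (gap=0 gives the plain blocked scheme).
    for test_idx in np.array_split(np.arange(n), k):
        lo = max(test_idx[0] - gap, 0)
        hi = min(test_idx[-1] + gap + 1, n)
        yield np.r_[np.arange(0, lo), np.arange(hi, n)], test_idx

def fit_predict(X_tr, y_tr, X_te):
    # Ordinary least squares with an intercept, standing in for the
    # machine learning regressors evaluated in the paper.
    beta, *_ = np.linalg.lstsq(np.c_[np.ones(len(X_tr)), X_tr], y_tr, rcond=None)
    return np.c_[np.ones(len(X_te)), X_te] @ beta

rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=400))   # synthetic random-walk series
X, y = embed(series, order=4)

# Blocked 5-fold cross-validation estimate of the RMSE.
cv_rmse = [
    np.sqrt(np.mean((fit_predict(X[tr], y[tr], X[te]) - y[te]) ** 2))
    for tr, te in blocked_cv_indices(len(X), k=5, gap=4)
]

# Conventional evaluation on the series' last part, for comparison.
cut = int(0.8 * len(X))
oos_rmse = np.sqrt(np.mean((fit_predict(X[:cut], y[:cut], X[cut:]) - y[cut:]) ** 2))
print(f"blocked CV RMSE: {np.mean(cv_rmse):.3f}   last-block RMSE: {oos_rmse:.3f}")

Because every observation appears in exactly one test block, the blocked estimate uses all available data, which is the property the abstract argues makes it more robust than a single last-block split.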

Original language: English
Pages (from-to): 192-213
Number of pages: 22
Journal: Information Sciences
Volume: 191
DOI: 10.1016/j.ins.2011.12.028
Publication status: Published - 15 May 2012
Externally published: Yes

Keywords

  • Cross-validation
  • Error measures
  • Machine learning
  • Predictor evaluation
  • Regression
  • Time series

Cite this

@article{9164e96000bb42b7a73d49094d179ce8,
title = "On the use of cross-validation for time series predictor evaluation",
keywords = "Cross-validation, Error measures, Machine learning, Predictor evaluation, Regression, Time series",
author = "Bergmeir, Christoph and Ben{\'i}tez, {Jos{\'e} M.}",
year = "2012",
month = may,
day = "15",
doi = "10.1016/j.ins.2011.12.028",
language = "English",
volume = "191",
pages = "192--213",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier",
url = "http://www.scopus.com/inward/record.url?scp=84857652807&partnerID=8YFLogxK",
}
