A note on the validity of cross-validation for evaluating autoregressive time series prediction

Research output: Contribution to journal › Article › Research › peer-review

Abstract

One of the most widely used standard procedures for model evaluation in classification and regression is K-fold cross-validation (CV). However, when it comes to time series forecasting, because of the inherent serial correlation and potential non-stationarity of the data, its application is not straightforward, and practitioners often replace it with an out-of-sample (OOS) evaluation. It is shown that for purely autoregressive models, the use of standard K-fold CV is possible provided the models considered have uncorrelated errors. Such a setup occurs, for example, when the models nest a more appropriate model. This is very common when Machine Learning methods are used for prediction, where CV can control for overfitting the data. Theoretical insights supporting these arguments are presented, along with a simulation study and a real-world example. It is shown empirically that K-fold CV performs favourably compared to both OOS evaluation and other time-series-specific techniques such as non-dependent cross-validation.
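The core idea of the abstract — that standard K-fold CV becomes applicable once a purely autoregressive model is cast as a regression on lagged values — can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' code: the embedding helper, the linear AR fit via least squares, and all function names are choices made here for exposition.

```python
import numpy as np

def embed(series, p):
    """Embed a univariate series into a matrix of p lagged predictors and a target.

    Row t holds (y_t, ..., y_{t+p-1}) as predictors and y_{t+p} as the target,
    so a purely autoregressive model becomes an ordinary regression problem.
    """
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    y = series[p:]
    return X, y

def kfold_cv_mse(series, p, k=5, seed=0):
    """Standard (randomly shuffled) K-fold CV error of a linear AR(p) model."""
    X, y = embed(series, p)
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    errors = []
    for fold in folds:
        train = np.ones(n, dtype=bool)
        train[fold] = False
        # Fit AR(p) with intercept by ordinary least squares on the training rows.
        Xtr = np.column_stack([np.ones(train.sum()), X[train]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        Xte = np.column_stack([np.ones(len(fold)), X[fold]])
        errors.append(np.mean((y[fold] - Xte @ beta) ** 2))
    return float(np.mean(errors))
```

Note that the folds deliberately ignore temporal order: the paper's point is that for a correctly specified (or over-specified, i.e. nesting) autoregressive model with uncorrelated errors, this ordinary K-fold scheme remains a valid error estimate, and fitting, say, an AR(2) to data generated by an AR(1) is exactly the nesting situation described above.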

Language: English
Pages: 70-83
Number of pages: 14
Journal: Computational Statistics and Data Analysis
Volume: 120
DOIs: 10.1016/j.csda.2017.11.003
Publication status: Published - 1 Apr 2018

Keywords

  • Autoregression
  • Cross-validation
  • Time series

Cite this

@article{681e70db676947cf93a67c5ebd4203d2,
title = "A note on the validity of cross-validation for evaluating autoregressive time series prediction",
abstract = "One of the most widely used standard procedures for model evaluation in classification and regression is K-fold cross-validation (CV). However, when it comes to time series forecasting, because of the inherent serial correlation and potential non-stationarity of the data, its application is not straightforward and often replaced by practitioners in favour of an out-of-sample (OOS) evaluation. It is shown that for purely autoregressive models, the use of standard K-fold CV is possible provided the models considered have uncorrelated errors. Such a setup occurs, for example, when the models nest a more appropriate model. This is very common when Machine Learning methods are used for prediction, and where CV can control for overfitting the data. Theoretical insights supporting these arguments are presented, along with a simulation study and a real-world example. It is shown empirically that K-fold CV performs favourably compared to both OOS evaluation and other time-series-specific techniques such as non-dependent cross-validation.",
keywords = "Autoregression, Cross-validation, Time series",
author = "Bergmeir, Christoph and Hyndman, {Rob J.} and Koo, Bonsoo",
year = "2018",
month = apr,
day = "1",
doi = "10.1016/j.csda.2017.11.003",
language = "English",
volume = "120",
pages = "70--83",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",

}

TY - JOUR

T1 - A note on the validity of cross-validation for evaluating autoregressive time series prediction

AU - Bergmeir, Christoph

AU - Hyndman, Rob J.

AU - Koo, Bonsoo

PY - 2018/4/1

Y1 - 2018/4/1

N2 - One of the most widely used standard procedures for model evaluation in classification and regression is K-fold cross-validation (CV). However, when it comes to time series forecasting, because of the inherent serial correlation and potential non-stationarity of the data, its application is not straightforward and often replaced by practitioners in favour of an out-of-sample (OOS) evaluation. It is shown that for purely autoregressive models, the use of standard K-fold CV is possible provided the models considered have uncorrelated errors. Such a setup occurs, for example, when the models nest a more appropriate model. This is very common when Machine Learning methods are used for prediction, and where CV can control for overfitting the data. Theoretical insights supporting these arguments are presented, along with a simulation study and a real-world example. It is shown empirically that K-fold CV performs favourably compared to both OOS evaluation and other time-series-specific techniques such as non-dependent cross-validation.

AB - One of the most widely used standard procedures for model evaluation in classification and regression is K-fold cross-validation (CV). However, when it comes to time series forecasting, because of the inherent serial correlation and potential non-stationarity of the data, its application is not straightforward and often replaced by practitioners in favour of an out-of-sample (OOS) evaluation. It is shown that for purely autoregressive models, the use of standard K-fold CV is possible provided the models considered have uncorrelated errors. Such a setup occurs, for example, when the models nest a more appropriate model. This is very common when Machine Learning methods are used for prediction, and where CV can control for overfitting the data. Theoretical insights supporting these arguments are presented, along with a simulation study and a real-world example. It is shown empirically that K-fold CV performs favourably compared to both OOS evaluation and other time-series-specific techniques such as non-dependent cross-validation.

KW - Autoregression

KW - Cross-validation

KW - Time series

UR - http://www.scopus.com/inward/record.url?scp=85036471177&partnerID=8YFLogxK

U2 - 10.1016/j.csda.2017.11.003

DO - 10.1016/j.csda.2017.11.003

M3 - Article

VL - 120

SP - 70

EP - 83

JO - Computational Statistics and Data Analysis

T2 - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

SN - 0167-9473

ER -