GoGP: scalable geometric-based Gaussian process for online regression

Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen, Dinh Phung

Research output: Contribution to journal › Article › Research › peer-review

Abstract

One of the most challenging problems in Gaussian process regression is to cope with large-scale datasets and to handle an online learning setting in which data instances arrive irregularly and continuously. In this paper, we introduce a novel online Gaussian process model that scales efficiently to large-scale datasets. Our proposed model is built on the geometric and optimization views of Gaussian process regression, hence the name geometric-based online GP (GoGP). We develop theory guaranteeing that, with a good convergence rate, our proposed algorithm always yields a sparse solution that can approximate the true optimum up to any level of precision specified a priori. Moreover, to further speed up GoGP when it is used with a positive semi-definite, shift-invariant kernel such as the well-known Gaussian kernel, and to address the curse of kernelization, wherein the model size grows linearly with the amount of data accumulated over time in online learning, we propose to approximate the original kernel using the Fourier random feature kernel. The GoGP model with Fourier random features (i.e., GoGP-RF) can be stored directly in a finite-dimensional random feature space; it therefore avoids the curse of kernelization and scales efficiently and effectively to large-scale datasets. We extensively evaluated our proposed methods against state-of-the-art baselines on several large-scale datasets for the online regression task. The experimental results show that the GoGP variants deliver comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup over their rivals in the online setting. More importantly, their convergence behavior, which is rapid and stable while achieving lower errors, is guaranteed by our theoretical analysis.
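The Fourier random feature idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' GoGP-RF implementation; it only shows the standard random Fourier feature construction for a Gaussian kernel, assuming the parameterization k(x, y) = exp(-gamma * ||x - y||^2). The function names, `gamma`, and `num_features` are hypothetical choices made for illustration.

```python
import math
import random


def sample_rff_params(dim, num_features, gamma, rng):
    """Draw random frequencies and phases for k(x, y) = exp(-gamma * ||x - y||^2)."""
    # For this Gaussian kernel the spectral density is N(0, 2 * gamma * I).
    std = math.sqrt(2.0 * gamma)
    omegas = [[rng.gauss(0.0, std) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def rff_map(x, omegas, phases):
    """Map x to a fixed-length feature vector z(x) with E[z(x) . z(y)] = k(x, y)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(omegas, phases)]


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel value, used here only for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    rng = random.Random(0)
    omegas, phases = sample_rff_params(dim=3, num_features=2000, gamma=0.5, rng=rng)
    x, y = [0.1, -0.4, 0.7], [0.3, 0.2, -0.1]
    print("approximate:", dot(rff_map(x, omegas, phases), rff_map(y, omegas, phases)))
    print("exact:      ", gaussian_kernel(x, y, 0.5))
```

Because every input is mapped to a vector of fixed length `num_features`, a linear model over these features has a size that does not depend on how many instances have been seen, which is the sense in which a random feature representation sidesteps the curse of kernelization.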

Original language: English
Pages (from-to): 197-226
Number of pages: 30
Journal: Knowledge and Information Systems
DOI: 10.1007/s10115-018-1239-1
Publication status: Published - 20 Jul 2019

Keywords

  • Gaussian process
  • Kernel methods
  • Online learning
  • Random feature
  • Regression

Cite this

@article{9e34d3d731d842a89e5504a037499b76,
title = "GoGP: scalable geometric-based Gaussian process for online regression",
abstract = "One of the most challenging problems in Gaussian process regression is to cope with large-scale datasets and to tackle an online learning setting where data instances arrive irregularly and continuously. In this paper, we introduce a novel online Gaussian process model that scales efficiently with large-scale datasets. Our proposed GoGP is constructed based on the geometric and optimization views of the Gaussian process regression, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always offers a sparse solution, which can approximate the true optima up to any level of precision specified a priori. Moreover, to further speed up the GoGP accompanied with a positive semi-definite and shift-invariant kernel such as the well-known Gaussian kernel and also address the curse of kernelization problem, wherein the model size linearly rises with data size accumulated over time in the context of online learning, we proposed to approximate the original kernel using the Fourier random feature kernel. The model of GoGP with Fourier random feature (i.e., GoGP-RF) can be stored directly in a finite-dimensional random feature space, hence being able to avoid the curse of kernelization problem and scalable efficiently and effectively with large-scale datasets. We extensively evaluated our proposed methods against the state-of-the-art baselines on several large-scale datasets for online regression task. The experimental results show that our GoGP(s) delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.",
keywords = "Gaussian process, Kernel methods, Online learning, Random feature, Regression",
author = "Trung Le and Khanh Nguyen and Vu Nguyen and Nguyen, {Tu Dinh} and Dinh Phung",
year = "2019",
month = "7",
day = "20",
doi = "10.1007/s10115-018-1239-1",
language = "English",
pages = "197--226",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer-Verlag London Ltd."
}

GoGP: scalable geometric-based Gaussian process for online regression. / Le, Trung; Nguyen, Khanh; Nguyen, Vu; Nguyen, Tu Dinh; Phung, Dinh.

In: Knowledge and Information Systems, 20.07.2019, p. 197-226.

TY - JOUR

T1 - GoGP

T2 - scalable geometric-based Gaussian process for online regression

AU - Le, Trung

AU - Nguyen, Khanh

AU - Nguyen, Vu

AU - Nguyen, Tu Dinh

AU - Phung, Dinh

PY - 2019/7/20

Y1 - 2019/7/20

AB - One of the most challenging problems in Gaussian process regression is to cope with large-scale datasets and to tackle an online learning setting where data instances arrive irregularly and continuously. In this paper, we introduce a novel online Gaussian process model that scales efficiently with large-scale datasets. Our proposed GoGP is constructed based on the geometric and optimization views of the Gaussian process regression, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always offers a sparse solution, which can approximate the true optima up to any level of precision specified a priori. Moreover, to further speed up the GoGP accompanied with a positive semi-definite and shift-invariant kernel such as the well-known Gaussian kernel and also address the curse of kernelization problem, wherein the model size linearly rises with data size accumulated over time in the context of online learning, we proposed to approximate the original kernel using the Fourier random feature kernel. The model of GoGP with Fourier random feature (i.e., GoGP-RF) can be stored directly in a finite-dimensional random feature space, hence being able to avoid the curse of kernelization problem and scalable efficiently and effectively with large-scale datasets. We extensively evaluated our proposed methods against the state-of-the-art baselines on several large-scale datasets for online regression task. The experimental results show that our GoGP(s) delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.

KW - Gaussian process

KW - Kernel methods

KW - Online learning

KW - Random feature

KW - Regression

UR - http://www.scopus.com/inward/record.url?scp=85050288691&partnerID=8YFLogxK

U2 - 10.1007/s10115-018-1239-1

DO - 10.1007/s10115-018-1239-1

M3 - Article

SP - 197

EP - 226

JO - Knowledge and Information Systems

JF - Knowledge and Information Systems

SN - 0219-1377

ER -