Approximation vector machines for large-scale online learning

Research output: Contribution to journal › Article › Research › peer-review

Abstract

One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that simultaneously encourages sparsity and safeguards against the risk of compromising performance. In an online setting, when an incoming instance arrives, we approximate it by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that, since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis for common loss functions, including Hinge, smooth Hinge, and Logistic (for the classification task) and ℓ1, ℓ2, and ϵ-insensitive (for the regression task), to characterize the gap between the approximation and optimal solutions. This gap crucially depends on two key factors: the frequency of approximation (i.e., how frequently the approximation operation takes place) and the predefined threshold. We conducted extensive experiments for classification and regression tasks in batch and online modes using several benchmark datasets. The quantitative results show that the proposed AVM obtains predictive performance comparable with current state-of-the-art methods while achieving significant computational speed-ups, owing to its ability to maintain the model size.
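To make the approximation step concrete, below is a minimal sketch of one plausible reading of the idea described in the abstract: an online kernel classifier trained by stochastic gradient descent that keeps a bounded "core set", merging each incoming instance into its nearest stored point whenever that neighbor lies within a predefined distance threshold. The class name, the RBF kernel, the hinge-loss update, and the step-size schedule are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch (not the paper's exact algorithm): online kernel
# classification with a bounded "core set". Each incoming point is
# approximated by its nearest stored core point when that neighbor lies
# within a distance threshold delta; otherwise it becomes a new core point.
# Kernel, step size, and hinge-loss update are assumed choices.
import numpy as np


class ApproximateOnlineKernelClassifier:
    def __init__(self, gamma=1.0, delta=0.5, lam=0.01):
        self.gamma = gamma      # RBF kernel bandwidth (assumed kernel)
        self.delta = delta      # approximation threshold on input distance
        self.lam = lam          # regularization constant
        self.cores = []         # stored core points (the "core set")
        self.alphas = []        # one coefficient per core point
        self.t = 0              # iteration counter for the step size

    def _kernel(self, x, z):
        return np.exp(-self.gamma * np.sum((x - z) ** 2))

    def decision(self, x):
        return sum(a * self._kernel(c, x) for c, a in zip(self.cores, self.alphas))

    def partial_fit(self, x, y):
        """One online step with label y in {-1, +1}."""
        self.t += 1
        eta = 1.0 / (self.lam * self.t)          # assumed SGD step size
        # Shrink existing coefficients (gradient of the L2 regularizer).
        self.alphas = [(1.0 - eta * self.lam) * a for a in self.alphas]
        if y * self.decision(x) < 1.0:           # hinge loss is violated
            # Approximation step: reuse the nearest core point if it is
            # within delta, instead of adding x as a new support vector.
            if self.cores:
                dists = [np.linalg.norm(x - c) for c in self.cores]
                j = int(np.argmin(dists))
                if dists[j] <= self.delta:
                    self.alphas[j] += eta * y
                    return
            self.cores.append(np.asarray(x, dtype=float))
            self.alphas.append(eta * y)


# Tiny usage example on synthetic data.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clf = ApproximateOnlineKernelClassifier(gamma=2.0, delta=0.7)
    for _ in range(500):
        x = rng.normal(size=2)
        y = 1.0 if x[0] + x[1] > 0 else -1.0
        clf.partial_fit(x, y)
    print("core set size:", len(clf.cores))

Under this scheme the number of stored points is limited by how many points at mutual distance greater than delta fit in the input domain, which is one way the abstract's claim of a maintained model size can be realized; a larger threshold yields a smaller model but, as the abstract notes, a larger gap between the approximate and optimal solutions.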

Original language: English
Pages (from-to): 1-55
Number of pages: 55
Journal: Journal of Machine Learning Research
Volume: 18
Publication status: Published - 1 Nov 2017
Externally published: Yes

Keywords

  • Big data
  • Convergence analysis
  • Core set
  • Kernel
  • Large-scale machine learning
  • Online learning
  • Sparsity
  • Stochastic gradient descent

Cite this

@article{6fc37c59f56e4b0b90d08e3720918663,
title = "Approximation vector machines for large-scale online learning",
abstract = "One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity-a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard its risk in compromising the per-formance. In an online setting context, when an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neigh-bor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis for the common loss functions including Hinge, smooth Hinge, and Logistic (i.e., for the classification task) and ℓ1,ℓ2, and.ϵ-insensitive (i.e., for the regression task) to characterize the gap between the approximation and optimal solutions. This gap crucially depends on two key factors including the frequency of approximation (i.e., how frequent the approximation operation takes place) and the predefined threshold. We conducted extensive experiments for classification and regression tasks in batch and online modes using several benchmark datasets. The quantitative results show that our proposed AVM obtained comparable pre-dictive performances with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size.",
keywords = "Big data, Convergence analysis, Core set, Kernel, Large-scale machine learning, Online learning, Sparsity, Stochastic gradient descent",
author = "Trung Le and Nguyen, {Tu Dinh} and Vu Nguyen and Dinh Phung",
year = "2017",
month = "11",
day = "1",
language = "English",
volume = "18",
pages = "1--55",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Journal of Machine Learning Research (JMLR)",

}
