Graph ensemble boosting for imbalanced noisy graph stream classification

Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Many applications involve stream data with structural dependency, graph representations, and continuously increasing volumes. In these applications, class distributions are commonly imbalanced, with minority (or positive) samples making up only a small portion of the population, which makes it significantly harder for learning models to identify minority samples accurately. The problem is further complicated by the presence of noise: noisy samples resemble minority samples, so any treatment for class imbalance may falsely focus on the noise and degrade accuracy. In this paper, we propose a classification model to tackle imbalanced graph streams with noise. Our method, graph ensemble boosting (gEBoost), employs an ensemble-based framework that partitions the graph stream into chunks, each containing a number of noisy graphs with imbalanced class distributions. For each chunk, we propose a boosting algorithm that combines discriminative subgraph pattern selection and model learning in a unified framework for graph classification. To tackle concept drift in graph streams, an instance-level weighting mechanism dynamically adjusts instance weights, through which the boosting framework can emphasize difficult graph samples. The classifiers built from different graph chunks form an ensemble for graph stream classification. Experiments on real-life imbalanced graph streams demonstrate clear benefits of our boosting design for handling imbalanced noisy graph streams.
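The chunk-wise ensemble idea in the abstract can be illustrated with a small sketch. This is not the authors' gEBoost algorithm: it assumes each graph has already been reduced to a 0/1 vector of subgraph-pattern indicators (the paper instead selects subgraph patterns inside the boosting loop), and it substitutes plain AdaBoost over decision stumps for the per-chunk learner. The names `ChunkEnsemble`, `boost_chunk`, and `max_models`, the class-ratio weighting, and the factor of 2 applied to instances the current ensemble misclassifies are all hypothetical choices made for illustration.

```python
import math
from collections import deque

def stump_predict(j, pol, x):
    # Map binary feature x[j] to a label in {-1, +1}, optionally flipped by pol.
    return pol * (2 * x[j] - 1)

def train_stump(X, y, w):
    """Return (weighted_error, feature, polarity) of the best decision stump."""
    best = None
    for j in range(len(X[0])):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(j, pol, xi) != yi)
            if best is None or err < best[0]:
                best = (err, j, pol)
    return best

def boost_chunk(X, y, w, rounds=5):
    """Plain AdaBoost on one chunk, starting from the supplied instance weights."""
    total = sum(w)
    w = [wi / total for wi in w]
    model = []
    for _ in range(rounds):
        err, j, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, pol))
        # Raise the weight of misclassified instances, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(j, pol, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def model_score(model, x):
    return sum(alpha * stump_predict(j, pol, x) for alpha, j, pol in model)

class ChunkEnsemble:
    """Keeps one boosted classifier per recent chunk and predicts by majority vote."""

    def __init__(self, max_models=3):
        self.models = deque(maxlen=max_models)  # oldest chunk model is dropped first

    def predict(self, x):
        if not self.models:
            return -1  # assumption: fall back to the majority (negative) class
        votes = sum(1 if model_score(m, x) > 0 else -1 for m in self.models)
        return 1 if votes > 0 else -1

    def update(self, X, y):
        """Train a classifier on a new labeled chunk and add it to the ensemble."""
        n_pos = sum(1 for yi in y if yi == 1)
        n_neg = len(y) - n_pos
        pos_w = n_neg / n_pos if n_pos else 1.0  # up-weight the minority class
        w = []
        for xi, yi in zip(X, y):
            wi = pos_w if yi == 1 else 1.0
            # Emphasize instances the current ensemble gets wrong (possible drift).
            if self.models and self.predict(xi) != yi:
                wi *= 2.0
            w.append(wi)
        self.models.append(boost_chunk(X, y, w))

# Toy chunk: feature 0 marks the minority (positive) class.
X = [[1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
y = [1, -1, -1, -1, -1]
ensemble = ChunkEnsemble()
ensemble.update(X, y)
```

Each call to `update` corresponds to one chunk arriving from the stream. Bounding the ensemble with a fixed-length `deque` is one simple way to let old concepts expire, and is a design choice of this sketch rather than something the abstract specifies.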

Original language: English
Article number: 6884853
Pages (from-to): 940-954
Number of pages: 15
Journal: IEEE Transactions on Cybernetics
Volume: 45
Issue number: 5
DOI: 10.1109/TCYB.2014.2341031
Publication status: Published - May 2015
Externally published: Yes

Keywords

  • Data streams
  • graph ensemble boosting (gEBoost)
  • graphs
  • imbalanced class distributions
  • noise

Cite this

Pan, Shirui ; Wu, Jia ; Zhu, Xingquan ; Zhang, Chengqi. / Graph ensemble boosting for imbalanced noisy graph stream classification. In: IEEE Transactions on Cybernetics. 2015 ; Vol. 45, No. 5. pp. 940-954.
@article{7323248b320045c7b896272ceddfd0bc,
title = "Graph ensemble boosting for imbalanced noisy graph stream classification",
abstract = "Many applications involve stream data with structural dependency, graph representations, and continuously increasing volumes. In these applications, class distributions are commonly imbalanced, with minority (or positive) samples making up only a small portion of the population, which makes it significantly harder for learning models to identify minority samples accurately. The problem is further complicated by the presence of noise: noisy samples resemble minority samples, so any treatment for class imbalance may falsely focus on the noise and degrade accuracy. In this paper, we propose a classification model to tackle imbalanced graph streams with noise. Our method, graph ensemble boosting (gEBoost), employs an ensemble-based framework that partitions the graph stream into chunks, each containing a number of noisy graphs with imbalanced class distributions. For each chunk, we propose a boosting algorithm that combines discriminative subgraph pattern selection and model learning in a unified framework for graph classification. To tackle concept drift in graph streams, an instance-level weighting mechanism dynamically adjusts instance weights, through which the boosting framework can emphasize difficult graph samples. The classifiers built from different graph chunks form an ensemble for graph stream classification. Experiments on real-life imbalanced graph streams demonstrate clear benefits of our boosting design for handling imbalanced noisy graph streams.",
keywords = "Data streams, graph ensemble boosting (gEBoost), graphs, imbalanced class distributions, noise",
author = "Shirui Pan and Jia Wu and Xingquan Zhu and Chengqi Zhang",
year = "2015",
month = "5",
doi = "10.1109/TCYB.2014.2341031",
language = "English",
volume = "45",
pages = "940--954",
journal = "IEEE Transactions on Cybernetics",
issn = "2168-2267",
number = "5",

}
