Finding the best not the most: regularized loss minimization subgraph selection for graph classification

Shirui Pan, Jia Wu, Xingquan Zhu, Guodong Long, Chengqi Zhang

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Classification of structured data, such as graphs, has drawn wide interest in recent years. Because graphs lack explicit features for training classification models, extensive studies have focused on extracting the most discriminative subgraph features from the training graph dataset to transform graphs into vector data. However, such filter-based methods suffer from two major disadvantages: (1) subgraph feature selection is separated from the model learning process, so the selected subgraphs, though individually discriminative, may not best fit the subsequent learning model, resulting in deteriorated classification results; and (2) all of these methods rely on users to specify the number of subgraph features K, and a suboptimally specified K often results in significantly reduced classification accuracy. In this paper, we propose a new graph classification paradigm that overcomes the above disadvantages by formulating subgraph feature selection as learning a K-dimensional feature space from an implicit, large subgraph space, with the optimal K value determined automatically. To achieve this goal, we propose a regularized loss minimization-driven (RLMD) feature selection method for graph classification. RLMD integrates subgraph selection and model learning into a unified framework to find discriminative subgraphs with guaranteed minimum loss with respect to the objective function. To automatically determine the optimal number of subgraphs K from the exponentially large subgraph space, an elastic net regularizer and a subgradient method are proposed to derive the stopping criterion, so that K is obtained automatically once RLMD converges. The proposed RLMD method enjoys appealing properties, including proven convergence and applicability to various loss functions. Experimental results on real-life graph datasets demonstrate significant performance gains.
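
To make the idea concrete, the sketch below illustrates elastic-net regularized loss minimization on a precomputed binary graph-by-subgraph feature matrix: each nonzero weight in the learned model marks a selected subgraph, so the number of selected features K emerges from the sparsity of the solution rather than being specified up front. This is a minimal illustration using scikit-learn on synthetic data, not the authors' RLMD algorithm; the matrix, labels, and all hyperparameter values below are assumptions for demonstration.

# Minimal sketch, not the authors' RLMD implementation: elastic-net
# regularized loss minimization over a precomputed binary matrix X,
# where X[i, j] = 1 iff candidate subgraph g_j appears in graph G_i.
# RLMD never materializes X in full; it mines subgraphs incrementally
# from the implicit subgraph space. Data and hyperparameters here are
# illustrative assumptions. Requires scikit-learn >= 1.1.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_graphs, n_subgraphs = 200, 500                  # hypothetical sizes
X = (rng.random((n_graphs, n_subgraphs)) < 0.1).astype(float)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)        # labels driven by two subgraphs

# Logistic loss with an elastic net penalty:
#   min_w  (1/n) * sum_i L(y_i, w . x_i)
#          + alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio)/2 * ||w||_2^2)
clf = SGDClassifier(loss="log_loss", penalty="elasticnet",
                    alpha=1e-2, l1_ratio=0.8,
                    max_iter=2000, tol=1e-4, random_state=0)
clf.fit(X, y)

selected = np.flatnonzero(clf.coef_[0])           # indices of selected subgraphs
print(f"K = {selected.size} subgraph features selected:", selected[:10])

The difference in RLMD is that the candidate columns of X are never enumerated: subgraphs are mined on demand, and the elastic net's sparsity, checked via a subgradient-based stopping criterion, determines when to stop adding them.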

Original language: English
Pages (from-to): 3783-3796
Number of pages: 14
Journal: Pattern Recognition
Volume: 48
Issue number: 11
DOI: 10.1016/j.patcog.2015.05.019
Publication status: Published - Nov 2015
Externally published: Yes

Keywords

  • Classification
  • Feature selection
  • Graph classification
  • Sparse learning

Cite this

Pan, S., Wu, J., Zhu, X., Long, G., & Zhang, C. (2015). Finding the best not the most: regularized loss minimization subgraph selection for graph classification. Pattern Recognition, 48(11), 3783-3796. https://doi.org/10.1016/j.patcog.2015.05.019
@article{bf52111f3a7b4558a23098beef527f61,
  title     = "Finding the best not the most: regularized loss minimization subgraph selection for graph classification",
  author    = "Shirui Pan and Jia Wu and Xingquan Zhu and Guodong Long and Chengqi Zhang",
  journal   = "Pattern Recognition",
  volume    = "48",
  number    = "11",
  pages     = "3783--3796",
  year      = "2015",
  month     = nov,
  doi       = "10.1016/j.patcog.2015.05.019",
  issn      = "0031-3203",
  publisher = "Elsevier",
  language  = "English",
  keywords  = "Classification, Feature selection, Graph classification, Sparse learning",
  url       = "http://www.scopus.com/inward/record.url?scp=84937814255&partnerID=8YFLogxK",
}
