CFOND: consensus factorization for co-clustering networked data

Ting Guo, Shirui Pan, Xingquan Zhu, Chengqi Zhang

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Networked data are common in domains where instances are characterized by both feature values and inter-dependency relationships. Finding cluster structures for networked instances and discovering representative features for each cluster constitute a special co-clustering task that is useful for many real-world applications, such as automatically categorizing scientific publications and finding representative keywords for each cluster. To date, although co-clustering has been commonly used to find clusters for both instances and features, all existing methods focus on instance-feature values, without leveraging the valuable topology relationships between instances to help boost co-clustering performance. In this paper, we propose CFOND, a consensus-factorization-based framework for co-clustering networked data. We argue that feature values and linkages provide useful information from different perspectives, yet they are not always consistent and therefore need to be carefully aligned for the best clustering results. We advocate a consensus factorization principle, which simultaneously factorizes information from three aspects: network topology structures, instance-feature content relationships, and feature-feature correlations. The consensus factorization ensures that the final cluster structures are consistent across information from the three aspects with minimum errors. CFOND enjoys a sound theoretical basis and proven convergence, and its performance is validated on real-world networks.
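
The idea of factorizing the three information sources with shared factors can be illustrated with a minimal sketch. This is not the CFOND algorithm from the paper: the objective (a weighted sum of three squared Frobenius errors), the projected gradient solver with backtracking, the function names `consensus_loss` and `consensus_factorize`, and the weights `alpha` and `beta` are all assumptions made for illustration only. The sketch assumes a symmetric instance-instance matrix `A`, an instance-feature matrix `X`, and a symmetric feature-feature matrix `C`, and it shares one nonnegative instance factor `G` and one nonnegative feature factor `F` across all three terms.

```python
import numpy as np


def consensus_loss(A, X, C, G, F, alpha, beta):
    """Weighted sum of three squared reconstruction errors (hypothetical objective)."""
    topo = np.linalg.norm(A - G @ G.T) ** 2     # network topology term
    content = np.linalg.norm(X - G @ F.T) ** 2  # instance-feature term
    corr = np.linalg.norm(C - F @ F.T) ** 2     # feature-feature term
    return topo + alpha * content + beta * corr


def consensus_factorize(A, X, C, k, alpha=1.0, beta=1.0, iters=200, seed=0):
    """Projected gradient descent with backtracking (illustrative solver only).

    A (n x n) and C (d x d) are assumed symmetric; X is n x d.
    Returns nonnegative factors G (n x k), F (d x k) and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    G = rng.random((n, k))
    F = rng.random((d, k))
    history = [consensus_loss(A, X, C, G, F, alpha, beta)]
    step = 1e-2
    for _ in range(iters):
        Ra = A - G @ G.T  # residuals of the three reconstructions
        Rx = X - G @ F.T
        Rc = C - F @ F.T
        grad_G = -4.0 * Ra @ G - 2.0 * alpha * Rx @ F
        grad_F = -2.0 * alpha * Rx.T @ G - 4.0 * beta * Rc @ F
        accepted = False
        for _ in range(30):
            # Step along the negative gradient, then clip to keep factors nonnegative.
            G_new = np.maximum(G - step * grad_G, 0.0)
            F_new = np.maximum(F - step * grad_F, 0.0)
            loss = consensus_loss(A, X, C, G_new, F_new, alpha, beta)
            if loss < history[-1]:
                G, F = G_new, F_new
                history.append(loss)
                step *= 2.0  # try a larger step next time
                accepted = True
                break
            step *= 0.5  # shrink the step and retry
        if not accepted:
            break  # no improving step found; stop early
    return G, F, history


# Toy data: 6 instances in two linked groups, 4 features in two groups.
A = np.kron(np.eye(2), np.ones((3, 3)))  # block-diagonal links (with self-loops)
X = np.kron(np.eye(2), np.ones((3, 2)))  # each group uses its own two features
C = X.T @ X / X.shape[0]                 # feature co-occurrence as a correlation proxy

G, F, history = consensus_factorize(A, X, C, k=2)
instance_clusters = G.argmax(axis=1)  # cluster label per instance
feature_clusters = F.argmax(axis=1)   # cluster label per feature
```

Because `G` appears in both the topology and content terms, and `F` in both the content and correlation terms, a single set of factors has to explain all three matrices at once, which is the sense in which the result is a consensus. The result depends on the random initialization, so the cluster labels are not guaranteed to match the planted groups.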

Original language: English
Pages (from-to): 706-719
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 31
Issue number: 4
DOIs: 10.1109/TKDE.2018.2846555
Publication status: Published - 1 Apr 2019
Externally published: Yes

Keywords

  • Co-clustering
  • Couplings
  • Data mining
  • Linear programming
  • Manifolds
  • Merging
  • Network topology
  • Networked data
  • Networks
  • Nonnegative Matrix Factorization
  • Topology

Cite this

Guo, Ting; Pan, Shirui; Zhu, Xingquan; Zhang, Chengqi. CFOND: consensus factorization for co-clustering networked data. In: IEEE Transactions on Knowledge and Data Engineering. 2019; Vol. 31, No. 4. pp. 706-719.
@article{f7ef2486614943df970e6d258d89f028,
title = "CFOND: consensus factorization for co-clustering networked data",
abstract = "Networked data are common in domains where instances are characterized by both feature values and inter-dependency relationships. Finding cluster structures for networked instances and discovering representative features for each cluster constitute a special co-clustering task that is useful for many real-world applications, such as automatically categorizing scientific publications and finding representative keywords for each cluster. To date, although co-clustering has been commonly used to find clusters for both instances and features, all existing methods focus on instance-feature values, without leveraging the valuable topology relationships between instances to help boost co-clustering performance. In this paper, we propose CFOND, a consensus-factorization-based framework for co-clustering networked data. We argue that feature values and linkages provide useful information from different perspectives, yet they are not always consistent and therefore need to be carefully aligned for the best clustering results. We advocate a consensus factorization principle, which simultaneously factorizes information from three aspects: network topology structures, instance-feature content relationships, and feature-feature correlations. The consensus factorization ensures that the final cluster structures are consistent across information from the three aspects with minimum errors. CFOND enjoys a sound theoretical basis and proven convergence, and its performance is validated on real-world networks.",
keywords = "Co-clustering, Couplings, Data mining, Linear programming, Manifolds, Merging, Network topology, Networked data, Networks, Nonnegative Matrix Factorization, Topology",
author = "Ting Guo and Shirui Pan and Xingquan Zhu and Chengqi Zhang",
year = "2019",
month = "4",
day = "1",
doi = "10.1109/TKDE.2018.2846555",
language = "English",
volume = "31",
pages = "706--719",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
number = "4",

}
