Clustering social audiences in business information networks

Yu Zheng, Ruiqi Hu, Sai-fu Fung, Celina Yu, Guodong Long, Ting Guo, Shirui Pan

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Business information networks involve diverse users and rich content and have emerged as important platforms for enabling business intelligence and business decision making. A key step in an organization's business intelligence process is to cluster users with similar interests into social audiences and discover the roles they play within a business network. In this article, we propose a novel machine-learning approach, called CBIN, that co-clusters business information networks to discover and understand these audiences. The CBIN framework is based on co-factorization. The audience clusters are discovered from a combination of network structures and rich contextual information, such as node interactions and node-content correlations. Because what defines an audience cluster is data-driven and clusters often overlap, pre-determining the number of clusters is usually very difficult. Therefore, we have based CBIN on an overlapping clustering paradigm with a hold-out strategy to discover the optimal number of clusters given the underlying data. Experiments on 13 real-world enterprise datasets show that CBIN performs strongly compared with other state-of-the-art algorithms.
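The abstract does not give CBIN's objective function, so the following is only a minimal sketch of the general idea of co-factorization with overlapping memberships, not the authors' method. It assumes a generic nonnegative joint factorization with standard multiplicative updates; the names `A` (node-node structure), `C` (node-content), `U`, `V`, `W`, `lam`, and `threshold` are hypothetical, and the hold-out selection of the number of clusters is not covered.

```python
import numpy as np

def loss(A, C, U, V, W, lam):
    """Joint squared reconstruction error of both factorizations."""
    return np.sum((A - U @ V.T) ** 2) + lam * np.sum((C - U @ W.T) ** 2)

def co_factorize(A, C, k, lam=1.0, iters=200, seed=0, eps=1e-9):
    """Jointly approximate A ~ U V^T and C ~ U W^T with a shared nonnegative U.

    Assumed generic formulation (not taken from the paper):
    minimize ||A - U V^T||^2 + lam * ||C - U W^T||^2 with U, V, W >= 0.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((A.shape[0], k)) + 0.1
    V = rng.random((A.shape[1], k)) + 0.1
    W = rng.random((C.shape[1], k)) + 0.1
    history = []
    for _ in range(iters):
        # Multiplicative updates keep every factor nonnegative.
        U *= (A @ V + lam * (C @ W)) / (U @ (V.T @ V + lam * (W.T @ W)) + eps)
        V *= (A.T @ U) / (V @ (U.T @ U) + eps)
        W *= (C.T @ U) / (W @ (U.T @ U) + eps)
        history.append(loss(A, C, U, V, W, lam))
    return U, V, W, history

def overlapping_memberships(U, threshold=0.3):
    """Assign a node to every cluster whose normalized weight reaches threshold."""
    P = U / (U.sum(axis=1, keepdims=True) + 1e-12)
    return P >= threshold

# Toy data (made up for illustration): two triangles joined by one edge,
# plus a small node-by-term content matrix.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
C = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

U, V, W, history = co_factorize(A, C, k=2)
members = overlapping_memberships(U)  # boolean node-by-cluster matrix
```

Sharing `U` across both reconstructions is what lets structure and content inform the same cluster memberships; thresholding the normalized rows, rather than taking an argmax, allows a node to belong to more than one audience.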

Original language: English
Number of pages: 12
Journal: Pattern Recognition
Volume: 100
DOIs: 10.1016/j.patcog.2019.107126
Publication status: Published - Apr 2020

Keywords

  • Business information networks
  • Clustering
  • Machine learning
  • Social networks

Cite this

Zheng, Yu ; Hu, Ruiqi ; Fung, Sai-fu ; Yu, Celina ; Long, Guodong ; Guo, Ting ; Pan, Shirui. / Clustering social audiences in business information networks. In: Pattern Recognition. 2020 ; Vol. 100.
@article{77afe3c1f9dc47a4acea590846377a25,
title = "Clustering social audiences in business information networks",
abstract = "Business information networks involve diverse users and rich content and have emerged as important platforms for enabling business intelligence and business decision making. A key step in an organization's business intelligence process is to cluster users with similar interests into social audiences and discover the roles they play within a business network. In this article, we propose a novel machine-learning approach, called CBIN, that co-clusters business information networks to discover and understand these audiences. The CBIN framework is based on co-factorization. The audience clusters are discovered from a combination of network structures and rich contextual information, such as node interactions and node-content correlations. Since what defines an audience cluster is data-driven, plus they often overlap, pre-determining the number of clusters is usually very difficult. Therefore, we have based CBIN on an overlapping clustering paradigm with a hold-out strategy to discover the optimal number of clusters given the underlying data. Experiments validate an outstanding performance by CBIN compared to other state-of-the-art algorithms on 13 real-world enterprise datasets.",
keywords = "Business information networks, Clustering, Machine learning, Social networks",
author = "Yu Zheng and Ruiqi Hu and Sai-fu Fung and Celina Yu and Guodong Long and Ting Guo and Shirui Pan",
year = "2020",
month = "4",
doi = "10.1016/j.patcog.2019.107126",
language = "English",
volume = "100",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier",

}

TY - JOUR

T1 - Clustering social audiences in business information networks

AU - Zheng, Yu

AU - Hu, Ruiqi

AU - Fung, Sai-fu

AU - Yu, Celina

AU - Long, Guodong

AU - Guo, Ting

AU - Pan, Shirui

PY - 2020/4

Y1 - 2020/4

N2 - Business information networks involve diverse users and rich content and have emerged as important platforms for enabling business intelligence and business decision making. A key step in an organization's business intelligence process is to cluster users with similar interests into social audiences and discover the roles they play within a business network. In this article, we propose a novel machine-learning approach, called CBIN, that co-clusters business information networks to discover and understand these audiences. The CBIN framework is based on co-factorization. The audience clusters are discovered from a combination of network structures and rich contextual information, such as node interactions and node-content correlations. Since what defines an audience cluster is data-driven, plus they often overlap, pre-determining the number of clusters is usually very difficult. Therefore, we have based CBIN on an overlapping clustering paradigm with a hold-out strategy to discover the optimal number of clusters given the underlying data. Experiments validate an outstanding performance by CBIN compared to other state-of-the-art algorithms on 13 real-world enterprise datasets.

KW - Business information networks

KW - Clustering

KW - Machine learning

KW - Social networks

UR - http://www.scopus.com/inward/record.url?scp=85076010225&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2019.107126

DO - 10.1016/j.patcog.2019.107126

M3 - Article

VL - 100

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

ER -