Leveraging meta information in short text aggregation

He Zhao, Lan Du, Guanfeng Liu, Wray Buntine

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

Abstract

Short texts such as tweets often contain insufficient word co-occurrence information for training conventional topic models. To deal with the insufficiency, we propose a generative model that aggregates short texts into clusters by leveraging the associated meta information. Our model can generate more interpretable topics as well as document clusters. We develop an effective Gibbs sampling algorithm favoured by the fully local conjugacy in the model. Extensive experiments demonstrate that our model achieves better performance in terms of document clustering and topic coherence.
Original language: English
Title of host publication: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Editors: Anna Korhonen, David Traum, Lluís Màrquez
Place of Publication: Florence, Italy
Publisher: Association for Computational Linguistics (ACL)
Pages: 4042-4049
Number of pages: 8
DOIs: https://doi.org/10.18653/v1/P19-1396
Publication status: Published - Jul 2019
Event: Annual Meeting of the Association of Computational Linguistics 2019 - Florence, Italy
Duration: 28 Jul 2019 to 2 Aug 2019
Conference number: 57th
http://www.acl2019.org/EN/index.xhtml

Conference

Conference: Annual Meeting of the Association of Computational Linguistics 2019
Abbreviated title: ACL 2019
Country: Italy
City: Florence
Period: 28/07/19 to 2/08/19
Internet address: http://www.acl2019.org/EN/index.xhtml

Cite this

Zhao, H., Du, L., Liu, G., & Buntine, W. (2019). Leveraging meta information in short text aggregation. In A. Korhonen, D. Traum, & L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 4042-4049). [P19-1396] Florence Italy: Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/P19-1396
Zhao, He ; Du, Lan ; Liu, Guanfeng ; Buntine, Wray. / Leveraging meta information in short text aggregation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. editor / Anna Korhonen ; David Traum ; Lluís Màrquez. Florence Italy : Association for Computational Linguistics (ACL), 2019. pp. 4042-4049
@inproceedings{ffb776d9c587401abd06f411bd184891,
title = "Leveraging meta information in short text aggregation",
abstract = "Short texts such as tweets often contain insufficient word co-occurrence information for training conventional topic models. To deal with the insufficiency, we propose a generative model that aggregates short texts into clusters by leveraging the associated meta information. Our model can generate more interpretable topics as well as document clusters. We develop an effective Gibbs sampling algorithm favoured by the fully local conjugacy in the model. Extensive experiments demonstrate that our model achieves better performance in terms of document clustering and topic coherence.",
author = "He Zhao and Lan Du and Guanfeng Liu and Wray Buntine",
year = "2019",
month = "7",
doi = "10.18653/v1/P19-1396",
language = "English",
pages = "4042--4049",
editor = "Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'i}s",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics (ACL)",

}

Zhao, H, Du, L, Liu, G & Buntine, W 2019, Leveraging meta information in short text aggregation. in A Korhonen, D Traum & L Màrquez (eds), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics., P19-1396, Association for Computational Linguistics (ACL), Florence Italy, pp. 4042-4049, Annual Meeting of the Association of Computational Linguistics 2019, Florence, Italy, 28/07/19. https://doi.org/10.18653/v1/P19-1396

Leveraging meta information in short text aggregation. / Zhao, He; Du, Lan; Liu, Guanfeng; Buntine, Wray.

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. ed. / Anna Korhonen; David Traum; Lluís Màrquez. Florence Italy : Association for Computational Linguistics (ACL), 2019. p. 4042-4049 P19-1396.

TY  - GEN
T1  - Leveraging meta information in short text aggregation
AU  - Zhao, He
AU  - Du, Lan
AU  - Liu, Guanfeng
AU  - Buntine, Wray
PY  - 2019/7
Y1  - 2019/7
N2  - Short texts such as tweets often contain insufficient word co-occurrence information for training conventional topic models. To deal with the insufficiency, we propose a generative model that aggregates short texts into clusters by leveraging the associated meta information. Our model can generate more interpretable topics as well as document clusters. We develop an effective Gibbs sampling algorithm favoured by the fully local conjugacy in the model. Extensive experiments demonstrate that our model achieves better performance in terms of document clustering and topic coherence.
AB  - Short texts such as tweets often contain insufficient word co-occurrence information for training conventional topic models. To deal with the insufficiency, we propose a generative model that aggregates short texts into clusters by leveraging the associated meta information. Our model can generate more interpretable topics as well as document clusters. We develop an effective Gibbs sampling algorithm favoured by the fully local conjugacy in the model. Extensive experiments demonstrate that our model achieves better performance in terms of document clustering and topic coherence.
U2  - 10.18653/v1/P19-1396
DO  - 10.18653/v1/P19-1396
M3  - Conference Paper
SP  - 4042
EP  - 4049
BT  - Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
A2  - Korhonen, Anna
A2  - Traum, David
A2  - Màrquez, Lluís
PB  - Association for Computational Linguistics (ACL)
CY  - Florence Italy
ER  -

Zhao H, Du L, Liu G, Buntine W. Leveraging meta information in short text aggregation. In Korhonen A, Traum D, Màrquez L, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence Italy: Association for Computational Linguistics (ACL). 2019. p. 4042-4049. P19-1396 https://doi.org/10.18653/v1/P19-1396