A Word Embeddings Informed Focused Topic Model

    Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

    Abstract

    In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topic models, especially in cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model in which how a topic focuses on words is informed by word embeddings. Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With a data augmentation technique, we derive an efficient Gibbs sampling algorithm that benefits from the full local conjugacy of the model. We conduct extensive experiments on several real-world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.
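
    To make the abstract's key idea concrete, here is a minimal NumPy sketch of one way embedding similarity could gate which words a topic focuses on: a Bernoulli mask per topic-word pair, with probability given by a sigmoid of the embedding dot product, applied to a Dirichlet-drawn topic-word distribution. The sigmoid-Bernoulli gating, the learned topic embeddings, and all variable names are illustrative assumptions, not the authors' exact construction; the paper's Gibbs sampler and data augmentation step are omitted entirely.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: vocabulary, topics, embedding dimension.
    V, K, D = 1000, 20, 50

    # Stand-ins for pretrained word embeddings and learned topic embeddings.
    word_emb = rng.normal(size=(V, D))
    topic_emb = rng.normal(size=(K, D))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Probability that topic k focuses on word v, driven by embedding
    # similarity (scaled dot product); shape (K, V).
    focus_prob = sigmoid((topic_emb @ word_emb.T) / np.sqrt(D))

    # Bernoulli focusing mask: which words each topic is allowed to use.
    focus_mask = rng.random((K, V)) < focus_prob

    # Draw topic-word distributions, zero out unfocused words, renormalise.
    phi = rng.dirichlet(np.full(V, 0.05), size=K)
    phi = np.where(focus_mask, phi, 0.0)
    phi = phi / np.maximum(phi.sum(axis=1, keepdims=True), 1e-12)

    print("average number of focused words per topic:", focus_mask.sum(axis=1).mean())

    In the paper's setting, posterior inference over such a mask would rely on the data augmentation mentioned in the abstract; a common choice for sigmoid-Bernoulli links is Polya-Gamma augmentation, though that is an assumption here. This sketch only shows the generative gating.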
    Language: English
    Title of host publication: 2017 Ninth Asian Conference on Machine Learning, ACML 2017
    Subtitle of host publication: 15-17 November 2017, Seoul, Korea, Proceedings
    Editors: Min-Ling Zhang, Yung-Kyun Noh
    Publisher: PMLR
    Pages: 423-438
    Number of pages: 16
    Volume: 77
    Publication status: Published - 2017
    Event: Asian Conference on Machine Learning 2017 - Yonsei University, Seoul, Korea, Republic of (South)
    Duration: 15 Nov 2017 - 17 Nov 2017
    Conference number: 9th
    http://www.acml-conf.org/2017/

    Publication series

    Name: Proceedings of Machine Learning Research
    Publisher: PMLR
    Volume: 77
    ISSN (Print): 1938-7228

    Conference

    Conference: Asian Conference on Machine Learning 2017
    Abbreviated title: ACML 2017
    Country: Korea, Republic of (South)
    City: Seoul
    Period: 15/11/17 - 17/11/17
    Internet address: http://www.acml-conf.org/2017/

    Cite this

    Zhao, H., Du, L., & Buntine, W. (2017). A Word Embeddings Informed Focused Topic Model. In M-L. Zhang, & Y-K. Noh (Eds.), 2017 Ninth Asian Conference on Machine Learning, ACML 2017: 15-17 November 2017, Seoul, Korea, Proceedings (Vol. 77, pp. 423-438). (Proceedings of Machine Learning Research; Vol. 77). PMLR.
    Zhao, He ; Du, Lan ; Buntine, Wray. / A Word Embeddings Informed Focused Topic Model. 2017 Ninth Asian Conference on Machine Learning, ACML 2017: 15-17 November 2017, Seoul, Korea, Proceedings. editor / Min-Ling Zhang ; Yung-Kyun Noh. Vol. 77 PMLR, 2017. pp. 423-438 (Proceedings of Machine Learning Research).
    @inproceedings{607dc780eb4a46ec8c52710681d5396e,
    title = "A Word Embeddings Informed Focused Topic Model",
    abstract = "In natural language processing and related fields, it has been shown that the word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topics models, especially for the cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model where how a topic focuses on words is informed by word embeddings. Our models is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With the data argumentation technique, we can derive an efficient Gibbs sampling algorithm that benefits from the fully local conjugacy of the model. We conduct extensive experiments on several real world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.",
    author = "He Zhao and Lan Du and Wray Buntine",
    year = "2017",
    language = "English",
    volume = "77",
    series = "Proceedings of Machine Learning Research",
    publisher = "PMLR",
    pages = "423--438",
    editor = "Min-Ling Zhang and Yung-Kyun Noh",
    booktitle = "2017 Ninth Asian Conference on Machine Learning, ACML 2017",

    }

    Zhao, H, Du, L & Buntine, W 2017, A Word Embeddings Informed Focused Topic Model. in M-L Zhang & Y-K Noh (eds), 2017 Ninth Asian Conference on Machine Learning, ACML 2017: 15-17 November 2017, Seoul, Korea, Proceedings. vol. 77, Proceedings of Machine Learning Research, vol. 77, PMLR, pp. 423-438, Asian Conference on Machine Learning 2017, Seoul, Korea, Republic of (South), 15/11/17.

    A Word Embeddings Informed Focused Topic Model. / Zhao, He; Du, Lan; Buntine, Wray.

    2017 Ninth Asian Conference on Machine Learning, ACML 2017: 15-17 November 2017, Seoul, Korea, Proceedings. ed. / Min-Ling Zhang; Yung-Kyun Noh. Vol. 77 PMLR, 2017. p. 423-438 (Proceedings of Machine Learning Research; Vol. 77).

    Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

    TY - GEN

    T1 - A Word Embeddings Informed Focused Topic Model

    AU - Zhao, He

    AU - Du, Lan

    AU - Buntine, Wray

    PY - 2017

    Y1 - 2017

    N2 - In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topic models, especially in cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model in which how a topic focuses on words is informed by word embeddings. Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With a data augmentation technique, we derive an efficient Gibbs sampling algorithm that benefits from the full local conjugacy of the model. We conduct extensive experiments on several real-world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.

    AB - In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topic models, especially in cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model in which how a topic focuses on words is informed by word embeddings. Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With a data augmentation technique, we derive an efficient Gibbs sampling algorithm that benefits from the full local conjugacy of the model. We conduct extensive experiments on several real-world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.

    M3 - Conference Paper

    VL - 77

    T3 - Proceedings of Machine Learning Research

    SP - 423

    EP - 438

    BT - 2017 Ninth Asian Conference on Machine Learning, ACML 2017

    A2 - Zhang, Min-Ling

    A2 - Noh, Yung-Kyun

    PB - PMLR

    ER -

    Zhao H, Du L, Buntine W. A Word Embeddings Informed Focused Topic Model. In Zhang M-L, Noh Y-K, editors, 2017 Ninth Asian Conference on Machine Learning, ACML 2017: 15-17 November 2017, Seoul, Korea, Proceedings. Vol. 77. PMLR. 2017. p. 423-438. (Proceedings of Machine Learning Research).