Hierarchical Dirichlet process for tracking complex topical structure evolution and its application to autism research literature

Adham Beykikhoshk, Ognjen Arandjelović, Svetha Venkatesh, Dinh Phung

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

14 Citations (Scopus)

Abstract

In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption, which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) – an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus, which we made freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature.
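The temporal similarity graph described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each epoch's topics have already been discovered (for example by an HDP-based model) and are given as word-probability dictionaries. The Bhattacharyya coefficient as the similarity measure, the threshold value, and the function names `bhattacharyya`, `build_similarity_graph`, and `classify_events` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two word distributions.

    p and q map word -> probability; the similarity measure is an assumption.
    """
    return sum(math.sqrt(p[w] * q[w]) for w in p.keys() & q.keys())


def build_similarity_graph(epochs, threshold=0.5):
    """Link similar topics in adjacent epochs.

    epochs: list with one dict per epoch, {topic_id: {word: prob}}.
    Returns a set of edges ((t, a), (t + 1, b)).
    The threshold is a hypothetical tuning parameter.
    """
    edges = set()
    for t in range(len(epochs) - 1):
        for a, pa in epochs[t].items():
            for b, pb in epochs[t + 1].items():
                if bhattacharyya(pa, pb) >= threshold:
                    edges.add(((t, a), (t + 1, b)))
    return edges


def classify_events(epochs, edges):
    """Label structural changes from node degrees in the graph."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1

    last = len(epochs) - 1
    events = []
    for t, topics in enumerate(epochs):
        for k in topics:
            node = (t, k)
            if t > 0 and in_deg[node] == 0:
                events.append(("emergence", node))
            if t < last and out_deg[node] == 0:
                events.append(("disappearance", node))
            if out_deg[node] > 1:
                events.append(("split", node))
            if in_deg[node] > 1:
                events.append(("merge", node))
    return events


# Toy data (invented for illustration): one topic splits into two.
toy_epochs = [
    {"X": {"a": 0.5, "b": 0.5}},
    {"Y": {"a": 1.0}, "Z": {"b": 1.0}},
]
toy_edges = build_similarity_graph(toy_epochs)
toy_events = classify_events(toy_epochs, toy_edges)
```

In this sketch a topic with a single incoming and a single outgoing edge is read as plain evolution, so it produces no event. Because edges depend only on pairwise similarity between adjacent epochs, no Markovian transition model or document arrival rate is assumed.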

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19–22, 2015, Proceedings, Part I
Editors: Tru Cao, Ee-Peng Lim, Zhi-Hua Zhou, Tu-Bao Ho, David Cheung, Hiroshi Motoda
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 550-562
Number of pages: 13
ISBN (Electronic): 9783319180380
ISBN (Print): 9783319180373
Publication status: Published - 2015
Externally published: Yes
Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2015 - Ho Chi Minh City, Vietnam
Duration: 19 May 2015 to 22 May 2015
Conference number: 19th
https://web.archive.org/web/20150429212339/http://www.pakdd2015.jvn.edu.vn/
https://link.springer.com/book/10.1007/978-3-319-18038-0

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 9077
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2015
Abbreviated title: PAKDD 2015
Country: Vietnam
City: Ho Chi Minh City
Period: 19/05/15 to 22/05/15