Abstract
In this paper we describe a novel framework for discovering the topical content of a data corpus and tracking its complex structural changes across the temporal dimension. In contrast to previous work, our model neither imposes a prior on the rate at which documents are added to the corpus nor adopts the Markovian assumption, which overly restricts the types of change the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph that allows complex topic changes to be modelled: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on a medical literature corpus concerned with autism spectrum disorder (ASD), an increasingly important research subject of considerable social and healthcare significance. In addition to the collected ASD literature corpus, which we have made freely available, our contributions include two free online tools built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as for knowledge discovery from this large and rapidly growing body of literature.
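The abstract names the temporal similarity graph and the event types it captures but does not give the similarity measure or the rules used to label events. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes that each epoch's topics (for example, from an HDP-based model) are available as word-to-probability dictionaries, that topics in consecutive epochs are compared with cosine similarity, and that an edge is added when the similarity reaches a hypothetical threshold of 0.5. Events are then read off simple in-degree and out-degree rules. The function names `build_temporal_graph` and `classify_events` are hypothetical.

```python
import math
from collections import defaultdict

def cosine(p, q):
    """Cosine similarity between two sparse topic-word distributions (dict: word -> weight)."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def build_temporal_graph(epochs, threshold=0.5):
    """Link topics in consecutive epochs whose similarity reaches the (assumed) threshold.

    epochs: list of epochs, each a list of topics (dict: word -> probability).
    Returns a list of edges ((epoch, topic), (epoch + 1, topic), similarity).
    """
    edges = []
    for e in range(len(epochs) - 1):
        for i, p in enumerate(epochs[e]):
            for j, q in enumerate(epochs[e + 1]):
                s = cosine(p, q)
                if s >= threshold:
                    edges.append(((e, i), (e + 1, j), s))
    return edges

def classify_events(epochs, edges):
    """Label graph nodes with topic-change events using hypothetical degree rules."""
    children = defaultdict(list)  # node -> successors in the next epoch
    parents = defaultdict(list)   # node -> predecessors in the previous epoch
    for src, dst, _ in edges:
        children[src].append(dst)
        parents[dst].append(src)
    events = []
    last = len(epochs) - 1
    for e, topics in enumerate(epochs):
        for i in range(len(topics)):
            node = (e, i)
            if e > 0 and not parents[node]:
                events.append(("emergence", node))       # no predecessor
            if e < last and not children[node]:
                events.append(("disappearance", node))   # no successor
            if len(children[node]) > 1:
                events.append(("split", node, sorted(children[node])))
            if len(parents[node]) > 1:
                events.append(("merge", node, sorted(parents[node])))
            if len(children[node]) == 1 and len(parents[children[node][0]]) == 1:
                events.append(("evolution", node, children[node][0]))  # one-to-one link
    return events

if __name__ == "__main__":
    # Toy, hand-written topics for two epochs (not real corpus output).
    epochs = [
        [{"gene": 0.6, "brain": 0.4}, {"therapy": 0.7, "child": 0.3}],
        [{"gene": 0.5, "brain": 0.5}, {"vaccine": 1.0}],
    ]
    for event in classify_events(epochs, build_temporal_graph(epochs)):
        print(event)
```

On the toy input, the genetics topic evolves into its epoch-1 counterpart, the therapy topic disappears, and the vaccine topic emerges. Because edges are allowed to fan out or fan in, splits and merges fall out of the same graph without a Markovian one-to-one constraint. The threshold and similarity function are the main knobs one would expect to tune.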
Original language | English |
---|---|
Title of host publication | Advances in Knowledge Discovery and Data Mining |
Subtitle of host publication | 19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19–22, 2015, Proceedings, Part I |
Editors | Tru Cao, Ee-Peng Lim, Zhi-Hua Zhou, Tu-Bao Ho, David Cheung, Hiroshi Motoda |
Place of Publication | Cham, Switzerland |
Publisher | Springer |
Pages | 550-562 |
Number of pages | 13 |
ISBN (Electronic) | 9783319180380 |
ISBN (Print) | 9783319180373 |
DOIs | |
Publication status | Published - 2015 |
Externally published | Yes |
Event | Pacific-Asia Conference on Knowledge Discovery and Data Mining 2015, Ho Chi Minh City, Vietnam. Duration: 19 May 2015 → 22 May 2015. Conference number: 19th. Website: https://web.archive.org/web/20150429212339/http://www.pakdd2015.jvn.edu.vn/ Proceedings: https://link.springer.com/book/10.1007/978-3-319-18038-0 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 9077 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | Pacific-Asia Conference on Knowledge Discovery and Data Mining 2015 |
---|---|
Abbreviated title | PAKDD 2015 |
Country/Territory | Vietnam |
City | Ho Chi Minh City |
Period | 19/05/15 → 22/05/15 |
Internet address |