Regularizing topic discovery in EMRs with side information by using hierarchical Bayesian models

Cheng Li, Santu Rana, Dinh Phung, Svetha Venkatesh

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

3 Citations (Scopus)


We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient record contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wd-dCRF using MCMC techniques. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.

Original language: English
Title of host publication: Proceedings - 22nd International Conference on Pattern Recognition - ICPR 2014
Subtitle of host publication: 24–28 August 2014 Stockholm, Sweden
Editors: Anders Heyden, Denis Laurendeau, Michael Felsberg
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 6
ISBN (Electronic): 9781479952083, 9781479952090
Publication status: Published - 2014
Externally published: Yes
Event: International Conference on Pattern Recognition 2014 - Stockholm, Sweden
Duration: 24 Aug 2014 to 28 Aug 2014
Conference number: 22nd (Proceedings)


Conference: International Conference on Pattern Recognition 2014
Abbreviated title: ICPR 2014


  • Medical application
  • Readmission
  • Side information
  • Topic analysis
  • Tree structure
  • Words
