Effective identification of similar patients through sequential matching over ICD code embedding

Dang Nguyen, Wei Luo, Svetha Venkatesh, Dinh Phung

Research output: Contribution to journal › Article › Research › peer-review

8 Citations (Scopus)


Evidence-based medicine often involves identifying patients with similar conditions, which are typically captured as sequences of ICD (International Classification of Diseases; World Health Organization 2013) codes. Because no satisfactory prior solution exists for matching ICD-10 code sequences, this paper presents a method that effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that the proposed method achieves significantly higher matching performance than state-of-the-art methods that ignore the sequential order. Our method better identifies similar patients with respect to a number of clinical outcomes, including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, the method can be adapted to other codified sequence data.
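The abstract does not spell out the matching algorithm, so the following is only an illustrative sketch of the general idea of order-aware matching over code embeddings, not the authors' method. It assumes a Needleman-Wunsch style global alignment in which the score for pairing two codes is the cosine similarity of their embedding vectors. The vectors in `EMBEDDINGS`, the `gap` penalty, and the function names are all hypothetical; in practice the vectors would come from a model such as Word2Vec trained on ICD-10 code sequences.

```python
import math

# Toy 2-D vectors invented for illustration only (an assumption, not data
# from the paper). Real embeddings would be learned from ICD-10 sequences.
EMBEDDINGS = {
    "C50": (0.9, 0.1),
    "C34": (0.8, 0.3),
    "E11": (0.1, 0.9),
    "I10": (0.2, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_score(seq_a, seq_b, gap=-0.5):
    """Global alignment score of two code sequences.

    Pairing two codes scores their embedding cosine similarity; skipping a
    code in either sequence costs `gap` (a hypothetical penalty value).
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against an empty sequence costs one gap per code.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = cosine(EMBEDDINGS[seq_a[i - 1]], EMBEDDINGS[seq_b[j - 1]])
            score[i][j] = max(
                score[i - 1][j - 1] + sim,  # pair the two codes
                score[i - 1][j] + gap,      # skip a code in seq_a
                score[i][j - 1] + gap,      # skip a code in seq_b
            )
    return score[n][m]


same_order = alignment_score(["C50", "E11"], ["C50", "E11"])
reversed_order = alignment_score(["C50", "E11"], ["E11", "C50"])
print(same_order > reversed_order)
```

In this sketch two patients with the same codes in the same order score higher than two patients with the same codes in reversed order, which is the property an order-ignoring comparison (for example, averaging the code vectors) cannot express.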

Original language: English
Article number: 94
Number of pages: 13
Journal: Journal of Medical Systems
Issue number: 5
Publication status: Published - May 2018
Externally published: Yes


  • Cancer
  • Code embedding
  • Patient similarity matching
  • Sequential matching
  • Word2Vec
