Hierarchical Dirichlet trees for information retrieval

Gholamreza Haffari, Yee Whye Teh

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

5 Citations (Scopus)

Abstract

We propose a principled probabilistic framework which uses trees over the vocabulary to capture similarities among terms in an information retrieval setting. This allows the retrieval of documents based not just on occurrences of specific query terms, but also on similarities between terms (an effect similar to query expansion). Additionally, our principled generative model exhibits an effect similar to inverse document frequency. We give encouraging experimental evidence of the superiority of the hierarchical Dirichlet tree compared to standard baselines.

Original language: English
Title of host publication: NAACL HLT 2009 - Human Language Technologies
Subtitle of host publication: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Editors: Michael Collins, Shri Narayanan, Douglas W Oard, Lucy Vanderwende
Place of Publication: Stroudsburg PA USA
Publisher: Association for Computational Linguistics (ACL)
Pages: 173-181
Number of pages: 9
ISBN (Print): 9781932432411
Publication status: Published - 2009
Externally published: Yes
Event: North American Association for Computational Linguistics 2009 - University of Colorado at Boulder, Boulder, United States of America
Duration: 31 May 2009 to 5 Jun 2009
Conference number: 10th
http://clear.colorado.edu/NAACLHLT2009/

Conference

Conference: North American Association for Computational Linguistics 2009
Abbreviated title: NAACL HLT 2009
Country/Territory: United States of America
City: Boulder
Period: 31/05/09 to 5/06/09
