Abstract
We propose a principled probabilistic framework that uses trees over the vocabulary to capture similarities among terms in an information retrieval setting. This allows documents to be retrieved based not just on occurrences of specific query terms, but also on similarities between terms (an effect similar to query expansion). Additionally, our principled generative model exhibits an effect similar to inverse document frequency. We give encouraging experimental evidence of the superiority of the hierarchical Dirichlet tree over standard baselines.
Original language | English |
---|---|
Title of host publication | NAACL HLT 2009 - Human Language Technologies |
Subtitle of host publication | The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference |
Editors | Michael Collins, Shri Narayanan, Douglas W Oard, Lucy Vanderwende |
Place of Publication | Stroudsburg PA USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 173-181 |
Number of pages | 9 |
ISBN (Print) | 9781932432411 |
Publication status | Published - 2009 |
Externally published | Yes |
Event | North American Association for Computational Linguistics 2009 - University of Colorado at Boulder, Boulder, United States of America; Duration: 31 May 2009 → 5 Jun 2009; Conference number: 10th; http://clear.colorado.edu/NAACLHLT2009/ |
Conference
Conference | North American Association for Computational Linguistics 2009 |
---|---|
Abbreviated title | NAACL HLT 2009 |
Country/Territory | United States of America |
City | Boulder |
Period | 31/05/09 → 5/06/09 |
Internet address | http://clear.colorado.edu/NAACLHLT2009/ |