Abstract
We present a novel approach for extracting a minimal synchronous context-free grammar (SCFG) for Hiero-style statistical machine translation using a non-parametric Bayesian framework. Our approach is designed to extract rules that are licensed by the word alignments and by heuristically extracted phrase pairs. Our Bayesian model limits the number of SCFG rules extracted by sampling from the space of all possible hierarchical rules. Additionally, our informed prior, based on lexical alignment probabilities, biases the grammar toward extracting high-quality rules, which leads to improved generalization and to the automatic identification of commonly re-used rules. We show that our Bayesian model is able to extract a minimal set of hierarchical phrase rules without impacting translation quality as measured by the BLEU score.
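The abstract does not name the specific non-parametric prior. The sketch below assumes a Dirichlet process over SCFG rules, whose Chinese-restaurant-process predictive probability favours rules that have already been used ("rich get richer"). That behaviour is how such a prior can keep the grammar small and surface commonly re-used rules. The informed base measure `lexical_base_prob` is a hypothetical stand-in for a prior "based on lexical alignment probabilities": for each target terminal it takes an IBM Model 1 style average of lexical probabilities. The rule encoding (`"X1"` for a linked gap), `alpha`, the `1e-6` floor and the toy lexicon are all illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical rule encoding: (source side, target side); "X1", "X2", ... mark linked gaps.
Rule = Tuple[str, str]
FLOOR = 1e-6  # hypothetical smoothing value for unseen word pairs


def is_gap(token: str) -> bool:
    """True for nonterminal gap symbols such as X1, X2 (assumed convention)."""
    return token.startswith("X") and token[1:].isdigit()


def lexical_base_prob(rule: Rule, lex: Dict[Tuple[str, str], float]) -> float:
    """Hypothetical informed base measure P0(rule) built from lexical probabilities.

    For each target terminal e, average t(e | f) over the source terminals f,
    then multiply these averages together. Gap symbols are ignored.
    """
    src = [w for w in rule[0].split() if not is_gap(w)]
    tgt = [w for w in rule[1].split() if not is_gap(w)]
    if not src or not tgt:
        return FLOOR  # assumed fallback for rules lacking terminals on one side
    prob = 1.0
    for e in tgt:
        prob *= sum(lex.get((f, e), FLOOR) for f in src) / len(src)
    return prob


def crp_predictive(rule: Rule, counts: Counter, alpha: float, p0: float) -> float:
    """Dirichlet-process predictive probability of `rule` given the other rule uses."""
    total = sum(counts.values())
    return (counts[rule] + alpha * p0) / (total + alpha)


# Toy usage: a rule already used five times vs. an unseen, lexically implausible one.
lex = {("ne", "not"): 0.4, ("pas", "not"): 0.5, ("maison", "house"): 0.9}
counts = Counter({("ne X1 pas", "not X1"): 5})
seen: Rule = ("ne X1 pas", "not X1")
unseen: Rule = ("ne va pas", "does not go")

p_seen = crp_predictive(seen, counts, alpha=1.0, p0=lexical_base_prob(seen, lex))
p_unseen = crp_predictive(unseen, counts, alpha=1.0, p0=lexical_base_prob(unseen, lex))
print(f"seen rule: {p_seen:.4f}, unseen rule: {p_unseen:.2e}")
```

In a Gibbs sampler over rule segmentations, a probability of this form rewards re-using existing rules over proposing new ones. The base measure then ranks the new rules that are proposed by how well their terminals align lexically.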
Original language | English |
---|---|
Title of host publication | Proceedings of the Sixth Workshop on Statistical Machine Translation |
Editors | Chris Callison-Burch, Philipp Koehn, Christof Monz, Omar F. Zaidan |
Place of Publication | Stroudsburg, PA, USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 533 - 541 |
Number of pages | 9 |
ISBN (Electronic) | 9781937284121 |
ISBN (Print) | 9781937284121 |
Publication status | Published - 2011 |
Event | Workshop on Statistical Machine Translation (WMT) 2011, Edinburgh, United Kingdom. Duration: 30 Jul 2011 → 31 Jul 2011. Conference number: 6th. http://statmt.org/wmt11/ |
Workshop
Workshop | Workshop on Statistical Machine Translation (WMT) 2011 |
---|---|
Abbreviated title | WMT 2011 |
Country/Territory | United Kingdom |
City | Edinburgh |
Period | 30/07/11 → 31/07/11 |
Other | EMNLP 2011 SIXTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION |
Internet address | http://statmt.org/wmt11/ |