Bayesian extraction of minimal SCFG rules for hierarchical phrase-based translation

Baskaran Sankaran, Gholamreza Haffari, Anoop Sarkar

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

9 Citations (Scopus)

Abstract

We present a novel approach for extracting a minimal synchronous context-free grammar (SCFG) for Hiero-style statistical machine translation using a non-parametric Bayesian framework. Our approach is designed to extract rules that are licensed by the word alignments and heuristically extracted phrase pairs. Our Bayesian model limits the number of SCFG rules extracted by sampling from the space of all possible hierarchical rules; additionally, our informed prior, based on the lexical alignment probabilities, biases the grammar to extract high-quality rules, leading to improved generalization and the automatic identification of commonly re-used rules. We show that our Bayesian model is able to extract a minimal set of hierarchical phrase rules without impacting the translation quality as measured by the BLEU score.

Original language: English
Title of host publication: Proceedings of the Sixth Workshop on Statistical Machine Translation
Editors: Chris Callison-Burch, Philipp Koehn, Christof Monz, Omar F Zaidan
Place of Publication: Stroudsburg PA USA
Publisher: Association for Computational Linguistics (ACL)
Pages: 533-541
Number of pages: 9
ISBN (Electronic): 9781937284121
ISBN (Print): 9781937284121
Publication status: Published - 2011
Event: Workshop on Statistical Machine Translation (WMT) 2011 - Edinburgh, United Kingdom
Duration: 30 Jul 2011 to 31 Jul 2011
Conference number: 6th
http://statmt.org/wmt11/

Workshop

Workshop: Workshop on Statistical Machine Translation (WMT) 2011
Abbreviated title: WMT 2011
Country/Territory: United Kingdom
City: Edinburgh
Period: 30/07/11 to 31/07/11
Other: EMNLP 2011 SIXTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION