Abstract
Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.
Original language | English |
---|---|
Title of host publication | ACL 2017 |
Subtitle of host publication | The 55th Annual Meeting of the Association for Computational Linguistics - Proceedings of the Conference, Vol. 1 (Long Papers) |
Editors | Regina Barzilay, Min-Yen Kan |
Place of Publication | Stroudsburg PA USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1457-1469 |
Number of pages | 13 |
ISBN (Print) | 9781945626753 |
Publication status | Published - 2017 |
Event | Annual Meeting of the Association of Computational Linguistics 2017 - Vancouver, Canada; Duration: 30 Jul 2017 → 4 Aug 2017; Conference number: 55th; https://www.aclweb.org/anthology/events/acl-2017/ (Proceedings) |
Conference
Conference | Annual Meeting of the Association of Computational Linguistics 2017 |
---|---|
Abbreviated title | ACL 2017 |
Country/Territory | Canada |
City | Vancouver |
Period | 30/07/17 → 4/08/17 |