Abstract
Recent large language models (LLMs) have shown remarkable performance in aligning generated text with user intentions across various tasks. For long-form text generation, there has been growing interest in evaluating generation from a discourse coherence perspective. However, existing lexical and semantic metrics, such as BLEU, ROUGE, and BERTScore, cannot effectively capture discourse coherence. The development of discourse-specific automatic evaluation methods for assessing the output of LLMs warrants greater focus and exploration. In this paper, we present a novel automatic metric designed to quantify the discourse divergence between two long-form articles. Extensive experiments on three datasets from representative domains demonstrate that our metric aligns more closely with human preferences and GPT-4 coherence evaluations, outperforming existing evaluation methods.
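The abstract's claim that surface-level metrics miss discourse coherence is easy to illustrate. The minimal sketch below (not from the paper; it assumes the third-party `sacrebleu`, `rouge-score`, and `bert-score` Python packages) scores a candidate whose sentences are simply reordered relative to the reference: the result is incoherent as discourse, yet the lexical and semantic metrics barely move.

```python
# Minimal sketch (illustrative only, not the paper's method): sentence-level
# reordering destroys discourse coherence, but n-gram and embedding metrics
# remain high. Requires: sacrebleu, rouge-score, bert-score.
import sacrebleu
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "The council approved the budget. It then adjourned for the summer."
candidate = "It then adjourned for the summer. The council approved the budget."

# BLEU and ROUGE measure n-gram overlap; swapping whole sentences only
# disturbs n-grams at the sentence boundary.
bleu = sacrebleu.sentence_bleu(candidate, [reference]).score
rouge1 = rouge_scorer.RougeScorer(["rouge1"]).score(reference, candidate)["rouge1"].fmeasure

# BERTScore matches token embeddings and is likewise largely insensitive
# to discourse-level ordering.
_, _, f1 = bert_score([candidate], [reference], lang="en")

print(f"BLEU={bleu:.1f}  ROUGE-1 F={rouge1:.3f}  BERTScore F={f1.item():.3f}")
# ROUGE-1 is exactly 1.0 here (identical unigrams); BLEU and BERTScore stay
# high despite the broken coherence.
```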
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
Editors | Ryan Cotterell, Maarten Sap, Lifu Huang |
Place of Publication | Kerrville, TX, USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 92-100 |
Number of pages | 9 |
Volume | 2 |
ISBN (Electronic) | 9798891761155 |
Publication status | Published - 2024 |
Event | North American Chapter of the Association for Computational Linguistics 2024 - Mexico City, Mexico |
Duration | 16 Jun 2024 → 21 Jun 2024 |
Internet address | https://2024.naacl.org/ (Website), https://aclanthology.org/2024.naacl-short.0/ (Proceedings), https://aclanthology.org/volumes/2024.findings-naacl/ (Proceedings) |
Conference
Conference | North American Chapter of the Association for Computational Linguistics 2024 |
---|---|
Abbreviated title | NAACL 2024 |
Country/Territory | Mexico |
City | Mexico City |
Period | 16/06/24 → 21/06/24 |
Internet address | https://2024.naacl.org/ |