Abstract
This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts (hereafter, hybrid texts). Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets, which typically contain hybrid texts with a limited number of boundaries, e.g., single-boundary hybrid texts that begin with human-written content and end with machine-generated continuations. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text, where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings show that (1) detecting AI-generated sentences in hybrid texts is, overall, a challenging task because (1.1) human writers select and even edit AI-generated sentences according to personal preference, which makes the authorship of segments harder to identify; (1.2) authorship changes frequently between neighboring sentences within the hybrid text, which makes it difficult for segment detectors to identify authorship-consistent segments; and (1.3) the short text segments within hybrid texts provide limited stylistic cues for reliable authorship determination; and (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment helps decide whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
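To make the two-step pipeline and the strategy choice concrete, the following is a minimal sketch, not the authors' implementation. It assumes the segment detector has already produced one boolean per gap between neighboring sentences, and that a segment classifier is available as a plain callable. All function names are hypothetical, and the `threshold` of 3.0 sentences is an illustrative assumption; the abstract does not state a cutoff.

```python
from typing import Callable, List, Sequence


def split_into_segments(
    sentences: Sequence[str], boundaries: Sequence[bool]
) -> List[List[str]]:
    """Step (i): group sentences into authorship-consistent segments.

    boundaries[i] is True when an authorship change is predicted
    between sentences[i] and sentences[i + 1].
    """
    if len(boundaries) != max(len(sentences) - 1, 0):
        raise ValueError("expected one boundary flag per gap between sentences")
    segments: List[List[str]] = []
    current: List[str] = []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        if i < len(boundaries) and boundaries[i]:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def label_sentences(
    segments: Sequence[Sequence[str]],
    classify_segment: Callable[[Sequence[str]], str],
) -> List[str]:
    """Step (ii): classify each segment and copy its label to every sentence in it."""
    labels: List[str] = []
    for segment in segments:
        labels.extend([classify_segment(segment)] * len(segment))
    return labels


def average_segment_length(segments: Sequence[Sequence[str]]) -> float:
    """Mean number of sentences per segment (0.0 for an empty text)."""
    if not segments:
        return 0.0
    return sum(len(segment) for segment in segments) / len(segments)


def choose_strategy(avg_len: float, threshold: float = 3.0) -> str:
    """Pick a detection strategy; the threshold is an assumed, illustrative value."""
    return "segmentation" if avg_len >= threshold else "sentence-by-sentence"
```

In use, `average_segment_length` would be estimated first, and `choose_strategy` would then decide whether to run the segmentation steps above or to classify each sentence directly.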
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 |
| Editors | Kate Larson |
| Place of Publication | Marina del Rey CA USA |
| Publisher | Association for the Advancement of Artificial Intelligence (AAAI) |
| Pages | 7545-7553 |
| Number of pages | 9 |
| ISBN (Electronic) | 9781956792041 |
| Publication status | Published - 2024 |
| Event | International Joint Conference on Artificial Intelligence 2024, Jeju, Korea, South; Duration: 3 Aug 2024 → 9 Aug 2024; Conference number: 33rd; Proceedings: https://www.ijcai.org/Proceedings/2024/; Website: https://ijcai24.org/ |
Publication series
| Name | IJCAI International Joint Conference on Artificial Intelligence |
|---|---|
| Publisher | Association for the Advancement of Artificial Intelligence (AAAI) |
| ISSN (Print) | 1045-0823 |
Conference
| Conference | International Joint Conference on Artificial Intelligence 2024 |
|---|---|
| Abbreviated title | IJCAI 2024 |
| Country/Territory | Korea, South |
| City | Jeju |
| Period | 3/08/24 → 9/08/24 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Natural Language Processing
- General
- Humans and AI