Abstract
This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts (abbreviated as hybrid texts). Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These datasets typically contain hybrid texts with a limited number of boundaries, e.g., single-boundary hybrid texts that begin with human-written content and end with machine-generated continuations. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study uses the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text, where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings show that (1) detecting AI-generated sentences in hybrid texts is a challenging task overall, because (1.1) human writers select and even edit AI-generated sentences according to personal preference, which makes the authorship of segments harder to identify; (1.2) authorship changes frequently between neighboring sentences within the hybrid text, which makes it difficult for segment detectors to identify authorship-consistent segments; and (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; and (2) before starting the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment helps decide whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
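The two-step pipeline and the segment-length check described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy classifier, and the threshold of 3 sentences are all hypothetical, and the sketch assumes that per-sentence authorship labels are available for a small annotated sample from which the average segment length can be estimated.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple


def segments_from_labels(labels: Sequence[str]) -> List[Tuple[str, int]]:
    """Collapse per-sentence authorship labels into (author, length) runs."""
    return [(author, len(list(run))) for author, run in groupby(labels)]


def average_segment_length(labels: Sequence[str]) -> float:
    """Mean number of sentences per authorship-consistent segment."""
    segments = segments_from_labels(labels)
    return len(labels) / len(segments) if segments else 0.0


def choose_strategy(avg_len: float, threshold: float = 3.0) -> str:
    """Pick a detection strategy from the average segment length.

    The threshold is a hypothetical value for illustration only.
    """
    return "segmentation" if avg_len >= threshold else "sentence-by-sentence"


def classify_by_segments(
    sentences: Sequence[str],
    boundaries: Sequence[int],
    classify_segment: Callable[[Sequence[str]], str],
) -> List[str]:
    """Step (ii) of the pipeline: label each detected segment as a whole.

    `boundaries` holds the sorted sentence indices at which a new segment
    starts (excluding index 0); in a full pipeline they would come from a
    segment detector (step (i)), which is assumed to exist here.
    """
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(sentences)]
    labels: List[str] = []
    for start, end in zip(starts, ends):
        author = classify_segment(sentences[start:end])
        labels.extend([author] * (end - start))
    return labels


# Toy annotated sample: authorship switches often, so segments are short.
sample = ["human", "human", "ai", "human", "ai", "ai"]
strategy = choose_strategy(average_segment_length(sample))
```

In this toy sample the six sentences form four segments, so the average segment length is 1.5 sentences and the sketch falls back to sentence-by-sentence classification, in line with finding (2.2).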
Original language | English |
---|---|
Title of host publication | Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 |
Editors | Kate Larson |
Place of Publication | Marina del Rey CA USA |
Publisher | International Joint Conferences on Artificial Intelligence |
Pages | 7545-7553 |
Number of pages | 9 |
ISBN (Electronic) | 9781956792041 |
Publication status | Published - 2024 |
Event | International Joint Conference on Artificial Intelligence, IJCAI 2024; Jeju, Korea, South; Duration: 3 Aug 2024 → 9 Aug 2024; Conference number: 33rd; https://www.ijcai.org/Proceedings/2024/ (Proceedings); https://ijcai24.org/ (Website) |
Publication series
Name | IJCAI International Joint Conference on Artificial Intelligence |
---|---|
Publisher | Association for the Advancement of Artificial Intelligence (AAAI) |
ISSN (Print) | 1045-0823 |
Conference
Conference | International Joint Conference on Artificial Intelligence, IJCAI 2024 |
---|---|
Abbreviated title | IJCAI 2024 |
Country/Territory | Korea, South |
City | Jeju |
Period | 3 Aug 2024 → 9 Aug 2024 |
Keywords
- Natural Language Processing
- General
- Humans and AI