Self supervised BERT for legal text classification

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper (Research, peer-review)

5 Citations (Scopus)

Abstract

Critical BERT-based text classification tasks, such as legal text classification, require huge amounts of accurately labeled data. Legal text classification faces two non-trivial problems. First, labeling legal data is a sensitive process that can only be carried out by skilled professionals. Second, legal text is prone to privacy issues, so not all of the data can be made available in the public domain. As a result, the available textual data has limited diversity. To account for this data paucity, we propose a self-supervision approach to train Legal-BERT classifiers. We use the BERT text classifier's knowledge of the class boundaries and perform gradient ascent with respect to the class logits, generating synthetic latent texts through activation maximization. The main advantages of our model over existing state-of-the-art (SOTA) methods are that it is easy to train; it does not require much data, instead using the synthesized data as fake samples; and it has lower variance, which helps it generate texts with good sample quality and diversity. We show the efficacy of the proposed method on the ECHR Violation (Multi-Label) Dataset and the Over-ruling Task Dataset.
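The idea of synthesizing inputs by gradient ascent on a class logit (activation maximization) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical toy classifier (a one-hidden-layer tanh network in plain Python) standing in for a fine-tuned Legal-BERT model, and it treats the input `x` as a continuous latent embedding rather than token IDs. The helper names `make_toy_classifier`, `forward`, `logit_grad`, and `activation_maximization`, and the hyperparameters `lr` and `l2`, are illustrative assumptions. The loop maximizes the target logit minus an L2 penalty on `x`, starting from the zero vector.

```python
import math
import random

# Hypothetical toy classifier standing in for a fine-tuned BERT head:
#   logits = W2 . tanh(W1 . x + b1) + b2
# x plays the role of a latent input embedding. Sizes and weights are illustrative.
def make_toy_classifier(dim=4, hidden=8, classes=3, seed=0):
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(classes)]
    b2 = [rng.uniform(-1, 1) for _ in range(classes)]
    return w1, b1, w2, b2


def forward(params, x):
    """Return (hidden activations, class logits) for input x."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
         for row, bi in zip(w1, b1)]
    logits = [sum(wck * hk for wck, hk in zip(row, h)) + bc
              for row, bc in zip(w2, b2)]
    return h, logits


def logit_grad(params, x, target):
    """Gradient of logits[target] with respect to the input x."""
    w1, _, w2, _ = params
    h, _ = forward(params, x)
    # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    delta = [w2[target][k] * (1.0 - h[k] ** 2) for k in range(len(h))]
    return [sum(delta[k] * w1[k][j] for k in range(len(h))) for j in range(len(x))]


def activation_maximization(params, target, steps=300, lr=0.02, l2=0.01):
    """Gradient ascent on logits[target] - l2 * ||x||^2, starting from x = 0.

    Returns the synthesized latent input and the objective value recorded
    before each step.
    """
    dim = len(params[0][0])
    x = [0.0] * dim
    history = []
    for _ in range(steps):
        _, logits = forward(params, x)
        history.append(logits[target] - l2 * sum(v * v for v in x))
        g = logit_grad(params, x, target)
        # Ascent step along the gradient of the regularized objective
        x = [xj + lr * (gj - 2.0 * l2 * xj) for xj, gj in zip(x, g)]
    return x, history


if __name__ == "__main__":
    params = make_toy_classifier()
    x_syn, history = activation_maximization(params, target=1)
    _, start_logits = forward(params, [0.0] * 4)
    _, end_logits = forward(params, x_syn)
    print("target logit before:", start_logits[1], "after:", end_logits[1])
```

The L2 penalty keeps the synthesized latent bounded, which is a common choice in activation maximization; in a real BERT setting the same loop would typically run on input embeddings with automatic differentiation, and how the synthetic latents are turned into training samples is beyond this sketch.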

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Editors: Aggelos Pikrakis, Thomas Feuillen
Place of publication: Piscataway, NJ, USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 12186-12190
Number of pages: 5
ISBN (Electronic): 9781728163277
ISBN (Print): 9781728163284
Publication status: Published - 2023
Event: IEEE International Conference on Acoustics, Speech and Signal Processing 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023
https://ieeexplore.ieee.org/xpl/conhome/10094559/proceeding (Proceedings)
https://2023.ieeeicassp.org/ (Website)

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: IEEE International Conference on Acoustics, Speech and Signal Processing 2023
Abbreviated title: ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23

Keywords

  • BERT
  • Legal Text
  • Self-supervision
  • Text classification
