Genocide Transcript Corpus (GTC): Topic-Based Paragraph Classification in Genocide-Related Court Transcripts

Introduced by Schirmer et al. in A New Dataset for Topic-Based Paragraph Classification in Genocide-Related Court Transcripts

The Topic-Based Paragraph Classification in Genocide-Related Court Transcripts (GTC) dataset is the first reference corpus annotated with samples from genocide tribunals in different international criminal courts. It is made up of witness statements about violence experienced. The material consists of 1475 text passages with about 40 to 120 pages per transcript, covering 3 tribunals: the Extraordinary Chambers in the Courts of Cambodia (ECCC) - 438 pages, the International Criminal Tribunal for Rwanda (ICTR) - 566 pages, and the International Criminal Tribunal of the Former Yugoslavia (ICTY) - 416 pages. As no datasets of any kind containing genocide court transcripts have been published nor other forms of pre-structured or annotated text data in this field of research exist, the aim was to address this gap by providing a systematically annotated dataset.

Potential use cases include genocide-related inquiry conducted by those who need better to access, explore, and search through extensive documentation on these topics including researchers, lawyers and other practitioners. Broadly, its stated aim is to serve 3 purposes:

(1) to provide a first reference corpus for the community
(2) to establish benchmark performances (using state-of-the-art transformer-based approaches) for the new classification task of paragraph identification of violence-related witness statements
(3) to explore first steps towards transfer learning within the domain

Homepage

Benchmarks

Add a new result Link an existing benchmark

No benchmarks yet. Start a new benchmark or link an existing one.

Papers

Paper	Code	Results	Date	Stars

Genocide Transcript Corpus (GTC): Topic-Based Paragraph Classification in Genocide-Related Court Transcripts

Benchmarks

Add a new result Link an existing benchmark

Papers

Dataset Loaders

Add Remove

Tasks

Usage

License

Modalities

Languages

Genocide Transcript Corpus (GTC): Topic-Based Paragraph Classification in Genocide-Related Court Transcripts

Benchmarks Edit Add a new result Link an existing benchmark

Papers

Dataset Loaders Edit Add Remove

Tasks Edit

Usage

License Edit

Modalities Edit

Languages Edit

Benchmarks

Add a new result Link an existing benchmark

Dataset Loaders

Add Remove

Tasks

License

Modalities

Languages