Search Results for author: Ilias Chalkidis

Found 40 papers, 19 papers with code

MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

1 code implementation • EMNLP 2021 • Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos

We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).

Document Classification Topic Classification +1

Llama meets EU: Investigating the European Political Spectrum through the Lens of LLMs

1 code implementation • 20 Mar 2024 • Ilias Chalkidis, Stephanie Brandl

Instruction-finetuned Large Language Models inherit clear political leanings that have been shown to influence downstream task performance.

On the Interplay between Fairness and Explainability

no code implementations • 25 Oct 2023 • Stephanie Brandl, Emanuele Bugliarello, Ilias Chalkidis

In order to build reliable and trustworthy NLP applications, models need to be both fair across different demographics and explainable.

Fairness Multi Class Text Classification +2

Rather a Nurse than a Physician -- Contrastive Explanations under Investigation

no code implementations • 18 Oct 2023 • Oliver Eberle, Ilias Chalkidis, Laura Cabello, Stephanie Brandl

A cross-comparison between model-based rationales and human annotations, both in contrastive and non-contrastive settings, yields a high agreement between the two settings for models as well as for humans.

text-classification Text Classification

Regulation and NLP (RegNLP): Taming Large Language Models

no code implementations • 9 Oct 2023 • Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, Sofia Ranchordas, Gerasimos Spanakis

Regulation studies are a rich source of knowledge on how to systematically deal with risk and uncertainty, as well as with scientific evidence, to evaluate and compare regulatory options.

Ethics

SCALE: Scaling up the Complexity for Advanced Language Model Evaluation

2 code implementations • 15 Jun 2023 • Vishvaksenan Rasiah, Ronja Stern, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho, Joel Niklaus

In this paper, we introduce a novel NLP benchmark that poses challenges to current LLMs across four key dimensions: processing long documents (up to 50K tokens), utilizing domain specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), and multitasking (comprising legal document to document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks).

Information Retrieval Language Modelling +2

PokemonChat: Auditing ChatGPT for Pokémon Universe Knowledge

no code implementations • 5 Jun 2023 • Laura Cabello, Jiaang Li, Ilias Chalkidis

We then evaluate its ability to acquire new knowledge and include it in its reasoning process.

Information Retrieval Question Answering +1

MultiLegalPile: A 689GB Multilingual Legal Corpus

no code implementations • 3 Jun 2023 • Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho

Large, high-quality datasets are crucial for training Large Language Models (LLMs).

Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning

no code implementations • 25 May 2023 • Daniel Saggau, Mina Rezaei, Bernd Bischl, Ilias Chalkidis

Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines.

Contrastive Learning Information Retrieval +5

Retrieval-augmented Multi-label Text Classification

no code implementations • 22 May 2023 • Ilias Chalkidis, Yova Kementchedjhieva

Multi-label text classification (MLC) is a challenging task in settings of large label sets, where label support follows a Zipfian distribution.

Multi Label Text Classification Multi-Label Text Classification +2

LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

1 code implementation • 12 May 2023 • Ilias Chalkidis, Nicolas Garneau, Catalina Goanta, Daniel Martin Katz, Anders Søgaard

To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs.

Knowledge Probing Language Modelling

An Exploration of Encoder-Decoder Approaches to Multi-Label Classification for Legal and Biomedical Text

1 code implementation • 9 May 2023 • Yova Kementchedjhieva, Ilias Chalkidis

Standard methods for multi-label text classification largely rely on encoder-only pre-trained language models, whereas encoder-decoder models have proven more effective in other classification tasks.

Multi-Label Classification Multi Label Text Classification +2

ChatGPT may Pass the Bar Exam soon, but has a Long Way to Go for the LexGLUE benchmark

1 code implementation • 9 Mar 2023 • Ilias Chalkidis

Following the hype around OpenAI's ChatGPT conversational agent, the last straw in the recent development of Large Language Models (LLMs) that demonstrate emergent unprecedented zero-shot capabilities, we audit the latest OpenAI's GPT-3.5 model, `gpt-3.5-turbo', the first available ChatGPT model, in the LexGLUE benchmark in a zero-shot fashion providing examples in a templated instruction-following format.

Instruction Following

LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain

1 code implementation • 30 Jan 2023 • Joel Niklaus, Veton Matoshi, Pooja Rani, Andrea Galassi, Matthias Stürmer, Ilias Chalkidis

To provide a fair comparison, we propose two aggregate scores, one based on the datasets and one on the languages.

XLM-R

Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer

no code implementations • 2 Nov 2022 • Dimitris Mamakas, Petros Tsotsi, Ion Androutsopoulos, Ilias Chalkidis

Even sparse-attention models, such as Longformer and BigBird, which increase the maximum input length to 4,096 sub-words, severely truncate texts in three of the six datasets of LexGLUE.

Document Classification

Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models

no code implementations • 24 Oct 2022 • Stelios Maroudas, Sotiris Legkas, Prodromos Malakasiotis, Ilias Chalkidis

In the era of billion-parameter-sized Language Models (LMs), start-ups have to follow trends and adapt their technology accordingly.

Knowledge Distillation Model Compression +2

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification

no code implementations • 11 Oct 2022 • Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott

Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents.

Document Classification

An Empirical Study on Cross-X Transfer for Legal Judgment Prediction

2 code implementations • 25 Sep 2022 • Joel Niklaus, Matthias Stürmer, Ilias Chalkidis

We find that in both settings (legal areas, origin regions), models trained across all groups perform overall better, while they also have improved results in the worst-case scenarios.

Cross-Lingual Transfer Transfer Learning

Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification

no code implementations • 8 Jun 2022 • Stratos Xenouleas, Alexia Tsoukara, Giannis Panagiotakis, Ilias Chalkidis, Ion Androutsopoulos

We consider zero-shot cross-lingual transfer in legal topic classification using the recent MultiEURLEX dataset.

Topic Classification Translation +1

Revisiting Transformer-based Models for Long Document Classification

1 code implementation • 14 Apr 2022 • Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott

The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs).

Document Classification text-classification

Challenges and Strategies in Cross-Cultural NLP

no code implementations • ACL 2022 • Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard

Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages.

Cultural Vocal Bursts Intensity Prediction Multilingual NLP

Improved Multi-label Classification under Temporal Concept Drift: Rethinking Group-Robust Algorithms in a Label-Wise Setting

1 code implementation • Findings (ACL) 2022 • Ilias Chalkidis, Anders Søgaard

In document classification for, e.g., legal and biomedical text, we often deal with hundreds of classes, including very infrequent ones, as well as temporal concept drift caused by the influence of real world events, e.g., policy changes, conflicts, or pandemics.

Document Classification Multi-Label Classification

FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing

1 code implementation • ACL 2022 • Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Felix Schwemer, Anders Søgaard

We present a benchmark suite of four datasets for evaluating the fairness of pre-trained language models and the techniques used to fine-tune them for downstream tasks.

Fairness

FiNER: Financial Numeric Entity Recognition for XBRL Tagging

1 code implementation • ACL 2022 • Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eirini Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, Georgios Paliouras

We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags.

TAG

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

1 code implementation • ACL 2022 • Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras

Laws and their interpretations, legal arguments and agreements are typically expressed in writing, leading to the production of vast corpora of legal text.

Ranked #1 on Natural Language Understanding on LexGLUE

Multi-class Classification Multi-Label Classification +3

Swiss-Judgment-Prediction: A Multilingual Legal Judgment Prediction Benchmark

1 code implementation • EMNLP (NLLP) 2021 • Joel Niklaus, Ilias Chalkidis, Matthias Stürmer

We evaluate state-of-the-art BERT-based methods including two variants of BERT that overcome the BERT input (text) length limitation (up to 512 tokens).

Multi-granular Legal Topic Classification on Greek Legislation

1 code implementation • EMNLP (NLLP) 2021 • Christos Papaloukas, Ilias Chalkidis, Konstantinos Athinaios, Despina-Athanasia Pantazi, Manolis Koubarakis

In this work, we study the task of classifying legal texts written in the Greek language.

text-classification Text Classification +3

MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

1 code implementation • 2 Sep 2021 • Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos

We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).

Document Classification Topic Classification +1

Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases

no code implementations • NAACL 2021 • Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis

We also release a new dataset comprising European Court of Human Rights cases, including annotations for paragraph-level rationales.

Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations

no code implementations • EACL 2021 • Ilias Chalkidis, Manos Fergadiotis, Nikolaos Manginas, Eva Katakalou, Prodromos Malakasiotis

Major scandals in corporate history have urged the need for regulatory compliance, where organizations need to ensure that their controls (processes) comply with relevant laws, regulations, and policies.

domain classification Information Retrieval +2

Neural Contract Element Extraction Revisited: Letters from Sesame Street

no code implementations • 12 Jan 2021 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos

Morpho-syntactic features in the form of POS tag and token shape embeddings, as well as context-aware ELMO embeddings do not improve performance.

POS TAG

Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations

no code implementations • EMNLP (spnlp) 2020 • Nikolaos Manginas, Ilias Chalkidis, Prodromos Malakasiotis

Although BERT is widely used by the NLP community, little is known about its inner workings.

General Classification Multilabel Text Classification +2

LEGAL-BERT: The Muppets straight out of Law School

no code implementations • Findings of the Association for Computational Linguistics 2020 • Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains.

An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels

1 code implementation • EMNLP 2020 • Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

Furthermore, we show that Transformer-based approaches outperform the state-of-the-art in two of the datasets, and we propose a new state-of-the-art method which combines BERT with LWANs.

Multi-Label Classification Multi Label Text Classification +5