Search Results for author: Fajri Koto

Found 30 papers, 18 papers with code

Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian

1 code implementation CSRR (ACL) 2022 Fajri Koto, Timothy Baldwin, Jey Han Lau

Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.

Cloze Test Sentence +1

Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature

no code implementations ALTA 2021 Fajri Koto, Biaoyan Fang

In this paper, we investigate the utility of modern pretrained language models for the evidence grading system in the medical literature based on the ALTA 2021 shared task.

Easy-First Bottom-Up Discourse Parsing via Sequence Labelling

no code implementations COLING (CODI, CRAC) 2022 Andrew Shen, Fajri Koto, Jey Han Lau, Timothy Baldwin

We propose a novel unconstrained bottom-up approach for rhetorical discourse parsing based on sequence labelling of adjacent pairs of discourse units (DUs), based on the framework of Koto et al. (2021).

Discourse Parsing

IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces

no code implementations2 Apr 2024 Fajri Koto, Rahmad Mahendra, Nurul Aisyah, Timothy Baldwin

Although commonsense reasoning is greatly shaped by cultural and geographical factors, previous studies on language models have predominantly centered on English cultures, potentially resulting in an Anglocentric bias.

Language Modelling

Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon

no code implementations3 Feb 2024 Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, Timothy Baldwin

Improving multilingual language models capabilities in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages.

Sentence Sentiment Analysis

Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU

1 code implementation7 Oct 2023 Fajri Koto, Nurul Aisyah, Haonan Li, Timothy Baldwin

In this work, we introduce IndoMMLU, the first multi-task language understanding benchmark for Indonesian culture and languages, which consists of questions from primary school to university entrance exams in Indonesia.

Multi-task Language Understanding World Knowledge

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings

1 code implementation15 Sep 2023 Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, Iryna Gurevych

Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in a situational context, human expectations vary depending on the relevant cultural common ground.

Question Answering

CMMLU: Measuring massive multitask language understanding in Chinese

1 code implementation15 Jun 2023 Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin

As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging.

Large Language Model

Bactrian-X: Multilingual Replicable Instruction-Following Models with Low-Rank Adaptation

1 code implementation24 May 2023 Haonan Li, Fajri Koto, Minghao Wu, Alham Fikri Aji, Timothy Baldwin

However, research on multilingual instruction tuning has been limited due to the scarcity of high-quality instruction-response datasets across different languages.

Instruction Following

IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization

1 code implementation EMNLP 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.

Language Modelling

Evaluating the Efficacy of Summarization Evaluation across Languages

1 code implementation Findings (ACL) 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).

Discourse Probing of Pretrained Language Models

1 code implementation NAACL 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.

Sentence

Top-down Discourse Parsing via Sequence Labelling

1 code implementation EACL 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).

Ranked #7 on Discourse Parsing on RST-DT (Standard Parseval (Span) metric)

Discourse Parsing

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

2 code implementations27 Nov 2020 Fajri Koto, Timothy Baldwin, Jey Han Lau

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).

Question Answering Semantic Textual Similarity +2

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP

no code implementations COLING 2020 Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin

Although the Indonesian language is spoken by almost 200 million people and the 10th most spoken language in the world, it is under-represented in NLP research.

Benchmarking Language Modelling

Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation

1 code implementation PACLIC 2020 Fajri Koto, Ikhwan Koto

Although some linguists (Rusmali et al., 1985; Crouch, 2009) have fairly attempted to define the morphology and syntax of Minangkabau, information processing in this language is still absent due to the scarcity of the annotated resource.

Machine Translation Sentiment Analysis +2

Improved Document Modelling with a Neural Discourse Parser

1 code implementation ALTA 2019 Fajri Koto, Jey Han Lau, Timothy Baldwin

We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.

Abstractive Text Summarization Text Generation

A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization

no code implementations LREC 2016 Fajri Koto

In this paper we report our effort to construct the first ever Indonesian corpora for chat summarization.

Cannot find the paper you are looking for? You can Submit a new open access paper.