Search Results for author: Holy Lovenia

Found 27 papers, 14 papers with code

Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

no code implementations • RepL4NLP (ACL) 2022 • Holy Lovenia, Bryan Wilie, Willy Chung, Zeng Min, Samuel Cahyawijaya, Dan Su, Pascale Fung

Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides a performance lift by adapting unlabelled data to the downstream task.

Data Augmentation Machine Reading Comprehension +1

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

1 code implementation • LREC 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Yiu, Rita Frieske, Holy Lovenia, Genta Winata, Qifeng Chen, Xiaojuan Ma, Bertram Shi, Pascale Fung

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

Audio-Visual Speech Recognition speech-recognition +1

Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages

no code implementations • 9 Apr 2024 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Rifki Afina Putri, Emmanuel Dave, Jhonson Lee, Nuur Shadieq, Wawan Cenggoro, Salsabil Maulana Akbar, Muhammad Ihza Mahendra, Dea Annisayanti Putri, Bryan Wilie, Genta Indra Winata, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung

To bridge this quality gap, we introduce Cendol, a collection of Indonesian LLMs encompassing both decoder-only and encoder-decoder architectures across a range of model sizes.

Decoder

LLMs Are Few-Shot In-Context Low-Resource Language Learners

no code implementations • 25 Mar 2024 • Samuel Cahyawijaya, Holy Lovenia, Pascale Fung

In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering a crucial avenue for narrowing the gap between high-resource and low-resource languages.

In-Context Learning

Contrastive Learning for Inference in Dialogue

1 code implementation • 19 Oct 2023 • Etsuko Ishii, Yan Xu, Bryan Wilie, Ziwei Ji, Holy Lovenia, Willy Chung, Pascale Fung

Inferences, especially those derived from inductive processes, are a crucial component in our conversation to complement the information implicitly or explicitly conveyed by a speaker.

Contrastive Learning

InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems

1 code implementation • 13 Oct 2023 • Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Pascale Fung

We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning.

Dialogue State Tracking Informativeness +4

Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

no code implementations • 9 Oct 2023 • Holy Lovenia, Wenliang Dai, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung

Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects.

Hallucination Object +2

Survey of Social Bias in Vision-Language Models

no code implementations • 24 Sep 2023 • Nayeon Lee, Yejin Bang, Holy Lovenia, Samuel Cahyawijaya, Wenliang Dai, Pascale Fung

This survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL.

Fairness

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

1 code implementation • 19 Sep 2023 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung

We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.

Document Translation Translation

PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems

1 code implementation • 19 Sep 2023 • Bryan Wilie, Yan Xu, Willy Chung, Samuel Cahyawijaya, Holy Lovenia, Pascale Fung

Grounding dialogue response generation on external knowledge is proposed to produce informative and engaging responses.

Hallucination Language Modelling +1

Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

1 code implementation • 26 Jun 2023 • Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung

In this work, we analyze the transferability of emotion recognition across three languages (English, Mandarin Chinese, and Cantonese) and two age groups (adults and the elderly).

Data Augmentation Speech Emotion Recognition

InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning

1 code implementation • 23 May 2023 • Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, Pascale Fung

Our results demonstrate the effectiveness of InstructAlign in enabling the model to understand low-resource languages with limited parallel data while preventing catastrophic forgetting.

Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages

no code implementations • 23 Mar 2023 • Zheng-Xin Yong, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Arjun Subramonian, Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Fikri Aji

While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality and low-cost code-mixed data remains a challenge for natural language processing (NLP) research.

Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue

1 code implementation • 28 Feb 2023 • Holy Lovenia, Samuel Cahyawijaya, Pascale Fung

The demand for multimodal dialogue systems has been rising in various domains, emphasizing the importance of interpreting multimodal inputs from conversational and situational contexts.

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

1 code implementation • 8 Feb 2023 • Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung

ChatGPT is, for example, better at deductive than inductive reasoning.

Code Generation Hallucination +4

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti

We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

1 code implementation • 25 Oct 2022 • Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Huan Zhong, MingQian Zhong, Yuk-Yu Nancy Ip, Pascale Fung

Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA.

Language Modelling

What Did I Just Hear? Detecting Pornographic Sounds in Adult Videos Using Neural Networks

no code implementations • 8 Sep 2022 • Holy Lovenia, Dessi Puji Lestari, Rita Frieske

Audio-based pornographic detection enables efficient adult content filtering without sacrificing performance by exploiting distinct spectral characteristics.

Every picture tells a story: Image-grounded controllable stylistic story generation

no code implementations • LaTeCHCLfL (COLING) 2022 • Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung

Generating a short story out of an image is arduous.

Image Captioning Story Generation

NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages

no code implementations • 21 Jul 2022 • Samuel Cahyawijaya, Alham Fikri Aji, Holy Lovenia, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Fajri Koto, David Moeljadi, Karissa Vincentio, Ade Romadhony, Ayu Purwarianti

At the center of the underlying issues that halt Indonesian natural language processing (NLP) research advancement, we find data scarcity.

Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition

no code implementations • 1 Jun 2022 • Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura

Research about brain activities involving spoken word production is considerably underdeveloped because of the undiscovered characteristics of speech artifacts, which contaminate electroencephalogram (EEG) signals and prevent the inspection of the underlying cognitive processes.

blind source separation EEG +1

Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

no code implementations • 30 Mar 2022 • Holy Lovenia, Bryan Wilie, Willy Chung, Min Zeng, Samuel Cahyawijaya, Su Dan, Pascale Fung

Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides a performance lift by adapting unlabelled data to the downstream task.

Data Augmentation Machine Reading Comprehension +1

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

1 code implementation • 11 Jan 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

Audio-Visual Speech Recognition speech-recognition +1

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

1 code implementation • LREC 2022 • Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

2 code implementations • LREC 2022 • Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong.

Greenformer: Factorization Toolkit for Efficient Deep Neural Networks

no code implementations • 14 Sep 2021 • Samuel Cahyawijaya, Genta Indra Winata, Holy Lovenia, Bryan Wilie, Wenliang Dai, Etsuko Ishii, Pascale Fung

While the recent advances in deep neural networks (DNN) bring remarkable success, the computational cost also increases considerably.

Nora: The Well-Being Coach

no code implementations • 1 Jun 2021 • Genta Indra Winata, Holy Lovenia, Etsuko Ishii, Farhad Bin Siddique, Yongsheng Yang, Pascale Fung

The current pandemic has forced people globally to remain in isolation and practice social distancing, which creates the need for a system to combat the resulting loneliness and negative emotions.

Natural Language Understanding
