no code implementations • MTSummit 2021 • Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
In quality estimation (QE), the quality of translation can be predicted by referencing the source sentence and the machine translation (MT) output without access to the reference sentence.
no code implementations • Findings (NAACL) 2022 • Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
However, Korean pretrained language models still struggle to generate a short sentence with a given condition based on compositionality and commonsense reasoning (i.e., generative commonsense reasoning).
no code implementations • ACL (WAT) 2021 • Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim
Automatic speech recognition (ASR) is arguably the most critical component of such systems, as errors in speech recognition propagate to the downstream components and drastically degrade the user experience.
Automatic Speech Recognition (ASR) +2
no code implementations • CCGPK (COLING) 2022 • SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim
As a result of the experiments, we show that the FoCus model could not correctly blend knowledge according to the input dialogue and that the dataset design is unsuitable for multi-turn conversation.
no code implementations • LREC 2022 • Chanjun Park, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim
In recent years, there has been an increasing need for the restoration and translation of historical languages.
no code implementations • LREC 2022 • Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jungseob Lee, Sugyeong Eo, Heuiseok Lim
This study has several limitations, considering the data acquisition, because there is no official dataset for most language pairs.
no code implementations • LREC 2022 • Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim
We propose a deep learning-based foreign language learning platform, named FreeTalky, for people who experience anxiety when dealing with foreign languages, employing a humanoid robot NAO and various deep learning models.
no code implementations • EMNLP (insights) 2021 • Chanjun Park, Sungjin Park, Seolhwa Lee, Taesun Whang, Heuiseok Lim
In the field of natural language processing, ensembles are broadly known to be effective in improving performance.
no code implementations • 25 Apr 2024 • Hyeonseok Moon, SeungYoon Lee, Seongtae Hong, Seungjun Lee, Chanjun Park, Heuiseok Lim
In our MT pipeline, all the components of a data point are concatenated to form a single translation sequence and subsequently reconstructed into the original data components after translation.
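The concatenate-translate-reconstruct step can be sketched minimally as below; the separator token and component handling here are illustrative assumptions, not the paper's actual implementation:

```python
SEP = " [SEP] "  # hypothetical separator token; the paper's actual delimiter is not specified


def to_translation_sequence(components):
    """Concatenate all components of a data point into one sequence for MT."""
    return SEP.join(components)


def from_translation_sequence(translated):
    """Reconstruct the individual data components after translation,
    assuming the MT system preserved the separator token."""
    return [part.strip() for part in translated.split(SEP.strip())]


# Example: a QA-style data point with two components
seq = to_translation_sequence(["Wo ist der Bahnhof?", "Der Bahnhof ist dort."])
parts = from_translation_sequence(seq)
```

This round-trips only if the MT system leaves the separator token untouched, which is why such pipelines typically choose a marker the translator will not alter.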
no code implementations • 5 Apr 2024 • Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park
We focus on integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability helps amplify problem-solving ability.
1 code implementation • 1 Apr 2024 • Jihoo Kim, Wonho Song, Dahyun Kim, Yunsu Kim, Yungi Kim, Chanjun Park
This paper introduces Evalverse, a novel library that streamlines the evaluation of Large Language Models (LLMs) by unifying disparate evaluation tools into a single, user-friendly framework.
no code implementations • 28 Mar 2024 • Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park
As the development of large language models (LLMs) progresses, aligning them with human preferences has become increasingly important.
1 code implementation • 28 Mar 2024 • Hyunbyung Park, Sukyung Lee, Gyoungjin Gim, Yungi Kim, Dahyun Kim, Chanjun Park
To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core.
no code implementations • 4 Mar 2024 • Chanjun Park, Minsoo Khang, Dahyun Kim
This paper delves into the contrasting roles of data within academic and industrial spheres, highlighting the divergence between Data-Centric AI and Model-Agnostic AI approaches.
no code implementations • 26 Jan 2024 • Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
To effectively address this, it is imperative to consider both the speech-level, crucial for recognition accuracy, and the text-level, critical for user-friendliness.
Automatic Speech Recognition (ASR) +1
no code implementations • 26 Jan 2024 • SeungYoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, Heuiseok Lim
We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and complement the limitations of counter-narratives.
2 code implementations • 23 Dec 2023 • Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks.
no code implementations • 26 Jun 2023 • Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
In this paper, we introduce a data-driven approach for Formality-Sensitive Machine Translation (FSMT) that caters to the unique linguistic properties of four target languages.
no code implementations • 26 Jun 2023 • Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim
Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding.
no code implementations • 26 Jun 2023 • Chanjun Park, Seonmin Koo, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
The data-centric AI approach aims to enhance performance without modifying the model itself and has been shown to positively impact model performance.
no code implementations • 26 Jun 2023 • Damrin Kim, NamHyeok Kim, Chanjun Park, Harksoo Kim
This paper presents a novel approach that leverages Inter-Annotator Agreement (IAA), traditionally used for assessing labeling consistency, to optimize Data Management Operations (DMOps).
no code implementations • 26 Jun 2023 • NamHyeok Kim, Chanjun Park
Inter-Annotator Agreement (IAA) is commonly used as a measure of label consistency in natural language processing tasks.
no code implementations • 20 Mar 2023 • Chanjun Park, Hyeonseok Moon, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
Leaderboard systems allow researchers to objectively evaluate Natural Language Processing (NLP) models and are typically used to identify models that exhibit superior performance on a given task in a predetermined setting.
no code implementations • 2 Jan 2023 • Eujeong Choi, Chanjun Park
Data-centric AI has shed light on the significance of data within the machine learning (ML) pipeline.
no code implementations • COLING 2022 • Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Gyeongmin Kim, Jungseob Lee, Heuiseok Lim
With the recent advance in neural machine translation demonstrating its importance, research on quality estimation (QE) has been steadily progressing.
no code implementations • 14 Sep 2022 • Suhyune Son, Chanjun Park, Jungseob Lee, Midan Shim, Chanhee Lee, Yoonna Jang, Jaehyung Seo, Heuiseok Lim
This can be attributed to the fact that the amount of available training data in each language follows the power-law distribution, and most of the languages belong to the long tail of the distribution.
no code implementations • 10 Jan 2022 • Jungseob Lee, Midan Shim, Suhyune Son, Chanjun Park, Yujin Kim, Heuiseok Lim
BlenderBot 2.0 is a dialogue model that represents open-domain chatbots by reflecting real-time information and remembering user information for an extended period using an internet search module and multi-session memory.
no code implementations • 8 Dec 2021 • Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim
We propose a deep learning-based foreign language learning platform, named FreeTalky, for people who experience anxiety when dealing with foreign languages, employing a humanoid robot NAO and various deep learning models.
no code implementations • 24 Nov 2021 • Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Seungjun Lee, Heuiseok Lim
Data building for automatic post-editing (APE) requires extensive and expert-level human effort, as it contains an elaborate process that involves identifying errors in sentences and providing suitable revisions.
no code implementations • 1 Nov 2021 • Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
Building of data for quality estimation (QE) training is expensive and requires significant human labor.
no code implementations • 30 Oct 2021 • Jaehyung Seo, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
Generative commonsense reasoning is the capability of a language model to generate a sentence with a given concept-set that is based on commonsense knowledge.
no code implementations • 30 Oct 2021 • Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
This paper proposes a publicly available tool for efficiently constructing high-quality parallel corpora while minimizing human labor.
no code implementations • 28 Oct 2021 • Chanjun Park, Midan Shim, Sugyeong Eo, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT.
no code implementations • 29 Sep 2021 • Seolhwa Lee, Kisu Yang, Chanjun Park, João Sedoc, Heuiseok Lim
To the best of our knowledge, our approach is the first method to apply multi-task learning to the dialogue summarization task.
no code implementations • 27 Sep 2021 • Chanjun Park, Yoonna Jang, Seolhwa Lee, Jaehyung Seo, Kisu Yang, Heuiseok Lim
In this study, we propose PicTalky, an AI-based Augmentative and Alternative Communication (AAC) system that helps children with language developmental disabilities improve their communication skills and language comprehension abilities.
no code implementations • NAACL 2021 • Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
We derive an optimal subword tokenization result for Korean-English machine translation by conducting a case study that combines the subword tokenization method, morphological segmentation, and vocabulary method.