Search Results for author: Chanjun Park

Found 36 papers, 3 papers with code

Dealing with the Paradox of Quality Estimation

no code implementations MTSummit 2021 Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

In quality estimation (QE), the quality of translation can be predicted by referencing the source sentence and the machine translation (MT) output without access to the reference sentence.

Machine Translation Sentence +1
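
As a loose illustration of the reference-free setup described in this QE entry, the sketch below scores an MT output directly against its source sentence using cross-lingual sentence embeddings (LaBSE via the sentence-transformers library). This is a naive similarity proxy chosen only for illustration; it is not the estimator proposed in the paper.

```python
# Naive reference-free QE proxy: cross-lingual embedding similarity between
# the source sentence and the MT output (illustrative only).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

def naive_qe_score(source: str, mt_output: str) -> float:
    """Return a crude quality score for an MT output, without any reference."""
    src_emb, mt_emb = model.encode([source, mt_output], convert_to_tensor=True)
    return util.cos_sim(src_emb, mt_emb).item()

print(naive_qe_score("나는 학교에 간다.", "I am going to school."))
```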

A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation

no code implementations Findings (NAACL) 2022 Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim

However, Korean pretrained language models still struggle to generate a short sentence with a given condition based on compositionality and commonsense reasoning (i.e., generative commonsense reasoning).

Language Modelling Natural Language Understanding +2

BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text

no code implementations ACL (WAT) 2021 Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim

Automatic speech recognition (ASR) is arguably the most critical component of such systems, as errors in speech recognition propagate to the downstream components and drastically degrade the user experience.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?

no code implementations CCGPK (COLING) 2022 SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim

Our experiments show that the FoCus model cannot correctly blend knowledge according to the input dialogue and that the dataset design is unsuitable for multi-turn conversation.

Dialogue Generation Question Answering

FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue

no code implementations LREC 2022 Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim

We propose FreeTalky, a deep learning-based foreign language learning platform for people who experience anxiety when dealing with foreign languages, which employs the humanoid robot NAO and various deep learning models.

Translation of Multifaceted Data without Re-Training of Machine Translation Systems

no code implementations 25 Apr 2024 Hyeonseok Moon, SeungYoon Lee, Seongtae Hong, Seungjun Lee, Chanjun Park, Heuiseok Lim

In our MT pipeline, all the components in a data point are concatenated to form a single translation sequence and are subsequently reconstructed into their original components after translation.

Machine Translation Question Generation +3
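
The concatenate-translate-reconstruct idea described in this entry could look roughly like the sketch below. The [FSEP] separator and the identity translate() placeholder are assumptions made for illustration; they are not the authors' actual implementation.

```python
# Minimal sketch: concatenate the components of a data point, translate the
# resulting sequence once, then split it back into components.
SEP = " [FSEP] "  # hypothetical separator assumed to survive translation intact

def translate(text: str) -> str:
    """Placeholder for any off-the-shelf MT system (identity so the sketch runs)."""
    return text

def translate_data_point(fields: dict) -> dict:
    keys = list(fields)
    sequence = SEP.join(fields[k] for k in keys)                 # 1) concatenate
    translated = translate(sequence)                             # 2) translate once
    parts = [p.strip() for p in translated.split(SEP.strip())]   # 3) reconstruct
    return dict(zip(keys, parts))

point = {"context": "고양이가 소파 위에서 잔다.", "question": "고양이는 어디에 있나요?"}
print(translate_data_point(point))
```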

SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models

no code implementations 5 Apr 2024 Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park

We focus on integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability helps amplify problem-solving ability.

Mathematical Reasoning
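
Reading only the snippet above, one hypothetical way to mix CoT (natural-language rationale) and PoT (executable-code rationale) supervision, with mathematical-reasoning examples ordered first, is sketched below. The field names, the ordering heuristic, and the toy example are assumptions, not the SAAS training recipe.

```python
# Toy CoT/PoT training examples for the same grade-school math problem.
cot_example = {
    "question": "Natalia sold 48 clips in April and half as many in May. How many in total?",
    "target": "She sold 48 / 2 = 24 clips in May, so 48 + 24 = 72 in total. The answer is 72.",
}

pot_example = {
    "question": "Natalia sold 48 clips in April and half as many in May. How many in total?",
    "target": "april = 48\nmay = april // 2\nprint(april + may)  # running this prints 72",
}

def build_training_set(cot_data, pot_data):
    # Put mathematical-reasoning (CoT) examples before problem-solving (PoT) ones.
    return list(cot_data) + list(pot_data)

train_set = build_training_set([cot_example], [pot_example])
print(len(train_set), "examples")
```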

Evalverse: Unified and Accessible Library for Large Language Model Evaluation

1 code implementation 1 Apr 2024 Jihoo Kim, Wonho Song, Dahyun Kim, Yunsu Kim, Yungi Kim, Chanjun Park

This paper introduces Evalverse, a novel library that streamlines the evaluation of Large Language Models (LLMs) by unifying disparate evaluation tools into a single, user-friendly framework.

Language Modelling Large Language Model

sDPO: Don't Use Your Data All at Once

no code implementations 28 Mar 2024 Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park

As the development of large language models (LLMs) progresses, aligning them with human preferences has become increasingly important.

Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models

1 code implementation 28 Mar 2024 Hyunbyung Park, Sukyung Lee, Gyoungjin Gim, Yungi Kim, Dahyun Kim, Chanjun Park

To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core.

Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism

no code implementations 4 Mar 2024 Chanjun Park, Minsoo Khang, Dahyun Kim

This paper delves into the contrasting roles of data within academic and industrial spheres, highlighting the divergence between Data-Centric AI and Model-Agnostic AI approaches.

Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse

no code implementations 26 Jan 2024 SeungYoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, Heuiseok Lim

We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and complement the limitations of counter-narrative.

Specificity

Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation

no code implementations 26 Jun 2023 Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

In this paper, we introduce a data-driven approach for Formality-Sensitive Machine Translation (FSMT) that caters to the unique linguistic properties of four target languages.

Machine Translation Prompt Engineering +2

Knowledge Graph-Augmented Korean Generative Commonsense Reasoning

no code implementations 26 Jun 2023 Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim

Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding.

Text Generation

Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction

no code implementations 26 Jun 2023 Chanjun Park, Seonmin Koo, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

The data-centric AI approach aims to enhance model performance without modifying the model itself and has been shown to impact performance positively.

Grammatical Error Correction

Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps)

no code implementations 26 Jun 2023 Damrin Kim, NamHyeok Kim, Chanjun Park, Harksoo Kim

This paper presents a novel approach of leveraging Inter-Annotator Agreement (IAA), traditionally used for assessing labeling consistency, to optimize Data Management Operations (DMOps).

Management

Inter-Annotator Agreement in the Wild: Uncovering Its Emerging Roles and Considerations in Real-World Scenarios

no code implementations 26 Jun 2023 NamHyeok Kim, Chanjun Park

Inter-Annotator Agreement (IAA) is commonly used as a measure of label consistency in natural language processing tasks.
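
For readers unfamiliar with IAA, the sketch below computes one standard agreement statistic, Cohen's kappa, for two annotators over the same items using scikit-learn and toy labels. The two entries above do not prescribe this particular metric or library.

```python
# Cohen's kappa: chance-corrected agreement between two annotators, in [-1, 1].
from sklearn.metrics import cohen_kappa_score

annotator_a = ["pos", "neg", "neg", "pos", "neu", "pos"]
annotator_b = ["pos", "neg", "pos", "pos", "neu", "neg"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")
```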

Self-Improving-Leaderboard(SIL): A Call for Real-World Centric Natural Language Processing Leaderboards

no code implementations 20 Mar 2023 Chanjun Park, Hyeonseok Moon, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim

Leaderboard systems allow researchers to objectively evaluate Natural Language Processing (NLP) models and are typically used to identify models that exhibit superior performance on a given task in a predetermined setting.

DMOps: Data Management Operation and Recipes

no code implementations 2 Jan 2023 Eujeong Choi, Chanjun Park

Data-centric AI has shed light on the significance of data within the machine learning (ML) pipeline.

Management

Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models

no code implementations 14 Sep 2022 Suhyune Son, Chanjun Park, Jungseob Lee, Midan Shim, Chanhee Lee, Yoonna Jang, Jaehyung Seo, Heuiseok Lim

This can be attributed to the fact that the amount of available training data in each language follows the power-law distribution, and most of the languages belong to the long tail of the distribution.

Cross-Lingual Transfer Transfer Learning

There is no rose without a thorn: Finding weaknesses on BlenderBot 2.0 in terms of Model, Data and User-Centric Approach

no code implementations 10 Jan 2022 Jungseob Lee, Midan Shim, Suhyune Son, Chanjun Park, Yujin Kim, Heuiseok Lim

BlenderBot 2.0 is a dialogue model that represents open-domain chatbots by reflecting real-time information and remembering user information for an extended period using an internet search module and multi-session memory.

FreeTalky: Don't Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue

no code implementations 8 Dec 2021 Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim

We propose FreeTalky, a deep learning-based foreign language learning platform for people who experience anxiety when dealing with foreign languages, which employs the humanoid robot NAO and various deep learning models.

A Self-Supervised Automatic Post-Editing Data Generation Tool

no code implementations 24 Nov 2021 Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Seungjun Lee, Heuiseok Lim

Data building for automatic post-editing (APE) requires extensive and expert-level human effort, as it contains an elaborate process that involves identifying errors in sentences and providing suitable revisions.

Automatic Post-Editing

A New Tool for Efficiently Generating Quality Estimation Datasets

no code implementations 1 Nov 2021 Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim

Building data for quality estimation (QE) training is expensive and requires significant human labor.

Data Augmentation

Automatic Knowledge Augmentation for Generative Commonsense Reasoning

no code implementations 30 Oct 2021 Jaehyung Seo, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

Generative commonsense reasoning is the capability of a language model to generate a sentence from a given concept set, grounded in commonsense knowledge.

Language Modelling Sentence
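
A toy, CommonGen-style illustration of the task defined above is sketched below: prompt a language model with a concept set and ask for one plausible sentence. The model choice (GPT-2) and the prompt format are assumptions for illustration only, not the setup used in the paper.

```python
# Prompt a small generative LM with a concept set (toy generative commonsense reasoning).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

concepts = ["dog", "frisbee", "catch", "park"]
prompt = f"Concepts: {', '.join(concepts)}. Sentence:"

output = generator(prompt, max_new_tokens=25, do_sample=False)
print(output[0]["generated_text"])
```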

How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus

no code implementations 30 Oct 2021 Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim

This paper proposes a tool for efficiently constructing high-quality parallel corpora while minimizing human labor, and makes the tool publicly available.

Machine Translation NMT +1

PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities

no code implementations 27 Sep 2021 Chanjun Park, Yoonna Jang, Seolhwa Lee, Jaehyung Seo, Kisu Yang, Heuiseok Lim

In this study, we propose PicTalky, an AI-based AAC system that helps children with language developmental disabilities improve their communication skills and language comprehension abilities.

Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification

no code implementations NAACL 2021 Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

We derive an optimal subword tokenization result for Korean-English machine translation by conducting a case study that combines the subword tokenization method, morphological segmentation, and vocabulary method.

Machine Translation Translation
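
A rough sketch of combining morphological pre-segmentation with subword tokenization for Korean, in the spirit of the case study above, is given below. The toy particle-splitting rule stands in for a real morphological analyzer (e.g., MeCab-ko), and the multilingual BERT tokenizer stands in for a purpose-built subword model; neither is the paper's exact pipeline.

```python
# Morphological pre-segmentation followed by subword tokenization (illustrative only).
from transformers import AutoTokenizer

def morph_segment(sentence: str) -> str:
    """Toy stand-in for a Korean morphological analyzer: crudely splits off
    two common particles (topic marker 는, locative 에) for illustration."""
    return sentence.replace("나는", "나 는").replace("학교에", "학교 에")

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

sentence = "나는 학교에 간다."
print(tokenizer.tokenize(sentence))                 # subword tokenization alone
print(tokenizer.tokenize(morph_segment(sentence)))  # after morphological pre-segmentation
```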
