Search Results for author: Kyubyong Park

Found 12 papers, 10 papers with code

K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings

1 code implementation • 24 Oct 2023 • Chaewon Park, Soohwan Kim, Kyubyong Park, Kunwoo Park

This resource is the largest offensive language corpus in Korean and is the first to offer target-specific ratings on a three-point Likert scale, enabling the detection of hate expressions in Korean across varying degrees of offensiveness.

Hate Speech Detection

A Technical Report for Polyglot-Ko: Open-Source Large-Scale Korean Language Models

no code implementations • 4 Jun 2023 • Hyunwoong Ko, Kichang Yang, Minho Ryu, Taekyoon Choi, Seungmu Yang, Jiwung Hyun, Sungho Park, Kyubyong Park

This paper presents our work in developing the Polyglot Korean models, which propose some steps towards addressing the non-English language performance gap in multilingual language models.

KoParadigm: A Korean Conjugation Paradigm Generator

1 code implementation • 28 Apr 2020 • Kyubyong Park

Korean is a morphologically rich language.

An Empirical Study of Invariant Risk Minimization

1 code implementation • 10 Apr 2020 • Yo Joong Choe, Jiyeon Ham, Kyubyong Park

Invariant risk minimization (IRM) (Arjovsky et al., 2019) is a recently proposed framework designed for learning predictors that are invariant to spurious correlations across different training environments.

Text Classification

g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset

1 code implementation • 7 Apr 2020 • Kyubyong Park, Seanie Lee

Conversion of Chinese graphemes to phonemes (G2P) is an essential component in Mandarin Chinese Text-To-Speech (TTS) systems.

Polyphone disambiguation

word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs

2 code implementations • LREC 2020 • Yo Joong Choe, Kyubyong Park, Dongwoo Kim

We wrap our dataset and model in an easy-to-use Python library, which supports downloading and retrieving top-k word translations in any of the supported language pairs as well as computing top-k word translations for custom parallel corpora.

Sentence Translation

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

1 code implementation • 27 Mar 2019 • Kyubyong Park, Thomas Mulc

We describe our development of CSS10, a collection of single speaker speech datasets for ten languages.
