1 code implementation • CODI 2021 • Zae Myung Kim, Vassilina Nikoulina, Dongyeop Kang, Didier Schwab, Laurent Besacier
This paper presents an interactive data dashboard that provides users with an overview of the preservation of discourse relations among 28 language pairs.
no code implementations • Findings (EMNLP) 2021 • Hwiyeol Jo, Dongyeop Kang, Andrew Head, Marti A. Hearst
Natural language models often fall short when understanding and generating mathematical notation.
no code implementations • 14 Apr 2024 • Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang
Uncertainty estimation is a significant issue for current large language models (LLMs) that are generally poorly calibrated and over-confident, especially with reinforcement learning from human feedback (RLHF).
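The calibration gap described here is commonly quantified with expected calibration error (ECE). The sketch below is illustrative only: the bin count and the toy predictions are assumptions, not values or methods from the paper.

```python
# Minimal sketch of expected calibration error (ECE): a model is
# over-confident when its average confidence exceeds its accuracy.

def expected_calibration_error(confidences, correct, n_bins=5):
    """Average |accuracy - confidence| over equal-width confidence bins,
    weighted by the fraction of samples falling in each bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece

# An over-confident model: 90% confidence but only 60% accuracy.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9, 0.9],
                                 [1, 1, 1, 0, 0]))
```

A perfectly calibrated model (confidence matching accuracy in every bin) scores zero under this metric.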
no code implementations • 21 Feb 2024 • Karin de Langis, Ryan Koo, Dongyeop Kang
Style is an integral component of text that expresses a diverse set of information, including interpersonal dynamics (e.g., formality) and the author's emotions or attitudes (e.g., disgust).
no code implementations • 19 Feb 2024 • Anna Martin-Boyle, Aahan Tyagi, Marti A. Hearst, Dongyeop Kang
Numerous AI-assisted scholarly applications have been developed to aid different stages of the research process.
no code implementations • 18 Feb 2024 • Shirley Anugrah Hayati, Taehee Jung, Tristan Bodding-Long, Sudipta Kar, Abhinav Sethy, Joo-Kyung Kim, Dongyeop Kang
Fine-tuning large language models (LLMs) with a collection of large and diverse instructions has improved the model's generalization to different tasks, even for unseen tasks.
no code implementations • 16 Feb 2024 • Zae Myung Kim, Kwang Hee Lee, Preston Zhu, Vipul Raheja, Dongyeop Kang
With the advent of large language models (LLMs), the line between human-crafted and machine-generated texts has become increasingly blurred.
no code implementations • 16 Feb 2024 • Jihyung Kil, Farideh Tavazoee, Dongyeop Kang, Joo-Kyung Kim
II-MMR then analyzes this path to identify different reasoning cases in current VQA benchmarks by estimating how many hops and what types (i.e., visual or beyond-visual) of reasoning are required to answer the question.
no code implementations • 6 Feb 2024 • Daechul Ahn, Yura Choi, Youngjae Yu, Dongyeop Kang, Jonghyun Choi
Recent advancements in large language models have influenced the development of video large multimodal models (VLMMs).
1 code implementation • 29 Jan 2024 • Ritik Sachin Parkar, Jaehyung Kim, Jong Inn Park, Dongyeop Kang
However, how to select unlabelled instructions is not well-explored, especially in the context of LLMs.
no code implementations • 26 Jan 2024 • Debarati Das, Karin de Langis, Anna Martin-Boyle, Jaehyung Kim, Minhwa Lee, Zae Myung Kim, Shirley Anugrah Hayati, Risako Owan, Bin Hu, Ritik Parkar, Ryan Koo, Jonginn Park, Aahan Tyagi, Libby Ferland, Sanjali Roy, Vincent Liu, Dongyeop Kang
This work delves into the expanding role of large language models (LLMs) in generating artificial data.
no code implementations • 28 Dec 2023 • Zhecheng Sheng, Tianhao Zhang, Chen Jiang, Dongyeop Kang
In summary, we present a novel Brownian bridge coherence metric capable of measuring both local and global text coherence, while circumventing the need for end-to-end model training.
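The Brownian bridge intuition behind such a metric can be sketched as follows: pinned at the first and last sentence embeddings, a bridge expects each intermediate sentence to lie near the linear interpolation between the endpoints. Scoring coherence by squared deviation from that expectation is an illustrative simplification, not the paper's exact formulation.

```python
# Toy coherence score: mean squared deviation of intermediate sentence
# embeddings from the Brownian-bridge expectation between the endpoints.

def bridge_deviation(embeddings):
    """Lower deviation = trajectory closer to the expected bridge path
    (more "coherent" under this simplified score)."""
    T = len(embeddings) - 1
    total, count = 0.0, 0
    for t in range(1, T):
        alpha = t / T
        expected = [(1 - alpha) * a + alpha * b
                    for a, b in zip(embeddings[0], embeddings[T])]
        total += sum((x - e) ** 2 for x, e in zip(embeddings[t], expected))
        count += 1
    return total / count if count else 0.0

# A smooth trajectory scores lower than one that detours.
smooth = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
detour = [[0.0, 0.0], [5.0, -3.0], [2.0, 2.0]]
print(bridge_deviation(smooth) < bridge_deviation(detour))  # True
```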
no code implementations • 16 Nov 2023 • Debarati Das, Ishaan Gupta, Jaideep Srivastava, Dongyeop Kang
Our research integrates graph data with Large Language Models (LLMs), which, despite their advancements in various fields using large text corpora, face limitations in encoding entire graphs due to context size constraints.
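The context-size constraint can be made concrete with a toy serialization: a graph rendered as an edge-list prompt must drop edges once the token budget is exhausted. The budget, token accounting, and `A -> B` format here are assumptions for illustration, not the paper's encoding.

```python
# Illustrative sketch: serialize a graph as edge-list text for an LLM
# prompt, truncating when a (whitespace-token) budget is exceeded.

def graph_to_prompt(edges, max_tokens=20):
    """Emit 'src -> dst' lines until the token budget runs out;
    return the prompt text and the number of edges dropped."""
    lines, used = [], 0
    for src, dst in edges:
        cost = 3  # three whitespace tokens: src, "->", dst
        if used + cost > max_tokens:
            break
        lines.append(f"{src} -> {dst}")
        used += cost
    dropped = len(edges) - len(lines)
    return "\n".join(lines), dropped

edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("F", "G"), ("G", "H"), ("H", "A")]
prompt, dropped = graph_to_prompt(edges, max_tokens=20)
print(dropped)  # 2 edges do not fit the budget
```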
1 code implementation • 16 Nov 2023 • Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang
In this study, we investigate LLMs' capacity for generating diverse perspectives and rationales on subjective topics, such as social norms and argumentative texts.
1 code implementation • 13 Oct 2023 • Hyungjoo Chae, Yongho Song, Kai Tzu-iunn Ong, Taeyoon Kwon, Minjin Kim, Youngjae Yu, Dongha Lee, Dongyeop Kang, Jinyoung Yeo
Hence, our focus is to facilitate such multi-hop reasoning over a dialogue context, namely dialogue chain-of-thought (CoT) reasoning.
1 code implementation • 29 Sep 2023 • Ryan Koo, Minhwa Lee, Vipul Raheja, Jong Inn Park, Zae Myung Kim, Dongyeop Kang
Our findings suggest that LLMs cannot yet be reliably used for automatic annotation that aligns with human preferences.
1 code implementation • ICCV 2023 • Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi
Story visualization (SV) is a challenging text-to-image generation task due to the difficulty of not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences.
1 code implementation • 8 Jun 2023 • Jaehyung Kim, Jinwoo Shin, Dongyeop Kang
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
no code implementations • 6 Jun 2023 • Rose Neis, Karin de Langis, Zae Myung Kim, Dongyeop Kang
Capturing readers' engagement in fiction is a challenging but important aspect of narrative understanding.
1 code implementation • 30 May 2023 • Jaehyung Kim, Yekyung Kim, Karin de Langis, Jinwoo Shin, Dongyeop Kang
However, not all samples in these datasets are equally valuable for learning, as some may be redundant or noisy.
no code implementations • 24 May 2023 • Hao Zou, Zae Myung Kim, Dongyeop Kang
In NLP, diffusion models have been used in a variety of applications, such as natural language generation, sentiment analysis, topic modeling, and machine translation.
1 code implementation • 24 May 2023 • Anna Martin-Boyle, Andrew Head, Kyle Lo, Risham Sidhu, Marti A. Hearst, Dongyeop Kang
We also introduce a new definition extraction method that masks mathematical symbols, creates a copy of each sentence for each symbol, specifies a target symbol, and predicts its corresponding definition spans using slot filling.
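The masking-and-copying step described here can be sketched directly: for each symbol in a sentence, emit one copy with that symbol marked as the target and every other symbol masked. The `TARGET`/`SYMBOL` placeholder names are illustrative; the paper's actual special tokens may differ.

```python
# Sketch of per-symbol masked copies for definition extraction via
# slot filling: each copy designates one symbol as the query target.

def make_masked_copies(tokens, symbols):
    """Return one masked copy of the sentence per symbol occurrence,
    each specifying a different target symbol."""
    copies = []
    for i, tok in enumerate(tokens):
        if tok not in symbols:
            continue
        copy = ["TARGET" if j == i else
                ("SYMBOL" if t in symbols else t)
                for j, t in enumerate(tokens)]
        copies.append(copy)
    return copies

sentence = "let n be the number of samples and k the number of classes".split()
copies = make_masked_copies(sentence, symbols={"n", "k"})
print(len(copies))  # one copy per symbol occurrence: 2
```

A downstream slot-filling model would then predict the definition span ("the number of samples") for the `TARGET` position in each copy.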
no code implementations • 24 May 2023 • Debarati Das, David Ma, Dongyeop Kang
This paper explores the impact of training data input diversity on the quality of the generated text from the multi-style transfer model.
1 code implementation • 24 May 2023 • London Lowmanstone, Ruyuan Wan, Risako Owan, Jaehyung Kim, Dongyeop Kang
In our analysis of the results, we found that the choice of imputation method significantly impacts soft label changes and distribution.
no code implementations • 23 May 2023 • Zae Myung Kim, David E. Taylor, Dongyeop Kang
Conversational implicatures are pragmatic inferences that require listeners to deduce the intended meaning conveyed by a speaker from their explicit utterances.
1 code implementation • 17 May 2023 • Vipul Raheja, Dhruv Kumar, Ryan Koo, Dongyeop Kang
We present a large language model fine-tuned on a diverse collection of task-specific instructions for text editing (a total of 82K instructions).
1 code implementation • 31 Mar 2023 • Ryan Koo, Anna Martin, Linghe Wang, Dongyeop Kang
We also provide ManuScript, an original dataset annotated with a simplified version of our taxonomy to show writer actions and the intentions behind them.
no code implementations • 25 Mar 2023 • Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, Aniket Kittur, Hyeonsu Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita Rao, Paul Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld
Scholarly publications are key to the transfer of knowledge from scholars to others.
1 code implementation • 17 Feb 2023 • Taehee Jung, Joo-Kyung Kim, Sungjin Lee, Dongyeop Kang
For extreme multi-label classification (XMC), existing classification-based models poorly perform for tail labels and often ignore the semantic relations among labels, like treating "Wikipedia" and "Wiki" as independent and separate labels.
no code implementations • 12 Jan 2023 • Ruyuan Wan, Jaehyung Kim, Dongyeop Kang
Particularly, we extract disagreement labels from the annotators' voting histories in the five subjective datasets, and then fine-tune language models to predict annotators' disagreement.
no code implementations • 20 Dec 2022 • Risako Owan, Maria Gini, Dongyeop Kang
We observe that both frameworks have similar inter-annotator agreements, despite having different numbers of sense types (8 for Quirk and 3 for Palmer).
1 code implementation • 19 Dec 2022 • Karin de Langis, Dongyeop Kang
We develop a variety of methods to derive style saliency scores over text using the collected eye dataset.
1 code implementation • 2 Dec 2022 • Zae Myung Kim, Wanyu Du, Vipul Raheja, Dhruv Kumar, Dongyeop Kang
Leveraging datasets from other related text editing NLP tasks, combined with the specification of editable spans, leads our system to more accurately model the process of iterative text refinement, as evidenced by empirical results and human evaluations.
no code implementations • 26 Oct 2022 • Kyumin Park, Keon Lee, Daeyoung Kim, Dongyeop Kang
We present a novel speech dataset, RedPen, with human annotations on unnatural speech regions and their corresponding reasons.
1 code implementation • 14 Oct 2022 • Shirley Anugrah Hayati, Kyumin Park, Dheeraj Rajagopal, Lyle Ungar, Dongyeop Kang
Large pre-trained language models have achieved impressive results on various style classification tasks, but they often learn spurious domain-specific words to make predictions (Hayati et al., 2021).
no code implementations • 13 Oct 2022 • Haneul Yoo, Rifki Afina Putri, Changyoon Lee, Youngin Lee, So-Yeon Ahn, Dongyeop Kang, Alice Oh
Researchers have traditionally recruited native speakers to provide annotations for widely used benchmark datasets.
1 code implementation • In2Writing (ACL) 2022 • Wanyu Du, Zae Myung Kim, Vipul Raheja, Dhruv Kumar, Dongyeop Kang
Examining and evaluating the capability of large language models for making continuous revisions and collaborating with human writers is a critical step towards building effective writing assistants.
1 code implementation • ACL 2022 • Wanyu Du, Vipul Raheja, Dhruv Kumar, Zae Myung Kim, Melissa Lopez, Dongyeop Kang
Writing is, by nature, a strategic, adaptive, and more importantly, an iterative process.
no code implementations • NeurIPS Workshop ICBINB 2021 • Dyah Adila, Dongyeop Kang
Despite machine learning models' success in Natural Language Processing (NLP) tasks, predictions from these models frequently fail on out-of-distribution (OOD) samples.
no code implementations • ICLR 2022 • Jaehyung Kim, Dongyeop Kang, Sungsoo Ahn, Jinwoo Shin
Remarkably, our method is more effective on the challenging low-data and class-imbalanced regimes, and the learned augmentation policy is well-transferable to the different tasks and models.
1 code implementation • EMNLP 2021 • Shirley Anugrah Hayati, Dongyeop Kang, Lyle Ungar
People convey their intention and attitude through linguistic styles of the text that they write.
1 code implementation • ICCV 2021 • Jinwoo Nam, Daechul Ahn, Dongyeop Kang, Seong Jong Ha, Jonghyun Choi
Understanding videos to localize moments with natural language often requires large, expensive sets of annotated video regions paired with language queries.
1 code implementation • ACL 2021 • Dongyeop Kang, Eduard Hovy
This paper provides the benchmark corpus (XSLUE) that combines existing datasets and collects a new one for sentence-level cross-style language understanding and evaluation.
1 code implementation • EMNLP (sdp) 2020 • Dongyeop Kang, Andrew Head, Risham Sidhu, Kyle Lo, Daniel S. Weld, Marti A. Hearst
Based on this analysis, we develop a new definition detection system, HEDDEx, that utilizes syntactic features, transformer encoders, and heuristic filters, and evaluate it on a standard sentence-level benchmark.
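The heuristic-filter stage of such a pipeline can be sketched as a cue-phrase check that screens candidate sentences before the heavier transformer model runs. The cue list below is an assumption for illustration, not HEDDEx's actual filter set.

```python
# Toy heuristic filter: keep a sentence as a definition candidate only
# if it contains a definitional cue phrase.

CUES = ("is defined as", "refers to", "denotes", "is called", "we define")

def passes_heuristic_filter(sentence):
    """Cheap pre-filter applied before transformer-based detection."""
    s = sentence.lower()
    return any(cue in s for cue in CUES)

print(passes_heuristic_filter(
    "The margin is defined as the distance to the boundary."))  # True
print(passes_heuristic_filter(
    "We ran all experiments on a single GPU."))  # False
```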
no code implementations • EMNLP 2020 • Dongyeop Kang, Eduard Hovy
To address that, we propose a self-supervised text planner SSPlanner that predicts what to say first (content prediction), then guides the pretrained language model (surface realization) using the predicted content.
2 code implementations • EMNLP (DeeLIO) 2020 • Steven Y. Feng, Varun Gangal, Dongyeop Kang, Teruko Mitamura, Eduard Hovy
We also examine the relationship between the amount of augmentation and the quality of the generated text.
1 code implementation • 29 Sep 2020 • Andrew Head, Kyle Lo, Dongyeop Kang, Raymond Fok, Sam Skjonsberg, Daniel S. Weld, Marti A. Hearst
We introduce ScholarPhi, an augmented reading interface with four novel features: (1) tooltips that surface position-sensitive definitions from elsewhere in a paper, (2) a filter over the paper that "declutters" it to reveal how the term or symbol is used across the paper, (3) automatic equation diagrams that expose multiple definitions in parallel, and (4) an automatically generated glossary of important terms and symbols.
1 code implementation • EMNLP 2020 • Shirley Anugrah Hayati, Dongyeop Kang, Qingxiaoyang Zhu, Weiyan Shi, Zhou Yu
To better understand how humans make recommendations in communication, we design an annotation scheme related to recommendation strategies based on social science theories and annotate these dialogs.
1 code implementation • ACL 2020 • Taehee Jung, Dongyeop Kang, Hua Cheng, Lucas Mentch, Thomas Schaaf
Here we propose an end-to-end training procedure called posterior calibrated (PosCal) training that directly optimizes the objective while minimizing the difference between the predicted and empirical posterior probabilities. We show that PosCal not only helps reduce the calibration error but also improves task performance by penalizing drops in performance on both objectives.
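The PosCal idea can be sketched as a task loss plus a penalty on the gap between predicted and empirical posteriors. The binning scheme, the squared-gap penalty, and the weighting hyperparameter `lam` below are illustrative simplifications, not the paper's exact calibration term.

```python
# Sketch of a PosCal-style objective for binary classification:
# cross-entropy plus a calibration penalty over confidence bins.
import math

def poscal_loss(probs, labels, lam=1.0, n_bins=2):
    """Task loss (cross-entropy) plus, per confidence bin, the squared
    gap between mean predicted probability and the empirical positive
    rate, weighted by `lam`."""
    ce = -sum(math.log(p if y == 1 else 1 - p)
              for p, y in zip(probs, labels)) / len(probs)
    penalty = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, p in enumerate(probs)
               if lo <= p < hi or (b == n_bins - 1 and p == 1.0)]
        if not idx:
            continue
        mean_p = sum(probs[i] for i in idx) / len(idx)
        emp = sum(labels[i] for i in idx) / len(idx)
        penalty += (mean_p - emp) ** 2
    return ce + lam * penalty

# Over-confident predictions pay an extra calibration penalty.
print(poscal_loss([0.9, 0.9, 0.6, 0.1], [1, 0, 1, 0]))
```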
2 code implementations • 9 Nov 2019 • Dongyeop Kang, Eduard Hovy
This paper provides the benchmark corpus (xSLUE) that combines existing datasets and collects a new one for sentence-level cross-style language understanding and evaluation.
1 code implementation • IJCNLP 2019 • Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul Crook, Y-Lan Boureau, Jason Weston
These issues can be alleviated by treating recommendation as an interactive dialogue task instead, where an expert recommender can sequentially ask about someone's preferences, react to their requests, and recommend more appropriate items.
1 code implementation • IJCNLP 2019 • Dongyeop Kang, Varun Gangal, Eduard Hovy
Stylistic variation in text needs to be studied with different aspects including the writer's personal traits, interpersonal relations, rhetoric, and more.
1 code implementation • IJCNLP 2019 • Taehee Jung, Dongyeop Kang, Lucas Mentch, Eduard Hovy
We find that while position exhibits substantial bias in news articles, this is not the case, for example, with academic papers and meeting minutes.
1 code implementation • IJCNLP 2019 • Dongyeop Kang, Hiroaki Hayashi, Alan W. Black, Eduard Hovy
In order to produce a coherent flow of text, we explore two forms of intersentential relations in a paragraph: one is a human-created linguistic relation that forms a structure (e.g., a discourse tree) and the other is a relation from latent representations learned from the sentences themselves.
no code implementations • EMNLP 2018 • Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Peter Clark
We focus on filling these knowledge gaps in the Science Entailment task, by leveraging an external structured knowledge base (KB) of science facts.
1 code implementation • ACL 2018 • Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Eduard Hovy
We consider the problem of learning textual entailment models with limited supervision (5K-10K training examples), and present two complementary approaches for it.
1 code implementation • NAACL 2018 • Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz
In the first task, we show that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline.
no code implementations • 26 Dec 2017 • Chu-Cheng Lin, Dongyeop Kang, Michael Gamon, Madian Khabsa, Ahmed Hassan Awadallah, Patrick Pantel
Emails in the workplace are often intentional calls to action for their recipients.
1 code implementation • EMNLP 2017 • Dongyeop Kang, Varun Gangal, Ang Lu, Zheng Chen, Eduard Hovy
Our quantitative and human analysis show empirical evidence that our method successfully extracts meaningful causality relationships between time series with textual features and generates appropriate explanation between them.