Search Results for author: Naoaki Okazaki

Found 105 papers, 31 papers with code

PatchBERT: Just-in-Time, Out-of-Vocabulary Patching

no code implementations EMNLP 2020 Sangwhan Moon, Naoaki Okazaki

Large scale pre-trained language models have shown groundbreaking performance improvements for transfer learning in the domain of natural language processing.

Transfer Learning

Word-level Perturbation Considering Word Length and Compositional Subwords

1 code implementation Findings (ACL) 2022 Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki

We present two simple modifications for word-level perturbation: Word Replacement considering Length (WR-L) and Compositional Word Replacement (CWR). In conventional word replacement, a word in an input is replaced with a word sampled from the entire vocabulary, regardless of the length and context of the target word. WR-L considers the length of a target word by sampling words from the Poisson distribution. CWR considers the compositional candidates by restricting the source of sampling to related words that appear in subword regularization. Experimental results showed that the combination of WR-L and CWR improved the performance of text classification and machine translation.

Machine Translation · text-classification +2
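
As a quick illustration of WR-L (not the authors' code; the vocabulary, replacement probability, and fallback behaviour below are assumptions), a word chosen for perturbation is swapped with a vocabulary word whose length is sampled from a Poisson distribution centered on the original word's length:

```python
import random
from collections import defaultdict

import numpy as np


def wr_l_replace(words, vocab, p=0.1, rng=None):
    """Word Replacement considering Length (WR-L), sketched.

    Each word is replaced with probability p by a vocabulary word
    whose length is drawn from a Poisson distribution whose mean is
    the length of the original word; if no word of the sampled length
    exists, the original word is kept.
    """
    rng = rng or np.random.default_rng()
    by_length = defaultdict(list)
    for w in vocab:
        by_length[len(w)].append(w)

    out = []
    for w in words:
        if rng.random() < p:
            length = max(1, int(rng.poisson(len(w))))
            out.append(random.choice(by_length.get(length, [w])))
        else:
            out.append(w)
    return out


print(wr_l_replace("the cat sat on the mat".split(),
                   vocab=["a", "it", "dog", "cats", "mouse"], p=0.5))
```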

IMPARA: Impact-Based Metric for GEC Using Parallel Data

1 code implementation COLING 2022 Koki Maeda, Masahiro Kaneko, Naoaki Okazaki

Correlations between IMPARA and human scores indicate that IMPARA is comparable or better than existing evaluation methods.

Grammatical Error Correction

Predicting Antonyms in Context using BERT

no code implementations INLG (ACL) 2021 Ayana Niwa, Keisuke Nishiguchi, Naoaki Okazaki

We address the task of antonym prediction in a context, which is a fill-in-the-blanks problem.
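
A fill-in-the-blanks formulation like this can be probed with an off-the-shelf masked language model. The snippet below is a minimal assumed setup using the Hugging Face fill-mask pipeline, not the paper's system, which targets antonym prediction specifically:

```python
from transformers import pipeline

# Rank candidate fillers for a blank with a vanilla masked LM.
fill = pipeline("fill-mask", model="bert-base-uncased")
context = "The soup was not hot at all; in fact, it was rather [MASK]."
for pred in fill(context, top_k=5):
    print(pred["token_str"], round(pred["score"], 3))
```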

Debiasing Isn’t Enough! – On the Effectiveness of Debiasing MLMs and Their Social Biases in Downstream Tasks

no code implementations COLING 2022 Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki

We study the relationship between task-agnostic intrinsic and task-specific extrinsic social bias evaluation measures for MLMs, and find that there exists only a weak correlation between these two types of evaluation measures.

OpenKorPOS: Democratizing Korean Tokenization with Voting-Based Open Corpus Annotation

no code implementations LREC 2022 Sangwhan Moon, Won Ik Cho, Hye Joo Han, Naoaki Okazaki, Nam Soo Kim

As this problem originates from the conventional scheme used when creating a POS tagging corpus, we propose an improvement to the existing scheme, which makes it friendlier to generative tasks.

POS · POS Tagging +1

Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction

no code implementations28 Feb 2024 Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki

Given the accelerating progress of vision and language modeling, accurate evaluation of machine-generated image captions remains critical.

Image Captioning · Language Modelling

Two Counterexamples to Tokenization and the Noiseless Channel

no code implementations22 Feb 2024 Marco Cognetta, Vilém Zouhar, Sangwhan Moon, Naoaki Okazaki

In Tokenization and the Noiseless Channel (Zouhar et al., 2023a), Rényi efficiency is suggested as an intrinsic mechanism for evaluating a tokenizer: for NLP tasks, the tokenizer which leads to the highest Rényi efficiency of the unigram distribution should be chosen.

Machine Translation
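
For reference, the Rényi entropy of a unigram distribution is H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), and Rényi efficiency normalizes it by the maximum entropy log|V|. The sketch below follows that definition; alpha = 2.5 is the setting the original paper reports as most predictive of downstream quality, and the toy corpus is purely illustrative:

```python
import math
from collections import Counter


def renyi_efficiency(counts, alpha=2.5):
    """Rényi efficiency of a unigram token distribution.

    H_alpha(p) = log(sum_i p_i ** alpha) / (1 - alpha), divided by
    the maximum possible entropy log|V| so the result lies in [0, 1].
    """
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    h_alpha = math.log(sum(p ** alpha for p in probs)) / (1 - alpha)
    return h_alpha / math.log(len(probs))


tokens = "a rose is a rose is a rose".split()
print(renyi_efficiency(Counter(tokens)))
```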

Knowledge of Pretrained Language Models on Surface Information of Tokens

no code implementations15 Feb 2024 Tatsuya Hiraoka, Naoaki Okazaki

Do pretrained language models have knowledge regarding the surface information of tokens?

SAIE Framework: Support Alone Isn't Enough – Advancing LLM Training with Adversarial Remarks

no code implementations14 Nov 2023 Mengsay Loem, Masahiro Kaneko, Naoaki Okazaki

Large Language Models (LLMs) can justify or critique their predictions through discussions with other models or humans, thereby enriching their intrinsic understanding of instances.

GSM8K · Math

How You Prompt Matters! Even Task-Oriented Constraints in Instructions Affect LLM-Generated Text Detection

1 code implementation14 Nov 2023 Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki

Furthermore, our analysis indicates that the high instruction-following ability of LLMs fosters the large impact of such constraints on detection performance.

Instruction Following · Large Language Model +4

Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering

no code implementations9 Oct 2023 Trang Nguyen, Naoaki Okazaki

Moreover, diverse interpretations of the input lead to various modes of answer generation, highlighting the role of causal reasoning between the interpreting and answering steps in VQA.

Answer Generation · Question Answering +1

Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction

1 code implementation20 Sep 2023 Masahiro Kaneko, Naoaki Okazaki

Generating explanations for GEC corrections involves aligning input and output tokens, identifying correction points, and presenting corresponding explanations consistently.

Grammatical Error Correction
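
The alignment step mentioned above can be illustrated with a generic sequence aligner. The sketch below uses Python's difflib as a stand-in; the paper's actual alignment and explanation components are not reproduced:

```python
import difflib


def correction_points(source_tokens, corrected_tokens):
    """Align source and corrected tokens and list the edit spans.

    Uses difflib's longest-matching-block alignment in place of
    whatever aligner a real GEC explanation system would use.
    """
    matcher = difflib.SequenceMatcher(a=source_tokens, b=corrected_tokens)
    points = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            points.append((op, source_tokens[i1:i2], corrected_tokens[j1:j2]))
    return points


src = "He go to school yesterday".split()
cor = "He went to school yesterday".split()
print(correction_points(src, cor))  # [('replace', ['go'], ['went'])]
```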

Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels

1 code implementation18 Sep 2023 Panatchakorn Anantaprayoon, Masahiro Kaneko, Naoaki Okazaki

In Natural Language Inference (NLI), existing bias evaluation methods have focused on the prediction results of a specific label out of three labels, such as neutral.

Natural Language Inference

The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated

no code implementations16 Sep 2023 Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki

In this study, we compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets containing female, male, and stereotypical words.

OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples

1 code implementation21 Jul 2023 Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki

Experiments in the domain of student essays show that the proposed detector improves the detection performance on attacker-generated texts by up to +41.3 F1 points.

Adversarial Attack Detection · DeepFake Detection +5

SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

1 code implementation6 Jun 2023 Zhishen Yang, Raj Dabre, Hideki Tanaka, Naoaki Okazaki

Automating figure caption generation helps move model understandings of scientific documents beyond text and will help authors write informative captions that facilitate communicating scientific findings.

Image Captioning · Optical Character Recognition (OCR)

Reducing Sequence Length by Predicting Edit Operations with Large Language Models

no code implementations19 May 2023 Masahiro Kaneko, Naoaki Okazaki

Experiments show that the proposed method achieves performance comparable to the baseline in four tasks (paraphrasing, formality style transfer, GEC, and text simplification) while reducing the length of the target text to as little as 21%.

Formality Style Transfer · Grammatical Error Correction +2

Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach

1 code implementation19 May 2023 Masahiro Kaneko, Graham Neubig, Naoaki Okazaki

Humans work together to solve common problems by having discussions, explaining, and agreeing or disagreeing with each other.

Natural Language Inference

Semantic Specialization for Knowledge-based Word Sense Disambiguation

1 code implementation22 Apr 2023 Sakae Mizuki, Naoaki Okazaki

A promising approach for knowledge-based Word Sense Disambiguation (WSD) is to select the sense whose contextualized embeddings computed for its definition sentence are closest to those computed for a target word in a given sentence.

Language Modelling · Sentence +1
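
A bare-bones version of that selection rule, with assumed components (bert-base-uncased and mean pooling rather than the paper's semantically specialized embeddings):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")


def embed(text):
    """Mean-pooled last-layer states as a crude sentence vector."""
    batch = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)


def disambiguate(sentence, senses):
    """senses: dict mapping sense id -> definition gloss.

    Returns the sense whose definition embedding is closest (by
    cosine similarity) to the embedding of the input sentence.
    """
    ctx = embed(sentence)
    scores = {s: torch.cosine_similarity(ctx, embed(gloss), dim=0)
              for s, gloss in senses.items()}
    return max(scores, key=scores.get)


senses = {"bank.n.01": "a financial institution that accepts deposits",
          "bank.n.02": "sloping land beside a body of water"}
print(disambiguate("She sat on the bank of the river.", senses))
```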

DREEAM: Guiding Attention with Evidence for Improving Document-Level Relation Extraction

1 code implementation17 Feb 2023 Youmi Ma, An Wang, Naoaki Okazaki

First, we propose DREEAM, a memory-efficient approach that adopts evidence information as the supervisory signal, thereby guiding the attention modules of the DocRE system to assign high weights to evidence.

Document-level Relation Extraction · Relation
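
One way to read "evidence as the supervisory signal for attention" is a divergence penalty pulling the model's attention over sentences toward the annotated evidence distribution. The sketch below implements that reading as a KL term; the shapes and loss form are assumptions, not DREEAM's exact formulation:

```python
import torch
import torch.nn.functional as F


def evidence_guidance_loss(attn, evidence_mask, eps=1e-8):
    """KL(evidence || attention) over sentences.

    attn:          (batch, n_sents) attention weights, rows sum to 1.
    evidence_mask: (batch, n_sents) binary evidence annotations.
    Rows without any evidence contribute zero loss.
    """
    target = evidence_mask / (evidence_mask.sum(dim=-1, keepdim=True) + eps)
    return F.kl_div((attn + eps).log(), target, reduction="batchmean")


attn = torch.softmax(torch.randn(2, 5), dim=-1)
evidence = torch.tensor([[1., 0., 1., 0., 0.],
                         [0., 1., 0., 0., 0.]])
print(evidence_guidance_loss(attn, evidence))
```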

Comparing Intrinsic Gender Bias Evaluation Measures without using Human Annotated Examples

no code implementations28 Jan 2023 Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki

Prior works have relied on human annotated examples to compare existing intrinsic bias evaluation measures.

Debiasing isn't enough! – On the Effectiveness of Debiasing MLMs and their Social Biases in Downstream Tasks

no code implementations6 Oct 2022 Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki

We study the relationship between task-agnostic intrinsic and task-specific extrinsic social bias evaluation measures for Masked Language Models (MLMs), and find that there exists only a weak correlation between these two types of evaluation measures.

Nearest Neighbor Non-autoregressive Text Generation

no code implementations26 Aug 2022 Ayana Niwa, Sho Takase, Naoaki Okazaki

In addition, the proposed method outperforms an NAR baseline on the WMT'14 En-De dataset.

Machine Translation · Text Generation +1

Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention

no code implementations27 Jul 2022 Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki

The impressive performance of the Transformer has been attributed to self-attention, in which dependencies between all positions of an input sequence are considered at every position.

Position

PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation

1 code implementation25 May 2022 Ao Liu, Haoyu Dong, Naoaki Okazaki, Shi Han, Dongmei Zhang

However, directly learning the logical inference knowledge from table-text pairs is very difficult for neural models because of the ambiguity of natural language and the scarcity of parallel data.

Table-to-Text Generation

Gender Bias in Meta-Embeddings

no code implementations19 May 2022 Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki

Different methods have been proposed to develop meta-embeddings from a given set of source embeddings.

Gender Bias in Masked Language Models for Multiple Languages

1 code implementation NAACL 2022 Masahiro Kaneko, Aizhan Imankulova, Danushka Bollegala, Naoaki Okazaki

Unfortunately, it was reported that MLMs also learn discriminative biases regarding attributes such as gender and race.

Attribute · Sentence

Semi-Supervised Formality Style Transfer with Consistency Training

1 code implementation ACL 2022 Ao Liu, An Wang, Naoaki Okazaki

In this work, we propose a simple yet effective semi-supervised framework to better utilize source-side unlabeled sentences based on consistency training.

 Ranked #1 on Formality Style Transfer on GYAFC (using extra training data)

Formality Style Transfer · Sentence
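
The consistency-training idea can be sketched under toy assumptions (a linear stand-in model and Gaussian input noise instead of the paper's perturbations): predictions on a perturbed unlabeled input are regularized toward the model's own predictions on the clean input.

```python
import torch
import torch.nn.functional as F


def consistency_loss(model, x, perturb):
    """KL between predictions on perturbed and clean inputs; the
    clean prediction is detached and treated as the target."""
    with torch.no_grad():
        clean = F.softmax(model(x), dim=-1)
    noisy_log = F.log_softmax(model(perturb(x)), dim=-1)
    return F.kl_div(noisy_log, clean, reduction="batchmean")


model = torch.nn.Linear(8, 4)          # toy stand-in for the real model
x = torch.randn(3, 8)                  # "unlabeled source sentences"
noise = lambda t: t + 0.1 * torch.randn_like(t)
print(consistency_loss(model, x, noise))
```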

Interpretability for Language Learners Using Example-Based Grammatical Error Correction

1 code implementation ACL 2022 Masahiro Kaneko, Sho Takase, Ayana Niwa, Naoaki Okazaki

In this study, we introduce an Example-Based GEC (EB-GEC) that presents examples to language learners as a basis for a correction result.

Grammatical Error Correction

Learning How to Translate North Korean through South Korean

no code implementations LREC 2022 Hwichan Kim, Sangwhan Moon, Naoaki Okazaki, Mamoru Komachi

Training a model using North Korean data is the most straightforward approach to solving this problem, but there is insufficient data to train NMT models.

Machine Translation · NMT +1

ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization

no code implementations NAACL (ACL) 2022 Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki

Through experiments, we show that ExtraPhrase improves the performance of abstractive summarization tasks by more than 0.50 points in ROUGE scores compared to the setting without data augmentation.

Abstractive Text Summarization · Data Augmentation +1

Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation

no code implementations12 Dec 2021 Ao Liu, Congjian Luo, Naoaki Okazaki

We further introduce logical form generation (LG), a dual task of Logic2text that requires generating a valid logical form based on a text description of a table.

Data Augmentation · Text Generation +1

Transformer-based Lexically Constrained Headline Generation

1 code implementation EMNLP 2021 Kosuke Yamada, Yuta Hitomi, Hideaki Tamori, Ryohei Sasano, Naoaki Okazaki, Kentaro Inui, Koichi Takeda

We also consider a new headline generation strategy that takes advantage of the controllable generation order of Transformer.

Headline Generation

Joint Optimization of Tokenization and Downstream Model

2 code implementations Findings (ACL) 2021 Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki

Since traditional tokenizers are isolated from the downstream task and model, they cannot adapt their tokenization to that task and model, even though recent studies imply that an appropriate tokenization improves downstream performance.

Machine Translation · text-classification +2

TextLearner at SemEval-2020 Task 10: A Contextualized Ranking System in Solving Emphasis Selection in Text

no code implementations SEMEVAL 2020 Zhishen Yang, Lars Wolfsteller, Naoaki Okazaki

This paper describes the emphasis selection system of the team TextLearner for SemEval 2020 Task 10: Emphasis Selection For Written Text in Visual Media.

Language Modelling

Image Caption Generation for News Articles

1 code implementation COLING 2020 Zhishen Yang, Naoaki Okazaki

In this paper, we address the task of news-image captioning, which generates a description of an image given the image and its article body as input.

Image Captioning

SWAGex at SemEval-2020 Task 4: Commonsense Explanation as Next Event Prediction

no code implementations SEMEVAL 2020 Wiem Ben Rim, Naoaki Okazaki

We describe the system submitted by the SWAGex team to the SemEval-2020 Commonsense Validation and Explanation Task.

Language Modelling

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

3 code implementations30 Nov 2020 Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott

Large-scale pretraining and task-specific fine-tuning is now the standard methodology for many tasks in computer vision and natural language processing.

Multi-Task Learning for Cross-Lingual Abstractive Summarization

no code implementations LREC 2022 Sho Takase, Naoaki Okazaki

Experimental results indicate that Transum improves the performance from the strong baseline, Transformer, in Chinese-English, Arabic-English, and English-Japanese translation datasets.

Abstractive Text Summarization · Cross-Lingual Abstractive Summarization +4

Keyframe Segmentation and Positional Encoding for Video-guided Machine Translation Challenge 2020

no code implementations23 Jun 2020 Tosho Hirasawa, Zhishen Yang, Mamoru Komachi, Naoaki Okazaki

Video-guided machine translation is a multimodal neural machine translation task that aims to generate high-quality text translations by engaging both video and text.

Machine Translation · Translation +1

Improving Truthfulness of Headline Generation

1 code implementation ACL 2020 Kazuki Matsumaru, Sho Takase, Naoaki Okazaki

Building a binary classifier that predicts an entailment relation between an article and its headline, we filter out untruthful instances from the supervision data.

Abstractive Text Summarization · Headline Generation +1
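
A minimal version of that filtering step, using an off-the-shelf NLI model as the entailment classifier (the model choice, label set, and threshold are assumptions, not the paper's trained classifier):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tok = AutoTokenizer.from_pretrained(name)
nli = AutoModelForSequenceClassification.from_pretrained(name)


def keep(article, headline, threshold=0.5):
    """Keep a training pair only if the article entails its headline."""
    batch = tok(article, headline, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**batch).logits.softmax(dim=-1).squeeze(0)
    label = nli.config.id2label[int(probs.argmax())]
    return label == "ENTAILMENT" and float(probs.max()) >= threshold


pairs = [("The company reported record profits this quarter.",
          "Company posts record quarterly profits")]
print([p for p in pairs if keep(*p)])   # filtered supervision data
```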

Jamo Pair Encoding: Subcharacter Representation-based Extreme Korean Vocabulary Compression for Efficient Subword Tokenization

no code implementations LREC 2020 Sangwhan Moon, Naoaki Okazaki

In the context of multilingual language model pre-training, vocabulary size for languages with a broad set of potential characters is an unsolved problem.

Language Modelling

Evaluation Dataset for Zero Pronoun in Japanese to English Translation

no code implementations LREC 2020 Sho Shimazu, Sho Takase, Toshiaki Nakazawa, Naoaki Okazaki

Therefore, we present a hand-crafted dataset to evaluate whether translation models can resolve the zero pronoun problems in Japanese to English translations.

Machine Translation · Translation

Enhancing Machine Translation with Dependency-Aware Self-Attention

1 code implementation ACL 2020 Emanuele Bugliarello, Naoaki Okazaki

Most neural machine translation models only rely on pairs of parallel sentences, assuming syntactic information is automatically learned by an attention mechanism.

Machine Translation · Translation

TokyoTech_NLP at SemEval-2019 Task 3: Emotion-related Symbols in Emotion Detection

no code implementations SEMEVAL 2019 Zhishen Yang, Sam Vijlbrief, Naoaki Okazaki

This paper presents our contextual emotion detection system in approaching the SemEval2019 shared task 3: EmoContext: Contextual Emotion Detection in Text.

A Large-Scale Multi-Length Headline Corpus for Analyzing Length-Constrained Headline Generation Model Evaluation

no code implementations WS 2019 Yuta Hitomi, Yuya Taguchi, Hideaki Tamori, Ko Kikuta, Jiro Nishitoba, Naoaki Okazaki, Kentaro Inui, Manabu Okumura

However, because there is no corpus of headlines of multiple lengths for a given article, previous research on controlling output length in headline generation has not discussed whether the system outputs could be adequately evaluated without multiple references of different lengths.

Headline Generation

Predicting Stances from Social Media Posts using Factorization Machines

no code implementations COLING 2018 Akira Sasaki, Kazuaki Hanawa, Naoaki Okazaki, Kentaro Inui

This paper presents an approach to detect the stance of a user toward a topic based on their stances toward other topics and the social media posts of the user.

Decision Making · Stance Detection

A Corpus of Deep Argumentative Structures as an Explanation to Argumentative Relations

no code implementations7 Dec 2017 Paul Reisert, Naoya Inoue, Naoaki Okazaki, Kentaro Inui

Our coverage result of 74.6% indicates that argumentative relations can reasonably be explained by our small pattern set.

A Neural Language Model for Dynamically Representing the Meanings of Unknown Words and Entities in a Discourse

1 code implementation IJCNLP 2017 Sosuke Kobayashi, Naoaki Okazaki, Kentaro Inui

This study addresses the problem of identifying the meaning of unknown words or entities in a discourse with respect to the word embedding approaches used in neural language models.

Language Modelling · Word Embeddings

Analyzing the Revision Logs of a Japanese Newspaper for Article Quality Assessment

no code implementations WS 2017 Hideaki Tamori, Yuta Hitomi, Naoaki Okazaki, Kentaro Inui

We address the issue of the quality of journalism and analyze daily article revision logs from a Japanese newspaper company.

Composing Distributed Representations of Relational Patterns

1 code implementation ACL 2016 Sho Takase, Naoaki Okazaki, Kentaro Inui

Learning distributed representations for relation instances is a central technique in downstream NLP applications.

General Classification · Relation +1

Other Topics You May Also Agree or Disagree: Modeling Inter-Topic Preferences using Tweets and Matrix Factorization

no code implementations ACL 2017 Akira Sasaki, Kazuaki Hanawa, Naoaki Okazaki, Kentaro Inui

We present in this paper our approach for modeling inter-topic preferences of Twitter users: for example, those who agree with the Trans-Pacific Partnership (TPP) also agree with free trade.

Stance Detection
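
The underlying idea can be sketched as low-rank factorization of a user-by-topic stance matrix, where missing cells are filled in from learned user and topic factors. Everything below (data, rank, optimizer) is a toy assumption:

```python
import numpy as np

# Toy user x topic stance matrix (+1 agree, -1 disagree, 0 unobserved),
# factorized with plain gradient descent on the observed cells only;
# an illustration of the idea, not the paper's model.
R = np.array([[ 1, -1,  0],
              [ 1,  0,  1],
              [-1,  1, -1]], dtype=float)
observed = R != 0
k, lr, lam = 2, 0.05, 0.01
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((R.shape[0], k))   # user factors
V = 0.1 * rng.standard_normal((R.shape[1], k))   # topic factors

for _ in range(2000):
    E = observed * (R - U @ V.T)     # reconstruction error, observed cells
    U += lr * (E @ V - lam * U)
    V += lr * (E.T @ U - lam * V)

print(np.round(U @ V.T, 2))          # filled-in stance predictions
```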

Modeling Context-sensitive Selectional Preference with Distributed Representations

no code implementations COLING 2016 Naoya Inoue, Yuichiroh Matsubayashi, Masayuki Ono, Naoaki Okazaki, Kentaro Inui

This paper proposes a novel problem setting of selectional preference (SP) between a predicate and its arguments, called context-sensitive SP (CSP).

Semantic Role Labeling

The Mechanism of Additive Composition

no code implementations26 Nov 2015 Ran Tian, Naoaki Okazaki, Kentaro Inui

Additive composition (Foltz et al, 1998; Landauer and Dumais, 1997; Mitchell and Lapata, 2010) is a widely used method for computing meanings of phrases, which takes the average of vector representations of the constituent words.
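
With toy 3-dimensional embeddings (an illustrative assumption), additive composition is just the element-wise mean of the constituent word vectors:

```python
import numpy as np

# Additive composition: the phrase vector is the average of its
# constituent word vectors.
vectors = {"hot": np.array([0.9, 0.1, 0.0]),
           "dog": np.array([0.1, 0.8, 0.3])}
phrase = np.mean([vectors[w] for w in "hot dog".split()], axis=0)
print(phrase)  # [0.5  0.45 0.15]
```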
