Search Results for author: Taro Watanabe

Found 61 papers, 18 papers with code

What Works and Doesn’t Work, A Deep Decoder for Neural Machine Translation

no code implementations Findings (ACL) 2022 Zuchao Li, Yiran Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao, Taro Watanabe

Inspired by this discovery, we then propose approaches to improving it, with respect to model structure and model training, to make the deep decoder practical in NMT.

Language Modelling Machine Translation +2

Universal Dependencies Treebank for Tatar: Incorporating Intra-Word Code-Switching Information

no code implementations EURALI (LREC) 2022 Chihiro Taguchi, Sei Iwata, Taro Watanabe

Experimenting on NMCTT and the Turkish-German CS treebank (SAGT), we demonstrate that the annotation scheme introduced in NMCTT can improve the performance of subword-level language identification.

Language Identification POS +1

Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair

no code implementations18 Apr 2024 Yusuke Sakai, Mana Makinae, Hidetaka Kamigaito, Taro Watanabe

In Simultaneous Machine Translation (SiMT) systems, training with a simultaneous interpretation (SI) corpus is an effective method for achieving high-quality yet low-latency systems.

Machine Translation Translation

JDocQA: Japanese Document Question Answering Dataset for Generative Language Models

no code implementations28 Mar 2024 Eri Onami, Shuhei Kurita, Taiki Miyanishi, Taro Watanabe

Document question answering is the task of answering questions about given documents such as reports, slides, pamphlets, and websites; it is a truly demanding task because both paper and electronic documents are ubiquitous in our society.

Hallucination Question Answering +1

Cross-lingual Contextualized Phrase Retrieval

1 code implementation25 Mar 2024 Huayang Li, Deng Cai, Zhi Qu, Qu Cui, Hidetaka Kamigaito, Lemao Liu, Taro Watanabe

In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information.

Contrastive Learning Language Modelling +4
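
Since the snippet above describes dense retrieval of phrases in context, here is a minimal sketch of that idea: phrase vectors are pooled from contextualized token embeddings, and cross-lingual candidates are ranked by cosine similarity. The mean-pooling choice and the random toy data are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def phrase_embedding(token_embs, span):
    """Mean-pool contextualized token vectors inside a phrase span and
    L2-normalize (the pooling choice is an assumption, not the paper's)."""
    start, end = span
    vec = token_embs[start:end].mean(axis=0)
    return vec / np.linalg.norm(vec)

def retrieve(query_vec, index_vecs, top_k=5):
    """Rank indexed target-language phrase vectors by cosine similarity
    (dot product of pre-normalized vectors) and return the top-k indices."""
    return np.argsort(-(index_vecs @ query_vec))[:top_k]

# Toy usage: 10 candidate phrases with 768-dim contextual embeddings.
rng = np.random.default_rng(0)
index = rng.normal(size=(10, 768))
index /= np.linalg.norm(index, axis=1, keepdims=True)
query = phrase_embedding(rng.normal(size=(12, 768)), span=(3, 6))
print(retrieve(query, index))
```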

Distilling Named Entity Recognition Models for Endangered Species from Large Language Models

no code implementations13 Mar 2024 Jesse Atuhurra, Seiveright Cargill Dujohn, Hidetaka Kamigaito, Hiroyuki Shindo, Taro Watanabe

Natural language processing (NLP) practitioners are leveraging large language models (LLM) to create structured datasets from semi-structured and unstructured data sources such as patents, papers, and theses, without having domain-specific knowledge.

In-Context Learning Knowledge Distillation +5

Artwork Explanation in Large-scale Vision Language Models

no code implementations29 Feb 2024 Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

To address this issue, we propose a new task: the artwork explanation generation task, along with its evaluation dataset and metric for quantitatively assessing the understanding and utilization of knowledge about artworks.

Explanation Generation Text Generation

Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?

1 code implementation22 Feb 2024 Seiji Gobara, Hidetaka Kamigaito, Taro Watanabe

Experimental results on the Stack-Overflow dataset and the TSCC dataset, which includes multi-turn conversations, show that LLMs can implicitly handle text difficulty between the user's input and the generated response.

Question Answering

Evaluating Image Review Ability of Vision Language Models

no code implementations19 Feb 2024 Shigeki Saito, Kazuki Hayashi, Yusuke Ide, Yusuke Sakai, Kazuma Onishi, Toma Suzuki, Seiji Gobara, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Large-scale vision language models (LVLMs) are language models capable of processing image and text inputs within a single model.

Image Captioning

Centroid-Based Efficient Minimum Bayes Risk Decoding

no code implementations17 Feb 2024 Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe, Hideki Tanaka, Masao Utiyama

Minimum Bayes risk (MBR) decoding achieved state-of-the-art translation performance by using COMET, a neural metric that has a high correlation with human evaluation.

Translation
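
For reference, a minimal sketch of standard sampling-based MBR decoding, the O(n²) baseline that the centroid-based method accelerates; `utility` stands in for a neural metric such as COMET.

```python
def mbr_decode(candidates, utility):
    """Select the candidate with the highest expected utility against all
    candidates used as pseudo-references. This needs O(n^2) utility calls,
    which is the cost centroid-based MBR reduces."""
    best, best_score = None, float("-inf")
    for hyp in candidates:
        score = sum(utility(hyp, ref) for ref in candidates) / len(candidates)
        if score > best_score:
            best, best_score = hyp, score
    return best

# Toy usage with token overlap standing in for COMET.
def overlap(hyp, ref):
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

print(mbr_decode(["the cat sat", "a cat sat", "dogs run fast"], overlap))
```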

Generating Diverse Translation with Perturbed kNN-MT

no code implementations14 Feb 2024 Yuto Nishida, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe

Generating multiple translation candidates would enable users to choose the one that satisfies their needs.

Machine Translation Translation
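
As background, a sketch of the kNN-MT interpolation this work builds on; the `noise` argument is a hedged stand-in for the paper's perturbation, added on the assumption that perturbing the retrieval query changes which neighbors are retrieved and hence diversifies candidates.

```python
import numpy as np

def knn_mt_probs(p_nmt, query, keys, values, vocab_size,
                 k=4, temp=10.0, lam=0.5, noise=0.0, rng=None):
    """Mix the NMT output distribution with a distribution induced from the
    k nearest (hidden-state key -> target-token value) datastore entries.
    A nonzero `noise` perturbs the query vector (hypothetical stand-in for
    the paper's perturbation), changing which neighbors are retrieved."""
    if noise > 0.0:
        rng = rng or np.random.default_rng()
        query = query + rng.normal(scale=noise, size=query.shape)
    dists = np.linalg.norm(keys - query, axis=1)   # L2 distance to all keys
    nn = np.argsort(dists)[:k]                     # k nearest neighbors
    w = np.exp(-dists[nn] / temp)
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    for weight, i in zip(w, nn):
        p_knn[values[i]] += weight                 # mass on neighbor tokens
    return lam * p_knn + (1.0 - lam) * p_nmt
```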

knn-seq: Efficient, Extensible kNN-MT Framework

1 code implementation18 Oct 2023 Hiroyuki Deguchi, Hayate Hirano, Tomoki Hoshino, Yuto Nishida, Justin Vasselli, Taro Watanabe

We publish our knn-seq as an MIT-licensed open-source project and the code is available at https://github.com/naist-nlp/knn-seq.

Machine Translation NMT +1

Model-based Subsampling for Knowledge Graph Completion

1 code implementation17 Sep 2023 Xincan Feng, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Subsampling is effective in Knowledge Graph Embedding (KGE) for reducing overfitting caused by the sparsity in Knowledge Graph (KG) datasets.

Knowledge Graph Completion Knowledge Graph Embedding
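
As background, a sketch of count-based subsampling for KGE training, which down-weights frequent triples. The inverse-power weighting and the (h, r)/(r, t) count approximation are assumptions in the spirit of word2vec-style subsampling; the paper's model-based variant would replace these raw counts with frequencies estimated by a trained KGE model.

```python
from collections import Counter

def subsampling_weights(triples, alpha=0.5):
    """Down-weight frequent triples with an inverse-power weight.
    Counting (h, r) and (r, t) pairs approximates triple frequency, since
    exact (h, r, t) counts in a KG are almost always 1; a model-based
    variant would substitute model-estimated frequencies for these counts."""
    counts = Counter()
    for h, r, t in triples:
        counts[("hr", h, r)] += 1
        counts[("rt", r, t)] += 1
    weights = {
        (h, r, t): (counts[("hr", h, r)] + counts[("rt", r, t)]) ** -alpha
        for h, r, t in triples
    }
    z = sum(weights.values())
    return {triple: w / z for triple, w in weights.items()}
```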

Japanese Lexical Complexity for Non-Native Readers: A New Dataset

2 code implementations30 Jun 2023 Yusuke Ide, Masato Mita, Adam Nohejl, Hiroki Ouchi, Taro Watanabe

Lexical complexity prediction (LCP) is the task of predicting the complexity of words in a text on a continuous scale.

Lexical Complexity Prediction

Second Language Acquisition of Neural Language Models

1 code implementation5 Jun 2023 Miyu Oba, Tatsuki Kuribayashi, Hiroki Ouchi, Taro Watanabe

With the success of neural language models (LMs), how they acquire language has gained much attention.

Cross-Lingual Transfer Language Acquisition

Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models

1 code implementation3 Jun 2023 Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

This task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity with a caption and a table containing related knowledge of the entity.

Image Generation

Arukikata Travelogue Dataset

no code implementations19 May 2023 Hiroki Ouchi, Hiroyuki Shindo, Shoko Wakamiya, Yuki Matsuda, Naoya Inoue, Shohei Higashiyama, Satoshi Nakamura, Taro Watanabe

We have constructed the Arukikata Travelogue Dataset and released it free of charge for academic research.

Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning

1 code implementation6 Dec 2022 Ukyo Honda, Taro Watanabe, Yuji Matsumoto

Discriminativeness is a desirable feature of image captions: captions should describe the characteristic details of input images.

Image Captioning reinforcement-learning +1

$N$-gram Is Back: Residual Learning of Neural Text Generation with $n$-gram Language Model

1 code implementation26 Oct 2022 Huayang Li, Deng Cai, Jin Xu, Taro Watanabe

The combination of $n$-gram and neural LMs not only allows the neural part to focus on the deeper understanding of language but also provides a flexible way to customize an LM by switching the underlying $n$-gram model without changing the neural model.

Domain Adaptation Language Modelling +2
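
A hedged sketch of one way to realize the combination the snippet describes: the n-gram LM contributes a base log-probability and the neural model a residual correction, so swapping the n-gram model re-targets the LM without retraining the neural part. The log-linear form shown is an illustrative assumption, not necessarily the paper's exact residual formulation.

```python
import numpy as np

def combined_log_probs(log_p_ngram, neural_logits):
    """Log-linear combination over the vocabulary: the neural logits act
    as a residual correction on the n-gram base distribution. Swapping
    `log_p_ngram` (e.g., for a domain-specific n-gram LM) customizes the
    output without retraining the neural model."""
    scores = log_p_ngram + neural_logits
    return scores - np.logaddexp.reduce(scores)  # renormalize in log space
```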

Adapting to Non-Centered Languages for Zero-shot Multilingual Translation

1 code implementation COLING 2022 Zhi Qu, Taro Watanabe

Multilingual neural machine translation can translate unseen language pairs during training, i.e., zero-shot translation.

Machine Translation Translation

Improved Decomposition Strategy for Joint Entity and Relation Extraction

no code implementations Journal of Natural Language Processing 2021 Van-Hien Tran, Van-Thuy Phi, Akihiko Kato, Hiroyuki Shindo, Taro Watanabe, Yuji Matsumoto

A recent study (Yu et al. 2020) proposed a novel decomposition strategy that splits the task into two interrelated subtasks: detection of the head-entity (HE) and identification of the corresponding tail-entity and relation (TER) for each extracted head-entity.

Joint Entity and Relation Extraction Relation +1

Transductive Data Augmentation with Relational Path Rule Mining for Knowledge Graph Embedding

no code implementations1 Nov 2021 Yushi Hirose, Masashi Shimbo, Taro Watanabe

For knowledge graph completion, two major types of prediction models exist: one based on graph embeddings, and the other based on relation path rule induction.

Data Augmentation Knowledge Graph Completion +2
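
A toy sketch of the relation-path-rule idea behind this kind of augmentation: if a two-hop path (r1, r2) reliably co-occurs with a direct relation r, new triples can be synthesized along that path. The rule shape and confidence threshold here are illustrative assumptions, not the paper's exact mining procedure.

```python
from collections import defaultdict

def augment_with_path_rules(triples, min_conf=0.8):
    """Mine rules (r1, r2) => r from two-hop paths and add the predicted
    triples whose rule confidence clears `min_conf`."""
    out = defaultdict(list)                 # head -> [(relation, tail)]
    direct = set(triples)
    for h, r, t in triples:
        out[h].append((r, t))

    support, hits = defaultdict(int), defaultdict(int)
    for h, r1, x in triples:                # path: h -r1-> x -r2-> t
        for r2, t in out[x]:
            support[(r1, r2)] += 1
            for r, t2 in out[h]:            # does a direct edge agree?
                if t2 == t:
                    hits[(r1, r2, r)] += 1

    augmented = set()
    for (r1, r2, r), n in hits.items():
        if n / support[(r1, r2)] < min_conf:
            continue
        for h, ra, x in triples:            # fire the rule everywhere
            if ra != r1:
                continue
            for rb, t in out[x]:
                if rb == r2 and (h, r, t) not in direct:
                    augmented.add((h, r, t))
    return augmented
```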

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

1 code implementation EACL 2021 Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto

Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and object labels detected from the images.

Image Captioning image-sentence alignment +2

Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

no code implementations WS 2018 Wei Wang, Taro Watanabe, Macduff Hughes, Tetsuji Nakagawa, Ciprian Chelba

Measuring the domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not yet well studied.

Denoising Machine Translation +2

Phrase-based Machine Translation using Multiple Preordering Candidates

no code implementations COLING 2016 Yusuke Oda, Taku Kudo, Tetsuji Nakagawa, Taro Watanabe

In this paper, we propose a new decoding method for phrase-based statistical machine translation which directly uses multiple preordering candidates as a graph structure.

Machine Translation Translation
