1 code implementation • LREC 2022 • Josiah Wang, Pranava Madhyastha, Josiel Figueiredo, Chiraag Lala, Lucia Specia
The dataset will benefit research on visual grounding of words, especially in the context of free-form sentences, and can be obtained from https://doi.org/10.5281/zenodo.5034604 under a Creative Commons licence.
Ranked #1 on Multimodal Text Prediction on MultiSubs
no code implementations • EMNLP (IWSLT) 2019 • Zixiu Wu, Ozan Caglayan, Julia Ive, Josiah Wang, Lucia Specia
In extensive experiments, we found that (i) the explored visual integration schemes often harm translation performance for the transformer and additive deliberation models, but considerably improve cascade deliberation; and (ii) the transformer and cascade deliberation integrate the visual modality better than additive deliberation, as shown by the incongruence analysis.
no code implementations • 16 Oct 2019 • Ozan Caglayan, Zixiu Wu, Pranava Madhyastha, Josiah Wang, Lucia Specia
This paper describes the Imperial College London team's submission to the 2019 VATEX video captioning challenge, where we first explore two sequence-to-sequence models, namely a recurrent (GRU) model and a transformer model, which generate captions from the I3D action features.
1 code implementation • ICCV 2019 • Josiah Wang, Lucia Specia
Localizing phrases in images is an important part of image understanding and can be useful in many applications that require mappings between textual and visual information.
no code implementations • 5 Aug 2019 • Zixiu Wu, Julia Ive, Josiah Wang, Pranava Madhyastha, Lucia Specia
The question we ask ourselves is whether visual features can support the translation process. In particular, since this dataset is extracted from videos, we focus on the translation of actions, which we believe are poorly captured in the static image-text datasets currently used for multimodal translation.
no code implementations • ACL 2019 • Pranava Madhyastha, Josiah Wang, Lucia Specia
It estimates the faithfulness of a generated caption with respect to the content of the actual image, based on the semantic similarity between labels of objects depicted in images and words in the description.
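The abstract does not spell out the metric, but the idea of scoring faithfulness via semantic similarity between detected object labels and caption words can be sketched as follows. The word vectors, names, and aggregation rule below are illustrative assumptions, not the authors' actual implementation (which would use pretrained embeddings and real object detections):

```python
from math import sqrt

# Toy word vectors standing in for pretrained embeddings (illustrative only).
VECS = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.2, 0.05],
    "car":   [0.0, 0.9, 0.1],
    "grass": [0.1, 0.0, 0.9],
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def faithfulness(object_labels, caption_words):
    """Mean, over caption words, of each word's best similarity to any
    object label detected in the image."""
    scores = []
    for w in caption_words:
        if w not in VECS:
            continue  # skip out-of-vocabulary words in this toy setup
        scores.append(max(cos(VECS[w], VECS[o])
                          for o in object_labels if o in VECS))
    return sum(scores) / len(scores) if scores else 0.0

# A caption mentioning a puppy on grass scores higher against a
# dog-and-grass image than one mentioning a car.
faithful = faithfulness(["dog", "grass"], ["puppy", "grass"])
unfaithful = faithfulness(["dog", "grass"], ["car"])
```

A real system would also need to handle stop words and multi-word labels; this sketch only shows the similarity-aggregation core.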
1 code implementation • WS 2018 • Pranava Swaroop Madhyastha, Josiah Wang, Lucia Specia
We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn 'distributional similarity' in a multimodal feature space, by mapping a test image to similar training images in this space and generating a caption from the same space.
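The distributional-similarity hypothesis can be caricatured as pure nearest-neighbour retrieval: caption a test image with the caption of its closest training image in feature space. The features and captions below are invented for illustration; the systems studied in the paper are neural generators, not literal retrieval systems:

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical training set of (image feature vector, caption) pairs.
TRAIN = [
    ([1.0, 0.0, 0.1], "a dog running on grass"),
    ([0.0, 1.0, 0.2], "a red car on the road"),
]

def retrieve_caption(test_feature):
    """Caption a test image by copying the caption of its nearest
    training image in the shared feature space."""
    return max(TRAIN, key=lambda p: cosine(p[0], test_feature))[1]

retrieve_caption([0.9, 0.1, 0.1])  # → "a dog running on grass"
```

If a neural captioner's outputs are hard to distinguish from such a retrieval baseline, that supports the distributional-similarity reading of its behaviour.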
no code implementations • 11 Sep 2018 • Pranava Madhyastha, Josiah Wang, Lucia Specia
We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn `distributional similarity' in a multimodal feature space by mapping a test image to similar training images in this space and generating a caption from the same space.
1 code implementation • NAACL 2018 • Pranava Madhyastha, Josiah Wang, Lucia Specia
We address the task of detecting foiled image captions, i.e., identifying whether a caption contains a word that has been deliberately replaced by a semantically similar word, thus rendering it inaccurate with respect to the image being described.
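One intuition behind foil detection is that the replaced word, while semantically similar to the original, is less supported by what is actually visible in the image. The toy embeddings and the min-over-max rule below are illustrative assumptions, not the paper's model:

```python
from math import sqrt

# Toy word vectors standing in for pretrained embeddings (illustrative only).
VECS = {
    "dog":   [0.9, 0.1, 0.0],
    "cat":   [0.8, 0.3, 0.0],
    "grass": [0.1, 0.0, 0.9],
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def likely_foil(object_labels, caption_words):
    """Flag the caption word least supported by any object label
    detected in the image as the candidate foiled word."""
    return min(caption_words,
               key=lambda w: max(cos(VECS[w], VECS[o]) for o in object_labels))

# The image shows a dog on grass; "dog" was foiled to "cat" in the caption.
likely_foil(["dog", "grass"], ["cat", "grass"])  # → "cat"
```

Note how subtle the task is: because foils are chosen to be semantically close ("cat" vs "dog"), the margin separating the foil from genuine caption words is small.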
no code implementations • NAACL 2018 • Josiah Wang, Pranava Madhyastha, Lucia Specia
The use of explicit object detectors as an intermediate step to image captioning - which used to constitute an essential stage in early work - is often bypassed in the currently dominant end-to-end approaches, where the language model is conditioned directly on a mid-level image embedding.
no code implementations • 9 Jan 2018 • Yu-Xing Tang, Josiah Wang, Xiaofang Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, Liming Chen
This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations.
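The transfer step described above can be sketched very roughly in the spirit of classifier-to-detector adaptation approaches such as LSDA: learn how detector weights differ from classifier weights on the fully annotated categories, then apply that learned difference to categories with only image-level labels. The weight vectors and the simple mean-shift rule are illustrative assumptions, not the paper's exact method:

```python
# Hypothetical weight vectors for categories that have both image-level
# classifiers and box-level detectors (illustrative values).
clf = {"cat": [1.0, 0.2], "dog": [0.8, 0.4]}
det = {"cat": [0.7, 0.1], "dog": [0.5, 0.3]}

# Mean classifier-to-detector difference over the supervised categories.
dims = len(next(iter(clf.values())))
diff = [0.0] * dims
for c in clf:
    for i in range(dims):
        diff[i] += det[c][i] - clf[c][i]
diff = [d / len(clf) for d in diff]

def classifier_to_detector(w_clf):
    """Convert a classifier for a category without box annotations into
    an approximate detector by applying the learned mean shift."""
    return [w + d for w, d in zip(w_clf, diff)]

# "bird" has only an image-level classifier; derive a detector for it.
det_bird = classifier_to_detector([0.9, 0.3])
```

Real adaptation methods learn a richer transformation (and exploit category similarity), but the core idea of transferring the classifier/detector discrepancy is the same.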
1 code implementation • ICLR 2018 • Pranava Madhyastha, Josiah Wang, Lucia Specia
We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn ‘distributional similarity’ in a multimodal feature space, by mapping a test image to similar training images in this space and generating a caption from the same space.
no code implementations • CVPR 2016 • Yu-Xing Tang, Josiah Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, Liming Chen
This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations.
no code implementations • LREC 2016 • Josiah Wang, Robert Gaizauskas
The task of automatically generating sentential descriptions of image content has become increasingly popular in recent years, resulting in the development of large-scale image description datasets and the proposal of various metrics for evaluating image description generation systems.