Search Results for author: Jialin Wu

Found 23 papers, 7 papers with code

GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning

no code implementations 19 Dec 2023 Mehran Kazemi, Hamidreza Alvari, Ankit Anand, Jialin Wu, Xi Chen, Radu Soricut

In this paper, we evaluate the reasoning capabilities of VLMs along various axes through the lens of geometry problems.

Mathematical Reasoning

CausalLM is not optimal for in-context learning

1 code implementation 14 Aug 2023 Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut

Recent empirical evidence indicates that transformer-based in-context learning performs better with a prefix language model (prefixLM), in which all in-context samples can attend to each other, than with a causal language model (causalLM), whose auto-regressive attention prevents in-context samples from attending to future samples.

In-Context Learning Language Modelling
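The prefixLM vs. causalLM distinction contrasted in the abstract comes down to the shape of the attention mask. As a rough illustration (function names and the NumPy formulation are mine, not the paper's), the two masks can be built like this:

```python
import numpy as np

def causal_mask(n):
    # causalLM: token i may attend only to tokens j <= i
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_mask(n, prefix_len):
    # prefixLM: tokens inside the prefix (the in-context samples)
    # attend to each other bidirectionally; tokens after the prefix
    # keep the usual causal restriction
    m = causal_mask(n)
    m[:prefix_len, :prefix_len] = True
    return m

# 5 tokens, the first 3 forming the in-context prefix
print(prefix_mask(5, 3).astype(int))
```

Under the causal mask, an early in-context sample can never see a later one; the prefix mask removes exactly that restriction within the prefix block, which is the difference the paper analyzes.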

Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

no code implementations 18 Oct 2022 Jialin Wu, Raymond J. Mooney

To address these issues, we propose an Entity-Focused Retrieval (EnFoRe) model that provides stronger supervision during training and recognizes question-relevant entities to help retrieve more specific knowledge.

Passage Retrieval Question Answering +2

Possibilities and Implications of the Multi-AI Competition

no code implementations 1 Sep 2022 Jialin Wu

The possibility of super-AIs taking over the world has been intensively studied by numerous scholars.

Breaking Down Questions for Outside-Knowledge VQA

no code implementations 29 Sep 2021 Jialin Wu, Ray Mooney

While general Visual Question Answering (VQA) focuses on querying visual content within an image, there is a recent trend towards Knowledge-Based VQA (KB-VQA) where a system needs to link some aspects of the question to different types of knowledge beyond the image, such as commonsense concepts and factual information.

Question Answering Visual Question Answering

Multi-Modal Answer Validation for Knowledge-Based VQA

1 code implementation 23 Mar 2021 Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi

Instead of searching for the answer in a vast collection of often irrelevant facts as most existing approaches do, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source.

Question Answering Retrieval +1

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

no code implementations 22 Jan 2021 Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong-Whan Lee

We quantitatively and qualitatively evaluated the proposed method on the VQA v2 dataset and compared it with state-of-the-art methods in terms of answer prediction.

Question Answering Referring Expression +2

CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation

no code implementations COLING 2020 Jungjun Kim, Hanbin Ko, Jialin Wu

These highly related neighbors are determined by an attentional ranking module and serve as complementary features, highlighting the discriminating aspects of the target object.

Object Referring Expression +1

Improving VQA and its Explanations by Comparing Competing Explanations

no code implementations 28 Jun 2020 Jialin Wu, Liyan Chen, Raymond J. Mooney

Most recent state-of-the-art Visual Question Answering (VQA) systems are opaque black boxes that are only trained to fit the answer distribution given the question and visual content.

Question Answering Visual Question Answering

Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder

no code implementations 31 Oct 2019 Jialin Wu, Raymond J. Mooney

Most RNN-based image captioning models receive supervision on the output words to mimic human captions.

Image Captioning Sentence

Self-Critical Reasoning for Robust Visual Question Answering

1 code implementation NeurIPS 2019 Jialin Wu, Raymond J. Mooney

Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors and fail to generalize to test data with a significantly different question-answer (QA) distribution.

Question Answering Visual Question Answering

Image Score: How to Select Useful Samples

no code implementations ICLR 2019 Simiao Zuo, Jialin Wu

There have long been debates on how we can interpret neural networks and understand the decisions our models make.

Decision Making

Joint Image Captioning and Question Answering

no code implementations 22 May 2018 Jialin Wu, Zeyuan Hu, Raymond J. Mooney

Answering visual questions requires acquiring everyday commonsense knowledge and modeling the semantic connections among different parts of an image, which is difficult for VQA systems to learn with only answers as supervision.

Image Captioning Question Answering +1

Dynamic Filtering with Large Sampling Field for ConvNets

no code implementations ECCV 2018 Jialin Wu, Dai Li, Yu Yang, Chandrajit Bajaj, Xiangyang Ji

We propose a dynamic filtering strategy with large sampling field for ConvNets (LS-DFN), where the position-specific kernels learn from not only the identical position but also multiple sampled neighbor regions.

Object Detection +3
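The core LS-DFN idea, position-specific kernels that aggregate not only the identical position but also sampled neighbor regions, can be caricatured in a few lines. This is a toy 1-D sketch under my own assumptions (the offsets, weight generator, and shapes are illustrative, not the paper's formulation):

```python
import numpy as np

def dynamic_filter_1d(x, weight_gen, neighbors=(-2, -1, 0, 1, 2)):
    """Toy 1-D dynamic filtering: each position gets its own kernel,
    predicted from the feature at that position, and aggregates values
    from a set of sampled neighbor offsets (the 'large sampling field')."""
    n = len(x)
    out = np.zeros(n)
    for i in range(n):
        w = weight_gen(x[i])                 # position-specific kernel weights
        for w_k, off in zip(w, neighbors):
            j = min(max(i + off, 0), n - 1)  # clamp offsets at the borders
            out[i] += w_k * x[j]
    return out

def weight_gen(v, k=5):
    # hypothetical weight generator: softmax over a linear map of the feature
    logits = v * np.arange(k)
    e = np.exp(logits - logits.max())
    return e / e.sum()

y = dynamic_filter_1d(np.array([0.0, 1.0, 2.0, 3.0]), weight_gen)
```

The contrast with an ordinary convolution is that here the kernel weights are a function of the input at each position rather than fixed learned parameters shared across all positions.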

Action Recognition with Joint Attention on Multi-Level Deep Features

no code implementations 9 Jul 2016 Jialin Wu, Gu Wang, Wukui Yang, Xiangyang Ji

We propose a novel deep supervised neural network for the task of action recognition in videos, which implicitly takes advantage of visual tracking and shares the robustness of both deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN).

Action Recognition In Videos Temporal Action Localization +1
