Search Results for author: Xin Eric Wang

Found 50 papers, 26 papers with code

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler

no code implementations ECCV 2020 Tsu-Jui Fu, Xin Eric Wang, Matthew F. Peterson, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang

In particular, we present a model-agnostic adversarial path sampler (APS) that learns to sample challenging paths that force the navigator to improve based on the navigation performance.

counterfactual Counterfactual Reasoning +2

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

no code implementations 8 Apr 2024 Jing Gu, Yilin Wang, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

Compared with existing methods for personalized subject swapping, SwapAnything has three unique advantages: (1) precise control of arbitrary objects and parts rather than the main subject, (2) more faithful preservation of context pixels, (3) better adaptation of the personalized concept to the image.

Image Generation Object

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA

no code implementations 29 Jan 2024 Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang

Our evaluation shows that questions in the MultipanelVQA benchmark pose significant challenges to the state-of-the-art Large Vision Language Models (LVLMs) tested, even though humans can attain approximately 99% accuracy on these questions.

Benchmarking Image Comprehension +4

LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

1 code implementation 5 Oct 2023 Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang

In this study, we introduce a new LLM-Coordination Benchmark aimed at a detailed analysis of LLMs within the context of Pure Coordination Games, where participating agents need to cooperate for the most gain.

Multiple-choice Question Answering

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

1 code implementation 3 Oct 2023 Kaizhi Zheng, Xuehai He, Xin Eric Wang

The effectiveness of Multimodal Large Language Models (MLLMs) demonstrates a profound capability in multimodal understanding.

Image Generation multimodal generation +2

T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation

no code implementations 1 Jun 2023 Jialu Wang, Xinyue Gabby Liu, Zonglin Di, Yang Liu, Xin Eric Wang

In this work, we seek to measure more complex human biases that exist in the task of text-to-image generation.

Text-to-Image Generation

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation NeurIPS 2023 Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves performance comparable to human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis Text-to-Image Generation

R2H: Building Multimodal Navigation Helpers that Respond to Help Requests

no code implementations 23 May 2023 Yue Fan, Jing Gu, Kaizhi Zheng, Xin Eric Wang

Intelligent navigation-helper agents are critical as they can navigate users in unknown areas through environmental awareness and conversational ability, serving as potential accessibility tools for individuals with disabilities.

Benchmarking Language Modelling +3

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

no code implementations 18 May 2023 Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

We conduct a series of experiments to compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.

Text Generation Text-to-Image Generation

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

1 code implementation NeurIPS 2023 Yujie Lu, Xianjun Yang, Xiujun Li, Xin Eric Wang, William Yang Wang

Existing automatic evaluation on text-to-image synthesis can only provide an image-text matching score, without considering the object-level compositionality, which results in poor correlation with human judgments.

Attribute Image Generation +2

Multimodal Procedural Planning via Dual Text-Image Prompting

1 code implementation 2 May 2023 Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang

The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.

Informativeness Text-to-Image Generation

Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment

1 code implementation 2 May 2023 Zhen Zhang, Jialu Wang, Xin Eric Wang

Extensive experiments on XTD and Multi30K datasets, covering 11 languages under zero-shot, few-shot, and full-dataset learning scenarios, show that our framework significantly reduces the multilingual disparities among languages and improves cross-lingual transfer results, especially in low-resource scenarios, while only keeping and fine-tuning an extremely small number of parameters compared to the full model (e.g., our framework only requires 0.16% additional parameters of the full model for each language in the few-shot learning scenario).

Cross-Lingual Transfer Few-Shot Learning +1

Multimodal Graph Transformer for Multimodal Question Answering

no code implementations 30 Apr 2023 Xuehai He, Xin Eric Wang

Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly.

Question Answering

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

no code implementations 30 Jan 2023 Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang

Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments.

Efficient Exploration Language Modelling +2

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

1 code implementation 9 Dec 2022 Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.

Attribute Image Generation

Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning

no code implementations 27 Nov 2022 Yunchao Zhang, Zonglin Di, Kaiwen Zhou, Cihang Xie, Xin Eric Wang

However, since the local data is inaccessible to the server under federated learning, attackers may easily poison the training data of the local client to build a backdoor in the agent without notice.

Federated Learning Navigate +1

ComCLIP: Training-Free Compositional Image and Text Matching

1 code implementation 25 Nov 2022 Kenan Jiang, Xuehai He, Ruize Xu, Xin Eric Wang

Contrastive Language-Image Pretraining (CLIP) has demonstrated great zero-shot performance for matching images and text.

Image-text matching Retrieval +2

CPL: Counterfactual Prompt Learning for Vision and Language Models

no code implementations 19 Oct 2022 Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.

counterfactual Visual Question Answering

Anticipating the Unseen Discrepancy for Vision and Language Navigation

no code implementations 10 Sep 2022 Yujie Lu, Huiliang Zhang, Ping Nie, Weixi Feng, Wenda Xu, Xin Eric Wang, William Yang Wang

In this paper, we propose an Unseen Discrepancy Anticipating Vision and Language Navigation (DAVIS) that learns to generalize to unseen environments via encouraging test-time visual consistency.

Data Augmentation Decision Making +3

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

no code implementations 28 Aug 2022 Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang

Building a conversational embodied agent to execute real-life tasks has been a long-standing yet quite challenging research goal, as it requires effective human-agent communication, multi-modal understanding, long-range sequential decision making, etc.

Action Generation Common Sense Reasoning +1

Understanding Instance-Level Impact of Fairness Constraints

1 code implementation 30 Jun 2022 Jialu Wang, Xin Eric Wang, Yang Liu

A variety of fairness constraints have been proposed in the literature to mitigate group-level statistical bias.

Fairness

VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation

no code implementations 17 Jun 2022 Kaizhi Zheng, Xiaotong Chen, Odest Chadwicke Jenkins, Xin Eric Wang

We hope the new simulator and benchmark will facilitate future research on language-guided robotic manipulation.

Object

Neuro-Symbolic Procedural Planning with Commonsense Prompting

no code implementations 6 Jun 2022 Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Procedural planning aims to implement complex high-level goals by decomposition into sequential simpler low-level steps.

Graph Sampling

Aerial Vision-and-Dialog Navigation

2 code implementations 24 May 2022 Yue Fan, Winson Chen, Tongzhou Jiang, Chun Zhou, Yi Zhang, Xin Eric Wang

To this end, we introduce Aerial Vision-and-Dialog Navigation (AVDN), to navigate a drone via natural language conversation.

Navigate

Imagination-Augmented Natural Language Understanding

1 code implementation NAACL 2022 Yujie Lu, Wanrong Zhu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations.

Natural Language Understanding

Parameter-efficient Model Adaptation for Vision Transformers

2 code implementations 29 Mar 2022 Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang

In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task.

Benchmarking Classification +2

FedVLN: Privacy-preserving Federated Vision-and-Language Navigation

1 code implementation 28 Mar 2022 Kaiwen Zhou, Xin Eric Wang

Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world.

Privacy Preserving Vision and Language Navigation

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

1 code implementation CVPR 2022 Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang

To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

1 code implementation ACL 2022 Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks.

Vision and Language Navigation

Relational Graph Learning for Grounded Video Description Generation

no code implementations 2 Dec 2021 Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haocheng Shi, Jun Xiao, Yueting Zhuang, William Yang Wang

Such a setting can help explain the decisions of captioning models and prevent the model from hallucinating object words in its description.

Graph Learning Hallucination +2

Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search

1 code implementation EMNLP 2021 Jialu Wang, Yang Liu, Xin Eric Wang

Internet search affects people's cognition of the world, so mitigating biases in search results and learning fair models is imperative for social good.

Image Retrieval Natural Language Queries

CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection

1 code implementation 21 Jun 2021 Swati Jindal, Xin Eric Wang

However, adopting such generative models to new domains while maintaining their ability to provide fine-grained control over different image attributes, e.g., gaze and head pose directions, has been a challenging problem.

Benchmarking gaze redirection +3

Assessing Multilingual Fairness in Pre-trained Multimodal Representations

no code implementations Findings (ACL) 2022 Jialu Wang, Yang Liu, Xin Eric Wang

To answer these questions, we view language as the fairness recipient and introduce two new fairness notions, multilingual individual fairness and multilingual group fairness, for pre-trained multimodal models.

Fairness

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation

no code implementations 10 Jun 2021 Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang

Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with text references.

nlg evaluation Text Generation

Language-Driven Image Style Transfer

1 code implementation 1 Jun 2021 Tsu-Jui Fu, Xin Eric Wang, William Yang Wang

We propose contrastive language visual artist (CLVA) that learns to extract visual semantics from style instructions and accomplish LDAST by the patch-wise style discriminator.

Style Transfer

M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers

no code implementations CVPR 2022 Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang

LBVE contains two features: 1) the scenario of the source video is preserved instead of generating a completely different video; 2) the semantics are presented differently in the target video, and all changes are controlled by the given instruction.

Video Editing Video Understanding

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

no code implementations EACL 2021 An Yan, Xin Eric Wang, Tsu-Jui Fu, William Yang Wang

Recent advances in language and vision push forward the research of captioning a single image to describing visual differences between image pairs.

Image Captioning

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

no code implementations EMNLP 2020 Wanrong Zhu, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings.

Text Generation

SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning

1 code implementation EMNLP 2020 Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, William Yang Wang

In this paper, we introduce a Self-Supervised Counterfactual Reasoning (SSCR) framework that incorporates counterfactual thinking to overcome data scarcity.

counterfactual Counterfactual Reasoning

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

1 code implementation EACL 2021 Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

Outdoor vision-and-language navigation (VLN) is such a task where an agent follows natural language instructions and navigates a real-life urban environment.

Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)

Style Transfer Text Style Transfer +1

Environment-agnostic Multitask Learning for Natural Language Grounded Navigation

1 code implementation ECCV 2020 Xin Eric Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi

Recent research efforts enable study for natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog.

Vision-Language Navigation

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling

no code implementations 17 Nov 2019 Tsu-Jui Fu, Xin Eric Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang

In particular, we present a model-agnostic adversarial path sampler (APS) that learns to sample challenging paths that force the navigator to improve based on the navigation performance.

counterfactual Counterfactual Reasoning +2

Cross-Lingual Vision-Language Navigation

2 code implementations 24 Oct 2019 An Yan, Xin Eric Wang, Jiangtao Feng, Lei Li, William Yang Wang

Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics.

Domain Adaptation Navigate +2
