Search Results for author: Yen-Chun Chen

Found 22 papers, 15 papers with code

iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views

1 code implementation • 28 Dec 2023 • Chin-Hsuan Wu, Yen-Chun Chen, Bolivar Solarte, Lu Yuan, Min Sun

Our strategy unfolds in three steps: (1) We invert the diffusion model for camera pose estimation instead of synthesizing novel views.

3D Object Reconstruction Novel View Synthesis +2

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

1 code implementation • 18 Oct 2023 • Cheng-Fu Yang, Yen-Chun Chen, Jianwei Yang, Xiyang Dai, Lu Yuan, Yu-Chiang Frank Wang, Kai-Wei Chang

Additional analysis shows that the contrastive objective and meta-actions are complementary in achieving the best results, and the resulting agent better aligns its states with corresponding instructions, making it more suitable for real-world embodied agents.

Contrastive Learning Instruction Following

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

1 code implementation • NeurIPS 2023 • Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions.

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

1 code implementation • 29 Aug 2022 • Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang

Diffusion models (DMs) have shown great potential for high-quality image synthesis.

Conditional Image Generation Denoising +1

GLIPv2: Unifying Localization and Vision-Language Understanding

1 code implementation • 12 Jun 2022 • Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao

We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning).

Ranked #1 on Phrase Grounding on Flickr30k Entities Test (using extra training data)

Contrastive Learning Image Captioning +7

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

no code implementations • 22 Apr 2022 • Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan

Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data.

Ranked #4 on Visual Question Answering (VQA) on VCR (Q-A) test

Question Answering Visual Commonsense Reasoning +2

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

no code implementations • 15 Jan 2022 • Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan

Experiments demonstrate that our proposed CLIP-TD leads to exceptional gains in the low-shot (up to 51.9%) and domain-shifted (up to 71.3%) conditions of VCR, while simultaneously improving performance under standard fully-supervised conditions (up to 2%), achieving state-of-art performance on VCR compared to other single models that are pretrained with image-text data only.

Question Answering Visual Commonsense Reasoning +2

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

1 code implementation • 8 Jun 2021 • Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu

Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task.

Multi-Task Learning Question Answering +5

Playing Lottery Tickets with Vision and Language

no code implementations • 23 Apr 2021 • Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu

However, we can find "relaxed" winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy.

Question Answering Referring Expression +6

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

2 code implementations • NAACL 2021 • Siqi Sun, Yen-Chun Chen, Linjie Li, Shuohang Wang, Yuwei Fang, Jingjing Liu

Multimodal pre-training has propelled great advancement in vision-and-language research.

Re-Ranking Retrieval +1

Cluster-Former: Clustering-based Sparse Transformer for Question Answering

no code implementations • Findings (ACL) 2021 • Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu

Transformer has become ubiquitous in the deep learning field.

Clustering Question Answering

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

no code implementations • 13 Sep 2020 • Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu

Transformer has become ubiquitous in the deep learning field.

Ranked #1 on Open-Domain Question Answering on SearchQA

Clustering Language Modelling +1

DIALOGPT : Large-Scale Generative Pre-training for Conversational Response Generation

1 code implementation • ACL 2020 • Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan

We present a large, tunable neural conversational response generation model, DIALOGPT (dialogue generative pre-trained transformer).

Conversational Response Generation Response Generation

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

2 code implementations • NeurIPS 2020 • Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.

Ranked #7 on Visual Entailment on SNLI-VE val (using extra training data)

Question Answering Referring Expression +7

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

no code implementations • ECCV 2020 • Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu

To reveal the secrets behind the scene of these powerful models, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e.g., Visual Coreference Resolution, Visual Relation Detection, Linguistic Probing Tasks) generalizable to standard pre-trained V+L models, aiming to decipher the inner workings of multimodal pre-training (e.g., the implicit knowledge garnered in individual attention heads, the inherent cross-modal alignment learned through contextualized multimodal embeddings).

coreference-resolution

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training

3 code implementations • EMNLP 2020 • Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu

We present HERO, a novel framework for large-scale video+language omni-representation learning.

Ranked #1 on Video Retrieval on TVR

Language Modelling Masked Language Modeling +8

Distilling Knowledge Learned in BERT for Text Generation

1 code implementation • ACL 2020 • Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Experiments show that the proposed approach significantly outperforms strong Transformer baselines on multiple language generation tasks such as machine translation and text summarization.

Language Modelling Machine Translation +5

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

6 code implementations • 1 Nov 2019 • Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan

We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer).

Conversational Response Generation Response Generation

UNITER: Learning UNiversal Image-TExt Representations

no code implementations • 25 Sep 2019 • Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding.

Image-text matching Language Modelling +10

UNITER: UNiversal Image-TExt Representation Learning

7 code implementations • ECCV 2020 • Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).

Ranked #3 on Visual Question Answering (VQA) on VCR (Q-A) test

Image-text matching Language Modelling +12

Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension

1 code implementation • ACL 2019 • Yichen Jiang, Nitish Joshi, Yen-Chun Chen, Mohit Bansal

Multi-hop reading comprehension requires the model to explore and connect relevant information from multiple sentences/documents in order to answer the question about the context.

Multi-Hop Reading Comprehension Sentence

Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

3 code implementations • ACL 2018 • Yen-Chun Chen, Mohit Bansal

Inspired by how humans summarize long documents, we propose an accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively (i.e., compresses and paraphrases) to generate a concise overall summary.

Ranked #7 on Text Summarization on CNN / Daily Mail (Anonymized)

Abstractive Text Summarization Sentence +1
