Search Results for author: Owais Khan Mohammed

Found 5 papers, 3 papers with code

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

no code implementations • 1 May 2023 • Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world.

AI Agent, Mixed Reality, +2

Language Is Not All You Need: Aligning Perception with Language Models

1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.

Image Captioning, Language Modelling, +4

18,850

Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks

no code implementations • CVPR 2023 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

A big convergence of language, vision, and multimodal pretraining is emerging.

Cross-Modal Retrieval, Image Captioning, +10

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

2 code implementations • 22 Aug 2022 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

A big convergence of language, vision, and multimodal pretraining is emerging.

Ranked #1 on Visual Reasoning on NLVR2 Test

Cross-Modal Retrieval, Image Captioning, +11

18,836

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

2 code implementations • 3 Nov 2021 • Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei

We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.

Ranked #2 on Image Retrieval on PhotoChat

Image Retrieval, Retrieval, +3

18,825
