Search Results for author: Wenhui Wang

Found 34 papers, 22 papers with code

Pseudo-Masked Language Models for Unified Language Model Pre-Training

1 code implementation • ICML 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Language Modelling Natural Language Understanding +1

18,262

Paper
Code

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

1 code implementation • 27 Feb 2024 • Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).

Paper
Code

When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology

no code implementations • 6 Dec 2023 • Wenhui Wang, Shuming Ma, Hanwen Xu, Naoto Usuyama, Jiayu Ding, Hoifung Poon, Furu Wei

This technical report presents LongViT, a vision Transformer that can process gigapixel images in an end-to-end manner.

Survival Prediction whole slide images

Paper
Add Code

Kosmos-2.5: A Multimodal Literate Model

no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei

We present Kosmos-2. 5, a multimodal literate model for machine reading of text-intensive images.

Reading Comprehension Text Generation

Paper
Add Code

LongNet: Scaling Transformers to 1,000,000,000 Tokens

3 code implementations • 5 Jul 2023 • Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

Scaling sequence length has become a critical demand in the era of large language models.

18,262

Paper
Code

Kosmos-2: Grounding Multimodal Large Language Models to the World

2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e. g., bounding boxes) and grounding text to the visual world.

Ranked #11 on Visual Question Answering on ViP-Bench

Image Captioning In-Context Learning +8

18,265

Paper
Code

Language Is Not All You Need: Aligning Perception with Language Models

1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.

Image Captioning Language Modelling +4

18,262

Paper
Code

Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks

no code implementations • CVPR 2023 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

A big convergence of language, vision, and multimodal pretraining is emerging.

Cross-Modal Retrieval Image Captioning +10

Paper
Add Code

TorchScale: Transformers at Scale

1 code implementation • 23 Nov 2022 • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

Large Transformers have achieved state-of-the-art performance across many tasks.

Language Modelling Machine Translation +1

2,911

Paper
Code

Foundation Transformers

4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

A big convergence of model architectures across language, vision, speech, and multimodal is emerging.

Language Modelling Machine Translation +1

18,262

Paper
Code

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

2 code implementations • 22 Aug 2022 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

A big convergence of language, vision, and multimodal pretraining is emerging.

Ranked #1 on Visual Reasoning on NLVR2 Test

Cross-Modal Retrieval Image Captioning +11

18,270

Paper
Code

Language Models are General-Purpose Interfaces

1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Ranked #2 on Image Captioning on nocaps val

Causal Language Modeling Few-Shot Learning +6

18,262

Paper
Code

VL-BEiT: Generative Vision-Language Pretraining

no code implementations • 2 Jun 2022 • Hangbo Bao, Wenhui Wang, Li Dong, Furu Wei

Our minimalist solution conducts masked prediction on both monomodal and multimodal data with a shared Transformer.

Image Classification Language Modelling +7

Paper
Add Code

TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation

no code implementations • CVPR 2022 • Weihua He, Kaichao You, Zhendong Qiao, Xu Jia, Ziyang Zhang, Wenhui Wang, Huchuan Lu, Yaoyuan Wang, Jianxing Liao

Since event camera is a novel sensor, its potential has not been fulfilled due to the lack of processing algorithms.

Event-based vision

Paper
Add Code

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

no code implementations • 29 Jan 2022 • Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao

Our framework AutoDistil addresses above challenges with the following steps: (a) Incorporates inductive bias and heuristics to partition Transformer search space into K compact sub-spaces (K=3 for typical student sizes of base, small and tiny); (b) Trains one SuperLM for each sub-space using task-agnostic objective (e. g., self-attention distillation) with weight-sharing of students; (c) Lightweight search for the optimal student without re-training.

Inductive Bias Knowledge Distillation +1

Paper
Add Code

Distilled Dual-Encoder Model for Vision-Language Understanding

2 code implementations • 16 Dec 2021 • Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei

We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering.

Question Answering Visual Entailment +2

Paper
Code

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

2 code implementations • 3 Nov 2021 • Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei

We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.

Ranked #2 on Image Retrieval on PhotoChat

Image Retrieval Retrieval +3

18,266

Paper
Code

s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning

1 code implementation • 26 Oct 2021 • Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei

Pretrained bidirectional Transformers, such as BERT, have achieved significant improvements in a wide variety of language understanding tasks, while it is not straightforward to directly apply them for natural language generation.

Abstractive Text Summarization Question Generation +2

18,262

Paper
Code

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains

no code implementations • Findings (ACL) 2021 • Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei

In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains.

Knowledge Distillation

Paper
Add Code

Consistency Regularization for Cross-Lingual Fine-Tuning

1 code implementation • ACL 2021 • Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others.

Machine Translation Question Answering +3

Paper
Code

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

2 code implementations • Findings (ACL) 2021 • Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei

We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.

Relation XLM-R

11,374

Paper
Code

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

4 code implementations • NAACL 2021 • Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, He-Yan Huang, Ming Zhou

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.

Ranked #16 on Zero-Shot Cross-Lingual Transfer on XTREME

Contrastive Learning Cross-Lingual Transfer +2

18,273

Paper
Code

Harvesting and Refining Question-Answer Pairs for Unsupervised QA

1 code implementation • ACL 2020 • Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu

Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models.

Ranked #189 on Question Answering on SQuAD1.1

Few-Shot Learning Question Answering

Paper
Code

Comparing SNNs and RNNs on Neuromorphic Vision Datasets: Similarities and Differences

1 code implementation • 2 May 2020 • Weihua He, Yujie Wu, Lei Deng, Guoqi Li, Haoyu Wang, Yang Tian, Wei Ding, Wenhui Wang, Yuan Xie

Neuromorphic data, recording frameless spike events, have attracted considerable attention for the spatiotemporal information components and the event-driven processing fashion.

Ranked #11 on Gesture Recognition on DVS128 Gesture

Fairness Gesture Recognition

Paper
Code

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

3 code implementations • 28 Feb 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)

Abstractive Text Summarization Language Modelling +3

18,262

Paper
Code

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

1 code implementation • NeurIPS 2020 • Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

The small model (student) is trained by deeply mimicking the self-attention module, which plays a vital role in Transformer networks, of the large model (teacher).

Ranked #8 on Zero-shot Text Search on BEIR

Zero-shot Text Search

18,262

Paper
Code

Inspecting Unification of Encoding and Matching with Transformer: A Case Study of Machine Reading Comprehension

no code implementations • WS 2019 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Lei Cui, Songhao Piao, Ming Zhou

Most machine reading comprehension (MRC) models separately handle encoding and matching with different network architectures.

Machine Reading Comprehension

Paper
Add Code

Cross-Lingual Natural Language Generation via Pre-Training

1 code implementation • 23 Sep 2019 • Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, He-Yan Huang

In this work we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages.

Abstractive Text Summarization Machine Translation +5

128

Paper
Code

Learning to Ask Unanswerable Questions for Machine Reading Comprehension

no code implementations • ACL 2019 • Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu

We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset.

Data Augmentation Machine Reading Comprehension +2

Paper
Add Code

Unified Language Model Pre-training for Natural Language Understanding and Generation

9 code implementations • NeurIPS 2019 • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.

Ranked #2 on Generative Question Answering on CoQA (using extra training data)

Abstractive Text Summarization Document Summarization +7

18,262

Paper
Code

Improved Dependency Parsing using Implicit Word Connections Learned from Unlabeled Data

no code implementations • EMNLP 2018 • Wenhui Wang, Baobao Chang, Mairgup Mansur

Pre-trained word embeddings and language model have been shown useful in a lot of tasks.

Dependency Parsing Feature Engineering +4

Paper
Add Code

Multiway Attention Networks for Modeling Sentence Pairs

1 code implementation • IJCAI 2018 • Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou

Modeling sentence pairs plays the vital role for judging the relationship between two sentences, such as paraphrase identification, natural language inference, and answer sentence selection.

Ranked #11 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)