Search Results for author: Shuohang Wang

Found 60 papers, 37 papers with code

Multi-LoRA Composition for Image Generation

no code implementations 26 Feb 2024 Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen

Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images.

Denoising · Image Generation
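
Since LoRA adds a low-rank update to a frozen weight, a minimal sketch of folding several LoRA modules into one base matrix is shown below. This is the naive weight-merging baseline, not the decoding-time composition strategies studied in the paper above; all shapes, names, and scales are illustrative.

```python
import torch

def merge_loras(base_weight, loras, scales):
    """Fold several LoRA updates into a frozen base weight (naive merging baseline).

    base_weight: (out_dim, in_dim) frozen weight matrix
    loras:       list of (A, B) pairs with A: (r, in_dim), B: (out_dim, r)
    scales:      per-LoRA scaling factors (e.g. alpha / r)
    """
    merged = base_weight.clone()
    for (A, B), s in zip(loras, scales):
        merged += s * (B @ A)          # low-rank update: (out, r) @ (r, in)
    return merged

# Toy usage with hypothetical shapes: two rank-4 LoRAs merged at equal weight.
W = torch.randn(64, 32)
loras = [(torch.randn(4, 32), torch.randn(64, 4)) for _ in range(2)]
W_combined = merge_loras(W, loras, scales=[0.5, 0.5])
```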

SciAgent: Tool-augmented Language Models for Scientific Reasoning

no code implementations 18 Feb 2024 Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen

To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning.

The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions

1 code implementation 19 Oct 2023 Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han

Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks.

Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

no code implementations 19 Oct 2023 Zhihan Zhang, Shuohang Wang, Wenhao Yu, Yichong Xu, Dan Iter, Qingkai Zeng, Yang Liu, Chenguang Zhu, Meng Jiang

Large language models (LLMs) can perform a wide range of tasks by following natural language instructions, without task-specific fine-tuning.

Sparse Modular Activation for Efficient Sequence Modeling

1 code implementation NeurIPS 2023 Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai

To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM.

Chunking · Long-range modeling
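
A rough PyTorch sketch of the sparse-activation idea: a learned gate computed from per-token state features decides which tokens are routed through an attention unit, while the remaining tokens pass through unchanged. The linear gate, the hard threshold, and the use of `nn.MultiheadAttention` in place of a Gated Attention Unit are all simplifications for illustration, not the SeqBoat architecture itself.

```python
import torch
import torch.nn as nn

class SparseActivationBlock(nn.Module):
    """Sparsely activate an attention unit based on per-token state features."""

    def __init__(self, dim, threshold=0.5):
        super().__init__()
        self.gate = nn.Linear(dim, 1)      # stand-in for a gate over SSM states
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.threshold = threshold

    def forward(self, states):                                   # states: (B, T, D)
        probs = torch.sigmoid(self.gate(states)).squeeze(-1)     # (B, T) activation scores
        out = states.clone()
        for b in range(states.size(0)):                          # per-sequence sparse routing
            idx = (probs[b] > self.threshold).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            sel = states[b, idx].unsqueeze(0)                    # only the activated tokens
            attended, _ = self.attn(sel, sel, sel)
            out[b, idx] = attended.squeeze(0)                    # scatter back into place
        return out

x = torch.randn(2, 16, 64)
y = SparseActivationBlock(64)(x)   # same shape as the input
```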

In-Context Demonstration Selection with Cross Entropy Difference

1 code implementation 24 May 2023 Dan Iter, Reid Pryzant, Ruochen Xu, Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu

Our method is based on the observation that the effectiveness of in-context demonstrations negatively correlates with the perplexity of the test example under a language model that was fine-tuned on that demonstration.

Language Modelling · Text Generation
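
A small sketch of selecting demonstrations by cross-entropy difference, following the observation above. `base_nll` and `finetuned_nll` are hypothetical scoring callables (negative log-likelihood of the test input under the base LM, and under an LM briefly fine-tuned on a single demonstration); how those fine-tuned models are obtained is left out, and the actual method may differ in detail.

```python
def select_demonstrations(test_input, candidates, base_nll, finetuned_nll, k=4):
    """Rank candidate in-context demonstrations by cross-entropy difference (CED).

    base_nll(text)            -> NLL of `text` under the base language model
    finetuned_nll(demo, text) -> NLL of `text` under a copy of the LM fine-tuned on `demo`

    Demonstrations whose fine-tuned model assigns the test example a lower
    perplexity (a more negative difference) are ranked higher.
    """
    scored = [(finetuned_nll(demo, test_input) - base_nll(test_input), demo)
              for demo in candidates]
    scored.sort(key=lambda pair: pair[0])   # smallest cross-entropy difference first
    return [demo for _, demo in scored[:k]]
```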

PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents

1 code implementation 23 May 2023 Simeng Sun, Yang Liu, Shuohang Wang, Chenguang Zhu, Mohit Iyyer

PEARL outperforms zero-shot and chain-of-thought prompting on this dataset, and ablation experiments show that each stage of PEARL is critical to its performance.

InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

no code implementations 22 May 2023 Yichong Xu, Ruochen Xu, Dan Iter, Yang Liu, Shuohang Wang, Chenguang Zhu, Michael Zeng

While large models such as GPT-3 demonstrate exceptional performance in zero-shot and few-shot summarization tasks, their extensive serving and fine-tuning costs hinder their utilization in various applications.

LMGQS: A Large-scale Dataset for Query-focused Summarization

no code implementations 22 May 2023 Ruochen Xu, Song Wang, Yang Liu, Shuohang Wang, Yichong Xu, Dan Iter, Chenguang Zhu, Michael Zeng

We hypothesize that there is a hidden query for each summary sentence in a generic summarization annotation, and we utilize a large-scale pretrained language model to recover it.

Language Modelling · Query-focused Summarization · +1
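
A minimal sketch of the query-recovery step described above: prompt a large pretrained LM to produce the hidden query that a given summary sentence answers. `generate` is a placeholder for whatever text-completion call is available, and the prompt wording is illustrative rather than the one used to build LMGQS.

```python
def recover_hidden_query(document, summary_sentence, generate):
    """Ask a large LM for the latent query behind one summary sentence."""
    prompt = (
        "Document:\n"
        f"{document}\n\n"
        "The following sentence summarizes part of the document:\n"
        f"{summary_sentence}\n\n"
        "Write the question that this sentence answers:"
    )
    return generate(prompt).strip()
```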

Small Models are Valuable Plug-ins for Large Language Models

1 code implementation 15 May 2023 Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, Julian McAuley

Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often publicly unavailable and their immense sizes make them difficult to tune on common hardware.

In-Context Learning

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

2 code implementations 29 Mar 2023 Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu

In this work, we present G-Eval, a framework that uses large language models with chain-of-thought (CoT) reasoning and a form-filling paradigm to assess the quality of NLG outputs.

Dialogue Generation · NLG Evaluation · +1
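
A hedged sketch of a G-Eval-style scoring call: the model is asked to reason step by step about one criterion and then fill in a numeric score. `llm` is a placeholder completion function, the prompt and the 1-5 scale are illustrative, and the probability-weighted scoring used in G-Eval is not reproduced here because a plain text reply does not expose token probabilities.

```python
import re

def g_eval_style_score(criterion, source, output, llm):
    """Score an NLG output against one criterion with an LLM (sketch only)."""
    prompt = (
        f"Evaluation criterion: {criterion}\n\n"
        f"Source:\n{source}\n\n"
        f"Generated text:\n{output}\n\n"
        "Think step by step about how well the generated text satisfies the "
        "criterion, then end your answer with a line of the form 'Score: X' "
        "where X is an integer from 1 to 5."
    )
    reply = llm(prompt)
    match = re.search(r"Score:\s*([1-5])", reply)   # form-filling: pull out the score
    return int(match.group(1)) if match else None
```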

APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning

no code implementations 19 Dec 2022 Soumya Sanyal, Yichong Xu, Shuohang Wang, ZiYi Yang, Reid Pryzant, Wenhao Yu, Chenguang Zhu, Xiang Ren

Logical reasoning over text is an important ability that requires understanding the information present in the text and its interconnections, and then reasoning through them to infer new conclusions.

Data Augmentation · Language Modelling · +3

Retrieval Augmentation for Commonsense Reasoning: A Unified Approach

1 code implementation 23 Oct 2022 Wenhao Yu, Chenguang Zhu, Zhihan Zhang, Shuohang Wang, Zhuosheng Zhang, Yuwei Fang, Meng Jiang

However, applying such methods to commonsense reasoning tasks faces two unique challenges, i.e., the lack of a general large-scale corpus for retrieval and a corresponding effective commonsense retriever.

Retrieval

Prompting GPT-3 To Be Reliable

1 code implementation 17 Oct 2022 Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, JianFeng Wang, Jordan Boyd-Graber, Lijuan Wang

While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond to the existing framework of ML safety and are well-recognized to be important: generalizability, social biases, calibration, and factuality.

Fairness · Language Modelling

Task Compass: Scaling Multi-task Pre-training with Task Prefix

1 code implementation 12 Oct 2022 Zhuosheng Zhang, Shuohang Wang, Yichong Xu, Yuwei Fang, Wenhao Yu, Yang Liu, Hai Zhao, Chenguang Zhu, Michael Zeng

Leveraging task-aware annotated data as supervised signals to assist with self-supervised learning on large-scale unlabeled data has become a new trend in pre-training language models.

Common Sense Reasoning · Data Augmentation · +4
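
A minimal sketch of the task-prefix idea for multi-task training: every supervised example carries an explicit task marker so that many tasks can share one model while staying distinguishable. The prefix string convention and field names below are made up for illustration and are not the paper's exact format or objectives.

```python
def with_task_prefix(task_name, input_text, target_text):
    """Format one supervised example with an explicit task prefix."""
    return {"input": f"[{task_name}] {input_text}", "target": target_text}

# Toy multi-task batch mixing two tasks under one formatting scheme.
batch = [
    with_task_prefix("nli", "A man is playing guitar. </s> A person makes music.", "entailment"),
    with_task_prefix("summarization", "Long article text ...", "Short summary ..."),
]
```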

Generate rather than Retrieve: Large Language Models are Strong Context Generators

1 code implementation 21 Sep 2022 Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang

We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.

Language Modelling · Large Language Model · +1
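
A two-stage sketch of the generate-then-read pipeline described above, with a placeholder `llm` callable. The prompts and the number of generated documents are illustrative, not the exact ones from the paper.

```python
def generate_then_read(question, llm, num_documents=3):
    """Generate background documents for a question, then answer from them."""
    # Stage 1: ask the model to write contextual documents for the question.
    documents = [
        llm(f"Generate a short background document that helps answer the question:\n{question}")
        for _ in range(num_documents)
    ]
    # Stage 2: ask the model to answer while reading only those generated documents.
    context = "\n\n".join(documents)
    answer = llm(
        "Refer to the passages below and answer the question.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return answer
```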

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

1 code implementation 22 May 2022 Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.

Attribute · Automatic Speech Recognition · +6

CLIP-Event: Connecting Text and Images with Event Structures

1 code implementation CVPR 2022 Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang

Vision-language (V+L) pretraining models have achieved great success in supporting multimedia applications by understanding the alignments between images and text.

Contrastive Learning · Event Extraction · +2

MLP Architectures for Vision-and-Language Modeling: An Empirical Study

1 code implementation 8 Dec 2021 Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang

Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?

Language Modelling · Visual Question Answering (VQA)

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

2 code implementations 6 Dec 2021 Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang

In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.

 Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)

Common Sense Reasoning
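
A PyTorch sketch of the general idea of augmenting self-attention with attention over external knowledge: keys and values encoded from external text (e.g. retrieved facts) are concatenated to the token keys/values so each position attends over both. This illustrates the mechanism only; the exact architecture behind the CommonsenseQA result above differs in its details.

```python
import torch
import torch.nn as nn

class ExternalAttentionLayer(nn.Module):
    """Self-attention whose keys/values also cover encoded external knowledge."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens, external):            # tokens: (B, T, D), external: (B, S, D)
        memory = torch.cat([tokens, external], dim=1)
        out, _ = self.attn(tokens, memory, memory)   # queries are the input tokens only
        return out

x, kb = torch.randn(2, 10, 64), torch.randn(2, 5, 64)
y = ExternalAttentionLayer(64)(x, kb)                # (2, 10, 64)
```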

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

1 code implementation 4 Nov 2021 Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li

In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.

Adversarial Attack · Adversarial Robustness · +1

Dict-BERT: Enhancing Language Model Pre-training with Dictionary

1 code implementation Findings (ACL) 2022 Wenhao Yu, Chenguang Zhu, Yuwei Fang, Donghan Yu, Shuohang Wang, Yichong Xu, Michael Zeng, Meng Jiang

In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word- and sentence-level alignment between the input text sequence and rare word definitions to enhance the language model's representations with dictionary knowledge.

Language Modelling · Masked Language Modeling · +1
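
A hedged sketch of the input-construction side only: append dictionary definitions of the rare words in a sentence to the pre-training input. The `dictionary` and `vocab_counts` arguments, the rarity threshold, and the "[SEP]" layout are placeholders; the word- and sentence-level alignment objectives from the paper are not shown.

```python
def append_rare_word_definitions(tokens, dictionary, vocab_counts, rare_threshold=10):
    """Build a sentence followed by definitions of its rare words."""
    rare = [w for w in tokens if vocab_counts.get(w, 0) < rare_threshold and w in dictionary]
    definitions = [f"{w} : {dictionary[w]}" for w in rare]
    text = " ".join(tokens)
    if not definitions:
        return text
    return text + " [SEP] " + " [SEP] ".join(definitions)

example = append_rare_word_definitions(
    ["the", "aurora", "was", "visible"],
    dictionary={"aurora": "a natural display of light in the sky"},
    vocab_counts={"the": 10000, "was": 9000, "visible": 500, "aurora": 3},
)
```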

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering

no code implementations ACL 2022 Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, Michael Zeng

The recently proposed Fusion-in-Decoder (FiD), which is built on top of the pretrained generative model T5, achieves state-of-the-art performance in the reading module.

Answer Generation · Open-Domain Question Answering · +3

NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset

1 code implementation Findings (EMNLP) 2021 Qiyuan Zhang, Lei Wang, Sicheng Yu, Shuohang Wang, Yang Wang, Jing Jiang, Ee-Peng Lim

While diverse question answering (QA) datasets have been proposed and contributed significantly to the development of deep learning models for QA tasks, the existing datasets fall short in two aspects.

Graph Question Answering · Question Answering

The Elastic Lottery Ticket Hypothesis

1 code implementation NeurIPS 2021 Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang

Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as that of the latter's winning ticket found directly by IMP.
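
A loose sketch of the "stretching" operation described above: a list of per-layer sparsity masks found for a shallower network is replicated in order until it covers a deeper network (squeezing would instead drop layers). The mask objects and the replication schedule are placeholders; the exact replication and re-ordering strategies studied in the paper may differ.

```python
def stretch_ticket(layer_masks, target_depth):
    """Stretch a winning-ticket mask list from a shallow network to a deeper one."""
    stretched = []
    i = 0
    while len(stretched) < target_depth:
        stretched.append(layer_masks[i % len(layer_masks)])  # replicate layer masks in order
        i += 1
    return stretched

masks_12_layers = [f"mask_{i}" for i in range(12)]   # placeholder per-layer mask objects
masks_24_layers = stretch_ticket(masks_12_layers, 24)
```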

Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization

no code implementations 1 Jan 2021 Minhao Cheng, Zhe Gan, Yu Cheng, Shuohang Wang, Cho-Jui Hsieh, Jingjing Liu

By incorporating different feature maps after the masking, we can distill better features to help model generalization.

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

1 code implementation ACL 2021 Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks.

Model Compression

Counterfactual Variable Control for Robust and Interpretable Question Answering

1 code implementation 12 Oct 2020 Sicheng Yu, Yulei Niu, Shuohang Wang, Jing Jiang, Qianru Sun

We then apply two novel CVC inference methods (on trained models) to capture the effect of comprehensive reasoning as the final prediction.

Causal Inference · Counterfactual · +3

Cross-Thought for Sentence Encoder Pre-training

1 code implementation EMNLP 2020 Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu

In this paper, we propose Cross-Thought, a novel approach to pre-training a sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering.

Information Retrieval · Language Modelling · +5

Multi-Fact Correction in Abstractive Text Summarization

no code implementations EMNLP 2020 Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu

Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE.

Abstractive Text Summarization · News Summarization · +1

Contrastive Distillation on Intermediate Representations for Language Model Compression

1 code implementation EMNLP 2020 Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu

Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one.

Knowledge Distillation · Language Modelling · +1
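
Where an L2 loss simply pulls each student representation toward its teacher counterpart, a contrastive alternative also pushes it away from the other teacher representations in the batch. The sketch below is a generic InfoNCE-style loss over matched intermediate representations (projection heads to a shared dimension are assumed to have been applied already); it is not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_distillation_loss(student_hidden, teacher_hidden, temperature=0.1):
    """InfoNCE loss between student and teacher intermediate representations.

    student_hidden, teacher_hidden: (batch, dim) pooled hidden states from matched
    layers. Each student vector should be closest to its own teacher vector and
    far from the other in-batch teacher vectors.
    """
    s = F.normalize(student_hidden, dim=-1)
    t = F.normalize(teacher_hidden, dim=-1)
    logits = s @ t.T / temperature                    # (batch, batch) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)

loss = contrastive_distillation_loss(torch.randn(8, 128), torch.randn(8, 128))
```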

Accelerating Real-Time Question Answering via Question Generation

no code implementations 10 Sep 2020 Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu, Chenguang Zhu

Although deep neural networks have achieved tremendous success in question answering (QA), they still suffer from heavy computational and energy costs in real product deployment.

Data Augmentation · Multi-Task Learning · +3

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

1 code implementation 10 Sep 2020 Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu

During inference, the model makes predictions based on the text input in the target language and its translation in the source language.

NER · POS · +5

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack

3 code implementations EMNLP 2020 Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li

In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation.

Adversarial Text · Question Answering · +3

Compositional De-Attention Networks

no code implementations NeurIPS 2019 Yi Tay, Anh Tuan Luu, Aston Zhang, Shuohang Wang, Siu Cheung Hui

Attentional models are distinctly characterized by their ability to learn relative importance, i.e., to assign different weights to input values.

Machine Translation · Natural Language Inference · +4

What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?

no code implementations 28 Oct 2019 Chenglei Si, Shuohang Wang, Min-Yen Kan, Jing Jiang

Based on our experiments on the 5 key MCRC datasets (RACE, MCTest, MCScript, MCScript2.0, DREAM), we observe that 1) fine-tuned BERT mainly learns how keywords lead to the correct prediction, rather than semantic understanding and reasoning; 2) BERT does not need correct syntactic information to solve the task; and 3) there exist artifacts in these datasets such that they can be solved even without the full context.

Multiple-choice Reading Comprehension

A Co-Matching Model for Multi-choice Reading Comprehension

1 code implementation ACL 2018 Shuohang Wang, Mo Yu, Shiyu Chang, Jing Jiang

Multi-choice reading comprehension is a challenging task that involves matching a passage with a question-answer pair.

Reading Comprehension

A Compare-Aggregate Model for Matching Text Sequences

2 code implementations 6 Nov 2016 Shuohang Wang, Jing Jiang

We particularly focus on the different comparison functions we can use to match two vectors.

Answer Selection · Reading Comprehension
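
A short sketch of the kind of element-wise comparison functions the snippet above refers to, applied to two vectors (for example an attended question vector and an answer-word vector) before an aggregation layer. The selection and naming of variants here is illustrative and does not cover every function studied in the paper.

```python
import torch

def comparison_functions(a, b):
    """A few element-wise comparison functions for matching two vectors.

    a, b: tensors of the same shape, compared position by position.
    """
    return {
        "mult": a * b,                                  # element-wise multiplication
        "sub": (a - b) * (a - b),                       # squared element-wise difference
        "submult": torch.cat([(a - b) * (a - b), a * b], dim=-1),
        "cosine": torch.nn.functional.cosine_similarity(a, b, dim=-1),
    }

out = comparison_functions(torch.randn(4, 50), torch.randn(4, 50))
```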
