no code implementations • EACL (AdaptNLP) 2021 • Tianyu Chen, Shaohan Huang, Furu Wei, JianXin Li
In unsupervised domain adaptation, we aim to train a model that works well on a target domain when provided with labeled source samples and unlabeled target samples.
no code implementations • 8 May 2024 • Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei
We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once.
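A minimal sketch of the decoder-decoder idea, assuming illustrative layer counts, dimensions, and module names (not YOCO's actual configuration): a self-decoder encodes the sequence, the key-value pairs are produced and cached once, and every cross-decoder layer reuses that single cache. Causal masks and the efficient self-attention variant are omitted for brevity.

```python
import torch
import torch.nn as nn

class CrossDecoderLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x, kv):
        h, _ = self.attn(x, kv, kv)      # attend to the shared global KV cache
        x = x + h
        return x + self.ffn(x)

class DecoderDecoder(nn.Module):
    def __init__(self, d=512, n_self=4, n_cross=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.self_decoder = nn.TransformerEncoder(layer, n_self)
        self.to_kv = nn.Linear(d, d)     # key-value pairs are produced ONCE here
        self.cross_decoder = nn.ModuleList(
            [CrossDecoderLayer(d) for _ in range(n_cross)])

    def forward(self, x):                # x: (batch, seq, d)
        kv = self.to_kv(self.self_decoder(x))   # cache once...
        for layer in self.cross_decoder:
            x = layer(x, kv)                    # ...reuse in every layer
        return x
```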
no code implementations • 23 Apr 2024 • Xun Wu, Shaohan Huang, Furu Wei
Recent studies have demonstrated the exceptional potential of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts.
no code implementations • 23 Apr 2024 • Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei
These sub-tokens are then assigned to and processed by a diverse set of experts in parallel, and seamlessly reintegrated into the original token form.
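A toy sketch of the sub-token routing described above, with top-1 routing and illustrative dimensions; the expert design and merge layer are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SubTokenMoE(nn.Module):
    def __init__(self, d=512, heads=4, n_experts=8):
        super().__init__()
        assert d % heads == 0
        self.heads, self.sub = heads, d // heads
        self.router = nn.Linear(self.sub, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(self.sub, 2 * self.sub), nn.GELU(),
                           nn.Linear(2 * self.sub, self.sub))
             for _ in range(n_experts)])
        self.merge = nn.Linear(d, d)

    def forward(self, x):                             # x: (batch, seq, d)
        b, s, d = x.shape
        sub = x.reshape(b, s * self.heads, self.sub)  # split tokens into sub-tokens
        gate = self.router(sub).argmax(-1)            # top-1 expert per sub-token
        out = torch.zeros_like(sub)
        for e, expert in enumerate(self.experts):
            mask = gate == e
            if mask.any():
                out[mask] = expert(sub[mask])         # experts process sub-tokens in parallel
        return self.merge(out.reshape(b, s, d))       # reintegrate into token form
```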
1 code implementation • 21 Apr 2024 • Xun Wu, Shaohan Huang, Furu Wei
LoRA has gained widespread acceptance in the fine-tuning of large pre-trained models to cater to a diverse array of downstream tasks, showcasing notable effectiveness and efficiency, thereby solidifying its position as one of the most prevalent fine-tuning techniques.
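For context, a minimal LoRA layer looks roughly like the following; rank, scaling, and the zero-initialized up-projection follow the original LoRA recipe, while the class and parameter names are mine.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update: W x + (alpha/r) B A x."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)               # pre-trained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))         # up-projection, zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

Because B starts at zero, the adapted model initially matches the frozen base model, and only the small A and B matrices are trained.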
1 code implementation • 28 Feb 2024 • Shuhua Shi, Shaohan Huang, Minghui Song, Zhoujun Li, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
As one of the most popular parameter-efficient fine-tuning (PEFT) methods, low-rank adaptation (LoRA) is commonly applied to fine-tune large language models (LLMs).
4 code implementations • 27 Feb 2024 • Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei
Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).
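As a rough illustration of what 1-bit (more precisely, 1.58-bit ternary) weights mean, here is a sketch of absmean weight quantization in the spirit of BitNet b1.58; activation quantization and the straight-through training recipe are omitted.

```python
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Quantize weights to {-1, 0, +1} with a per-tensor absmean scale.

    A sketch in the spirit of BitNet b1.58; the real model also quantizes
    activations and trains with a straight-through estimator.
    """
    gamma = w.abs().mean().clamp(min=eps)      # absmean scale
    w_q = (w / gamma).round().clamp(-1, 1)     # ternary values
    return w_q, gamma                          # dequantize as w_q * gamma
```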
no code implementations • 24 Feb 2024 • Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
Large language models (LLMs) have emerged as a promising alternative to expensive human evaluations.
no code implementations • 21 Feb 2024 • Haoyu Liu, Jianfeng Liu, Shaohan Huang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Furu Wei, Qi Zhang
The remarkable capability of large language models (LLMs) for in-context learning (ICL) needs to be activated by demonstration examples.
no code implementations • 20 Feb 2024 • Haoran Li, Qingxiu Dong, Zhengyang Tang, Chaojun Wang, Xingxing Zhang, Haoyang Huang, Shaohan Huang, Xiaolong Huang, Zeqiang Huang, Dongdong Zhang, Yuxian Gu, Xin Cheng, Xun Wang, Si-Qing Chen, Li Dong, Wei Lu, Zhifang Sui, Benyou Wang, Wai Lam, Furu Wei
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs).
no code implementations • 19 Feb 2024 • Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio.
1 code implementation • 14 Jan 2024 • Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method.
1 code implementation • 20 Oct 2023 • Zhaoyang Wang, Shaohan Huang, Yuxuan Liu, Jiahai Wang, Minghui Song, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
In this paper, we propose a tailored learning approach to distill such reasoning ability into smaller LMs, facilitating the democratization of this exclusive ability.
2 code implementations • 17 Oct 2023 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.
1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei
These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."
no code implementations • 23 Sep 2023 • Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
Recent advancements in large language models (LLMs) on language modeling and emergent capabilities make them a promising reference-free evaluator of natural language generation quality, and a competent alternative to human evaluation.
no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images.
1 code implementation • 18 Sep 2023 • Daixuan Cheng, Shaohan Huang, Furu Wei
Taking inspiration from human learning via reading comprehension, where practice after reading improves the ability to answer questions based on the learned knowledge, we propose a simple method for transforming raw corpora into reading comprehension texts.
no code implementations • 3 Sep 2023 • Jiaxing Qi, Shaohan Huang, Zhongzhi Luan, Carol Fung, Hailong Yang, Depei Qian
In this work, we propose LogGPT, a log-based anomaly detection framework based on ChatGPT.
1 code implementation • 31 Jul 2023 • Ting Jiang, Shaohan Huang, Zhongzhi Luan, Deqing Wang, Fuzhen Zhuang
We also fine-tune LLMs with the current contrastive learning approach; incorporating our prompt-based method, the 2.7B OPT model surpasses the performance of the 4.8B ST5 and achieves new state-of-the-art results on STS tasks.
Ranked #1 on Semantic Textual Similarity on STS12
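A hedged sketch of the prompt-based method referenced above: wrap the sentence in a prompt that asks the LLM to summarize it "in one word" and take the final token's hidden state as the embedding. The checkpoint name and exact prompt wording are assumptions for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/opt-2.7b")
model = AutoModel.from_pretrained("facebook/opt-2.7b")

def embed(sentence: str) -> torch.Tensor:
    # Prompt template is an assumption modeled on the "in one word" idea.
    prompt = f'This sentence : "{sentence}" means in one word:"'
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq, dim)
    return hidden[0, -1]                            # last-token hidden state
```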
8 code implementations • 17 Jul 2023 • Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.
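The low-cost inference comes from retention's recurrent form: a fixed-size state is decayed and updated per step, so decoding costs O(1) per token. A single-head sketch, with normalization and RetNet's per-head decay schedule omitted:

```python
import torch

def recurrent_retention(q, k, v, gamma=0.9):
    """Recurrent retention: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n.

    q, k, v: (seq, d) single-head tensors; gamma is a per-head decay in RetNet.
    """
    d = q.size(-1)
    S = torch.zeros(d, d)
    outs = []
    for n in range(q.size(0)):
        S = gamma * S + torch.outer(k[n], v[n])   # constant-size recurrent state
        outs.append(q[n] @ S)                     # O(1) work per generated token
    return torch.stack(outs)
```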
3 code implementations • 5 Jul 2023 • Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei
Scaling sequence length has become a critical demand in the era of large language models.
2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.
Ranked #11 on Visual Question Answering on ViP-Bench
no code implementations • 31 May 2023 • Tianyu Chen, Yuan Xie, Shuai Zhang, Shaohan Huang, Haoyi Zhou, JianXin Li
Music representation learning is notoriously difficult because of the complex human-related concepts contained in sequences of numerical signals.
1 code implementation • 16 May 2023 • Ziheng Li, Shaohan Huang, Zihan Zhang, Zhi-Hong Deng, Qiang Lou, Haizhen Huang, Jian Jiao, Furu Wei, Weiwei Deng, Qi Zhang
Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective methods for cross-lingual sentence embedding.
no code implementations • 6 May 2023 • Beiduo Chen, Shaohan Huang, Zihan Zhang, Wu Guo, ZhenHua Ling, Haizhen Huang, Furu Wei, Weiwei Deng, Qi Zhang
In addition, two self-correction courses are proposed to bridge the gap between the two encoders by creating a "correction notebook" for secondary supervision.
1 code implementation • 15 Mar 2023 • Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang
Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization.
1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.
5 code implementations • 20 Dec 2022 • Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei
Position modeling plays a critical role in Transformers.
1 code implementation • 20 Dec 2022 • Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li
Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model.
1 code implementation • 23 Nov 2022 • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
Large Transformers have achieved state-of-the-art performance across many tasks.
no code implementations • 26 Oct 2022 • Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song
In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications.
1 code implementation • 13 Oct 2022 • Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei
Specifically, the target sequence is first translated into the source language and then tagged by a source NER model.
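In outline, that data-synthesis step can be sketched as below; translate_to_source and source_ner are hypothetical placeholders for an MT system and a source-language tagger.

```python
def synthesize_labeled_data(target_sentences, translate_to_source, source_ner):
    """Translate-then-tag: obtain NER supervision for unlabeled target text."""
    examples = []
    for tgt in target_sentences:
        src = translate_to_source(tgt)   # target sequence rendered in the source language
        tags = source_ner(src)           # labeled by the source-language NER model
        examples.append((src, tags, tgt))
    return examples
```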
4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei
A big convergence of model architectures across language, vision, speech, and multimodal is emerging.
no code implementations • 19 Jul 2022 • Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei
The sparse Mixture-of-Experts (MoE) architecture has received great interest due to its promising scaling capability with affordable computational overhead.
1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.
Ranked #2 on Image Captioning on nocaps val
no code implementations • Findings (ACL) 2022 • Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei
As more and more pre-trained language models adopt on-cloud deployment, privacy issues grow quickly, mainly due to the exposure of plain-text user data (e.g., search history, medical records, bank accounts).
no code implementations • 1 Jun 2022 • Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei
The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pre-training and has achieved promising results due to its model capacity.
2 code implementations • 20 Apr 2022 • Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
We also present a comprehensive analysis on the representation and routing behaviors of our models.
6 code implementations • 1 Mar 2022 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.
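The method, DeepNorm, rescales the residual connection before layer normalization with a depth-dependent constant. A sketch for the encoder-only setting, assuming alpha = (2N)^(1/4) and omitting the companion initialization scaling:

```python
import torch.nn as nn

class DeepNormBlock(nn.Module):
    """Residual sublayer wrapper: x <- LayerNorm(alpha * x + f(x)).

    For an encoder-only stack of N layers, DeepNet uses alpha = (2N) ** 0.25;
    decoder stacks use different constants, and init scaling is omitted here.
    """
    def __init__(self, d, sublayer, n_layers):
        super().__init__()
        self.f = sublayer
        self.alpha = (2 * n_layers) ** 0.25
        self.norm = nn.LayerNorm(d)

    def forward(self, x):
        return self.norm(self.alpha * x + self.f(x))
```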
1 code implementation • 15 Jan 2022 • Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang
In this work, we propose a simple model, Kformer, which takes advantage of the knowledge stored in PTMs and external knowledge via knowledge injection in Transformer FFN layers.
1 code implementation • 12 Jan 2022 • Ting Jiang, Jian Jiao, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Denvy Deng, Qi Zhang
We propose PromptBERT, a novel contrastive learning method for learning better sentence representation.
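Concretely, the prompt-based representation reformulates a sentence as a fill-in template and reads off the [MASK] position's hidden state; the template below follows the paper's style, though the exact wording here is an assumption.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def prompt_embed(sentence: str) -> torch.Tensor:
    # Template in the style of PromptBERT; exact wording is an assumption.
    prompt = f'This sentence : "{sentence}" means [MASK] .'
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state            # (1, seq, dim)
    mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero()[0]
    return hidden[0, mask_pos.item()]                         # [MASK] hidden state
```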
no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
1 code implementation • 21 Oct 2021 • Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang
While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them into non-autoregressive generation tasks remains a challenge.
2 code implementations • EMNLP 2021 • Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.
3 code implementations • ACL 2022 • Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training.
Ranked #1 on Zero-Shot Cross-Lingual Transfer on XTREME
no code implementations • Findings (ACL) 2021 • Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei
In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains.
2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).
1 code implementation • ACL 2021 • Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others.
1 code implementation • ACL 2021 • Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei
The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences.
2 code implementations • Findings (ACL) 2021 • Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.
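A sketch of self-attention relation distillation: both models form pairwise relation distributions softmax(XX^T / sqrt(d)) over positions (e.g., among queries, keys, or values), and the student matches the teacher's distributions via KL divergence. Shapes and scaling below are illustrative.

```python
import torch
import torch.nn.functional as F

def relation_kl(teacher_x, student_x):
    """KL between self-attention relations softmax(X X^T / sqrt(d)).

    teacher_x, student_x: (seq, d_t) and (seq, d_s). Only the (seq x seq)
    relation matrices must match, so teacher and student widths may differ.
    """
    def relations(x):
        return F.log_softmax(x @ x.T / x.size(-1) ** 0.5, dim=-1)
    t = relations(teacher_x)
    s = relations(student_x)
    return F.kl_div(s, t, log_target=True, reduction="batchmean")
```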
no code implementations • COLING 2020 • Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, Ming Zhou
Fine-tuning with pre-trained language models (e.g., BERT) has achieved great success in many language understanding tasks in supervised settings (e.g., text classification).
no code implementations • AACL 2020 • Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Minlie Huang
Commonsense explanation generation aims to empower a machine's sense-making capability by generating plausible explanations for statements that contradict commonsense.
1 code implementation • EMNLP 2020 • Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Xiaoyan Zhu, Minlie Huang
Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation.
2 code implementations • COLING 2020 • Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou
DocBank is constructed in a simple yet effective way, with weak supervision from the LaTeX documents available on arXiv.com.
1 code implementation • LREC 2020 • Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet.
16 code implementations • 31 Dec 2019 • Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
Ranked #7 on Relation Extraction on FUNSD
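The core modeling idea, adding 2-D position embeddings for each token's bounding box on top of the usual text embeddings, can be sketched as follows; the 1,000 coordinate bins and shared x/y tables are in the spirit of LayoutLM, while names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class TextLayoutEmbedding(nn.Module):
    """Token embedding + 1-D position + 2-D bounding-box embeddings."""
    def __init__(self, vocab_size, d=768, max_pos=512, coord_bins=1000):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d)
        self.pos = nn.Embedding(max_pos, d)
        self.x_emb = nn.Embedding(coord_bins, d)   # shared table for x0 and x1
        self.y_emb = nn.Embedding(coord_bins, d)   # shared table for y0 and y1

    def forward(self, ids, boxes):
        # ids: (batch, seq) long; boxes: (batch, seq, 4) long,
        # as (x0, y0, x1, y1) coordinates binned into [0, 1000)
        pos = torch.arange(ids.size(1), device=ids.device)
        return (self.tok(ids) + self.pos(pos)
                + self.x_emb(boxes[..., 0]) + self.y_emb(boxes[..., 1])
                + self.x_emb(boxes[..., 2]) + self.y_emb(boxes[..., 3]))
```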
no code implementations • 30 Sep 2018 • Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou
In this paper, we introduce a novel natural language generation task, termed text morphing, which aims to generate intermediate sentences that are fluent and form a smooth transition between the two input sentences.
no code implementations • 12 Sep 2018 • Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, Ming Zhou
In this paper, we study a novel task that learns to compose music from natural language.
1 code implementation • ACL 2018 • Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao
In this paper, we present a novel end-to-end neural network framework for extractive document summarization by jointly learning to score and select sentences.
Ranked #9 on Extractive Text Summarization on CNN / Daily Mail
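The joint score-and-select loop can be sketched as greedy extraction: at each step, score the remaining sentences conditioned on what has already been selected, then pick the best. The scorer here is a hypothetical stand-in for the paper's learned scoring network.

```python
def extract_summary(sentences, score, k=3):
    """Greedy joint score-and-select sketch.

    score(candidate, selected) is a hypothetical learned scorer that rates a
    candidate sentence conditioned on the sentences already selected.
    """
    selected, remaining = [], list(sentences)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: score(s, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```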
no code implementations • 21 Jun 2018 • Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou
An intuitive way for a human to write paraphrase sentences is to replace words or phrases in the original sentence with their corresponding synonyms and make necessary changes to ensure the new sentences are fluent and grammatically correct.
3 code implementations • 19 Jun 2018 • Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, Ming Zhou
Open domain response generation has achieved remarkable progress in recent years, but sometimes yields short and uninformative responses.
no code implementations • EACL 2017 • Li Dong, Shaohan Huang, Furu Wei, Mirella Lapata, Ming Zhou, Ke Xu
This paper presents an attention-enhanced attribute-to-sequence model to generate product reviews for given attribute information, such as user, product, and rating.