Search Results for author: Sinong Wang

Found 26 papers, 10 papers with code

Phonetic and Lexical Discovery of a Canine Language using HuBERT

no code implementations25 Feb 2024 Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

This paper presents a pioneering exploration of potential communication patterns within dog vocalizations, moving beyond traditional linguistic analysis, which relies heavily on human prior knowledge and limited datasets to identify sound units in dog vocalizations.
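
The pipeline this line of work builds on, discovering discrete sound units by clustering self-supervised speech features, can be sketched as below; the checkpoint, input file, and cluster count are illustrative assumptions, not the paper's settings.

```python
import torch
import torchaudio
from sklearn.cluster import KMeans
from transformers import HubertModel, Wav2Vec2FeatureExtractor

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

wave, sr = torchaudio.load("dog_vocalization.wav")         # hypothetical recording
wave = torchaudio.functional.resample(wave, sr, 16_000).mean(dim=0)

with torch.no_grad():
    inputs = extractor(wave.numpy(), sampling_rate=16_000, return_tensors="pt")
    frames = model(**inputs).last_hidden_state.squeeze(0)  # (num_frames, 768)

# Cluster frame-level features into discrete pseudo-phonetic units.
units = KMeans(n_clusters=50, n_init=10).fit_predict(frames.numpy())
print(units[:20])  # frame-level unit IDs, analogous to phoneme labels
```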

Effective Long-Context Scaling of Foundation Models

1 code implementation27 Sep 2023 Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma

We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths. Our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to achieving strong performance, and we empirically verify that long-context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.

Continual Pretraining Language Modelling
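
A minimal sketch of the continual pretraining recipe the abstract refers to: resume from an already-pretrained short-context checkpoint and keep optimizing the causal LM loss on longer sequences. The checkpoint name, target length, and data file are illustrative assumptions; a real run also needs gated checkpoint access, positional-embedding handling for the longer length, and far more memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative short-context checkpoint (assumption, not the paper's exact setup).
name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical long-text corpus; truncate to the new, longer target length.
text = open("long_document.txt").read()
ids = tokenizer(text, return_tensors="pt").input_ids[:, :32_768]

model.train()
loss = model(input_ids=ids, labels=ids).loss  # causal LM loss on a long sequence
loss.backward()
optimizer.step()
```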

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models

1 code implementation30 Aug 2023 Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang

As a result, their performance suffers drastically on inputs longer than those encountered during training, substantially limiting their applications in real-world tasks involving long contexts such as encoding scientific articles, code repositories, or long dialogues.
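
LM-Infinite's remedy centers on an attention pattern shaped like the letter Λ: every token attends to a handful of global tokens at the start of the sequence plus a local window of recent tokens. A hedged sketch of such a mask follows; the window sizes are illustrative, and the paper's distance-ceiling detail is omitted.

```python
import torch

def lambda_shaped_mask(seq_len: int, n_global: int = 4, n_local: int = 1024) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask; True marks an allowed attention pair."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions
    k = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = k <= q                          # no attention to future tokens
    is_global = k < n_global                 # branch 1: leading "global" tokens
    is_local = (q - k) < n_local             # branch 2: sliding local window
    return causal & (is_global | is_local)

mask = lambda_shaped_mask(8192)
print(mask.float().mean())  # fraction of allowed pairs, far below the dense ~0.5
```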

Learning Easily Updated General Purpose Text Representations with Adaptable Task-Specific Prefixes

no code implementations22 May 2023 Kuan-Hao Huang, Liang Tan, Rui Hou, Sinong Wang, Amjad Almahairi, Ruty Rinott

Fine-tuning a large pre-trained language model for each downstream task imposes a computational burden at inference time, since an input must go through a separate forward pass for every task.

Language Modelling

Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning

1 code implementation CVPR 2023 Ajinkya Tejankar, Maziar Sanjabi, Qifan Wang, Sinong Wang, Hamed Firooz, Hamed Pirsiavash, Liang Tan

It was shown that an adversary can poison a small part of the unlabeled data so that when a victim trains an SSL model on it, the final model will have a backdoor that the adversary can exploit.

Data Poisoning Self-Supervised Learning

Representation Deficiency in Masked Language Modeling

1 code implementation4 Feb 2023 Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer

In this work, we offer a new perspective on the consequence of such a discrepancy: we demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing $\texttt{[MASK]}$ tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without $\texttt{[MASK]}$ tokens.

Language Modelling Masked Language Modeling
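
One hedged way to see the claimed deficiency in an off-the-shelf MLM is to check how much of a $\texttt{[MASK]}$ representation lies outside the principal subspace spanned by real-token representations. The model choice and this SVD probe are illustrative assumptions, not the paper's analysis.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base").eval()

ids = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
masked = ids.input_ids.clone()
masked[0, 4] = tokenizer.mask_token_id  # replace one real token with [MASK]

with torch.no_grad():
    h_real = model(**ids).last_hidden_state[0]  # (seq, 768) real-token states
    h_mask = model(input_ids=masked,
                   attention_mask=ids.attention_mask).last_hidden_state[0, 4]

# Residual of the [MASK] state outside the top principal directions of the
# real-token states; a large residual hints at [MASK]-specific dimensions.
U, S, Vt = torch.linalg.svd(h_real - h_real.mean(0), full_matrices=False)
proj = (h_mask @ Vt[:8].T) @ Vt[:8]  # projection onto the top-8 directions
print((h_mask - proj).norm() / h_mask.norm())
```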

Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler

no code implementations4 Nov 2022 Yifang Chen, Karthik Sankararaman, Alessandro Lazaric, Matteo Pirotta, Dmytro Karamshuk, Qifan Wang, Karishma Mandyam, Sinong Wang, Han Fang

We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that robustly leverages lower-quality weak labelers to reduce query complexity while retaining the desired level of accuracy.

Active Learning

BayesFormer: Transformer with Uncertainty Estimation

no code implementations2 Jun 2022 Karthik Abinav Sankararaman, Sinong Wang, Han Fang

Transformers have become ubiquitous due to their dominant performance across a variety of NLP and image-processing tasks.

Active Learning Language Modelling +3
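
The paper's title points to dropout-based Bayesian uncertainty for Transformers. As a generic, hedged illustration of that family of techniques (not BayesFormer's specific formulation), Monte Carlo dropout keeps dropout active at inference and reads uncertainty off repeated stochastic passes:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative classifier
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.train()  # keep dropout layers active at inference time

inputs = tokenizer("An unexpectedly good movie.", return_tensors="pt")
with torch.no_grad():
    # 20 stochastic forward passes; their spread estimates predictive uncertainty
    probs = torch.stack([model(**inputs).logits.softmax(-1) for _ in range(20)])

print("mean prediction:", probs.mean(0))
print("uncertainty (std):", probs.std(0))
```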

Detection, Disambiguation, Re-ranking: Autoregressive Entity Linking as a Multi-Task Problem

no code implementations Findings (ACL) 2022 Khalil Mrini, Shaoliang Nie, Jiatao Gu, Sinong Wang, Maziar Sanjabi, Hamed Firooz

Without the use of a knowledge base or candidate sets, our model sets a new state of the art in two benchmark datasets of entity linking: COMETA in the biomedical domain, and AIDA-CoNLL in the news domain.

Entity Linking Re-Ranking

IDPG: An Instance-Dependent Prompt Generation Method

no code implementations NAACL 2022 Zhuofeng Wu, Sinong Wang, Jiatao Gu, Rui Hou, Yuxiao Dong, V. G. Vinod Vydiswaran, Hao Ma

Prompt tuning is a new, efficient NLP transfer learning paradigm that adds a task-specific prompt to each input instance during the model training stage.

Language Modelling Natural Language Understanding +2
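
A minimal sketch of the instance-dependent idea: instead of one shared prompt, a lightweight generator maps each input's encoding to its own soft prompt, which is prepended to the input embeddings. The bottleneck-MLP generator and all sizes here are illustrative assumptions rather than IDPG's exact architecture.

```python
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    def __init__(self, hidden: int = 768, prompt_len: int = 5, bottleneck: int = 64):
        super().__init__()
        self.prompt_len = prompt_len
        self.net = nn.Sequential(  # lightweight bottleneck MLP
            nn.Linear(hidden, bottleneck),
            nn.ReLU(),
            nn.Linear(bottleneck, prompt_len * hidden),
        )

    def forward(self, instance_repr: torch.Tensor) -> torch.Tensor:
        # instance_repr: (batch, hidden), e.g. a mean-pooled sentence encoding
        out = self.net(instance_repr)
        return out.view(-1, self.prompt_len, instance_repr.size(-1))

embeddings = torch.randn(2, 16, 768)                 # (batch, seq, hidden) inputs
prompt = PromptGenerator()(embeddings.mean(dim=1))   # instance-dependent soft prompt
augmented = torch.cat([prompt, embeddings], dim=1)   # prepend prompt to the input
print(augmented.shape)  # torch.Size([2, 21, 768])
```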

Reducing Target Group Bias in Hate Speech Detectors

no code implementations7 Dec 2021 Darsh J Shah, Sinong Wang, Han Fang, Hao Ma, Luke Zettlemoyer

The ubiquity of offensive and hateful content on online fora calls for automatic solutions that detect such content competently across target groups.

text-classification Text Classification

Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models

1 code implementation NAACL 2022 Qinyuan Ye, Madian Khabsa, Mike Lewis, Sinong Wang, Xiang Ren, Aaron Jaech

Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time.

Domain Generalization Privacy Preserving +4
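
The distillation objective such setups typically minimize is the KL divergence between temperature-softened teacher and student distributions. This generic sketch stands in for the paper's specific teacher (a transformer) and student (a large sparse model); the temperature and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2

student_logits = torch.randn(8, 2, requires_grad=True)  # stand-in student outputs
teacher_logits = torch.randn(8, 2)                      # stand-in teacher outputs
print(distillation_loss(student_logits, teacher_logits))
```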

Entailment as Few-Shot Learner

3 code implementations29 Apr 2021 Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, Hao Ma

Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.

Contrastive Learning Data Augmentation +8
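
The paper's core move is to reformulate classification as textual entailment: each candidate label becomes a natural-language hypothesis scored by an entailment model. A minimal sketch with an off-the-shelf NLI model follows; the label descriptions are illustrative assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli").eval()

premise = "The food was cold and the staff ignored us."
label_descriptions = {"positive": "This review is positive.",
                      "negative": "This review is negative."}

with torch.no_grad():
    for label, hypothesis in label_descriptions.items():
        inputs = tokenizer(premise, hypothesis, return_tensors="pt")
        probs = model(**inputs).logits.softmax(-1)[0]
        # roberta-large-mnli label order: contradiction, neutral, entailment
        print(label, f"entailment prob = {probs[2]:.3f}")
```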

On Unifying Misinformation Detection

1 code implementation NAACL 2021 Nayeon Lee, Belinda Z. Li, Sinong Wang, Pascale Fung, Hao Ma, Wen-tau Yih, Madian Khabsa

In this paper, we introduce UnifiedM2, a general-purpose misinformation model that jointly models multiple domains of misinformation with a single, unified setup.

Few-Shot Learning Misinformation

To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks

no code implementations ACL 2020 Sinong Wang, Madian Khabsa, Hao Ma

Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks.

Language Modelling text-classification +1

To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks

no code implementations15 Jun 2020 Sinong Wang, Madian Khabsa, Hao Ma

Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks.

Language Modelling text-classification +1

Linformer: Self-Attention with Linear Complexity

15 code implementations8 Jun 2020 Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma

Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications.

Language Modelling
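
Linformer's key observation is that the attention matrix is approximately low-rank, so keys and values can be projected along the sequence dimension to a small fixed size $k$, reducing the cost from $O(n^2)$ to $O(nk)$. A minimal single-head sketch (all dimensions are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinformerSelfAttention(nn.Module):
    def __init__(self, dim: int = 64, seq_len: int = 4096, k: int = 256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.E = nn.Parameter(torch.randn(k, seq_len) / seq_len**0.5)  # key projection
        self.F = nn.Parameter(torch.randn(k, seq_len) / seq_len**0.5)  # value projection
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q = self.q(x)
        key, value = self.kv(x).chunk(2, dim=-1)
        key = self.E @ key      # (batch, k, dim): sequence axis compressed to k
        value = self.F @ value  # (batch, k, dim)
        attn = F.softmax(q @ key.transpose(-2, -1) * self.scale, dim=-1)  # (batch, n, k)
        return attn @ value     # (batch, n, dim) at O(n*k) cost

x = torch.randn(2, 4096, 64)
print(LinformerSelfAttention()(x).shape)  # torch.Size([2, 4096, 64])
```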

Language Models as Fact Checkers?

no code implementations WS 2020 Nayeon Lee, Belinda Z. Li, Sinong Wang, Wen-tau Yih, Hao Ma, Madian Khabsa

Recent work has suggested that language models (LMs) store both common-sense and factual knowledge learned from pre-training data.

Common Sense Reasoning Language Modelling +2

UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits

no code implementations16 Apr 2018 Fang Liu, Sinong Wang, Swapna Buccapatnam, Ness Shroff

We show that UCBoost($D$) enjoys $O(1)$ complexity for each arm per round as well as a regret guarantee that is $1/e$-close to that of the kl-UCB algorithm.

Decision Making
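
For context on the kl-UCB baseline referenced above: its per-arm index is the largest mean $q$ whose KL divergence from the empirical mean stays within a $\log(t)$ exploration budget, typically found by bisection. A hedged sketch for Bernoulli rewards (UCBoost itself instead boosts cheaper surrogate bounds; this only illustrates the baseline):

```python
import math

def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean: float, pulls: int, t: int) -> float:
    """Largest q >= mean with pulls * KL(mean, q) <= log(t)."""
    target = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):  # bisection: KL(mean, q) is increasing in q >= mean
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if bernoulli_kl(mean, mid) <= target else (lo, mid)
    return lo

print(kl_ucb_index(mean=0.4, pulls=25, t=1000))
```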

A New Alternating Direction Method for Linear Programming

no code implementations NeurIPS 2017 Sinong Wang, Ness Shroff

It is well known that, for a linear program (LP) with constraint matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, the Alternating Direction Method of Multiplier converges globally and linearly at a rate $O((\|\mathbf{A}\|_F^2+mn)\log(1/\epsilon))$.
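
As background, a standard ADMM splitting of such an LP introduces a copy $\mathbf{z}$ of the variables in $\min\{\mathbf{c}^\top\mathbf{x} : \mathbf{A}\mathbf{x}=\mathbf{b},\ \mathbf{x}\ge 0\}$ and alternates the updates below; whether this exact variant is the one analyzed is an assumption, and the paper's contribution is a new alternating direction method that improves on the quoted rate.

$$
\begin{aligned}
\mathbf{x}^{k+1} &= \operatorname*{arg\,min}_{\mathbf{A}\mathbf{x}=\mathbf{b}} \; \mathbf{c}^\top \mathbf{x} + \tfrac{\rho}{2}\,\lVert \mathbf{x} - \mathbf{z}^k + \mathbf{u}^k \rVert_2^2 && \text{(equality-constrained QP)}\\
\mathbf{z}^{k+1} &= \max\!\left(\mathbf{x}^{k+1} + \mathbf{u}^k,\, 0\right) && \text{(projection onto } \mathbf{x}\ge 0\text{)}\\
\mathbf{u}^{k+1} &= \mathbf{u}^k + \mathbf{x}^{k+1} - \mathbf{z}^{k+1} && \text{(scaled dual update)}
\end{aligned}
$$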
