Search Results for author: Paul Bennett

Found 21 papers, 10 papers with code

Say ‘YES’ to Positivity: Detecting Toxic Language in Workplace Communications

no code implementations • Findings (EMNLP) 2021 • Meghana Moorthy Bhat, Saghar Hosseini, Ahmed Hassan Awadallah, Paul Bennett, Weisheng Li

Specifically, the lack of a corpus, the sparsity of toxicity in enterprise emails, and the absence of well-defined criteria for annotating toxic conversations have prevented researchers from addressing the problem at scale.

Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder

1 code implementation • EMNLP 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.

Decoder, Language Modelling, and 5 more

Axiomatic Preference Modeling for Longform Question Answering

no code implementations • 2 Dec 2023 • Corby Rosset, Guoqing Zheng, Victor Dibia, Ahmed Awadallah, Paul Bennett

The remarkable abilities of large language models (LLMs) such as GPT-4 partially stem from post-training processes such as Reinforcement Learning from Human Feedback (RLHF), which involve human preferences encoded in a reward model.

Question Answering

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

no code implementations • 1 May 2023 • Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT-4, DALL-E) to novel domains or scenarios for scene understanding and generation in the physical or virtual world.

Mixed Reality, Scene Generation, and 1 more

Understanding Causality with Large Language Models: Feasibility and Opportunities

no code implementations • 11 Apr 2023 • Cheng Zhang, Stefan Bauer, Paul Bennett, Jianfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan

We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal questions.

Decision Making

Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories

no code implementations • 7 Feb 2023 • Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett

In this paper, we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time.

Retrieval, Zero-shot Generalization

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

no code implementations • 13 Apr 2022 • Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.

Denoising

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

1 code implementation • ICLR 2022 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

We present a new framework, AMOS, that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators.

Neural Approaches to Conversational Information Retrieval

no code implementations • 13 Jan 2022 • Jianfeng Gao, Chenyan Xiong, Paul Bennett, Nick Craswell

A conversational information retrieval (CIR) system is an information retrieval (IR) system with a conversational interface which allows users to interact with the system to seek information via multi-turn conversations of natural language, in spoken or written form.

Information Retrieval, Retrieval

Keep it Simple: Unsupervised Simplification of Multi-Paragraph Text

1 code implementation • ACL 2021 • Philippe Laban, Tobias Schnabel, Paul Bennett, Marti A. Hearst

This work presents Keep it Simple (KiS), a new approach to unsupervised text simplification which learns to balance a reward across three properties: fluency, salience and simplicity.

Reading Comprehension, Text Simplification

Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

1 code implementation • 18 Feb 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.

Decoder, Language Modelling, and 4 more

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

2 code implementations • NeurIPS 2021 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics.

Contrastive Learning, Language Modelling, and 1 more

Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

1 code implementation • ACL 2021 • Si Sun, Yingzhuo Qian, Zhenghao Liu, Chenyan Xiong, Kaitao Zhang, Jie Bao, Zhiyuan Liu, Paul Bennett

To democratize the benefits of Neu-IR, this paper presents MetaAdaptRank, a domain adaptive learning method that generalizes Neu-IR models from label-rich source domains to few-shot target domains.

Information Retrieval, Learning-To-Rank, and 1 more

Leveraging Structured Metadata for Improving Question Answering on the Web

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Xinya Du, Ahmed Hassan Awadallah, Adam Fourney, Robert Sim, Paul Bennett, Claire Cardie

We show that leveraging metadata information from web pages can improve the performance of models for answer passage selection/reranking.

Question Answering

CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search

3 code implementations • 3 Nov 2020 • Chenyan Xiong, Zhenghao Liu, Si Sun, Zhuyun Dai, Kaitao Zhang, Shi Yu, Zhiyuan Liu, Hoifung Poon, Jianfeng Gao, Paul Bennett

Neural rankers based on deep pretrained language models (LMs) have been shown to improve many information retrieval benchmarks.

Domain Adaptation, Few-Shot Learning, and 2 more

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

5 code implementations • ICLR 2021 • Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk

In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing.

Ranked #7 on Passage Retrieval on Natural Questions

Contrastive Learning, Passage Retrieval, and 3 more

Knowledge-Aware Language Model Pretraining

no code implementations • 29 Jun 2020 • Corby Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul Bennett, Saurabh Tiwary

How much knowledge do pretrained language models hold?

Knowledge Probing, Language Modelling, and 1 more

Few-Shot Generative Conversational Query Rewriting

1 code implementation • 9 Jun 2020 • Shi Yu, Jiahua Liu, Jingqin Yang, Chenyan Xiong, Paul Bennett, Jianfeng Gao, Zhiyuan Liu

Conversational query rewriting aims to reformulate a concise conversational query to a fully specified, context-independent query that can be effectively handled by existing information retrieval systems.

Information Retrieval, Retrieval, and 2 more

Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention

1 code implementation • ICLR 2020 • Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, Saurabh Tiwary

Transformers have achieved new heights modeling natural language as a sequence of text tokens.

Ranked #42 on Question Answering on HotpotQA

Fact Verification, Multi-hop Question Answering, and 1 more

On Domain Transfer When Predicting Intent in Text

no code implementations • NeurIPS Workshop Document_Intelligen 2019 • Petar Stojanov, Ahmed Hassan Awadallah, Paul Bennett, Saghar Hosseini

In many domains, especially enterprise text analysis, there is an abundance of data which can be used for the development of new AI-powered intelligent experiences to improve people's productivity.

GATEtoGerManC: A GATE-based Annotation Pipeline for Historical German

no code implementations • LREC 2012 • Silke Scheible, Richard J. Whitt, Martin Durrell, Paul Bennett

We describe a new GATE-based linguistic annotation pipeline for Early Modern German, which can be used to annotate historical texts with word tokens, sentence boundaries, lemmas, and POS tags.

POS, POS Tagging, and 1 more
