Search Results for author: Ruqing Zhang

Found 42 papers, 16 papers with code

Listwise Generative Retrieval Models via a Sequential Learning Process

no code implementations 19 Mar 2024 Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng

Specifically, we view the generation of a ranked docid list as a sequence learning process: at each step we learn a subset of parameters that maximizes the corresponding generation likelihood of the $i$-th docid given the (preceding) top $i-1$ docids.

Retrieval

CorpusBrain++: A Continual Generative Pre-Training Framework for Knowledge-Intensive Language Tasks

no code implementations 26 Feb 2024 Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Very recently, a pre-trained generative retrieval model for KILTs, named CorpusBrain, was proposed and reached new state-of-the-art retrieval performance.

Retrieval

A Unified Causal View of Instruction Tuning

no code implementations 9 Feb 2024 Lu Chen, Wei Huang, Ruqing Zhang, Wei Chen, Jiafeng Guo, Xueqi Cheng

The key idea is to learn task-required causal factors and only use those to make predictions for a given task.

Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off

no code implementations 16 Dec 2023 Yu-An Liu, Ruqing Zhang, Mingkun Zhang, Wei Chen, Maarten de Rijke, Jiafeng Guo, Xueqi Cheng

We decompose the robust ranking error into two components, i.e., a natural ranking error for effectiveness evaluation and a boundary ranking error for assessing adversarial robustness.

Adversarial Robustness Information Retrieval

RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation

1 code implementation 16 Dec 2023 Run-Ze Fan, Yixing Fan, Jiangui Chen, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng

Automatic mainstream hashtag recommendation aims to accurately provide users with concise and popular topical hashtags before publication.

Retrieval

CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval

no code implementations 6 Nov 2023 Yinqiong Cai, Yixing Fan, Keping Bi, Jiafeng Guo, Wei Chen, Ruqing Zhang, Xueqi Cheng

The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently.

Retrieval

From Relevance to Utility: Evidence Retrieval with Feedback for Fact Verification

1 code implementation 18 Oct 2023 Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence.

Fact Verification Retrieval

Continual Learning for Generative Retrieval over Dynamic Corpora

no code implementations 29 Aug 2023 Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng

We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product Quantization, which updates a partial quantization codebook according to two adaptive thresholds; and (ii) To memorize new documents for querying without forgetting previous knowledge, we propose a memory-augmented learning mechanism, to form meaningful connections between old and new documents.

Continual Learning Quantization +1

Inducing Causal Structure for Abstractive Text Summarization

1 code implementation 24 Aug 2023 Lu Chen, Ruqing Zhang, Wei Huang, Wei Chen, Jiafeng Guo, Xueqi Cheng

The key idea is to reformulate the Variational Auto-encoder (VAE) to fit the joint distribution of the document and summary variables from the training corpus.

Abstractive Text Summarization

Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method

no code implementations 19 Aug 2023 Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng

The AREA task is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model in response to a query.

Adversarial Attack Attribute +2

On the Robustness of Generative Retrieval Models: An Out-of-Distribution Perspective

no code implementations 22 Jun 2023 Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Wei Chen, Xueqi Cheng

Recently, we have witnessed generative retrieval increasingly gaining attention in the information retrieval (IR) field, which retrieves documents by directly generating their identifiers.

Information Retrieval Retrieval

Gen-IR @ SIGIR 2023: The First Workshop on Generative Information Retrieval

no code implementations 5 Jun 2023 Gabriel Bénédict, Ruqing Zhang, Donald Metzler

Generative information retrieval (IR) has experienced substantial growth across multiple research communities (e.g., information retrieval, computer vision, natural language processing, and machine learning), and has been highly visible in the popular press.

Answer Generation Information Retrieval +2

Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies

no code implementations 24 May 2023 Yubao Tang, Ruqing Zhang, Jiafeng Guo, Jiangui Chen, Zuowei Zhu, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng

Specifically, we assign each document an Elaborative Description based on the query generation technique, which is more meaningful than a string of integers in the original DSI; and (2) For the associations between a document and its identifier, we take inspiration from Rehearsal Strategies in human learning.

Memorization Retrieval

Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models

1 code implementation 28 Apr 2023 Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng

In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that can promote a target document in ranking for a group of queries with the same topic.

Information Retrieval Retrieval

A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning

1 code implementation 28 Apr 2023 Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yiqun Liu, Yixing Fan, Xueqi Cheng

Learning task-specific retrievers that return relevant contexts at an appropriate level of semantic granularity, such as a document retriever, passage retriever, sentence retriever, and entity retriever, may help to achieve better performance on the end-to-end task.

Retrieval Sentence

Visual Named Entity Linking: A New Dataset and A Baseline

1 code implementation 9 Nov 2022 Wenxiang Sun, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng

Since each entity often contains rich visual and textual information in KBs, we thus propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL).

Entity Linking Image Retrieval +3

LegoNet: A Fast and Exact Unlearning Architecture

no code implementations 28 Oct 2022 Sihao Yu, Fei Sun, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng

However, such a strategy typically leads to a loss in model performance, which poses the challenge of increasing unlearning efficiency while maintaining acceptable performance.

Machine Unlearning Representation Learning

Certified Robustness to Word Substitution Ranking Attack for Neural Ranking Models

1 code implementation 14 Sep 2022 Chen Wu, Ruqing Zhang, Jiafeng Guo, Wei Chen, Yixing Fan, Maarten de Rijke, Xueqi Cheng

A ranking model is said to be Certified Top-$K$ Robust on a ranked list when it is guaranteed to keep documents that are out of the top $K$ away from the top $K$ under any attack.

Information Retrieval Retrieval

Hard Negatives or False Negatives: Correcting Pooling Bias in Training Neural Ranking Models

no code implementations 12 Sep 2022 Yinqiong Cai, Jiafeng Guo, Yixing Fan, Qingyao Ai, Ruqing Zhang, Xueqi Cheng

When sampling top-ranked results (excluding the labeled positives) as negatives from a stronger retriever, the performance of the learned NRM becomes even worse.

Information Retrieval Retrieval

Scattered or Connected? An Optimized Parameter-efficient Tuning Approach for Information Retrieval

no code implementations 21 Aug 2022 Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xueqi Cheng

Unlike the promising results in NLP, we find that these methods cannot achieve comparable performance to full fine-tuning at both stages when updating less than 1% of the original model parameters.

Information Retrieval Re-Ranking +1

A Contrastive Pre-training Approach to Learn Discriminative Autoencoder for Dense Retrieval

no code implementations 21 Aug 2022 Xinyu Ma, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng

Empirical results show that our method can significantly outperform the state-of-the-art autoencoder-based language models and other pre-trained models for dense retrieval.

Information Retrieval Retrieval

CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks

1 code implementation 16 Aug 2022 Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yiqun Liu, Yixing Fan, Xueqi Cheng

We show that a strong generative retrieval model can be learned with a set of adequately designed pre-training tasks, and be adopted to improve a variety of downstream KILT tasks with further fine-tuning.

Retrieval

Pre-train a Discriminative Text Encoder for Dense Retrieval via Contrastive Span Prediction

1 code implementation 22 Apr 2022 Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xueqi Cheng

Therefore, in this work, we propose to drop out the decoder and introduce a novel contrastive span prediction task to pre-train the encoder alone.

Contrastive Learning Information Retrieval +2

GERE: Generative Evidence Retrieval for Fact Verification

1 code implementation 12 Apr 2022 Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng

This classical approach has clear drawbacks as follows: i) a large document index as well as a complicated search process is required, leading to considerable memory and computational overhead; ii) independent scoring paradigms fail to capture the interactions among documents and sentences in ranking; iii) a fixed number of sentences are selected to form the final evidence set.

Claim Verification Fact Verification +2

PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models

no code implementations 4 Apr 2022 Chen Wu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

We focus on the decision-based black-box attack setting, where the attackers cannot directly get access to the model information, but can only query the target model to obtain the rank positions of the partial retrieved list.

Document Ranking Information Retrieval +1

A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty

no code implementations CVPR 2022 Sihao Yu, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Zizhen Wang, Xueqi Cheng

By reducing the weights of the majority classes, such instances would become more difficult to learn and hurt the overall performance consequently.

imbalanced classification

Pre-training Methods in Information Retrieval

no code implementations 27 Nov 2021 Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo

The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to the user's information need.

Information Retrieval Re-Ranking +1

FedMatch: Federated Learning Over Heterogeneous Question Answering Data

2 code implementations 11 Aug 2021 Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng

A possible solution to this dilemma is a new approach known as federated learning, which is a privacy-preserving machine learning technique over distributed datasets.

Federated Learning Privacy Preserving +1

A Discriminative Semantic Ranker for Question Retrieval

no code implementations 18 Jul 2021 Yinqiong Cai, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Yanyan Lan, Xueqi Cheng

However, these methods often lose the discriminative power of term-based methods, thus introducing noise during retrieval and hurting recall performance.

Question Answering Re-Ranking +1

B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval

1 code implementation 20 Apr 2021 Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Yingyan Li, Xueqi Cheng

The basic idea of PROP is to construct the representative words prediction (ROP) task for pre-training inspired by the query likelihood model.

Information Retrieval Language Modelling +1

Semantic Models for the First-stage Retrieval: A Comprehensive Review

1 code implementation 8 Mar 2021 Jiafeng Guo, Yinqiong Cai, Yixing Fan, Fei Sun, Ruqing Zhang, Xueqi Cheng

We believe it is the right time to survey current status, learn from existing methods, and gain some insights for future development.

Re-Ranking Retrieval +1

A Linguistic Study on Relevance Modeling in Information Retrieval

no code implementations 1 Mar 2021 Yixing Fan, Jiafeng Guo, Xinyu Ma, Ruqing Zhang, Yanyan Lan, Xueqi Cheng

We employ 16 linguistic tasks to probe a unified retrieval model over these three retrieval tasks to answer this question.

Information Retrieval Natural Language Understanding +2

Learning to Truncate Ranked Lists for Information Retrieval

no code implementations 25 Feb 2021 Chen Wu, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xueqi Cheng

One is the widely adopted metric such as F1 which acts as a balanced objective, and the other is the best F1 under some minimal recall constraint which represents a typical objective in professional search.

Information Retrieval Retrieval

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

1 code implementation 20 Oct 2020 Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, Xueqi Cheng

Recently pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks including information retrieval (IR).

Information Retrieval Language Modelling +1

Query Understanding via Intent Description Generation

1 code implementation 25 Aug 2020 Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng

To address this new task, we propose a novel Contrastive Generation model, namely CtrsGen for short, to generate the intent description by contrasting the relevant documents with the irrelevant documents given a query.

Clustering Information Retrieval +1

Continual Domain Adaptation for Machine Reading Comprehension

no code implementations 25 Aug 2020 Lixin Su, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Yanyan Lan, Xue-Qi Cheng

To tackle such a challenge, in this work, we introduce the Continual Domain Adaptation (CDA) task for MRC.

Continual Learning Domain Adaptation +2

Match$^2$: A Matching over Matching Model for Similar Question Identification

no code implementations 21 Jun 2020 Zizhen Wang, Yixing Fan, Jiafeng Guo, Liu Yang, Ruqing Zhang, Yanyan Lan, Xue-Qi Cheng, Hui Jiang, Xiaozhao Wang

However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question or different questions sharing similar expressions.

Community Question Answering

Outline Generation: Understanding the Inherent Content Structure of Documents

no code implementations 24 May 2019 Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng

To generate a sound outline, an ideal OG model should be able to capture three levels of coherence, namely the coherence between context paragraphs, that between a section and its heading, and that between context headings.

Structured Prediction

Spherical Paragraph Model

no code implementations 18 Jul 2017 Ruqing Zhang, Jiafeng Guo, Yanyan Lan, Jun Xu, Xue-Qi Cheng

Representing texts as fixed-length vectors is central to many language processing tasks.

Representation Learning Sentiment Analysis
