Search Results for author: Xing Wu

Found 26 papers, 15 papers with code

Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval

1 code implementation · 20 Jan 2024 · Guangyuan Ma, Xing Wu, Zijia Lin, Songlin Hu

In this study, we aim to shed light on this issue by revealing that masked auto-encoder (MAE) pre-training with enhanced decoding significantly improves the term coverage of input tokens in dense representations, compared to vanilla BERT checkpoints.

Passage Retrieval Retrieval +1
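
To make the bag-of-word objective concrete, here is a minimal sketch (not the paper's released code; the dimensions, the proj layer, and the function name are illustrative): the passage's [CLS] vector is projected to vocabulary logits and trained with a multi-label loss to cover every input term.

    import torch
    import torch.nn.functional as F

    def bow_prediction_loss(cls_vec, input_ids, proj):
        # Project the passage's [CLS] representation to vocabulary logits.
        logits = proj(cls_vec)                          # (batch, vocab_size)
        # Multi-hot target: which vocabulary terms appear in the input passage.
        targets = torch.zeros_like(logits)
        targets.scatter_(1, input_ids, 1.0)
        # Multi-label objective: the encoder must cover every input term.
        return F.binary_cross_entropy_with_logits(logits, targets)

    # Toy usage with BERT-like sizes.
    proj = torch.nn.Linear(768, 30522)
    loss = bow_prediction_loss(torch.randn(2, 768),
                               torch.randint(0, 30522, (2, 128)), proj)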

HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus

1 code implementation · 6 Sep 2023 · Zhenpeng Su, Xing Wu, Wei Zhou, Guangyuan Ma, Songlin Hu

ChatGPT has gained significant interest due to its impressive performance, but people are increasingly concerned about its potential risks, particularly around the detection of AI-generated content (AIGC), which is often difficult for untrained humans to identify.

Question Answering

Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval

no code implementations · 16 Aug 2023 · Guangyuan Ma, Xing Wu, Peng Wang, Zijia Lin, Songlin Hu

Concretely, we leverage the capabilities of LLMs for document expansion, i.e., query generation, and effectively transfer the expanded knowledge to retrievers using pre-training strategies tailored for passage retrieval.

Contrastive Learning Language Modelling +3
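
A rough sketch of the document-expansion idea described above, assuming an arbitrary LLM generation callable (llm_generate, the prompt wording, and num_queries are placeholders, not the paper's interface): each passage is expanded into pseudo-queries, which then form query-passage pairs for retrieval-oriented pre-training.

    def expand_document(passage, llm_generate, num_queries=3):
        # llm_generate is a stand-in for any LLM text-generation call.
        prompt = f"Write a search query that this passage answers:\n{passage}\nQuery:"
        return [llm_generate(prompt) for _ in range(num_queries)]

    def build_pretraining_pairs(corpus, llm_generate):
        # Pair each passage with its generated queries; these pairs supervise
        # retrieval-oriented pre-training without human labels.
        pairs = []
        for passage in corpus:
            for query in expand_document(passage, llm_generate):
                pairs.append((query, passage))
        return pairs

    # Dummy stand-in for an actual LLM call, for illustration only.
    corpus = ["Passage about dense retrieval pre-training."]
    pairs = build_pretraining_pairs(corpus, lambda prompt: "what is dense retrieval")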

Dial-MAE: ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems

1 code implementation · 7 Jun 2023 · Zhenpeng Su, Xing Wu, Wei Zhou, Guangyuan Ma, Songlin Hu

Dialogue response selection aims to select an appropriate response from several candidates based on a given user and system utterance history.

Conversational Response Selection Language Modelling +2
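
For intuition, once the dialogue history and candidates are embedded, response selection can be as simple as the following sketch (illustrative only; Dial-MAE's contribution concerns how these embeddings are pre-trained, not this scoring step):

    import numpy as np

    def select_response(context_vec, candidate_vecs):
        # Score each candidate response by dot product with the encoded
        # dialogue history, then pick the highest-scoring one.
        scores = candidate_vecs @ context_vec
        return int(np.argmax(scores)), scores

    # Toy example with random "embeddings" (illustrative only).
    rng = np.random.default_rng(0)
    context_vec = rng.standard_normal(128)
    candidate_vecs = rng.standard_normal((5, 128))
    best, scores = select_response(context_vec, candidate_vecs)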

PUNR: Pre-training with User Behavior Modeling for News Recommendation

1 code implementation · 25 Apr 2023 · Guangyuan Ma, Hongtao Liu, Xing Wu, Wanhui Qian, Zhepeng Lv, Qing Yang, Songlin Hu

Firstly, we introduce the user behavior masking pre-training task to recover the masked user behaviors based on their contextual behaviors.

News Recommendation Unsupervised Pre-training
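
A minimal sketch of the behavior-masking idea, assuming behaviors are represented as IDs (mask_id, mask_prob, and the -100 ignore-label convention are assumptions borrowed from common masked-language-modeling practice, not necessarily the paper's exact setup):

    import random

    def mask_user_behaviors(behavior_ids, mask_id, mask_prob=0.15, seed=None):
        # Replace a random subset of the user's behavior sequence (e.g. clicked
        # news IDs) with a mask token; the model is trained to recover them
        # from the surrounding, unmasked behaviors.
        rng = random.Random(seed)
        masked, labels = [], []
        for bid in behavior_ids:
            if rng.random() < mask_prob:
                masked.append(mask_id)
                labels.append(bid)       # supervise only masked positions
            else:
                masked.append(bid)
                labels.append(-100)      # ignored by the loss
        return masked, labels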

CoT-MoTE: Exploring ConTextual Masked Auto-Encoder Pre-training with Mixture-of-Textual-Experts for Passage Retrieval

no code implementations · 20 Apr 2023 · Guangyuan Ma, Xing Wu, Peng Wang, Songlin Hu

Siamese or fully separated dual-encoders are often adopted as the basic retrieval architecture in the pre-training and fine-tuning stages for encoding queries and passages into their latent embedding spaces.

Passage Retrieval Retrieval
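
The distinction between the two architectures reduces to whether the query and passage encoders share weights; a minimal sketch, with a toy linear encoder standing in for a Transformer:

    import torch

    class DualEncoder(torch.nn.Module):
        # Siamese: one shared encoder for queries and passages.
        # Separated: two independent encoders with their own weights.
        def __init__(self, make_encoder, siamese=True):
            super().__init__()
            self.query_encoder = make_encoder()
            self.passage_encoder = self.query_encoder if siamese else make_encoder()

        def forward(self, query, passage):
            return self.query_encoder(query), self.passage_encoder(passage)

    make_encoder = lambda: torch.nn.Linear(32, 16)
    shared = DualEncoder(make_encoder, siamese=True)       # tied weights
    separate = DualEncoder(make_encoder, siamese=False)    # independent weights
    q_emb, p_emb = shared(torch.randn(4, 32), torch.randn(4, 32))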

CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval

no code implementations · 5 Apr 2023 · Xing Wu, Guangyuan Ma, Peng Wang, Meng Lin, Zijia Lin, Fuzheng Zhang, Songlin Hu

As an effective representation bottleneck pretraining technique, the contextual masked auto-encoder utilizes contextual embedding to assist in the reconstruction of passages.

Passage Retrieval Retrieval +1
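
One plausible reading of "contextual embedding assisting reconstruction" is that the encoder's embedding of a neighbouring span is fed to the decoder alongside the masked passage; a rough sketch of such a decoder input (shapes and names are illustrative assumptions, not the paper's code):

    import torch

    def decoder_inputs_with_context(context_vec, masked_token_embeds):
        # The contextual sentence embedding (from a neighbouring span) is
        # prepended to the masked passage's token embeddings, so the weak
        # decoder must rely on it to reconstruct the passage.
        ctx = context_vec.unsqueeze(1)                 # (batch, 1, hidden)
        return torch.cat([ctx, masked_token_embeds], dim=1)

    ctx = torch.randn(2, 768)
    tok = torch.randn(2, 128, 768)
    dec_in = decoder_inputs_with_context(ctx, tok)     # (2, 129, 768)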

Query-as-context Pre-training for Dense Passage Retrieval

2 code implementations · 19 Dec 2022 · Xing Wu, Guangyuan Ma, Wanhui Qian, Zijia Lin, Songlin Hu

Recently, methods have been developed to improve the performance of dense passage retrieval by using context-supervised pre-training.

Contrastive Learning Passage Retrieval +1

RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval

1 code implementation · 13 Oct 2022 · Xing Wu, Chaochen Gao, Zijia Lin, Zhongyuan Wang, Jizhong Han, Songlin Hu

Sparse sampling is also likely to miss important frames corresponding to some text portions, resulting in textual redundancy.

Contrastive Learning Retrieval +1

InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings

2 code implementations · 8 Oct 2022 · Xing Wu, Chaochen Gao, Zijia Lin, Jizhong Han, Zhongyuan Wang, Songlin Hu

Contrastive learning has been extensively studied in sentence embedding learning, which assumes that the embeddings of different views of the same sentence are closer to each other than those of different sentences.

Contrastive Learning Language Modelling +5
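
That "closer" assumption is usually operationalized with an InfoNCE-style loss; a minimal sketch (the temperature value is a common default, not necessarily InfoCSE's):

    import torch
    import torch.nn.functional as F

    def contrastive_loss(view_a, view_b, temperature=0.05):
        # Embeddings of two views of the same sentence (row i of each matrix)
        # should be closer to each other than to any other sentence in the batch.
        a = F.normalize(view_a, dim=-1)
        b = F.normalize(view_b, dim=-1)
        logits = a @ b.T / temperature            # (batch, batch) similarities
        labels = torch.arange(a.size(0))          # positives lie on the diagonal
        return F.cross_entropy(logits, labels)

    loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))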

Pathway to Future Symbiotic Creativity

no code implementations · 18 Aug 2022 · Yike Guo, Qifeng Liu, Jie Chen, Wei Xue, Jie Fu, Henrik Jensen, Fernando Rosas, Jeffrey Shaw, Xing Wu, Jiji Zhang, Jianliang Xu

This report presents a comprehensive view of our vision for the development path of human-machine symbiotic art creation.

Philosophy

ConTextual Masked Auto-Encoder for Dense Passage Retrieval

2 code implementations · 16 Aug 2022 · Xing Wu, Guangyuan Ma, Meng Lin, Zijia Lin, Zhongyuan Wang, Songlin Hu

Dense passage retrieval aims to retrieve the relevant passages of a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages.

Passage Retrieval Retrieval +1
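
In this formulation, retrieval itself reduces to nearest-neighbour search over the passage vectors; a brute-force sketch for illustration (production systems typically use approximate-nearest-neighbour indexes instead):

    import numpy as np

    def retrieve(query_vec, passage_matrix, k=5):
        # Dense retrieval: rank all passages by similarity between the query
        # vector and each passage vector, then return the top-k indices.
        scores = passage_matrix @ query_vec
        topk = np.argsort(-scores)[:k]
        return topk, scores[topk]

    rng = np.random.default_rng(0)
    passages = rng.standard_normal((10_000, 768))   # pre-encoded corpus
    query = rng.standard_normal(768)
    idx, scores = retrieve(query, passages)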

Stacked Autoencoder Based Multi-Omics Data Integration for Cancer Survival Prediction

1 code implementation · 8 Jul 2022 · Xing Wu, Qiulian Fang

In the cancer survival prediction for TCGA cases, SAEsurv-net addresses the curse of dimensionality with a two-stage dimensionality reduction strategy and handles multi-omics heterogeneity with a stacked autoencoder model.

Data Integration Dimensionality Reduction +1
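
A rough sketch of the two-stage idea, showing only the encoder half (reconstruction decoders and the survival head are omitted; layer sizes are invented for illustration):

    import torch

    class StackedAutoencoder(torch.nn.Module):
        # Stage 1: one encoder per omics type compresses each high-dimensional
        # block separately. Stage 2: a joint encoder fuses the concatenated
        # per-omics codes into a single low-dimensional representation.
        def __init__(self, omics_dims, code_dim=64, joint_dim=32):
            super().__init__()
            self.stage1 = torch.nn.ModuleList(
                torch.nn.Linear(d, code_dim) for d in omics_dims)
            self.stage2 = torch.nn.Linear(code_dim * len(omics_dims), joint_dim)

        def forward(self, omics_blocks):
            codes = [torch.relu(enc(x)) for enc, x in zip(self.stage1, omics_blocks)]
            return self.stage2(torch.cat(codes, dim=-1))

    model = StackedAutoencoder(omics_dims=[20000, 3000, 500])
    blocks = [torch.randn(4, d) for d in (20000, 3000, 500)]
    z = model(blocks)    # (4, 32) fused representation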

Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

1 code implementation · ACL 2022 · Xing Wu, Chaochen Gao, Meng Lin, Liangjun Zang, Zhongyuan Wang, Songlin Hu

Before entering the neural network, a token is generally converted to its corresponding one-hot representation, which is a discrete distribution over the vocabulary.

Data Augmentation Language Modelling +3
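
Text smoothing swaps that one-hot input for a soft distribution; a minimal illustration (the random logits stand in for a real masked language model's output):

    import torch
    import torch.nn.functional as F

    vocab_size = 10
    token_id = torch.tensor([3])

    # Standard input: a one-hot vector, i.e. a degenerate distribution that
    # puts all probability mass on the observed token.
    one_hot = F.one_hot(token_id, num_classes=vocab_size).float()

    # Text smoothing replaces it with a soft distribution over the vocabulary,
    # e.g. an MLM's predicted probabilities for that position.
    mlm_logits = torch.randn(1, vocab_size)          # stand-in for MLM output
    smoothed = torch.softmax(mlm_logits, dim=-1)     # soft, contextual distribution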

DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings

1 code implementation · 10 Dec 2021 · Chaochen Gao, Xing Wu, Peng Wang, Jue Wang, Liangjun Zang, Zhongyuan Wang, Songlin Hu

To tackle that, we propose an effective knowledge distillation framework for contrastive sentence embeddings, termed DistilCSE.

Contrastive Learning Knowledge Distillation +5

TransAug: Translate as Augmentation for Sentence Embeddings

no code implementations · 30 Oct 2021 · Jue Wang, Haofan Wang, Xing Wu, Chaochen Gao, Debing Zhang

In this paper, we present TransAug (Translate as Augmentation), which provides the first exploration of utilizing translated sentence pairs as data augmentation for text, and introduces a two-stage paradigm to advance state-of-the-art sentence embeddings.

Contrastive Learning Data Augmentation +4

ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

2 code implementations · COLING 2022 · Xing Wu, Chaochen Gao, Liangjun Zang, Jizhong Han, Zhongyuan Wang, Songlin Hu

Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embeddings to build a positive pair.

Contrastive Learning Data Augmentation +5
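
The dropout-twice trick is easy to state in code; a minimal sketch with a toy encoder standing in for the pre-trained Transformer:

    import torch

    def build_positive_pair(encoder, input_batch):
        # Pass the same batch through the encoder twice with dropout active;
        # the two dropout patterns yield two slightly different embeddings of
        # each sentence, which serve as a positive pair for contrastive learning.
        encoder.train()                 # keep dropout turned on
        emb_a = encoder(input_batch)
        emb_b = encoder(input_batch)    # different dropout mask, same sentences
        return emb_a, emb_b

    encoder = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.Dropout(0.1))
    emb_a, emb_b = build_positive_pair(encoder, torch.randn(8, 32))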

Distilling Knowledge from Pre-trained Language Models via Text Smoothing

no code implementations · 8 May 2020 · Xing Wu, Yibing Liu, Xiangyang Zhou, Dianhai Yu

As an alternative, we propose a new method for BERT distillation, i.e., asking the teacher to generate smoothed word ids, rather than labels, for teaching the student model in knowledge distillation.

Knowledge Distillation Language Modelling
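
A rough sketch of distilling from smoothed word ids rather than hard labels, using a standard temperature-scaled KL objective (the temperature handling is conventional distillation practice, not necessarily the paper's exact loss):

    import torch
    import torch.nn.functional as F

    def smoothed_id_distillation_loss(student_logits, teacher_logits, T=1.0):
        # Instead of hard word ids, the teacher supplies a smoothed distribution
        # over the vocabulary at each position; the student matches it with KL.
        teacher_probs = F.softmax(teacher_logits / T, dim=-1)
        student_logp = F.log_softmax(student_logits / T, dim=-1)
        return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * T * T

    loss = smoothed_id_distillation_loss(torch.randn(4, 30522), torch.randn(4, 30522))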

Data Augmentation for Copy-Mechanism in Dialogue State Tracking

no code implementations · 22 Feb 2020 · Xiaohui Song, Liangjun Zang, Yipeng Su, Xing Wu, Jizhong Han, Songlin Hu

While several state-of-the-art approaches to dialogue state tracking (DST) have shown promising performance on several benchmarks, there is still a significant performance gap between seen slot values (i.e., values that occur in both the training set and the test set) and unseen ones (values that occur in the test set but not in the training set).

Data Augmentation Dialogue State Tracking

TransSent: Towards Generation of Structured Sentences with Discourse Marker

no code implementations · 5 Sep 2019 · Xing Wu, Dongjun Wei, Liangjun Zang, Jizhong Han, Songlin Hu

Automatic and human evaluation results show that TransSent generates high-quality structured sentences and scales reasonably well across different tasks.

Dialogue Generation Sentence

Imbalanced Sentiment Classification Enhanced with Discourse Marker

no code implementations · 28 Mar 2019 · Tao Zhang, Xing Wu, Meng Lin, Jizhong Han, Songlin Hu

Imbalanced data is common in the real world, especially in sentiment-related corpora, making it difficult to train a classifier to distinguish latent sentiment in text data.

Classification Data Augmentation +3

Conditional BERT Contextual Augmentation

5 code implementations · 17 Dec 2018 · Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han, Songlin Hu

BERT demonstrates that a deep bidirectional language model is more powerful than either a unidirectional language model or the shallow concatenation of a forward and backward model.

Data Augmentation Language Modelling +1

LUCSS: Language-based User-customized Colourization of Scene Sketches

no code implementations · 30 Aug 2018 · Changqing Zou, Haoran Mo, Ruofei Du, Xing Wu, Chengying Gao, Hongbo Fu

We introduce LUCSS, a language-based system for interactive colorization of scene sketches, based on their semantic understanding.

Colorization
