Search Results for author: Linjun Yang

Found 14 papers, 10 papers with code

Multilingual E5 Text Embeddings: A Technical Report

1 code implementation • 8 Feb 2024 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023.

Improving Text Embeddings with Large Language Models

1 code implementation • 31 Dec 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and fewer than 1k training steps.

Large Search Model: Redefining Search Stack in the Era of LLMs

no code implementations • 23 Oct 2023 • Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others.

Language Modelling, Large Language Model, +3

Allies: Prompting Large Language Model with Beam Search

1 code implementation • 24 May 2023 • Hao Sun, Xiao Liu, Yeyun Gong, Yan Zhang, Daxin Jiang, Linjun Yang, Nan Duan

With the advance of large language models (LLMs), the research field of LLM applications is becoming increasingly popular, and the idea of constructing pipelines to accomplish complex tasks by stacking LLM API calls has come true.

Language Modelling, Large Language Model, +3

Inference with Reference: Lossless Acceleration of Large Language Models

1 code implementation • 10 Apr 2023 • Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei

We propose LLMA, an LLM accelerator to losslessly speed up Large Language Model (LLM) inference with references.

Language Modelling, Large Language Model

LEAD: Liberal Feature-based Distillation for Dense Retrieval

1 code implementation • 10 Dec 2022 • Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan

Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model.

Document Ranking, Knowledge Distillation, +2

Text Embeddings by Weakly-Supervised Contrastive Pre-training

1 code implementation • 7 Dec 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.

Ranked #11 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)

Only Connect Walls Dataset Task 1 (Grouping), Retrieval

LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval

1 code implementation • 31 Aug 2022 • Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang

In large-scale retrieval, the lexicon-weighting paradigm, learning weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency.

Language Modelling, Passage Retrieval, +1

SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval

1 code implementation • 6 Jul 2022 • Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

SimLM employs a simple bottleneck architecture that learns to compress the passage information into a dense vector through self-supervised pre-training.

Language Modelling, Passage Retrieval, +1

Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?

no code implementations • 21 Jan 2022 • Gabriella Kazai, Bhaskar Mitra, Anlei Dong, Nick Craswell, Linjun Yang

This raises questions about when such summaries are sufficient for relevance estimation by the ranking model or the human assessor, and whether humans and machines benefit from the document's full text in similar ways.

Information Retrieval, Retrieval

xMoCo: Cross Momentum Contrastive Learning for Open-Domain Question Answering

no code implementations • ACL 2021 • Nan Yang, Furu Wei, Binxing Jiao, Daxin Jiang, Linjun Yang

Dense passage retrieval has been shown to be an effective approach for information retrieval tasks such as open domain question answering.

Contrastive Learning, Open-Domain Question Answering, +2

Embedding-based Retrieval in Facebook Search

2 code implementations • 20 Jun 2020 • Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, Linjun Yang

In this paper, we discuss the techniques for applying EBR to a Facebook Search system.

Retrieval

Web-Scale Responsive Visual Search at Bing

no code implementations • 14 Feb 2018 • Houdong Hu, Yan Wang, Linjun Yang, Pavel Komlev, Li Huang, Xi Chen, Jiapei Huang, Ye Wu, Meenaz Merchant, Arun Sacheti

In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing.

Learning-To-Rank

CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

3 code implementations • CVPR 2018 • Kuang-Huei Lee, Xiaodong He, Lei Zhang, Linjun Yang

We demonstrate the effectiveness of the proposed algorithm on both the label noise detection task and the task of image classification on noisy data, using several large-scale datasets.

Ranked #2 on Image Classification on Food-101N (using extra training data)

Classification, General Classification, +2
