Search Results for author: Jingwen Lu

Found 4 papers, 3 papers with code

LEAD: Liberal Feature-based Distillation for Dense Retrieval

1 code implementation • 10 Dec 2022 • Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan

Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model.

Tasks: Document Ranking, Knowledge Distillation, +2
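
To make the teacher-student setup concrete, here is a minimal PyTorch sketch of a combined score-and-feature distillation loss in the spirit of the snippet above; the layer choice, temperature, and loss weighting are illustrative assumptions, not LEAD's exact objective.

```python
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores,
                      student_hidden, teacher_hidden,
                      temperature=2.0, alpha=0.5):
    # Soft-label term: match the student's (softened) relevance-score
    # distribution over candidate documents to the teacher's.
    kd = F.kl_div(
        F.log_softmax(student_scores / temperature, dim=-1),
        F.softmax(teacher_scores / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Feature term: pull student hidden states toward the teacher's
    # (assumes both were already projected to the same dimension).
    feat = F.mse_loss(student_hidden, teacher_hidden)
    return alpha * kd + (1 - alpha) * feat
```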

SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval

1 code implementation • 21 Oct 2022 • Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, Weizhu Chen

Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.

Tasks: Retrieval, Text Retrieval
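
A minimal sketch of how a score-aware sampling distribution can favor ambiguous negatives, assuming precomputed retrieval scores for one query's candidate pool; the Gaussian-shaped weighting and hyperparameters are illustrative choices, not necessarily SimANS's exact formulation.

```python
import torch

def sample_ambiguous_negatives(pos_score, neg_scores, num_samples=8, a=0.5, b=0.0):
    # neg_scores: retrieval scores of the top-k candidate negatives for one query.
    # Negatives scoring far below the positive are too easy to be informative;
    # negatives scoring far above it are likely false negatives. A weighting
    # centered near the positive's score downweights both and concentrates
    # sampling on the ambiguous middle ground.
    weights = torch.exp(-a * (neg_scores - pos_score - b) ** 2)
    return torch.multinomial(weights, num_samples, replacement=False)
```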

PROD: Progressive Distillation for Dense Retrieval

1 code implementation • 27 Sep 2022 • Zhenghao Lin, Yeyun Gong, Xiao Liu, Hang Zhang, Chen Lin, Anlei Dong, Jian Jiao, Jingwen Lu, Daxin Jiang, Rangan Majumder, Nan Duan

It is common that a stronger teacher model produces a worse student after distillation, due to the nonnegligible gap between teacher and student.

Tasks: Knowledge Distillation, Natural Questions, +1
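
A minimal sketch of a progressive distillation loop in which the student is distilled through a sequence of gradually stronger teachers, so it never faces the full teacher-student gap at once; the teacher schedule and KL objective below are assumptions for illustration, not PROD's exact procedure.

```python
import torch
import torch.nn.functional as F

def progressive_distillation(student, teachers, loader, optimizer, epochs_per_stage=1):
    # `teachers` is ordered from weakest to strongest; each stage distills
    # the student against one teacher before moving to the next.
    for teacher in teachers:
        teacher.eval()
        for _ in range(epochs_per_stage):
            for queries, docs in loader:
                with torch.no_grad():
                    t_scores = teacher(queries, docs)   # teacher relevance scores
                s_scores = student(queries, docs)       # student relevance scores
                loss = F.kl_div(F.log_softmax(s_scores, dim=-1),
                                F.softmax(t_scores, dim=-1),
                                reduction="batchmean")
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return student
```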

Aligning the Pretraining and Finetuning Objectives of Language Models

no code implementations • 5 Feb 2020 • Nuo Wang Pierse, Jingwen Lu

We found that, with objective alignment, our 768 by 3 and 512 by 3 transformer language models can reach accuracy of 83.9%/82.5% for concept-of-interest tagging and 73.8%/70.2% for acronym detection using only 200 finetuning examples per task, outperforming the 768 by 3 model pretrained without objective alignment by +4.8%/+3.4% and +9.9%/+6.3%.

Tasks: Language Modelling
