1 code implementation • ACL 2022 • Lihu Chen, Gaël Varoquaux, Fabian Suchanek
State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models to generate vectors for unseen words, by learning the behavior of pre-trained embeddings using only the surface form of words. We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT) and makes it robust to OOV words with few additional parameters. Extensive evaluations demonstrate that our lightweight model achieves performance similar to or even better than that of prior competitors, both on original datasets and on corrupted variants.
no code implementations • 7 Feb 2024 • Lihu Chen, Alexandre Perez-Lebel, Fabian M. Suchanek, Gaël Varoquaux
In this work, we construct a new evaluation dataset derived from a knowledge base to assess confidence scores given to answers of Mistral and LLaMA.
1 code implementation • 18 Jan 2024 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek
The framework employs phrase type classification as an auxiliary task and incorporates character-level information more effectively into the phrase representation.
1 code implementation • 19 Oct 2023 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek
Positional Encodings (PEs) are used to inject word-order information into transformer-based language models.
1 code implementation • 23 Aug 2023 • Fabian Suchanek, Mehwish Alam, Thomas Bonald, Lihu Chen, Pierre-Henri Paris, Jules Soria
Knowledge Bases (KBs) find applications in many knowledge-intensive tasks and, most notably, in information retrieval.
1 code implementation • 30 Jun 2023 • Lihu Chen, Simon Razniewski, Gerhard Weikum
To evaluate our method and various baselines, we introduce a novel dataset, called MALT, rooted in Wikidata.
1 code implementation • 3 Feb 2023 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek
Acronym Disambiguation (AD) is crucial for natural language understanding on various sources, including biomedical reports, scientific papers, and search engine queries.
1 code implementation • 15 Mar 2022 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek
State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words.
1 code implementation • 16 Dec 2020 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek
Biomedical entity linking aims to map biomedical mentions, such as diseases and drugs, to standard entities in a given knowledge base.