no code implementations • LREC 2022 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum
BERT models used in specialized domains all seem to be the result of a simple strategy: initializing with the original BERT and then resuming pre-training on a specialized corpus.
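As a point of reference, the continued pre-training strategy described above can be sketched with standard tooling. The following is a minimal illustration using the Hugging Face transformers and datasets libraries, not the authors' code; the corpus file `corpus.txt`, the checkpoint name, and the hyperparameters are placeholder assumptions.

```python
# Minimal sketch: resume BERT pre-training (masked language modeling)
# on a specialized corpus. Illustrative only; `corpus.txt` and all
# hyperparameters are placeholder assumptions, not the paper's setup.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")  # initialize with the original BERT

# Load the specialized corpus (one document per line) and tokenize it.
corpus = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
corpus = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Standard 15% token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-specialized", num_train_epochs=3),
    train_dataset=corpus,
    data_collator=collator,
)
trainer.train()  # resume pre-training on the specialized domain
```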
2 code implementations • COLING 2020 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum, Junichi Tsujii
Due to the compelling improvements brought by BERT, many recent representation models have adopted the Transformer architecture as their main building block, consequently inheriting the WordPiece tokenization system even though it is not intrinsically linked to the Transformer architecture (see the tokenization sketch after this entry).
Ranked #1 on Semantic Similarity on ClinicalSTS
Tasks: Clinical Concept Extraction, Drug–drug Interaction Extraction, +3 more
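To see why the inherited tokenization matters for specialized text, here is a small hypothetical illustration, assuming the transformers library is installed: BERT's WordPiece vocabulary splits rare domain terms into several subword fragments, which is exactly the behavior a character-level model like CharacterBERT sidesteps by building word representations from characters.

```python
# Illustration (assumes transformers is installed): WordPiece splits
# out-of-vocabulary medical terms into subword pieces.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.tokenize("hypoglycemia"))
# e.g. ['hypo', '##gly', '##ce', '##mia'] -- several fragments for one word
print(tokenizer.tokenize("the"))
# ['the'] -- common words stay whole
```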
no code implementations • JEPTALNRECITAL 2020 • Hicham El Boukkouri
BERT models used in specialized domains all seem to follow a fairly simple strategy: use the original BERT model as the initialization, then continue its pre-training on a specialized corpus.
1 code implementation • ACL 2019 • Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum
Using pre-trained word embeddings in conjunction with Deep Learning models has become the "de facto" approach in Natural Language Processing (NLP); a minimal sketch of this setup follows this entry.
Ranked #4 on Clinical Concept Extraction on 2010 i2b2/VA
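For context, the "pre-trained embeddings plus deep model" setup mentioned above typically looks like the following minimal sketch, assuming gensim and PyTorch; the vector file and lookup word are illustrative placeholders unrelated to the paper's experiments.

```python
# Sketch: initialize a PyTorch embedding layer from pre-trained word
# vectors. Assumes gensim and torch are installed; "vectors.bin" and
# the word "patient" are placeholders.
import torch
import torch.nn as nn
from gensim.models import KeyedVectors

# Load pre-trained vectors (word2vec binary format).
vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Copy the vectors into an nn.Embedding and freeze them: the usual
# "pre-trained embeddings + deep model" configuration.
weights = torch.tensor(vectors.vectors, dtype=torch.float)
embedding = nn.Embedding.from_pretrained(weights, freeze=True)

# Look up one word (assumes it is in the pre-trained vocabulary).
token_ids = torch.tensor([vectors.key_to_index["patient"]])
print(embedding(token_ids).shape)  # -> torch.Size([1, vector_dim])
```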