no code implementations • EMNLP 2020 • Oren Barkan, Avi Caciularu, Ido Dagan
We propose the novel Within-Between Relation model for recognizing lexical-semantic relations between words.
no code implementations • 10 Mar 2024 • Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty
Although compression is the cornerstone of BPE, the most common tokenization algorithm, its importance in the tokenization process remains unclear.
1 code implementation • 11 Jan 2024 • Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva
Inspecting the information encoded in hidden representations of large language models (LLMs) can explain models' behavior and verify their alignment with human values.
1 code implementation • 20 Oct 2023 • Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, Moshe Wasserblat
Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering and fact checking.
1 code implementation • 18 Oct 2023 • Aviv Slobodkin, Omer Goldman, Avi Caciularu, Ido Dagan, Shauli Ravfogel
In this paper, we explore the behavior of LLMs when presented with (un)answerable queries.
no code implementations • 16 Oct 2023 • Alon Jacovi, Avi Caciularu, Jonathan Herzig, Roee Aharoni, Bernd Bohnet, Mor Geva
A growing area of research investigates augmenting language models with tools (e.g., search engines, calculators) to overcome their shortcomings (e.g., missing or incorrect knowledge, incorrect logical inferences).
1 code implementation • 13 Oct 2023 • Aviv Slobodkin, Avi Caciularu, Eran Hirsch, Ido Dagan
Further, we substantially improve the silver training data quality via GPT-4 distillation.
no code implementations • 28 Jun 2023 • Oren Barkan, Avi Caciularu, Idan Rejwan, Ori Katz, Jonathan Weill, Itzik Malkiel, Noam Koenigstein
We present Variational Bayesian Network (VBN), a novel Bayesian entity representation learning model that utilizes hierarchical and relational side information and is particularly useful for modeling entities in the "long-tail", where the data is scarce.
1 code implementation • 24 May 2023 • Eran Hirsch, Valentina Pyatkin, Ruben Wolhandler, Avi Caciularu, Asi Shefer, Ido Dagan
In this paper, we suggest revisiting the sentence union generation task as an effective well-defined testbed for assessing text consolidation capabilities, decoupling the consolidation challenge from subjective content selection.
1 code implementation • 24 May 2023 • Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan
The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks.
1 code implementation • 17 May 2023 • Alon Jacovi, Avi Caciularu, Omer Goldman, Yoav Goldberg
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
1 code implementation • 23 Oct 2022 • Alon Eirew, Avi Caciularu, Ido Dagan
The task of Cross-document Coreference Resolution has traditionally been formulated as identifying all coreference links across a given set of documents.
no code implementations • 13 Aug 2022 • Itzik Malkiel, Dvir Ginzburg, Oren Barkan, Avi Caciularu, Jonathan Weill, Noam Koenigstein
Recently, there has been growing interest in the ability of Transformer-based models to produce meaningful embeddings of text with several applications, such as text similarity.
no code implementations • 13 Aug 2022 • Itzik Malkiel, Dvir Ginzburg, Oren Barkan, Avi Caciularu, Yoni Weill, Noam Koenigstein
We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the "traditional" masked-language task.
1 code implementation • 23 May 2022 • Ayal Klein, Eran Hirsch, Ron Eliav, Valentina Pyatkin, Avi Caciularu, Ido Dagan
Several recent works have suggested representing semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements.
1 code implementation • 26 Apr 2022 • Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, Yoav Goldberg
The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred a wide interest in interpreting their predictions.
no code implementations • 23 Apr 2022 • Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein
Transformer-based language models have significantly advanced the state of the art in many linguistic tasks.
1 code implementation • 28 Mar 2022 • Mor Geva, Avi Caciularu, Kevin Ro Wang, Yoav Goldberg
Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood.
2 code implementations • NAACL 2022 • Avi Caciularu, Ido Dagan, Jacob Goldberger, Arman Cohan
Long-context question answering (QA) tasks require reasoning over a long document or multiple documents.
2 code implementations • NAACL 2022 • Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
no code implementations • 12 Dec 2021 • Oren Barkan, Roy Hirsch, Ori Katz, Avi Caciularu, Jonathan Weill, Noam Koenigstein
Next, we propose a novel hybrid recommendation algorithm that bridges these two conflicting objectives and enables a harmonized balance between preserving high accuracy for warm items while effectively promoting completely cold items.
1 code implementation • EMNLP (ACL) 2021 • Eran Hirsch, Alon Eirew, Ori Shapira, Avi Caciularu, Arie Cattan, Ori Ernst, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Ido Dagan
We introduce iFacetSum, a web application for exploring topical document sets.
no code implementations • 2 Sep 2021 • Oren Barkan, Omri Armstrong, Amir Hertz, Avi Caciularu, Ori Katz, Itzik Malkiel, Noam Koenigstein
The algorithmic advantages of GAM are explained in detail and validated empirically, showing that GAM outperforms its alternatives across various tasks and datasets.
no code implementations • Joint Conference on Lexical and Computational Semantics 2021 • Avi Caciularu, Ido Dagan, Jacob Goldberger
We introduce a new approach for smoothing and improving the quality of word embeddings.
1 code implementation • Findings (ACL) 2021 • Dvir Ginzburg, Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Koenigstein
Hence, we introduce SDR, a self-supervised method for document similarity that can be applied to documents of arbitrary length.
no code implementations • RANLP 2021 • Idan Rejwan, Avi Caciularu
We also show that adding information to the sentence, such as case markers and noun-verb distinction, reduces the need for fixed word order, in accordance with the typological findings.
2 code implementations • Findings (EMNLP) 2021 • Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan
We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective.
no code implementations • 1 Jan 2021 • Avi Caciularu, Jacob Goldberger
In this study we propose a deep clustering algorithm that utilizes the variational autoencoder (VAE) framework with a multi-encoder-decoder neural architecture.
no code implementations • 1 Jan 2021 • Nir Raviv, Avi Caciularu, Tomer Raviv, Jacob Goldberger, Yair Be'ery
Error correction codes are an integral part of communication applications and boost the reliability of transmission.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein
In addition, we introduce a new language understanding task for wine recommendations using similarities based on professional wine reviews.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Yehudit Meged, Avi Caciularu, Vered Shwartz, Ido Dagan
We study the potential synergy between two different NLP tasks, both confronting predicate lexical variability: identifying predicate paraphrases, and event coreference resolution.
no code implementations • ACL 2020 • Oren Barkan, Idan Rejwan, Avi Caciularu, Noam Koenigstein
BHWR facilitates Variational Bayes word representation learning combined with semantic taxonomy modeling via hierarchical priors.
no code implementations • 15 Feb 2020 • Oren Barkan, Avi Caciularu, Ori Katz, Noam Koenigstein
However, it is possible that a certain early movie may suddenly become more relevant in the presence of a popular sequel movie.
no code implementations • 6 Feb 2020 • Nir Raviv, Avi Caciularu, Tomer Raviv, Jacob Goldberger, Yair Be'ery
Error correction codes are an integral part of communication applications, boosting the reliability of transmission.
1 code implementation • 14 Aug 2019 • Oren Barkan, Noam Razin, Itzik Malkiel, Ori Katz, Avi Caciularu, Noam Koenigstein
In this paper, we introduce Distilled Sentence Embedding (DSE), a model based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks.
no code implementations • 21 May 2019 • Avi Caciularu, David Burshtein
We first consider the reconstruction of uncoded data symbols transmitted over a noisy linear intersymbol interference (ISI) channel, with an unknown impulse response, without using pilot symbols.
no code implementations • 5 Mar 2018 • Avi Caciularu, David Burshtein
A new maximum likelihood estimation approach for blind channel equalization, using variational autoencoders (VAEs), is introduced.
1 code implementation • 28 Oct 2017 • Mor Cohen, Avi Caciularu, Idan Rejwan, Jonathan Berant
Grammar induction is the task of learning a grammar from a set of examples.