no code implementations • 29 Mar 2024 • Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim
On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with embedding size 768.
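The listing does not say how the 256-dimensional variant is obtained; one common recipe for serving a compact embedding from a larger one (Matryoshka-style) is to truncate and re-normalize. A minimal sketch, purely for illustration:

```python
import numpy as np

def truncate_and_renormalize(emb: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` coordinates, then re-normalize to unit length
    so cosine/dot-product retrieval still works on the shorter vector."""
    sub = emb[..., :dim]
    return sub / np.linalg.norm(sub, axis=-1, keepdims=True)

full = np.random.default_rng(0).standard_normal(768)  # stand-in for a 768-d embedding
small = truncate_and_renormalize(full, 256)
print(small.shape)  # (256,)
```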
2 code implementations • NeurIPS 2023 • Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao
Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and hence achieve state-of-the-art results on many information retrieval benchmarks.
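For readers unfamiliar with ColBERT's late interaction, a minimal sketch of the MaxSim scoring it refers to, assuming unit-normalized token embeddings stored as NumPy arrays (the shapes and toy data below are illustrative):

```python
import numpy as np

def colbert_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction scoring: every query token vector is matched
    against its most similar document token vector (MaxSim), and the
    per-token maxima are summed into a single relevance score."""
    sim = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_tokens) similarities
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 128));  q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.standard_normal((50, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(colbert_score(q, d))
```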
no code implementations • 27 Mar 2023 • Md Kamrul Hasan, Md Saiful Islam, Sangwu Lee, Wasifur Rahman, Iftekhar Naim, Mohammed Ibrahim Khan, Ehsan Hoque
Our approach, TextMI, significantly reduces model complexity, adds interpretability to the model's decisions, and can be applied to a diverse set of tasks, achieving superior performance on multimodal sarcasm detection and near-SOTA performance on multimodal sentiment analysis and multimodal humor detection.
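The snippet implies the core idea: non-verbal cues are rendered as plain text and folded into the utterance so an off-the-shelf language model can read them. A minimal sketch; the template string and cue names are illustrative assumptions, not the paper's exact format:

```python
def textualize(utterance: str, acoustic_cues: list[str], visual_cues: list[str]) -> str:
    """Fold non-verbal cues into the text input so a standard pre-trained
    language model can consume them (illustrative template only)."""
    cues = "; ".join(acoustic_cues + visual_cues)
    return f"{utterance} </s> The speaker's nonverbal behaviors: {cues}."

print(textualize("Oh, great, another meeting.",
                 ["flat pitch", "slow speech"],   # hypothetical acoustic cues
                 ["eye roll", "smirk"]))          # hypothetical visual cues
```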
no code implementations • 21 Dec 2022 • Kazuma Hashimoto, Iftekhar Naim, Karthik Raman
Sequence labeling is a core task in text understanding for IE/IR systems.
no code implementations • 2 Nov 2022 • Yujie Qian, Jinhyuk Lee, Sai Meher Karthik Duddu, Zhuyun Dai, Siddhartha Brahma, Iftekhar Naim, Tao Lei, Vincent Y. Zhao
With sparsified unary saliences, we are able to prune a large number of query and document token vectors and improve the efficiency of multi-vector retrieval.
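The entry does not show the pruning step itself; a minimal sketch, assuming each token already carries a learned unary salience score that a sparsifier (e.g. a ReLU or top-k gate) has zeroed out for unimportant tokens:

```python
import numpy as np

def prune_by_salience(token_vecs: np.ndarray, saliences: np.ndarray) -> np.ndarray:
    """Keep only token vectors whose sparsified unary salience is non-zero;
    the rest never enter the index (documents) or the scoring (queries)."""
    return token_vecs[saliences > 0.0]

vecs = np.random.default_rng(0).standard_normal((6, 128))
sal = np.array([0.9, 0.0, 0.3, 0.0, 0.0, 0.1])  # e.g. output of a ReLU/top-k sparsifier
print(prune_by_salience(vecs, sal).shape)       # (3, 128) -- three tokens survive
```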
no code implementations • 16 Mar 2022 • Karthik Raman, Iftekhar Naim, Jiecao Chen, Kazuma Hashimoto, Kiran Yalasangi, Krishna Srinivasan
Large pretrained generative language models (LMs) have achieved great success in a wide range of sequence tagging and structured prediction tasks.
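One way such generative LMs are applied to tagging is to linearize the label sequence into the output text. A minimal sketch of that casting; the bracketed target format is an illustrative assumption, not necessarily the format studied in the paper:

```python
def tag_as_seq2seq(tokens: list[str], labels: list[str]) -> tuple[str, str]:
    """Cast sequence tagging as text-to-text: the source is the raw
    sentence, the target interleaves each token with its label."""
    source = " ".join(tokens)
    target = " ".join(f"{tok} [{lab}]" for tok, lab in zip(tokens, labels))
    return source, target

src, tgt = tag_as_seq2seq(["Barack", "Obama", "visited", "Paris"],
                          ["B-PER", "I-PER", "O", "B-LOC"])
print(src)  # Barack Obama visited Paris
print(tgt)  # Barack [B-PER] Obama [I-PER] visited [O] Paris [B-LOC]
```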
no code implementations • CL 2018 • Iftekhar Naim, Parker Riley, Daniel Gildea
Existing decipherment models, however, are not well suited to exploiting such orthographic similarities.
no code implementations • 10 Aug 2015 • Iftekhar Naim, Daniel Gildea
Our results show that the proposed log-linear model trained with contrastive divergence scales to large vocabularies and outperforms existing generative decipherment models by exploiting orthographic features.
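A minimal sketch of a log-linear lexicon model with a single orthographic feature, normalized over a candidate list; the feature choice and weight are illustrative assumptions, and the actual model is trained with contrastive divergence rather than scored with a hand-set weight:

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def translation_prob(src: str, candidates: list[str], weight: float = 1.0) -> dict:
    """Log-linear model with one orthographic feature:
    p(tgt | src) proportional to exp(-weight * edit_distance(src, tgt))."""
    scores = {t: math.exp(-weight * edit_distance(src, t)) for t in candidates}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

# orthographically similar cognates get most of the probability mass
print(translation_prob("nacht", ["night", "table", "nicht"]))
```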
1 code implementation • 14 Apr 2015 • Iftekhar Naim, M. Iftekhar Tanveer, Daniel Gildea, Mohammed Ehsan Hoque
We present a computational framework for automatically quantifying verbal and nonverbal behaviors in the context of job interviews.
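A minimal sketch of what "quantifying verbal and nonverbal behaviors" can look like in code; the specific features and weights below are illustrative assumptions, not the paper's feature set:

```python
import numpy as np

def extract_features(transcript: str, duration_sec: float, smile_count: int) -> np.ndarray:
    """Turn one interview response into a small numeric feature vector
    (hypothetical features: speaking rate, filler-word rate, smile rate)."""
    words = transcript.split()
    speaking_rate = len(words) / duration_sec                                  # words/sec
    filler_rate = sum(w.lower() in {"um", "uh", "like"} for w in words) / len(words)
    return np.array([speaking_rate, filler_rate, smile_count / duration_sec])

# A linear scorer; in such frameworks the weights are fit to human
# interviewer ratings (e.g. by regularized regression), not hand-set.
weights = np.array([0.8, -2.0, 1.5])
score = extract_features("um I led a team of five engineers", 4.0, 1) @ weights
print(round(float(score), 3))
```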