no code implementations • 23 Jan 2024 • Apoorva Beedu, Karan Samel, Irfan Essa
Compared to existing methods, MAT has the advantage of learning additional environmental context from two kinds of text inputs: action descriptions during the pre-training stage, and the text inputs for detected objects and actions during modality feature fusion.
1 code implementation • Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '23) 2023 • Karan Samel, Cheng Li, Weize Kong, Tao Chen, Mingyang Zhang, Shaleen Gupta, Swaraj Khadanga, Wensong Xu, Xingyu Wang, Kashyap Kolipaka, Mike Bendersky, Marc Najork
These inferred weights and terms can be used directly by a retrieval system to perform a query search.
Ranked #1 on Passage Retrieval on MS MARCO
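As a rough illustration of why inferred term/weight pairs can be consumed "directly" by a retrieval system, here is a minimal sketch of weighted-term scoring over an inverted index. It assumes whitespace tokenization and a score of sum(weight × term frequency). The query weights are made up for illustration, and `build_index` / `weighted_search` are hypothetical helpers, not the paper's system or any particular engine's API.

```python
from collections import Counter, defaultdict

def build_index(corpus):
    """Inverted index: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index

def weighted_search(index, weighted_query, k=3):
    """Score each document as sum(weight(term) * tf(term, doc)) over query terms.

    Only the posting lists of the weighted query terms are touched, which is
    why term/weight pairs can be fed straight into an inverted-index engine.
    """
    scores = defaultdict(float)
    for term, weight in weighted_query.items():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += weight * tf
    # Highest score first; break ties by doc_id for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

corpus = {
    "d1": "cheap flights to paris",
    "d2": "paris hotel deals",
    "d3": "flights and hotels",
}
# Hypothetical term weights, standing in for a model's inferred output.
query = {"flights": 0.9, "paris": 0.7, "cheap": 0.4}
results = weighted_search(build_index(corpus), query)
print(results)
```

Here d1 ranks first because it matches all three weighted terms. In this sketch the ranking depends entirely on which terms are selected and how they are weighted.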
no code implementations • 11 Feb 2022 • Karan Samel, Zelin Zhao, Binghong Chen, Shuang Li, Dharmashankar Subramanian, Irfan Essa, Le Song
Events across a timeline are a common data representation, seen in different temporal modalities.
no code implementations • NeurIPS 2021 • Jiani Huang, Ziyang Li, Binghong Chen, Karan Samel, Mayur Naik, Le Song, Xujie Si
Deep learning and symbolic reasoning are complementary techniques for an intelligent system.
1 code implementation • NeurIPS 2021 • Zelin Zhao, Karan Samel, Binghong Chen, Le Song
We propose the Program-guided Transformer (ProTo), which integrates both semantic and structural guidance from a program by using cross-attention and masked self-attention to pass messages between the specification and the routines in the program.
Ranked #1 on Visual Question Answering (VQA) on GQA test-std
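The two message-passing mechanisms named above can be sketched generically. This is a toy under stated assumptions, not ProTo itself: single-head attention with identity projections (no learned weights), a mask that lets each routine see only itself and the routines it consumes, and a residual sum as the combination rule. All function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # finite as long as every row keeps at least one unmasked entry
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attend(queries, keys, values, mask=None):
    """Single-head scaled dot-product attention with identity projections.

    mask[i][j] == False blocks query i from attending to key j.
    """
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            s = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if mask is not None and not mask[i][j]:
                s = -math.inf
            scores.append(s)
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def dependency_mask(parents):
    """Routine i may attend to itself and to the routines it consumes."""
    n = len(parents)
    return [[j == i or j in parents[i] for j in range(n)] for i in range(n)]

def program_guided_layer(routines, spec, parents):
    # Structural guidance: masked self-attention along program dependencies.
    structural = attend(routines, routines, routines, dependency_mask(parents))
    # Semantic guidance: cross-attention from routines to the specification.
    semantic = attend(routines, spec, spec)
    # Residual sum of both messages (an assumed combination rule).
    return [[r + s + c for r, s, c in zip(row, s_row, c_row)]
            for row, s_row, c_row in zip(routines, structural, semantic)]

routines = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy routine embeddings
parents = [[], [0], [0, 1]]                       # routine 2 consumes 0 and 1
spec = [[0.5, 0.5]]                               # toy specification embedding
print(program_guided_layer(routines, spec, parents))
```

The mask carries the program's structure: routine 0 has no inputs, so its structural message is just its own embedding, while routine 1 cannot see routine 2. The cross-attention term is the only path by which the specification reaches each routine.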
no code implementations • 29 Sep 2021 • Karan Samel, Zelin Zhao, Binghong Chen, Shuang Li, Dharmashankar Subramanian, Irfan Essa, Le Song
Events across a timeline are a common data representation, seen in different temporal modalities.
no code implementations • 22 Mar 2021 • Karan Samel, Zelin Zhao, Binghong Chen, Kuan Wang, Robin Luo, Le Song
In multi-modal reasoning tasks such as visual question answering (VQA), many modeling and training paradigms have been tested.
no code implementations • 1 Jan 2021 • Karan Samel, Zelin Zhao, Kuan Wang, Robin Luo, Binghong Chen, Le Song
We present a differentiable end-to-end program executor (DePe), which addresses Visual Question Answering (VQA) in a sample-efficient and computationally efficient manner.
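To make "differentiable program executor" concrete, here is a minimal soft-execution sketch. It is a generic technique and not DePe's actual operators: each object carries attribute probabilities, `filter` multiplies a soft selection mask, and `count` / `exist` reduce that mask with sums and products. Gradients could therefore flow back to the attribute probabilities. The operator set, the independence assumption in `soft_exist`, and every name here are hypothetical.

```python
def soft_filter(mask, objects, attribute):
    """Scale each object's selection weight by P(object has attribute)."""
    return [m * obj.get(attribute, 0.0) for m, obj in zip(mask, objects)]

def soft_count(mask):
    """Expected number of selected objects: a sum, hence differentiable."""
    return sum(mask)

def soft_exist(mask):
    """P(at least one object selected), treating selections as independent."""
    p_none = 1.0
    for m in mask:
        p_none *= 1.0 - m
    return 1.0 - p_none

def execute(program, objects):
    # Start with every object fully selected.
    mask = [1.0] * len(objects)
    for op, arg in program:
        if op == "filter":
            mask = soft_filter(mask, objects, arg)
        elif op == "count":
            return soft_count(mask)
        elif op == "exist":
            return soft_exist(mask)
        else:
            raise ValueError(f"unknown operator: {op}")
    return mask

# Toy scene: per-object attribute probabilities, e.g. from a perception model.
scene = [
    {"red": 1.0, "cube": 1.0},
    {"red": 0.5, "cube": 1.0},
    {"red": 0.0, "cube": 1.0},
]
program = [("filter", "red"), ("filter", "cube"), ("count", None)]
print(execute(program, scene))  # 1.5
```

Replacing hard set operations with products and sums keeps every step smooth. That is what allows a program's answer to be trained end to end against a loss instead of being executed symbolically.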