no code implementations • 11 Apr 2024 • Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro
Existing datasets for audio understanding primarily focus on single-turn interactions (i.e., audio captioning, audio question answering) for describing audio in natural language, which limits the understanding of audio through interactive dialogue.
no code implementations • 2 Feb 2024 • Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro
Augmenting large language models (LLMs) to understand audio (including non-speech sounds and non-verbal speech) is critically important for diverse real-world applications of LLMs.
1 code implementation • 9 Nov 2023 • Georgios Tziafas, Yucheng Xu, Arushi Goel, Mohammadreza Kasaei, Zhibin Li, Hamidreza Kasaei
To address these limitations, we develop a challenging benchmark based on cluttered indoor scenes from the OCID dataset, for which we generate referring expressions and connect them with 4-DoF grasp poses.
1 code implementation • 20 Oct 2023 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration, is paired with an image.
1 code implementation • ICCV 2023 • Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari
Empirically, we show that our dataset poses a hard challenge for large vision+language models, which perform poorly on it: PaLI [14] is state-of-the-art on OK-VQA [37], yet it achieves only 13.0% accuracy on our dataset.
no code implementations • 9 Mar 2023 • Yucheng Xu, Li Nanbo, Arushi Goel, Zijian Guo, Zonghai Yao, Hamidreza Kasaei, Mohammadreza Kasaei, Zhibin Li
Videos depict the change of complex dynamical systems over time in the form of discrete image sequences.
no code implementations • ICCV 2023 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
Coreference resolution, a core task in natural language processing, aims to identify words and phrases that refer to the same entity in a text.
no code implementations • 24 Aug 2022 • Doris Antensteiner, Silvia Bucci, Arushi Goel, Marah Halawa, Niveditha Kalavakonda, Tejaswi Kasarla, Miaomiao Liu, Nermin Samet, Ivaxi Sheth
In this paper, we present the details of Women in Computer Vision Workshop - WiCV 2022, organized alongside the hybrid CVPR 2022 in New Orleans, Louisiana.
no code implementations • 11 Mar 2022 • Arushi Goel, Niveditha Kalavakonda, Nour Karessli, Tejaswi Kasarla, Kathryn Leonard, Boyi Li, Nermin Samet, Ghada Zamzmi
In this paper, we present the details of Women in Computer Vision Workshop - WiCV 2021, organized alongside the virtual CVPR 2021.
no code implementations • 26 Jan 2022 • Arushi Goel, Yunlong Jiao, Jordan Massiah
In this paper, we propose PARS: Pseudo-Label Aware Robust Sample Selection, a hybrid approach that combines the best of all three worlds (sample selection, noise-robust loss functions, and label correction) in a joint-training framework to achieve robustness to noisy labels.
no code implementations • CVPR 2022 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding.
no code implementations • 22 Nov 2019 • Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen
Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them.
1 code implementation • 12 Oct 2019 • Yijie Xu, Arushi Goel
In particular, the lack of sufficient amounts of domain-specific data can reduce the accuracy of a classifier.
no code implementations • CVPR 2019 • Arushi Goel, Keng Teck Ma, Cheston Tan
Inferring the social context in a given visual scene not only involves recognizing objects, but also demands a more in-depth understanding of the relationships and attributes of the people involved.
1 code implementation • 12 Dec 2018 • Zhi-Xuan Tan, Arushi Goel, Thanh-Son Nguyen, Desmond C. Ong
People naturally understand the emotions of those around them, and often also empathize with them.