Search Results for author: Ali Vosoughi

Found 8 papers, 4 papers with code

OSCaR: Object State Captioning and State Change Representation

1 code implementation · 27 Feb 2024 · Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu

To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.

Change Detection · Object

Learning Audio Concepts from Counterfactual Natural Language

1 code implementation · 10 Jan 2024 · Ali Vosoughi, Luca Bondi, Ho-Hsiang Wu, Chenliang Xu

Conventional audio classification has relied on predefined classes, lacking the ability to learn from free-form text; a minimal sketch of text-based supervision follows this entry.

Audio Captioning · Audio Classification · +2
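For context on what "learning from free-form text" can look like in practice, here is a minimal CLAP-style contrastive sketch in which audio clips are paired with captions instead of fixed class labels. This illustrates the general technique, not the paper's model; the encoders are simulated with random tensors, and the counterfactual-caption remark in the comments is an assumption about how such captions might be used.

```python
# Illustrative sketch only: a CLAP-style contrastive objective showing how
# free-form text can supervise audio classification instead of fixed classes.
# The "embeddings" below are random stand-ins for real encoder outputs.
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired audio/text embeddings."""
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = audio_emb @ text_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(audio_emb.size(0))        # i-th audio pairs with i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random "embeddings"; real encoders would produce these from
# raw audio and from captions such as "a dog barks" vs. "no dog barking"
# (a counterfactual caption that flips one concept -- an assumption here).
audio = torch.randn(8, 512)
text = torch.randn(8, 512)
print(contrastive_loss(audio, text).item())
```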

Video Understanding with Large Language Models: A Survey

1 code implementation · 29 Dec 2023 · Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.

Video Understanding

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

no code implementations · 18 Oct 2023 · Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu

Audio-visual sound separation methods typically assume that sound sources are visible in the video, which excludes invisible sounds originating beyond the camera's view.

MISAR: A Multimodal Instructional System with Augmented Reality

1 code implementation · 18 Oct 2023 · Jing Bi, Nguyen Manh Nguyen, Ali Vosoughi, Chenliang Xu

Augmented reality (AR) requires the seamless integration of visual, auditory, and linguistic channels for optimized human-computer interaction.

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

no code implementations · 31 May 2023 · Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo

In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose counterfactual inference to remove this effect's influence; a simplified sketch follows this entry.

Counterfactual · Counterfactual Inference · +2
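As a concrete illustration of the causal recipe, the sketch below subtracts counterfactual unimodal predictions (question-only and vision-only branches) from the fused prediction at inference time. This follows the general counterfactual-inference debiasing pattern and is a hedged simplification, not the paper's exact formulation; all branch names and tensors are hypothetical.

```python
# Illustrative sketch only: counterfactual-inference debiasing for VQA in the
# spirit of the paper, not its exact formulation. The factual prediction mixes
# the true multimodal effect with language/vision shortcut bias; a
# counterfactual pass with one modality "imagined away" estimates that bias,
# which is then subtracted at inference time.
import torch

def debiased_logits(fusion_logits: torch.Tensor,
                    question_only_logits: torch.Tensor,
                    vision_only_logits: torch.Tensor) -> torch.Tensor:
    # Estimate the unimodal (confounded) contribution, then remove it from
    # the factual fused prediction.
    counterfactual = question_only_logits + vision_only_logits
    return fusion_logits - counterfactual

# Toy usage: 4 questions, 10 candidate answers.
fused = torch.randn(4, 10)   # full multimodal branch
q_only = torch.randn(4, 10)  # question-only branch (language bias)
v_only = torch.randn(4, 10)  # vision-only branch (vision bias)
answers = debiased_logits(fused, q_only, v_only).argmax(dim=-1)
print(answers)
```

In methods of this family, the unimodal branches are typically trained jointly with the fused model so that the subtraction removes shortcut bias rather than useful multimodal signal.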

Cross Modal Global Local Representation Learning from Radiology Reports and X-Ray Chest Images

no code implementations · 26 Jan 2023 · Nathan Hadjiyski, Ali Vosoughi, Axel Wismueller

Extensive experiments confirm consistent results for classifying lung pathologies using multimodal global-local representations of language and vision information.

Representation Learning

Re-defining Radiology Quality Assurance (QA) -- Artificial Intelligence (AI)-Based QA by Restricted Investigation of Unequal Scores (AQUARIUS)

no code implementations · 2 May 2022 · Axel Wismueller, Larry Stockmaster, Ali Vosoughi

Using AQUARIUS with NLP on final radiology reports and targeted expert neuroradiology review of only 29 discordantly classified cases, we reduced the human QA effort by 98.5% and found a total of six non-reported true ICH+ cases, with radiologists' missed ICH detection rates of 0.52% and 2.5% for flagged and non-flagged cases, respectively.
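For intuition, the screening step described above can be reduced to a few lines: an AI read of the image and an NLP-extracted label from the final report are compared, and only discordant cases go to expert review. This is a minimal sketch under that reading of the abstract; the `Case` structure, field names, and data are hypothetical, not the study's actual pipeline.

```python
# Illustrative sketch only: discordance screening in the style of AQUARIUS.
# An AI image classifier and an NLP read of the final report each label a
# case for intracranial hemorrhage (ICH); only disagreements are routed to
# expert review, which is why reviewing a handful of cases can replace QA
# over a full worklist. All labels and case data here are made up.
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    ai_ich_positive: bool      # AI classifier's read of the head CT
    report_ich_positive: bool  # NLP-extracted label from the final report

def discordant_cases(cases: list[Case]) -> list[Case]:
    """Return only the cases where the AI and the report disagree."""
    return [c for c in cases if c.ai_ich_positive != c.report_ich_positive]

cases = [
    Case("A001", ai_ich_positive=True, report_ich_positive=True),   # concordant
    Case("A002", ai_ich_positive=True, report_ich_positive=False),  # review: possible missed ICH
    Case("A003", ai_ich_positive=False, report_ich_positive=False), # concordant
]
to_review = discordant_cases(cases)
print([c.case_id for c in to_review])  # ['A002']
print(f"QA effort: {len(to_review)}/{len(cases)} cases reviewed")
```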
