1 code implementation • 2 Oct 2023 • Sai Vemprala, Shuhang Chen, Abhinav Shukla, Dinesh Narayanan, Ashish Kapoor
In addition, the modular design allows various deep ML components and existing foundation models to be easily applied to a wider variety of robot-centric problems.
no code implementations • CVPR 2023 • Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu
In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others.
no code implementations • 8 Jul 2020 • Abhinav Shukla, Stavros Petridis, Maja Pantic
This enriches the audio encoder with visual information, and the encoder can then be used for evaluation without the visual modality.
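The general idea described here, training an audio encoder against visual targets and then discarding the visual branch at test time, can be sketched as a simple cross-modal regression. The linear encoder, feature dimensions, and training loop below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

# Hypothetical sketch: learn an audio encoder supervised by paired
# visual features, then evaluate using audio alone.
rng = np.random.default_rng(0)

n, d_audio, d_visual = 256, 40, 16
audio = rng.normal(size=(n, d_audio))             # stand-in for audio frames
true_map = rng.normal(size=(d_audio, d_visual))
# Paired visual targets (synthetic here; in practice, features from video).
visual = audio @ true_map + 0.1 * rng.normal(size=(n, d_visual))

# Train a linear audio encoder to regress the visual features (MSE loss).
W = np.zeros((d_audio, d_visual))
lr = 0.01
for _ in range(500):
    pred = audio @ W
    grad = audio.T @ (pred - visual) / n          # gradient of mean squared error
    W -= lr * grad

# Evaluation: the visual modality is no longer needed; the trained
# encoder maps audio alone into the visually-informed feature space.
features = audio @ W
mse = float(np.mean((features - visual) ** 2))
```

The key property being illustrated is that the visual signal acts only as a training target; once `W` is learned, inference depends solely on the audio input.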
no code implementations • 4 May 2020 • Abhinav Shukla, Stavros Petridis, Maja Pantic
Our results demonstrate the potential of visual self-supervision for audio feature learning and suggest that joint visual and audio self-supervision leads to more informative audio representations for speech and emotion recognition.
Automatic Speech Recognition (ASR) +5
no code implementations • 13 Jan 2020 • Abhinav Shukla, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic
Self supervised representation learning has recently attracted a lot of research interest for both the audio and visual modalities.
Ranked #8 on Speech Emotion Recognition on CREMA-D
no code implementations • 3 Apr 2019 • Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti, Mohan Kankanhalli, Stefan Winkler, Ramanathan Subramanian
Advertisements (ads) often contain strong affective content to capture viewer attention and convey an effective message to the audience.
no code implementations • 14 Aug 2018 • Abhinav Shukla, Harish Katti, Mohan Kankanhalli, Ramanathan Subramanian
Contrary to the popular notion that ad affect hinges on the narrative and the clever use of linguistic and social cues, we find that actively attended objects and the coarse scene structure encode affective information better than individual scene objects or conspicuous background elements.