1 code implementation • 14 Mar 2024 • Yiming Ma, Victor Sanchez, Tanaya Guha
The CLIP (Contrastive Language-Image Pretraining) model has exhibited outstanding performance in recognition problems, such as zero-shot image classification and object detection.
Ranked #1 on Crowd Counting on UCF-QNRF
no code implementations • 23 Jul 2023 • Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke
While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker.
1 code implementation • 13 Apr 2023 • Yiming Ma, Victor Sanchez, Soodeh Nikan, Devesh Upadhyay, Bhushan Atote, Tanaya Guha
Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles.
1 code implementation • 5 Mar 2023 • Amir Shirian, Mona Ahmadian, Krishna Somandepalli, Tanaya Guha
Heterogeneous graphs provide a compact, efficient, and scalable way to model data involving multiple disparate modalities.
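As a minimal illustrative sketch (not the paper's actual implementation), a heterogeneous multimodal graph can be modeled with typed nodes (one type per modality) and edges grouped by relation; all class and relation names below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Node:
    node_id: int
    ntype: str  # modality type, e.g. "audio" or "video"

@dataclass
class HeteroGraph:
    """A minimal heterogeneous graph: nodes carry a type, and edges are
    grouped by (source type, relation, destination type)."""
    nodes: List[Node] = field(default_factory=list)
    edges: Dict[Tuple[str, str, str], List[Tuple[int, int]]] = field(default_factory=dict)

    def add_node(self, node_id: int, ntype: str) -> Node:
        node = Node(node_id, ntype)
        self.nodes.append(node)
        return node

    def add_edge(self, src: Node, relation: str, dst: Node) -> None:
        key = (src.ntype, relation, dst.ntype)
        self.edges.setdefault(key, []).append((src.node_id, dst.node_id))

# Example: two modalities with temporal (intra-modal) and
# cross-modal (inter-modal) relations.
g = HeteroGraph()
audio = [g.add_node(i, "audio") for i in range(3)]
video = [g.add_node(10 + i, "video") for i in range(3)]
for t in range(2):  # temporal edges within each modality
    g.add_edge(audio[t], "next", audio[t + 1])
    g.add_edge(video[t], "next", video[t + 1])
for t in range(3):  # cross-modal edges at matching time steps
    g.add_edge(audio[t], "co-occurs", video[t])
```

Grouping edges by typed relation is what makes the structure compact: each relation's edge list can be processed with its own parameters, rather than forcing all modalities through one homogeneous adjacency.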
no code implementations • 20 Feb 2023 • Surbhi Madan, Monika Gahalawat, Tanaya Guha, Roland Goecke, Ramanathan Subramanian
We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits.
no code implementations • 17 Oct 2022 • Yiming Ma, Victor Sanchez, Soodeh Nikan, Devesh Upadhyay, Bhushan Atote, Tanaya Guha
Driver distractions are known to be the dominant cause of road accidents.
1 code implementation • 16 Jul 2022 • Amir Shirian, Krishna Somandepalli, Victor Sanchez, Tanaya Guha
In contrast, we employ heterogeneous graphs to explicitly capture the spatial and temporal relationships between the modalities and represent detailed information about the underlying signal.
2 code implementations • 15 Jul 2022 • Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar
Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows.
Ranked #1 on Node Classification on AVA
1 code implementation • 28 Feb 2022 • Yiming Ma, Victor Sanchez, Tanaya Guha
Then, to account for perspective distortion, the highest-level feature map is fed to extra components to extract multiscale features, which are the input to the decoder to generate crowd densities.
Ranked #8 on Crowd Counting on ShanghaiTech B
1 code implementation • 31 Jan 2022 • Amir Shirian, Krishna Somandepalli, Tanaya Guha
Large-scale databases with high-quality manual annotations are scarce in the audio domain.
no code implementations • 15 Dec 2021 • Surbhi Madan, Monika Gahalawat, Tanaya Guha, Ramanathan Subramanian
We demonstrate the utility of elementary head-motion units termed kinemes for behavioral analytics to predict personality and interview traits.
no code implementations • 2 Dec 2021 • Sourya Roy, Kyle Min, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar
We address the problem of active speaker detection through a new framework, called SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data.
1 code implementation • 10 Aug 2021 • Olly Styles, Tanaya Guha, Victor Sanchez
We introduce the problem of multi-camera trajectory forecasting (MCTF), which involves predicting the trajectory of a moving object across a network of cameras.
1 code implementation • ICCV 2021 • Kien Nguyen, Subarna Tripathi, Bang Du, Tanaya Guha, Truong Q. Nguyen
Several studies have noted that the naive use of scene graphs from a black-box scene graph generator harms image captioning performance and that scene graph-based captioning models have to incur the overhead of explicit use of image features to generate decent captions.
1 code implementation • 29 Jul 2020 • Prakhar Kulshreshtha, Tanaya Guha
An effective approach to automated movie content analysis involves building a network (graph) of its characters.
1 code implementation • 6 Jun 2020 • Sachin Singh, Victor Sanchez, Tanaya Guha
The ranking is expected to correspond with human perception of overall appeal of the images.
1 code implementation • 1 May 2020 • Olly Styles, Tanaya Guha, Victor Sanchez, Alex Kot
To facilitate research in this new area, we release the Warwick-NTU Multi-camera Forecasting Database (WNMF), a unique dataset of multi-camera pedestrian trajectories from a network of 15 synchronized cameras.
no code implementations • 19 Oct 2019 • Kranti Kumar Parida, Neeraj Matiyali, Tanaya Guha, Gaurav Sharma
We present an audio-visual multimodal approach for the task of zero-shot learning (ZSL) for classification and retrieval of videos.
Ranked #5 on GZSL Video Classification on VGGSound-GZSL (main)
1 code implementation • 26 Sep 2019 • Olly Styles, Tanaya Guha, Victor Sanchez
In contrast to existing works on object trajectory forecasting, which primarily consider the problem from a bird's-eye perspective, we formulate the problem from an object-level perspective and call for the prediction of full object bounding boxes, rather than trajectories alone.
Ranked #1 on Multiple Object Forecasting on Citywalks
no code implementations • 30 Mar 2019 • Gaurav Verma, Eeshan Gunesh Dhekane, Tanaya Guha
We introduce the problem of learning affective correspondence between audio (music) and visual data (images).
2 code implementations • International Conference on Image Processing (ICIP) 2018 • Prakhar Kulshreshtha, Tanaya Guha
We address the problem of face clustering in long, real-world videos. This is a challenging task because faces in such videos exhibit wide variability in scale, pose, illumination, and expression, and may also be partially occluded.
no code implementations • 12 Dec 2017 • Karttikeya Mangalam, Tanaya Guha
We investigate the effect and usefulness of spontaneity (i.e., whether a given speech sample is spontaneous or not) in the context of speech emotion recognition.
no code implementations • 12 Jun 2013 • Tanaya Guha, Ehsan Nezhadarya, Rabab K. Ward
This sparse strategy is employed because it is known to generate basis vectors that are qualitatively similar to the receptive field of the simple cells present in the mammalian primary visual cortex.
no code implementations • 12 Jun 2012 • Tanaya Guha, Rabab K. Ward
This paper proposes a sparse representation-based approach to encode the information content of one image using information from another image. The compactness (sparsity) of the resulting representation is then used as a measure of how much the image can be compressed with respect to the other image.
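As a hedged sketch of the general idea (not the paper's method): a signal can be greedily encoded against a dictionary via orthogonal matching pursuit, and the number of non-zero coefficients needed to reach a target error serves as a crude compressibility score. The function names and the tolerance values here are illustrative assumptions:

```python
import numpy as np

def omp(D, x, max_atoms, tol=1e-6):
    """Greedy orthogonal matching pursuit: pick the dictionary atom most
    correlated with the residual, re-fit the support by least squares,
    and repeat until the residual is small or the atom budget is spent."""
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual)
        corr[support] = 0.0  # never reselect an atom already in the support
        support.append(int(np.argmax(corr)))
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - sub @ sol
    return coef

def compressibility(D, x, max_atoms, tol=1e-3):
    """Fewer atoms needed => x is more compressible w.r.t. dictionary D."""
    return int(np.count_nonzero(omp(D, x, max_atoms, tol)))

# A 2-sparse signal in a trivial (identity) dictionary needs only 2 atoms.
D = np.eye(5)
x = np.array([0.0, 3.0, 0.0, 1.0, 0.0])
score = compressibility(D, x, max_atoms=5)
```

With a dictionary learned from a second image, a low score means the first image is well explained by the second image's structure, i.e., highly compressible with respect to it.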