Search Results for author: Tanaya Guha

Found 24 papers, 14 papers with code

CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification

1 code implementation • 14 Mar 2024 • Yiming Ma, Victor Sanchez, Tanaya Guha

The CLIP (Contrastive Language-Image Pretraining) model has exhibited outstanding performance in recognition problems, such as zero-shot image classification and object detection.

Crowd Counting

Explainable Depression Detection via Head Motion Patterns

no code implementations • 23 Jul 2023 • Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke

While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker.

Binary Classification, Depression Detection

Heterogeneous Graph Learning for Acoustic Event Classification

1 code implementation • 5 Mar 2023 • Amir Shirian, Mona Ahmadian, Krishna Somandepalli, Tanaya Guha

Heterogeneous graphs provide a compact, efficient, and scalable way to model data involving multiple disparate modalities.

Classification, graph construction, +1

Explainable Human-centered Traits from Head Motion and Facial Expression Dynamics

no code implementations • 20 Feb 2023 • Surbhi Madan, Monika Gahalawat, Tanaya Guha, Roland Goecke, Ramanathan Subramanian

We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits.

Visually-aware Acoustic Event Detection using Heterogeneous Graphs

1 code implementation • 16 Jul 2022 • Amir Shirian, Krishna Somandepalli, Victor Sanchez, Tanaya Guha

In contrast, we employ heterogeneous graphs to explicitly capture the spatial and temporal relationships between the modalities and represent detailed information about the underlying signal.

Event Detection

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

2 code implementations • 15 Jul 2022 • Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows.

Audio-Visual Active Speaker Detection, Graph Learning, +1

FusionCount: Efficient Crowd Counting via Multiscale Feature Fusion

1 code implementation • 28 Feb 2022 • Yiming Ma, Victor Sanchez, Tanaya Guha

Then, to account for perspective distortion, the highest-level feature map is fed to extra components to extract multiscale features, which are the input to the decoder to generate crowd densities.

Crowd Counting

Head Matters: Explainable Human-centered Trait Prediction from Head Motion Dynamics

no code implementations • 15 Dec 2021 • Surbhi Madan, Monika Gahalawat, Tanaya Guha, Ramanathan Subramanian

We demonstrate the utility of elementary head-motion units termed kinemes for behavioral analytics to predict personality and interview traits.

Learning Spatial-Temporal Graphs for Active Speaker Detection

no code implementations • 2 Dec 2021 • Sourya Roy, Kyle Min, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

We address the problem of active speaker detection through a new framework, called SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data.

Audio-Visual Active Speaker Detection, Node Classification

Multi-Camera Trajectory Forecasting with Trajectory Tensors

1 code implementation • 10 Aug 2021 • Olly Styles, Tanaya Guha, Victor Sanchez

We introduce the problem of multi-camera trajectory forecasting (MCTF), which involves predicting the trajectory of a moving object across a network of cameras.

Trajectory Forecasting

In Defense of Scene Graphs for Image Captioning

1 code implementation • ICCV 2021 • Kien Nguyen, Subarna Tripathi, Bang Du, Tanaya Guha, Truong Q. Nguyen

Several studies have noted that the naive use of scene graphs from a black-box scene graph generator harms image captioning performance and that scene graph-based captioning models have to incur the overhead of explicit use of image features to generate decent captions.

Human-Object Interaction Detection, Image Captioning

Dynamic Character Graph via Online Face Clustering for Movie Analysis

1 code implementation • 29 Jul 2020 • Prakhar Kulshreshtha, Tanaya Guha

An effective approach to automated movie content analysis involves building a network (graph) of its characters.

Clustering, Face Clustering, +1

Ensemble Network for Ranking Images Based on Visual Appeal

1 code implementation • 6 Jun 2020 • Sachin Singh, Victor Sanchez, Tanaya Guha

The ranking is expected to correspond with human perception of overall appeal of the images.

Multi-Camera Trajectory Forecasting: Pedestrian Trajectory Prediction in a Network of Cameras

1 code implementation • 1 May 2020 • Olly Styles, Tanaya Guha, Victor Sanchez, Alex Kot

To facilitate research in this new area, we release the Warwick-NTU Multi-camera Forecasting Database (WNMF), a unique dataset of multi-camera pedestrian trajectories from a network of 15 synchronized cameras.

Pedestrian Trajectory Prediction, Trajectory Forecasting

Multiple Object Forecasting: Predicting Future Object Locations in Diverse Environments

1 code implementation • 26 Sep 2019 • Olly Styles, Tanaya Guha, Victor Sanchez

In contrast to existing works on object trajectory forecasting, which primarily consider the problem from a bird's-eye perspective, we formulate the problem from an object-level perspective and call for the prediction of full object bounding boxes, rather than trajectories alone.

Multiple Object Forecasting, Object, +1

Learning Affective Correspondence between Music and Image

no code implementations • 30 Mar 2019 • Gaurav Verma, Eeshan Gunesh Dhekane, Tanaya Guha

We introduce the problem of learning affective correspondence between audio (music) and visual data (images).

Binary Classification, Emotion Recognition

An Online Algorithm for Constrained Face Clustering in Videos

2 code implementations • International Conference on Image Processing (ICIP) 2018 • Prakhar Kulshreshtha, Tanaya Guha

We address the problem of face clustering in long, real-world videos. This is a challenging task because faces in such videos exhibit wide variability in scale, pose, illumination, and expression, and may also be partially occluded.

Clustering, Face Clustering, +1

Learning Spontaneity to Improve Emotion Recognition In Speech

no code implementations • 12 Dec 2017 • Karttikeya Mangalam, Tanaya Guha

We investigate the effect and usefulness of spontaneity (i.e., whether a given speech is spontaneous or not) in speech in the context of emotion recognition.

Speech Emotion Recognition

Sparse Representation-based Image Quality Assessment

no code implementations • 12 Jun 2013 • Tanaya Guha, Ehsan Nezhadarya, Rabab K. Ward

This sparse strategy is employed because it is known to generate basis vectors that are qualitatively similar to the receptive field of the simple cells present in the mammalian primary visual cortex.

Image Quality Assessment

Image Similarity Using Sparse Representation and Compression Distance

no code implementations • 12 Jun 2012 • Tanaya Guha, Rabab K. Ward

This paper proposes a sparse representation-based approach to encode the information content of an image using information from the other image, and uses the compactness (sparsity) of the representation as a measure of its compressibility (how much the image can be compressed) with respect to the other image.

Clustering, Image Clustering, +1
