1 code implementation • 14 Mar 2024 • Yiming Ma, Victor Sanchez, Tanaya Guha
The CLIP (Contrastive Language-Image Pretraining) model has exhibited outstanding performance in recognition problems, such as zero-shot image classification and object detection.
Ranked #1 on Crowd Counting on UCF-QNRF
no code implementations • 23 Jul 2023 • Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke
While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker.
1 code implementation • 13 Apr 2023 • Yiming Ma, Victor Sanchez, Soodeh Nikan, Devesh Upadhyay, Bhushan Atote, Tanaya Guha
Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles.
1 code implementation • 5 Mar 2023 • Amir Shirian, Mona Ahmadian, Krishna Somandepalli, Tanaya Guha
Heterogeneous graphs provide a compact, efficient, and scalable way to model data involving multiple disparate modalities.
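As a minimal illustrative sketch (not the paper's actual implementation), a heterogeneous multimodal graph can be modeled with typed nodes (one type per modality) and edges grouped by relation; all class and relation names below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Node:
    node_id: int
    ntype: str  # modality type, e.g. "audio" or "video"

@dataclass
class HeteroGraph:
    """A minimal heterogeneous graph: nodes carry a type, and edges are
    grouped by (source type, relation, destination type)."""
    nodes: List[Node] = field(default_factory=list)
    edges: Dict[Tuple[str, str, str], List[Tuple[int, int]]] = field(default_factory=dict)

    def add_node(self, node_id: int, ntype: str) -> Node:
        node = Node(node_id, ntype)
        self.nodes.append(node)
        return node

    def add_edge(self, src: Node, relation: str, dst: Node) -> None:
        key = (src.ntype, relation, dst.ntype)
        self.edges.setdefault(key, []).append((src.node_id, dst.node_id))

# Example: two modalities with temporal (intra-modal) and
# cross-modal (inter-modal) relations.
g = HeteroGraph()
audio = [g.add_node(i, "audio") for i in range(3)]
video = [g.add_node(10 + i, "video") for i in range(3)]
for t in range(2):  # temporal edges within each modality
    g.add_edge(audio[t], "next", audio[t + 1])
    g.add_edge(video[t], "next", video[t + 1])
for t in range(3):  # cross-modal edges at matching time steps
    g.add_edge(audio[t], "co-occurs", video[t])
```

Grouping edges by typed relation is what makes the structure compact: each relation's edge list can be processed with its own parameters, rather than forcing all modalities through one homogeneous adjacency.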
no code implementations • 20 Feb 2023 • Surbhi Madan, Monika Gahalawat, Tanaya Guha, Roland Goecke, Ramanathan Subramanian
We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits.
no code implementations • 17 Oct 2022 • Yiming Ma, Victor Sanchez, Soodeh Nikan, Devesh Upadhyay, Bhushan Atote, Tanaya Guha
Driver distractions are known to be the dominant cause of road accidents.
1 code implementation • 16 Jul 2022 • Amir Shirian, Krishna Somandepalli, Victor Sanchez, Tanaya Guha
In contrast, we employ heterogeneous graphs to explicitly capture the spatial and temporal relationships between the modalities and represent detailed information about the underlying signal.
2 code implementations • 15 Jul 2022 • Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar
Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows.
Ranked #1 on Node Classification on AVA
1 code implementation • 28 Feb 2022 • Yiming Ma, Victor Sanchez, Tanaya Guha
Then, to account for perspective distortion, the highest-level feature map is fed to extra components to extract multiscale features, which are the input to the decoder to generate crowd densities.
Ranked #8 on Crowd Counting on ShanghaiTech B
1 code implementation • 31 Jan 2022 • Amir Shirian, Krishna Somandepalli, Tanaya Guha
Large-scale databases with high-quality manual annotations are scarce in the audio domain.
no code implementations • 15 Dec 2021 • Surbhi Madan, Monika Gahalawat, Tanaya Guha, Ramanathan Subramanian
We demonstrate the utility of elementary head-motion units termed kinemes for behavioral analytics to predict personality and interview traits.
no code implementations • 2 Dec 2021 • Sourya Roy, Kyle Min, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar
We address the problem of active speaker detection through a new framework, called SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data.
1 code implementation • 10 Aug 2021 • Olly Styles, Tanaya Guha, Victor Sanchez
We introduce the problem of multi-camera trajectory forecasting (MCTF), which involves predicting the trajectory of a moving object across a network of cameras.
1 code implementation • ICCV 2021 • Kien Nguyen, Subarna Tripathi, Bang Du, Tanaya Guha, Truong Q. Nguyen
Several studies have noted that the naive use of scene graphs from a black-box scene graph generator harms image captioning performance and that scene graph-based captioning models have to incur the overhead of explicit use of image features to generate decent captions.
1 code implementation • 29 Jul 2020 • Prakhar Kulshreshtha, Tanaya Guha
An effective approach to automated movie content analysis involves building a network (graph) of its characters.
1 code implementation • 6 Jun 2020 • Sachin Singh, Victor Sanchez, Tanaya Guha
The ranking is expected to correspond with human perception of overall appeal of the images.
1 code implementation • 1 May 2020 • Olly Styles, Tanaya Guha, Victor Sanchez, Alex Kot
To facilitate research in this new area, we release the Warwick-NTU Multi-camera Forecasting Database (WNMF), a unique dataset of multi-camera pedestrian trajectories from a network of 15 synchronized cameras.
no code implementations • 19 Oct 2019 • Kranti Kumar Parida, Neeraj Matiyali, Tanaya Guha, Gaurav Sharma
We present an audio-visual multimodal approach for the task of zero-shot learning (ZSL) for classification and retrieval of videos.
Ranked #5 on GZSL Video Classification on VGGSound-GZSL (main)
1 code implementation • 26 Sep 2019 • Olly Styles, Tanaya Guha, Victor Sanchez
In contrast to existing works on object trajectory forecasting, which primarily consider the problem from a bird's-eye perspective, we formulate the problem from an object-level perspective and call for the prediction of full object bounding boxes, rather than trajectories alone.
Ranked #1 on Multiple Object Forecasting on Citywalks
no code implementations • 30 Mar 2019 • Gaurav Verma, Eeshan Gunesh Dhekane, Tanaya Guha
We introduce the problem of learning affective correspondence between audio (music) and visual data (images).
2 code implementations • International Conference on Image Processing (ICIP) 2018 • Prakhar Kulshreshtha, Tanaya Guha
We address the problem of face clustering in long, real-world videos. This is a challenging task because faces in such videos exhibit wide variability in scale, pose, illumination, and expression, and may also be partially occluded.
no code implementations • 12 Dec 2017 • Karttikeya Mangalam, Tanaya Guha
We investigate the effect and usefulness of spontaneity (i.e., whether a given speech sample is spontaneous or not) in the context of speech emotion recognition.
no code implementations • 12 Jun 2013 • Tanaya Guha, Ehsan Nezhadarya, Rabab K. Ward
This sparse strategy is employed because it is known to generate basis vectors that are qualitatively similar to the receptive field of the simple cells present in the mammalian primary visual cortex.
no code implementations • 12 Jun 2012 • Tanaya Guha, Rabab K. Ward
This paper proposes a sparse representation-based approach to encode the information content of one image using information from another image. The compactness (sparsity) of the resulting representation is then used as a measure of how much the image can be compressed with respect to the other image.
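As a hedged sketch of the general idea (not the paper's method): a signal can be greedily encoded against a dictionary via orthogonal matching pursuit, and the number of non-zero coefficients needed to reach a target error serves as a crude compressibility score. The function names and the tolerance values here are illustrative assumptions:

```python
import numpy as np

def omp(D, x, max_atoms, tol=1e-6):
    """Greedy orthogonal matching pursuit: pick the dictionary atom most
    correlated with the residual, re-fit the support by least squares,
    and repeat until the residual is small or the atom budget is spent."""
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual)
        corr[support] = 0.0  # never reselect an atom already in the support
        support.append(int(np.argmax(corr)))
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - sub @ sol
    return coef

def compressibility(D, x, max_atoms, tol=1e-3):
    """Fewer atoms needed => x is more compressible w.r.t. dictionary D."""
    return int(np.count_nonzero(omp(D, x, max_atoms, tol)))

# A 2-sparse signal in a trivial (identity) dictionary needs only 2 atoms.
D = np.eye(5)
x = np.array([0.0, 3.0, 0.0, 1.0, 0.0])
score = compressibility(D, x, max_atoms=5)
```

With a dictionary learned from a second image, a low score means the first image is well explained by the second image's structure, i.e., highly compressible with respect to it.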