2 code implementations • 15 Jul 2022 • Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar
Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows.
Ranked #1 on Node Classification on AVA
no code implementations • 2 Dec 2021 • Sourya Roy, Kyle Min, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar
We address the problem of active speaker detection through a new framework, called SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data.
no code implementations • 6 Aug 2018 • Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury
This necessitates learning of visual features from videos in an unsupervised setting.
1 code implementation • ECCV 2018 • Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury
Most activity localization methods in the literature are burdened by the need for frame-wise annotations.
Ranked #1 on Action Classification on ActivityNet-1.2
no code implementations • CVPR 2018 • Sourya Roy, Sujoy Paul, Neal E. Young, Amit K. Roy-Chowdhury
Minimizing labeling effort for person re-identification in camera networks is an important problem, as most existing popular methods are supervised and require a large amount of manual annotation, which is tedious to acquire.
no code implementations • 3 May 2016 • Avisek Lahiri, Sourya Roy, Anirban Santara, Pabitra Mitra, Prabir Kumar Biswas
A recent thrust in saliency prediction research is to learn high-level semantics using ground-truth eye fixation datasets.
no code implementations • 17 Apr 2016 • Sourya Roy, Pabitra Mitra
In this paper, we propose a Kalman filter-aided saliency detection model based on the conjecture that salient regions differ considerably from our "visual expectation", or are "visually surprising" in nature.