1 code implementation • 25 Apr 2024 • Haomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks
To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image.
no code implementations • 17 Dec 2023 • Xinghao Zhu, Devesh K. Jha, Diego Romeres, Lingfeng Sun, Masayoshi Tomizuka, Anoop Cherian
Automating the assembly of objects from their parts is a complex problem with innumerable applications in manufacturing, maintenance, and recycling.
no code implementations • ICCV 2023 • Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks
To this end, and capitalizing on the powerful fine-grained generative control offered by the recent diffusion-based generative models, we introduce Steered Diffusion, a generalized framework for photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation.
no code implementations • 25 Sep 2023 • Zachariah Carmichael, Suhas Lohit, Anoop Cherian, Michael Jones, Walter Scheirer
Prototypical part neural networks (ProtoPartNNs), namely PROTOPNET and its derivatives, are an intrinsically interpretable approach to machine learning.
no code implementations • 6 Jun 2023 • Xiulong Liu, Sudipta Paul, Moitreya Chatterjee, Anoop Cherian
Audio-visual navigation of an agent towards locating an audio goal is a challenging task especially when the audio is sporadic or the environment is noisy.
1 code implementation • CVPR 2023 • Anshul Shah, Aniket Roy, Ketul Shah, Shlok Kumar Mishra, David Jacobs, Anoop Cherian, Rama Chellappa
In this work, we propose a new contrastive learning approach to train models for skeleton-based action recognition without labels.
1 code implementation • CVPR 2023 • Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould
In this paper, we consider a novel setting where such an alignment is between (i) instruction steps depicted as assembly diagrams (commonly seen in Ikea assembly manuals) and (ii) segments from in-the-wild videos that enact the corresponding assembly actions in the real world.
1 code implementation • CVPR 2023 • Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin A. Smith, Joshua B. Tenenbaum
To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group.
no code implementations • 29 Oct 2022 • Moitreya Chatterjee, Narendra Ahuja, Anoop Cherian
In this paper, we propose to use this connection between audio and visual dynamics for solving two challenging tasks simultaneously, namely: (i) separating audio sources from a mixture using visual cues, and (ii) predicting the 3D visual motion of a sounding source using its separated audio.
no code implementations • 22 Oct 2022 • Kei Ota, Hsiao-Yu Tung, Kevin A. Smith, Anoop Cherian, Tim K. Marks, Alan Sullivan, Asako Kanezaki, Joshua B. Tenenbaum
The world is filled with articulated objects whose usage is difficult to determine from vision alone; e.g., a door might open inwards or outwards.
no code implementations • 14 Oct 2022 • Sudipta Paul, Amit K. Roy-Chowdhury, Anoop Cherian
Similar to audio-visual navigation tasks, the goal of our embodied agent is to localize an audio event via navigating the 3D visual world; however, the agent may also seek help from a human (oracle), where the assistance is provided in free-form natural language.
no code implementations • 18 Feb 2022 • Anoop Cherian, Chiori Hori, Tim K. Marks, Jonathan Le Roux
Spatio-temporal scene-graph approaches to video-based reasoning tasks, such as video question-answering (QA), typically construct such graphs for every video frame.
Ranked #23 on Video Question Answering on NExT-QA
1 code implementation • 21 Dec 2021 • Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian
Standard contrastive learning approaches usually require a large number of negatives for effective unsupervised learning and often exhibit slow convergence.
Ranked #108 on Self-Supervised Image Classification on ImageNet
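The entry above addresses the dependence of standard contrastive learning on large numbers of negatives. As background, here is a minimal numpy sketch of the usual InfoNCE objective (the baseline being improved, not the paper's method), showing how each anchor is scored against one positive and a bank of negatives:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE for a single anchor: cross-entropy of a softmax over cosine
    similarities, with the positive at index 0 and the negatives after it."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                        # positive sits at index 0

rng = np.random.default_rng(0)
anchor = rng.normal(size=16)
positive = anchor + 0.05 * rng.normal(size=16)      # a mild "augmentation"
negatives = [rng.normal(size=16) for _ in range(64)]
loss = info_nce_loss(anchor, positive, negatives)
```

The loss shrinks only when the anchor is far more similar to its positive than to every negative, which is why large negative banks (and hence slow training) are typically needed — the issue the entry above targets.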
no code implementations • 1 Nov 2021 • Safa C. Medin, Bernhard Egger, Anoop Cherian, Ye Wang, Joshua B. Tenenbaum, Xiaoming Liu, Tim K. Marks
Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis.
no code implementations • 13 Oct 2021 • Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori
In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).
no code implementations • ICCV 2021 • Moitreya Chatterjee, Narendra Ahuja, Anoop Cherian
Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena.
no code implementations • ICCV 2021 • Moitreya Chatterjee, Jonathan Le Roux, Narendra Ahuja, Anoop Cherian
At its core, AVSGS uses a recursive neural network that emits mutually-orthogonal sub-graph embeddings of the visual graph using multi-head attention.
no code implementations • ICCV 2021 • Anoop Cherian, Goncalo Dias Pais, Siddarth Jain, Tim K. Marks, Alan Sullivan
To use our model for instance segmentation, we propose an instance pose encoder that learns to take in a generated depth image and reproduce the pose code vectors for all of the object instances.
no code implementations • 24 Jun 2021 • Anoop Cherian, Jue Wang
One-class learning is the classic problem of fitting a model to the data for which annotations are available only for a single class.
no code implementations • 13 Apr 2021 • Anoop Cherian, Panagiotis Stanitsas, Jue Wang, Mehrtash Harandi, Vassilios Morellas, Nikolaos Papanikolopoulos
There exist several similarity measures for comparing SPD matrices with documented benefits.
no code implementations • 1 Jan 2021 • Mouhacine Benosman, Orlando Romero, Anoop Cherian
In this paper, we investigate in the context of deep neural networks, the performance of several discretization algorithms for two first-order finite-time optimization flows.
no code implementations • 1 Jan 2021 • Moitreya Chatterjee, Anoop Cherian, Narendra Ahuja
Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena.
1 code implementation • 28 Dec 2020 • Piotr Koniusz, Lei Wang, Anoop Cherian
In this paper, we propose novel tensor representations for compactly capturing such higher-order relationships between visual features for the task of action recognition.
Ranked #2 on Skeleton Based Action Recognition on UT-Kinect
no code implementations • 6 Oct 2020 • Siqi Zhang, Mouhacine Benosman, Orlando Romero, Anoop Cherian
In this paper, we investigate the performance of two first-order optimization algorithms, obtained from forward Euler discretization of finite-time optimization flows.
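As a simple illustration of the discretization studied above (not the paper's finite-time flows), forward Euler applied to the plain gradient flow dx/dt = -∇f(x) with step h recovers gradient descent, x_{k+1} = x_k - h ∇f(x_k):

```python
import numpy as np

def forward_euler_gradient_flow(grad, x0, step=0.1, iters=100):
    """Forward Euler on the gradient flow dx/dt = -grad f(x):
    x_{k+1} = x_k + h * (-grad f(x_k)), i.e. plain gradient descent."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# f(x) = 0.5 * ||x - c||^2 has gradient x - c and unique minimizer c.
c = np.array([1.0, -2.0])
x_star = forward_euler_gradient_flow(lambda x: x - c, x0=[0.0, 0.0])
```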
no code implementations • ECCV 2020 • Anoop Cherian, Moitreya Chatterjee, Narendra Ahuja
To tackle this problem, we present Sound2Sight, a deep variational framework, that is trained to learn a per frame stochastic prior conditioned on a joint embedding of audio and past frames.
no code implementations • ICML 2020 • Anoop Cherian, Shuchin Aeron
To maximize extraction of such informative cues from the data, we set the problem within the context of contrastive representation learning and to that end propose a novel objective via optimal transport.
no code implementations • 8 Jul 2020 • Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian
Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to indulge in a question-answer dialog with a human about the audio-visual content.
no code implementations • 15 Jun 2020 • Suryansh Kumar, Luc van Gool, Carlos E. P. de Oliveira, Anoop Cherian, Yuchao Dai, Hongdong Li
Assuming that a deforming shape is composed of a union of local linear subspaces that span a global low-rank space over multiple frames enables us to efficiently model complex non-rigid deformations.
no code implementations • 28 Apr 2020 • Rodrigo Santa Cruz, Anoop Cherian, Basura Fernando, Dylan Campbell, Stephen Gould
This paper presents a framework to recognize temporal compositions of atomic actions in videos.
1 code implementation • CVPR 2020 • Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Ye Wang, Michael Jones, Anoop Cherian, Toshiaki Koike-Akino, Xiaoming Liu, Chen Feng
In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities.
Ranked #1 on Face Alignment on Menpo
no code implementations • 17 Jan 2020 • Anoop Cherian, Jue Wang, Chiori Hori, Tim K. Marks
To this end, we propose a Spatio-Temporal and Temporo-Spatial (STaTS) attention model which, conditioned on the language state, hierarchically combines spatial and temporal attention to videos in two different orders: (i) a spatio-temporal (ST) sub-model, which first attends to regions that have temporal evolution, then temporally pools the features from these regions; and (ii) a temporo-spatial (TS) sub-model, which first decides a single frame to attend to, then applies spatial attention within that frame.
no code implementations • 14 Nov 2019 • Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada, Minlie Huang, Luis Lastras, Jonathan K. Kummerfeld, Walter S. Lasecki, Chiori Hori, Anoop Cherian, Tim K. Marks, Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta
This paper introduces the Eighth Dialog System Technology Challenge.
no code implementations • 5 Sep 2019 • Jue Wang, Anoop Cherian
Taking the features from the video as a positive bag and the irrelevant features as a negative bag, we cast the learning of a (nonlinear) hyperplane that separates the unknown useful features from the rest as a multiple instance learning problem within a support vector machine setup.
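The bag-based formulation above can be sketched in a few lines. The following is a toy, linear, Euclidean simplification (not the paper's nonlinear solver): each positive bag is represented by its current highest-scoring "witness" instance, and a hinge-loss classifier is refit by subgradient steps.

```python
import numpy as np

def mil_svm(pos_bags, neg_instances, lr=0.01, reg=0.01, epochs=200):
    """Toy multiple-instance SVM: alternate between picking the witness
    instance of each positive bag and subgradient steps on a linear
    hinge loss. A rough sketch for illustration only."""
    # mean-difference initialization so the first witnesses are sensible
    w = np.vstack(pos_bags).mean(axis=0) - neg_instances.mean(axis=0)
    b = 0.0
    for _ in range(epochs):
        witnesses = np.array([bag[np.argmax(bag @ w + b)] for bag in pos_bags])
        X = np.vstack([witnesses, neg_instances])
        y = np.concatenate([np.ones(len(witnesses)), -np.ones(len(neg_instances))])
        viol = y * (X @ w + b) < 1                   # margin violations
        w -= lr * (reg * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(y))
        b += lr * y[viol].sum() / len(y)
    return w, b

rng = np.random.default_rng(1)
# each positive bag: four noise instances plus one "useful" instance shifted in +x
pos_bags = [np.vstack([rng.normal(size=(4, 2)),
                       rng.normal(size=2) + [3.0, 0.0]]) for _ in range(10)]
neg_instances = rng.normal(size=(50, 2))
w, b = mil_svm(pos_bags, neg_instances)
```

The learned hyperplane separates the shifted "useful" instances from the noise, even though no instance-level labels were given — only bag-level ones.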
no code implementations • ICCV 2019 • Jue Wang, Anoop Cherian
One-class learning is the classic problem of fitting a model to data for which annotations are available only for a single class.
no code implementations • 15 May 2019 • Arvind U. Raghunathan, Anoop Cherian, Devesh K. Jha
To this end, we introduce the Gradient-based Nikaido-Isoda (GNI) function, which (i) serves as a merit function, vanishing only at the first-order stationary points of each player's optimization problem, and (ii) provides error bounds to a stationary Nash point.
2 code implementations • 25 Jan 2019 • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh
We introduce the task of scene-aware dialog.
no code implementations • ECCV 2018 • Jue Wang, Anoop Cherian
As the perturbed features belong to data classes that are likely to be confused with the original features, the discriminative subspace will characterize parts of the feature space that are more representative of the original data, and thus may provide robust video representations.
no code implementations • ECCV 2018 • Jue Wang, Anoop Cherian
In this paper, we propose to use such perturbations within a novel contrastive learning setup to build negative samples, which are then used to produce improved video representations.
Ranked #42 on Action Recognition on HMDB-51
1 code implementation • 12 Jul 2018 • Anoop Cherian, Alan Sullivan
To this end, we present a semantically-consistent GAN framework, dubbed Sem-GAN, in which the semantics are defined by the class identities of image segments in the source domain as produced by a semantic segmentation algorithm.
2 code implementations • 21 Jun 2018 • Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh
We introduce a new dataset of dialogs about videos of human behaviors.
4 code implementations • 1 Jun 2018 • Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori
Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.
no code implementations • CVPR 2018 • Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley
As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in a reproducing kernel Hilbert space, projections of data onto which capture their temporal order.
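To make the hyperplane idea concrete, here is a sketch of basic linear rank pooling — the precursor of the kernelized pre-image formulation in the entry above, not that method itself. A least-squares fit finds a vector w whose projections of smoothed frame features grow with the frame index, and w then serves as the video descriptor:

```python
import numpy as np

def rank_pool(frames):
    """Basic linear rank pooling (least-squares variant): fit w so that the
    projection w . v_t of the smoothed frame feature v_t increases with the
    frame index t; w is used as the sequence descriptor."""
    frames = np.asarray(frames, dtype=float)
    t = np.arange(1, len(frames) + 1, dtype=float)
    v = np.cumsum(frames, axis=0) / t[:, None]      # time-varying mean features
    w, *_ = np.linalg.lstsq(v, t, rcond=None)
    return w, v

rng = np.random.default_rng(2)
T, d = 30, 8
trend = np.outer(np.linspace(0, 1, T), rng.normal(size=d))   # features drift in time
frames = trend + 0.01 * rng.normal(size=(T, d))
w, v = rank_pool(frames)
proj = v @ w                                        # should grow with frame index
```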
no code implementations • CVPR 2018 • Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould
In an attempt to tackle this problem, we propose discriminative pooling, based on the notion that among the deep features generated on all short clips, there is at least one that characterizes the action.
no code implementations • CVPR 2018 • Suryansh Kumar, Anoop Cherian, Yuchao Dai, Hongdong Li
To address these issues, in this paper, we propose a new approach for dense NRSfM by modeling the problem on a Grassmann manifold.
no code implementations • 26 Jan 2018 • Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould
In this paper, we build on the compositionality principle and develop an "algebra" to compose classifiers for complex visual concepts.
no code implementations • ICCV 2017 • Anoop Cherian, Panagiotis Stanitsas, Mehrtash Harandi, Vassilios Morellas, Nikolaos Papanikolopoulos
Symmetric positive definite (SPD) matrices are useful for capturing second-order statistics of visual data.
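The second-order statistics mentioned above are typically region covariance descriptors: the covariance matrix of per-pixel feature vectors. A minimal numpy sketch (with a small ridge so the result is strictly SPD):

```python
import numpy as np

def covariance_descriptor(features):
    """Covariance (second-order) descriptor of a set of per-pixel feature
    vectors; a tiny ridge makes it strictly symmetric positive definite."""
    features = np.asarray(features, dtype=float)
    cov = np.cov(features, rowvar=False)
    return cov + 1e-6 * np.eye(cov.shape[0])

rng = np.random.default_rng(3)
feats = rng.normal(size=(500, 5))       # e.g. intensity plus gradient features
spd = covariance_descriptor(feats)
eigvals = np.linalg.eigvalsh(spd)       # all strictly positive for an SPD matrix
```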
no code implementations • 19 Sep 2017 • Tengda Han, Jue Wang, Anoop Cherian, Stephen Gould
For effective human-robot interaction, it is important that a robotic assistant can forecast the next action a human will consider in a given task.
no code implementations • 5 Aug 2017 • Anoop Cherian, Panagiotis Stanitsas, Mehrtash Harandi, Vassilios Morellas, Nikolaos Papanikolopoulos
Symmetric positive definite (SPD) matrices are useful for capturing second-order statistics of visual data.
no code implementations • 24 Jul 2017 • Sam Toyer, Anoop Cherian, Tengda Han, Stephen Gould
Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving.
no code implementations • 24 May 2017 • Anoop Cherian, Suvrit Sra, Richard Hartley
As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in an RKHS, projections of data onto which capture their temporal order.
no code implementations • 23 Apr 2017 • Anoop Cherian, Stephen Gould
We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space.
no code implementations • CVPR 2017 • Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould
Unrolling these iterations in a Sinkhorn network layer, we propose DeepPermNet, an end-to-end CNN model for this task.
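The Sinkhorn iterations being unrolled are simply alternating row and column normalizations of a positive matrix, which converge to a doubly-stochastic matrix — a sketch of the plain (non-network) iteration:

```python
import numpy as np

def sinkhorn(M, iters=50):
    """Sinkhorn normalization: alternately rescale rows and columns of a
    positive matrix; the iterates converge to a doubly-stochastic matrix."""
    P = np.asarray(M, dtype=float)
    for _ in range(iters):
        P = P / P.sum(axis=1, keepdims=True)   # row normalization
        P = P / P.sum(axis=0, keepdims=True)   # column normalization
    return P

rng = np.random.default_rng(4)
P = sinkhorn(rng.uniform(0.1, 1.0, size=(6, 6)))
```

Because each step is differentiable, the same iterations can be stacked as network layers and trained end-to-end, which is the construction used by DeepPermNet.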
no code implementations • CVPR 2017 • Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould
Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity.
no code implementations • 6 Apr 2017 • Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould
Applying multiple instance learning in an SVM setup, we use the parameters of this separating hyperplane as a descriptor for the video.
no code implementations • 19 Jan 2017 • Anoop Cherian, Piotr Koniusz, Stephen Gould
The HOK descriptors are then generated from the higher-order co-occurrences of these feature maps, and are then used as input to a video-level classifier.
no code implementations • 12 Jan 2017 • Jue Wang, Anoop Cherian, Fatih Porikli
Training of Convolutional Neural Networks (CNNs) on long video sequences is computationally expensive due to the substantial memory requirements and the massive number of parameters that deep architectures demand.
no code implementations • 19 Jul 2016 • Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, Edison Guo
Some recent works in machine learning and computer vision involve the solution of a bi-level optimization problem.
no code implementations • CVPR 2016 • Piotr Koniusz, Anoop Cherian
Super-symmetric tensors - a higher-order extension of scatter matrices - are becoming increasingly popular in machine learning and computer vision for modeling data statistics, co-occurrences, or even as visual descriptors.
no code implementations • 1 Apr 2016 • Piotr Koniusz, Anoop Cherian, Fatih Porikli
We first define RBF kernels on 3D joint sequences, which are then linearized to form kernel descriptors.
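One standard way to linearize an RBF kernel — plausibly in the spirit of the kernel descriptors above, though the paper's exact construction may differ — is random Fourier features (Rahimi and Recht), an explicit map whose inner products approximate the kernel:

```python
import numpy as np

def rff(X, n_features=2000, gamma=0.5, seed=0):
    """Random Fourier features: an explicit feature map z(x) such that
    z(x) . z(y) approximates the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(5)
x, y = rng.normal(size=(2, 4))
exact = np.exp(-0.5 * np.sum((x - y) ** 2))   # RBF kernel with gamma = 0.5
zx, zy = rff(np.vstack([x, y]))
approx = zx @ zy                               # close to `exact` for large n_features
```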
no code implementations • 9 Sep 2015 • Piotr Koniusz, Anoop Cherian
Super-symmetric tensors - a higher-order extension of scatter matrices - are becoming increasingly popular in machine learning and computer vision for modelling data statistics, co-occurrences, or even as visual descriptors.
no code implementations • 10 Jul 2015 • Anoop Cherian, Suvrit Sra
Inspired by the great success of dictionary learning and sparse coding for vector-valued data, our goal in this paper is to represent data in the form of SPD matrices as sparse conic combinations of SPD atoms from a learned dictionary via a Riemannian geometric approach.
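A heavily simplified, Euclidean sketch of the conic-combination idea (the entry above instead works with the Riemannian geometry of the SPD manifold): code an SPD matrix as a nonnegative combination of SPD atoms by projected gradient descent on the Frobenius reconstruction error.

```python
import numpy as np

def conic_code(target, atoms, iters=5000):
    """Nonnegative (conic) coding of an SPD matrix over SPD dictionary atoms,
    minimizing Frobenius reconstruction error by projected gradient descent.
    A Euclidean illustration only, not the paper's Riemannian approach."""
    A = np.stack([a.ravel() for a in atoms], axis=1)     # (n*n, k) design matrix
    t = target.ravel()
    lr = 1.0 / np.linalg.eigvalsh(A.T @ A).max()         # safe step size 1/L
    c = np.zeros(A.shape[1])
    for _ in range(iters):
        c -= lr * A.T @ (A @ c - t)       # gradient of 0.5 * ||A c - t||^2
        c = np.maximum(c, 0.0)            # project onto the nonnegative orthant
    return c

rng = np.random.default_rng(6)
def rand_spd(n):
    B = rng.normal(size=(n, n))
    return B @ B.T + n * np.eye(n)

atoms = [rand_spd(3) for _ in range(4)]
target = 0.7 * atoms[0] + 0.3 * atoms[2]    # a ground-truth conic combination
c = conic_code(target, atoms)
recon = sum(ci * a for ci, a in zip(c, atoms))
```

Because the coefficients are nonnegative and the atoms are SPD, the reconstruction stays in the SPD cone by construction — the property that makes conic (rather than general sparse) coding natural for this data.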
no code implementations • CVPR 2014 • Anoop Cherian, Julien Mairal, Karteek Alahari, Cordelia Schmid
In this paper, we present a method for estimating articulated human poses in videos.