Search Results for author: Sudheendra Vijayanarasimhan

Found 18 papers, 6 papers with code

IC3: Image Captioning by Committee Consensus

1 code implementation • 2 Feb 2023 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

If you ask a human to describe an image, they might do so in a thousand different ways.

Image Captioning

Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

no code implementations • 20 Dec 2022 • Vivek Rathod, Bryan Seybold, Sudheendra Vijayanarasimhan, Austin Myers, Xiuye Gu, Vighnesh Birodkar, David A. Ross

Detecting actions in untrimmed videos should not be limited to a small, closed set of classes.

Action Detection • Optical Flow Estimation

Distribution Aware Metrics for Conditional Natural Language Generation

no code implementations • 15 Sep 2022 • David M. Chan, Yiming Ni, David A. Ross, Sudheendra Vijayanarasimhan, Austin Myers, John Canny

In this work we argue that existing metrics are not appropriate for domains such as visual description or summarization where ground truths are semantically diverse, and where the diversity in those captions captures useful additional information about the context.

Speech Recognition +1

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.

Video Description

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

no code implementations • 27 Jul 2020 • David M. Chan, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive.

Active Learning • Video Captioning +1

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

no code implementations • CVPR 2018 • Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar

We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework.

Ranked #27 on Temporal Action Localization on THUMOS’14

Action Classification • General Classification +3

End-to-End Learning of Semantic Grasping

no code implementations • 6 Jul 2017 • Eric Jang, Sudheendra Vijayanarasimhan, Peter Pastor, Julian Ibarz, Sergey Levine

We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images.

Object • Object Detection +3

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

8 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Ranked #6 on Action Detection on UCF101-24

Action Detection +3

The Kinetics Human Action Video Dataset

12 code implementations • 19 May 2017 • Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman

We describe the DeepMind Kinetics human action video dataset.

Action Classification • General Classification

Motion Prediction Under Multimodality with Conditional Stochastic Networks

no code implementations • 5 May 2017 • Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar

In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.

Motion Prediction • Optical Flow Estimation +2

SfM-Net: Learning of Structure and Motion from Video

no code implementations • 25 Apr 2017 • Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki

We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations.

Motion Estimation • Object +1

YouTube-8M: A Large-Scale Video Classification Benchmark

6 code implementations • 27 Sep 2016 • Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan

Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.

Ranked #1 on Action Recognition In Videos on ActivityNet

3D Face Reconstruction • Action Recognition In Videos +2

Efficient Large Scale Video Classification

no code implementations • 22 May 2015 • Balakrishnan Varadarajan, George Toderici, Sudheendra Vijayanarasimhan, Apostol Natsev

We present two methods that build on this work, and scale it up to work with millions of videos and hundreds of thousands of classes while maintaining a low computational cost.

Classification • General Classification +2

Beyond Short Snippets: Deep Networks for Video Classification

1 code implementation • CVPR 2015 • Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici

Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval.

Ranked #5 on Action Recognition on Sports-1M

Action Recognition • Classification +4

Deep Networks With Large Output Spaces

no code implementations • 23 Dec 2014 • Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, Jay Yagnik

Deep neural networks have been extremely successful at various image, speech, and video recognition tasks because of their ability to model deep structures within the data.

Video Recognition

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

no code implementations • CVPR 2013 • Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik

Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object's appearance, such as the presence of component parts.

Object Detection

Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

no code implementations • NeurIPS 2010 • Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman

Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point.

Active Learning

Multi-Level Active Prediction of Useful Image Annotations for Recognition

no code implementations • NeurIPS 2008 • Sudheendra Vijayanarasimhan, Kristen Grauman

We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples.
