Search Results for author: Jonathan Huang

Found 36 papers, 14 papers with code

VideoPoet: A Large Language Model for Zero-Shot Video Generation

no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.

Ranked #3 on Text-to-Video Generation on MSR-VTT

Decoder Language Modelling +3

Text and Click inputs for unambiguous open vocabulary instance segmentation

1 code implementation • 24 Nov 2023 • Nikolai Warner, Meera Hahn, Jonathan Huang, Irfan Essa, Vighnesh Birodkar

We propose a new segmentation process, Text + Click segmentation, where a model takes as input an image, a text phrase describing a class to segment, and a single foreground click specifying the instance to segment.

Instance Segmentation Segmentation +1

Optimizing ViViT Training: Time and Memory Reduction for Action Recognition

no code implementations • 7 Jun 2023 • Shreyank N Gowda, Anurag Arnab, Jonathan Huang

In this paper, we address the challenges posed by the substantial training time and memory consumption associated with video transformers, focusing on the ViViT (Video Vision Transformer) model, in particular the Factorised Encoder version, as our baseline for action recognition tasks.

Action Recognition

Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations

no code implementations • 3 May 2023 • Vasudha Kowtha, Miquel Espi Marques, Jonathan Huang, Yichi Zhang, Carlos Avendano

This work investigates pretrained audio representations for few shot Sound Event Detection.

Event Detection Few-Shot Learning +1

The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift

no code implementations • CVPR 2022 • Sara Beery, Guanhang Wu, Trevor Edwards, Filip Pavetic, Bo Majewski, Shreyasee Mukherjee, Stanley Chan, John Morgan, Vivek Rathod, Jonathan Huang

We introduce baseline results on our dataset across modalities as well as metrics for the detailed analysis of generalization with respect to geographic distribution shifts, vital for such a system to be deployed at-scale.

Management

Local Metrics for Multi-Object Tracking

1 code implementation • 6 Apr 2021 • Jack Valmadre, Alex Bewley, Jonathan Huang, Chen Sun, Cristian Sminchisescu, Cordelia Schmid

This paper introduces temporally local metrics for Multi-Object Tracking.

Multi-Object Tracking Object

The surprising impact of mask-head architecture on novel class segmentation

3 code implementations • ICCV 2021 • Vighnesh Birodkar, Zhichao Lu, Siyang Li, Vivek Rathod, Jonathan Huang

Under this family, we study Mask R-CNN and discover that instead of its default strategy of training the mask-head with a combination of proposals and groundtruth boxes, training the mask-head with only groundtruth boxes dramatically improves its performance on novel classes.

Instance Segmentation Segmentation +1

PERF-Net: Pose Empowered RGB-Flow Net

no code implementations • 28 Sep 2020 • Yinxiao Li, Zhichao Lu, Xuehan Xiong, Jonathan Huang

In recent years, many works in the video action recognition literature have shown that two stream models (combining spatial and temporal input streams) are necessary for achieving state of the art performance.

Ranked #5 on Action Recognition on UCF101

Action Classification Action Recognition +1

Compact Speaker Embedding: lrx-vector

no code implementations • 11 Aug 2020 • Munir Georges, Jonathan Huang, Tobias Bocklet

Deep neural networks (DNN) have recently been widely used in speaker recognition systems, achieving state-of-the-art performance on various benchmarks.

Knowledge Distillation Speaker Recognition

RetinaTrack: Online Single Stage Joint Detection and Tracking

1 code implementation • CVPR 2020 • Zhichao Lu, Vivek Rathod, Ronny Votel, Jonathan Huang

Traditionally multi-object tracking and object detection are performed using separate systems with most prior works focusing exclusively on one of these aspects over the other.

Ranked #1 on Multiple Object Tracking on Waymo Open Dataset

Autonomous Driving Multi-Object Tracking +3

Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog

no code implementations • 20 Dec 2019 • Shachi H. Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman

Recent progress in visual grounding techniques and Audio Understanding are enabling machines to understand shared semantic concepts and listen to the various sensory events in the environment.

Audio Classification Visual Grounding

Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog

no code implementations • 20 Dec 2019 • Shachi H. Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman

With the recent advancements in Artificial Intelligence (AI), Intelligent Virtual Assistants (IVA) such as Alexa, Google Home, etc., have become a ubiquitous part of many homes.

Audio Classification Response Generation

Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection

3 code implementations • CVPR 2020 • Sara Beery, Guanhang Wu, Vivek Rathod, Ronny Votel, Jonathan Huang

In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.

object-detection Video Object Detection +1

Structural sparsification for Far-field Speaker Recognition with GNA

no code implementations • 25 Oct 2019 • Jingchi Zhang, Jonathan Huang, Michael Deisher, Hai Li, Yiran Chen

Recently, deep neural networks (DNN) have been widely used in the speaker recognition area.

Speaker Recognition

Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog

no code implementations • 20 Dec 2018 • Shachi H. Kumar, Eda Okur, Saurav Sahay, Juan Jose Alvarado Leanos, Jonathan Huang, Lama Nachman

With the recent advancements in AI, Intelligent Virtual Assistants (IVA) have become a ubiquitous part of every home.

Audio Classification General Classification

Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference

no code implementations • 27 Nov 2018 • Mahesh Subedar, Ranganath Krishnan, Paulo Lopez Meyer, Omesh Tickoo, Jonathan Huang

In the multimodal setting, the proposed framework improved precision-recall AUC by 10.2% on the subset of MiT dataset as compared to non-Bayesian baseline.

Bayesian Inference Multimodal Activity Recognition +1

Multimodal Relational Tensor Network for Sentiment and Emotion Classification

no code implementations • WS 2018 • Saurav Sahay, Shachi H. Kumar, Rui Xia, Jonathan Huang, Lama Nachman

Understanding Affect from video segments has brought researchers from the language, audio and video domains together.

Classification Emotion Classification +4

Learning to Segment via Cut-and-Paste

1 code implementation • ECCV 2018 • Tal Remez, Jonathan Huang, Matthew Brown

This paper presents a weakly-supervised approach to object instance segmentation.

Instance Segmentation Object +2

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

1 code implementation • ECCV 2018 • Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy

Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification.

Ranked #27 on Action Recognition on UCF101 (using extra training data)

Action Classification Action Detection +6

Progressive Neural Architecture Search

18 code implementations • ECCV 2018 • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.

Ranked #15 on Neural Architecture Search on NAS-Bench-201, ImageNet-16-120 (Accuracy (Val) metric)

Evolutionary Algorithms General Classification +3

Generative Models of Visually Grounded Imagination

no code implementations • ICLR 2018 • Ramakrishna Vedantam, Ian Fischer, Jonathan Huang, Kevin Murphy

It is easy for people to imagine what a man with pink hair looks like, even if they have never seen such a person before.

Attribute

Motion Prediction Under Multimodality with Conditional Stochastic Networks

no code implementations • 5 May 2017 • Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar

In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.

motion prediction Optical Flow Estimation +2

Spatially Adaptive Computation Time for Residual Networks

1 code implementation • CVPR 2017 • Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, Ruslan Salakhutdinov

This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image.

Classification Computational Efficiency +7

Speed/accuracy trade-offs for modern convolutional object detectors

14 code implementations • CVPR 2017 • Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

Ranked #220 on Object Detection on COCO test-dev (using extra training data)

Object object-detection +1

Im2Calories: Towards an Automated Mobile Vision Food Diary

no code implementations • ICCV 2015 • Austin Meyers, Nick Johnston, Vivek Rathod, Anoop Korattikara, Alex Gorban, Nathan Silberman, Sergio Guadarrama, George Papandreou, Jonathan Huang, Kevin P. Murphy

We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories.

Efficient inference in occlusion-aware generative models of images

no code implementations • 19 Nov 2015 • Jonathan Huang, Kevin Murphy

We present a generative model of images based on layering, in which image layers are individually generated, then composited from front to back.

Object

Detecting events and key actors in multi-person videos

no code implementations • CVPR 2016 • Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event.

Event Detection General Classification

Generation and Comprehension of Unambiguous Object Descriptions

1 code implementation • CVPR 2016 • Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy

We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.

Image Captioning Object +1

Deep Knowledge Tracing

6 code implementations • NeurIPS 2015 • Chris Piech, Jonathan Spencer, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas Guibas, Jascha Sohl-Dickstein

Knowledge tracing---where a machine models the knowledge of a student as they interact with coursework---is a well established problem in computer supported education.

Ranked #1 on Knowledge Tracing on Assistments

Knowledge Tracing

Learning Program Embeddings to Propagate Feedback on Student Code

no code implementations • 22 May 2015 • Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran Sahami, Leonidas Guibas

Providing feedback, both assessing final work and giving hints to stuck students, is difficult for open-ended assignments in massive online classes which can range from thousands to millions of students.

What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision

1 code implementation • HLT 2015 • Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nicholas Johnston, Andrew Rabinovich, Kevin Murphy

Keyword Spotting

What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision

1 code implementation • 5 Mar 2015 • Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nick Johnston, Andrew Rabinovich, Kevin Murphy

We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task.

Keyword Spotting

Riffled Independence for Efficient Inference with Partial Rankings

no code implementations • 23 Jan 2014 • Jonathan Huang, Ashish Kapoor, Carlos Guestrin

Simultaneously addressing all of these challenges i.e., designing a compactly representable model which is amenable to efficient inference and can be learned using partial ranking data is a difficult task, but is necessary if we would like to scale to problems with nontrivial size.

Tuned Models of Peer Assessment in MOOCs

no code implementations • 9 Jul 2013 • Chris Piech, Jonathan Huang, Zhenghao Chen, Chuong Do, Andrew Ng, Daphne Koller

In massive open online courses (MOOCs), peer grading serves as a critical tool for scaling the grading of complex, open-ended assignments to courses with tens or hundreds of thousands of students.

Probabilistic Event Cascades for Alzheimer's disease

no code implementations • NeurIPS 2012 • Jonathan Huang, Daniel Alexander

Accurate and detailed models of the progression of neurodegenerative diseases such as Alzheimer's (AD) are crucially important for reliable early diagnosis and the determination and deployment of effective treatments.

Riffled Independence for Ranked Data

no code implementations • NeurIPS 2009 • Jonathan Huang, Carlos Guestrin

Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings.

Card Games
