Search Results for author: Tim K. Marks

Found 20 papers, 4 papers with code

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

no code implementations • 25 Apr 2024 • Haomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks

To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image.

Paper
Add Code

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

no code implementations • ICCV 2023 • Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks

To this end, and capitalizing on the powerful fine-grained generative control offered by the recent diffusion-based generative models, we introduce Steered Diffusion, a generalized framework for photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation.

Colorization Conditional Image Generation +2

Paper
Add Code

H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions

no code implementations • 22 Oct 2022 • Kei Ota, Hsiao-Yu Tung, Kevin A. Smith, Anoop Cherian, Tim K. Marks, Alan Sullivan, Asako Kanezaki, Joshua B. Tenenbaum

The world is filled with articulated objects that are difficult to determine how to use from vision alone, e. g., a door might open inwards or outwards.

Paper
Add Code

(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering

no code implementations • 18 Feb 2022 • Anoop Cherian, Chiori Hori, Tim K. Marks, Jonathan Le Roux

Spatio-temporal scene-graph approaches to video-based reasoning tasks, such as video question-answering (QA), typically construct such graphs for every video frame.

Ranked #23 on Video Question Answering on NExT-QA

Question Answering Spatio-temporal Scene Graphs +1

Paper
Add Code

MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation

no code implementations • 1 Nov 2021 • Safa C. Medin, Bernhard Egger, Anoop Cherian, Ye Wang, Joshua B. Tenenbaum, Xiaoming Liu, Tim K. Marks

Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis.

Disentanglement Image Generation +1

Paper
Add Code

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

no code implementations • 13 Oct 2021 • Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).

Region Proposal

Paper
Add Code

InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images

no code implementations • ICCV 2021 • Anoop Cherian, Goncalo Dias Pais, Siddarth Jain, Tim K. Marks, Alan Sullivan

To use our model for instance segmentation, we propose an instance pose encoder that learns to take in a generated depth image and reproduce the pose code vectors for all of the object instances.

Generative Adversarial Network Instance Segmentation +2

Paper
Add Code

LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood

1 code implementation • CVPR 2020 • Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Ye Wang, Michael Jones, Anoop Cherian, Toshiaki Koike-Akino, Xiaoming Liu, Chen Feng

In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities.

Ranked #1 on Face Alignment on Menpo

Face Alignment Facial Landmark Detection

Paper
Code

Spatio-Temporal Ranked-Attention Networks for Video Captioning

no code implementations • 17 Jan 2020 • Anoop Cherian, Jue Wang, Chiori Hori, Tim K. Marks

To this end, we propose a Spatio-Temporal and Temporo-Spatial (STaTS) attention model which, conditioned on the language state, hierarchically combines spatial and temporal attention to videos in two different orders: (i) a spatio-temporal (ST) sub-model, which first attends to regions that have temporal evolution, then temporally pools the features from these regions; and (ii) a temporo-spatial (TS) sub-model, which first decides a single frame to attend to, then applies spatial attention within that frame.

Video Captioning

Paper
Add Code

The Eighth Dialog System Technology Challenge

no code implementations • 14 Nov 2019 • Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada, Minlie Huang, Luis Lastras, Jonathan K. Kummerfeld, Walter S. Lasecki, Chiori Hori, Anoop Cherian, Tim K. Marks, Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta

This paper introduces the Eighth Dialog System Technology Challenge.

dialog state tracking

Paper
Add Code

Audio-Visual Scene-Aware Dialog

2 code implementations • 25 Jan 2019 • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh

We introduce the task of scene-aware dialog.

Scene-Aware Dialogue

Paper
Code

Dialog System Technology Challenge 7

no code implementations • 11 Jan 2019 • Koichiro Yoshino, Chiori Hori, Julien Perez, Luis Fernando D'Haro, Lazaros Polymenakos, Chulaka Gunasekara, Walter S. Lasecki, Jonathan K. Kummerfeld, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan, Xiang Gao, Huda Alamari, Tim K. Marks, Devi Parikh, Dhruv Batra

This paper introduces the Seventh Dialog System Technology Challenges (DSTC), which use shared datasets to explore the problem of building dialog systems.

Sentence

Paper
Add Code

End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features

2 code implementations • 21 Jun 2018 • Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh

We introduce a new dataset of dialogs about videos of human behaviors.

Question Answering Video Description +1

Paper
Code

Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

4 code implementations • 1 Jun 2018 • Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori

Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.

Video Description Visual Dialog

Paper
Code

Class Subset Selection for Transfer Learning using Submodularity

no code implementations • 30 Mar 2018 • Varun Manjunatha, Srikumar Ramalingam, Tim K. Marks, Larry Davis

To accomplish this, we use a submodular set function to model the accuracy achievable on a new task when the features have been learned on a given subset of classes of the source dataset.

Image Classification Transfer Learning

Paper
Add Code

Attention-Based Multimodal Fusion for Video Description

no code implementations • ICCV 2017 • Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks

Currently successful methods for video description are based on encoder-decoder sentence generation using recur-rent neural networks (RNNs).

Sentence Video Description

Paper
Add Code

A Multi-Stream Bi-Directional Recurrent Neural Network for Fine-Grained Action Detection

no code implementations • CVPR 2016 • Bharat Singh, Tim K. Marks, Michael Jones, Oncel Tuzel, Ming Shao

We present a multi-stream bi-directional recurrent neural network for fine-grained action detection.

Action Recognition In Videos Fine-Grained Action Detection +2

Paper
Add Code

Robust Face Alignment Using a Mixture of Invariant Experts

no code implementations • 13 Nov 2015 • Oncel Tuzel, Tim K. Marks, Salil Tambe

Face alignment is particularly challenging when there are large variations in pose (in-plane and out-of-plane rotations) and facial expression.

Face Alignment regression +1

Paper
Add Code

An Improved Deep Learning Architecture for Person Re-Identification

no code implementations • CVPR 2015 • Ejaz Ahmed, Michael Jones, Tim K. Marks

Novel elements of our architecture include a layer that computes cross-input neighborhood differences, which capture local relationships among mid-level features that were computed separately from the two input images.

Person Re-Identification Small Data Image Classification

Paper
Add Code

Real-Time 3D Head Pose and Facial Landmark Estimation From Depth Images Using Triangular Surface Patch Features

no code implementations • CVPR 2015 • Chavdar Papazov, Tim K. Marks, Michael Jones

The matched triangular surface patches in the training set are used to compute estimates of the 3D head pose and facial landmark positions in the input depth map.

Face Alignment Head Pose Estimation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.