Search Results for author: Ivan Laptev

Found 87 papers, 49 papers with code

Learning Actionness via Long-range Temporal Order Verification

no code implementations • ECCV 2020 • Dimitri Zhukov, Jean-Baptiste Alayrac, Ivan Laptev, Josef Sivic

The annotation is particularly difficult for temporal action localization where large parts of the video present no action, or background.

Action Recognition Temporal Action Localization

SUGAR: Pre-training 3D Visual Representations for Robotics

no code implementations • 1 Apr 2024 • ShiZhe Chen, Ricardo Garcia, Ivan Laptev, Cordelia Schmid

SUGAR employs a versatile transformer-based model to jointly address five pre-training tasks, namely cross-modal knowledge distillation for semantic learning, masked point modeling to understand geometry structures, grasping pose synthesis for object affordance, 3D instance segmentation and referring expression grounding to analyze cluttered scenes.

3D Instance Segmentation 3D Object Recognition +5

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

1 code implementation • 12 Dec 2023 • Tomáš Souček, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic

We address the task of generating temporally consistent and physically plausible images of actions and object state transformations.

Object

PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

1 code implementation • 27 Sep 2023 • ShiZhe Chen, Ricardo Garcia, Cordelia Schmid, Ivan Laptev

The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics.

Ranked #5 on Robot Manipulation on RLBench

Multi-Task Learning Robot Manipulation

VidChapters-7M: Video Chapters at Scale

no code implementations • NeurIPS 2023 • Antoine Yang, Arsha Nagrani, Ivan Laptev, Josef Sivic, Cordelia Schmid

To address this issue, we present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total.

Dense Video Captioning Navigate

Object Goal Navigation with Recursive Implicit Maps

no code implementations • 10 Aug 2023 • ShiZhe Chen, Thomas Chabal, Ivan Laptev, Cordelia Schmid

Object goal navigation aims to navigate an agent to locations of a given object category in unseen environments.

Navigate Object

Robust Visual Sim-to-Real Transfer for Robotic Manipulation

no code implementations • 28 Jul 2023 • Ricardo Garcia, Robin Strudel, ShiZhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid

While previous work mainly evaluates DR for disembodied tasks, such as pose estimation and object detection, here we systematically explore visual domain randomization methods and benchmark them on a rich set of challenging robotic manipulation tasks.

object-detection Object Detection +1

Learning Video-Conditioned Policies for Unseen Manipulation Tasks

no code implementations • 10 May 2023 • Elliot Chane-Sane, Cordelia Schmid, Ivan Laptev

To encourage generalization to new tasks, we avoid particular tasks during training and learn our policy from unlabelled robot trajectories and corresponding robot videos.

Action Recognition Robot Manipulation +1

gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction

no code implementations • CVPR 2023 • Zerui Chen, ShiZhe Chen, Cordelia Schmid, Ivan Laptev

In particular, we address reconstruction of hands and manipulated objects from monocular RGB images.

Ranked #5 on hand-object pose on DexYCB

3D Reconstruction 3D Shape Reconstruction +2

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

3 code implementations • CVPR 2023 • Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid

In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pretrained on narrated videos which are readily-available at scale.

Ranked #1 on Dense Video Captioning on ActivityNet Captions (using extra training data)

Dense Video Captioning Language Modelling +1

Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation

2 code implementations • 20 Dec 2022 • Matthieu Futeral, Cordelia Schmid, Ivan Laptev, Benoît Sagot, Rachel Bawden

One of the major challenges of machine translation (MT) is ambiguity, which can in some cases be resolved by accompanying context such as images.

Multimodal Machine Translation Translation

Image Compression with Product Quantized Masked Image Modeling

no code implementations • 14 Dec 2022 • Alaaeldin El-Nouby, Matthew J. Muckley, Karen Ullrich, Ivan Laptev, Jakob Verbeek, Hervé Jégou

In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression.

Image Compression Image Generation +3

Multi-Task Learning of Object State Changes from Uncurated Videos

1 code implementation • 24 Nov 2022 • Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic

We aim to learn to temporally localize object state changes and the corresponding state-modifying actions by observing people interacting with objects in long uncurated web videos.

Multi-Task Learning Object +2

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding

1 code implementation • 17 Nov 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

In this work we propose a language-conditioned transformer model for grounding 3D objects and their spatial relations.

Object Relation

Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control

no code implementations • 19 Sep 2022 • Quentin Le Lidec, Wilson Jallet, Ivan Laptev, Cordelia Schmid, Justin Carpentier

Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages.

Reinforcement Learning (RL) valid

Instruction-driven history-aware policies for robotic manipulations

2 code implementations • 11 Sep 2022 • Pierre-Louis Guhur, ShiZhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid

In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions.

Ranked #2 on Robot Manipulation on RLBench (Succ. Rate (10 tasks, 100 demos/task) metric)

Robot Manipulation

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation

1 code implementation • 24 Aug 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

Our resulting HM3D-AutoVLN dataset is an order of magnitude larger than existing VLN datasets in terms of navigation environments and instructions.

Ranked #1 on Visual Navigation on SOON Test

Language Modelling Navigate +3

AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction

1 code implementation • 26 Jul 2022 • Zerui Chen, Yana Hasson, Cordelia Schmid, Ivan Laptev

We show that such aligned SDFs better focus on reconstructing shape details and improve reconstruction accuracy both for hands and objects.

Ranked #9 on hand-object pose on DexYCB

hand-object pose Object Reconstruction

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

2 code implementations • 16 Jun 2022 • Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid

Manual annotation of question and answers for videos, however, is tedious and prohibits scalability.

Ranked #1 on Zero-Shot Video Question Answer on TVQA

Fill Mask Language Modelling +6

Weakly-supervised segmentation of referring expressions

no code implementations • 10 May 2022 • Robin Strudel, Ivan Laptev, Cordelia Schmid

Visual grounding localizes regions (boxes or segments) in the image corresponding to given referring expressions.

Image Segmentation Referring Expression +5

Learning to Answer Visual Questions from Web Videos

1 code implementation • 10 May 2022 • Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid

We use our method to generate the WebVidVQA3M dataset from the WebVid dataset, i.e., videos with alt-text annotations, and show its benefits for training VideoQA models.

Question Answering Question Generation +4

TubeDETR: Spatio-Temporal Video Grounding with Transformers

1 code implementation • CVPR 2022 • Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid

We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query.

Ranked #2 on Spatio-Temporal Video Grounding on VidSTG

Language-Based Temporal Localization Natural Language Visual Grounding +5

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

1 code implementation • CVPR 2022 • Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic

In this paper, we seek to temporally localize object states (e.g., "empty" and "full" cup) together with the corresponding state-modifying actions ("pouring coffee") in long uncurated videos with minimal supervision.

Object

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation

1 code implementation • CVPR 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

To balance the complexity of large action space reasoning and fine-grained language grounding, we dynamically combine a fine-scale encoding over local observations and a coarse-scale encoding on a global map via graph transformers.

Ranked #4 on Visual Navigation on SOON Test

Efficient Exploration Navigate +2

Are Large-scale Datasets Necessary for Self-Supervised Pre-training?

no code implementations • 20 Dec 2021 • Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jégou, Edouard Grave

Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets, from different domains.

Denoising Instance Segmentation +1

Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos

no code implementations • 2 Nov 2021 • Zongmian Li, Jiri Sedlar, Justin Carpentier, Ivan Laptev, Nicolas Mansard, Josef Sivic

First, we introduce an approach to jointly estimate the motion and the actuation forces of the person on the manipulated object by modeling contacts and the dynamics of the interactions.

Human-Object Interaction Detection Object

History Aware Multimodal Transformer for Vision-and-Language Navigation

1 code implementation • NeurIPS 2021 • ShiZhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev

Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow instructions and navigate in real scenes.

Ranked #3 on Vision and Language Navigation on RxR

Decision Making Navigate +2

Differentiable Rendering with Perturbed Optimizers

no code implementations • NeurIPS 2021 • Quentin Le Lidec, Ivan Laptev, Cordelia Schmid, Justin Carpentier

Notably, images depend both on the properties of observed scenes and on the process of image formation.

3D Scene Reconstruction 6D Pose Estimation

Reconstructing and grounding narrated instructional videos in 3D

no code implementations • 9 Sep 2021 • Dimitri Zhukov, Ignacio Rocco, Ivan Laptev, Josef Sivic, Johannes L. Schönberger, Bugra Tekin, Marc Pollefeys

Contrary to the standard scenario of instance-level 3D reconstruction, where identical objects or scenes are present in all views, objects in different instructional videos may have large appearance variations given varying conditions and versions of the same product.

3D Reconstruction

Airbert: In-domain Pretraining for Vision-and-Language Navigation

2 code implementations • ICCV 2021 • Pierre-Louis Guhur, Makarand Tapaswi, ShiZhe Chen, Ivan Laptev, Cordelia Schmid

Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.

Ranked #3 on Vision and Language Navigation on VLN Challenge

Navigate Referring Expression +1

Towards unconstrained joint hand-object reconstruction from RGB videos

1 code implementation • 16 Aug 2021 • Yana Hasson, Gül Varol, Ivan Laptev, Cordelia Schmid

Our work aims to obtain 3D reconstruction of hands and manipulated objects from monocular videos.

Ranked #5 on hand-object pose on HO-3D

3D Reconstruction hand-object pose +6

Goal-Conditioned Reinforcement Learning with Imagined Subgoals

no code implementations • 1 Jul 2021 • Elliot Chane-Sane, Cordelia Schmid, Ivan Laptev

Goal-conditioned reinforcement learning endows an agent with a large variety of skills, but it often struggles to solve tasks that require more temporally extended reasoning.

reinforcement-learning Reinforcement Learning (RL)

XCiT: Cross-Covariance Image Transformers

11 code implementations • NeurIPS 2021 • Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.

Ranked #55 on Instance Segmentation on COCO minival

Instance Segmentation object-detection +3

Segmenter: Transformer for Semantic Segmentation

7 code implementations • ICCV 2021 • Robin Strudel, Ricardo Garcia, Ivan Laptev, Cordelia Schmid

In this paper we introduce Segmenter, a transformer model for semantic segmentation.

Ranked #15 on Semantic Segmentation on PASCAL Context

Image Classification Image Segmentation +3

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

no code implementations • CVPR 2021 • Antoine Miech, Jean-Baptiste Alayrac, Ivan Laptev, Josef Sivic, Andrew Zisserman

We also extend our method to the video domain, improving the state of the art on the VATEX dataset.

Re-Ranking Retrieval

Training Vision Transformers for Image Retrieval

1 code implementation • 10 Feb 2021 • Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou

Transformers have shown outstanding results for natural language understanding and, more recently, for image classification.

Image Classification Image Retrieval +3

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

1 code implementation • ICCV 2021 • Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid

In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering making use of automatic cross-modal supervision.

Ranked #1 on Video Question Answering on VideoQA

Question Answering Question Generation +4

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos

1 code implementation • 13 Nov 2020 • Vladimír Petrík, Makarand Tapaswi, Ivan Laptev, Josef Sivic

We evaluate our method on simple single- and two-object actions from the Something-Something dataset.

Object

Learning Obstacle Representations for Neural Motion Planning

1 code implementation • 25 Aug 2020 • Robin Strudel, Ricardo Garcia, Justin Carpentier, Jean-Paul Laumond, Ivan Laptev, Cordelia Schmid

Motion planning and obstacle avoidance are key challenges in robotics applications.

Robotics

RareAct: A video dataset of unusual interactions

1 code implementation • 3 Aug 2020 • Antoine Miech, Jean-Baptiste Alayrac, Ivan Laptev, Josef Sivic, Andrew Zisserman

This paper introduces a manually annotated video dataset of unusual actions, namely RareAct, including actions such as "blend phone", "cut keyboard" and "microwave shoes".

Action Recognition

The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020)

1 code implementation • 3 Aug 2020 • Samuel Albanie, Yang Liu, Arsha Nagrani, Antoine Miech, Ernesto Coto, Ivan Laptev, Rahul Sukthankar, Bernard Ghanem, Andrew Zisserman, Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid, Shi-Zhe Chen, Yida Zhao, Qin Jin, Kaixu Cui, Hui Liu, Chen Wang, Yudong Jiang, Xiaoshuai Hao

This report summarizes the results of the first edition of the challenge together with the findings of the participants.

Natural Language Queries Retrieval +3

Occlusion resistant learning of intuitive physics from videos

no code implementations • 30 Apr 2020 • Ronan Riochet, Josef Sivic, Ivan Laptev, Emmanuel Dupoux

In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions.

Object

Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction

no code implementations • CVPR 2020 • Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, Cordelia Schmid

Modeling hand-object manipulations is essential for understanding how humans interact with their environment.

Ranked #9 on hand-object pose on HO-3D

hand-object pose Object +3

Learning visual policies for building 3D shape categories

no code implementations • 15 Apr 2020 • Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Cordelia Schmid

We then show the success of our visual policies for building arches from different primitives.

Object

Learning Interactions and Relationships between Movie Characters

1 code implementation • CVPR 2020 • Anna Kukleva, Makarand Tapaswi, Ivan Laptev

Localizing the pair of interacting characters in video is a time-consuming process; instead, we train our model to learn from clip-level weak labels.

Action Modifiers: Learning from Adverbs in Instructional Videos

1 code implementation • CVPR 2020 • Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations.

Video-Adverb Retrieval

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

4 code implementations • CVPR 2020 • Antoine Miech, Jean-Baptiste Alayrac, Lucas Smaira, Ivan Laptev, Josef Sivic, Andrew Zisserman

Annotating videos is cumbersome, expensive and not scalable.

Ranked #3 on Action Recognition on RareAct

Action Recognition Action Segmentation +5

Synthetic Humans for Action Recognition from Unseen Viewpoints

1 code implementation • 9 Dec 2019 • Gül Varol, Ivan Laptev, Cordelia Schmid, Andrew Zisserman

Although synthetic training data has been shown to be beneficial for tasks such as human pose estimation, its use for RGB human action recognition is relatively unexplored.

Action Classification Action Recognition +2

Learning to combine primitive skills: A step towards versatile robotic manipulation

1 code implementation • 2 Aug 2019 • Robin Strudel, Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Josef Sivic, Cordelia Schmid

Manipulation tasks such as preparing a meal or assembling furniture remain highly challenging for robotics and vision.

Data Augmentation Imitation Learning +4

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

4 code implementations • ICCV 2019 • Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic

In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations.

Ranked #4 on Temporal Action Localization on CrossTask

Action Localization Long Video Retrieval (Background Removed) +3

Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning

2 code implementations • 23 Apr 2019 • Yann Labbé, Sergey Zagoruyko, Igor Kalevatykh, Ivan Laptev, Justin Carpentier, Mathieu Aubry, Josef Sivic

We address the problem of visually guided rearrangement planning with many movable objects, i.e., finding a sequence of actions to move a set of objects from an initial arrangement to a desired one, while relying on visual inputs coming from an RGB camera.

Deep Metric Learning Beyond Binary Supervision

1 code implementation • CVPR 2019 • Sungyeon Kim, Minkyo Seo, Ivan Laptev, Minsu Cho, Suha Kwak

Metric Learning for visual similarity has mostly adopted binary supervision indicating whether a pair of images are of the same class or not.

Image Captioning Image Retrieval +4

Learning joint reconstruction of hands and manipulated objects

3 code implementations • CVPR 2019 • Yana Hasson, Gül Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid

Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation.

Ranked #7 on hand-object pose on DexYCB

Hand Joint Reconstruction hand-object pose +2

Estimating 3D Motion and Forces of Person-Object Interactions from Monocular Video

1 code implementation • CVPR 2019 • Zongmian Li, Jiri Sedlar, Justin Carpentier, Ivan Laptev, Nicolas Mansard, Josef Sivic

First, we introduce an approach to jointly estimate the motion and the actuation forces of the person on the manipulated object by modeling contacts and the dynamics of their interactions.

Object

Cross-task weakly supervised learning from instructional videos

2 code implementations • CVPR 2019 • Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic

In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations.

Ranked #5 on Temporal Action Localization on CrossTask

Weakly-supervised Learning

Learning to Augment Synthetic Images for Sim2Real Policy Transfer

1 code implementation • 18 Mar 2019 • Alexander Pashevich, Robin Strudel, Igor Kalevatykh, Ivan Laptev, Cordelia Schmid

Policies learned in simulators, however, do not transfer well to real scenes given the domain gap between real and synthetic data.

Object Localization

Detecting unseen visual relations using analogies

no code implementations • ICCV 2019 • Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic

We seek to detect visual relations in images of the form of triplets t = (subject, predicate, object), such as "person riding dog", where training examples of the individual entities are available but their combinations are unseen at training.

Retrieval

Tube-CNN: Modeling temporal evolution of appearance for object detection in video

no code implementations • 6 Dec 2018 • Tuan-Hung Vu, Anton Osokin, Ivan Laptev

Our goal in this paper is to learn discriminative models for the temporal evolution of object appearance and to use such models for object detection.

Object object-detection +2

MobileFace: 3D Face Reconstruction with Efficient CNN Regression

1 code implementation • 24 Sep 2018 • Nikolai Chinaev, Alexander Chigorin, Ivan Laptev

Estimation of facial shapes plays a central role for face transfer and animation.

3D Face Reconstruction Face Transfer +1

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

no code implementations • 22 Sep 2018 • Meera Hahn, Nataniel Ruiz, Jean-Baptiste Alayrac, Ivan Laptev, James M. Rehg

Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision.

Object Object Recognition

A flexible model for training action localization with varying levels of supervision

1 code implementation • NeurIPS 2018 • Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid

Our model is based on discriminative clustering and integrates different types of supervision as constraints on the optimization.

Action Detection Action Localization +1

Modeling Spatio-Temporal Human Track Structure for Action Localization

no code implementations • 28 Jun 2018 • Guilhem Chéron, Anton Osokin, Ivan Laptev, Cordelia Schmid

In order to localize actions in time, we propose a recurrent localization network (RecLNet) designed to model the temporal structure of actions on the level of person tracks.

Human Detection Optical Flow Estimation +3

BodyNet: Volumetric Inference of 3D Human Body Shapes

2 code implementations • ECCV 2018 • Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid

Human shape estimation is an important task for video editing, animation and fashion industry.

Ranked #3 on 3D Human Pose Estimation on Surreal (using extra training data)

3D Human Pose Estimation Segmentation +1

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data

5 code implementations • 7 Apr 2018 • Antoine Miech, Ivan Laptev, Josef Sivic

We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets.

Ranked #33 on Video Retrieval on LSMDC (using extra training data)

Retrieval Text Retrieval +2

Weakly-supervised learning of visual relations

no code implementations • ICCV 2017 • Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic

This paper introduces a novel approach for modeling visual relations between pairs of objects.

Ranked #5 on Visual Relationship Detection on VRD Predicate Detection

Clustering Relation +3

Learning from Video and Text via Large-Scale Discriminative Clustering

2 code implementations • ICCV 2017 • Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev, Josef Sivic

Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks.

Ranked #35 on Video Retrieval on LSMDC

Clustering Temporal Action Localization +4

Learnable pooling with Context Gating for video classification

5 code implementations • 21 Jun 2017 • Antoine Miech, Ivan Laptev, Josef Sivic

In particular, we evaluate our method on the large-scale multi-modal Youtube-8M v2 dataset and outperform all other methods in the Youtube 8M Large-Scale Video Understanding challenge.

Classification Clustering +3

Joint Discovery of Object States and Manipulation Actions

1 code implementation • ICCV 2017 • Jean-Baptiste Alayrac, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien

We assume a consistent temporal order for the changes in object states and manipulation actions, and introduce new optimization techniques to learn model parameters without additional supervision.

Action Recognition Clustering +2

Learning from Synthetic Humans

2 code implementations • CVPR 2017 • Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, Cordelia Schmid

In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data.

2D Human Pose Estimation 3D Human Pose Estimation +2

ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization

1 code implementation • 14 Sep 2016 • Vadim Kantorov, Maxime Oquab, Minsu Cho, Ivan Laptev

The additive model encourages the predicted object region to be supported by its surrounding context region.

Ranked #4 on Weakly Supervised Object Detection on Charades

Object Object Recognition +2

Much Ado About Time: Exhaustive Annotation of Temporal Data

no code implementations • 25 Jul 2016 • Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta

We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments).

Thin-Slicing for Pose: Learning to Understand Pose Without Explicit Pose Estimation

no code implementations • CVPR 2016 • Suha Kwak, Minsu Cho, Ivan Laptev

We address the problem of learning a pose-aware, compact embedding that projects images with similar human poses to be placed close-by in the embedding space.

Action Recognition Image Retrieval +3

Instance-Level Video Segmentation From Object Tracks

no code implementations • CVPR 2016 • Guillaume Seguin, Piotr Bojanowski, Rémi Lajugie, Ivan Laptev

We address the problem of segmenting multiple object instances in complex videos.

Clustering Object +5

The THUMOS Challenge on Action Recognition for Videos "in the Wild"

no code implementations • 21 Apr 2016 • Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah

Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.

Action Classification Action Recognition +3

Long-term Temporal Convolutions for Action Recognition

1 code implementation • 15 Apr 2016 • Gül Varol, Ivan Laptev, Cordelia Schmid

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure.

Ranked #63 on Action Recognition on HMDB-51

Action Recognition Optical Flow Estimation +1

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

no code implementations • 6 Apr 2016 • Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta

Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.

Action Recognition Temporal Action Localization

Context-aware CNNs for person head detection

1 code implementation • ICCV 2015 • Tuan-Hung Vu, Anton Osokin, Ivan Laptev

First, we leverage person-scene relations and propose a Global CNN model trained to predict positions and scales of heads directly from the full image.

Face Detection Head Detection +1

Unsupervised Learning from Narrated Instruction Videos

no code implementations • CVPR 2016 • Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien

Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.

Ranked #7 on Temporal Action Localization on CrossTask

Clustering

P-CNN: Pose-based CNN Features for Action Recognition

no code implementations • ICCV 2015 • Guilhem Chéron, Ivan Laptev, Cordelia Schmid

This work targets human action recognition in video.

Action Recognition Temporal Action Localization

Is Object Localization for Free? - Weakly-Supervised Learning With Convolutional Neural Networks

no code implementations • CVPR 2015 • Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic

Successful visual object recognition methods typically rely on training datasets containing lots of richly annotated images.

General Classification Object +3

Weakly-Supervised Alignment of Video With Text

no code implementations • ICCV 2015 • Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid

Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities.

Sentence

Unsupervised Object Discovery and Tracking in Video Collections

no code implementations • ICCV 2015 • Suha Kwak, Minsu Cho, Ivan Laptev, Jean Ponce, Cordelia Schmid

This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision.

Object Object Discovery +1

On Pairwise Costs for Network Flow Multi-Object Tracking

no code implementations • CVPR 2015 • Visesh Chari, Simon Lacoste-Julien, Ivan Laptev, Josef Sivic

Multi-object tracking has been recently approached with the min-cost network flow optimization techniques.

Multi-Object Tracking Object

Weakly Supervised Action Labeling in Videos Under Ordering Constraints

no code implementations • 4 Jul 2014 • Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic

We are given a set of video clips, each one annotated with an ordered list of actions, such as "walk" then "sit" then "answer phone" extracted from, for example, the associated text script.

Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

1 code implementation • CVPR 2014 • Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic

We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on Pascal VOC 2007 and 2012 datasets.

Action Classification Action Localization +4

Efficient Feature Extraction, Encoding and Classification for Action Recognition

no code implementations • CVPR 2014 • Vadim Kantorov, Ivan Laptev

Local video features provide state-of-the-art performance for action recognition.

Action Classification Action Recognition +4

Learning person-object interactions for action recognition in still images

no code implementations • NeurIPS 2011 • Vincent Delaitre, Josef Sivic, Ivan Laptev

First, we replace the standard quantized local HOG/SIFT features with stronger discriminatively trained body part and object detectors.

Action Recognition In Still Images Object
