Search Results for author: Hilde Kuehne

Found 55 papers, 37 papers with code

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

1 code implementation • 4 Apr 2024 • Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne

Vision Transformers (ViTs), with their ability to model long-range dependencies through self-attention mechanisms, have become a standard architecture in computer vision.

Uncertainty Quantification via Stable Distribution Propagation

no code implementations • 13 Feb 2024 • Felix Petersen, Aashwin Mishra, Hilde Kuehne, Christian Borgelt, Oliver Deussen, Mikhail Yurochkin

We propose a new approach for propagating stable probability distributions through neural networks.

Uncertainty Quantification

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

1 code implementation • 1 Dec 2023 • Walid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne

To leverage those capabilities, we propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery to a self-self attention path.

Ranked #1 on Zero Shot Segmentation on ADE20K training-free zero-shot segmentation

Image Retrieval • Object Localization • +2

Learning Human Action Recognition Representations Without Real Humans

1 code implementation • NeurIPS 2023 • Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris

To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.

Action Recognition • Ethics • +2

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

1 code implementation • 7 Oct 2023 • Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

Specifically, we prompt an LLM to create plausible video descriptions based on ASR narrations of the video for a large-scale instructional video dataset.

Automatic Speech Recognition • Sentence • +3

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

1 code implementation • ICCV 2023 • Nina Shvetsova, Anna Kukleva, Bernt Schiele, Hilde Kuehne

Large-scale noisy web image-text datasets have been proven to be efficient for learning robust vision-language models.

Retrieval • Style Transfer • +1

Preserving Modality Structure Improves Multi-Modal Learning

1 code implementation • ICCV 2023 • Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah

Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations.

Retrieval • Self-Supervised Learning

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

no code implementations • 21 May 2023 • Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.

ISAAC Newton: Input-based Approximate Curvature for Newton's Method

1 code implementation • 1 May 2023 • Felix Petersen, Tobias Sutter, Christian Borgelt, Dongsung Huh, Hilde Kuehne, Yuekai Sun, Oliver Deussen

We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons.

Second-order methods

Learning Situation Hyper-Graphs for Video Question Answering

1 code implementation • CVPR 2023 • Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah

The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction.

Ranked #6 on Video Question Answering on AGQA 2.0 balanced (Average Accuracy metric)

Question Answering • Video Question Answering • +1

WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition

1 code implementation • 11 Apr 2023 • Marius Bock, Hilde Kuehne, Kristof Van Laerhoven, Michael Moeller

Though research has shown the complementarity of camera- and inertial-based data, datasets which offer both egocentric video and inertial-based sensor data remain scarce.

Egocentric Activity Recognition • Human Activity Recognition • +2

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

no code implementations • 29 Mar 2023 • Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only.

Representation Learning • Spatio-Temporal Video Grounding

Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data

1 code implementation • 23 Mar 2023 • Anna Kukleva, Moritz Böhle, Bernt Schiele, Hilde Kuehne, Christian Rupprecht

Such a schedule results in a constant 'task switching' between an emphasis on instance discrimination and group-wise discrimination and thereby ensures that the model learns both group-wise features and instance-specific details.

Self-Supervised Learning

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

1 code implementation • ICCV 2023 • Wei Lin, Leonid Karlinsky, Nina Shvetsova, Horst Possegger, Mateusz Kozinski, Rameswar Panda, Rogerio Feris, Hilde Kuehne, Horst Bischof

We adapt a VL model for zero-shot and few-shot action recognition using a collection of unlabeled videos and an unpaired action dictionary.

Ranked #3 on Zero-Shot Action Recognition on Kinetics

Few-Shot action recognition • Few Shot Action Recognition • +5

TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering

no code implementations • 9 Mar 2023 • Wei Lin, Anna Kukleva, Horst Possegger, Hilde Kuehne, Horst Bischof

Temporal action segmentation in untrimmed videos has gained increased attention recently.

Action Segmentation • Clustering • +1

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

1 code implementation • ICCV 2023 • Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne

Contrastive learning has become an important tool in learning representations from unlabeled data, mainly relying on the idea of minimizing distance between positive data pairs, e.g., views from the same images, and maximizing distance between negative data pairs, e.g., views from different images.

Contrastive Learning • Self-Supervised Learning

Video Test-Time Adaptation for Action Recognition

1 code implementation • CVPR 2023 • Wei Lin, Muhammad Jehanzeb Mirza, Mateusz Kozinski, Horst Possegger, Hilde Kuehne, Horst Bischof

Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches in both evaluations of a single distribution shift and the challenging case of random distribution shifts.

Action Recognition • Temporal Action Localization • +1

Deep Differentiable Logic Gate Networks

1 code implementation • 15 Oct 2022 • Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

Recently, research has increasingly focused on developing efficient neural network architectures.

Efficient Neural Network

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

1 code implementation • 7 Oct 2022 • Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

Inspired by the fact that English text-video retrieval outperforms other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English.

Knowledge Distillation • Retrieval • +2

Contrastive Audio-Visual Masked Autoencoder

1 code implementation • 2 Oct 2022 • Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.

Ranked #1 on Audio Tagging on AudioSet (using extra training data)

Audio Classification • Audio Tagging • +4

VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models

1 code implementation • 12 Sep 2022 • Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne

We follow up with the analysis of the attribute-based zero-shot learning capabilities of these models, evaluating how well this classical zero-shot notion emerges from large-scale webly supervision.

Attribute • Retrieval • +2

Augmentation Learning for Semi-Supervised Classification

no code implementations • 3 Aug 2022 • Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne

While the accuracy on ImageNet and similar datasets has increased over time, the performance on tasks beyond the classification of natural images is yet to be explored.

Classification • Data Augmentation • +1

Weakly Supervised Grounding for VQA in Vision-Language Transformers

1 code implementation • 5 Jul 2022 • Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah

Transformers for visual-language representation learning have attracted a lot of interest and have shown tremendous performance on visual question answering (VQA) and grounding.

Question Answering • Representation Learning • +1

Differentiable Top-k Classification Learning

1 code implementation • 15 Jun 2022 • Felix Petersen, Hilde Kuehne, Christian Borgelt, Oliver Deussen

In this work, we relax this assumption and optimize the model for multiple k simultaneously instead of using a single k. Leveraging recent advances in differentiable sorting and ranking, we propose a differentiable top-k cross-entropy classification loss.

Ranked #58 on Image Classification on ImageNet

General Classification • Image Classification

CycDA: Unsupervised Cycle Domain Adaptation from Image to Video

1 code implementation • 30 Mar 2022 • Wei Lin, Anna Kukleva, Kunyang Sun, Horst Possegger, Hilde Kuehne, Horst Bischof

To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation by leveraging the joint spatial information in images and videos on the one hand and, on the other hand, training an independent spatio-temporal model to bridge the modality gap.

Action Recognition • Domain Adaptation • +1

Monotonic Differentiable Sorting Networks

1 code implementation • ICLR 2022 • Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

We introduce a family of sigmoid functions and prove that they produce differentiable sorting networks that are monotonic.

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

1 code implementation • CVPR 2022 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio S. Feris, David Harwath, James Glass, Hilde Kuehne

In this work, we present a multi-modal, modality agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a fused representation in a joined multi-modal embedding space.

Action Localization • Retrieval • +2

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

1 code implementation • 8 Dec 2021 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

Multi-modal learning from video data has seen increased attention recently as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification.

Action Localization • Retrieval • +2

Unsupervised Domain Generalization by Learning a Bridge Across Domains

1 code implementation • CVPR 2022 • Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.

Domain Generalization • Self-Supervised Learning

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations • 1 Dec 2021 • Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

Cascaded Multilingual Audio-Visual Learning from Videos

1 code implementation • 8 Nov 2021 • Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

In this paper, we explore self-supervised audio-visual models that learn from instructional videos.

audio-visual learning • Retrieval

Style Agnostic 3D Reconstruction via Adversarial Style Transfer

no code implementations • 20 Oct 2021 • Felix Petersen, Bastian Goldluecke, Oliver Deussen, Hilde Kuehne

Recently introduced differentiable renderers can be leveraged to learn the 3D geometry of objects from 2D images, but those approaches require additional supervision to enable the renderer to produce an output that can be compared to the input image.

3D Object Reconstruction • 3D Reconstruction • +3

Learning with Algorithmic Supervision via Continuous Relaxations

1 code implementation • NeurIPS 2021 • Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

The integration of algorithmic components into neural architectures has gained increased attention recently, as it allows training neural networks with new forms of supervision such as ordering constraints or silhouettes instead of using ground truth labels.

Propagating Distributions through Neural Networks

no code implementations • 29 Sep 2021 • Felix Petersen, Christian Borgelt, Mikhail Yurochkin, Hilde Kuehne, Oliver Deussen

We propose a new approach to propagating probability distributions through neural networks.

regression

A Sampling-Free Approximation of Gaussian Variational Auto-Encoders

no code implementations • 29 Sep 2021 • Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

We propose a sampling-free approximate formulation of Gaussian variational auto-encoders.

Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting

1 code implementation • ICCV 2021 • Anna Kukleva, Hilde Kuehne, Bernt Schiele

Both generalized and incremental few-shot learning have to deal with three major challenges: learning novel classes from only few samples per class, preventing catastrophic forgetting of base classes, and classifier calibration across novel and base classes.

Classifier calibration • Few-Shot Learning

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

1 code implementation • CVPR 2021 • Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah

In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.

Question Answering • Visual Question Answering

Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision

1 code implementation • 9 May 2021 • Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

Sorting and ranking supervision is a method for training neural networks end-to-end based on ordering constraints.

Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities

no code implementations • 30 Apr 2021 • Sirnam Swetha, Hilde Kuehne, Yogesh S Rawat, Mubarak Shah

This paper proposes a novel approach for unsupervised sub-action learning in complex activities.

Ranked #28 on Action Segmentation on Breakfast

Action Recognition • Action Segmentation

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

1 code implementation • ICCV 2021 • Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

Multimodal self-supervised learning is getting more and more attention as it allows not only training large networks without human supervision but also searching and retrieving data across various modalities.

Ranked #4 on Long Video Retrieval (Background Removed) on YouCook2

Clustering • Contrastive Learning • +6

Detector-Free Weakly Supervised Grounding by Separation

1 code implementation • ICCV 2021 • Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.

Ranked #1 on Phrase Grounding on Visual Genome

Phrase Grounding

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

1 code implementation • 16 Jun 2020 • Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass

Further, we propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +5

Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences

no code implementations • 29 Jan 2020 • Rosaura G. VidalMata, Walter J. Scheirer, Anna Kukleva, David Cox, Hilde Kuehne

Understanding the structure of complex activities in untrimmed videos is a challenging task in the area of action recognition.

Action Recognition • Action Segmentation • +1

More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation

1 code implementation • NeurIPS 2019 • Quanfu Fan, Chun-Fu Chen, Hilde Kuehne, Marco Pistoia, David Cox

Current state-of-the-art models for video action recognition are mostly based on expensive 3D ConvNets.

Ranked #89 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification • Action Recognition • +1

A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation

no code implementations • 3 Jun 2019 • Hilde Kuehne, Alexander Richard, Juergen Gall

Action recognition has become a rapidly developing research field within the last decade.

Action Recognition • Action Segmentation • +1

Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data

1 code implementation • 3 Jun 2019 • Hilde Kuehne, Ahsan Iqbal, Alexander Richard, Juergen Gall

Action recognition has so far mainly focused on the problem of classifying hand-selected, pre-clipped actions and has reached impressive results in this field.

Action Recognition • General Classification • +1

Unsupervised learning of action classes with continuous temporal embedding

2 code implementations • CVPR 2019 • Anna Kukleva, Hilde Kuehne, Fadime Sener, Juergen Gall

The task of temporally detecting and segmenting actions in untrimmed videos has seen increased attention recently.

NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning

no code implementations • CVPR 2018 • Alexander Richard, Hilde Kuehne, Ahsan Iqbal, Juergen Gall

Video learning is an important task in computer vision and has experienced increasing interest over the recent years.

Ranked #6 on Weakly Supervised Action Segmentation (Transcript) on Breakfast

Incremental Learning • Segmentation • +3

Recurrent Residual Learning for Action Recognition

no code implementations • 27 Jun 2017 • Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall

In this work, we propose a novel recurrent ConvNet architecture called recurrent residual networks to address the task of action recognition.

Action Recognition • Image Classification • +1

Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints

1 code implementation • CVPR 2018 • Alexander Richard, Hilde Kuehne, Juergen Gall

Action detection and temporal segmentation of actions in videos are topics of increasing interest.

Action Detection • Action Segmentation

Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling

1 code implementation • CVPR 2017 • Alexander Richard, Hilde Kuehne, Juergen Gall

We present an approach for weakly supervised learning of human actions.

Action Segmentation • Weakly-supervised Learning

Weakly supervised learning of actions from transcripts

no code implementations • 7 Oct 2016 • Hilde Kuehne, Alexander Richard, Juergen Gall

Our system is based on the idea that, given a sequence of input data and a transcript, i.e., a list of the actions in the order they occur in the video, it is possible to infer the actions within the video stream and thus learn the related action models without the need for any frame-based annotation.

Weakly-supervised Learning

An end-to-end generative framework for video segmentation and recognition

no code implementations • 7 Sep 2015 • Hilde Kuehne, Juergen Gall, Thomas Serre

We describe an end-to-end generative approach for the segmentation and recognition of human activities.

Video Segmentation • Video Semantic Segmentation

Cooking in the kitchen: Recognizing and Segmenting Human Activities in Videos

no code implementations • 25 Aug 2015 • Hilde Kuehne, Juergen Gall, Thomas Serre

Through extensive system evaluations, we demonstrate that combining compact video representations based on Fisher Vectors with HMM-based modeling yields very significant gains in accuracy and that, when properly trained with sufficient training samples, structured temporal models outperform unstructured bag-of-words models by a large margin on the tested performance metric.

Action Recognition • Temporal Action Localization

The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities

no code implementations • CVPR 2014 • Hilde Kuehne, Ali Arslan, Thomas Serre

This paper describes a framework for modeling human activities as temporally structured processes.

Semantic Parsing • speech-recognition • +1
