Search Results for author: Otniel-Bogdan Mercea

Found 7 papers, 6 papers with code

Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models

2 code implementations • 9 Apr 2024 • David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

However, existing benchmarks predate the popularization of large multi-modal models, such as CLIP and CLAP.

Audio Classification • Generalized Zero-Shot Learning

Time-, Memory- and Parameter-Efficient Visual Adaptation

no code implementations • 5 Feb 2024 • Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab

Here, we outperform a prior adaptor-based method which could only scale to a 1 billion parameter backbone, or fully-finetuning a smaller backbone, with the same GPU and less training time.

Video Classification

Video-adverb retrieval with compositional adverb-action embeddings

1 code implementation • 26 Sep 2023 • Thomas Hummel, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

We propose a framework for video-to-adverb retrieval (and vice versa) that aligns video embeddings with their matching compositional adverb-action text embedding in a joint embedding space.

Ranked #1 on Video-Adverb Retrieval (Unseen Compositions) on MSR-VTT Adverbs

Video-Adverb Retrieval (Unseen Compositions)

Text-to-feature diffusion for audio-visual few-shot learning

1 code implementation • 7 Sep 2023 • Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

Training deep learning models for video classification from audio-visual data commonly requires immense amounts of labeled training data collected via a costly process.

Classification • Few-Shot Learning • +1

PlanT: Explainable Planning Transformers via Object-Level Representations

1 code implementation • 25 Oct 2022 • Katrin Renz, Kashyap Chitta, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata, Andreas Geiger

Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene.

Ranked #6 on CARLA longest6 on CARLA

CARLA longest6 • Imitation Learning • +1

Temporal and cross-modal attention for audio-visual zero-shot learning

2 code implementations • 20 Jul 2022 • Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

We show that our proposed framework that ingests temporal features yields state-of-the-art performance on the UCF, VGGSound, and ActivityNet benchmarks for (generalised) zero-shot learning.

Ranked #2 on GZSL Video Classification on UCF-GZSL (main)

GZSL Video Classification • Video Classification

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

1 code implementation • CVPR 2022 • Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata

Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes.

Ranked #1 on ZSL Video Classification on UCF-GZSL (cls)

GZSL Video Classification • ZSL Video Classification
