no code implementations • 2 Apr 2024 • Maksim Dzabraev, Alexander Kunitsyn, Andrei Ivaniuta
In this work, we present an unsupervised method for enhancing an image captioning model (in our case, BLIP2) using reinforcement learning, with vision-language models such as CLIP and BLIP2-ITM serving as reward models.
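The core idea described above — scoring a generated caption against the image with a vision-language model and using that score as an RL reward — can be sketched as follows. This is an illustrative stand-in, not the paper's actual code: the embeddings are toy vectors, and `caption_reward` substitutes a plain cosine similarity for a real CLIP or BLIP2-ITM matching score.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def caption_reward(image_emb, caption_emb):
    """Reward in [0, 1]: rescaled image-text similarity, standing in
    for a CLIP / BLIP2-ITM matching score (hypothetical simplification)."""
    return (cosine_similarity(image_emb, caption_emb) + 1.0) / 2.0

def reinforce_weight(reward, baseline):
    """Advantage that scales the log-likelihood gradient of a sampled
    caption in REINFORCE-style (e.g. self-critical) training."""
    return reward - baseline
```

In a self-critical setup, `baseline` would typically be the reward of the greedily decoded caption, so sampled captions that score better than the greedy one are reinforced.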
no code implementations • 14 Mar 2022 • Alexander Kunitsyn, Maksim Kalashnikov, Maksim Dzabraev, Andrei Ivaniuta
In this work, we present a new state of the art on the text-to-video retrieval task on MSR-VTT, LSMDC, MSVD, YouCook2, and TGIF, obtained with a single model.
Ranked #1 on Video Retrieval on TGIF (using extra training data)
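Text-to-video retrieval of the kind benchmarked above is usually set up by embedding the text query and each candidate video into a shared space and ranking candidates by similarity. A minimal sketch of that ranking step, with toy embeddings and a hypothetical `rank_videos` helper in place of the actual model:

```python
import math

def normalize(v):
    """L2-normalize a vector (guarding against the zero vector)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def rank_videos(query_emb, video_embs):
    """Rank candidate videos by dot product with the text query in a
    shared embedding space -- the standard retrieval setup.

    video_embs: dict mapping video id -> embedding vector.
    Returns video ids sorted from best to worst match.
    """
    q = normalize(query_emb)
    scored = []
    for vid_id, emb in video_embs.items():
        v = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, v)), vid_id))
    scored.sort(reverse=True)
    return [vid_id for _, vid_id in scored]
```

Retrieval metrics such as Recall@K then just check whether the ground-truth video appears among the first K entries of this ranking.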
3 code implementations • 19 Mar 2021 • Maksim Dzabraev, Maksim Kalashnikov, Stepan Komkov, Aleksandr Petiushko
We present a new state of the art on the text-to-video retrieval task on the MSR-VTT and LSMDC benchmarks, where our model outperforms all previous solutions by a large margin.
Ranked #25 on Video Retrieval on LSMDC (using extra training data)
1 code implementation • 4 Nov 2020 • Stepan Komkov, Maksim Dzabraev, Aleksandr Petiushko
In this paper, we explore the various methods to embed the ensemble power into a single model.
Ranked #47 on Action Recognition on Something-Something V2 (using extra training data)
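One common way to "embed the ensemble power into a single model," as the entry above puts it, is knowledge distillation: a single student is trained to match the averaged softened predictions of several teachers. The sketch below is a generic illustration of that idea, not the paper's specific method; the function names and the temperature value are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits, softened by a temperature > 1."""
    exps = [math.exp(l / temperature) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_soft_targets(teacher_logits_list, temperature=2.0):
    """Average the softened class distributions of several teachers.
    A single student model is then trained to match this distribution."""
    probs = [softmax(logits, temperature) for logits in teacher_logits_list]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]

def distillation_loss(student_probs, soft_targets, eps=1e-12):
    """Cross-entropy between the ensemble soft targets and the
    student's predicted distribution."""
    return -sum(t * math.log(s + eps)
                for t, s in zip(soft_targets, student_probs))
```

In practice this distillation term is usually combined with the ordinary cross-entropy on ground-truth labels, so the student learns both the hard labels and the ensemble's dark knowledge.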