Search Results for author: Maksim Dzabraev

Found 4 papers, 2 papers with code

VLRM: Vision-Language Models act as Reward Models for Image Captioning

no code implementations · 2 Apr 2024 · Maksim Dzabraev, Alexander Kunitsyn, Andrei Ivaniuta

In this work, we present an unsupervised method for enhancing an image captioning model (in our case, BLIP2) using reinforcement learning and vision-language models like CLIP and BLIP2-ITM as reward models.

Image Captioning · Reinforcement Learning
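The reward idea above sketches easily: score each sampled caption by CLIP image-text similarity and use that score in a policy-gradient update. Below is a minimal illustration of the general technique, not the authors' code; it assumes the openai/CLIP package and a captioner that exposes sampled captions with their log-probabilities.

```python
# Sketch: CLIP image-text similarity as a reward for REINFORCE-style
# caption tuning. Assumes https://github.com/openai/CLIP is installed
# and `images` are already CLIP-preprocessed tensors of shape (B,3,224,224).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
reward_model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def clip_reward(images, captions):
    """Cosine similarity between image and caption embeddings."""
    img = reward_model.encode_image(images)                        # (B, D)
    txt = reward_model.encode_text(clip.tokenize(captions).to(device))
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1)                                 # (B,)

def reinforce_loss(log_probs, rewards):
    """REINFORCE with a mean-reward baseline to reduce variance.

    log_probs: (B,) summed log-probabilities of each sampled caption.
    rewards:   (B,) CLIP similarities from clip_reward().
    """
    advantage = rewards - rewards.mean()
    return -(advantage.detach() * log_probs).mean()
```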

MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization

no code implementations · 14 Mar 2022 · Alexander Kunitsyn, Maksim Kalashnikov, Maksim Dzabraev, Andrei Ivaniuta

In this work we present a new state of the art on the text-to-video retrieval task on MSR-VTT, LSMDC, MSVD, YouCook2, and TGIF, obtained by a single model.

Ranked #1 on Video Retrieval on TGIF (using extra training data)

Retrieval · Text-to-Video Retrieval +1
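Both MDMMT papers target the same retrieval setup: a text encoder and a video encoder map into a shared embedding space, and candidate videos are ranked by similarity to the query. A minimal sketch of that scoring step follows, with random tensors standing in for real encoders; the MDMMT-2 architecture itself is not reproduced here.

```python
# Sketch of bi-encoder text-to-video retrieval scoring: rank videos by
# cosine similarity to the query in a shared embedding space.
import torch
import torch.nn.functional as F

def rank_videos(text_emb, video_embs):
    """Return video indices sorted by similarity to the text query.

    text_emb:   (D,)   query embedding from a text encoder
    video_embs: (N, D) embeddings from a video encoder
    """
    sims = F.cosine_similarity(text_emb.unsqueeze(0), video_embs, dim=-1)
    return sims.argsort(descending=True)

# Toy usage: random embeddings stand in for real encoder outputs.
text_emb = torch.randn(512)
video_embs = torch.randn(1000, 512)
print(rank_videos(text_emb, video_embs)[:5])  # top-5 video indices
```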

MDMMT: Multidomain Multimodal Transformer for Video Retrieval

3 code implementations · 19 Mar 2021 · Maksim Dzabraev, Maksim Kalashnikov, Stepan Komkov, Aleksandr Petiushko

We present a new state of the art on the text-to-video retrieval task on the MSR-VTT and LSMDC benchmarks, where our model outperforms all previous solutions by a large margin.

Ranked #25 on Video Retrieval on LSMDC (using extra training data)

Retrieval · Text-to-Video Retrieval +1

Mutual Modality Learning for Video Action Classification

1 code implementation · 4 Nov 2020 · Stepan Komkov, Maksim Dzabraev, Aleksandr Petiushko

In this paper, we explore various methods for embedding the power of an ensemble into a single model.

Ranked #47 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification · Action Recognition +3
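One common way to fold ensemble power into a single model is mutual learning: each modality stream (e.g. RGB and optical flow) adds a KL term pulling it toward the other stream's predictions. The sketch below shows such a loss for illustration only; the paper explores several methods of this kind.

```python
# Sketch of a mutual-learning loss between two modality streams:
# per-stream cross-entropy plus symmetric KL between their predictions.
import torch
import torch.nn.functional as F

def mutual_learning_losses(logits_a, logits_b, labels, alpha=1.0):
    """Return the loss for each stream.

    logits_a, logits_b: (B, C) predictions of the two streams
    labels:             (B,)   ground-truth class indices
    alpha:              weight of the mutual (KL) term
    """
    ce_a = F.cross_entropy(logits_a, labels)
    ce_b = F.cross_entropy(logits_b, labels)
    # Each stream treats the other's (detached) prediction as a soft target.
    kl_a = F.kl_div(F.log_softmax(logits_a, dim=-1),
                    F.softmax(logits_b, dim=-1).detach(),
                    reduction="batchmean")
    kl_b = F.kl_div(F.log_softmax(logits_b, dim=-1),
                    F.softmax(logits_a, dim=-1).detach(),
                    reduction="batchmean")
    return ce_a + alpha * kl_a, ce_b + alpha * kl_b
```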
