no code implementations • MMMPIE (COLING) 2022 • Anton Razzhigaev, Anton Voronov, Andrey Kaznacheev, Andrey Kuznetsov, Denis Dimitrov, Alexander Panchenko
Pixel-level autoregression with Transformer models (Image GPT, or iGPT) is a recent approach to image generation that has received little attention and elaboration, because the quadratic complexity of attention imposes huge memory requirements and thus restricts the resolution of the generated images.
2 code implementations • 9 Apr 2024 • Elizaveta Goncharova, Anton Razzhigaev, Matvey Mikhalchuk, Maxim Kurkin, Irina Abdullaeva, Matvey Skripkin, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov
We propose an OmniFusion model based on a pretrained LLM and adapters for visual modality.
Ranked #39 on Visual Question Answering on MM-Vet
1 code implementation • 6 Dec 2023 • Vladimir Arkhipkin, Andrei Filatov, Viacheslav Vasilev, Anastasia Maltseva, Said Azizov, Igor Pavlov, Julia Agafonova, Andrey Kuznetsov, Denis Dimitrov
We focus on the key components that, based on a large number of experiments, had the most significant impact on improving the quality of our model compared to the others.
1 code implementation • 22 Nov 2023 • Vladimir Arkhipkin, Zein Shaheen, Viacheslav Vasilev, Elizaveta Dakhova, Andrey Kuznetsov, Denis Dimitrov
The first stage concerns keyframe synthesis to outline the storyline of a video, while the second is devoted to generating interpolation frames to make the movements of the scene and objects smooth.
no code implementations • 10 Nov 2023 • Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov
In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders.
1 code implementation • 5 Oct 2023 • Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, Denis Dimitrov
Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures.
Ranked #22 on Text-to-Image Generation on MS COCO
1 code implementation • Computers and Geosciences 2023 • Sergey Nesteruk, Julia Agafonova, Igor Pavlov, Maxim Gerasimov, Nikolay Latyshev, Denis Dimitrov, Andrey Kuznetsov, Artur Kadurin, Pavel Plechov
On the contrary, in a raw sample, the target mineral can appear in the form of thinly represented inclusions.
1 code implementation • 7 Jun 2023 • Anastasia Martynova, Mikhail Kuznetsov, Vadim Porvatov, Vladislav Tishin, Andrey Kuznetsov, Natalia Semenova, Ksenia Kuznetsova
Parking guidance systems have recently become a popular trend as a part of the smart cities' paradigm of development.
Ranked #1 on Parking Space Occupancy on PKLot (F1-score metric)
1 code implementation • 29 Mar 2023 • Igor Markov, Sergey Nesteruk, Andrey Kuznetsov, Denis Dimitrov
In this paper, we present a large-scale human-labeled dataset for Russian text recognition in-the-wild.
1 code implementation • 22 Feb 2022 • Alex Shonenkov, Andrey Kuznetsov, Denis Dimitrov, Tatyana Shavrina, Daniil Chesakov, Anastasia Maltseva, Alena Fenogenova, Igor Pavlov, Anton Emelyanov, Sergey Markov, Daria Bakshandaeva, Vera Shybaeva, Andrey Chertok
In this report, we propose six new implementations of the ruCLIP model trained on our 240M pairs.
no code implementations • 7 Feb 2022 • Daniil Chesakov, Anastasia Maltseva, Alexander Groshev, Andrey Kuznetsov, Denis Dimitrov
Deepfake technology has become a hot field of research in the last few years.
1 code implementation • 22 Nov 2021 • Daria Bakshandaeva, Denis Dimitrov, Vladimir Arkhipkin, Alex Shonenkov, Mark Potanin, Denis Karachev, Andrey Kuznetsov, Anton Voronov, Vera Davydova, Elena Tutubalina, Aleksandr Petiushko
Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called Fusion Brain, the first competition aimed at building a universal architecture that can process different modalities (in this case, images, texts, and code) and solve multiple tasks for vision and language.
2 code implementations • 18 Apr 2020 • Alina Belko, Konstantin Dobratulin, Andrey Kuznetsov
This paper introduces a novel dataset, FeatherV1, containing 28,272 images of feathers categorized by 595 bird species.
Fine-Grained Visual Categorization • Fine-Grained Visual Recognition