Search Results for author: Mustafa Shukor

Found 19 papers, 10 papers with code

What Makes Multimodal In-Context Learning Work?

1 code implementation • 24 Apr 2024 • Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski

Large Language Models have demonstrated remarkable performance across various tasks, exhibiting the capacity to swiftly acquire new skills, such as through In-Context Learning (ICL) with minimal demonstration examples.

In-Context Learning

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models

no code implementations • 29 Mar 2024 • Barbara Toniella Corradini, Mustafa Shukor, Paul Couairon, Guillaume Couairon, Franco Scarselli, Matthieu Cord

The pipeline is as follows: the image is passed to both a captioner model (i.e., BLIP) and a diffusion model (i.e., Stable Diffusion Model) to generate a text description and visual representation, respectively.

Image Generation, Image Segmentation +3

Improved Baselines for Data-efficient Perceptual Augmentation of LLMs

no code implementations • 20 Mar 2024 • Théophane Vallaeys, Mustafa Shukor, Matthieu Cord, Jakob Verbeek

The abilities of large language models (LLMs) have recently progressed to unprecedented levels, paving the way to novel applications in a wide variety of areas.

Audio captioning, Image Captioning +2

Zero-Shot Refinement of Buildings' Segmentation Models using SAM

1 code implementation • 3 Oct 2023 • Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor, Ali J. Ghandour

For this aim, we present a novel approach to adapt foundation models to address existing models' generalization dropback.

Image Segmentation, Instance Segmentation +2

Extending CAM-based XAI methods for Remote Sensing Imagery Segmentation

1 code implementation • 3 Oct 2023 • Abdul Karim Gizzini, Mustafa Shukor, Ali J. Ghandour

This paper offers to bridge this gap by adapting the recent XAI classification algorithms and making them usable for multi-class image segmentation, where we mainly focus on buildings' segmentation from high-resolution satellite images.

Decision Making, Explainable artificial intelligence +5

Empirical Study of PEFT techniques for Winter Wheat Segmentation

2 code implementations • 3 Oct 2023 • Mohamad Hasan Zahweh, Hasan Nasrallah, Mustafa Shukor, Ghaleb Faour, Ali J. Ghandour

This study seeks to bridge this gap by comprehensively exploring the feasibility of cross-area and cross-year out-of-distribution generalization using the State-of-the-Art (SOTA) wheat crop monitoring model.

Out-of-Distribution Generalization

Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

1 code implementation • 1 Oct 2023 • Mustafa Shukor, Alexandre Rame, Corentin Dancette, Matthieu Cord

Based on our ICL study, (3) we push ICL further and propose new multimodal ICL variants such as Multitask-ICL, Chain-of-Hindsight-ICL, and Self-Correcting-ICL.

In-Context Learning, Instruction Following +1

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks

1 code implementation • 30 Jul 2023 • Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord

Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.

Out-of-Distribution Generalization

eP-ALM: Efficient Perceptual Augmentation of Language Models

1 code implementation • ICCV 2023 • Mustafa Shukor, Corentin Dancette, Matthieu Cord

In this work, we propose instead to direct effort toward efficient adaptations of existing models, and to augment Language Models with perception.

In-Context Learning, Visual Question Answering (VQA)

Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval

1 code implementation • 8 Dec 2022 • Mustafa Shukor, Nicolas Thome, Matthieu Cord

Finally, we validate the generalization of the approach to other tasks (i.e., Food Recognition) and domains with structured text such as the Medical domain on the ROCO dataset.

Cross-Modal Retrieval, Food Recognition +1

Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment

1 code implementation • 29 Aug 2022 • Mustafa Shukor, Guillaume Couairon, Matthieu Cord

Vision and Language Pretraining has become the prevalent approach for tackling multimodal downstream tasks.

Retrieval, Text Retrieval +4

Video Coding Using Learned Latent GAN Compression

no code implementations • 9 Jul 2022 • Mustafa Shukor, Bharath Bhushan Damodaran, Xu Yao, Pierre Hellier

We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video, including intra and inter compression.

Video Compression

Semantic Unfolding of StyleGAN Latent Space

no code implementations • 29 Jun 2022 • Mustafa Shukor, Xu Yao, Bharath Bushan Damodaran, Pierre Hellier

Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to an input real image.

Attribute Disentanglement +1

Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval

1 code implementation • 20 Apr 2022 • Mustafa Shukor, Guillaume Couairon, Asya Grechka, Matthieu Cord

We propose a new retrieval framework, T-Food (Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval) that exploits the interaction between modalities in a novel regularization scheme, while using only unimodal encoders at test time for efficient retrieval.

Cross-Modal Retrieval, Retrieval

Buildings Classification using Very High Resolution Satellite Imagery

no code implementations • 29 Nov 2021 • Mohammad Dimassi, Abed Ellatif Samhat, Mohammad Zaraket, Jamal Haidar, Mustafa Shukor, Ali J. Ghandour

Buildings classification using satellite images is becoming more important for several applications such as damage assessment, resource allocation, and population estimation.

Classification, Semantic Segmentation +2

Sci-Net: Scale Invariant Model for Buildings Segmentation from Aerial Imagery

no code implementations • 12 Nov 2021 • Hasan Nasrallah, Mustafa Shukor, Ali J. Ghandour

Buildings' segmentation is a fundamental task in the field of earth observation and aerial imagery analysis.

Earth Observation, Segmentation

Learning Perceptual Compression of Facial Video

no code implementations • 29 Sep 2021 • Mustafa Shukor, Xu Yao, Bharath Bhushan Damodaran, Pierre Hellier

We leverage the generative capacity of GANs such as StyleGAN to represent and compress each video frame (intra compression), as well as the successive differences between frames (inter compression).

Video Compression

Semantic and Geometric Unfolding of StyleGAN Latent Space

no code implementations • 9 Jul 2021 • Mustafa Shukor, Xu Yao, Bharath Bhushan Damodaran, Pierre Hellier

Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to a natural image.

Attribute Disentanglement +1

Synthetic training data generation for deep learning based quality inspection

no code implementations • 7 Apr 2021 • Pierre Gutierrez, Maria Luschkova, Antoine Cordier, Mustafa Shukor, Mona Schappert, Tim Dahmen

In order to detect defects, supervised learning is often utilized, but necessitates a large amount of annotated images, which can be costly: collecting, cleaning, and annotating the data is tedious and limits the speed at which a system can be deployed as everything the system must detect needs to be observed first.

Defect Detection, Domain Adaptation
