Search Results for author: Mustafa Shukor

Found 19 papers, 10 papers with code

What Makes Multimodal In-Context Learning Work?

1 code implementation • 24 Apr 2024 • Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski

Large Language Models have demonstrated remarkable performance across various tasks, exhibiting the capacity to swiftly acquire new skills, such as through In-Context Learning (ICL) with minimal demonstration examples.

In-Context Learning

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models

no code implementations • 29 Mar 2024 • Barbara Toniella Corradini, Mustafa Shukor, Paul Couairon, Guillaume Couairon, Franco Scarselli, Matthieu Cord

The pipeline is as follows: the image is passed to both a captioner model (i.e., BLIP) and a diffusion model (i.e., Stable Diffusion Model) to generate a text description and a visual representation, respectively.

Image Generation Image Segmentation +3

Improved Baselines for Data-efficient Perceptual Augmentation of LLMs

no code implementations • 20 Mar 2024 • Théophane Vallaeys, Mustafa Shukor, Matthieu Cord, Jakob Verbeek

The abilities of large language models (LLMs) have recently progressed to unprecedented levels, paving the way to novel applications in a wide variety of areas.

Audio captioning Image Captioning +2

Zero-Shot Refinement of Buildings' Segmentation Models using SAM

1 code implementation • 3 Oct 2023 • Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor, Ali J. Ghandour

To this end, we present a novel approach to adapt foundation models to address existing models' generalization dropback.

Image Segmentation Instance Segmentation +2

Extending CAM-based XAI methods for Remote Sensing Imagery Segmentation

1 code implementation • 3 Oct 2023 • Abdul Karim Gizzini, Mustafa Shukor, Ali J. Ghandour

This paper aims to bridge this gap by adapting recent XAI classification algorithms and making them usable for multi-class image segmentation, where we mainly focus on buildings' segmentation from high-resolution satellite images.

Decision Making Explainable artificial intelligence +5

Empirical Study of PEFT techniques for Winter Wheat Segmentation

2 code implementations • 3 Oct 2023 • Mohamad Hasan Zahweh, Hasan Nasrallah, Mustafa Shukor, Ghaleb Faour, Ali J. Ghandour

This study seeks to bridge this gap by comprehensively exploring the feasibility of cross-area and cross-year out-of-distribution generalization using the State-of-the-Art (SOTA) wheat crop monitoring model.

Out-of-Distribution Generalization

Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

1 code implementation • 1 Oct 2023 • Mustafa Shukor, Alexandre Rame, Corentin Dancette, Matthieu Cord

Based on our ICL study, (3) we push ICL further and propose new multimodal ICL variants such as Multitask-ICL, Chain-of-Hindsight-ICL, and Self-Correcting-ICL.

In-Context Learning Instruction Following +1

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks

1 code implementation • 30 Jul 2023 • Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord

Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.

Out-of-Distribution Generalization

eP-ALM: Efficient Perceptual Augmentation of Language Models

1 code implementation • ICCV 2023 • Mustafa Shukor, Corentin Dancette, Matthieu Cord

In this work, we propose instead to direct effort toward efficient adaptations of existing models, and to augment Language Models with perception.

In-Context Learning Visual Question Answering (VQA)

Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval

1 code implementation • 8 Dec 2022 • Mustafa Shukor, Nicolas Thome, Matthieu Cord

Finally, we validate the generalization of the approach to other tasks (i.e., Food Recognition) and to domains with structured text, such as the Medical domain on the ROCO dataset.

Ranked #1 on Cross-Modal Retrieval on Recipe1M+

Cross-Modal Retrieval Food Recognition +1

Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment

1 code implementation • 29 Aug 2022 • Mustafa Shukor, Guillaume Couairon, Matthieu Cord

Vision and Language Pretraining has become the prevalent approach for tackling multimodal downstream tasks.

Retrieval Text Retrieval +4

Video Coding Using Learned Latent GAN Compression

no code implementations • 9 Jul 2022 • Mustafa Shukor, Bharath Bhushan Damodaran, Xu Yao, Pierre Hellier

We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video, including intra and inter compression.

Video Compression

Semantic Unfolding of StyleGAN Latent Space

no code implementations • 29 Jun 2022 • Mustafa Shukor, Xu Yao, Bharath Bushan Damodaran, Pierre Hellier

Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to an input real image.

Attribute Disentanglement +1

Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval

1 code implementation • 20 Apr 2022 • Mustafa Shukor, Guillaume Couairon, Asya Grechka, Matthieu Cord

We propose a new retrieval framework, T-Food (Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval) that exploits the interaction between modalities in a novel regularization scheme, while using only unimodal encoders at test time for efficient retrieval.

Ranked #3 on Cross-Modal Retrieval on Recipe1M

Cross-Modal Retrieval Retrieval

Buildings Classification using Very High Resolution Satellite Imagery

no code implementations • 29 Nov 2021 • Mohammad Dimassi, Abed Ellatif Samhat, Mohammad Zaraket, Jamal Haidar, Mustafa Shukor, Ali J. Ghandour

Buildings classification using satellite images is becoming more important for several applications such as damage assessment, resource allocation, and population estimation.

Classification Semantic Segmentation +2

Sci-Net: Scale Invariant Model for Buildings Segmentation from Aerial Imagery

no code implementations • 12 Nov 2021 • Hasan Nasrallah, Mustafa Shukor, Ali J. Ghandour

Buildings' segmentation is a fundamental task in the field of earth observation and aerial imagery analysis.

Earth Observation Segmentation

Learning Perceptual Compression of Facial Video

no code implementations • 29 Sep 2021 • Mustafa Shukor, Xu Yao, Bharath Bhushan Damodaran, Pierre Hellier

We leverage the generative capacity of GANs such as StyleGAN to represent and compress each video frame (intra compression), as well as the successive differences between frames (inter compression).

Video Compression

Semantic and Geometric Unfolding of StyleGAN Latent Space

no code implementations • 9 Jul 2021 • Mustafa Shukor, Xu Yao, Bharath Bhushan Damodaran, Pierre Hellier

Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to a natural image.

Attribute Disentanglement +1

Synthetic training data generation for deep learning based quality inspection

no code implementations • 7 Apr 2021 • Pierre Gutierrez, Maria Luschkova, Antoine Cordier, Mustafa Shukor, Mona Schappert, Tim Dahmen

Supervised learning is often used to detect defects, but it requires a large number of annotated images, which can be costly: collecting, cleaning, and annotating the data is tedious and limits the speed at which a system can be deployed, since everything the system must detect needs to be observed first.

Defect Detection Domain Adaptation
