Search Results for author: Muhammad Maaz

Found 14 papers, 13 papers with code

PALO: A Polyglot Large Multimodal Model for 5B People

1 code implementation • 22 Feb 2024 • Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan

PALO offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese, that span a total of ~5B people (65% of the world population).

Language Modelling Large Language Model +1

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

1 code implementation • 22 Nov 2023 • Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan

Extending image-based Large Multimodal Models (LMMs) to videos is challenging due to the inherent complexity of video data.

Benchmarking Phrase Grounding +4

199

GLaMM: Pixel Grounding Large Multimodal Model

1 code implementation • 6 Nov 2023 • Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.

Conversational Question Answering Image Captioning +5

576

On Orderings of Probability Vectors and Unsupervised Performance Estimation

1 code implementation • 16 Jun 2023 • Muhammad Maaz, Rui Qiao, Yiheng Zhou, Renxian Zhang

We conduct numerous experiments on well-known NLP data sets and rigorously explore the performance of different score functions.

Binary Classification

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

1 code implementation • 8 Jun 2023 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan

Conversation agents fueled by Large Language Models (LLMs) are providing a new way to interact with visual data.

Ranked #6 on Zero-Shot Video Question Answer on TGIF-QA

Video-based Generative Performance Benchmarking (Consistency) Video-based Generative Performance Benchmarking (Contextual Understanding) +5

898

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

2 code implementations • ICCV 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.

328

UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.

Image Segmentation Medical Image Segmentation +2

280

Fine-tuned CLIP Models are Efficient Video Learners

1 code implementation • CVPR 2023 • Hanoona Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan

Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.

211

MaPLe: Multi-modal Prompt Learning

2 code implementations • CVPR 2023 • Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan

Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks.

Ranked #2 on Prompt Engineering on ImageNet-A

Prompt Engineering

517

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

1 code implementation • 7 Jul 2022 • Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan

Two popular forms of weak-supervision used in open-vocabulary detection (OVD) include pretrained CLIP model and image-level supervision.

Ranked #1 on Open Vocabulary Object Detection on OpenImages-v4

Object Open Vocabulary Attribute Detection +1

295

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

7 code implementations • 21 Jun 2022 • Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan

Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% with 28% reduction in FLOPs.

Ranked #29 on Semantic Segmentation on PASCAL VOC 2012 test

Image Classification Object Detection +1

29,789

Class-agnostic Object Detection with Multi-modal Transformer

1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang

This has been a long-standing question in computer vision.

Ranked #1 on Open World Object Detection on COCO 2017 (Outdoor, Accessories, Appliance, Truck)

Class-agnostic Object Detection Object +3

295

Self-Supervised Learning for Fine-Grained Visual Categorization

1 code implementation • 18 May 2021 • Muhammad Maaz, Hanoona Abdul Rasheed, Dhanalaxmi Gaddam

The deconstruction learning forces the model to focus on local object parts, while reconstruction learning helps in learning the correlation between the parts.

Fine-Grained Visual Categorization Representation Learning +1

Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper

no code implementations • 22 Aug 2019 • Muhammad Maaz

This shows that machine learning has the potential to significantly revolutionize the abstract screening process in healthcare systematic reviews.

BIG-bench Machine Learning General Classification
