Search Results for author: Kanchana Ranasinghe

Found 15 papers, 10 papers with code

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

no code implementations • 11 Apr 2024 • Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin

Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA).

Descriptive Hallucination +2

Understanding Long Videos in One Multimodal Language Model Pass

1 code implementation • 25 Mar 2024 • Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo

In addition to faster inference, we find that the resulting models yield surprisingly good accuracy on long-video tasks, even with no video-specific information.

Fine-grained Action Recognition, Language Modelling, +3 more

Language Repository for Long Video Understanding

1 code implementation • 21 Mar 2024 • Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo

In this paper, we introduce a Language Repository (LangRepo) for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation.

Video Understanding, Visual Question Answering, +1 more

Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning

1 code implementation • 21 Mar 2024 • Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Self-supervised representation learning has been highly promising for histopathology image analysis with numerous approaches leveraging their patient-slide-patch hierarchy to learn better representations.

Representation Learning, Self-Supervised Learning

Diffusion Illusions: Hiding Images in Plain Sight

no code implementations • 6 Dec 2023 • Ryan Burgert, Xiang Li, Abe Leite, Kanchana Ranasinghe, Michael S. Ryoo

We explore the problem of computationally generating special 'prime' images that produce optical illusions when physically arranged and viewed in a certain way.

Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors

1 code implementation • 23 Nov 2022 • Ryan Burgert, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo

In this work, we explore how an off-the-shelf text-to-image diffusion model, trained without exposure to localization information, can ground various semantic phrases without segmentation-specific re-training.

Segmentation, Unsupervised Semantic Segmentation

Self-supervised Video Transformer

1 code implementation • CVPR 2022 • Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo

To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).

Action Classification, Action Recognition In Videos

On Improving Adversarial Transferability of Vision Transformers

3 code implementations • ICLR 2022 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli

(ii) Token Refinement: We then propose to refine the tokens to further enhance the discriminative capacity at each block of ViT.

Adversarial Attack

Intriguing Properties of Vision Transformers

1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., they retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning, Semantic Segmentation

Orthogonal Projection Loss

1 code implementation • ICCV 2021 • Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan

The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.

Domain Generalization, Few-Shot Learning

Conditional Generative Modeling via Learning the Latent Space

no code implementations • ICLR 2021 • Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, Stephen Gould

Although deep learning has achieved appealing results on several machine learning tasks, most of the models are deterministic at inference, limiting their application to single-modal settings.

Extending Multi-Object Tracking systems to better exploit appearance and 3D information

no code implementations • 25 Dec 2019 • Kanchana Ranasinghe, Sahan Liyanaarachchi, Harsha Ranasinghe, Mayuka Jayawardhana

Tracking multiple objects in real time is essential for a variety of real-world applications, with the self-driving industry at the forefront.

Object Real-Time Multi-Object Tracking

Bipartite Conditional Random Fields for Panoptic Segmentation

1 code implementation • 11 Dec 2019 • Sadeep Jayasumana, Kanchana Ranasinghe, Mayuka Jayawardhana, Sahan Liyanaarachchi, Harsha Ranasinghe

To tackle this problem, we propose a CRF model, named Bipartite CRF or BCRF, with two types of random variables for semantic and instance labels.

Panoptic Segmentation, Segmentation
