Search Results for author: Kumara Kahatapitiya

Found 15 papers, 10 papers with code

Understanding Long Videos in One Multimodal Language Model Pass

1 code implementation • 25 Mar 2024 • Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo

In addition to faster inference, we discover the resulting models to yield surprisingly good accuracy on long-video tasks, even with no video-specific information.

Ranked #2 on Zero-Shot Video Question Answer on EgoSchema (subset)

Fine-grained Action Recognition Language Modelling +5

Language Repository for Long Video Understanding

1 code implementation • 21 Mar 2024 • Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo

In this paper, we introduce a Language Repository (LangRepo) for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation.

Ranked #1 on Zero-Shot Video Question Answer on EgoSchema (subset)

Video Understanding Visual Question Answering +1

Object-Centric Diffusion for Efficient Video Editing

no code implementations • 11 Jan 2024 • Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts.

Object Video Editing

VicTR: Video-conditioned Text Representations for Activity Recognition

no code implementations • 5 Apr 2023 • Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo

In this paper, we argue the contrary, that better video-VLMs can be designed by focusing more on augmenting text, rather than visual information.

Ranked #8 on Action Classification on Charades

Action Classification Activity Recognition +1

Token Turing Machines

1 code implementation • CVPR 2023 • Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab

The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step.

Ranked #1 on Action Detection on Charades

Action Detection Activity Detection

Grafting Vision Transformers

no code implementations • 28 Oct 2022 • Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo

In this paper, we present a simple and efficient add-on component (termed GrafT) that considers global dependencies and multi-scale information throughout the network, in both high- and low-resolution features alike.

Image Classification Instance Segmentation +3

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

1 code implementation • CVPR 2022 • Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond

Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos.

Ranked #2 on Action Detection on TSU

Action Detection Temporal Action Localization

Weakly-guided Self-supervised Pretraining for Temporal Activity Detection

1 code implementation • 26 Nov 2021 • Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua

However, such pretrained models are not ideal for downstream detection, due to the disparity between the pretraining and the downstream fine-tuning tasks.

Ranked #3 on Action Detection on Charades

Action Detection Activity Detection +2

SWAT: Spatial Structure Within and Among Tokens

1 code implementation • 26 Nov 2021 • Kumara Kahatapitiya, Michael S. Ryoo

Modeling visual data as tokens (i.e., image patches) using attention mechanisms, feed-forward networks or convolutions has been highly effective in recent years.

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning

1 code implementation • 12 Oct 2021 • Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo

Reinforcement Learning (RL) can be considered as a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions.

Imitation Learning Inductive Bias +3

Coarse-Fine Networks for Temporal Activity Detection in Videos

1 code implementation • CVPR 2021 • Kumara Kahatapitiya, Michael S. Ryoo

In this paper, we introduce Coarse-Fine Networks, a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.

Ranked #9 on Action Detection on Charades

Action Detection Activity Detection

Feature-Dependent Cross-Connections in Multi-Path Neural Networks

no code implementations • 24 Jun 2020 • Dumindu Tissera, Kasun Vithanage, Rukshan Wijesinghe, Kumara Kahatapitiya, Subha Fernando, Ranga Rodrigo

As opposed to conventional network widening, multi-path architectures restrict the quadratic increment of complexity to a linear scale.

Context-Aware Multipath Networks

no code implementations • 26 Jul 2019 • Dumindu Tissera, Kumara Kahatapitiya, Rukshan Wijesinghe, Subha Fernando, Ranga Rodrigo

In view of this, networks which can allocate resources according to the context of the input and regulate flow of information across the network are effective.

Ranked #2 on Image Classification on Kuzushiji-MNIST

Image Classification

Exploiting the Redundancy in Convolutional Filters for Parameter Reduction

1 code implementation • 26 Jul 2019 • Kumara Kahatapitiya, Ranga Rodrigo

Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks over the years.

Context-Aware Automatic Occlusion Removal

1 code implementation • 7 May 2019 • Kumara Kahatapitiya, Dumindu Tissera, Ranga Rodrigo

Occlusion removal is an interesting application of image enhancement, for which existing work suggests manually annotated or domain-specific occlusion removal.

Image Enhancement
