1 code implementation • 25 Mar 2024 • Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo
In addition to faster inference, we find that the resulting models yield surprisingly good accuracy on long-video tasks, even with no video-specific information.
1 code implementation • 21 Mar 2024 • Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo
In this paper, we introduce a Language Repository (LangRepo) for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation.
no code implementations • 11 Jan 2024 • Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian
Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts.
no code implementations • 5 Apr 2023 • Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo
In this paper, we argue the contrary, that better video-VLMs can be designed by focusing more on augmenting text, rather than visual information.
Ranked #8 on Action Classification on Charades
1 code implementation • CVPR 2023 • Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step.
Ranked #1 on Action Detection on Charades
no code implementations • 28 Oct 2022 • Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo
In this paper, we present a simple and efficient add-on component (termed GrafT) that considers global dependencies and multi-scale information throughout the network, in both high- and low-resolution features alike.
1 code implementation • CVPR 2022 • Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond
Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos.
Ranked #2 on Action Detection on TSU
1 code implementation • 26 Nov 2021 • Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua
However, such pretrained models are not ideal for downstream detection, due to the disparity between the pretraining and the downstream fine-tuning tasks.
Ranked #3 on Action Detection on Charades
1 code implementation • 26 Nov 2021 • Kumara Kahatapitiya, Michael S. Ryoo
Modeling visual data as tokens (i.e., image patches) using attention mechanisms, feed-forward networks or convolutions has been highly effective in recent years.
1 code implementation • 12 Oct 2021 • Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo
Reinforcement Learning (RL) can be considered as a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions.
1 code implementation • CVPR 2021 • Kumara Kahatapitiya, Michael S. Ryoo
In this paper, we introduce Coarse-Fine Networks, a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.
Ranked #9 on Action Detection on Charades
no code implementations • 24 Jun 2020 • Dumindu Tissera, Kasun Vithanage, Rukshan Wijesinghe, Kumara Kahatapitiya, Subha Fernando, Ranga Rodrigo
As opposed to conventional network widening, multi-path architectures restrict the quadratic increment of complexity to a linear scale.
no code implementations • 26 Jul 2019 • Dumindu Tissera, Kumara Kahatapitiya, Rukshan Wijesinghe, Subha Fernando, Ranga Rodrigo
In view of this, networks which can allocate resources according to the context of the input and regulate flow of information across the network are effective.
Ranked #2 on Image Classification on Kuzushiji-MNIST
1 code implementation • 26 Jul 2019 • Kumara Kahatapitiya, Ranga Rodrigo
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks over the years.
1 code implementation • 7 May 2019 • Kumara Kahatapitiya, Dumindu Tissera, Ranga Rodrigo
Occlusion removal is an interesting application of image enhancement, for which existing work suggests manually annotated or domain-specific occlusion removal.