Search Results for author: Kunyu Shi

Found 4 papers, 2 papers with code

Enhancing Vision-Language Pre-training with Rich Supervisions

no code implementations • 5 Mar 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.

Table Detection

Non-autoregressive Sequence-to-Sequence Vision-Language Models

no code implementations • 4 Mar 2024 • Kunyu Shi, Qi Dong, Luis Goncalves, Zhuowen Tu, Stefano Soatto

Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by their inference latency due to their autoregressive way of generating predictions.

Decoder, Language Modelling

Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts

1 code implementation • 11 May 2023 • Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto

We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks which may interfere with each other, resulting in a single model which we named Musketeer.

Language Modelling

Learning Instance Occlusion for Panoptic Segmentation

1 code implementation • CVPR 2020 • Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu

Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.

Ranked #22 on Panoptic Segmentation on COCO test-dev

Instance Segmentation, Panoptic Segmentation (+2 more)
