no code implementations • 28 Aug 2021 • Ruotian Luo
In Chapter 3, we focus on generating referring expressions: text descriptions of an object in an image that let a receiver infer which object is being described.
1 code implementation • 29 May 2020 • Ruotian Luo, Greg Shakhnarovich
We develop and evaluate captioning models that allow control of caption length.
no code implementations • CVPR 2020 • Haochen Wang, Ruotian Luo, Michael Maire, Greg Shakhnarovich
The core of our approach, Pixel Consensus Voting, is a framework for instance segmentation based on the Generalized Hough transform.
Ranked #36 on Panoptic Segmentation on COCO test-dev
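As a rough illustration of the Hough-style voting idea the entry names (not the paper's actual discretized voting filter; the function and toy setup below are our own): each foreground pixel casts a vote for the location of its instance center, and peaks in the vote accumulator indicate instances.

```python
import numpy as np

def accumulate_votes(offsets, mask, shape):
    """Hough-style vote accumulation (illustrative sketch only).

    offsets: (H, W, 2) array of predicted (dy, dx) from each pixel to
             its instance center
    mask:    (H, W) boolean foreground mask
    Returns an (H, W) accumulator; local maxima suggest instance centers.
    """
    H, W = shape
    acc = np.zeros((H, W), dtype=int)
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        cy = int(round(y + offsets[y, x, 0]))
        cx = int(round(x + offsets[y, x, 1]))
        if 0 <= cy < H and 0 <= cx < W:
            acc[cy, cx] += 1  # pixel votes for its predicted center
    return acc

# Toy example: two pixels of one instance both vote for center (2, 2).
offsets = np.zeros((5, 5, 2))
mask = np.zeros((5, 5), dtype=bool)
mask[1, 1] = mask[3, 3] = True
offsets[1, 1] = (1.0, 1.0)    # pixel (1, 1) -> center (2, 2)
offsets[3, 3] = (-1.0, -1.0)  # pixel (3, 3) -> center (2, 2)
acc = accumulate_votes(offsets, mask, (5, 5))
```

Pixels that agree on a center reinforce each other, which is the "consensus" the method's name refers to.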
no code implementations • 27 Mar 2020 • Davis Gilton, Ruotian Luo, Rebecca Willett, Greg Shakhnarovich
This paper presents a framework for the analysis of changes in visual streams: ordered sequences of images, possibly separated by significant time gaps.
1 code implementation • 22 Mar 2020 • Ruotian Luo
In this work, we present a simple but improved variant of Self-Critical Sequence Training.
Ranked #24 on Image Captioning on COCO Captions
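The variant replaces standard SCST's greedy-decoding baseline with, for each sampled caption, the mean reward of the other captions sampled for the same image. A minimal NumPy sketch of that leave-one-out advantage (function name is ours, not the paper's):

```python
import numpy as np

def scst_sample_mean_advantage(rewards):
    """Advantage of each sampled caption under a sample-mean baseline.

    Standard SCST subtracts the reward of a greedy-decoded caption;
    this sketch instead subtracts, for each sample, the mean reward of
    the *other* samples drawn for the same image (leave-one-out).
    """
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)
    # leave-one-out mean: (sum - r_i) / (n - 1)
    baseline = (rewards.sum() - rewards) / (n - 1)
    return rewards - baseline  # advantages sum to zero across samples

# Four sampled captions for one image, scored by e.g. CIDEr:
adv = scst_sample_mean_advantage([0.9, 0.5, 0.7, 0.3])
```

The advantages weight the policy-gradient update: above-average samples are reinforced, below-average ones suppressed, with no extra greedy decoding pass needed.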
2 code implementations • 27 Feb 2020 • Ruotian Luo, Gregory Shakhnarovich
We investigate the effect of different model architectures, training objectives, hyperparameter settings and decoding procedures on the diversity of automatically generated image captions.
2 code implementations • 1 Aug 2019 • Igor Vasiljevic, Nick Kolkin, Shanyi Zhang, Ruotian Luo, Haochen Wang, Falcon Z. Dai, Andrea F. Daniele, Mohammadreza Mostajabi, Steven Basart, Matthew R. Walter, Gregory Shakhnarovich
We introduce DIODE, a dataset that contains thousands of diverse high resolution color images with accurate, dense, long-range depth measurements.
1 code implementation • 19 Apr 2019 • Ruotian Luo, Ning Zhang, Bohyung Han, Linjie Yang
We present a novel problem setting in zero-shot learning: zero-shot object recognition and detection in context.
1 code implementation • CVPR 2018 • Ruotian Luo, Brian Price, Scott Cohen, Gregory Shakhnarovich
One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them.
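The discriminability property can be stated as a simple check (a hypothetical sketch with a toy scorer, not the paper's training objective): given a caption generated for one image and a distractor image, a cross-modal scorer should prefer the right image.

```python
def is_discriminative(caption, image_a, image_b, score):
    """A caption for image_a is discriminative w.r.t. a distractor
    image_b if a caption-image scorer prefers image_a.
    `score` is any function score(caption, image) -> float."""
    return score(caption, image_a) > score(caption, image_b)

# Toy example: represent images by sets of attribute words and score a
# caption by word overlap (purely illustrative).
img_a = {"black", "cat", "sofa"}
img_b = {"white", "cat", "sofa"}
score = lambda cap, img: len(set(cap.split()) & img)

generic = "a cat on a sofa"          # fits both images equally
specific = "a black cat on a sofa"   # only fits img_a
```

Under this toy scorer, only the caption mentioning the distinguishing attribute passes the check, which is exactly the failure mode the paper targets in contemporary captioners.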
no code implementations • CVPR 2017 • Ruotian Luo, Gregory Shakhnarovich
We also use the comprehension module in a generate-and-rerank pipeline, which chooses among candidate expressions produced by a generation model according to their performance on the comprehension task.
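The generate-and-rerank pipeline can be sketched in a few lines (a minimal sketch; the function names and toy models are assumptions, not the paper's API): sample candidate expressions from a generator, then keep the one the comprehension model resolves back to the intended object most confidently.

```python
def generate_and_rerank(image, target, generator, comprehender, n=10):
    """Sample n candidate referring expressions, then return the one the
    comprehension model scores highest for the intended target object.
    comprehender(image, target, expr) -> score that expr picks out target."""
    candidates = [generator(image, target) for _ in range(n)]
    return max(candidates, key=lambda expr: comprehender(image, target, expr))

# Toy demo: a fake generator that yields fixed candidates, and a fake
# comprehension score that prefers more specific (longer) expressions.
cands = iter(["the dog", "the left dog", "the dog on the left"])
gen = lambda img, obj: next(cands)
comp = lambda img, obj, expr: len(expr)
best = generate_and_rerank(None, None, gen, comp, n=3)
```

The reranking step lets a listener model veto ambiguous expressions that a speaker model alone would happily emit.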