no code implementations • 20 Mar 2024 • Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez
We introduce SynGround, a novel framework that combines data-driven learning and knowledge transfer from various large-scale pretrained models to enhance the visual grounding capabilities of a pretrained vision-and-language model.
no code implementations • 7 Dec 2023 • Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez
Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image.
1 code implementation • 28 Nov 2022 • Ruozhen He, Jiaying Lin, Rynson W. H. Lau
We present HetNet (Multi-level Heterogeneous Network), a highly efficient mirror detection network.
1 code implementation • 28 Jul 2022 • Ruozhen He, Qihua Dong, Jiaying Lin, Rynson W. H. Lau
To achieve this, we first relabel 4,040 images in existing camouflaged object datasets with scribbles, which takes ~10s to label one image.