1 code implementation • 7 May 2024 • Yiming Dou, Fengyu Yang, Yi Liu, Antonio Loquercio, Andrew Owens
Our approach makes use of two insights: (i) common vision-based touch sensors are built on ordinary cameras and thus can be registered to images using methods from multi-view geometry, and (ii) visually and structurally similar regions of a scene share the same tactile features.
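Insight (ii) can be illustrated with a minimal sketch: propagate the tactile features of the few touched patches to every image patch whose visual features are most similar. This is a toy nearest-neighbor version under assumed inputs (`image_feats`, `touch_feats`, `touched_idx` are hypothetical names), not the paper's actual pipeline.

```python
import numpy as np

def propagate_tactile(image_feats, touch_feats, touched_idx):
    """Assign each image patch the tactile feature of its most visually
    similar touched patch (cosine similarity). A toy sketch, not the
    paper's method.

    image_feats : (N, D) visual features for all N patches
    touch_feats : (M, D_t) tactile features for the M touched patches
    touched_idx : indices (into the N patches) of the touched patches
    """
    # Normalize visual features so dot products become cosine similarities.
    feats = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = feats @ feats[touched_idx].T   # (N, M) visual similarity matrix
    nearest = sims.argmax(axis=1)         # best-matching touched patch per patch
    return touch_feats[nearest]           # (N, D_t) propagated tactile features
```

A touched patch is maximally similar to itself, so it keeps its own tactile feature; untouched patches inherit from their closest visual match.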
1 code implementation • 4 Apr 2024 • Ziyao Zeng, Daniel Wang, Fengyu Yang, Hyoungseob Park, Yangchao Wu, Stefano Soatto, Byung-Woo Hong, Dong Lao, Alex Wong
To test this, we focus on monocular depth estimation, the problem of predicting a dense depth map from a single image, but with an additional text caption describing the scene.
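One simple way such a caption could condition a depth network is FiLM-style feature modulation: the text embedding predicts a per-channel scale and shift applied to the image features before decoding. This is an assumed mechanism for illustration only (the function and the projection matrices `W_gamma`, `W_beta` are hypothetical), not necessarily the paper's architecture.

```python
import numpy as np

def condition_on_text(img_feats, txt_emb, W_gamma, W_beta):
    """FiLM-style conditioning sketch: the caption embedding predicts a
    per-channel scale (gamma) and shift (beta) that modulate the image
    feature map before depth decoding.

    img_feats : (C, H, W) image feature map
    txt_emb   : (D,) caption embedding
    W_gamma, W_beta : (C, D) hypothetical learned projection matrices
    """
    gamma = W_gamma @ txt_emb   # (C,) per-channel scale from the caption
    beta = W_beta @ txt_emb     # (C,) per-channel shift from the caption
    return img_feats * gamma[:, None, None] + beta[:, None, None]
```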
1 code implementation • 3 Mar 2024 • Boyang Wang, Fengyu Yang, Xihang Yu, Chao Zhang, Hanbin Zhao
In addition, we identify two anime-specific challenges of distorted and faint hand-drawn lines and unwanted color artifacts.
no code implementations • 31 Jan 2024 • Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong
We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound.
1 code implementation • 2 Nov 2023 • Boyang Wang, Bowen Liu, Shiyu Liu, Fengyu Yang
In this work, we present, for the first time, a video compression-based degradation model to synthesize low-resolution image data for the blind SISR task.
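The core artifact of block-based video codecs is coarse quantization of per-block DCT coefficients. The sketch below mimics that degradation in pure NumPy (quantizing 8x8 block DCTs of a grayscale image); it is a crude stand-in for illustration, not the paper's actual codec-based pipeline, and `codec_degrade` and its parameters are hypothetical names.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: D @ D.T == I, so D.T inverts it."""
    k = np.arange(n)
    D = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    D[0] *= 1 / np.sqrt(2)
    return D * np.sqrt(2 / n)

def codec_degrade(img, block=8, q=24.0):
    """Crude compression-artifact stand-in: quantize the DCT coefficients
    of each block, as block-based video codecs do. `img` is a float
    grayscale image whose sides are multiples of `block`.
    """
    D = dct_matrix(block)
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = img[y:y + block, x:x + block]
            coeffs = D @ patch @ D.T            # 2-D DCT of the block
            coeffs = np.round(coeffs / q) * q   # lossy quantization step
            out[y:y + block, x:x + block] = D.T @ coeffs @ D
    return out
```

Larger `q` discards more coefficient precision, producing stronger blocking artifacts in the synthesized low-quality image.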
no code implementations • ICCV 2023 • Fengyu Yang, Jiacheng Zhang, Andrew Owens
An emerging line of work has sought to generate plausible imagery from touch.
1 code implementation • 10 Sep 2023 • Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang
To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under real-world conditions.
1 code implementation • CVPR 2023 • Shaokai Wu, Fengyu Yang
Detection-based methods have been viewed unfavorably in crowd analysis due to their poor performance in dense crowds.
1 code implementation • 23 Aug 2023 • Siyue Yao, MingJie Sun, Bingliang Li, Fengyu Yang, Junle Wang, Ruimao Zhang
In this paper, we introduce a novel multi-dancer synthesis task called partner dancer generation, which involves synthesizing virtual human dancers capable of dancing with users.
1 code implementation • 20 May 2023 • Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang
Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.
Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)
no code implementations • 7 Dec 2022 • Fengyu Yang, Jian Luan, Yujun Wang
We introduce a phonology embedding to capture the differences between different English phonologies.
no code implementations • 22 Nov 2022 • Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, Andrew Owens
The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world.
no code implementations • 16 Mar 2022 • Hanbin Zhao, Fengyu Yang, Xinghe Fu, Xi Li
In practice, new images are usually made available in a consecutive manner, leading to a problem called Continual Semantic Segmentation (CSS).
no code implementations • CVPR 2022 • Fengyu Yang, Chenyang Ma
In particular, to enhance the sparsity of the latent space, we design a prototypical contrastive learning objective that clusters prototypes of the same category together and pushes prototypes of different categories far apart.
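A minimal sketch of such an objective is an InfoNCE-style loss over prototype vectors: same-category prototypes are treated as positives, different-category prototypes as negatives. This is an illustrative stand-in, not necessarily the paper's exact loss, and the function name and `tau` default are assumptions.

```python
import numpy as np

def prototype_contrastive_loss(protos, labels, tau=0.1):
    """InfoNCE-style loss over class prototypes (illustrative sketch):
    prototypes sharing a category are pulled together, prototypes of
    different categories pushed apart.

    protos : (N, D) prototype vectors
    labels : (N,) category label of each prototype
    """
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sim = (p @ p.T) / tau                       # temperature-scaled cosine sims
    np.fill_diagonal(sim, -np.inf)              # exclude self-pairs
    # Log-softmax over each prototype's similarities to all others.
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    same = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # Negative average log-probability of same-category (positive) pairs.
    return -logp[same].mean()
```

Well-separated category clusters yield a lower loss than mixed ones, which is the clustering behavior the sentence above describes.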
no code implementations • 16 Jun 2021 • Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li
Specifically, prosodic features are used to explicitly model prosody, while a VAE and a reference encoder are used to model prosody implicitly, taking the Mel spectrum and bottleneck features as input, respectively.