no code implementations • 14 Nov 2023 • Yating Xu, Conghui Hu, Gim Hee Lee
Existing works on weakly-supervised audio-visual video parsing adopt the hybrid attention network (HAN) as the multi-modal embedding to capture the cross-modal context.
1 code implementation • 20 Sep 2023 • Yating Xu, Na Zhao, Gim Hee Lee
Few-shot point cloud semantic segmentation aims to train a model to quickly adapt to new unseen classes with only a handful of support set samples.
1 code implementation • ICCV 2023 • Yating Xu, Conghui Hu, Na Zhao, Gim Hee Lee
Existing fully-supervised point cloud segmentation methods suffer in dynamic testing environments where new classes emerge.
no code implementations • 9 Dec 2022 • Yating Xu, Conghui Hu, Gim Hee Lee
The existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames from a multimodal stochastic network and a frame encoder to predict the next visual frame.