2 code implementations • 8 May 2024 • Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao
In this work, we propose FreeBind, an idea that treats multimodal representation spaces as basic units and freely augments a pre-trained unified space by integrating knowledge from extra expert spaces via "space bonds".
2 code implementations • 13 Dec 2023 • Haifeng Huang, Zehan Wang, Rongjie Huang, Luping Liu, Xize Cheng, Yang Zhao, Tao Jin, Zhou Zhao
These tokens capture the object's attributes and spatial relationships with surrounding objects in the 3D scene.
no code implementations • 15 Oct 2023 • Zijian Zhang, Luping Liu, Zhijie Lin, Yichen Zhu, Zhou Zhao
We propose the first unsupervised and learning-based method to identify interpretable directions in the h-space of pre-trained diffusion models.
1 code implementation • 13 Oct 2023 • Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao
Inspired by the recent C-MCR, this paper proposes Extending Multimodal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method to flexibly learn a unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces.
1 code implementation • 4 Jun 2023 • Luping Liu, Zijian Zhang, Yi Ren, Rongjie Huang, Xiang Yin, Zhou Zhao
Previous works identify the problem of information mixing in the CLIP text encoder and introduce the T5 text encoder or incorporate strong prior knowledge to assist with the alignment.
no code implementations • 30 May 2023 • Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu
Various applications of voice synthesis have been developed independently, even though they all generate "voice" as output.
no code implementations • 30 Jan 2023 • Shengmeng Li, Luping Liu, Zenghao Chai, Runnan Li, Xu Tan
Different from the traditional predictor based on explicit Adams methods, we leverage a Lagrange interpolation function as the predictor, which is further enhanced with an error-robust strategy to adaptively select the Lagrange bases with lower error in the estimated noise.
1 code implementation • 30 Jan 2023 • Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao
Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.
Ranked #11 on Audio Generation on AudioCaps
1 code implementation • 21 Nov 2022 • Luping Liu, Yi Ren, Xize Cheng, Rongjie Huang, Chongxuan Li, Zhou Zhao
In this paper, we introduce a new perceptron bias assumption that suggests discriminator models are more sensitive to certain features of the input, leading to the overconfidence problem.
8 code implementations • ICLR 2022 • Luping Liu, Yi Ren, Zhijie Lin, Zhou Zhao
Under such a perspective, we propose pseudo numerical methods for diffusion models (PNDMs).
Ranked #11 on Image Generation on CelebA 64x64
no code implementations • 4 Aug 2019 • Yan Wang, Peng Jia, Luping Liu, Jiayong Liu
Next, this paper assesses the performance of the machine learning models based on the frequently used evaluation metrics.