no code implementations • 19 Nov 2023 • Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, Miaomiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan
Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models.
no code implementations • 1 Nov 2023 • Cong Guan, Lichao Zhang, Chunpeng Fan, Yichen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu
Developing intelligent agents capable of seamless coordination with humans is a critical step towards achieving artificial general intelligence.
1 code implementation • 12 Oct 2023 • Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan
Despite significant progress in the field, it is still challenging to create personalized visual representations that align closely with the desires and preferences of individual users.
no code implementations • 19 Jul 2023 • Jiahao Xun, Shengyu Zhang, Yanting Yang, Jieming Zhu, Liqun Deng, Zhou Zhao, Zhenhua Dong, RuiQi Li, Lichao Zhang, Fei Wu
We analyze the CSI task from a disentanglement view using causal graphs, and identify the intra-version and inter-version effects that bias invariant learning.
no code implementations • 24 May 2023 • Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao
Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date.
no code implementations • 8 May 2023 • RuiQi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao
The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings while facing a major challenge: the alignment between the target (singing) pitch contour and the source (speech) content is difficult to learn in a text-free situation.
1 code implementation • 7 May 2023 • Lei Kang, Lichao Zhang, Dazhi Jiang
Speech Emotion Recognition (SER) aims to recognize human emotions during natural verbal interaction with machines, and is considered a challenging problem because human emotions are ambiguous.
1 code implementation • NeurIPS 2022 • Lichao Zhang, RuiQi Li, Shoutong Wang, Liqun Deng, Jinglin Liu, Yi Ren, Jinzheng He, Rongjie Huang, Jieming Zhu, Xiao Chen, Zhou Zhao
The lack of publicly available high-quality and accurately labeled datasets has long been a major bottleneck for singing voice synthesis (SVS).
1 code implementation • 25 May 2022 • Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, Zhou Zhao
Specifically, a sequence of discrete representations derived in a self-supervised manner is predicted by the model and passed to a vocoder for speech reconstruction. Two challenges remain: 1) acoustic multimodality: the discrete units derived from speech with the same content can be non-deterministic due to acoustic properties (e.g., rhythm, pitch, and energy), which degrades translation accuracy; 2) high latency: current S2ST systems use autoregressive models that predict each unit conditioned on the previously generated sequence, failing to take full advantage of parallelism.
no code implementations • 11 Nov 2021 • Jianyun Zou, Min Yang, Lichao Zhang, Yechen Xu, Qifan Pan, Fengqing Jiang, Ran Qin, Shushu Wang, Yifan He, Songfang Huang, Zhou Zhao
We finally analyze the performance of SOTA KBQA models on this dataset and identify the challenges facing Chinese KBQA.
1 code implementation • 31 Jul 2021 • Jingxian Sun, Lichao Zhang, Yufei Zha, Abel Gonzalez-Garcia, Peng Zhang, Wei Huang, Yanning Zhang
To solve this problem, we propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD) on a large amount of unlabeled paired RGB-TIR data.
1 code implementation • 30 Aug 2019 • Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost Van de Weijer, Fahad Shahbaz Khan
Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities.
Ranked #7 on RGB-T Tracking on RGBT210
1 code implementation • ICCV 2019 • Lichao Zhang, Abel Gonzalez-Garcia, Joost Van de Weijer, Martin Danelljan, Fahad Shahbaz Khan
In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time.
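The linear update described above can be illustrated with a minimal sketch. The function names, the update rate of 0.1, and the one-hot stand-in templates are assumptions made for illustration only; they are not taken from the paper. The sketch shows why a fixed-rate linear combination makes the contribution of older frames decay exponentially.

```python
import numpy as np

def update_template(accumulated, new_template, rate=0.1):
    """Linearly combine the accumulated template with the current frame's template.

    `rate` is a hypothetical update rate chosen for illustration.
    """
    return (1.0 - rate) * accumulated + rate * new_template

def frame_weights(num_updates, rate=0.1):
    """Closed-form weight of each frame's template after `num_updates` updates.

    Frame 0 initializes the template; frames 1..num_updates are blended in.
    """
    weights = np.empty(num_updates + 1)
    weights[0] = (1.0 - rate) ** num_updates
    steps = np.arange(1, num_updates + 1)
    # A frame blended in at step k has been scaled by (1 - rate) once per later update.
    weights[1:] = rate * (1.0 - rate) ** (num_updates - steps)
    return weights

# One-hot "templates" make each frame's share of the final template directly visible.
num_updates = 5
templates = np.eye(num_updates + 1)
accumulated = templates[0]
for template in templates[1:]:
    accumulated = update_template(accumulated, template, rate=0.1)

closed_form = frame_weights(num_updates, rate=0.1)
```

In this sketch, `accumulated` matches `closed_form`: each blended-in frame's weight shrinks by a factor of `1 - rate` with every later update, which is the exponential decay mentioned above.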
no code implementations • 4 Jun 2018 • Lichao Zhang, Abel Gonzalez-Garcia, Joost Van de Weijer, Martin Danelljan, Fahad Shahbaz Khan
These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking.
no code implementations • 3 Dec 2016 • Yaxing Wang, Lichao Zhang, Joost Van de Weijer
The first one is based on the fact that, in the minimax game played to optimize the GAN objective, the generator network keeps changing even after it can be considered optimal.