Search Results for author: Lichao Zhang

Found 15 papers, 7 papers with code

Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design

no code implementations • 19 Nov 2023 • Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, Miaomiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan

Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models.

Image Generation

Efficient Human-AI Coordination via Preparatory Language-based Convention

no code implementations • 1 Nov 2023 • Cong Guan, Lichao Zhang, Chunpeng Fan, Yichen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu

Developing intelligent agents capable of seamless coordination with humans is a critical step towards achieving artificial general intelligence.

Language Modelling • Large Language Model

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

1 code implementation • 12 Oct 2023 • Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan

Despite significant progress in the field, it is still challenging to create personalized visual representations that align closely with the desires and preferences of individual users.

Text-to-Image Generation

DisCover: Disentangled Music Representation Learning for Cover Song Identification

no code implementations • 19 Jul 2023 • Jiahao Xun, Shengyu Zhang, Yanting Yang, Jieming Zhu, Liqun Deng, Zhou Zhao, Zhenhua Dong, RuiQi Li, Lichao Zhang, Fei Wu

We analyze the CSI task in a disentanglement view with the causal graph technique, and identify the intra-version and inter-version effects biasing the invariant learning.

Blocking • Cover song identification • +3

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

no code implementations • 24 May 2023 • Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao

Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date.

Speech-to-Speech Translation • Translation

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

no code implementations • 8 May 2023 • RuiQi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao

The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings while facing a major challenge: the alignment between the target (singing) pitch contour and the source (speech) content is difficult to learn in a text-free situation.

STS • Voice Conversion

Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup

1 code implementation • 7 May 2023 • Lei Kang, Lichao Zhang, Dazhi Jiang

Speech Emotion Recognition (SER) is to recognize human emotions in a natural verbal interaction scenario with machines, which is considered as a challenging problem due to the ambiguous human emotions.

Speech Emotion Recognition

M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus

1 code implementation • NeurIPS 2022 • Lichao Zhang, RuiQi Li, Shoutong Wang, Liqun Deng, Jinglin Liu, Yi Ren, Jinzheng He, Rongjie Huang, Jieming Zhu, Xiao Chen, Zhou Zhao

The lack of publicly available high-quality and accurately labeled datasets has long been a major bottleneck for singing voice synthesis (SVS).

Music Transcription • Singing Voice Synthesis • +1

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

1 code implementation • 25 May 2022 • Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, Zhou Zhao

Specifically, a sequence of discrete representations derived in a self-supervised manner are predicted from the model and passed to a vocoder for speech reconstruction, while still facing the following challenges: 1) Acoustic multimodality: the discrete units derived from speech with same content could be indeterministic due to the acoustic property (e.g., rhythm, pitch, and energy), which causes deterioration of translation accuracy; 2) high latency: current S2ST systems utilize autoregressive models which predict each unit conditioned on the sequence previously generated, failing to take full advantage of parallelism.

Representation Learning • Speech Synthesis • +2

A Chinese Multi-type Complex Questions Answering Dataset over Wikidata

no code implementations • 11 Nov 2021 • Jianyun Zou, Min Yang, Lichao Zhang, Yechen Xu, Qifan Pan, Fengqing Jiang, Ran Qin, Shushu Wang, Yifan He, Songfang Huang, Zhou Zhao

We finally analyze the performance of SOTA KBQA models on this dataset and identify the challenges facing Chinese KBQA.

Knowledge Base Question Answering • Semantic Parsing • +2

Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking

1 code implementation • 31 Jul 2021 • Jingxian Sun, Lichao Zhang, Yufei Zha, Abel Gonzalez-Garcia, Peng Zhang, Wei Huang, Yanning Zhang

To solve this problem, we propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD) on a large amount of unlabeled paired RGB-TIR data.

Transfer Learning

Multi-Modal Fusion for End-to-End RGB-T Tracking

1 code implementation • 30 Aug 2019 • Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost Van de Weijer, Fahad Shahbaz Khan

Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities.

Ranked #7 on RGB-T Tracking on RGBT210

Image-to-Image Translation • RGB-T Tracking

Learning the Model Update for Siamese Trackers

1 code implementation • ICCV 2019 • Lichao Zhang, Abel Gonzalez-Garcia, Joost Van de Weijer, Martin Danelljan, Fahad Shahbaz Khan

In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time.

Visual Tracking

Synthetic data generation for end-to-end thermal infrared tracking

no code implementations • 4 Jun 2018 • Lichao Zhang, Abel Gonzalez-Garcia, Joost Van de Weijer, Martin Danelljan, Fahad Shahbaz Khan

These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking.

Image-to-Image Translation • Synthetic Data Generation • +2

Ensembles of Generative Adversarial Networks

no code implementations • 3 Dec 2016 • Yaxing Wang, Lichao Zhang, Joost Van de Weijer

The first one is based on the fact that in the minimax game which is played to optimize the GAN objective the generator network keeps on changing even after the network can be considered optimal.
