Search Results for author: Zhiyuan Zhao

Found 17 papers, 8 papers with code

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

1 code implementation • 28 Nov 2023 • Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from associated images.

Hallucination

Performative Time-Series Forecasting

1 code implementation • 9 Oct 2023 • Zhiyuan Zhao, Alexander Rodriguez, B. Aditya Prakash

Time-series forecasting is a critical challenge in various domains and has witnessed substantial progress in recent years.

Time Series, Time Series Forecasting

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

1 code implementation • 25 Aug 2023 • Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Despite the great advance of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes current MLLMs hard to further improve their capability under the guidance of evaluation results with a relatively low human cost.

Benchmarking

PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks

1 code implementation • 21 Jul 2023 • Zhiyuan Zhao, Xueying Ding, B. Aditya Prakash

Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs).

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

no code implementations • 24 Oct 2022 • Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo

In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details.

Speech Enhancement

Exploring Effective Knowledge Transfer for Few-shot Object Detection

1 code implementation • 5 Oct 2022 • Zhiyuan Zhao, Qingjie Liu, Yunhong Wang

For the high-shot regime, we propose to use the knowledge learned from ImageNet as guidance for the feature learning in the fine-tuning stage, which will implicitly align the distributions of the novel classes.

Few-Shot Object Detection, Object, +2

An Anchor-Free Detector for Continuous Speech Keyword Spotting

no code implementations • 9 Aug 2022 • Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo

Continuous Speech Keyword Spotting (CSKWS) is a task to detect predefined keywords in a continuous speech.

Keyword Spotting, object-detection, +1

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

no code implementations • 28 Jun 2022 • Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo

In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody.

Sentence

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

1 code implementation • 12 Sep 2021 • Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng

Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript.

Voice Conversion

Beating the Standard Quantum Limit under Ambient Conditions with Solid-State Spins

no code implementations • 28 Jan 2021 • Tianyu Xie, Zhiyuan Zhao, Xi Kong, Wenchao Ma, Mengqi Wang, Xiangyu Ye, Pei Yu, Zhiping Yang, Shaoyi Xu, Pengfei Wang, Ya Wang, Fazhan Shi, Jiangfeng Du

However, it has not been realized in solid-state spin systems at ambient conditions, owing to its intrinsic complexity for the preparation and survival of pure and entangled quantum states.

Quantum Physics

A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View

no code implementations • 29 Sep 2020 • Zhiyuan Zhao, Tao Han, Junyu Gao, Qi Wang, Xuelong Li

Drones shooting can be applied in dynamic traffic monitoring, object detecting and tracking, and other vision tasks.

Crowd Counting, Density Estimation, +1
