Search Results for author: Shan Yang

Found 29 papers, 8 papers with code

EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization

no code implementations21 Feb 2024 Zhendong Xiao, Changhao Chen, Shan Yang, Wu Wei

Camera relocalization is pivotal in computer vision, with applications in AR, drones, robotics, and autonomous driving.

Autonomous Driving Camera Relocalization +3

MeSa: Masked, Geometric, and Supervised Pre-training for Monocular Depth Estimation

no code implementations6 Oct 2023 Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu Lou

Furthermore, via experiments on the NYUv2 and IBims-1 datasets, we demonstrate that these enhanced representations translate to performance improvements in both the in-distribution and out-of-distribution settings.

Monocular Depth Estimation Self-Supervised Learning

ICAR: Image-based Complementary Auto Reasoning

no code implementations17 Aug 2023 Xijun Wang, Anqi Liang, Junbang Liang, Ming Lin, Yu Lou, Shan Yang

Based on this notion, we propose a compatibility learning framework, a category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning", which takes cross-domain visual similarity as input and generates complementary items auto-regressively.

Retrieval

RoSI: Recovering 3D Shape Interiors from Few Articulation Images

no code implementations13 Apr 2023 Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang

The vast majority of 3D models that appear in gaming and VR/AR, as well as those used to train geometric deep learning algorithms, are incomplete, since they are modeled as surface meshes and lack interior structures.

Object

Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction

no code implementations18 Mar 2023 Jiayang Bai, Zhen He, Shan Yang, Jie Guo, Zhenyu Chen, Yan Zhang, Yanwen Guo

Recent methods mostly rely on convolutional neural networks (CNNs) to fill the missing contents in the warped panorama.

HDR Reconstruction

Aligning Multi-Sequence CMR Towards Fully Automated Myocardial Pathology Segmentation

no code implementations7 Feb 2023 Wangbin Ding, Lei LI, Junyi Qiu, Sihan Wang, Liqin Huang, Yinyin Chen, Shan Yang, Xiahai Zhuang

For instance, balanced steady-state free precession cine sequences present clear anatomical boundaries, while late gadolinium enhancement and T2-weighted CMR sequences visualize myocardial scar and edema of MI, respectively.

Image Registration

MyoPS-Net: Myocardial Pathology Segmentation with Flexible Combination of Multi-Sequence CMR Images

no code implementations6 Nov 2022 Junyi Qiu, Lei LI, Sihan Wang, Ke Zhang, Yinyin Chen, Shan Yang, Xiahai Zhuang

We therefore conducted extensive experiments to investigate the performance of the proposed method in dealing with such complex combinations of different CMR sequences.

Segmentation

TotalSegmentator: robust segmentation of 104 anatomical structures in CT images

1 code implementation11 Aug 2022 Jakob Wasserthal, Hanns-Christian Breit, Manfred T. Meyer, Maurice Pradella, Daniel Hinck, Alexander W. Sauter, Tobias Heye, Daniel Boll, Joshy Cyriac, Shan Yang, Michael Bach, Martin Segeroth

The model significantly outperformed another publicly available segmentation model on a separate dataset (Dice score 0.932 versus 0.871).

Segmentation
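
For reference, the Dice score reported above is the standard overlap metric between a predicted and a ground-truth mask. A minimal NumPy sketch (illustrative only, not the TotalSegmentator code; `pred` and `target` are hypothetical binary masks):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

# Toy example: two overlapping 2D masks.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True
print(round(dice_score(a, b), 3))  # 0.8
```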

End-to-End Voice Conversion with Information Perturbation

no code implementations15 Jun 2022 Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su

The ideal goal of voice conversion is to convert the source speaker's speech to sound naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech.

Voice Conversion

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

no code implementations18 Feb 2022 Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng

Though significant progress has been made for speaker-dependent Video-to-Speech (VTS) synthesis, little attention is devoted to multi-speaker VTS that can map silent video to speech, while allowing flexible control of speaker identity, all in a single system.

Quantization Speech Synthesis +2

Deep Graph Learning for Spatially-Varying Indoor Lighting Prediction

no code implementations13 Feb 2022 Jiayang Bai, Jie Guo, Chenchen Wan, Zhenyu Chen, Zhen He, Shan Yang, Piaopiao Yu, Yan Zhang, Yanwen Guo

At its core is a new lighting model (dubbed DSGLight) based on depth-augmented Spherical Gaussians (SG) and a Graph Convolutional Network (GCN) that infers the new lighting representation from a single LDR image of limited field-of-view.

Graph Learning Lighting Estimation
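
As background for the entry above, a Spherical Gaussian lobe is the basic primitive that DSGLight augments with depth. A minimal NumPy sketch of evaluating one lobe, with assumed parameter names (not the paper's released code):

```python
import numpy as np

def spherical_gaussian(v, axis, sharpness, amplitude):
    """G(v) = amplitude * exp(sharpness * (dot(v, axis) - 1)), v and axis unit vectors."""
    v = v / np.linalg.norm(v, axis=-1, keepdims=True)
    axis = axis / np.linalg.norm(axis)
    return amplitude * np.exp(sharpness * (v @ axis - 1.0))

# Radiance of one lobe queried in two directions: along the lobe axis and orthogonal to it.
directions = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(spherical_gaussian(directions, np.array([0.0, 0.0, 1.0]), sharpness=10.0, amplitude=1.0))
```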

A Color Image Steganography Based on Frequency Sub-band Selection

no code implementations29 Dec 2021 Hai Su, Shan Yang, Shuqing Zhang, Songsen Yu

Color image steganography based on deep learning is the art of hiding information in a color image.

Image Steganography

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

no code implementations8 Sep 2021 Songxiang Liu, Shan Yang, Dan Su, Dong Yu

The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.

Expressive Speech Synthesis Sentence +1

Attention Bottlenecks for Multimodal Fusion

1 code implementation NeurIPS 2021 Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio.

Action Classification Action Recognition +2

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

no code implementations21 Jun 2021 Jian Cong, Shan Yang, Lei Xie, Dan Su

The current two-stage TTS framework typically integrates an acoustic model with a vocoder: the acoustic model predicts a low-resolution intermediate representation such as the Mel-spectrum, while the vocoder generates the waveform from that intermediate representation.

Speech Synthesis
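
To make the two-stage pipeline in the entry above concrete, here is a schematic PyTorch sketch of the acoustic-model/vocoder interface. The modules, names, and dimensions are placeholders for illustration, not Glow-WaveGAN itself:

```python
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Maps a phoneme/token sequence to a low-resolution Mel representation."""
    def __init__(self, vocab=64, mel_dim=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, 128)
        self.proj = nn.Linear(128, mel_dim)
    def forward(self, tokens):                 # (B, T_text)
        return self.proj(self.embed(tokens))   # (B, T_text, mel_dim)

class Vocoder(nn.Module):
    """Upsamples the Mel representation to a waveform (hop size 256 here)."""
    def __init__(self, mel_dim=80, hop=256):
        super().__init__()
        self.up = nn.Linear(mel_dim, hop)
    def forward(self, mel):                    # (B, T_mel, mel_dim)
        return self.up(mel).flatten(1)         # (B, T_mel * hop)

tokens = torch.randint(0, 64, (1, 20))
wav = Vocoder()(AcousticModel()(tokens))
print(wav.shape)  # torch.Size([1, 5120])
```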

Controllable Context-aware Conversational Speech Synthesis

no code implementations21 Jun 2021 Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pauses and prolongations, in the acoustic model, and develop a neural-network-based predictor to predict the occurrences of the two behaviors from text.

Speech Synthesis

GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition

no code implementations CVPR 2021 Fengmin Shi, Jie Guo, Haonan Zhang, Shan Yang, Xiying Wang, Yanwen Guo

We demonstrate that local geometry has a greater impact on the sound than global geometry and offers more cues for material recognition.

Material Recognition

Optical Mouse: 3D Mouse Pose From Single-View Video

no code implementations17 Jun 2021 Bo Hu, Bryan Seybold, Shan Yang, David Ross, Avneesh Sud, Graham Ruby, Yi Liu

We present a method to infer the 3D pose of mice, including the limbs and feet, from monocular videos.

Entity Concept-enhanced Few-shot Relation Extraction

1 code implementation ACL 2021 Shan Yang, Yongfei Zhang, Guanglin Niu, Qinghua Zhao, ShiLiang Pu

Few-shot relation extraction (FSRE) is of great importance for the long-tail distribution problem, especially in specialized domains with low-resource data.

Relation Relation Extraction +5

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++

1 code implementation ICCV 2021 RuiLong Li, Shan Yang, David A. Ross, Angjoo Kanazawa

We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion conditioned on music.

Motion Synthesis Pose Estimation

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation3 Dec 2020 Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, the speaker embedding is further decomposed into the weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation Disentanglement +1
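
The decomposition described in the entry above can be sketched in PyTorch as follows; the module name, mixture-weight predictor, and dimensions are assumptions for illustration, not the paper's implementation:

```python
import torch
import torch.nn as nn

class TimbreClusterEmbedding(nn.Module):
    def __init__(self, n_clusters=10, dim=256):
        super().__init__()
        self.clusters = nn.Parameter(torch.randn(n_clusters, dim))  # trainable timbre-cluster vectors
        self.score = nn.Linear(dim, n_clusters)                     # predicts mixture weights

    def forward(self, speaker_feat):                                # (B, dim) raw speaker feature
        weights = torch.softmax(self.score(speaker_feat), dim=-1)   # (B, n_clusters)
        return weights @ self.clusters                               # (B, dim) weighted sum of clusters

emb = TimbreClusterEmbedding()(torch.randn(4, 256))
print(emb.shape)  # torch.Size([4, 256])
```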

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

9 code implementations Interspeech 2020 Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie

In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech.

Sound Audio and Speech Processing

Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise

no code implementations28 Apr 2020 Shan Yang, Yuxuan Wang, Lei Xie

As for the speech-side noise, we propose to learn a noise-independent feature in the auto-regressive decoder through adversarial training and data augmentation, which does not need an extra speech enhancement model.

Clustering Data Augmentation +5
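
One common way to realize the adversarial learning of a noise-independent feature mentioned above is a gradient-reversal layer. The sketch below assumes that setup and uses hypothetical names; it is not the paper's code:

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, alpha=1.0):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse gradients so the encoder learns to fool the noise classifier.
        return -ctx.alpha * grad_output, None

features = torch.randn(8, 128, requires_grad=True)   # hypothetical decoder features
reversed_features = GradReverse.apply(features, 1.0)
# `reversed_features` would feed a noise classifier; its loss then pushes the
# upstream encoder toward noise-independent representations.
print(reversed_features.shape)  # torch.Size([8, 128])
```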

Learning-Based Cloth Material Recovery From Video

no code implementations ICCV 2017 Shan Yang, Junbang Liang, Ming C. Lin

To extract information about the cloth, our method characterizes both the motion space and the visual appearance of the cloth geometry.

Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework

4 code implementations6 Jul 2017 Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dong-Yan Huang, Haizhou Li

In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN).

Sound

Detailed Garment Recovery from a Single-View Image

no code implementations3 Aug 2016 Shan Yang, Tanya Ambert, Zherong Pan, Ke Wang, Licheng Yu, Tamara Berg, Ming C. Lin

Most recent garment capturing techniques rely on acquiring multiple views of clothing, which may not always be readily available, especially in the case of pre-existing photographs from the web.

Semantic Parsing Virtual Try-on
