Search Results for author: Shan Yang

Found 29 papers, 8 papers with code

EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization

no code implementations21 Feb 2024 Zhendong Xiao, Changhao Chen, Shan Yang, Wu Wei

Camera relocalization is pivotal in computer vision, with applications in AR, drones, robotics, and autonomous driving.

Autonomous Driving Camera Relocalization +3

MeSa: Masked, Geometric, and Supervised Pre-training for Monocular Depth Estimation

no code implementations6 Oct 2023 Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu Lou

Furthermore, via experiments on the NYUv2 and IBims-1 datasets, we demonstrate that these enhanced representations translate to performance improvements in both the in-distribution and out-of-distribution settings.

Monocular Depth Estimation Self-Supervised Learning

ICAR: Image-based Complementary Auto Reasoning

no code implementations17 Aug 2023 Xijun Wang, Anqi Liang, Junbang Liang, Ming Lin, Yu Lou, Shan Yang

Based on this notion, we propose a compatibility learning framework, a category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning", which takes cross-domain visual similarity as input and generates complementary items auto-regressively.

Retrieval

RoSI: Recovering 3D Shape Interiors from Few Articulation Images

no code implementations13 Apr 2023 Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang

The vast majority of 3D models that appear in gaming and VR/AR, as well as those used to train geometric deep learning algorithms, are incomplete, since they are modeled as surface meshes and lack interior structures.

Object

Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction

no code implementations18 Mar 2023 Jiayang Bai, Zhen He, Shan Yang, Jie Guo, Zhenyu Chen, Yan Zhang, Yanwen Guo

Recent methods mostly rely on convolutional neural networks (CNNs) to fill the missing contents in the warped panorama.

HDR Reconstruction

Aligning Multi-Sequence CMR Towards Fully Automated Myocardial Pathology Segmentation

no code implementations7 Feb 2023 Wangbin Ding, Lei LI, Junyi Qiu, Sihan Wang, Liqin Huang, Yinyin Chen, Shan Yang, Xiahai Zhuang

For instance, balanced steady-state free precession cine sequences present clear anatomical boundaries, while late gadolinium enhancement and T2-weighted CMR sequences visualize myocardial scar and edema of MI, respectively.

Image Registration

MyoPS-Net: Myocardial Pathology Segmentation with Flexible Combination of Multi-Sequence CMR Images

no code implementations6 Nov 2022 Junyi Qiu, Lei LI, Sihan Wang, Ke Zhang, Yinyin Chen, Shan Yang, Xiahai Zhuang

We therefore conducted extensive experiments to investigate the performance of the proposed method in dealing with such complex combinations of different CMR sequences.

Segmentation

TotalSegmentator: robust segmentation of 104 anatomical structures in CT images

1 code implementation11 Aug 2022 Jakob Wasserthal, Hanns-Christian Breit, Manfred T. Meyer, Maurice Pradella, Daniel Hinck, Alexander W. Sauter, Tobias Heye, Daniel Boll, Joshy Cyriac, Shan Yang, Michael Bach, Martin Segeroth

The model significantly outperformed another publicly available segmentation model on a separate dataset (Dice score 0.932 versus 0.871).

Segmentation
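
For reference, the Dice score reported above is the standard overlap metric between a predicted and a ground-truth mask. A minimal NumPy sketch (illustrative only, not the TotalSegmentator code; `pred` and `target` are hypothetical binary masks):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

# Toy example: two overlapping 2D masks.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True
print(round(dice_score(a, b), 3))  # 0.8
```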

End-to-End Voice Conversion with Information Perturbation

no code implementations15 Jun 2022 Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su

The ideal goal of voice conversion is to convert the source speaker's speech to sound naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech.

Voice Conversion

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

no code implementations18 Feb 2022 Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng

Though significant progress has been made for speaker-dependent Video-to-Speech (VTS) synthesis, little attention is devoted to multi-speaker VTS that can map silent video to speech, while allowing flexible control of speaker identity, all in a single system.

Quantization Speech Synthesis +2

Deep Graph Learning for Spatially-Varying Indoor Lighting Prediction

no code implementations13 Feb 2022 Jiayang Bai, Jie Guo, Chenchen Wan, Zhenyu Chen, Zhen He, Shan Yang, Piaopiao Yu, Yan Zhang, Yanwen Guo

At its core is a new lighting model (dubbed DSGLight) based on depth-augmented Spherical Gaussians (SG) and a Graph Convolutional Network (GCN) that infers the new lighting representation from a single LDR image of limited field-of-view.

Graph Learning Lighting Estimation
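
As background for the entry above, a Spherical Gaussian lobe is the basic primitive that DSGLight augments with depth. A minimal NumPy sketch of evaluating one lobe, with assumed parameter names (not the paper's released code):

```python
import numpy as np

def spherical_gaussian(v, axis, sharpness, amplitude):
    """G(v) = amplitude * exp(sharpness * (dot(v, axis) - 1)), v and axis unit vectors."""
    v = v / np.linalg.norm(v, axis=-1, keepdims=True)
    axis = axis / np.linalg.norm(axis)
    return amplitude * np.exp(sharpness * (v @ axis - 1.0))

# Radiance of one lobe queried in two directions: along the lobe axis and orthogonal to it.
directions = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(spherical_gaussian(directions, np.array([0.0, 0.0, 1.0]), sharpness=10.0, amplitude=1.0))
```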

A Color Image Steganography Based on Frequency Sub-band Selection

no code implementations29 Dec 2021 Hai Su, Shan Yang, Shuqing Zhang, Songsen Yu

Color image steganography based on deep learning is the art of hiding information in a color image.

Image Steganography

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

no code implementations8 Sep 2021 Songxiang Liu, Shan Yang, Dan Su, Dong Yu

The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.

Expressive Speech Synthesis Sentence +1

Attention Bottlenecks for Multimodal Fusion

1 code implementation NeurIPS 2021 Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio.

Action Classification Action Recognition +2

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

no code implementations21 Jun 2021 Jian Cong, Shan Yang, Lei Xie, Dan Su

The current two-stage TTS framework typically integrates an acoustic model with a vocoder: the acoustic model predicts a low-resolution intermediate representation such as the Mel-spectrum, while the vocoder generates the waveform from that intermediate representation.

Speech Synthesis
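
To make the two-stage pipeline in the entry above concrete, here is a schematic PyTorch sketch of the acoustic-model/vocoder interface. The modules, names, and dimensions are placeholders for illustration, not Glow-WaveGAN itself:

```python
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Maps a phoneme/token sequence to a low-resolution Mel representation."""
    def __init__(self, vocab=64, mel_dim=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, 128)
        self.proj = nn.Linear(128, mel_dim)
    def forward(self, tokens):                 # (B, T_text)
        return self.proj(self.embed(tokens))   # (B, T_text, mel_dim)

class Vocoder(nn.Module):
    """Upsamples the Mel representation to a waveform (hop size 256 here)."""
    def __init__(self, mel_dim=80, hop=256):
        super().__init__()
        self.up = nn.Linear(mel_dim, hop)
    def forward(self, mel):                    # (B, T_mel, mel_dim)
        return self.up(mel).flatten(1)         # (B, T_mel * hop)

tokens = torch.randint(0, 64, (1, 20))
wav = Vocoder()(AcousticModel()(tokens))
print(wav.shape)  # torch.Size([1, 5120])
```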

Controllable Context-aware Conversational Speech Synthesis

no code implementations21 Jun 2021 Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pauses and prolongations, in the acoustic model, and develop a neural-network-based predictor to predict the occurrences of the two behaviors from text.

Speech Synthesis

GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition

no code implementations CVPR 2021 Fengmin Shi, Jie Guo, Haonan Zhang, Shan Yang, Xiying Wang, Yanwen Guo

We demonstrate that local geometry has a greater impact on the sound than global geometry and offers more cues for material recognition.

Material Recognition

Optical Mouse: 3D Mouse Pose From Single-View Video

no code implementations17 Jun 2021 Bo Hu, Bryan Seybold, Shan Yang, David Ross, Avneesh Sud, Graham Ruby, Yi Liu

We present a method to infer the 3D pose of mice, including the limbs and feet, from monocular videos.

Entity Concept-enhanced Few-shot Relation Extraction

1 code implementation ACL 2021 Shan Yang, Yongfei Zhang, Guanglin Niu, Qinghua Zhao, ShiLiang Pu

Few-shot relation extraction (FSRE) is of great importance for the long-tail distribution problem, especially in specialized domains with low-resource data.

Relation Relation Extraction +5

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++

1 code implementation ICCV 2021 RuiLong Li, Shan Yang, David A. Ross, Angjoo Kanazawa

We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion conditioned on music.

Motion Synthesis Pose Estimation

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation3 Dec 2020 Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, the speaker embedding is further decomposed into the weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation Disentanglement +1
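
The decomposition described in the entry above can be sketched in PyTorch as follows; the module name, mixture-weight predictor, and dimensions are assumptions for illustration, not the paper's implementation:

```python
import torch
import torch.nn as nn

class TimbreClusterEmbedding(nn.Module):
    def __init__(self, n_clusters=10, dim=256):
        super().__init__()
        self.clusters = nn.Parameter(torch.randn(n_clusters, dim))  # trainable timbre-cluster vectors
        self.score = nn.Linear(dim, n_clusters)                     # predicts mixture weights

    def forward(self, speaker_feat):                                # (B, dim) raw speaker feature
        weights = torch.softmax(self.score(speaker_feat), dim=-1)   # (B, n_clusters)
        return weights @ self.clusters                               # (B, dim) weighted sum of clusters

emb = TimbreClusterEmbedding()(torch.randn(4, 256))
print(emb.shape)  # torch.Size([4, 256])
```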

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

9 code implementations Interspeech 2020 Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie

In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech.

Sound Audio and Speech Processing

Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise

no code implementations28 Apr 2020 Shan Yang, Yuxuan Wang, Lei Xie

As for the speech-side noise, we propose to learn a noise-independent feature in the auto-regressive decoder through adversarial training and data augmentation, which does not need an extra speech enhancement model.

Clustering Data Augmentation +5
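
One common way to realize the adversarial learning of a noise-independent feature mentioned above is a gradient-reversal layer. The sketch below assumes that setup and uses hypothetical names; it is not the paper's code:

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, alpha=1.0):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse gradients so the encoder learns to fool the noise classifier.
        return -ctx.alpha * grad_output, None

features = torch.randn(8, 128, requires_grad=True)   # hypothetical decoder features
reversed_features = GradReverse.apply(features, 1.0)
# `reversed_features` would feed a noise classifier; its loss then pushes the
# upstream encoder toward noise-independent representations.
print(reversed_features.shape)  # torch.Size([8, 128])
```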

Learning-Based Cloth Material Recovery From Video

no code implementations ICCV 2017 Shan Yang, Junbang Liang, Ming C. Lin

To extract information about the cloth, our method characterizes both the motion space and the visual appearance of the cloth geometry.

Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework

4 code implementations6 Jul 2017 Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dong-Yan Huang, Haizhou Li

In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN).

Sound

Detailed Garment Recovery from a Single-View Image

no code implementations3 Aug 2016 Shan Yang, Tanya Ambert, Zherong Pan, Ke Wang, Licheng Yu, Tamara Berg, Ming C. Lin

Most recent garment capturing techniques rely on acquiring multiple views of clothing, which may not always be readily available, especially in the case of pre-existing photographs from the web.

Semantic Parsing Virtual Try-on
