Search Results for author: Jia Jia

Found 31 papers, 7 papers with code

The Sogou-TIIC Speech Translation System for IWSLT 2018

no code implementations • IWSLT (EMNLP) 2018 • Yuguang Wang, Liangliang Shi, Linyu Wei, Weifeng Zhu, Jinkun Chen, Zhichao Wang, Shixue Wen, Wei Chen, Yanfeng Wang, Jia Jia

Our final average result on speech translation is 31.02 BLEU.

Automatic Speech Recognition (ASR) +5

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

1 code implementation • 20 Mar 2024 • Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo

However, camera movement synthesis with music and dance remains an unsolved challenging problem due to the scarcity of paired data.

Siamese Meets Diffusion Network: SMDNet for Enhanced Change Detection in High-Resolution RS Imagery

no code implementations • 17 Jan 2024 • Jia Jia, Geunho Lee, Zhibo Wang, Lyu Zhi, Yuchu He

This network combines the Siam-U2Net Feature Differential Encoder (SU-FDE) and the denoising diffusion implicit model to improve the accuracy of image edge change detection and enhance the model's robustness under environmental changes.

Change Detection Denoising

Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos

no code implementations • 28 Dec 2023 • Houlun Chen, Xin Wang, Hong Chen, Zihan Song, Jia Jia, Wenwu Zhu

To tackle these challenges, in this work we propose a Grounding-Prompter method, which is capable of conducting TSG in long videos through prompting LLM with multimodal information.

Denoising In-Context Learning +3

A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis

no code implementations • 21 Sep 2023 • Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang

More interestingly, although we aim at the synthesis effect of the style transfer model, the synthesized speech by the proposed text prosodic analysis model is even better than the style transfer from the original speech in some user evaluation indicators.

Emotion Recognition Speech Synthesis +1

Semantics2Hands: Transferring Hand Motion Semantics between Avatars

1 code implementation • 11 Aug 2023 • Zijie Ye, Jia Jia, Junliang Xing

Human hands, the primary means of non-verbal communication, convey intricate semantics in various scenarios.

Anatomy motion retargeting

Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space

1 code implementation • 11 Aug 2023 • Haoyu Wang, Haozhe Wu, Junliang Xing, Jia Jia

Creating realistic 3D facial animation is crucial for various applications in the movie production and gaming industry, especially with the burgeoning demand in the metaverse.

motion retargeting Optical Flow Estimation

Speech-Driven 3D Face Animation with Composite and Regional Facial Movements

1 code implementation • 10 Aug 2023 • Haozhe Wu, Songtao Zhou, Jia Jia, Junliang Xing, Qi Wen, Xiang Wen

This paper emphasizes the importance of considering both the composite and regional natures of facial movements in speech-driven 3D face animation.

3D Face Animation

Exploring the Spatiotemporal Features of Online Food Recommendation Service

no code implementations • 8 Aug 2023 • Shaochuan Lin, Jiayan Pei, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu

Online Food Recommendation Service (OFRS) has remarkable spatiotemporal characteristics and the advantage of being able to conveniently satisfy users' needs in a timely manner.

Food recommendation

Multi-Granularity Attention Model for Group Recommendation

no code implementations • 8 Aug 2023 • Jianye Ji, Jiayan Pei, Shaochuan Lin, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu

Group recommendation provides personalized recommendations to a group of users based on their shared interests, preferences, and characteristics.

Mobile Supply: The Last Piece of Jigsaw of Recommender System

no code implementations • 7 Aug 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jie Zhang, Jia Jia, Ning Hu

In order to address the problem of pagination trigger mechanism, we propose a completely new module in the pipeline of recommender system named Mobile Supply.

Recommendation Systems Re-Ranking

SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation

no code implementations • 4 Aug 2023 • Shikun Sun, Longhui Wei, Junliang Xing, Jia Jia, Qi Tian

Recent score-based diffusion models (SBDMs) show promising results in unpaired image-to-image translation (I2I).

Image Denoising Image-to-Image Translation

ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint

no code implementations • 18 Jul 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan

We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue.

Decision Making Recommendation Systems +1

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

no code implementations • 13 Jul 2023 • Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia

Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars.

Shuffled Autoregression For Motion Interpolation

no code implementations • 10 Jun 2023 • Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing

However, motion interpolation is a more complex problem that takes isolated poses (e.g., only one start pose and one end pose) as input.

Motion Interpolation

MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation

1 code implementation • 17 Mar 2023 • Haozhe Wu, Jia Jia, Junliang Xing, Hongwei Xu, Xiangyuan Wang, Jelo Wang

Upon MMFace4D, we construct a non-autoregressive framework for audio-driven 3D face animation.

3D Face Animation

BASM: A Bottom-up Adaptive Spatiotemporal Model for Online Food Ordering Service

no code implementations • 22 Nov 2022 • Boya Du, Shaochuan Lin, Jiong Gao, Xiyu Ji, Mengya Wang, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu

Therefore, we address this challenge by proposing a Bottom-up Adaptive Spatiotemporal Model (BASM) to adaptively fit the spatiotemporal data distribution, which further improves the fitting capability of the model.

Recommendation Systems

Spatiotemporal-Enhanced Network for Click-Through Rate Prediction in Location-based Services

no code implementations • 20 Sep 2022 • Shaochuan Lin, Yicong Yu, Xiyu Ji, Taotao Zhou, Hengxu He, Zisen Sang, Jia Jia, Guodong Cao, Ning Hu

In Location-Based Services (LBS), user behavior naturally depends strongly on spatiotemporal information, i.e., user click behavior changes significantly across geographical locations and times.

Attribute Click-Through Rate Prediction

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

no code implementations • 10 Aug 2022 • Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng

This paper aims to introduce a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speeches.

Style Transfer

Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis

1 code implementation • 30 Oct 2021 • Haozhe Wu, Jia Jia, Haoyu Wang, Yishun Dou, Chao Duan, Qingshan Deng

Due to such huge differences between different styles, it is necessary to incorporate the talking style into audio-driven talking face synthesis framework.

Face Generation

Towards Multi-Scale Style Control for Expressive Speech Synthesis

no code implementations • 8 Apr 2021 • Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng

This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.

Expressive Speech Synthesis Style Transfer

ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit

no code implementations • 16 Sep 2020 • Zijie Ye, Haozhe Wu, Jia Jia, Yaohua Bu, Wei Chen, Fanbo Meng, Yan-Feng Wang

Meanwhile, human choreographers design dance motions from music in a two-stage manner: they firstly devise multiple choreographic dance units (CAUs), each with a series of dance motions, and then arrange the CAU sequence according to the rhythm, melody and emotion of the music.

Visual-speech Synthesis of Exaggerated Corrective Feedback

no code implementations • 12 Sep 2020 • Yaohua Bu, Weijun Li, Tianyi Ma, Shengqi Chen, Jia Jia, Kun Li, Xiaobo Lu

To provide more discriminative feedback for the second language (L2) learners to better identify their mispronunciation, we propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT).

Speech Synthesis

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

no code implementations • 20 Jun 2020 • Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.

Talking Head Generation

Mining Unfollow Behavior in Large-Scale Online Social Networks via Spatial-Temporal Interaction

1 code implementation • 17 Nov 2019 • Haozhe Wu, Zhiyuan Hu, Jia Jia, Yaohua Bu, Xiangnan He, Tat-Seng Chua

Next, we divide a user's attributes into two categories: spatial attributes (e.g., social role of user) and temporal attributes (e.g., post content of user).

Informativeness

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding

no code implementations • 1 Nov 2019 • Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia

Transformer has shown promising results in many sequence to sequence transformation tasks recently.

Speech Recognition

Automated segmentation and classification of arterioles and venules using Cascading Dilated Convolutional Neural Networks

no code implementations • 1 Dec 2018 • Meng Li, Yan Zhang, Haicheng She, Jinqiong Zhou, Jia Jia, Danmei He, Li Zhang

The change of retinal vasculature is an early sign of many vascular and systemic diseases, such as diabetes and hypertension.

General Classification

An Online Attention-based Model for Speech Recognition

no code implementations • 13 Nov 2018 • Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu

In previous work, researchers have shown that such architectures can acquire comparable results to state-of-the-art ASR systems, especially when using a bidirectional encoder and global soft attention (GSA) mechanism.

Automatic Speech Recognition (ASR) +2

Exploring RNN-Transducer for Chinese Speech Recognition

no code implementations • 13 Nov 2018 • Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie

End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system.

Automatic Speech Recognition (ASR) +2

Modality Attention for End-to-End Audio-visual Speech Recognition

no code implementations • 13 Nov 2018 • Pan Zhou, Wenwen Yang, Wei Chen, Yan-Feng Wang, Jia Jia

In this paper, we propose a novel multimodal attention based method for audio-visual speech recognition which could automatically learn the fused representation from both modalities based on their importance.

Audio-Visual Speech Recognition Robust Speech Recognition +2

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

no code implementations • 17 Nov 2016 • Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai

Hence, traditional methods may fail to distinguish some of the emotions with just one global feature subspace.

General Classification Speech Emotion Recognition
