Search Results for author: Jingdong Chen

Found 24 papers, 8 papers with code

Variational Connectionist Temporal Classification

no code implementations • ECCV 2020 • Linlin Chao, Jingdong Chen, Wei Chu

However, CTC tends to output spiky distributions since it prefers to output the blank symbol most of the time.

Classification General Classification +2

Enhancing DETRs Variants through Improved Content Query and Similar Query Aggregation

no code implementations • 6 May 2024 • Yingying Zhang, Chuangji Shi, Xin Guo, Jiangwei Lao, Jian Wang, Jiaotuan Wang, Jingdong Chen

The design of the query is crucial for the performance of DETR and its variants.

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

1 code implementation • 27 Feb 2024 • ZiCheng Zhang, Ruobing Zheng, Ziwen Liu, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Bonan Li, Ming Yang

Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences.

M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

1 code implementation • 29 Jan 2024 • Qingpei Guo, Furong Xu, Hanxiao Zhang, Wang Ren, Ziping Ma, Lin Ju, Jian Wang, Jingdong Chen, Ming Yang

Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence.

Ranked #1 on Zero-shot Image Retrieval on Flickr30k-CN (using extra training data)

Zero-Shot Cross-Modal Retrieval Zero-shot Image Retrieval +3

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

no code implementations • 15 Dec 2023 • Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang, Yongjun Zhang, Yansheng Li

Prior studies on Remote Sensing Foundation Model (RSFM) reveal immense potential towards a generic model for Earth Observation.

Contrastive Learning Earth Observation +1

A computationally efficient semi-blind source separation based approach for nonlinear echo cancellation based on an element-wise iterative source steering

no code implementations • 14 Dec 2023 • Kunxing Lu, Xianrui Wang, Tetsuya Ueda, Shoji Makino, Jingdong Chen

While the semi-blind source separation-based acoustic echo cancellation (SBSS-AEC) has received much research attention due to its promising performance during double-talk compared to the traditional adaptive algorithms, it suffers from system latency and nonlinear distortions.

Acoustic echo cancellation blind source separation

Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup

no code implementations • 10 Dec 2023 • Maolin Wang, Yao Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao

In our research, we constructed a dataset, the Multimodal Advertisement Audition Dataset (MAAD), from real-world scenarios within Alipay, and conducted experiments to validate the reliability of our proposed strategy.

Model Compression

LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints

1 code implementation • 27 Sep 2023 • Weidi Xu, Jingwei Wang, Lele Xie, Jianshan He, Hongting Zhou, Taifeng Wang, Xiaopei Wan, Jingdong Chen, Chao Qu, Wei Chu

Integrating first-order logic constraints (FOLCs) with neural networks is a crucial but challenging problem since it involves modeling intricate correlations to satisfy the constraints.

Variational Inference

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.

Audio-Visual Speech Recognition speech-recognition +2

Mapping EEG Signals to Visual Stimuli: A Deep Learning Approach to Match vs. Mismatch Classification

no code implementations • 8 Sep 2023 • Yiqian Yang, Zhengqiao Zhao, Qian Wang, Yan Yang, Jingdong Chen

Existing approaches to modeling associations between visual stimuli and brain responses are facing difficulties in handling between-subject variance and model generalization.

EEG Video Reconstruction

Uncertainty-guided Learning for Improving Image Manipulation Detection

no code implementations • ICCV 2023 • Kaixiang Ji, Feng Chen, Xin Guo, Yadong Xu, Jian Wang, Jingdong Chen

Image manipulation detection (IMD) is of vital importance as faking images and spreading misinformation can be malicious and harm our daily life.

Image Manipulation Image Manipulation Detection +1

Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic Segmentation

no code implementations • CVPR 2023 • Jiangwei Lao, Weixiang Hong, Xin Guo, Yingying Zhang, Jian Wang, Jingdong Chen, Wei Chu

In this work, we propose a novel feature enhancement network to simultaneously model short- and long-term temporal correlation.

Pseudo Label Semantic Segmentation +1

Robust Manifold Nonnegative Tucker Factorization for Tensor Data Representation

no code implementations • 8 Nov 2022 • Jianyu Wang, Linruize Tang, Jie Chen, Jingdong Chen

Nonnegative Tucker Factorization (NTF) minimizes the Euclidean distance or Kullback-Leibler divergence between the original data and its low-rank approximation, which often suffers from gross corruptions or outliers and neglects the manifold structure of the data.

SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization

1 code implementation • CVPR 2022 • Canjie Luo, Lianwen Jin, Jingdong Chen

Motivated by this common sense, we augment one image patch and use its neighboring patch as guidance to recover itself.

Common Sense Reasoning Contrastive Learning +2

Hierarchical Memory Learning for Fine-Grained Scene Graph Generation

no code implementations • 14 Mar 2022 • Youming Deng, Yansheng Li, Yongjun Zhang, Xiang Xiang, Jian Wang, Jingdong Chen, Jiayi Ma

After the autonomous partition of coarse and fine predicates, the model is first trained on the coarse predicates and then learns the fine predicates.

Graph Generation Scene Graph Generation

Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching

2 code implementations • 13 Mar 2022 • Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu

The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models.

Scene Text Recognition

Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer

no code implementations • CVPR 2022 • Weixiang Hong, Jiangwei Lao, Wang Ren, Jian Wang, Jingdong Chen, Wei Chu

Instead of proposing a specific vision transformer based detector, in this work, our goal is to reveal the insights of training vision transformer based detectors from scratch.

object-detection Object Detection +1

CBNet: A Composite Backbone Network Architecture for Object Detection

5 code implementations • 1 Jul 2021 • TingTing Liang, Xiaojie Chu, Yudong Liu, Yongtao Wang, Zhi Tang, Wei Chu, Jingdong Chen, Haibin Ling

With multi-scale testing, we push the current best single model result to a new record of 60.1% box AP and 52.3% mask AP without using extra training data.

Ranked #6 on Object Detection on COCO-O (using extra training data)

Instance Segmentation Object +2

MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

no code implementations • 24 Jun 2021 • Guozhi Tang, Lele Xie, Lianwen Jin, Jiapeng Wang, Jingdong Chen, Zhen Xu, Qianying Wang, Yaqiang Wu, Hui Li

Through key-value matching based on relevancy evaluation, the proposed MatchVIE can bypass the recognitions to various semantics, and simply focuses on the strong relevancy between entities.

LPSNet: A Lightweight Solution for Fast Panoptic Segmentation

no code implementations • CVPR 2021 • Weixiang Hong, Qingpei Guo, Wei Zhang, Jingdong Chen, Wei Chu

Panoptic segmentation is a challenging task aiming to simultaneously segment objects (things) at instance level and background contents (stuff) at semantic level.

Instance Segmentation Panoptic Segmentation +1

CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes

1 code implementation • 23 May 2021 • Hao Huang, Yongtao Wang, Zhaoyu Chen, Yuze Zhang, Yuheng Li, Zhi Tang, Wei Chu, Jingdong Chen, Weisi Lin, Kai-Kuang Ma

Then, we design a two-level perturbation fusion strategy to alleviate the conflict between the adversarial watermarks generated by different facial images and models.

Adversarial Attack Face Swapping +1

Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification

no code implementations • 19 Nov 2019 • Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen

We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance.

Text-Independent Speaker Verification

End-to-End Model for Speech Enhancement by Consistent Spectrogram Masking

no code implementations • 2 Jan 2019 • Xingjian Du, Mengyao Zhu, Xuan Shi, Xinpeng Zhang, Wen Zhang, Jingdong Chen

The experiments comparing our CSM-based end-to-end model with other methods are conducted to confirm that the CSM accelerates model training and yields significant improvements in speech quality.

Speech Enhancement

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

35 code implementations • 8 Dec 2015 • Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.

Ranked #1 on Accented Speech Recognition on VoxForge American-Canadian

Accented Speech Recognition Noisy Speech Recognition
