Search Results for author: Yadong Mu

Found 33 papers, 14 papers with code

Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images

no code implementations • 25 Apr 2024 • Hongyu Yan, Yadong Mu

To tackle this, we propose an end-to-end model known as the Neural Assembler.

Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion

no code implementations • 17 Apr 2024 • Xinghan Wang, Zixi Kang, Yadong Mu

We address these challenges by proposing Text-controlled Motion Mamba (TM-Mamba), a unified model that integrates temporal global context, language query control, and spatial graph topology with only linear memory cost.

Question Answering

InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

no code implementations • 7 Feb 2024 • Chenguo Lin, Yadong Mu

We introduce InstructScene, a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 3D scene synthesis.

Benchmarking, Decoder

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

1 code implementation • 5 Feb 2024 • Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.

Ranked #64 on Visual Question Answering on MM-Vet

Video Understanding, Visual Question Answering

366

Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion

no code implementations • 14 Sep 2023 • Peiran Xu, Yadong Mu

Given a group of images, co-salient object detection (CoSOD) aims to highlight the common salient object in each image.

Co-Salient Object Detection, Decoder

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

1 code implementation • 9 Sep 2023 • Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu

Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens, like a foreign language that the LLM can read.

Language Modelling, Large Language Model

366

Regularizing Second-Order Influences for Continual Learning

1 code implementation • CVPR 2023 • Zhicheng Sun, Yadong Mu, Gang Hua

Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge.

Continual Learning

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

no code implementations • CVPR 2023 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Extensive experimental results show that, without further fine-tuning, ECLIP surpasses existing methods by a large margin on a broad range of downstream tasks, demonstrating strong transferability to real-world E-commerce applications.

Decoder

Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition

1 code implementation • CVPR 2023 • Xinghan Wang, Xin Xu, Yadong Mu

We also show that our Koopman pooling framework can be easily extended to one-shot action recognition when combined with Dynamic Mode Decomposition.

Action Recognition, Skeleton Based Action Recognition

Video Action Segmentation via Contextually Refined Temporal Keypoints

no code implementations • ICCV 2023 • Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu

Video action segmentation refers to the task of densely assigning each frame or short segment of an untrimmed video to one of a set of pre-specified action categories.

Action Segmentation, Graph Matching

Image Completion with Heterogeneously Filtered Spectral Hints

1 code implementation • 7 Nov 2022 • Xingqian Xu, Shant Navasardyan, Vahram Tadevosyan, Andranik Sargsyan, Yadong Mu, Humphrey Shi

We also demonstrate the effectiveness of our design via ablation studies, which show that the aforementioned challenges, i.e., pattern unawareness, blurry textures, and structural distortion, are noticeably resolved.

Ranked #1 on Image Inpainting on FFHQ 512 x 512

Image Inpainting

Patch-based Knowledge Distillation for Lifelong Person Re-Identification

1 code implementation • ACM Multimedia 2022 • Zhicheng Sun, Yadong Mu

The task of lifelong person re-identification aims to match a person across multiple cameras given continuous data streams.

Continual Learning, Knowledge Distillation

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

1 code implementation • 27 Sep 2022 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Spatio-temporal video grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object described by a free-form textual expression.

Decoder, Spatio-Temporal Video Grounding

Learning Sample Importance for Cross-Scenario Video Temporal Grounding

no code implementations • 8 Jan 2022 • Peijun Bao, Yadong Mu

To this end, we propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) that prevents the model from naively memorizing the biases and forces it to ground the query sentence in the true inter-modal relationships.

Sentence

Complex Video Action Reasoning via Learnable Markov Logic Network

no code implementations • CVPR 2022 • Yang Jin, Linchao Zhu, Yadong Mu

The main contributions of this work are twofold: 1) Unlike existing black-box models, the proposed model simultaneously localizes temporal boundaries and recognizes action categories by grounding the logical rules of the MLN in videos.

Action Recognition, Human-Object Interaction Detection

Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer

no code implementations • CVPR 2022 • Hao Jiang, Yadong Mu

To address it, this work explores a new solution for video summarization by transferring samples from a correlated task (i.e., video moment localization) that has abundant training data.

Video Summarization

Rethinking the Spatial Route Prior in Vision-and-Language Navigation

no code implementations • 12 Oct 2021 • Xinzhe Zhou, Wei Liu, Yadong Mu

In the most information-rich case, where the environment map is known and a shortest-path prior is admitted, we observe that given an origin-destination node pair, the internal route can be uniquely determined.

Navigate, Vision and Language Navigation

Poisoning MorphNet for Clean-Label Backdoor Attack to Point Clouds

no code implementations • 11 May 2021 • Guiyu Tian, Wenhao Jiang, Wei Liu, Yadong Mu

To this end, MorphNet jointly optimizes two objectives for sample-adaptive poisoning: a reconstruction loss that preserves the visual similarity between benign and poisoned point clouds, and a classification loss that forces a modern point-cloud recognition model to misclassify the poisoned sample as a pre-specified target category.

Backdoor Attack, Denoising

Fast Fourier Convolution

1 code implementation • NeurIPS 2020 • Lu Chi, Borui Jiang, Yadong Mu

FFC is a generic operator that can directly replace vanilla convolutions in a large body of existing networks, without any adjustments and with comparable complexity metrics (e.g., FLOPs).

Action Recognition, Keypoint Detection

318

Informative Dropout for Robust Representation Learning: A Shape-bias Perspective

1 code implementation • ICML 2020 • Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang

Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.

Domain Generalization, Representation Learning

125

Weakly-Supervised Action Localization by Generative Attention Modeling

1 code implementation • CVPR 2020 • Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang

By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.

Ranked #8 on Weakly Supervised Action Localization on ActivityNet-1.2

Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization

136

Fast Non-Local Neural Networks with Spectral Residual Learning

1 code implementation • MM '19: Proceedings of the 27th ACM International Conference on Multimedia 2019 • Lu Chi, Guiyu Tian, Yadong Mu, Lingxi Xie, Qi Tian

We show its equivalence to conducting residual learning in some spectral domain and carefully reformulate a variety of neural layers, such as ReLU or convolutions, into their spectral forms.

Pose Estimation, Video Classification

Deep High-Resolution Representation Learning for Visual Recognition

42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Dichotomous Image Segmentation, Face Alignment

27,947

Scale Matters: Temporal Scale Aggregation Network for Precise Action Localization in Untrimmed Videos

no code implementations • 2 Aug 2019 • Guoqiang Gong, Liangfeng Zheng, Kun Bai, Yadong Mu

Our proposed TSA-Net demonstrates clearly and consistently better performance and sets a new state of the art on both benchmarks.

Temporal Action Localization

Two-Stream Video Classification with Cross-Modality Attention

no code implementations • 1 Aug 2019 • Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian

In the experiments, we comprehensively compare our method with two-stream and non-local models widely used in video classification.

Ranked #32 on Action Recognition on UCF101

Action Classification, Action Recognition

Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation

no code implementations • Proceedings of the AAAI Conference on Artificial Intelligence 2019 • Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, Cees G. M. Snoek

Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data.

Ranked #1 on Few-Shot Semantic Segmentation on Pascal5i

Few-Shot Semantic Segmentation, One-Shot Learning

High-Resolution Representations for Labeling Pixels and Regions

39 code implementations • 9 Apr 2019 • Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang

The proposed approach achieves superior results to existing single-model networks on COCO object detection.

Ranked #7 on Semantic Segmentation on LIP val

Face Alignment, Facial Landmark Detection

12,128

Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues

1 code implementation • 12 Aug 2017 • Lu Chi, Yadong Mu

There are multiple fronts to these endeavors, including object detection on roads and 3-D reconstruction, but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks.

Autonomous Driving, object-detection

Deep Hashing: A Joint Approach for Image Signature Learning

no code implementations • 12 Aug 2016 • Yadong Mu, Zhu Liu

In this paper, we propose a novel algorithm that concurrently performs feature engineering and non-linear supervised hashing function learning.

Deep Hashing, Feature Engineering

Learning Binary Codes and Binary Weights for Efficient Classification

no code implementations • 14 Mar 2016 • Fumin Shen, Yadong Mu, Wei Liu, Yang Yang, Heng Tao Shen

The optimization alternates between the binary classifiers and the image hash codes.

Classification, General Classification

Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization

no code implementations • 28 Jun 2015 • Yadong Mu, Wei Liu, Wei Fan

Stochastic gradient descent (SGD) is a classical method for building large-scale machine learning models on big data.

Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification

no code implementations • CVPR 2014 • Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang

This paper presents a novel algorithm that uses compact hash bits to greatly improve the efficiency of non-linear kernel SVMs in very large-scale visual classification problems.

Classification, General Classification

Distributed Low-rank Subspace Segmentation

no code implementations • 20 Apr 2013 • Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan

Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.

Clustering, Event Detection