no code implementations • 25 Apr 2024 • Hongyu Yan, Yadong Mu
To tackle this, we propose an end-to-end model known as the Neural Assembler.
no code implementations • 17 Apr 2024 • Xinghan Wang, Zixi Kang, Yadong Mu
We address these challenges by proposing Text-controlled Motion Mamba (TM-Mamba), a unified model that integrates temporal global context, language query control, and spatial graph topology with only linear memory cost.
no code implementations • 7 Feb 2024 • Chenguo Lin, Yadong Mu
We introduce InstructScene, a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 3D scene synthesis.
1 code implementation • 5 Feb 2024 • Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu
In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.
Ranked #64 on Visual Question Answering on MM-Vet
no code implementations • 14 Sep 2023 • Peiran Xu, Yadong Mu
Given a group of images, co-salient object detection (CoSOD) aims to highlight the common salient object in each image.
1 code implementation • 9 Sep 2023 • Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu
Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens, like a foreign language that an LLM can read.
1 code implementation • CVPR 2023 • Zhicheng Sun, Yadong Mu, Gang Hua
Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge.
no code implementations • CVPR 2023 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu
Extensive experimental results show that, without further fine-tuning, ECLIP surpasses existing methods by a large margin on a broad range of downstream tasks, demonstrating the strong transferability to real-world E-commerce applications.
1 code implementation • CVPR 2023 • Xinghan Wang, Xin Xu, Yadong Mu
Besides, we also show that our Koopman pooling framework can be easily extended to one-shot action recognition when combined with Dynamic Mode Decomposition.
no code implementations • ICCV 2023 • Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu
Video action segmentation refers to the task of densely casting each video frame or short segment in an untrimmed video into some pre-specified action categories.
1 code implementation • 7 Nov 2022 • Xingqian Xu, Shant Navasardyan, Vahram Tadevosyan, Andranik Sargsyan, Yadong Mu, Humphrey Shi
We also prove the effectiveness of our design via ablation studies, from which one may notice that the aforementioned challenges, i.e., pattern unawareness, blurry textures, and structure distortion, can be noticeably resolved.
Ranked #1 on Image Inpainting on FFHQ 512 x 512
1 code implementation • ACM Multimedia 2022 • Zhicheng Sun, Yadong Mu
The task of lifelong person re-identification aims to match a person across multiple cameras given continuous data streams.
1 code implementation • 27 Sep 2022 • Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu
Spatio-Temporal video grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object depicted by a free-form textual expression.
no code implementations • 8 Jan 2022 • Peijun Bao, Yadong Mu
To this end, we propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model from naively memorizing the biases and force it to ground the query sentence based on the true inter-modal relationship.
no code implementations • CVPR 2022 • Yang Jin, Linchao Zhu, Yadong Mu
The main contributions of this work are two-fold: 1) Different from existing black-box models, the proposed model simultaneously implements the localization of temporal boundaries and the recognition of action categories by grounding the logical rules of MLN in videos.
no code implementations • CVPR 2022 • Hao Jiang, Yadong Mu
To address it, this work explores a new solution for video summarization by transferring samples from a correlated task (i.e., video moment localization) equipped with abundant training data.
no code implementations • 12 Oct 2021 • Xinzhe Zhou, Wei Liu, Yadong Mu
In the most information-rich case of knowing environment maps and admitting a shortest-path prior, we observe that given an origin-destination node pair, the internal route can be uniquely determined.
no code implementations • 11 May 2021 • Guiyu Tian, Wenhao Jiang, Wei Liu, Yadong Mu
To this end, MorphNet jointly optimizes two objectives for sample-adaptive poisoning: a reconstruction loss that preserves the visual similarity between benign and poisoned point clouds, and a classification loss that pushes a modern point-cloud recognition model to misclassify the poisoned sample as a pre-specified target category.
1 code implementation • NeurIPS 2020 • Lu Chi, Borui Jiang, Yadong Mu
FFC is a generic operator that can directly replace vanilla convolutions in a large body of existing networks, without any adjustments and with comparable complexity metrics (e.g., FLOPs).
1 code implementation • ICML 2020 • Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang
Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.
1 code implementation • CVPR 2020 • Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang
By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.
1 code implementation • MM '19: Proceedings of the 27th ACM International Conference on Multimedia 2019 • Lu Chi, Guiyu Tian, Yadong Mu, Lingxi Xie, Qi Tian
We show its equivalence to conducting residual learning in some spectral domain and carefully re-formulate a variety of neural layers into their spectral forms, such as ReLU or convolutions.
42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)
no code implementations • 2 Aug 2019 • Guoqiang Gong, Liangfeng Zheng, Kun Bai, Yadong Mu
Our proposed TSA-Net demonstrates clearly and consistently better performance and sets a new state of the art on both benchmarks.
no code implementations • 1 Aug 2019 • Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian
In the experiments, we comprehensively compare our method with two-stream and non-local models widely used in video classification.
Ranked #32 on Action Recognition on UCF101
no code implementations • Proceedings of the AAAI Conference on Artificial Intelligence 2019 • Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, Cees G. M. Snoek
Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data.
Ranked #1 on Few-Shot Semantic Segmentation on Pascal5i
39 code implementations • 9 Apr 2019 • Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang
The proposed approach achieves superior results to existing single-model networks on COCO object detection.
Ranked #7 on Semantic Segmentation on LIP val
1 code implementation • 12 Aug 2017 • Lu Chi, Yadong Mu
There are multiple fronts to these endeavors, including object detection on roads, 3-D reconstruction etc., but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks.
no code implementations • 12 Aug 2016 • Yadong Mu, Zhu Liu
In this paper, we propose a novel algorithm that concurrently performs feature engineering and non-linear supervised hashing function learning.
no code implementations • 14 Mar 2016 • Fumin Shen, Yadong Mu, Wei Liu, Yang Yang, Heng Tao Shen
The optimization proceeds alternately over the binary classifiers and image hash codes.
no code implementations • 28 Jun 2015 • Yadong Mu, Wei Liu, Wei Fan
Stochastic gradient descent (SGD) is a classical method for building large-scale machine learning models over big data.
no code implementations • CVPR 2014 • Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang
This paper presents a novel algorithm which uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems.
no code implementations • 20 Apr 2013 • Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan
Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.