Search Results for author: Mengmeng Wang

Found 45 papers, 18 papers with code

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation

no code implementations • 28 Mar 2024 • Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong liu, Jingdong Wang

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with a nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context.

Denoising Face Generation

SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

1 code implementation • 24 Mar 2024 • Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, JunHao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong liu

Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness.

Ranked #17 on Rgb-T Tracking on RGBT234

Rgb-T Tracking Visual Object Tracking

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

no code implementations • 22 Jan 2024 • Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong liu

In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability.

Action Recognition Decoder +1

Camera-based 3D Semantic Scene Completion with Sparse Guidance Network

1 code implementation • 10 Dec 2023 • Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Xiangrui Zhao, Jongwon Ra, Laijian Li, Yong liu

Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving.

3D Semantic Scene Completion Autonomous Driving

Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

no code implementations • 4 Dec 2023 • Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang

To realize this, we innovatively blend video models with Large Language Models (LLMs) to devise Action-conditioned Prompts.

Action Recognition Descriptive +1

LooGLE: Can Long-Context Language Models Understand Long Contexts?

1 code implementation • 8 Nov 2023 • Jiaqi Li, Mengmeng Wang, Zilong Zheng, Muhan Zhang

In this paper, we present LooGLE, a Long Context Generic Language Evaluation benchmark for LLMs' long context understanding.

In-Context Learning Long-Context Understanding +1

110

Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking

no code implementations • ICCV 2023 • Teli Ma, Mengmeng Wang, Jimin Xiao, Huifeng Wu, Yong liu

In this paper, we forsake the conventional Siamese paradigm and propose a novel single-branch framework, SyncTrack, synchronizing the feature extracting and matching to avoid forwarding encoder twice for template and search region as well as introducing extra parameters of matching network.

3D Object Tracking Object Tracking

Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold

no code implementations • 21 Aug 2023 • Jun Chen, Haishan Ye, Mengmeng Wang, Tianxin Huang, Guang Dai, Ivor W. Tsang, Yong liu

This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold.

Second-order methods

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

no code implementations • 20 Aug 2023 • Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang

Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls.

Layout-to-Image Generation

Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

1 code implementation • ICCV 2023 • Jiazheng Xing, Mengmeng Wang, Yudi Ruan, Bofan Chen, Yaowei Guo, Boyu Mu, Guang Dai, Jingdong Wang, Yong liu

Class prototype construction and matching are core aspects of few-shot action recognition.

Feature Correlation Few-Shot action recognition +1

Multimodal Adaptation of CLIP for Few-Shot Action Recognition

no code implementations • 3 Aug 2023 • Jiazheng Xing, Mengmeng Wang, Xiaojun Hou, Guang Dai, Jingdong Wang, Yong liu

The adapters we design can combine information from video-text multimodal sources for task-oriented spatiotemporal modeling, which is fast, efficient, and has low training costs.

Few-Shot action recognition Few Shot Action Recognition

Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning

no code implementations • 2 Jul 2023 • Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, Yong liu

In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process.

Data Free Quantization Model Compression

PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation

1 code implementation • 27 Jun 2023 • Jianbiao Mei, Yu Yang, Mengmeng Wang, Xiaojun Hou, Laijian Li, Yong liu

Firstly, we propose a non-learning Sparse Instance Proposal (SIP) module with the "sampling-shifting-grouping" scheme to directly group thing points into instances from the raw point cloud efficiently.

Autonomous Driving Instance Segmentation +2

SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion

1 code implementation • 27 Jun 2023 • Jianbiao Mei, Yu Yang, Mengmeng Wang, Tianxin Huang, Xuemeng Yang, Yong liu

However, how to effectively exploit the relationships between the semantic context in semantic segmentation and geometric structure in scene completion remains under exploration.

Autonomous Driving Scene Understanding +1

Correlation Pyramid Network for 3D Single Object Tracking

no code implementations • 16 May 2023 • Mengmeng Wang, Teli Ma, Xingxing Zuo, Jiajun Lv, Yong liu

Additionally, considering the sparsity characteristics of the point clouds, we design a lateral correlation pyramid structure for the encoder to keep as many points as possible by integrating hierarchical correlated features.

3D Single Object Tracking Autonomous Driving +3

RICO: Regularizing the Unobservable for Indoor Compositional Reconstruction

1 code implementation • ICCV 2023 • Zizhang Li, Xiaoyang Lyu, Yuanyuan Ding, Mengmeng Wang, Yiyi Liao, Yong liu

Recently, neural implicit surfaces have become popular for multi-view reconstruction.

Disentanglement Object +1

Exploiting Neighborhood Structural Features for Change Detection

no code implementations • 10 Feb 2023 • Mengmeng Wang, Zhiqiang Han, Peizhen Yang, Bai Zhu, Ming Hao, Jianwei Fan, Yuanxin Ye

In this letter, a novel method for change detection is proposed using neighborhood structure correlation.

Change Detection

Adjacent-Level Feature Cross-Fusion With 3-D CNN for Remote Sensing Image Change Detection

1 code implementation • 10 Feb 2023 • Yuanxin Ye, Mengmeng Wang, Liang Zhou, Guangyang Lei, Jianwei Fan, Yao Qin

First, through the inner fusion property of 3D convolution, we design a new feature fusion method that can simultaneously extract and fuse the feature information from bi-temporal images.

Change Detection

Learning Discretized Neural Networks under Ricci Flow

no code implementations • 7 Feb 2023 • Jun Chen, Hanwen Chen, Mengmeng Wang, Guang Dai, Ivor W. Tsang, Yong liu

By introducing a partial differential equation on metrics, i.e., the Ricci flow, we establish the dynamical stability and convergence of the LNE metric with the $L^2$-norm perturbation.

Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition

no code implementations • 19 Jan 2023 • Jiazheng Xing, Mengmeng Wang, Yong liu, Boyu Mu

In this paper, we propose SloshNet, a new framework that revisits the spatial and temporal modeling for few-shot action recognition in a finer manner.

Few-Shot action recognition Few Shot Action Recognition

BSNet: Lane Detection via Draw B-spline Curves Nearby

no code implementations • 17 Jan 2023 • Haoxin Chen, Mengmeng Wang, Yong liu

The locality of lane representation is the ability to modify lanes locally which can simplify parameter optimization.

Lane Detection

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context

1 code implementation • 17 Jul 2022 • Zizhang Li, Mengmeng Wang, Huaijin Pi, Kechun Xu, Jianbiao Mei, Yong liu

However, the redundant parameters within the network structure can cause a large model size when scaling up for desirable performance.

Ranked #4 on Video Reconstruction on UVG

Video Reconstruction

Dynamically Stable Poincaré Embeddings for Neural Manifolds

no code implementations • 21 Dec 2021 • Jun Chen, Yuang Liu, Xiangrui Zhao, Mengmeng Wang, Yong liu

As a result, we prove that, if initial metrics have an $L^2$-norm perturbation which deviates from the Hyperbolic metric on the Poincaré ball, the scaled Ricci-DeTurck flow of such metrics smoothly and exponentially converges to the Hyperbolic metric.

Image Classification

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

1 code implementation • 29 Nov 2021 • Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.

Ranked #4 on Long-tail Learning on Places-LT (using extra training data)

Contrastive Learning Language Modelling +3

MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation

no code implementations • 21 Nov 2021 • Zizhang Li, Mengmeng Wang, Jianbiao Mei, Yong liu

Referring image segmentation is a typical multi-modal task, which aims at generating a binary mask for referent described in given language expressions.

Ranked #1 on Referring Expression Segmentation on G-Ref test B

Decoder Image Segmentation +3

Explicitly Modeling the Discriminability for Instance-Aware Visual Object Tracking

no code implementations • 28 Oct 2021 • Mengmeng Wang, Xiaoqian Yang, Yong liu

Visual object tracking performance has been dramatically improved in recent years, but some severe challenges remain open, like distractors and occlusions.

Contrastive Learning Visual Object Tracking +1

ActionCLIP: A New Paradigm for Video Action Recognition

2 code implementations • 17 Sep 2021 • Mengmeng Wang, Jiazheng Xing, Yong liu

Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".

Ranked #2 on Action Recognition In Videos on Kinetics-400

Action Classification Action Recognition In Videos +4

3,009

Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation

2 code implementations • ICCV 2021 • Lina Liu, Xibin Song, Mengmeng Wang, Yong liu, Liangjun Zhang

Meanwhile, to guarantee that the day and night images contain the same information, the domain-separated network takes the day-time images and corresponding night-time images (generated by GAN) as input, and the private and invariant feature extractors are learned by orthogonality and similarity loss, where the domain gap can be alleviated, thus better depth maps can be expected.

Monocular Depth Estimation

1,426

TransVOS: Video Object Segmentation with Transformers

1 code implementation • 1 Jun 2021 • Jianbiao Mei, Mengmeng Wang, Yeneng Lin, Yi Yuan, Yong liu

Recently, Space-Time Memory Network (STM) based methods have achieved state-of-the-art performance in semi-supervised video object segmentation (VOS).

Object One-shot visual object segmentation +3

One-shot Face Reenactment Using Appearance Adaptive Normalization

no code implementations • 8 Feb 2021 • Guangming Yao, Yi Yuan, Tianjia Shao, Shuang Li, Shanqi Liu, Yong liu, Mengmeng Wang, Kun Zhou

The paper proposes a novel generative adversarial network for one-shot face reenactment, which can animate a single face image to a different pose-and-expression (provided by a driving image) while keeping its original appearance.

Face Reenactment Generative Adversarial Network

Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation

no code implementations • 5 Feb 2021 • Jilin Tang, Yi Yuan, Tianjia Shao, Yong liu, Mengmeng Wang, Kun Zhou

In this paper we tackle the problem of pose guided person image generation, which aims to transfer a person image from the source pose to a novel target pose while maintaining the source appearance.

Image Generation

RFNet: Recurrent Forward Network for Dense Point Cloud Completion

no code implementations • ICCV 2021 • Tianxin Huang, Hao Zou, Jinhao Cui, Xuemeng Yang, Mengmeng Wang, Xiangrui Zhao, Jiangning Zhang, Yi Yuan, Yifan Xu, Yong liu

The RFE extracts multiple global features from the incomplete point clouds for different recurrent levels, and the FDC generates point clouds in a coarse-to-fine pipeline.

Point Cloud Completion

FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Depth Completion

no code implementations • 15 Dec 2020 • Lina Liu, Xibin Song, Xiaoyang Lyu, Junwei Diao, Mengmeng Wang, Yong liu, Liangjun Zhang

Then, a refined depth map is further obtained using a residual learning strategy in the coarse-to-fine stage with a coarse depth map and color image as input.

Depth Completion

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation

1 code implementation • 14 Dec 2020 • Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong liu, Xinxin Chen, Yi Yuan

To obtain more accurate depth estimation in large gradient regions, it is necessary to obtain high-resolution features with spatial and semantic information.

Ranked #7 on Unsupervised Monocular Depth Estimation on KITTI-C

Monocular Depth Estimation Self-Supervised Learning +2

233

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition

no code implementations • 15 Sep 2020 • Haisheng Su, Jing Su, Dongliang Wang, Weihao Gan, Wei Wu, Mengmeng Wang, Junjie Yan, Yu Qiao

Second, the parameter frequency distribution is further adopted to guide the student network to learn the appearance modeling process from the teacher.

Action Recognition Knowledge Distillation +1

Semantic Graph Based Place Recognition for 3D Point Clouds

1 code implementation • 26 Aug 2020 • Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Xianfang Zeng, Mengmeng Wang, Yong liu, Wanlong Li, Feng Wen

First, we propose a novel semantic graph representation for the point cloud scenes by reserving the semantic and topological information of the raw point cloud.

Graph Matching Graph Similarity

177

DTVNet: Dynamic Time-lapse Video Generation via Single Still Image

1 code implementation • ECCV 2020 • Jiangning Zhang, Chao Xu, Liang Liu, Mengmeng Wang, Xia Wu, Yong liu, Yunliang Jiang

The proposed DTVNet consists of two submodules: Optical Flow Encoder (OFE) and Dynamic Video Generator (DVG).

Decoder Optical Flow Estimation +1

The 'Letter' Distribution in the Chinese Language

no code implementations • 26 May 2020 • Qinghua Chen, Yan Wang, Mengmeng Wang, Xiaomeng Li

In addition, we collected Chinese literature corpora for different historical periods from the Tang Dynasty to the present, and we dismantled the Chinese written language into three kinds of basic particles: characters, strokes and constructive parts.

Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose

no code implementations • 29 Mar 2020 • Xianfang Zeng, Yusu Pan, Mengmeng Wang, Jiangning Zhang, Yong liu

On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations.

Face Reenactment

Extended Feature Pyramid Network for Small Object Detection

1 code implementation • 16 Mar 2020 • Chunfang Deng, Mengmeng Wang, Liang Liu, Yong liu

Small object detection remains an unsolved challenge because it is hard to extract information of small objects with only a few pixels.

Object object-detection +1

STM: SpatioTemporal and Motion Encoding for Action Recognition

no code implementations • ICCV 2019 • Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan

Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition.

Ranked #1 on Action Recognition In Videos on HMDB-51

Action Classification Action Recognition In Videos +1

FReeNet: Multi-Identity Face Reenactment

1 code implementation • CVPR 2020 • Jiangning Zhang, Xianfang Zeng, Mengmeng Wang, Yusu Pan, Liang Liu, Yong liu, Yu Ding, Changjie Fan

This paper presents a novel multi-identity face reenactment framework, named FReeNet, to transfer facial expressions from an arbitrary source face to a target face with a shared model.

Decoder Face Reenactment

Large Margin Object Tracking with Circulant Feature Maps

no code implementations • CVPR 2017 • Mengmeng Wang, Yong liu, Zeyi Huang

Structured output support vector machine (SVM) based tracking algorithms have shown favorable performance recently.

Object Object Tracking

Real-time 3D Human Tracking for Mobile Robots with Multisensors

no code implementations • 15 Mar 2017 • Mengmeng Wang, Daobilige Su, Lei Shi, Yong liu, Jaime Valls Miro

An ultrasonic sensor array is employed to provide the range information from the target person to the robot and Gaussian Process Regression is used for partial location estimation (2-D).

Sensor Fusion Visual Tracking

Robust Object Tracking with a Hierarchical Ensemble Framework

no code implementations • 23 Sep 2015 • Mengmeng Wang, Yong liu

A discriminative model which accounts for the matching degree of local patches is adopted via a bottom ensemble layer, and a generative model which exploits holistic templates is used to search for the object through the middle ensemble layer as well as an adaptive Kalman filter.

Object Object Tracking
