Search Results for author: Guoxi Huang

Found 7 papers, 4 papers with code

Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis

1 code implementation 30 Nov 2023 Zipeng Qi, Guoxi Huang, Zebin Huang, Qin Guo, Jinwen Chen, Junyu Han, Jian Wang, Gang Zhang, Lufei Liu, Errui Ding, Jingdong Wang

The LRDiff framework constructs an image-rendering process with multiple layers, each of which applies the vision guidance to instructively estimate the denoising direction for a single object.

Denoising Image Generation

Masked Image Residual Learning for Scaling Deeper Vision Transformers

1 code implementation NeurIPS 2023 Guoxi Huang, Hongtao Fu, Adrian G. Bors

With the same level of computational complexity as ViT-Base and ViT-Large, we instantiate 4.5$\times$ and 2$\times$ deeper ViTs, dubbed ViT-S-54 and ViT-B-48.

Object Detection +3

Dynamic Appearance: A Video Representation for Action Recognition with Joint Training

no code implementations 23 Nov 2022 Guoxi Huang, Adrian G. Bors

Static appearance of video may impede the ability of a deep neural network to learn motion-relevant features in video action recognition.

Action Recognition Temporal Action Localization +1

BQN: Busy-Quiet Net Enabled by Motion Band-Pass Module for Action Recognition

no code implementations TIP 2022 Guoxi Huang, Adrian G. Bors

Through experiments we show that the proposed MBPM can be used as a plug-in module in various CNN backbone architectures, significantly boosting their performance.

Action Recognition

Busy-Quiet Video Disentangling for Video Classification

2 code implementations 29 Mar 2021 Guoxi Huang, Adrian G. Bors

We design a trainable Motion Band-Pass Module (MBPM) for separating busy information from quiet information in raw video data.

Action Classification Action Recognition In Videos +3

Region-based Non-local Operation for Video Classification

1 code implementation 17 Jul 2020 Guoxi Huang, Adrian G. Bors

Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult.

Action Classification Action Recognition In Videos +4

Learning spatio-temporal representations with temporal squeeze pooling

no code implementations 11 Feb 2020 Guoxi Huang, Adrian G. Bors

In this paper, we propose a new video representation learning method, named Temporal Squeeze (TS) pooling, which can extract the essential movement information from a long sequence of video frames and map it into a set of few images, named Squeezed Images.

Ranked #43 on Action Recognition on UCF101 (using extra training data)

Action Recognition Classification +3
