Search Results for author: Yu-Wing Tai

Found 115 papers, 59 papers with code

FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF

1 code implementation 5 Jan 2024 Hao Zhang, Yu-Wing Tai, Chi-Keung Tang

However, simultaneously achieving multi-view consistency and temporal coherence while editing video sequences remains a formidable challenge.

Video Editing

Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models

no code implementations 30 Dec 2023 Han Jiang, Haosen Sun, Ruoxuan Li, Chi-Keung Tang, Yu-Wing Tai

The second, remaining problem is thus 3D multiview consistency among all completed images, now guided by the seed images and their 3D proxies.

Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent

no code implementations 5 Dec 2023 Jianmeng Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang

This paper explores promptable NeRF generation (e.g., text prompt or single image prompt) for direct conditioning and fast generation of NeRF parameters for the underlying 3D scenes, thus undoing complex intermediate steps while providing full 3D generation with conditional control.

3D Generation 3D Reconstruction

DragVideo: Interactive Drag-style Video Editing

1 code implementation 3 Dec 2023 Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang

The main issues are: 1) how to perform direct and accurate user control in editing; 2) how to execute edits such as changing shape, expression, and layout without unsightly distortion and artifacts in the edited content; and 3) how to maintain spatio-temporal consistency of the video after editing.

Video Editing Video Generation

SANeRF-HQ: Segment Anything for NeRF in High Quality

no code implementations 3 Dec 2023 Yichen Liu, Benran Hu, Chi-Keung Tang, Yu-Wing Tai

Recently, the Segment Anything Model (SAM) has showcased remarkable capabilities of zero-shot segmentation, while NeRF (Neural Radiance Fields) has gained popularity as a method for various 3D problems beyond novel view synthesis.

Novel View Synthesis Object +4

C3Net: Compound Conditioned ControlNet for Multimodal Content Generation

no code implementations 29 Nov 2023 Juntao Zhang, Yuehuai Liu, Yu-Wing Tai, Chi-Keung Tang

Specifically, C3Net first aligns the conditions from multi-modalities to the same semantic latent space using modality-specific encoders based on contrastive training.

multimodal generation

Stable Segment Anything Model

1 code implementation 27 Nov 2023 Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

Thus, our solution, termed Stable-SAM, offers several advantages: 1) improved SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality, with 3) minimal learnable parameters (0.08M) and fast adaptation (by 1 training epoch).

Segmentation

Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent Synthetic Images

1 code implementation 27 Nov 2023 Shiu-hong Kao, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

This paper presents Deceptive-Human, a novel Prompt-to-NeRF framework capitalizing on state-of-the-art control diffusion models (e.g., ControlNet) to generate a high-quality controllable 3D human NeRF.

Density Estimation

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

no code implementations ICCV 2023 Yue Xu, Yong-Lu Li, Zhemin Huang, Michael Xu Liu, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets such as Ego4D and EPIC-KITCHENS have been proposed.

Action Recognition Temporal Action Localization

Scene-Generalizable Interactive Segmentation of Radiance Fields

no code implementations 9 Aug 2023 Songlin Tang, Wenjie Pei, Xin Tao, Tanghui Jia, Guangming Lu, Yu-Wing Tai

Existing methods for interactive segmentation in radiance fields entail scene-specific optimization and thus cannot generalize across different scenes, which greatly limits their applicability.

Interactive Segmentation Segmentation +1

Feature Decoupling-Recycling Network for Fast Interactive Segmentation

no code implementations 7 Aug 2023 Huimin Zeng, Weinong Wang, Xin Tao, Zhiwei Xiong, Yu-Wing Tai, Wenjie Pei

First, our model decouples the learning of source image semantics from the encoding of user guidance to process two types of input domains separately.

Image Segmentation Interactive Segmentation +3

Segment Anything Meets Point Tracking

1 code implementation 3 Jul 2023 Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, enabled by efficient point-centric annotation and prompt-based models.

Interactive Video Object Segmentation Object +5

UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks

no code implementations 7 Jun 2023 Yanan Sun, Zihan Zhong, Qi Fan, Chi-Keung Tang, Yu-Wing Tai

Our thorough studies validate that models pre-trained as such can learn rich representations of both modalities, improving their ability to understand how images and text relate to each other.

Semantic Segmentation

FaceDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models

2 code implementations NeurIPS 2023 Hao Zhang, Yanbo Xu, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

The ability to create high-quality 3D faces from a single image has become increasingly important with wide applications in video conferencing, AR/VR, and advanced video editing in movie industries.

3D Face Reconstruction Video Editing +1

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

1 code implementation 28 May 2023 Yue Xu, Yong-Lu Li, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

Our method consistently enhances the distillation algorithms, even on much larger-scale and more heterogeneous datasets, e.g., ImageNet-1K and Kinetics-400.

Deceptive-NeRF: Enhancing NeRF Reconstruction using Pseudo-Observations from Diffusion Models

no code implementations 24 May 2023 Xinhang Liu, Jiaben Chen, Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang

We introduce Deceptive-NeRF, a novel methodology for few-shot NeRF reconstruction, which leverages diffusion models to synthesize plausible pseudo-observations to improve the reconstruction.

Registering Neural Radiance Fields as 3D Density Images

no code implementations 22 May 2023 Han Jiang, Ruoxuan Li, Haosen Sun, Yu-Wing Tai, Chi-Keung Tang

No significant work has been done to directly merge two partially overlapping scenes using NeRF representations.

Contrastive Learning

Instance Neural Radiance Field

1 code implementation ICCV 2023 Yichen Liu, Benran Hu, Junkai Huang, Yu-Wing Tai, Chi-Keung Tang

This paper presents one of the first learning-based NeRF 3D instance segmentation pipelines, dubbed Instance Neural Radiance Field, or Instance-NeRF.

3D Instance Segmentation Panoptic Segmentation +1

Clean-NeRF: Reformulating NeRF to account for View-Dependent Observations

no code implementations 26 Mar 2023 Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

This paper analyzes NeRF's struggles in such settings and proposes Clean-NeRF for accurate 3D reconstruction and novel view rendering in complex scenes.

3D Reconstruction Density Estimation +3

Compression-Aware Video Super-Resolution

1 code implementation CVPR 2023 Yingwei Wang, Xu Jia, Xin Tao, Takashi Isobe, Huchuan Lu, Yu-Wing Tai

Videos stored on mobile devices or delivered over the Internet are usually compressed with various unknown compression parameters, but most video super-resolution (VSR) methods assume ideal inputs, resulting in a large performance gap between experimental settings and real-world applications.

Model Compression Video Enhancement +1

ONeRF: Unsupervised 3D Object Segmentation from Multiple Views

no code implementations 22 Nov 2022 Shengnan Liang, Yichen Liu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

We present ONeRF, a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.

3D scene Editing Object +1

FLNeRF: 3D Facial Landmarks Estimation in Neural Radiance Fields

1 code implementation 21 Nov 2022 Hao Zhang, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

This paper presents the first significant work on directly predicting 3D face landmarks on neural radiance fields (NeRFs).

H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions

no code implementations 21 Nov 2022 Changlin Li, Guangyang Wu, Yanan Sun, Xin Tao, Chi-Keung Tang, Yu-Wing Tai

The learnt deformable kernel is then utilized in convolving the input frames for predicting the interpolated frame.

Video Frame Interpolation

Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts

no code implementations 8 Nov 2022 Qi Fan, Mattia Segu, Yu-Wing Tai, Fisher Yu, Chi-Keung Tang, Bernt Schiele, Dengxin Dai

Thus, we propose to perturb the channel statistics of source domain features to synthesize various latent styles, so that the trained deep model can perceive diverse potential domains and generalize well even without observations of target domain data in training.

Autonomous Driving Domain Generalization
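The idea in the Normalization Perturbation entry above, perturbing per-channel feature statistics to synthesize new styles, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function name `perturb_channel_statistics`, the Gaussian noise centred at 1, the `noise_std` value, and the NCHW layout are not taken from the listing and may differ from the authors' formulation.

```python
import numpy as np

def perturb_channel_statistics(features, noise_std=0.5, rng=None, eps=1e-5):
    """Randomly rescale the per-channel mean and std of an NCHW feature map.

    Hypothetical sketch: the noise model (Gaussian around 1) is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, c = features.shape[:2]
    # Channel statistics computed over the spatial dimensions.
    mean = features.mean(axis=(2, 3), keepdims=True)
    std = features.std(axis=(2, 3), keepdims=True)
    # One random scale per sample and channel for each statistic.
    alpha = rng.normal(1.0, noise_std, size=(n, c, 1, 1))
    beta = rng.normal(1.0, noise_std, size=(n, c, 1, 1))
    normalized = (features - mean) / (std + eps)
    # Re-style the normalized features with the perturbed statistics.
    return normalized * (std * alpha) + mean * beta

rng = np.random.default_rng(0)
features = rng.standard_normal((2, 3, 4, 4))
styled = perturb_channel_statistics(features, noise_std=0.5, rng=rng)
unchanged = perturb_channel_statistics(features, noise_std=0.0, rng=rng)
```

With `noise_std=0.0` the function reduces to (approximately) the identity, which is a quick sanity check that only the channel statistics are being altered.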

SDRTV-to-HDRTV Conversion via Spatial-Temporal Feature Fusion

no code implementations 4 Nov 2022 Kepeng Xu, Li Xu, Gang He, Chang Wu, Zijia Ma, Ming Sun, Yu-Wing Tai

To evaluate the performance of the proposed method, we construct a corresponding multi-frame dataset using HDR video of the HDR10 standard to conduct a comprehensive evaluation of different methods.

Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks

no code implementations 13 Oct 2022 Rui Qin, Bin Wang, Yu-Wing Tai

The CP Loss supervises the text reconstruction with content semantics by multi-scale text recognition features, which effectively incorporates content awareness into the framework.

Image Reconstruction Image Super-Resolution +1

Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation

no code implementations 2 Oct 2022 Xinhang Liu, Jiaben Chen, Huai Yu, Yu-Wing Tai, Chi-Keung Tang

The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss, enabling an unsupervised partitioning of a scene into salient or meaningful regions corresponding to different object instances.

3D Object Editing Object +2

Occlusion-Aware Instance Segmentation via BiLayer Network Architectures

1 code implementation 8 Aug 2022 Lei Ke, Yu-Wing Tai, Chi-Keung Tang

Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees).

Instance Segmentation Segmentation +2

Video Mask Transfiner for High-Quality Video Instance Segmentation

1 code implementation 28 Jul 2022 Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details.

Instance Segmentation Semantic Segmentation +2

Self-Support Few-Shot Semantic Segmentation

1 code implementation 23 Jul 2022 Qi Fan, Wenjie Pei, Yu-Wing Tai, Chi-Keung Tang

Motivated by the simple Gestalt principle that pixels belonging to the same object are more similar than those belonging to different objects of the same class, we propose a novel self-support matching strategy to alleviate this problem, which uses query prototypes to match query features, where the query prototypes are collected from high-confidence query predictions.

Few-Shot Semantic Segmentation Segmentation +1

Learning Sequence Representations by Non-local Recurrent Neural Memory

1 code implementation 20 Jul 2022 Wenjie Pei, Xin Feng, Canmiao Fu, Qiong Cao, Guangming Lu, Yu-Wing Tai

The key challenge of sequence representation learning is to capture the long-range temporal dependencies.

Representation Learning

GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector

2 code implementations 30 May 2022 Peng Zheng, Huazhu Fu, Deng-Ping Fan, Qi Fan, Jie Qin, Yu-Wing Tai, Chi-Keung Tang, Luc van Gool

In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes.

Co-Salient Object Detection Object +2

Human Instance Matting via Mutual Guidance and Multi-Instance Refinement

1 code implementation CVPR 2022 Yanan Sun, Chi-Keung Tang, Yu-Wing Tai

A new instance matting metric called instance matting quality (IMQ) is proposed, which addresses the absence of a unified and fair means of evaluation emphasizing both instance recognition and matting quality.

Image Matting Instance Segmentation +1

Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling

1 code implementation CVPR 2022 Takashi Isobe, Xu Jia, Xin Tao, Changlin Li, Ruihuang Li, Yongjie Shi, Jing Mu, Huchuan Lu, Yu-Wing Tai

Instead of directly feeding consecutive frames into a VSR model, we propose to compute the temporal difference between frames and divide those pixels into two subsets according to the level of difference.

Motion Compensation Optical Flow Estimation +1
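The step described in the entry above, computing the temporal difference between frames and dividing pixels into two subsets by the level of difference, can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_by_temporal_difference`, the absolute-difference measure, and the fixed `threshold` are hypothetical, since the listing does not say how the authors choose the split.

```python
import numpy as np

def split_by_temporal_difference(prev_frame, next_frame, threshold):
    """Return the per-pixel difference plus masks for high- and low-variation pixels.

    Hypothetical sketch: a fixed threshold on the absolute difference is assumed.
    """
    # Cast first so that uint8 subtraction cannot wrap around.
    diff = np.abs(next_frame.astype(np.float32) - prev_frame.astype(np.float32))
    high_mask = diff > threshold   # pixels that changed a lot between frames
    low_mask = ~high_mask          # pixels that changed little
    return diff, high_mask, low_mask

prev_frame = np.array([[10, 10], [200, 50]], dtype=np.uint8)
next_frame = np.array([[12, 10], [100, 90]], dtype=np.uint8)
diff, high_mask, low_mask = split_by_temporal_difference(
    prev_frame, next_frame, threshold=20.0
)
```

The two masks partition the image, so each subset could then be routed to a separate processing branch.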

HAA4D: Few-Shot Human Atomic Action Recognition via 3D Spatio-Temporal Skeletal Alignment

no code implementations 15 Feb 2022 Mu-Ruei Tseng, Abhishek Gupta, Chi-Keung Tang, Yu-Wing Tai

All training and testing 3D skeletons in HAA4D are globally aligned, using a deep alignment model to the same global space, making each skeleton face the negative z-direction.

Atomic action recognition

Transcoded Video Restoration by Temporal Spatial Auxiliary Network

1 code implementation 15 Dec 2021 Li Xu, Gang He, Jinjia Zhou, Jie Lei, Weiying Xie, Yunsong Li, Yu-Wing Tai

On most video platforms, such as YouTube and TikTok, the played videos have usually undergone multiple video encodings, such as hardware encoding by recording devices, software encoding by video editing apps, and single/multiple video transcoding by video application servers.

Video Editing Video Restoration

NeRF-SR: High-Quality Neural Radiance Fields using Supersampling

1 code implementation 3 Dec 2021 Chen Wang, Xian Wu, Yuan-Chen Guo, Song-Hai Zhang, Yu-Wing Tai, Shi-Min Hu

We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs.

Novel View Synthesis Vocal Bursts Intensity Prediction

Occlusion-Aware Video Object Inpainting

no code implementations ICCV 2021 Lei Ke, Yu-Wing Tai, Chi-Keung Tang

To facilitate this new research, we construct the first large-scale video object inpainting benchmark YouTube-VOI to provide realistic occlusion scenarios with both occluded and visible object masks available.

Object Texture Synthesis +1

Few-Shot Video Object Detection

1 code implementation 30 Apr 2021 Qi Fan, Chi-Keung Tang, Yu-Wing Tai

We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to real-world visual learning challenges in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) to generate high-quality video tube proposals for aggregating feature representation for the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity.

Few-Shot Video Object Detection Object +2

Deep Video Matting via Spatio-Temporal Alignment and Aggregation

1 code implementation CVPR 2021 Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai

Despite the significant progress made by deep learning in natural image matting, there has so far been no representative work on deep learning for video matting, due to the inherent technical challenges in reasoning over the temporal domain and the lack of large-scale video matting datasets.

Image Matting Optical Flow Estimation +1

Semantic Image Matting

1 code implementation CVPR 2021 Yanan Sun, Chi-Keung Tang, Yu-Wing Tai

Specifically, we consider and learn 20 classes of matting patterns, and propose to extend the conventional trimap to semantic trimap.

Semantic Image Matting Transparent objects

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

1 code implementation CVPR 2021 Lei Ke, Yu-Wing Tai, Chi-Keung Tang

Segmenting highly-overlapping objects is challenging, because typically no distinction is made between real object contours and occlusion boundaries.

Amodal Instance Segmentation Boundary Detection +4

Group Collaborative Learning for Co-Salient Object Detection

1 code implementation CVPR 2021 Qi Fan, Deng-Ping Fan, Huazhu Fu, Chi Keung Tang, Ling Shao, Yu-Wing Tai

We present a novel group collaborative learning framework (GCoNet) capable of detecting co-salient objects in real time (16ms), by simultaneously mining consensus representations at group level based on the two necessary criteria: 1) intra-group compactness to better formulate the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module; 2) inter-group separability to effectively suppress the influence of noisy objects on the output by introducing our new group collaborating module conditioning the inconsistent consensus.

Co-Salient Object Detection Object +2

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

5 code implementations CVPR 2021 Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance.

Ranked #1 on Interactive Video Object Segmentation on DAVIS 2017 (using extra training data)

Interactive Video Object Segmentation Semantic Segmentation +2

PRIN/SPRIN: On Extracting Point-wise Rotation Invariant Features

2 code implementations 24 Feb 2021 Yang You, Yujing Lou, Ruoxi Shi, Qi Liu, Yu-Wing Tai, Lizhuang Ma, Weiming Wang, Cewu Lu

Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation invariant features for each point.

3D Feature Matching Data Augmentation

Semi-Supervised Few-Shot Atomic Action Recognition

1 code implementation 17 Nov 2020 Xiaoyuan Ni, Sizhe Song, Yu-Wing Tai, Chi-Keung Tang

Although excellent progress has been made, performance on action recognition still relies heavily on specific datasets, which are difficult to extend to new action classes due to labor-intensive labeling.

Atomic action recognition

HAA500: Human-Centric Atomic Action Dataset with Curated Videos

no code implementations ICCV 2021 Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang

We contribute HAA500, a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591K labeled frames.

Action Classification Action Recognition

Pose-Guided High-Resolution Appearance Transfer via Progressive Training

no code implementations 27 Aug 2020 Ji Liu, Heshan Liu, Mang-Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

We propose a novel pose-guided appearance transfer network for transferring a given reference appearance to a target pose at unprecedented image resolution (1024 × 1024), given an image of the reference person and an image of the target person.

Video Generation Vocal Bursts Intensity Prediction

Fully Convolutional Networks for Continuous Sign Language Recognition

no code implementations ECCV 2020 Ka Leong Cheng, Zhaoyang Yang, Qifeng Chen, Yu-Wing Tai

Continuous sign language recognition (SLR) is a challenging task that requires learning on both spatial and temporal dimensions of signing frame sequences.

Sentence Sign Language Recognition

Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking

2 code implementations ECCV 2020 Jian-Feng Yan, Zizhuang Wei, Hongwei Yi, Mingyu Ding, Runze Zhang, Yisong Chen, Guoping Wang, Yu-Wing Tai

In this paper, we propose an efficient and effective dense hybrid recurrent multi-view stereo net with dynamic consistency checking, namely $D^{2}$HC-RMVSNet, for accurate dense point cloud reconstruction.

Point cloud reconstruction

Dive Deeper Into Box for Object Detection

no code implementations ECCV 2020 Ran Chen, Yong Liu, Mengdan Zhang, Shu Liu, Bei Yu, Yu-Wing Tai

Anchor-free methods have defined the new frontier in state-of-the-art object detection research, where accurate bounding box estimation is the key to the success of these methods.

Object object-detection +1

Cascaded deep monocular 3D human pose estimation with evolutionary training data

1 code implementation CVPR 2020 Shichao Li, Lei Ke, Kevin Pratama, Yu-Wing Tai, Chi-Keung Tang, Kwang-Ting Cheng

End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses with limited and fixed training data.

Data Augmentation Monocular 3D Human Pose Estimation +3

One-Shot Object Detection without Fine-Tuning

1 code implementation 8 May 2020 Xiang Li, Lin Zhang, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang

Deep learning has revolutionized object detection thanks to large-scale datasets, but their object categories are still arguably very limited.

Metric Learning Object +2

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

2 code implementations CVPR 2020 Ho Kei Cheng, Jihoon Chung, Yu-Wing Tai, Chi-Keung Tang

In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data.

Ranked #1 on Semantic Segmentation on BIG (using extra training data)

4k Land Cover Classification +3

Learning Video Object Segmentation from Unlabeled Videos

1 code implementation CVPR 2020 Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, Steven C. H. Hoi

We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data.

Object Representation Learning +6

Spatial-Scale Aligned Network for Fine-Grained Recognition

no code implementations 5 Jan 2020 Lizhao Gao, Hai-Hua Xu, Chong Sun, Junling Liu, Yu-Wing Tai

Existing approaches for fine-grained visual recognition focus on learning marginal region-based representations while neglecting the spatial and scale misalignments, leading to inferior performance.

Fine-Grained Visual Recognition

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

1 code implementation ECCV 2020 Hongwei Yi, Zizhuang Wei, Mingyu Ding, Runze Zhang, Yisong Chen, Guoping Wang, Yu-Wing Tai

In this paper, we propose an effective and efficient pyramid multi-view stereo (MVS) net with self-adaptive view aggregation for accurate and complete dense point cloud reconstruction.

3D Point Cloud Reconstruction Depth Estimation +1

Reflective Decoding Network for Image Captioning

no code implementations ICCV 2019 Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai

State-of-the-art image captioning methods mostly focus on improving visual features; less attention has been paid to utilizing the inherent properties of language to boost captioning performance.

Image Captioning Position +1

Cross-Domain Adaptation for Animal Pose Estimation

no code implementations ICCV 2019 Jinkun Cao, Hongyang Tang, Hao-Shu Fang, Xiaoyong Shen, Cewu Lu, Yu-Wing Tai

Therefore, the easily available human pose dataset, which is of a much larger scale than our labeled animal dataset, provides important prior knowledge to boost performance on animal pose estimation.

Animal Pose Estimation Domain Adaptation

SF-Net: Structured Feature Network for Continuous Sign Language Recognition

no code implementations 4 Aug 2019 Zhaoyang Yang, Zhenmei Shi, Xiaoyong Shen, Yu-Wing Tai

The proposed SF-Net extracts features in a structured manner and gradually encodes information at the frame level, the gloss level and the sentence level into the feature representation.

Sentence Sign Language Recognition

DAWN: Dual Augmented Memory Network for Unsupervised Video Object Tracking

no code implementations 2 Aug 2019 Zhenmei Shi, Haoyang Fang, Yu-Wing Tai, Chi-Keung Tang

Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features.

Video Object Tracking Visual Tracking

StableNet: Semi-Online, Multi-Scale Deep Video Stabilization

no code implementations 24 Jul 2019 Chia-Hung Huang, Hang Yin, Yu-Wing Tai, Chi-Keung Tang

Video stabilization algorithms are of greater importance nowadays with the prevalence of hand-held devices which unavoidably produce videos with undesirable shaky motions.

Video Stabilization

Landmark Assisted CycleGAN for Cartoon Face Generation

no code implementations 2 Jul 2019 Ruizheng Wu, Xiaodong Gu, Xin Tao, Xiaoyong Shen, Yu-Wing Tai, Jiaya Jia

In this paper, we are interested in generating a cartoon face of a person by using unpaired training data between real faces and cartoon ones.

Face Generation

Memory-Attended Recurrent Network for Video Captioning

1 code implementation CVPR 2019 Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai

Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on one source video being processed.

Video Captioning

LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup

1 code implementation ICCV 2019 Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

Central to our method are multiple and overlapping local adversarial discriminators in a content-style disentangling network for achieving local detail transfer between facial images, with the use of asymmetric loss functions for dramatic makeup styles with high-frequency details.

Style Transfer

Pointwise Rotation-Invariant Network with Adaptive Sampling and 3D Spherical Voxel Convolution

1 code implementation 23 Nov 2018 Yang You, Yujing Lou, Qi Liu, Yu-Wing Tai, Lizhuang Ma, Cewu Lu, Weiming Wang

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown.

3D Feature Matching Data Augmentation

Physics-Based Generative Adversarial Models for Image Restoration and Beyond

no code implementations 2 Aug 2018 Jinshan Pan, Jiangxin Dong, Yang Liu, Jiawei Zhang, Jimmy Ren, Jinhui Tang, Yu-Wing Tai, Ming-Hsuan Yang

We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, image deraining, etc.).

Deblurring Image Deblurring +3

Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer

1 code implementation CVPR 2018 Hao-Shu Fang, Guansong Lu, Xiaolin Fang, Jianwen Xie, Yu-Wing Tai, Cewu Lu

In this paper, we present a novel method to generate synthetic human part segmentation data using easily-obtained human keypoint annotations.

Ranked #4 on Human Part Segmentation on PASCAL-Part (using extra training data)

Human Parsing Human Part Segmentation +3

Deep High Dynamic Range Imaging with Large Foreground Motions

1 code implementation ECCV 2018 Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang

In state-of-the-art deep HDR imaging, input images are first aligned using optical flows before merging, which are still error-prone due to occlusion and large motions.

Translation Vocal Bursts Intensity Prediction

Image Generation from Sketch Constraint Using Contextual GAN

1 code implementation ECCV 2018 Yongyi Lu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

We train a generative adversarial network, i.e., a contextual GAN, to learn the joint distribution of a sketch and the corresponding image by using joint images.

Image-to-Image Translation Translation

Deep Video Generation, Prediction and Completion of Human Action Sequences

no code implementations ECCV 2018 Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang

In the second stage, a skeleton-to-image network is trained, which is used to generate a human action video given the complete human pose sequence generated in the first stage.

Human action generation Video Generation +1

Adversarial Attacks Beyond the Image Space

no code implementations CVPR 2019 Xiaohui Zeng, Chenxi Liu, Yu-Siang Wang, Weichao Qiu, Lingxi Xie, Yu-Wing Tai, Chi Keung Tang, Alan L. Yuille

Though image-space adversaries can be interpreted as per-pixel albedo change, we verify that they cannot be well explained along these physically meaningful dimensions, which often have a non-local effect.

Question Answering Visual Question Answering

Image Dehazing using Bilinear Composition Loss Function

no code implementations 1 Oct 2017 Hui Yang, Jinshan Pan, Qiong Yan, Wenxiu Sun, Jimmy Ren, Yu-Wing Tai

In this paper, we introduce a bilinear composition loss function to address the problem of image dehazing.

Blocking Image Dehazing

Attribute-Guided Face Generation Using Conditional CycleGAN

no code implementations ECCV 2018 Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang

We are interested in attribute-guided face generation: given a low-res face input image, an attribute vector that can be extracted from a high-res image (attribute image), our new method generates a high-res face image for the low-res input that satisfies the given attributes.

Attribute Face Generation +2

A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation

1 code implementation CVPR 2017 Jinsun Park, Yu-Wing Tai, Donghyeon Cho, In So Kweon

In this paper, we introduce robust and synergetic hand-crafted features and a simple but efficient deep feature from a convolutional neural network (CNN) architecture for defocus estimation.

Defocus Estimation Image Generation

RMPE: Regional Multi-person Pose Estimation

14 code implementations ICCV 2017 Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu

In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes.

2D Human Pose Estimation Human Detection +2

Refining Geometry from Depth Sensors using IR Shading Images

no code implementations 18 Aug 2016 Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon

To resolve the ambiguity in our model between the normals and distances, we utilize an initial 3D mesh from the Kinect fusion and multi-view information to reliably estimate surface details that were not captured and reconstructed by the Kinect fusion.

Efficient and Robust Color Consistency for Community Photo Collections

no code implementations CVPR 2016 Jaesik Park, Yu-Wing Tai, Sudipta N. Sinha, In So Kweon

We present a robust low-rank matrix factorization method to estimate the unknown parameters of this model.

Deep Saliency with Encoded Low level Distance Map and High Level Features

2 code implementations CVPR 2016 Gayoung Lee, Yu-Wing Tai, Junmo Kim

Recent advances in saliency detection have utilized deep learning to obtain high level features to detect salient regions in a scene.

Saliency Detection

Look, Listen and Learn - A Multimodal LSTM for Speaker Identification

no code implementations 13 Feb 2016 Jimmy Ren, Yongtao Hu, Yu-Wing Tai, Chuan Wang, Li Xu, Wenxiu Sun, Qiong Yan

This task requires not only collective perception over both visual and auditory signals but also robustness to severe quality degradation and unconstrained content variation.

Speaker Identification

RGB-Guided Hyperspectral Image Upsampling

no code implementations ICCV 2015 Hyeokhyen Kwon, Yu-Wing Tai

On the contrary, the latest imaging sensors capture an RGB image with a resolution multiple times larger than that of a hyperspectral image.

Fast Randomized Singular Value Thresholding for Low-rank Optimization

no code implementations 1 Sep 2015 Tae-Hyun Oh, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon

The problems related to NNM, or WNNM, can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT), or Weighted SVT, but they suffer from high computational cost of Singular Value Decomposition (SVD) at each iteration.

Clustering
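The Singular Value Thresholding (SVT) operator mentioned in the entry above has a standard closed form: take an SVD, shrink each singular value by a threshold, and clip at zero. The sketch below shows that plain operator using a full SVD, which is the costly step the entry refers to; it is not the fast randomized variant the paper proposes. The function name and the toy data are illustrative choices.

```python
import numpy as np

def singular_value_threshold(matrix, tau):
    """Proximal operator of tau * (nuclear norm): soft-threshold the singular values."""
    # Full SVD: the expensive step that is repeated at every iteration.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of u by the shrunk singular values, then recombine.
    return (u * s_shrunk) @ vt

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
noisy = low_rank + 0.01 * rng.standard_normal((6, 5))
denoised = singular_value_threshold(noisy, tau=0.5)
```

Singular values below `tau` are set exactly to zero, which is why repeated application of this operator drives iterates toward low rank. A weighted variant would subtract a different threshold from each singular value.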

Fast Randomized Singular Value Thresholding for Nuclear Norm Minimization

no code implementations CVPR 2015 Tae-Hyun Oh, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon

The problems related to NNM (or WNNM) can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT) (or Weighted SVT), but they suffer from high computational cost to compute a Singular Value Decomposition (SVD) at each iteration.

Clustering

Data-Driven Depth Map Refinement via Multi-Scale Sparse Representation

no code implementations CVPR 2015 Hyeokhyen Kwon, Yu-Wing Tai, Stephen Lin

Depth maps captured by consumer-level depth cameras such as Kinect are usually degraded by noise, missing values, and quantization.

Dictionary Learning Quantization

Partial Sum Minimization of Singular Values in Robust PCA: Algorithm and Applications

no code implementations 4 Mar 2015 Tae-Hyun Oh, Yu-Wing Tai, Jean-Charles Bazin, Hyeongwoo Kim, In So Kweon

Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering underlying low-rank structure of clean data corrupted with sparse noise/outliers.

Edge Detection

Salient Region Detection via High-Dimensional Color Transform

no code implementations CVPR 2014 Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim

By mapping a low dimensional RGB color to a feature vector in a high-dimensional color space, we show that we can linearly separate the salient regions from the background by finding an optimal linear combination of color coefficients in the high-dimensional color space.

Vocal Bursts Intensity Prediction

Calibrating a Non-isotropic Near Point Light Source using a Plane

no code implementations CVPR 2014 Jaesik Park, Sudipta N. Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon

We show that a non-isotropic near point light source rigidly attached to a camera can be calibrated using multiple images of a weakly textured planar scene.

Position

Exploiting Shading Cues in Kinect IR Images for Geometry Refinement

no code implementations CVPR 2014 Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon

To resolve ambiguity in our model between normals and distance, we utilize an initial 3D mesh from the Kinect fusion and multi-view information to reliably estimate surface details that were not reconstructed by the Kinect fusion.

Shading-Based Shape Refinement of RGB-D Images

no code implementations CVPR 2013 Lap-Fai Yu, Sai-Kit Yeung, Yu-Wing Tai, Stephen Lin

We present a shading-based shape refinement algorithm which uses a noisy, incomplete depth map from Kinect to help resolve ambiguities in shape-from-shading.
