Search Results for author: Yehao Li

Found 31 papers, 12 papers with code

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

no code implementations · 25 Mar 2024 · Rui Zhu, Yingwei Pan, Yehao Li, Ting Yao, Zhenglong Sun, Tao Mei, Chang Wen Chen

Despite this progress, the mask strategy still suffers from two inherent limitations: (a) a training-inference discrepancy and (b) fuzzy relations between the mask reconstruction and generative diffusion processes, resulting in sub-optimal training of DiT.

Image Generation

HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs

no code implementations · 18 Mar 2024 · Ting Yao, Yehao Li, Yingwei Pan, Tao Mei

Instead, we present a new hybrid backbone with HIgh-Resolution Inputs (namely HIRI-ViT) that upgrades the prevalent four-stage ViT to a five-stage ViT tailored to high-resolution inputs.

Control3D: Towards Controllable Text-to-3D Generation

no code implementations · 9 Nov 2023 · Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei

In particular, a 2D conditioned diffusion model (ControlNet) is remoulded to guide the learning of a 3D scene parameterized as a NeRF, encouraging each view of the 3D scene to align with the given text prompt and hand-drawn sketch.

3D Generation · Text to 3D

HGNet: Learning Hierarchical Geometry From Points, Edges, and Surfaces

no code implementations · CVPR 2023 · Ting Yao, Yehao Li, Yingwei Pan, Tao Mei

Next, as every two neighbor edges compose a surface, we obtain the edge-level representation of each anchor edge via surface-to-edge aggregation over all neighbor surfaces.

3D Object Classification · Semantic Segmentation
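The surface-to-edge aggregation described in the snippet can be pictured as a symmetric pooling over per-surface features. The array shapes and the choice of max-pooling below are illustrative assumptions, not the paper's exact operator:

```python
import numpy as np

def surface_to_edge(nbr_surface_feats):
    """Illustrative surface-to-edge aggregation: each anchor edge pools
    the features of the surfaces formed by its neighboring edge pairs.

    nbr_surface_feats: (E, S, D) array -- for each of E anchor edges,
    features of its S neighbor surfaces, each D-dimensional.
    Returns an (E, D) edge-level representation.
    """
    # Max-pooling is a common order-invariant aggregator in point-cloud nets.
    return nbr_surface_feats.max(axis=1)
```

Because the pooling is order-invariant, the result does not depend on how the neighbor surfaces are enumerated.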

Semantic-Conditional Diffusion Networks for Image Captioning

1 code implementation · CVPR 2023 · Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Jianlin Feng, Hongyang Chao, Tao Mei

The rich semantics are further regarded as a semantic prior that triggers the learning of the Diffusion Transformer, which produces the output sentence through a diffusion process.

Cross-Modal Retrieval · Image Captioning +3

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement

1 code implementation · 15 Nov 2022 · Zhaofan Qiu, Yehao Li, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei

In this paper, we propose a novel deep architecture tailored for 3D point cloud applications, named as SPE-Net.

Dual Vision Transformer

1 code implementation · 11 Jul 2022 · Ting Yao, Yehao Li, Yingwei Pan, Yu Wang, Xiao-Ping Zhang, Tao Mei

Dual-ViT is thereby able to reduce computational complexity without significantly compromising accuracy.

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

2 code implementations · 11 Jul 2022 · Ting Yao, Yingwei Pan, Yehao Li, Chong-Wah Ngo, Tao Mei

Motivated by wavelet theory, we construct a new Wavelet Vision Transformer (Wave-ViT) that formulates invertible down-sampling with wavelet transforms and self-attention learning in a unified way.

Image Classification · Instance Segmentation +4
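The "invertible down-sampling with wavelet transforms" mentioned in the snippet can be illustrated with a one-level 2D Haar transform: it halves spatial resolution yet loses no information, since the four sub-bands reconstruct the input exactly. This is a generic wavelet sketch, not the Wave-ViT implementation:

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2D Haar wavelet transform.

    Splits an (H, W) map into four (H/2, W/2) sub-bands: an
    approximation (LL) plus horizontal/vertical/diagonal details.
    Together the sub-bands form a lossless 2x down-sampling.
    """
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: reconstructs the original map exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

Invertibility is what distinguishes this from strided pooling: no detail is discarded when resolution is reduced.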

Comprehending and Ordering Semantics for Image Captioning

1 code implementation · CVPR 2022 · Yehao Li, Yingwei Pan, Ting Yao, Tao Mei

In this paper, we propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net), that unifies an enriched semantic comprehending process and a learnable semantic ordering process in a single architecture.

Cross-Modal Retrieval · Image Captioning +2

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation

1 code implementation · 13 Jun 2022 · Yingwei Pan, Yehao Li, Yiheng Zhang, Qi Cai, Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei

This paper presents an overview and comparative analysis of our systems designed for the following two tracks in the SAPIEN ManiSkill Challenge 2021. No Interaction Track: this track targets learning policies from pre-collected demonstration trajectories.

Imitation Learning

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training

no code implementations · 11 Jan 2022 · Yehao Li, Jiahao Fan, Yingwei Pan, Ting Yao, Weiyao Lin, Tao Mei

Vision-language pre-training has been an emerging and fast-developing research topic that transfers multi-modal knowledge from rich-resource pre-training tasks to limited-resource downstream tasks.

Image Captioning · Language Modelling +3

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising

no code implementations · 14 Dec 2021 · Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei

BERT-type structure has led to the revolution of vision-language pre-training and the achievement of state-of-the-art results on numerous vision-language downstream tasks.

Cross-Modal Retrieval · Denoising +6

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics

2 code implementations · 18 Aug 2021 · Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei

Nevertheless, there has not been an open-source codebase that supports training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion.

Cross-Modal Retrieval · Image Captioning +5

Contextual Transformer Networks for Visual Recognition

7 code implementations · 26 Jul 2021 · Yehao Li, Ting Yao, Yingwei Pan, Tao Mei

Such design fully capitalizes on the contextual information among input keys to guide the learning of dynamic attention matrix and thus strengthens the capacity of visual representation.

Image Classification · Instance Segmentation +3

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

1 code implementation · 27 Jan 2021 · Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei

Despite having impressive vision-language (VL) pretraining with BERT-based encoder for VL understanding, the pretraining of a universal encoder-decoder for both VL understanding and generation remains challenging.

Pre-training for Video Captioning Challenge 2020 Summary

no code implementations · 27 Jul 2020 · Yingwei Pan, Jun Xu, Yehao Li, Ting Yao, Tao Mei

The Pre-training for Video Captioning Challenge 2020 Summary: results and challenge participants' technical reports.

Video Captioning

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training

no code implementations · 5 Jul 2020 · Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei

In this work, we present Auto-captions on GIF, which is a new large-scale pre-training dataset for generic video understanding.

Question Answering · Sentence +3

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation

no code implementations · CVPR 2020 · Yingwei Pan, Ting Yao, Yehao Li, Chong-Wah Ngo, Tao Mei

A clustering branch is capitalized on to ensure that the learnt representation preserves such underlying structure by matching the estimated assignment distribution over clusters to the inherent cluster distribution for each target sample.

Clustering · Unsupervised Domain Adaptation

X-Linear Attention Networks for Image Captioning

2 code implementations · CVPR 2020 · Yingwei Pan, Ting Yao, Yehao Li, Tao Mei

Recent progress on fine-grained visual recognition and visual question answering has featured Bilinear Pooling, which effectively models the 2nd-order interactions across multi-modal inputs.

Fine-Grained Visual Recognition · Image Captioning +3
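The "2nd-order interactions" that Bilinear Pooling models can be shown with a minimal sketch: the outer product of two feature vectors enumerates every pairwise interaction between their components. The toy features below are hypothetical, and this is the classic pooling operation rather than the X-Linear attention block itself:

```python
import numpy as np

def bilinear_pool(q, k):
    """Classic bilinear pooling: the outer product q k^T captures every
    pairwise (2nd-order) interaction between two feature vectors,
    flattened into one joint representation of length len(q) * len(k)."""
    return np.outer(q, k).reshape(-1)

# Hypothetical toy features standing in for two modality embeddings.
q = np.array([1.0, 2.0])
k = np.array([3.0, 4.0, 5.0])
v = bilinear_pool(q, k)  # length 2 * 3 = 6
```

The quadratic size of the output is why practical systems compress or factorize this representation.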

Multi-Source Domain Adaptation and Semi-Supervised Domain Adaptation with Focus on Visual Domain Adaptation Challenge 2019

2 code implementations · 8 Oct 2019 · Yingwei Pan, Yehao Li, Qi Cai, Yang Chen, Ting Yao

Semi-Supervised Domain Adaptation: For this task, we adopt a standard self-learning framework to construct a classifier based on the labeled source and target data, and generate the pseudo labels for unlabeled target data.

Domain Adaptation · Self-Learning +1
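The "standard self-learning framework" in the snippet boils down to fitting a classifier on labeled data and keeping only confident predictions on unlabeled target data as pseudo labels. The nearest-class-mean classifier and confidence threshold below are illustrative stand-ins, not the system's actual model:

```python
import numpy as np

def pseudo_label(x_lab, y_lab, x_unlab, threshold=0.8):
    """One round of a generic self-training loop (illustrative only):
    fit a nearest-class-mean classifier on the labeled data, then keep
    unlabeled points whose softmax confidence clears a threshold."""
    classes = np.unique(y_lab)
    means = np.stack([x_lab[y_lab == c].mean(axis=0) for c in classes])
    # Negative squared distances to class means act as logits.
    d = ((x_unlab[:, None, :] - means[None]) ** 2).sum(-1)
    logits = -d
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    conf, idx = p.max(axis=1), p.argmax(axis=1)
    keep = conf >= threshold
    return x_unlab[keep], classes[idx[keep]]
```

In a full pipeline the kept (sample, pseudo label) pairs are added to the training set and the classifier is refit for the next round.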

Hierarchy Parsing for Image Captioning

no code implementations · ICCV 2019 · Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

It is widely believed that parsing an image into its constituent visual patterns is helpful for understanding and representing the image.

Image Captioning

Deep Metric Learning with Density Adaptivity

no code implementations · 9 Sep 2019 · Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

The problem of distance metric learning is mostly considered from the perspective of learning an embedding space, where the distances between pairs of examples are in correspondence with a similarity metric.

Metric Learning

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019

no code implementations · 14 Jun 2019 · Zhaofan Qiu, Dong Li, Yehao Li, Qi Cai, Yingwei Pan, Ting Yao

This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.

Action Recognition · Dense Captioning +2

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

1 code implementation · 3 May 2019 · Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei

Moreover, the inherent recurrent dependency in RNNs prevents parallelization within a sequence during training and therefore limits computational efficiency.

Sentence · Video Captioning

Pointing Novel Objects in Image Captioning

no code implementations · CVPR 2019 · Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

Image captioning has received significant attention, with remarkable improvements from recent advances.

Image Captioning · Object +2

Transferrable Prototypical Networks for Unsupervised Domain Adaptation

no code implementations · CVPR 2019 · Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, Tao Mei

Specifically, we present Transferrable Prototypical Networks (TPN) for adaptation such that the prototypes for each class in source and target domains are close in the embedding space and the score distributions predicted by prototypes separately on source and target data are similar.

Pseudo Label · Unsupervised Domain Adaptation
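The prototype idea in the snippet is concrete enough to sketch: a class prototype is the mean embedding of that class's samples, and classification assigns each embedding to its nearest prototype. This is the generic prototypical-network recipe, not TPN's full transfer objective:

```python
import numpy as np

def prototypes(emb, labels):
    """Class prototypes as per-class mean vectors in the embedding space."""
    classes = np.unique(labels)
    protos = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def nearest_prototype(x, classes, protos):
    """Assign each embedding to the class of its nearest prototype
    (squared Euclidean distance)."""
    d = ((x[:, None, :] - protos[None]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]
```

TPN's adaptation objective then pushes the source-domain and target-domain prototypes of each class toward each other so that this nearest-prototype rule transfers across domains.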

Exploring Visual Relationship for Image Captioning

no code implementations · ECCV 2018 · Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections.

Image Captioning · Sentence

Boosting Image Captioning with Attributes

no code implementations · ICCV 2017 · Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei

Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing.

Image Captioning
