Search Results for author: Long Zhao

Found 41 papers, 22 papers with code

VideoPrism: A Foundational Visual Encoder for Video Understanding

no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.

Question Answering Video Question Answering +1

Paper
Add Code

Distilling Vision-Language Models on Millions of Videos

no code implementations • 11 Jan 2024 • Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan

Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.

Language Modelling Retrieval +2

Paper
Add Code

Generating Enhanced Negatives for Training Language-Based Object Detectors

1 code implementation • 29 Dec 2023 • Shiyu Zhao, Long Zhao, Vijay Kumar B. G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter

The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations.

Object object-detection +1

Paper
Code

Unwinding Stochastic Order Flow: When to Warehouse Trades

no code implementations • 22 Oct 2023 • Marcel Nutz, Kevin Webster, Long Zhao

We study how to unwind stochastic order flow with minimal transaction costs.

Paper
Add Code

Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency

no code implementations • 2 Sep 2023 • Di Liu, Long Zhao, Qilong Zhangli, Yunhe Gao, Ting Liu, Dimitris N. Metaxas

The task of shape abstraction with semantic part consistency is challenging due to the complex geometries of natural objects.

Paper
Add Code

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

1 code implementation • ICCV 2023 • Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng

To facilitate the data efficiency of multiview learning, we further perform video-text alignment for first-person and third-person videos, to fully leverage the semantic knowledge to improve video representations.

Multiview Learning Video Recognition

Paper
Code

Taming Self-Training for Open-Vocabulary Object Detection

2 code implementations • 11 Aug 2023 • Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas

This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distribution changes of PLs.

Object object-detection +1

Paper
Code

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

76,588

Paper
Code

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition Contrastive Learning +7

Paper
Add Code

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

2,990

Paper
Code

Steering Prototypes with Prompt-tuning for Rehearsal-free Continual Learning

2 code implementations • 16 Mar 2023 • Zhuowei Li, Long Zhao, Zizhao Zhang, Han Zhang, Di Liu, Ting Liu, Dimitris N. Metaxas

In the context of continual learning, prototypes-as representative class embeddings-offer advantages in memory conservation and the mitigation of catastrophic forgetting.

Class Incremental Learning Contrastive Learning +1

Paper
Code

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

1 code implementation • 20 Jul 2022 • Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas

Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult.

Action Detection Action Recognition +3

Paper
Code

Exploiting Unlabeled Data with Vision and Language Models for Object Detection

1 code implementation • 18 Jul 2022 • Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas

We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection.

Ranked #15 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Object object-detection +3

Paper
Code

Limits of Semistatic Trading Strategies

no code implementations • 26 Apr 2022 • Marcel Nutz, Johannes Wiesel, Long Zhao

We show that pointwise limits of semistatic trading strategies in discrete time are again semistatic strategies.

Paper
Add Code

Martingale Schrödinger Bridges and Optimal Semistatic Portfolios

no code implementations • 26 Apr 2022 • Marcel Nutz, Johannes Wiesel, Long Zhao

In a two-period financial market where a stock is traded dynamically and European options at maturity are traded statically, we study the so-called martingale Schr\"odinger bridge Q*; that is, the minimal-entropy martingale measure among all models calibrated to option prices.

Position

Paper
Add Code

Are Multimodal Transformers Robust to Missing Modality?

no code implementations • CVPR 2022 • Mengmeng Ma, Jian Ren, Long Zhao, Davide Testuggine, Xi Peng

Based on these findings, we propose a principle method to improve the robustness of Transformer models by automatically searching for an optimal fusion strategy regarding input data.

Paper
Add Code

Global Matching with Overlapping Attention for Optical Flow Estimation

1 code implementation • CVPR 2022 • Shiyu Zhao, Long Zhao, Zhixing Zhang, Enyu Zhou, Dimitris Metaxas

In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet.

Ranked #4 on Optical Flow Estimation on KITTI 2015

Optical Flow Estimation regression

Paper
Code

Min-Max Latency Optimization Based on Sensed Position State Information in Internet of Vehicles

no code implementations • 19 Mar 2022 • Pengzun Gao, Long Zhao, Kan Zheng, Pingzhi Fan

The dual-function radar communication (DFRC) is an essential technology in Internet of Vehicles (IoV).

Position

Paper
Add Code

COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality

1 code implementation • 11 Dec 2021 • Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf

Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects.

Ranked #2 on Group Activity Recognition on Collective Activity

Group Activity Recognition Relational Reasoning

Paper
Code

Out-of-Domain Generalization from a Single Source: An Uncertainty Quantification Approach

no code implementations • 5 Aug 2021 • Xi Peng, Fengchun Qiao, Long Zhao

We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training.

Domain Generalization Image Classification +5

Paper
Add Code

Improved Transformer for High-Resolution GANs

1 code implementation • NeurIPS 2021 • Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang

Attention-based models, exemplified by the Transformer, can effectively model long range dependency, but suffer from the quadratic complexity of self-attention operation, making them difficult to be adopted for high-resolution image generation based on Generative Adversarial Networks (GANs).

Ranked #2 on Image Generation on CelebA 256x256 (FID metric)

Image Generation Vocal Bursts Intensity Prediction

Paper
Code

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

6 code implementations • 26 May 2021 • Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister

Hierarchical structures are popular in recent vision transformers, however, they require sophisticated designs and massive datasets to work well.

Ranked #84 on Image Classification on CIFAR-10

Image Classification Image Generation

29,713

Paper
Code

More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching

no code implementations • 20 May 2021 • Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas

Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to its capability of learning fine-grained relevance across different modalities.

Contrastive Learning Image Captioning +4

Paper
Add Code

SMIL: Multimodal Learning with Severely Missing Modality

1 code implementation • 9 Mar 2021 • Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, Xi Peng

A common assumption in multimodal learning is the completeness of training data, i. e., full modalities are available in all training examples.

Meta-Learning

Paper
Code

Box Re-Ranking: Unsupervised False Positive Suppression for Domain Adaptive Pedestrian Detection

no code implementations • 1 Feb 2021 • WeiJie Chen, Yilu Guo, Shicai Yang, Zhaoyang Li, Zhenxin Ma, Binbin Chen, Long Zhao, Di Xie, ShiLiang Pu, Yueting Zhuang

Therefore, it yields our attention to suppress false positive in each target domain in an unsupervised way.

object-detection Object Detection +2

Paper
Add Code

Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

1 code implementation • CVPR 2021 • Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu

To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.

Action Recognition Contrastive Learning +1

32,783

Paper
Code

View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose

2 code implementations • 23 Oct 2020 • Ting Liu, Jennifer J. Sun, Long Zhao, Jiaping Zhao, Liangzhe Yuan, Yuxiao Wang, Liang-Chieh Chen, Florian Schroff, Hartwig Adam

Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.

3D Pose Estimation Action Recognition +2

32,783

Paper
Code

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

1 code implementation • NeurIPS 2020 • Long Zhao, Ting Liu, Xi Peng, Dimitris Metaxas

In this paper, we propose a novel and effective regularization term for adversarial data augmentation.

Data Augmentation

Paper
Code

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

no code implementations • 10 Aug 2020 • Kuan Fang, Long Zhao, Zhan Shen, RuiXing Wang, RiKang Zhour, LiWen Fan

Search engine has become a fundamental component in various web and mobile applications.

Retrieval Semantic Retrieval

Paper
Add Code

Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge

no code implementations • CVPR 2020 • Long Zhao, Xi Peng, Yuxiao Chen, Mubbasir Kapadia, Dimitris N. Metaxas

Our key idea is to generalize the distilled cross-modal knowledge learned from a Source dataset, which contains paired examples from both modalities, to the Target dataset by modeling knowledge as priors on parameters of the Student.

3D Hand Pose Estimation Knowledge Distillation

Paper
Add Code

Learning to Learn Single Domain Generalization

1 code implementation • CVPR 2020 • Fengchun Qiao, Long Zhao, Xi Peng

Domain Generalization Meta-Learning

139

Paper
Code

Rethinking Kernel Methods for Node Representation Learning on Graphs

1 code implementation • NeurIPS 2019 • Yu Tian, Long Zhao, Xi Peng, Dimitris N. Metaxas

Graph kernels are kernel methods measuring graph similarity and serve as a standard tool for graph classification.

Ranked #8 on Link Prediction on Cora

General Classification Graph Classification +4

Paper
Code

Construct Dynamic Graphs for Hand Gesture Recognition via Spatial-Temporal Attention

1 code implementation • 20 Jul 2019 • Yuxiao Chen, Long Zhao, Xi Peng, Jianbo Yuan, Dimitris N. Metaxas

We propose a Dynamic Graph-Based Spatial-Temporal Attention (DG-STA) method for hand gesture recognition.

Ranked #3 on Hand Gesture Recognition on SHREC 2017

Hand Gesture Recognition Hand-Gesture Recognition +1

Paper
Code

Semantic Graph Convolutional Networks for 3D Human Pose Regression

5 code implementations • CVPR 2019 • Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris N. Metaxas

In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression.

Ranked #26 on Monocular 3D Human Pose Estimation on Human3.6M

Monocular 3D Human Pose Estimation regression

456

Paper
Code

Short-term Road Traffic Prediction based on Deep Cluster at Large-scale Networks

no code implementations • 25 Feb 2019 • Lingyi Han, Kan Zheng, Long Zhao, Xianbin Wang, Xuemin Shen

Therefore, a framework combining with a deep clustering (DeepCluster) module is developed for STTP at largescale networks in this paper.

Clustering Deep Clustering +3

Paper
Add Code

A Driving Intention Prediction Method Based on Hidden Markov Model for Autonomous Driving

no code implementations • 25 Feb 2019 • Shiwen Liu, Kan Zheng, Long Zhao, Pingzhi Fan

Experimental results show that the HMMs trained with the continuous characterization of mobility features can give a higher prediction accuracy when they are used for predicting driving intentions.

Autonomous Driving

Paper
Add Code

Learning to Forecast and Refine Residual Motion for Image-to-Video Generation

1 code implementation • ECCV 2018 • Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas

We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object.

Human Pose Forecasting Image to Video Generation +1

Paper
Code

CR-GAN: Learning Complete Representations for Multi-view Generation

1 code implementation • 28 Jun 2018 • Yu Tian, Xi Peng, Long Zhao, Shaoting Zhang, Dimitris N. Metaxas

Generating multi-view images from a single-view input is an essential yet challenging problem.

Generative Adversarial Network Self-Supervised Learning

121

Paper
Code

Cartoonish sketch-based face editing in videos using identity deformation transfer

no code implementations • 25 Mar 2017 • Long Zhao, Fangda Han, Xi Peng, Xun Zhang, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris N. Metaxas

We first recover the facial identity and expressions from the video by fitting a face morphable model for each frame.

Face Model

Paper
Add Code

Bridging Saliency Detection to Weakly Supervised Object Detection Based on Self-paced Curriculum Learning

no code implementations • 3 Mar 2017 • Dingwen Zhang, Deyu Meng, Long Zhao, Junwei Han

Weakly-supervised object detection (WOD) is a challenging problems in computer vision.

Ranked #34 on Weakly Supervised Object Detection on PASCAL VOC 2007

Object object-detection +2

Paper
Add Code

Object Proposal by Multi-Branch Hierarchical Segmentation

no code implementations • CVPR 2015 • Chaoyang Wang, Long Zhao, Shuang Liang, Liqing Zhang, Jinyuan Jia, Yichen Wei

Hierarchical segmentation based object proposal methods have become an important step in modern object detection paradigm.

Object object-detection +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.