Search Results for author: Linjie Yang

Found 38 papers, 27 papers with code

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

no code implementations5 Mar 2024 Weizhi Wang, Khalil Mrini, Linjie Yang, Sateesh Kumar, Yu Tian, Xifeng Yan, Heng Wang

Our MLM filter can generalize to different models and tasks, and be used as a drop-in replacement for CLIPScore.
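
The drop-in-replacement claim boils down to threshold filtering over per-pair quality scores. A minimal sketch, where `quality_score` is a hypothetical stand-in for either the MLM filter or CLIPScore (not the paper's actual model):

```python
def filter_pairs(pairs, quality_score, threshold):
    """Keep image-text pairs whose quality score clears the threshold."""
    return [p for p in pairs if quality_score(p) >= threshold]

# toy scorer for illustration only: fraction of caption words longer
# than 3 characters (a real scorer would be a learned model)
def toy_score(pair):
    words = pair["caption"].split()
    return sum(len(w) > 3 for w in words) / max(len(words), 1)

pairs = [
    {"image": "a.jpg", "caption": "a photo of a golden retriever"},
    {"image": "b.jpg", "caption": "img 001"},
]
kept = filter_pairs(pairs, toy_score, threshold=0.5)
```

Any scorer with the same pair-to-scalar signature can be swapped in, which is what makes the filter a drop-in replacement.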

Video Recognition in Portrait Mode

1 code implementation21 Dec 2023 Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang

While existing datasets mainly comprise landscape mode videos, our paper seeks to introduce portrait mode videos to the research community and highlight the unique challenges associated with this video format.

Data Augmentation Video Recognition

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

no code implementations8 Oct 2023 Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.

Text Generation Video Summarization

Selective Feature Adapter for Dense Vision Transformers

no code implementations3 Oct 2023 Xueqing Deng, Qi Fan, Xiaojie Jin, Linjie Yang, Peng Wang

Specifically, SFA consists of external adapters and internal adapters which are sequentially operated over a transformer model.

Depth Estimation

The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering

no code implementations27 Sep 2023 Haichao Yu, Yu Tian, Sateesh Kumar, Linjie Yang, Heng Wang

DataComp is a new benchmark dedicated to evaluating different methods for data filtering.

Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation

1 code implementation23 Jul 2023 Yiming Cui, Linjie Yang, Haichao Yu

Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query.
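
One way to make such learned queries dynamic is to predict per-image mixing weights and form each output query as a convex combination of the learned base queries. The sketch below illustrates that idea only; `mix_proj` and the softmax mixing are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_queries, dim = 8, 16
base_queries = rng.normal(size=(num_queries, dim))  # learned detection queries

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_queries(image_feat, base, mix_proj):
    # predict an (num_queries x num_queries) mixing matrix from the image
    # feature, then combine the static base queries per image
    logits = image_feat @ mix_proj
    weights = softmax(logits.reshape(len(base), len(base)), axis=-1)
    return weights @ base  # image-conditioned queries

image_feat = rng.normal(size=(dim,))
mix_proj = rng.normal(size=(dim, num_queries * num_queries))
queries = dynamic_queries(image_feat, base_queries, mix_proj)
```

The base queries stay shared across images; only the cheap mixing weights change per input.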

Instance Segmentation Object +5

Exploring the Role of Audio in Video Captioning

no code implementations21 Jun 2023 YuHan Shen, Linjie Yang, Longyin Wen, Haichao Yu, Ehsan Elhamifar, Heng Wang

Recent focus in video captioning has been on designing architectures that can consume both video and text modalities, and using large-scale video datasets with text transcripts for pre-training, such as HowTo100M.

Automatic Speech Recognition (ASR) +2

R²Former: Unified Retrieval and Reranking Transformer for Place Recognition

no code implementations6 Apr 2023 Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
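
The retrieval stage of such a pipeline is typically nearest-neighbour search over global descriptors. A generic sketch of that step (cosine similarity over made-up descriptors, not R²Former's actual model):

```python
import numpy as np

def retrieve(query_desc, db_descs, top_k=3):
    # cosine-similarity retrieval: rank reference images by similarity
    # of their global descriptors to the query descriptor
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

rng = np.random.default_rng(1)
db = rng.normal(size=(100, 32))          # 100 reference descriptors
query = db[42] + 0.05 * rng.normal(size=32)  # query near reference #42
top, scores = retrieve(query, db)
```

The reranking stage would then refine this shortlist with finer-grained (e.g. local-feature) comparisons.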

Feature Correlation Retrieval +1

FAQ: Feature Aggregated Queries for Transformer-based Video Object Detectors

1 code implementation15 Mar 2023 Yiming Cui, Linjie Yang

With Transformer-based object detectors achieving better performance on image-domain tasks, recent works began to extend those methods to video object detection.

Object Detection +1

Revisiting Training-free NAS Metrics: An Efficient Training-based Method

1 code implementation16 Nov 2022 Taojiannan Yang, Linjie Yang, Xiaojie Jin, Chen Chen

In this paper, we revisit these training-free metrics and find that: (1) the number of parameters (\#Param), which is the most straightforward training-free metric, is overlooked in previous works but is surprisingly effective, (2) recent training-free metrics largely rely on the \#Param information to rank networks.
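
Using #Param as a training-free metric amounts to ranking candidate architectures by parameter count alone. A toy sketch under the assumption of plain fully-connected chains (the layer widths below are made up for illustration):

```python
def count_params(widths):
    # a layer of width w_out fed by width w_in contributes
    # w_in * w_out weights plus w_out biases
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))

# rank candidates by the #Param metric alone -- the simple training-free
# signal the paper finds surprisingly effective
candidates = {
    "small":  [32, 64, 10],
    "medium": [64, 128, 10],
    "large":  [128, 256, 10],
}
ranking = sorted(candidates, key=lambda n: count_params(candidates[n]),
                 reverse=True)
```

The paper's observation is that more elaborate training-free metrics often add little beyond this ranking.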

Neural Architecture Search

Dynamic Proposals for Efficient Object Detection

no code implementations12 Jul 2022 Yiming Cui, Linjie Yang, Ding Liu

Object detection is a fundamental computer vision task that localizes and categorizes objects in a given image.

Object Detection +1

Robust High-Resolution Video Matting with Temporal Guidance

1 code implementation25 Aug 2021 Shanchuan Lin, Linjie Yang, Imran Saleemi, Soumyadip Sengupta

We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.

4k Image Matting +2

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

1 code implementation CVPR 2021 Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo

Last, we propose an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space and finds optimal architectures given various tasks and computation resources.

Image Classification Neural Architecture Search +3

Is In-Domain Data Really Needed? A Pilot Study on Cross-Domain Calibration for Network Quantization

no code implementations16 May 2021 Haichao Yu, Linjie Yang, Humphrey Shi

Post-training quantization methods use a set of calibration data to compute quantization ranges for network parameters and activations.
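
The calibration step described here can be sketched as follows, assuming a symmetric uniform quantizer (a generic post-training-quantization illustration, not the paper's code; the paper's question is whether the calibration data must come from the training domain):

```python
import numpy as np

def calibrate(activations, bits=8):
    # derive a symmetric quantization range from calibration activations
    max_abs = np.abs(activations).max()
    return max_abs / (2 ** (bits - 1) - 1)  # scale per quantization level

def quantize(x, scale, bits=8):
    # round to the nearest level, clip to the representable range,
    # then map back to real values (dequantize)
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
calib = rng.normal(size=10_000)   # stand-in calibration activations
scale = calibrate(calib)
x = rng.normal(size=1_000)
xq = quantize(x, scale)
```

Cross-domain calibration asks how much accuracy changes when `calib` is drawn from a different distribution than `x`.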

Quantization

Progressive Temporal Feature Alignment Network for Video Inpainting

1 code implementation CVPR 2021 Xueyan Zou, Linjie Yang, Ding Liu, Yong Jae Lee

To achieve this goal, it is necessary to find correspondences from neighbouring frames to faithfully hallucinate the unknown content.

Optical Flow Estimation Video Inpainting

Learning Versatile Neural Architectures by Propagating Network Codes

1 code implementation ICLR 2022 Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo

(4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transferring between different tasks.

Image Segmentation Neural Architecture Search +2

DeepViT: Towards Deeper Vision Transformer

5 code implementations22 Mar 2021 Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng

In this paper, we show that, unlike convolutional neural networks (CNNs), which can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly when they are scaled deeper.

Image Classification Representation Learning

AutoSpace: Neural Architecture Search with Less Human Interference

1 code implementation ICCV 2021 Daquan Zhou, Xiaojie Jin, Xiaochen Lian, Linjie Yang, Yujing Xue, Qibin Hou, Jiashi Feng

Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.

Neural Architecture Search

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

1 code implementation4 Jul 2020 Linjie Yang, Qing Jin

Model quantization helps to reduce model size and latency of deep neural networks.
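
A fractional bit-width can be realized by linearly interpolating the quantizers at the two neighbouring integer bit-widths, which is the core of the FracBits idea. The sketch below assumes a symmetric uniform quantizer and is an illustration, not the paper's implementation:

```python
import numpy as np

def uniform_quant(x, bits, max_abs=1.0):
    # symmetric uniform quantizer at an integer bit-width
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def frac_quant(x, bits):
    # fractional bit-width via linear interpolation between the two
    # neighbouring integer bit-widths, making bit-width differentiable
    lo, hi = int(np.floor(bits)), int(np.ceil(bits))
    if lo == hi:
        return uniform_quant(x, lo)
    frac = bits - lo
    return (1 - frac) * uniform_quant(x, lo) + frac * uniform_quant(x, hi)

x = np.linspace(-1, 1, 11)
y = frac_quant(x, 4.5)  # halfway between 4-bit and 5-bit quantization
```

Because the output varies smoothly with `bits`, the bit-width itself can be optimized by gradient descent and rounded at the end of training.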

Quantization

Neural Architecture Search for Lightweight Non-Local Networks

2 code implementations CVPR 2020 Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille

However, embedding NL blocks in mobile neural networks has rarely been explored, mainly due to the following challenges: 1) NL blocks generally have a heavy computation cost, which makes them difficult to apply where computational resources are limited, and 2) it is an open problem to discover an optimal configuration for embedding NL blocks into mobile neural networks.

Image Classification Neural Architecture Search

Towards Efficient Training for Neural Network Quantization

3 code implementations21 Dec 2019 Qing Jin, Linjie Yang, Zhenyu Liao

To deal with this problem, we propose a simple yet effective technique, named scale-adjusted training (SAT), to comply with the discovered rules and facilitate efficient training.

Quantization

AdaBits: Neural Network Quantization with Adaptive Bit-Widths

1 code implementation CVPR 2020 Qing Jin, Linjie Yang, Zhenyu Liao

With our proposed techniques applied on a bunch of models including MobileNet-V1/V2 and ResNet-50, we demonstrate that bit-width of weights and activations is a new option for adaptively executable deep neural networks, offering a distinct opportunity for improved accuracy-efficiency trade-off as well as instant adaptation according to the platform constraints in real-world applications.

Quantization

AtomNAS: Fine-Grained End-to-End Neural Architecture Search

1 code implementation ICLR 2020 Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang

We propose a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms.

Neural Architecture Search

Rethinking Neural Network Quantization

no code implementations25 Sep 2019 Qing Jin, Linjie Yang, Zhenyu Liao

To deal with this problem, we propose a simple yet effective technique, named scale-adjusted training (SAT), to comply with the discovered rules and facilitate efficient training.

Quantization

Weakly Supervised Body Part Segmentation with Pose based Part Priors

no code implementations30 Jul 2019 Zhengyuan Yang, Yuncheng Li, Linjie Yang, Ning Zhang, Jiebo Luo

The core idea is to first convert sparse weak labels such as keypoints into an initial estimate of body part masks, and then iteratively refine the part mask predictions.

Face Parsing Segmentation +1

Context-Aware Zero-Shot Recognition

1 code implementation19 Apr 2019 Ruotian Luo, Ning Zhang, Bohyung Han, Linjie Yang

We present a novel problem setting in zero-shot learning: zero-shot object recognition and detection in context.

Object Recognition Zero-Shot Learning

Streamlined Dense Video Captioning

1 code implementation CVPR 2019 Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han

Dense video captioning is an extremely challenging task since accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning of individual events.

Dense Video Captioning

Slimmable Neural Networks

3 code implementations ICLR 2019 Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang

Instead of training individual networks with different width configurations, we train a shared network with switchable batch normalization.
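
Switchable batch normalization keeps independent normalization statistics for each width multiplier, since a slimmed sub-network sees different activation statistics than the full one. A minimal sketch of that mechanism (an illustration under assumed widths and momentum, not the authors' implementation):

```python
import numpy as np

class SwitchableBatchNorm:
    """Independent BN statistics per switchable width (sketch)."""

    def __init__(self, num_features, widths=(0.25, 0.5, 1.0)):
        # one (mean, var) pair per width; a 0.5x sub-network only uses
        # the first half of the channels
        self.stats = {w: {"mean": np.zeros(int(num_features * w)),
                          "var": np.ones(int(num_features * w))}
                      for w in widths}
        self.active = max(widths)

    def switch(self, width):
        # select which sub-network's statistics to use
        self.active = width

    def __call__(self, batch):
        s = self.stats[self.active]
        return (batch - s["mean"]) / np.sqrt(s["var"] + 1e-5)

bn = SwitchableBatchNorm(8)
bn.switch(0.5)  # run the 0.5x-width sub-network: 4 active channels
x = np.random.default_rng(0).normal(size=(4, 4))
out = bn(x)
```

All other layers share weights across widths; only the lightweight BN statistics are duplicated.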

Instance Segmentation Keypoint Detection +3

YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark

no code implementations6 Sep 2018 Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, Thomas Huang

End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips.

Image Segmentation Object +6

YouTube-VOS: Sequence-to-Sequence Video Object Segmentation

4 code implementations ECCV 2018 Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, Thomas Huang

End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips.

Ranked #12 on Video Object Segmentation on YouTube-VOS 2018 (F-Measure (Unseen) metric)

Image Segmentation Object +7

Efficient Video Object Segmentation via Network Modulation

1 code implementation CVPR 2018 Linjie Yang, Yanran Wang, Xuehan Xiong, Jianchao Yang, Aggelos K. Katsaggelos

Video object segmentation aims at segmenting a specific object throughout a video sequence, given only an annotated first frame.

Object Segmentation +5

Dense Captioning with Joint Inference and Visual Context

1 code implementation CVPR 2017 Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li

The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase.

Dense Captioning Descriptive
