no code implementations • 22 Mar 2024 • Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
Based on this, we propose PruMerge, a novel adaptive visual token reduction approach that substantially reduces the number of visual tokens while maintaining comparable model performance.
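A minimal sketch of the general prune-then-merge idea (not necessarily the paper's exact algorithm): keep the highest-scoring visual tokens and fold each pruned token into its most similar kept token. The importance scores and the similarity-based assignment below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def prune_and_merge(tokens, scores, keep_ratio=0.25):
    """Reduce visual tokens: keep the highest-scoring ones and merge each
    pruned token into its most similar kept token by averaging.

    tokens: (N, D) visual token embeddings
    scores: (N,) importance scores (e.g., attention weights; an assumption)
    """
    n_keep = max(1, int(tokens.size(0) * keep_ratio))
    keep_idx = scores.topk(n_keep).indices
    mask = torch.zeros(tokens.size(0), dtype=torch.bool)
    mask[keep_idx] = True
    kept, dropped = tokens[mask], tokens[~mask]

    # Assign each pruned token to its most similar kept token (cosine sim).
    sim = F.normalize(dropped, dim=-1) @ F.normalize(kept, dim=-1).T
    assign = sim.argmax(dim=-1)

    # Merge: average each kept token with the pruned tokens mapped to it.
    merged = kept.clone()
    for j in range(kept.size(0)):
        members = dropped[assign == j]
        if members.numel() > 0:
            merged[j] = torch.cat([kept[j:j + 1], members]).mean(dim=0)
    return merged  # (n_keep, D)
```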
2 code implementations • 26 Feb 2024 • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model for systematic analysis of LLM inference techniques.
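For readers unfamiliar with the roofline model: attainable throughput is bounded by the smaller of peak compute and memory bandwidth times arithmetic intensity. A tiny illustration; the hardware numbers are hypothetical (roughly A100-class) and not from the survey:

```python
def roofline_flops(ai, peak_flops, peak_bw):
    """Attainable FLOP/s under the roofline model.
    ai: arithmetic intensity in FLOPs per byte moved."""
    return min(peak_flops, peak_bw * ai)

peak_flops = 312e12   # ~312 TFLOP/s peak compute (illustrative)
peak_bw    = 1.55e12  # ~1.55 TB/s memory bandwidth (illustrative)

# Decode-stage GEMVs in LLM inference have low arithmetic intensity,
# so they land on the memory-bound side of the roofline:
print(roofline_flops(2.0, peak_flops, peak_bw))    # 3.1e12, bandwidth-bound
print(roofline_flops(500.0, peak_flops, peak_bw))  # 3.12e14, compute-bound
```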
no code implementations • 23 Feb 2024 • Yichen Xie, Hongge Chen, Gregory P. Meyer, Yong Jae Lee, Eric M. Wolff, Masayoshi Tomizuka, Wei Zhan, Yuning Chai, Xin Huang
Observations from different angles enable the recovery of 3D object states from 2D image inputs if we can identify the same instance in different input frames.
1 code implementation • 20 Feb 2024 • Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee
We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.
no code implementations • 18 Jan 2024 • Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee
With increased human control, it is now possible to edit an image in a plethora of ways: from specifying in text what we want to change, to directly dragging the contents of the image in an interactive point-based manner.
1 code implementation • 12 Dec 2023 • Xueyan Zou, Linjie Li, JianFeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang
The proposed interface is adaptive to new tasks and new models.
no code implementations • 4 Dec 2023 • Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images, addressing multiple challenges: class name ambiguity, lack of diversity in naive prompts, and domain shifts.
no code implementations • 1 Dec 2023 • Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee
Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain.
no code implementations • 13 Nov 2023 • Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee
The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and autonomous operation.
5 code implementations • 5 Oct 2023 • Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
Large multimodal models (LMMs) have recently shown encouraging progress with visual instruction tuning.
Ranked #3 on Visual Instruction Following on LLaVA-Bench
1 code implementation • ICCV 2023 • Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee
Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain.
Ranked #15 on Domain Generalization on PACS
no code implementations • 19 Sep 2023 • Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
However, catastrophic forgetting, a notorious phenomenon in which the fine-tuned model fails to retain the performance of the pre-trained model, remains an inherent problem in multimodal LLMs (MLLMs).
1 code implementation • 26 Jul 2023 • Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
Given pairs of examples that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images.
no code implementations • 25 Jul 2023 • Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu
Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition.
no code implementations • 29 Jun 2023 • Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee
Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields.
no code implementations • 9 Jun 2023 • Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee
By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components.
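A hypothetical sketch of the core idea, handing raw SVG/XML to a text-only LLM as part of a prompt; the function and prompting format here are illustrative assumptions, and the paper's actual setup may differ:

```python
from pathlib import Path

def svg_to_prompt(svg_path: str, question: str) -> str:
    """Build a text-only prompt that gives an LLM the raw SVG/XML,
    so the 'image' is consumed as text rather than pixels."""
    svg_xml = Path(svg_path).read_text()
    return (
        "Below is an image represented as SVG/XML.\n"
        f"{svg_xml}\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical usage:
# prompt = svg_to_prompt("icon.svg", "What shape is drawn, and what color is it?")
```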
9 code implementations • NeurIPS 2023 • Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field.
Ranked #4 on Visual Question Answering on BenchLMM
2 code implementations • NeurIPS 2023 • Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, JianFeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee
In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs).
no code implementations • 13 Mar 2023 • Zhuoran Yu, Yin Li, Yong Jae Lee
Without relying on model confidence, we propose to measure whether an unlabeled sample is likely to be "in-distribution", i.e., close to the current training data.
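One simple way to operationalize "close to the current training data" is a feature-space k-nearest-neighbor distance to the labeled set. This is an illustrative proxy, not necessarily the paper's exact criterion:

```python
import torch

def in_distribution_score(unlabeled_feats, labeled_feats, k=10):
    """Score each unlabeled sample by its mean distance to the k nearest
    labeled training features; smaller distance = more 'in-distribution'.

    unlabeled_feats: (U, D), labeled_feats: (L, D), with L >= k.
    """
    d = torch.cdist(unlabeled_feats, labeled_feats)  # (U, L) pairwise distances
    knn = d.topk(k, largest=False).values            # k smallest per row
    return -knn.mean(dim=1)                          # higher score = closer
```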
1 code implementation • CVPR 2023 • Utkarsh Ojha, Yuheng Li, Yong Jae Lee
In this work, we first show that the existing paradigm, which consists of training a deep network for real-vs-fake classification, fails to detect fake images from newer breeds of generative models when trained to detect GAN fake images.
1 code implementation • CVPR 2023 • Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li
Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability.
Ranked #1 on Semi-Supervised Image Classification on ImageNet - 1% labeled data (using extra training data)
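As background, the symmetric image-text contrastive (InfoNCE) objective that CLIP-style models are trained with; this is the standard formulation, not this paper's contribution:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs.
    img_emb, txt_emb: (B, D); row i of each is a matched pair."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    targets = torch.arange(img.size(0))     # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```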
1 code implementation • CVPR 2023 • Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee
Large-scale text-to-image diffusion models have made remarkable advances.
Ranked #4 on Conditional Text-to-Image Synthesis on COCO-MIG
1 code implementation • CVPR 2023 • Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, JianFeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.
Ranked #4 on Instance Segmentation on ADE20K val (using extra training data)
1 code implementation • 9 Dec 2022 • Minh-Long Luu, Zeyi Huang, Eric P. Xing, Yong Jae Lee, Haohan Wang
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks.
Ranked #1 on Classifier calibration on CIFAR-100
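For reference, the canonical mix-up formulation (convex combinations of pairs of inputs and their labels); the specific variant studied in the paper may differ:

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=1.0):
    """Canonical mix-up: blend each sample with a randomly permuted partner.
    x: (B, ...) inputs; y: (B, C) one-hot labels (float)."""
    lam = np.random.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```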
no code implementations • 4 Nov 2022 • Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh
We introduce a new method for diverse foreground generation with explicit control over various factors.
no code implementations • 13 Jun 2022 • Zhuoran Yu, Yin Li, Yong Jae Lee
However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, so the pseudo-labels of even high-confidence unlabeled samples may still be unreliable.
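A toy demonstration of this failure mode: for a linear head, logits grow with the input norm, so the max softmax probability is pushed toward 1 as an input moves arbitrarily far from the data. The classifier and input here are random placeholders:

```python
import torch

torch.manual_seed(0)
w = torch.randn(10, 2)     # a toy linear classifier head (10 classes, 2-D features)
x = torch.randn(2)         # some direction in feature space
for scale in (1, 10, 100):
    probs = torch.softmax(w @ (scale * x), dim=0)
    print(scale, probs.max().item())
# Logits scale linearly with ||x||, so confidence approaches 1 as the
# input moves farther from anywhere the model was trained.
```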
8 code implementations • 19 Apr 2022 • Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao
In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.
Ranked #1 on Object Detection on ELEVATER
1 code implementation • 9 Apr 2022 • Zeyi Huang, Haohan Wang, Dong Huang, Yong Jae Lee, Eric P. Xing
Training with an emphasis on "hard-to-learn" components of the data has proven to be an effective method for improving the generalization of machine learning models, especially in settings where robustness (e.g., generalization across distributions) is valued.
no code implementations • 6 Apr 2022 • Xueyan Zou, Haotian Liu, Yong Jae Lee
We demonstrate highly competitive instance edge detection performance compared to state-of-the-art baselines, and also show that the proposed task and loss are complementary to instance segmentation and object detection.
1 code implementation • CVPR 2022 • Yang Xue, Yuheng Li, Krishna Kumar Singh, Yong Jae Lee
3D-aware generative models have shown that the introduction of 3D information can lead to more controllable image generation.
1 code implementation • 21 Mar 2022 • Haotian Liu, Mu Cai, Yong Jae Lee
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.
Ranked #12 on Few-Shot 3D Point Cloud Classification on ModelNet40 5-way (10-shot) (using extra training data)
1 code implementation • 5 Nov 2021 • Haohan Wang, Zeyi Huang, Hanlin Zhang, Yong Jae Lee, Eric Xing
Machine learning has demonstrated remarkable prediction accuracy over i.i.d. data, but the accuracy often drops when tested with data from another distribution.
no code implementations • ICCV 2021 • Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh
We propose a new approach for high resolution semantic image synthesis.
1 code implementation • 30 Aug 2021 • Maheen Rashid, Sofia Broomé, Katrina Ask, Elin Hernlund, Pia Haubro Andersen, Hedvig Kjellström, Yong Jae Lee
Consequently, a pragmatic equine pain classification system would use video of the unobserved horse and weak labels.
2 code implementations • CVPR 2021 • Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang
Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting.
Ranked #3 on 10-shot image generation on Babies
1 code implementation • CVPR 2021 • Xueyan Zou, Linjie Yang, Ding Liu, Yong Jae Lee
To achieve this goal, it is necessary to find correspondences from neighbouring frames to faithfully hallucinate the unknown content.
no code implementations • 5 Apr 2021 • Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee
We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars).
2 code implementations • 22 Dec 2020 • Haotian Liu, Rafael A. Rivera Soto, Fanyi Xiao, Yong Jae Lee
We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds.
2 code implementations • 21 Aug 2020 • Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yong Jae Lee
Aliasing refers to the phenomenon in which high-frequency signals degenerate into completely different ones after sampling.
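A concrete numeric example of this phenomenon: sampled at 8 Hz, a 7 Hz sine lies above the 4 Hz Nyquist limit and becomes indistinguishable from a (negated) 1 Hz sine at the sample points:

```python
import numpy as np

fs = 8.0                             # sampling rate (Hz); Nyquist limit is 4 Hz
t = np.arange(0, 1, 1 / fs)          # 8 sample instants over one second
high = np.sin(2 * np.pi * 7 * t)     # 7 Hz signal, above Nyquist
low  = np.sin(2 * np.pi * 1 * t)     # 1 Hz signal

# 7 Hz aliases to |7 - 8| = 1 Hz: the sampled 7 Hz sine equals the
# negated 1 Hz sine at every sample point.
print(np.allclose(high, -low, atol=1e-9))  # True
```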
2 code implementations • CVPR 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Yong Jae Lee, Alexander G. Schwing, Jan Kautz
Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training.
Ranked #1 on Weakly Supervised Object Detection on COCO test-dev
no code implementations • 4 Feb 2020 • Maheen Rashid, Hedvig Kjellström, Yong Jae Lee
We present a method for weakly-supervised action localization based on graph convolutions.
3 code implementations • 23 Jan 2020 • Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer
We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception.
1 code implementation • CVPR 2020 • Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram
Our key idea is to decorrelate feature representations of a category from its co-occurring context.
36 code implementations • 3 Dec 2019 • Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
Ranked #15 on Real-time Instance Segmentation on MSCOCO (using extra training data)
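The mask assembly step described here amounts to a sigmoid of a matrix product between per-instance coefficients and shared prototype masks; a sketch with assumed tensor shapes:

```python
import torch

def assemble_masks(prototypes, coeffs):
    """YOLACT-style mask assembly: each instance mask is a sigmoid of a
    linear combination of shared prototype masks.

    prototypes: (k, H, W) prototype masks from the prototype branch
    coeffs:     (n, k) per-instance mask coefficients from the prediction head
    """
    k, H, W = prototypes.shape
    masks = coeffs @ prototypes.view(k, H * W)   # (n, H*W) linear combinations
    return torch.sigmoid(masks).view(-1, H, W)   # (n, H, W) instance masks
```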
1 code implementation • 26 Nov 2019 • Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee
Cameras are prevalent in our daily lives and enable many useful systems built upon computer vision technologies, such as smart cameras and home robots for service applications.
3 code implementations • CVPR 2020 • Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation.
1 code implementation • NeurIPS 2020 • Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, Yong Jae Lee
We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data.
48 code implementations • ICCV 2019 • Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
Ranked #21 on Real-time Instance Segmentation on MSCOCO (using extra training data)
1 code implementation • CVPR 2019 • Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories.
Ranked #1 on Image Clustering on Stanford Cars
2 code implementations • 6 Nov 2018 • Krishna Kumar Singh, Hao Yu, Aron Sarmasi, Gautam Pradeep, Yong Jae Lee
Our approach only needs to modify the input image and can work with any network to improve its performance.
1 code implementation • EMNLP 2018 • Mingyang Zhou, Runxiang Cheng, Yong Jae Lee, Zhou Yu
The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics.
Ranked #12 on Multimodal Machine Translation on Multi30K
no code implementations • ECCV 2018 • Krishna Kumar Singh, Santosh Divvala, Ali Farhadi, Yong Jae Lee
We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories.
1 code implementation • ECCV 2018 • Zhongzheng Ren, Yong Jae Lee, Michael S. Ryoo
The end result is a video anonymizer that performs pixel-level modifications to anonymize each person's face, with minimal effect on action detection performance.
no code implementations • ECCV 2018 • Fanyi Xiao, Yong Jae Lee
We introduce Spatial-Temporal Memory Networks for video object detection.
1 code implementation • CVPR 2018 • Zhongzheng Ren, Yong Jae Lee
In human learning, it is common to use multiple sources of information jointly.
no code implementations • 25 May 2017 • Wenjian Hu, Krishna Kumar Singh, Fanyi Xiao, Jinyoung Han, Chen-Nee Chuah, Yong Jae Lee
Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest.
no code implementations • CVPR 2017 • Fanyi Xiao, Leonid Sigal, Yong Jae Lee
We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks.
no code implementations • CVPR 2017 • Chenyou Fan, Jang-Won Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo
We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene.
1 code implementation • CVPR 2017 • Maheen Rashid, Xiuye Gu, Yong Jae Lee
Instead of directly finetuning a network trained to detect keypoints on human faces to animal faces (which is sub-optimal since human and animal faces can look quite different), we propose to first adapt the animal images to the pre-trained human detection network by correcting for the differences in animal and human face shape.
3 code implementations • ICCV 2017 • Krishna Kumar Singh, Yong Jae Lee
We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos.
Ranked #21 on Weakly Supervised Action Localization on THUMOS 2014
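The core augmentation is simple to sketch: divide each training image into a grid and hide patches at random, forcing the network to look beyond the most discriminative part. The paper fills hidden patches with the dataset mean pixel; the fill value below is a placeholder:

```python
import torch

def hide_patches(img, grid=4, p_hide=0.5, fill=0.0):
    """Hide-and-Seek style augmentation: split the image into a grid and
    hide each patch independently with probability p_hide.

    img: (C, H, W) tensor; H and W assumed divisible by grid.
    fill: value for hidden patches (the paper uses the dataset mean).
    """
    C, H, W = img.shape
    ph, pw = H // grid, W // grid
    out = img.clone()
    for i in range(grid):
        for j in range(grid):
            if torch.rand(1).item() < p_hide:
                out[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = fill
    return out
```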
no code implementations • 9 Aug 2016 • Krishna Kumar Singh, Yong Jae Lee
We propose an end-to-end deep convolutional network to simultaneously localize and rank relative visual attributes, given only weakly-supervised pairwise image comparisons.
no code implementations • CVPR 2016 • Fanyi Xiao, Yong Jae Lee
We present an unsupervised approach that generates a diverse, ranked set of bounding box and segmentation video object proposals (spatio-temporal tubes that localize the foreground objects) in an unannotated video.
no code implementations • CVPR 2016 • Krishna Kumar Singh, Fanyi Xiao, Yong Jae Lee
The status quo approach to training object detectors requires expensive bounding box annotations.
no code implementations • ICCV 2015 • Fanyi Xiao, Yong Jae Lee
We present a weakly-supervised approach that discovers the spatial extent of relative attributes, given only pairs of ordered images.
no code implementations • CVPR 2015 • Tinghui Zhou, Yong Jae Lee, Stella X. Yu, Alyosha A. Efros
Given a set of poorly aligned images of the same visual concept without any annotations, we propose an algorithm to jointly bring them into pixel-wise correspondence by estimating a FlowWeb representation of the image set.
no code implementations • 18 May 2015 • Yong Jae Lee, Kristen Grauman
Our results on two egocentric video datasets show the method's promise relative to existing techniques for saliency and summarization.
no code implementations • NeurIPS 2014 • Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, Trevor Darrell
The increasing prominence of weakly labeled data nurtures a growing demand for object detection methods that can cope with minimal supervision.