no code implementations • 22 Mar 2024 • Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
Based on this, we propose PruMerge, a novel adaptive visual token reduction approach that substantially reduces the number of visual tokens while maintaining comparable model performance.
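A minimal sketch of the general prune-then-merge idea (not necessarily the paper's exact algorithm): keep the highest-scoring visual tokens and fold each pruned token into its most similar kept token. The importance scores and the similarity-based assignment below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def prune_and_merge(tokens, scores, keep_ratio=0.25):
    """Reduce visual tokens: keep the highest-scoring ones and merge each
    pruned token into its most similar kept token by averaging.

    tokens: (N, D) visual token embeddings
    scores: (N,) importance scores (e.g., attention weights; an assumption)
    """
    n_keep = max(1, int(tokens.size(0) * keep_ratio))
    keep_idx = scores.topk(n_keep).indices
    mask = torch.zeros(tokens.size(0), dtype=torch.bool)
    mask[keep_idx] = True
    kept, dropped = tokens[mask], tokens[~mask]

    # Assign each pruned token to its most similar kept token (cosine sim).
    sim = F.normalize(dropped, dim=-1) @ F.normalize(kept, dim=-1).T
    assign = sim.argmax(dim=-1)

    # Merge: average each kept token with the pruned tokens mapped to it.
    merged = kept.clone()
    for j in range(kept.size(0)):
        members = dropped[assign == j]
        if members.numel() > 0:
            merged[j] = torch.cat([kept[j:j + 1], members]).mean(dim=0)
    return merged  # (n_keep, D)
```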
2 code implementations • 26 Feb 2024 • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model for systematic analysis of LLM inference techniques.
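For readers unfamiliar with the roofline model: attainable throughput is bounded by the smaller of peak compute and memory bandwidth times arithmetic intensity. A tiny illustration; the hardware numbers are hypothetical (roughly A100-class) and not from the survey:

```python
def roofline_flops(ai, peak_flops, peak_bw):
    """Attainable FLOP/s under the roofline model.
    ai: arithmetic intensity in FLOPs per byte moved."""
    return min(peak_flops, peak_bw * ai)

peak_flops = 312e12   # ~312 TFLOP/s peak compute (illustrative)
peak_bw    = 1.55e12  # ~1.55 TB/s memory bandwidth (illustrative)

# Decode-stage GEMVs in LLM inference have low arithmetic intensity,
# so they land on the memory-bound side of the roofline:
print(roofline_flops(2.0, peak_flops, peak_bw))    # 3.1e12, bandwidth-bound
print(roofline_flops(500.0, peak_flops, peak_bw))  # 3.12e14, compute-bound
```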
no code implementations • 23 Feb 2024 • Yichen Xie, Hongge Chen, Gregory P. Meyer, Yong Jae Lee, Eric M. Wolff, Masayoshi Tomizuka, Wei Zhan, Yuning Chai, Xin Huang
Observations from different angles enable the recovery of 3D object states from 2D image inputs if we can identify the same instance in different input frames.
1 code implementation • 20 Feb 2024 • Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee
We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.
no code implementations • 18 Jan 2024 • Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee
With increased human control, it is now possible to edit an image in a plethora of ways: from specifying in text what we want to change, to directly dragging the contents of the image in an interactive point-based manner.
1 code implementation • 12 Dec 2023 • Xueyan Zou, Linjie Li, JianFeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang
The proposed interface is adaptive to new tasks and new models.
no code implementations • 4 Dec 2023 • Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images, addressing multiple challenges: class name ambiguity, lack of diversity in naive prompts, and domain shifts.
no code implementations • 1 Dec 2023 • Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee
Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain.
no code implementations • 13 Nov 2023 • Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee
The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and autonomous operation.
5 code implementations • 5 Oct 2023 • Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
Large multimodal models (LMMs) have recently shown encouraging progress with visual instruction tuning.
Ranked #3 on Visual Instruction Following on LLaVA-Bench
1 code implementation • ICCV 2023 • Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee
Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain.
Ranked #15 on Domain Generalization on PACS
no code implementations • 19 Sep 2023 • Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
However, catastrophic forgetting, a notorious phenomenon in which the fine-tuned model fails to retain the performance of the pre-trained model, remains an inherent problem in multimodal LLMs (MLLMs).
1 code implementation • 26 Jul 2023 • Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
Given pairs of examples that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images.
no code implementations • 25 Jul 2023 • Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu
Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition.
no code implementations • 29 Jun 2023 • Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee
Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields.
no code implementations • 9 Jun 2023 • Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee
By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components.
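A hypothetical sketch of the core idea, handing raw SVG/XML to a text-only LLM as part of a prompt; the function and prompting format here are illustrative assumptions, and the paper's actual setup may differ:

```python
from pathlib import Path

def svg_to_prompt(svg_path: str, question: str) -> str:
    """Build a text-only prompt that gives an LLM the raw SVG/XML,
    so the 'image' is consumed as text rather than pixels."""
    svg_xml = Path(svg_path).read_text()
    return (
        "Below is an image represented as SVG/XML.\n"
        f"{svg_xml}\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical usage:
# prompt = svg_to_prompt("icon.svg", "What shape is drawn, and what color is it?")
```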
9 code implementations • NeurIPS 2023 • Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field.
Ranked #4 on Visual Question Answering on BenchLMM
2 code implementations • NeurIPS 2023 • Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, JianFeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee
In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs).
no code implementations • 13 Mar 2023 • Zhuoran Yu, Yin Li, Yong Jae Lee
Without relying on model confidence, we propose to measure whether an unlabeled sample is likely to be "in-distribution", i.e., close to the current training data.
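One simple way to operationalize "close to the current training data" is a feature-space k-nearest-neighbor distance to the labeled set. This is an illustrative proxy, not necessarily the paper's exact criterion:

```python
import torch

def in_distribution_score(unlabeled_feats, labeled_feats, k=10):
    """Score each unlabeled sample by its mean distance to the k nearest
    labeled training features; smaller distance = more 'in-distribution'.

    unlabeled_feats: (U, D), labeled_feats: (L, D), with L >= k.
    """
    d = torch.cdist(unlabeled_feats, labeled_feats)  # (U, L) pairwise distances
    knn = d.topk(k, largest=False).values            # k smallest per row
    return -knn.mean(dim=1)                          # higher score = closer
```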
1 code implementation • CVPR 2023 • Utkarsh Ojha, Yuheng Li, Yong Jae Lee
In this work, we first show that the existing paradigm, which consists of training a deep network for real-vs-fake classification, fails to detect fake images from newer breeds of generative models when trained to detect GAN fake images.
1 code implementation • CVPR 2023 • Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li
Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability.
Ranked #1 on Semi-Supervised Image Classification on ImageNet - 1% labeled data (using extra training data)
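As background, the symmetric image-text contrastive (InfoNCE) objective that CLIP-style models are trained with; this is the standard formulation, not this paper's contribution:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs.
    img_emb, txt_emb: (B, D); row i of each is a matched pair."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    targets = torch.arange(img.size(0))     # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```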
1 code implementation • CVPR 2023 • Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee
Large-scale text-to-image diffusion models have made remarkable advances.
Ranked #4 on Conditional Text-to-Image Synthesis on COCO-MIG
1 code implementation • CVPR 2023 • Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, JianFeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.
Ranked #4 on Instance Segmentation on ADE20K val (using extra training data)
1 code implementation • 9 Dec 2022 • Minh-Long Luu, Zeyi Huang, Eric P. Xing, Yong Jae Lee, Haohan Wang
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks.
Ranked #1 on Classifier calibration on CIFAR-100
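For reference, the canonical mix-up formulation (convex combinations of pairs of inputs and their labels); the specific variant studied in the paper may differ:

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=1.0):
    """Canonical mix-up: blend each sample with a randomly permuted partner.
    x: (B, ...) inputs; y: (B, C) one-hot labels (float)."""
    lam = np.random.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```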
no code implementations • 4 Nov 2022 • Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh
We introduce a new method for diverse foreground generation with explicit control over various factors.
no code implementations • 13 Jun 2022 • Zhuoran Yu, Yin Li, Yong Jae Lee
However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, so the pseudo-labels of even high-confidence unlabeled samples may still be unreliable.
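A toy demonstration of this failure mode: for a linear head, logits grow with the input norm, so the max softmax probability is pushed toward 1 as an input moves arbitrarily far from the data. The classifier and input here are random placeholders:

```python
import torch

torch.manual_seed(0)
w = torch.randn(10, 2)     # a toy linear classifier head (10 classes, 2-D features)
x = torch.randn(2)         # some direction in feature space
for scale in (1, 10, 100):
    probs = torch.softmax(w @ (scale * x), dim=0)
    print(scale, probs.max().item())
# Logits scale linearly with ||x||, so confidence approaches 1 as the
# input moves farther from anywhere the model was trained.
```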
8 code implementations • 19 Apr 2022 • Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao
In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.
Ranked #1 on Object Detection on ELEVATER
1 code implementation • 9 Apr 2022 • Zeyi Huang, Haohan Wang, Dong Huang, Yong Jae Lee, Eric P. Xing
Training with an emphasis on "hard-to-learn" components of the data has proven to be an effective method for improving the generalization of machine learning models, especially in settings where robustness (e.g., generalization across distributions) is valued.
no code implementations • 6 Apr 2022 • Xueyan Zou, Haotian Liu, Yong Jae Lee
We demonstrate highly competitive instance edge detection performance compared to state-of-the-art baselines, and also show that the proposed task and loss are complementary to instance segmentation and object detection.
1 code implementation • CVPR 2022 • Yang Xue, Yuheng Li, Krishna Kumar Singh, Yong Jae Lee
3D-aware generative models have shown that the introduction of 3D information can lead to more controllable image generation.
1 code implementation • 21 Mar 2022 • Haotian Liu, Mu Cai, Yong Jae Lee
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.
Ranked #12 on Few-Shot 3D Point Cloud Classification on ModelNet40 5-way (10-shot) (using extra training data)
1 code implementation • 5 Nov 2021 • Haohan Wang, Zeyi Huang, Hanlin Zhang, Yong Jae Lee, Eric Xing
Machine learning has demonstrated remarkable prediction accuracy over i.i.d. data, but the accuracy often drops when tested with data from another distribution.
no code implementations • ICCV 2021 • Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh
We propose a new approach for high resolution semantic image synthesis.
1 code implementation • 30 Aug 2021 • Maheen Rashid, Sofia Broomé, Katrina Ask, Elin Hernlund, Pia Haubro Andersen, Hedvig Kjellström, Yong Jae Lee
Consequently, a pragmatic equine pain classification system would use video of the unobserved horse and weak labels.
2 code implementations • CVPR 2021 • Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang
Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting.
Ranked #3 on 10-shot image generation on Babies
1 code implementation • CVPR 2021 • Xueyan Zou, Linjie Yang, Ding Liu, Yong Jae Lee
To achieve this goal, it is necessary to find correspondences from neighbouring frames to faithfully hallucinate the unknown content.
no code implementations • 5 Apr 2021 • Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee
We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars).
2 code implementations • 22 Dec 2020 • Haotian Liu, Rafael A. Rivera Soto, Fanyi Xiao, Yong Jae Lee
We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds.
2 code implementations • 21 Aug 2020 • Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yong Jae Lee
Aliasing refers to the phenomenon in which high-frequency signals degenerate into completely different ones after sampling.
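A concrete numeric example of this phenomenon: sampled at 8 Hz, a 7 Hz sine lies above the 4 Hz Nyquist limit and becomes indistinguishable from a (negated) 1 Hz sine at the sample points:

```python
import numpy as np

fs = 8.0                             # sampling rate (Hz); Nyquist limit is 4 Hz
t = np.arange(0, 1, 1 / fs)          # 8 sample instants over one second
high = np.sin(2 * np.pi * 7 * t)     # 7 Hz signal, above Nyquist
low  = np.sin(2 * np.pi * 1 * t)     # 1 Hz signal

# 7 Hz aliases to |7 - 8| = 1 Hz: the sampled 7 Hz sine equals the
# negated 1 Hz sine at every sample point.
print(np.allclose(high, -low, atol=1e-9))  # True
```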
2 code implementations • CVPR 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Yong Jae Lee, Alexander G. Schwing, Jan Kautz
Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training.
Ranked #1 on Weakly Supervised Object Detection on COCO test-dev
no code implementations • 4 Feb 2020 • Maheen Rashid, Hedvig Kjellström, Yong Jae Lee
We present a method for weakly-supervised action localization based on graph convolutions.
3 code implementations • 23 Jan 2020 • Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer
We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception.
1 code implementation • CVPR 2020 • Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram
Our key idea is to decorrelate feature representations of a category from its co-occurring context.
36 code implementations • 3 Dec 2019 • Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
Ranked #15 on Real-time Instance Segmentation on MSCOCO (using extra training data)
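The mask assembly step described here amounts to a sigmoid of a matrix product between per-instance coefficients and shared prototype masks; a sketch with assumed tensor shapes:

```python
import torch

def assemble_masks(prototypes, coeffs):
    """YOLACT-style mask assembly: each instance mask is a sigmoid of a
    linear combination of shared prototype masks.

    prototypes: (k, H, W) prototype masks from the prototype branch
    coeffs:     (n, k) per-instance mask coefficients from the prediction head
    """
    k, H, W = prototypes.shape
    masks = coeffs @ prototypes.view(k, H * W)   # (n, H*W) linear combinations
    return torch.sigmoid(masks).view(-1, H, W)   # (n, H, W) instance masks
```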
1 code implementation • 26 Nov 2019 • Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee
Cameras are prevalent in our daily lives and enable many useful systems built upon computer vision technologies, such as smart cameras and home robots for service applications.
3 code implementations • CVPR 2020 • Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation.
1 code implementation • NeurIPS 2020 • Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, Yong Jae Lee
We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data.
48 code implementations • ICCV 2019 • Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
Ranked #21 on Real-time Instance Segmentation on MSCOCO (using extra training data)
1 code implementation • CVPR 2019 • Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories.
Ranked #1 on Image Clustering on Stanford Cars
2 code implementations • 6 Nov 2018 • Krishna Kumar Singh, Hao Yu, Aron Sarmasi, Gautam Pradeep, Yong Jae Lee
Our approach only needs to modify the input image and can work with any network to improve its performance.
1 code implementation • EMNLP 2018 • Mingyang Zhou, Runxiang Cheng, Yong Jae Lee, Zhou Yu
The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics.
Ranked #12 on Multimodal Machine Translation on Multi30K
no code implementations • ECCV 2018 • Krishna Kumar Singh, Santosh Divvala, Ali Farhadi, Yong Jae Lee
We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories.
1 code implementation • ECCV 2018 • Zhongzheng Ren, Yong Jae Lee, Michael S. Ryoo
The end result is a video anonymizer that performs pixel-level modifications to anonymize each person's face, with minimal effect on action detection performance.
no code implementations • ECCV 2018 • Fanyi Xiao, Yong Jae Lee
We introduce Spatial-Temporal Memory Networks for video object detection.
1 code implementation • CVPR 2018 • Zhongzheng Ren, Yong Jae Lee
In human learning, it is common to use multiple sources of information jointly.
no code implementations • 25 May 2017 • Wenjian Hu, Krishna Kumar Singh, Fanyi Xiao, Jinyoung Han, Chen-Nee Chuah, Yong Jae Lee
Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest.
no code implementations • CVPR 2017 • Fanyi Xiao, Leonid Sigal, Yong Jae Lee
We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks.
no code implementations • CVPR 2017 • Chenyou Fan, Jang-Won Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo
We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene.
1 code implementation • CVPR 2017 • Maheen Rashid, Xiuye Gu, Yong Jae Lee
Instead of directly finetuning a network trained to detect keypoints on human faces to animal faces (which is sub-optimal since human and animal faces can look quite different), we propose to first adapt the animal images to the pre-trained human detection network by correcting for the differences in animal and human face shape.
3 code implementations • ICCV 2017 • Krishna Kumar Singh, Yong Jae Lee
We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos.
Ranked #21 on Weakly Supervised Action Localization on THUMOS 2014
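The core augmentation is simple to sketch: divide each training image into a grid and hide patches at random, forcing the network to look beyond the most discriminative part. The paper fills hidden patches with the dataset mean pixel; the fill value below is a placeholder:

```python
import torch

def hide_patches(img, grid=4, p_hide=0.5, fill=0.0):
    """Hide-and-Seek style augmentation: split the image into a grid and
    hide each patch independently with probability p_hide.

    img: (C, H, W) tensor; H and W assumed divisible by grid.
    fill: value for hidden patches (the paper uses the dataset mean).
    """
    C, H, W = img.shape
    ph, pw = H // grid, W // grid
    out = img.clone()
    for i in range(grid):
        for j in range(grid):
            if torch.rand(1).item() < p_hide:
                out[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = fill
    return out
```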
no code implementations • 9 Aug 2016 • Krishna Kumar Singh, Yong Jae Lee
We propose an end-to-end deep convolutional network to simultaneously localize and rank relative visual attributes, given only weakly-supervised pairwise image comparisons.
no code implementations • CVPR 2016 • Fanyi Xiao, Yong Jae Lee
We present an unsupervised approach that generates a diverse, ranked set of bounding box and segmentation video object proposals (spatio-temporal tubes that localize the foreground objects) in an unannotated video.
no code implementations • CVPR 2016 • Krishna Kumar Singh, Fanyi Xiao, Yong Jae Lee
The status quo approach to training object detectors requires expensive bounding box annotations.
no code implementations • ICCV 2015 • Fanyi Xiao, Yong Jae Lee
We present a weakly-supervised approach that discovers the spatial extent of relative attributes, given only pairs of ordered images.
no code implementations • CVPR 2015 • Tinghui Zhou, Yong Jae Lee, Stella X. Yu, Alyosha A. Efros
Given a set of poorly aligned images of the same visual concept without any annotations, we propose an algorithm to jointly bring them into pixel-wise correspondence by estimating a FlowWeb representation of the image set.
no code implementations • 18 May 2015 • Yong Jae Lee, Kristen Grauman
Our results on two egocentric video datasets show the method's promise relative to existing techniques for saliency and summarization.
no code implementations • NeurIPS 2014 • Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, Trevor Darrell
The increasing prominence of weakly labeled data nurtures a growing demand for object detection methods that can cope with minimal supervision.