Search Results for author: Xi Yin

Found 33 papers, 14 papers with code

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

no code implementations • 17 Nov 2023 • Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra

We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image.

Text-to-Video Generation Video Generation
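The two-step factorization described above can be sketched as a data flow. This is a toy illustration, not Emu Video's models: the generators `toy_t2i` and `toy_it2v` and the `FactorizedT2V` class are hypothetical placeholders, and "images" and "videos" are plain strings so that only the conditioning structure is visible.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins: an "image" is a string tag, a "video" is a list of frame tags.
Image = str
Video = List[str]


@dataclass
class FactorizedT2V:
    """Two-step text-to-video: text -> image, then (text, image) -> video."""
    text_to_image: Callable[[str], Image]
    image_text_to_video: Callable[[str, Image], Video]

    def generate(self, prompt: str) -> Video:
        first = self.text_to_image(prompt)              # step 1: condition on text only
        return self.image_text_to_video(prompt, first)  # step 2: condition on text and image


# Toy generators (hypothetical) that only show how the conditioning flows.
def toy_t2i(prompt: str) -> Image:
    return f"img({prompt})"


def toy_it2v(prompt: str, image: Image, num_frames: int = 3) -> Video:
    return [f"{image}@t{t}" for t in range(num_frames)]


pipeline = FactorizedT2V(toy_t2i, toy_it2v)
print(pipeline.generate("a cat"))  # ['img(a cat)@t0', 'img(a cat)@t1', 'img(a cat)@t2']
```

Splitting the pipeline this way means the second model never has to invent the scene from text alone; it only has to animate an image it is given.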

MaLP: Manipulation Localization Using a Proactive Scheme

1 code implementation • CVPR 2023 • Vishal Asnani, Xi Yin, Tal Hassner, Xiaoming Liu

Finally, we show that MaLP can be used as a discriminator to improve the generation quality of generative models (GMs).

Attribute

SpaText: Spatio-Textual Representation for Controllable Image Generation

no code implementations • CVPR 2023 • Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin

Due to the lack of large-scale datasets that have a detailed textual description for each region in the image, we choose to leverage the current large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, and show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based.

Text-to-Image Generation

Make-A-Video: Text-to-Video Generation without Text-Video Data

2 code implementations • 29 Sep 2022 • Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Ranked #3 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)

Image Generation Super-Resolution +2

Proactive Image Manipulation Detection

1 code implementation • CVPR 2022 • Vishal Asnani, Xi Yin, Tal Hassner, Sijia Liu, Xiaoming Liu

That is, a template-protected real image and its manipulated version are better discriminated than the original real image and its manipulated version.

Image Manipulation Image Manipulation Detection
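To make "template-protected real image" concrete, here is a minimal sketch assuming the template is added to the image with a small fixed strength and the result is clipped to the valid pixel range. The additive form, the `strength` value, and the random template are assumptions for illustration; in the paper the template is learned, and the detector that exploits it is not shown here.

```python
import numpy as np


def protect(image: np.ndarray, template: np.ndarray, strength: float = 0.05) -> np.ndarray:
    """Return a template-protected copy of `image` (pixels in [0, 1]).

    Assumed scheme: additive template scaled by `strength`, then clipped.
    """
    if image.shape != template.shape:
        raise ValueError("template must match image shape")
    return np.clip(image + strength * template, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.8, size=(4, 4, 3))   # toy "real image"
template = rng.standard_normal(size=(4, 4, 3))   # hypothetical template (learned in the paper)
template /= np.abs(template).max()              # scale to [-1, 1]

protected = protect(image, template)
# With pixels in [0.2, 0.8] nothing is clipped, so the change is exactly the scaled template.
print(np.allclose(protected - image, 0.05 * template))  # True
```

Because the perturbation is bounded by `strength`, the protected image stays visually close to the original while carrying a known signal that a manipulation would disturb.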

Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images

1 code implementation • 15 Jun 2021 • Vishal Asnani, Xi Yin, Tal Hassner, Xiaoming Liu

To tackle this problem, we propose a framework with two components: a Fingerprint Estimation Network (FEN), which estimates a GM fingerprint from a generated image by training with four constraints to encourage the fingerprint to have desired properties, and a Parsing Network (PN), which predicts network architecture and loss functions from the estimated fingerprints.

DeepFake Detection Face Swapping

A Multiplexed Network for End-to-End, Multilingual OCR

1 code implementation • CVPR 2021 • Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner

Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results.

Optical Character Recognition (OCR) Text Detection

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

1 code implementation • CVPR 2021 • Vítor Albiero, Xingyu Chen, Xi Yin, Guan Pang, Tal Hassner

Tests on AFLW2000-3D and BIWI show that our method runs in real time and outperforms state-of-the-art (SotA) face pose estimators.

3D Face Alignment Face Alignment +3

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

1 code implementation • CVPR 2021 • Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

Due to this aligned representation learning, even when pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4% compared with a non-TAP baseline.

Language Modelling Masked Language Modeling +4

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

no code implementations • 28 Sep 2020 • Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).

Image Captioning Object +1

Hashing-based Non-Maximum Suppression for Crowded Object Detection

1 code implementation • 22 May 2020 • Jianfeng Wang, Xi Yin, Lijuan Wang, Lei Zhang

Using intersection-over-union (IoU) as the metric, we propose a simple yet effective hashing algorithm, named IoUHash, which guarantees that boxes within the same cell are close enough, as measured by a lower bound on their IoU.

object-detection Object Detection +1
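The general idea of hashing boxes into cells so that non-maximum suppression only compares boxes sharing a cell can be sketched as follows. This is a generic, simplified bucketing scheme, not IoUHash itself: the `cell_key` hash (quantized log-size plus center quantized relative to the size bin) and the `pos_step` and `scale_step` parameters are hypothetical, and the sketch does not reproduce the paper's IoU lower-bound guarantee.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cell_key(box: Box, pos_step: float = 0.5, scale_step: float = 0.5) -> Tuple[int, int, int, int]:
    """Hypothetical hash: bin log-width/height, then bin the center in units of the bin size."""
    w, h = box[2] - box[0], box[3] - box[1]
    kw = math.floor(math.log(w) / scale_step)
    kh = math.floor(math.log(h) / scale_step)
    ref_w, ref_h = math.exp(kw * scale_step), math.exp(kh * scale_step)
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return (kw, kh, math.floor(cx / (pos_step * ref_w)), math.floor(cy / (pos_step * ref_h)))


def hash_nms(boxes: List[Box], scores: List[float], thresh: float = 0.5) -> List[int]:
    """Greedy NMS that only compares a box with already-kept boxes in its own cell."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept_per_cell: Dict[Tuple[int, int, int, int], List[int]] = defaultdict(list)
    keep: List[int] = []
    for i in order:
        bucket = kept_per_cell[cell_key(boxes[i])]
        if all(iou(boxes[i], boxes[j]) <= thresh for j in bucket):
            bucket.append(i)
            keep.append(i)
    return sorted(keep)


boxes = [(0, 0, 10, 10), (0.5, 0.5, 10.5, 10.5), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(hash_nms(boxes, scores))  # [0, 2]
```

The cost of each comparison step now depends on the number of boxes in one cell rather than on all kept boxes. The trade-off in this simplified version is that two overlapping boxes that fall into neighboring cells are never compared, which is exactly the kind of gap a scheme with a provable IoU bound per cell is meant to control.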

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)

Image Captioning Image Retrieval +3

FAN: Feature Adaptation Network for Surveillance Face Recognition and Normalization

no code implementations • 26 Nov 2019 • Xi Yin, Ying Tai, Yuge Huang, Xiaoming Liu

FAN can leverage both paired and unpaired data because we disentangle the features into identity and non-identity components and adapt the distribution of the identity features, which removes a limitation of current face super-resolution methods.

Face Recognition Super-Resolution

Feature Transfer Learning for Face Recognition With Under-Represented Data

no code implementations • CVPR 2019 • Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker

In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.

Disentanglement Face Recognition +1
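The core "center-based" transfer can be illustrated with a simple assumption: a regular subject's intra-class variation (each feature minus that subject's center) is added to the center of an under-represented subject. The function name `transfer_features` and the toy data are hypothetical; this sketch shows only that center-shift idea and omits whatever else the full framework involves.

```python
import numpy as np


def transfer_features(ur_center: np.ndarray, regular_feats: np.ndarray) -> np.ndarray:
    """Shift a regular subject's intra-class variation onto an under-represented center.

    regular_feats: (n, d) features of one subject with many diverse samples.
    ur_center:     (d,) feature center of a subject with few samples.
    """
    regular_center = regular_feats.mean(axis=0)
    return ur_center + (regular_feats - regular_center)


rng = np.random.default_rng(1)
regular = rng.normal(loc=2.0, scale=0.5, size=(100, 8))  # many diverse samples
ur_center = rng.normal(size=8)                            # center of a subject with few samples
augmented = transfer_features(ur_center, regular)
print(augmented.shape)  # (100, 8)
```

The augmented features keep the under-represented subject's center (its identity) while borrowing the spread of the regular subject, which is what "augment the feature space" amounts to under this assumption.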

Feature Transfer Learning for Deep Face Recognition with Under-Represented Data

no code implementations • 23 Mar 2018 • Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker

In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.

Disentanglement Face Recognition +1

Illuminating Pedestrians via Simultaneous Detection & Segmentation

2 code implementations • ICCV 2017 • Garrick Brazil, Xi Yin, Xiaoming Liu

When placed properly, the additional supervision helps guide features in shared layers to become more sophisticated and helpful for the downstream pedestrian detector.

Autonomous Driving Pedestrian Detection +2

Representation Learning by Rotating Your Faces

no code implementations • 31 May 2017 • Luan Tran, Xi Yin, Xiaoming Liu

First, the encoder-decoder structure of the generator enables DR-GAN to learn a representation that is both generative and discriminative, which can be used for face image synthesis and pose-invariant face recognition.

Face Recognition Generative Adversarial Network +4

Genus Two Modular Bootstrap

no code implementations • 16 May 2017 • Minjae Cho, Scott Collier, Xi Yin

We study the Virasoro conformal block decomposition of the genus two partition function of a two-dimensional CFT by expanding around a Z3-invariant Riemann surface that is a three-fold cover of the Riemann sphere branched at four points, and explore constraints from genus two modular invariance and unitarity.

High Energy Physics - Theory

Towards Large-Pose Face Frontalization in the Wild

no code implementations • ICCV 2017 • Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker

Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments.

3D Reconstruction Face Recognition +1

Multi-Task Convolutional Neural Network for Pose-Invariant Face Recognition

1 code implementation • 15 Feb 2017 • Xi Yin, Xiaoming Liu

First, we propose a multi-task Convolutional Neural Network (CNN) for face recognition where identity classification is the main task and pose, illumination, and expression estimations are the side tasks.

Face Recognition Multi-Task Learning +1
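A main-task/side-task setup like the one above is commonly trained with a weighted sum of per-task losses. The sketch below assumes every task is a classification problem scored with cross-entropy and uses fixed, hypothetical side-task weights; how the paper actually weights the side tasks is not shown here.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def multitask_loss(outputs: dict, labels: dict, side_weights: dict) -> float:
    """Identity loss (main task) plus weighted side-task losses."""
    loss = cross_entropy(outputs["identity"], labels["identity"])
    for task, weight in side_weights.items():
        loss += weight * cross_entropy(outputs[task], labels[task])
    return loss


# Toy head outputs: uniform logits, so each task's loss is log(num_classes).
outputs = {
    "identity": np.zeros(10),
    "pose": np.zeros(3),
    "illumination": np.zeros(5),
    "expression": np.zeros(4),
}
labels = {"identity": 2, "pose": 0, "illumination": 1, "expression": 3}
side_weights = {"pose": 0.1, "illumination": 0.1, "expression": 0.1}  # hypothetical weights
print(multitask_loss(outputs, labels, side_weights))
```

Keeping the side-task weights small lets pose, illumination, and expression shape the shared features without overriding the identity objective.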

Bootstrapping the Spectral Function: On the Uniqueness of Liouville and the Universality of BTZ

no code implementations • 1 Feb 2017 • Scott Collier, Petr Kravchuk, Ying-Hsuan Lin, Xi Yin

We introduce spectral functions that capture the distribution of OPE coefficients and density of states in two-dimensional conformal field theories, and show that nontrivial upper and lower bounds on the spectral function can be obtained from semidefinite programming.

High Energy Physics - Theory

(2,2) Superconformal Bootstrap in Two Dimensions

no code implementations • 17 Oct 2016 • Ying-Hsuan Lin, Shu-Heng Shao, Yifan Wang, Xi Yin

We find a simple relation between two-dimensional BPS N=2 superconformal blocks and bosonic Virasoro conformal blocks, which allows us to analyze the crossing equations for BPS 4-point functions in unitary (2, 2) superconformal theories numerically with semidefinite programming.

High Energy Physics - Theory

Modular Bootstrap Revisited

no code implementations • 22 Aug 2016 • Scott Collier, Ying-Hsuan Lin, Xi Yin

We constrain the spectrum of two-dimensional unitary, compact conformal field theories with central charge c > 1 using modular bootstrap.

High Energy Physics - Theory Strongly Correlated Electrons

Joint Multi-Leaf Segmentation, Alignment and Tracking from Fluorescence Plant Videos

1 code implementation • 2 May 2015 • Xi Yin, Xiaoming Liu, Jin Chen, David M. Kramer

First, leaf segmentation and alignment are applied on the last frame of a plant video to find a number of well-aligned leaf candidates.
