no code implementations • 21 Nov 2023 • Shufa Wei, Xiaolong Xu, Xianbiao Qi, Xi Yin, Jun Xia, Jingyi Ren, Peijun Tang, Yuxiang Zhong, Yihao Chen, Xiaoqin Ren, Yuxin Liang, Liankai Huang, Kai Xie, Weikang Gui, Wei Tan, Shuanglong Sun, Yongquan Hu, Qinxian Liu, Nanjin Li, Chihao Dai, Lihua Wang, Xiaohui Liu, Lei Zhang, Yutao Xie
Our training corpus mainly consists of academic papers, theses, content from academic domains, high-quality Chinese data, and other sources.
no code implementations • 17 Nov 2023 • Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra
We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image.
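The two-step factorization described above can be sketched as a simple pipeline. This is an illustrative sketch only: `t2i` and `ti2v` are hypothetical stand-ins for the two conditioned generation models, not functions from the paper's codebase.

```python
def factorized_t2v(text, t2i, ti2v):
    """Factorized text-to-video generation (illustrative sketch):
    step 1: generate an image from the text prompt;
    step 2: generate a video conditioned on both the text and that image.
    `t2i` and `ti2v` stand in for the two generation models."""
    image = t2i(text)           # text -> image
    video = ti2v(text, image)   # (text, image) -> video frames
    return video
```

The point of the factorization is that each step is a simpler conditional generation problem than direct text-to-video.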
no code implementations • 23 Jun 2023 • Mingyu Cui, Jiawen Kang, Jiajun Deng, Xi Yin, Yutao Xie, Xie Chen, Xunying Liu
Current ASR systems are mainly trained and evaluated at the utterance level.
no code implementations • 17 Apr 2023 • Jie An, Songyang Zhang, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo, Xi Yin
In contrast, we propose a parameter-free temporal shift module that can leverage the spatial U-Net as is for video generation.
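A parameter-free temporal shift can be illustrated with the standard TSM-style operation: a fraction of channels is shifted forward along the time axis, a fraction backward, and the rest left untouched, mixing temporal information with no learned weights. The paper's exact module may differ; this NumPy sketch only shows the generic technique, with an illustrative `fold_div`.

```python
import numpy as np

def temporal_shift(x, fold_div=4):
    """Parameter-free temporal shift (TSM-style, illustrative).
    x: (T, C, H, W) video feature tensor. Shifts C//fold_div channels
    forward in time, the next C//fold_div backward, and copies the rest."""
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                # shift forward in time
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]  # shift backward in time
    out[:, 2 * fold:] = x[:, 2 * fold:]           # remaining channels unchanged
    return out
```

Because the shift has no parameters, it can be inserted into a spatial U-Net without changing its weights, which is what makes reusing the image model "as is" possible.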
1 code implementation • CVPR 2023 • Vishal Asnani, Xi Yin, Tal Hassner, Xiaoming Liu
Finally, we show that MaLP can be used as a discriminator for improving the generation quality of GMs.
no code implementations • CVPR 2023 • Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin
Because no large-scale dataset provides a detailed textual description for each region in an image, we leverage existing large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, showing its effectiveness on two state-of-the-art diffusion models: one pixel-based and one latent-based.
2 code implementations • 29 Sep 2022 • Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman
We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).
Ranked #3 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)
2 code implementations • 17 Apr 2022 • Thomas Hayes, Songyang Zhang, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, Devi Parikh
Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.
1 code implementation • 7 Apr 2022 • Songwei Ge, Thomas Hayes, Harry Yang, Xi Yin, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh
Videos are created to express emotion, exchange information, and share experiences.
Ranked #15 on Video Generation on UCF-101
1 code implementation • CVPR 2022 • Vishal Asnani, Xi Yin, Tal Hassner, Sijia Liu, Xiaoming Liu
That is, a template-protected real image and its manipulated version are more easily discriminated from each other than the original real image is from its manipulated counterpart.
no code implementations • 29 Sep 2021 • Samrudhdhi Bharatkumar Rangrej, Kevin J Liang, Xi Yin, Guan Pang, Theofanis Karaletsos, Lior Wolf, Tal Hassner
Few-shot learning (FSL) methods aim to generalize a model to new unseen classes using only a small number of support examples.
1 code implementation • 15 Jun 2021 • Vishal Asnani, Xi Yin, Tal Hassner, Xiaoming Liu
To tackle this problem, we propose a framework with two components: a Fingerprint Estimation Network (FEN), which estimates a GM fingerprint from a generated image by training with four constraints to encourage the fingerprint to have desired properties, and a Parsing Network (PN), which predicts network architecture and loss functions from the estimated fingerprints.
1 code implementation • CVPR 2021 • Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner
Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results.
no code implementations • 16 Mar 2021 • Yiying Yang, Xi Yin, Haiqin Yang, Xingjian Fei, Hao Peng, Kaijie Zhou, Kunfeng Lai, Jianping Shen
Entity synonym discovery is crucial for entity-leveraging applications.
1 code implementation • CVPR 2021 • Vítor Albiero, Xingyu Chen, Xi Yin, Guan Pang, Tal Hassner
Tests on AFLW2000-3D and BIWI show that our method runs in real time and outperforms state-of-the-art (SotA) face pose estimators.
Ranked #5 on Head Pose Estimation on BIWI
1 code implementation • CVPR 2021 • Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
Due to this aligned representation learning, even when pre-trained on the same downstream task dataset, TAP already boosts absolute accuracy on the TextVQA dataset by +5.4% over a non-TAP baseline.
no code implementations • 28 Sep 2020 • Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).
Ranked #3 on Image Captioning on nocaps-XD out-of-domain
1 code implementation • 22 May 2020 • Jianfeng Wang, Xi Yin, Lijuan Wang, Lei Zhang
Considering the intersection-over-union (IoU) as the metric, we propose a simple yet effective hashing algorithm, named IoUHash, which guarantees that the boxes within the same cell are close enough by a lower IoU bound.
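The IoU metric referred to above is standard and can be stated concretely; the hashing step below is only a toy quantization (center position plus log-scale) to illustrate the general idea of bucketing similar boxes into cells, not the paper's actual IoUHash construction or its IoU lower-bound guarantee.

```python
import math

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def toy_box_hash(box, pos_step=4.0):
    """Toy spatial hash (NOT the paper's IoUHash): quantize a box's center
    and log2 width/height so near-identical boxes fall in the same cell."""
    x1, y1, x2, y2 = box
    cx, cy, w, h = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
    return (int(cx // pos_step), int(cy // pos_step),
            round(math.log2(w)), round(math.log2(h)))
```

Quantizing position and scale together is what lets boxes sharing a cell stay geometrically close, which is the property an IoU-based hash needs to guarantee.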
4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao
Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.
Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)
no code implementations • 26 Nov 2019 • Xi Yin, Ying Tai, Yuge Huang, Xiaoming Liu
FAN can leverage both paired and unpaired data: we disentangle the features into identity and non-identity components and adapt the distribution of the identity features, which removes a key limitation of current face super-resolution methods.
no code implementations • CVPR 2019 • Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker
In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.
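The core idea of center-based feature transfer can be sketched in a few lines: take a regular class's intra-class variation (offsets of its samples from its own center) and re-apply that variation around an under-represented class's center. This is an illustrative simplification under that reading, not the paper's full framework.

```python
import numpy as np

def transfer_variation(ur_center, reg_feats):
    """Center-based feature transfer (illustrative sketch):
    borrow a regular class's intra-class offsets and recenter them
    on an under-represented class's center to augment its feature space."""
    reg_center = reg_feats.mean(axis=0)       # center of the regular class
    return ur_center + (reg_feats - reg_center)  # transferred samples
```

By construction the transferred samples average to the under-represented center, so the augmentation enlarges that class's spread without moving its identity.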
no code implementations • CVPR 2019 • Ziyuan Zhang, Luan Tran, Xi Yin, Yousef Atoum, Xiaoming Liu, Jian Wan, Nanxin Wang
Most existing gait recognition methods take silhouettes or articulated body models as gait features.
no code implementations • 23 Mar 2018 • Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker
In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.
no code implementations • CVPR 2017 • Luan Tran, Xi Yin, Xiaoming Liu
The large pose discrepancy between two face images is one of the key challenges in face recognition.
2 code implementations • ICCV 2017 • Garrick Brazil, Xi Yin, Xiaoming Liu
When placed properly, the additional supervision helps guide features in shared layers to become more sophisticated and helpful for the downstream pedestrian detector.
Ranked #20 on Pedestrian Detection on Caltech
no code implementations • 31 May 2017 • Luan Tran, Xi Yin, Xiaoming Liu
First, the encoder-decoder structure of the generator enables DR-GAN to learn a representation that is both generative and discriminative, which can be used for face image synthesis and pose-invariant face recognition.
no code implementations • 16 May 2017 • Minjae Cho, Scott Collier, Xi Yin
We study the Virasoro conformal block decomposition of the genus two partition function of a two-dimensional CFT by expanding around a Z3-invariant Riemann surface that is a three-fold cover of the Riemann sphere branched at four points, and explore constraints from genus two modular invariance and unitarity.
High Energy Physics - Theory
no code implementations • ICCV 2017 • Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker
Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments.
1 code implementation • 15 Feb 2017 • Xi Yin, Xiaoming Liu
First, we propose a multi-task Convolutional Neural Network (CNN) for face recognition where identity classification is the main task and pose, illumination, and expression estimations are the side tasks.
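The main-task-plus-side-tasks setup above amounts to a weighted sum of per-head losses. A minimal sketch, assuming softmax cross-entropy heads and an illustrative side-task weight (the actual weighting scheme is the paper's, not reproduced here):

```python
import math

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def multitask_loss(id_logits, id_label, side_heads, side_weight=0.1):
    """Identity classification is the main task; pose, illumination and
    expression heads contribute down-weighted side losses.
    `side_weight=0.1` is illustrative, not the paper's value."""
    main = cross_entropy(id_logits, id_label)
    side = sum(cross_entropy(lg, lb) for lg, lb in side_heads)
    return main + side_weight * side
```

Down-weighting the side tasks keeps identity as the dominant training signal while the side heads regularize the shared features.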
no code implementations • 1 Feb 2017 • Scott Collier, Petr Kravchuk, Ying-Hsuan Lin, Xi Yin
We introduce spectral functions that capture the distribution of OPE coefficients and density of states in two-dimensional conformal field theories, and show that nontrivial upper and lower bounds on the spectral function can be obtained from semidefinite programming.
High Energy Physics - Theory
no code implementations • 17 Oct 2016 • Ying-Hsuan Lin, Shu-Heng Shao, Yifan Wang, Xi Yin
We find a simple relation between two-dimensional BPS N=2 superconformal blocks and bosonic Virasoro conformal blocks, which allows us to analyze the crossing equations for BPS 4-point functions in unitary (2, 2) superconformal theories numerically with semidefinite programming.
High Energy Physics - Theory
no code implementations • 22 Aug 2016 • Scott Collier, Ying-Hsuan Lin, Xi Yin
We constrain the spectrum of two-dimensional unitary, compact conformal field theories with central charge c > 1 using modular bootstrap.
High Energy Physics - Theory • Strongly Correlated Electrons
1 code implementation • 2 May 2015 • Xi Yin, Xiaoming Liu, Jin Chen, David M. Kramer
First, leaf segmentation and alignment are applied on the last frame of a plant video to find a number of well-aligned leaf candidates.