Search Results for author: Pengchuan Zhang

Found 55 papers, 34 papers with code

Evaluating Text-to-Visual Generation with Image-to-Text Generation

2 code implementations • 1 Apr 2024 • Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

For instance, the widely-used CLIPScore measures the alignment between a (generated) image and text prompt, but it fails to produce reliable scores for complex prompts involving compositions of objects, attributes, and relations.

Question Answering Text Generation +2

The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

no code implementations • 15 Nov 2023 • Yifan Wu, Pengchuan Zhang, Wenhan Xiong, Barlas Oguz, James C. Gee, Yixin Nie

The study explores the effectiveness of the Chain-of-Thought approach, known for its proficiency in language tasks by breaking them down into sub-tasks and intermediate steps, in improving vision-language tasks that demand sophisticated perception and reasoning.

Ranked #1 on Visual Reasoning on Winoground

Visual Reasoning

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

1 code implementation • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we aim to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.

Ranked #10 on Visual Question Answering on BenchLMM

Language Modelling Large Language Model +4

24,940

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding

no code implementations • 20 Sep 2023 • Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, Ser-Nam Lim

While most modern video understanding models operate on short-range clips, real-world videos are often several minutes long with semantically consistent segments of variable length.

Temporal Action Localization Video Classification +1

UniVTG: Towards Unified Video-Language Temporal Grounding

1 code implementation • ICCV 2023 • Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou

Most methods in this direction develop task-specific models that are trained with type-specific labels, such as moment retrieval (time interval) and highlight detection (worthiness curve), which limits their ability to generalize to various VTG tasks and labels.

Ranked #3 on Natural Language Moment Retrieval on TACoS

Highlight Detection Moment Retrieval +3

282

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

1 code implementation • ICCV 2023 • Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang

Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks.

Ranked #1 on Video Summarization on Query-Focused Video Summarization Dataset

Action Recognition Moment Queries +4

Revisiting the Role of Language Priors in Vision-Language Models

1 code implementation • 2 Jun 2023 • Zhiqiu Lin, Xinyue Chen, Deepak Pathak, Pengchuan Zhang, Deva Ramanan

Our first observation is that they can be repurposed for discriminative tasks (such as image-text retrieval) by simply computing the match score of generating a particular text string given an image.

Ranked #45 on Visual Reasoning on Winoground

Image-text matching Language Modelling +6

Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality

no code implementations • 23 May 2023 • Harman Singh, Pengchuan Zhang, Qifan Wang, Mengjiao Wang, Wenhan Xiong, Jingfei Du, Yu Chen

Along with this, we propose novel negative mining techniques in the scene graph space for improving attribute binding and relation understanding.

Ranked #1 on Image Retrieval on CREPE (Compositional REPresentation Evaluation) (Recall@1 (HN-Comp, UC) metric)

Attribute Contrastive Learning +4

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • 31 Mar 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

We transfer the knowledge from the pre-trained CLIP-ViT-L/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences.

Image Classification

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • ICCV 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences.

Image Classification

Unifying Tracking and Image-Video Object Detection

no code implementations • 20 Nov 2022 • Peirong Liu, Rui Wang, Pengchuan Zhang, Omid Poursaeed, Yipin Zhou, Xuefei Cao, Sreya Dutta Roy, Ashish Shah, Ser-Nam Lim

We propose TrIVD (Tracking and Image-Video Detection), the first framework that unifies image OD, video OD, and MOT within one end-to-end model.

Multi-Object Tracking Object +2

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

1 code implementation • NeurIPS 2022 • Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, JianFeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang

Vision-language (VL) pre-training has recently received considerable attention.

Ranked #1 on Phrase Grounding on Flickr30k Entities Dev

Described Object Detection Image Captioning +5

123

GLIPv2: Unifying Localization and Vision-Language Understanding

1 code implementation • 12 Jun 2022 • Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao

We present GLIPv2, a grounded VL understanding model that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning).

Ranked #1 on Phrase Grounding on Flickr30k Entities Test (using extra training data)

Contrastive Learning Image Captioning +7

1,980

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

no code implementations • CVPR 2023 • Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, JianFeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

Detection Hub further achieves SoTA performance on the UODB benchmark with a wide variety of datasets.

Object object-detection +1

K-LITE: Learning Transferable Visual Models with External Knowledge

2 code implementations • 20 Apr 2022 • Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.

Benchmarking Descriptive +4

369

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

8 code implementations • 19 Apr 2022 • Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao

In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.

Ranked #1 on Object Detection on ELEVATER

Fairness Few-Shot Image Classification +4

1,980

Missingness Bias in Model Debugging

1 code implementation • ICLR 2022 • Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry

Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools.

Unified Contrastive Learning in Image-Text-Label Space

1 code implementation • CVPR 2022 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao

Particularly, it attains gains up to 9.2% and 14.5% on average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively.

Contrastive Learning Image Classification +2

369

Parameter-efficient Model Adaptation for Vision Transformers

2 code implementations • 29 Mar 2022 • Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang

In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task.

Benchmarking Classification +2

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

no code implementations • 3 Mar 2022 • Feng Li, Hao Zhang, Yi-Fan Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Pengchuan Zhang, Lei Zhang

This survey is inspired by the remarkable progress in both computer vision and natural language processing, and recent trends shifting from single modality processing to multiple modality comprehension.

Few-Shot Learning Representation Learning

RegionCLIP: Region-based Language-Image Pretraining

1 code implementation • CVPR 2022 • Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao

However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans.

Ranked #11 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Image Classification Object +3

650

Grounded Language-Image Pre-training

2 code implementations • CVPR 2022 • Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao

The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.

Ranked #1 on 2D Object Detection on RF100

Described Object Detection Few-Shot Object Detection +1

1,980

Focal Attention for Long-Range Interactions in Vision Transformers

1 code implementation • NeurIPS 2021 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal attention, we propose a new variant of Vision Transformer models, called Focal Transformers, which achieve superior performance over the state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks.

Image Classification object-detection +2

542

Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Ranked #1 on Action Recognition In Videos on Kinetics-600

Action Classification Action Recognition In Videos +12

369

An Empirical Study of Training End-to-End Vision-and-Language Transformers

2 code implementations • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng

Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.

Ranked #20 on Cross-Modal Retrieval on COCO 2014 (using extra training data)

Cross-Modal Retrieval Decoder +2

350

Image Scene Graph Generation (SGG) Benchmark

1 code implementation • 27 Jul 2021 • Xiaotian Han, Jianwei Yang, Houdong Hu, Lei Zhang, Jianfeng Gao, Pengchuan Zhang

There is a surge of interest in image scene graph generation (object, attribute and relationship detection) due to the need of building fine-grained image understanding models that go beyond object detection.

Attribute Graph Generation +6

376

Focal Self-attention for Local-Global Interactions in Vision Transformers

3 code implementations • 1 Jul 2021 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal self-attention, we propose a new variant of Vision Transformer models, called Focal Transformer, which achieves superior performance over the state-of-the-art vision Transformers on a range of public image classification and object detection benchmarks.

Ranked #17 on Instance Segmentation on COCO test-dev

Image Classification Instance Segmentation +3

1,188

Efficient Self-supervised Vision Transformers for Representation Learning

1 code implementation • ICLR 2022 • Chunyuan Li, Jianwei Yang, Pengchuan Zhang, Mei Gao, Bin Xiao, Xiyang Dai, Lu Yuan, Jianfeng Gao

This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning.

Ranked #16 on Self-Supervised Image Classification on ImageNet

Representation Learning Self-Supervised Image Classification

403

3DB: A Framework for Debugging Computer Vision Models

1 code implementation • 7 Jun 2021 • Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry

We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation.

123

Multiscale Invertible Generative Networks for High-Dimensional Bayesian Inference

no code implementations • 12 May 2021 • Shumao Zhang, Pengchuan Zhang, Thomas Y. Hou

We propose a Multiscale Invertible Generative Network (MsIGN) and associated training algorithm that leverages multiscale structure to solve high-dimensional Bayesian inference.

Bayesian Inference Image Generation +1

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

3 code implementations • ICCV 2021 • Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao

This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision Longformer, which significantly enhances the ViT of Dosovitskiy et al. (2020) for encoding high-resolution images using two techniques.

Ranked #45 on Instance Segmentation on COCO minival

Image Classification Instance Segmentation +2

403

Out-of-distribution Prediction with Invariant Risk Minimization: The Limitation and An Effective Fix

no code implementations • 16 Jan 2021 • Ruocheng Guo, Pengchuan Zhang, Hao Liu, Emre Kiciman

Nevertheless, we find that the performance of IRM can be dramatically degraded under strong $\Lambda$ spuriousness, that is, when the spurious correlation between the spurious features and the class label is strong due to the strong causal influence of their common cause, the domain label, on both of them (see Fig.

VinVL: Revisiting Visual Representations in Vision-Language Models

7 code implementations • CVPR 2021 • Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao

In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model, Oscar (Li et al., 2020), and utilize an improved approach, Oscar+, to pre-train the VL model and fine-tune it on a wide range of downstream VL tasks.

Ranked #2 on Image-text matching on CommercialAdsDataset

Image Captioning Image-text matching +4

1,031

Dynamic DETR: End-to-End Object Detection With Dynamic Attention

no code implementations • ICCV 2021 • Xiyang Dai, Yinpeng Chen, Jianwei Yang, Pengchuan Zhang, Lu Yuan, Lei Zhang

To mitigate the second limitation of learning difficulty, we introduce a dynamic decoder by replacing the cross-attention module with a ROI-based dynamic attention in the Transformer decoder.

Decoder object-detection +1

MiniVLM: A Smaller and Faster Vision-Language Model

no code implementations • 13 Dec 2020 • JianFeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

We design a Two-stage Efficient feature Extractor (TEE), inspired by the one-stage EfficientDet network, to significantly reduce the time cost of visual feature extraction by 95%, compared to a baseline model.

Language Modelling

MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network

no code implementations • 3 Oct 2020 • Yi Wei, Zhe Gan, Wenbo Li, Siwei Lyu, Ming-Ching Chang, Lei Zhang, Jianfeng Gao, Pengchuan Zhang

We present Mask-guided Generative Adversarial Network (MagGAN) for high-resolution face attribute editing, in which semantic facial masks from a pre-trained face parser are used to guide the fine-grained image editing process.

Attribute Generative Adversarial Network +1

Training Sparse Neural Networks using Compressed Sensing

1 code implementation • 21 Aug 2020 • Jonathan W. Siegel, Jianhong Chen, Pengchuan Zhang, Jinchao Xu

The adaptive weighting we introduce corresponds to a novel regularizer based on the logarithm of the absolute value of the weights.

Novel Human-Object Interaction Detection via Adversarial Domain Generalization

no code implementations • 22 May 2020 • Yuhang Song, Wenbo Li, Lei Zhang, Jianwei Yang, Emre Kiciman, Hamid Palangi, Jianfeng Gao, C.-C. Jay Kuo, Pengchuan Zhang

We study in this paper the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios.

Domain Generalization Human-Object Interaction Detection +1

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)

Image Captioning Image Retrieval +3

1,214

Object-Centric Image Generation from Layouts

no code implementations • 16 Mar 2020 • Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R. Devon Hjelm, Shikhar Sharma

In this paper, we start with the idea that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes well.

Ranked #1 on Layout-to-Image Generation on COCO-Stuff 64x64

Generative Adversarial Network Layout-to-Image Generation +1

Statistical Adaptive Stochastic Gradient Methods

1 code implementation • 25 Feb 2020 • Pengchuan Zhang, Hunter Lang, Qiang Liu, Lin Xiao

We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods.

Scheduling

Understanding the Role of Momentum in Stochastic Gradient Methods

1 code implementation • NeurIPS 2019 • Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao

The use of momentum in stochastic gradient methods has become a widespread practice in machine learning.

Stochastic Optimization

Statistical Adaptive Stochastic Optimization

no code implementations • 25 Sep 2019 • Pengchuan Zhang, Hunter Lang, Qiang Liu, Lin Xiao

We investigate statistical methods for automatically scheduling the learning rate (step size) in stochastic optimization.

Scheduling Stochastic Optimization

Using Statistics to Automate Stochastic Optimization

no code implementations • NeurIPS 2019 • Hunter Lang, Pengchuan Zhang, Lin Xiao

Despite the development of numerous adaptive optimizers, tuning the learning rate of stochastic gradient methods remains a major roadblock to obtaining good practical performance in machine learning.

Stochastic Optimization

TIGEr: Text-to-Image Grounding for Image Caption Evaluation

1 code implementation • IJCNLP 2019 • Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao

This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems.

Image Captioning Text Matching

Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers

3 code implementations • NeurIPS 2019 • Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, Sebastien Bubeck

In this paper, we employ adversarial training to improve the performance of randomized smoothing.

Adversarial Attack Adversarial Defense

221

Object-driven Text-to-Image Synthesis via Adversarial Training

1 code implementation • CVPR 2019 • Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, Jianfeng Gao

In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow object-centered text-to-image synthesis for complex scenes.

Image Generation Object

283

A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks

3 code implementations • NeurIPS 2019 • Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, Pengchuan Zhang

This framework works for neural networks with diverse architectures and nonlinearities and covers both primal and dual views of robustness verification.

206

Towards Coherent and Cohesive Long-form Text Generation

no code implementations • WS 2019 • Woon Sang Cho, Pengchuan Zhang, Yizhe Zhang, Xiujun Li, Michel Galley, Chris Brockett, Mengdi Wang, Jianfeng Gao

Generating coherent and cohesive long-form texts is a challenging task.

Language Modelling Sentence +1

RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications

4 code implementations • 28 Oct 2018 • Huan Zhang, Pengchuan Zhang, Cho-Jui Hsieh

The Jacobian matrix (or the gradient for single-output networks) is directly related to many important properties of neural networks, such as the function landscape, stationary points, (local) Lipschitz constants and robustness to adversarial attacks.

A bird's eye view on coherence, and a worm's eye view on cohesion

no code implementations • 27 Sep 2018 • Woon Sang Cho, Pengchuan Zhang, Yizhe Zhang, Xiujun Li, Mengdi Wang, Jianfeng Gao

Generating coherent and cohesive long-form texts is a challenging problem in natural language generation.

Language Modelling Text Generation

Turbo Learning for Captionbot and Drawingbot

no code implementations • NeurIPS 2018 • Qiuyuan Huang, Pengchuan Zhang, Dapeng Wu, Lei Zhang

We study in this paper the problems of both image captioning and text-to-image generation, and present a novel turbo learning approach to jointly training an image-to-text generator (a.k.a.

Image Captioning Text Generation +1

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

19 code implementations • CVPR 2018 • Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He

In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation.

Ranked #1 on Text-to-Image Generation on MS-COCO

Generative Adversarial Network Image-text matching +2

1,321

On the Discrimination-Generalization Tradeoff in GANs

no code implementations • ICLR 2018 • Pengchuan Zhang, Qiang Liu, Dengyong Zhou, Tao Xu, Xiaodong He

When evaluated with neural distance, our bounds show that generalization is guaranteed as long as the discriminator set is small enough, regardless of the size of the generator or hypothesis set.

Generalization Bounds

A sparse decomposition of low rank symmetric positive semi-definite matrices

1 code implementation • 3 Jul 2016 • Thomas Y. Hou, Qin Li, Pengchuan Zhang

In this paper, we partition the indices from 1 to $N$ into several patches and propose to quantify the sparseness of a vector by the number of patches on which it is nonzero, which is called patch-wise sparseness.

Numerical Analysis
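
The patch-wise sparseness described in the abstract above can be illustrated with a minimal sketch. The function name, the 0-based indexing, and the optional tolerance are assumptions made for this illustration; none of them come from the paper or its code.

```python
from typing import Sequence


def patchwise_sparseness(
    vector: Sequence[float],
    patches: Sequence[Sequence[int]],
    tol: float = 0.0,
) -> int:
    """Count the patches on which `vector` has at least one nonzero entry.

    `patches` is a partition of the index set; `tol` is a hypothetical
    threshold below which an entry is treated as zero.
    """
    return sum(
        1
        for patch in patches
        if any(abs(vector[i]) > tol for i in patch)
    )


# Indices are 0-based here (the abstract counts from 1 to N).
patches = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
v = [0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.5]
print(patchwise_sparseness(v, patches))  # 2
```

Here the vector is nonzero on the first and third patches only, so its patch-wise sparseness is 2 even though it has nine entries.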
