Search Results for author: Zixiang Chen

Found 20 papers, 5 papers with code

Guided Discrete Diffusion for Electronic Health Record Generation

no code implementations • 18 Apr 2024 • Zixiang Chen, Jun Han, YongQian Li, Yiwen Kou, Eran Halperin, Robert E. Tillman, Quanquan Gu

Electronic health records (EHRs) are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research.

Data Augmentation

Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent

no code implementations • 18 Apr 2024 • Yiwen Kou, Zixiang Chen, Quanquan Gu, Sham M. Kakade

We then demonstrate how a trained neural network with SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors.
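
As a toy illustration of the problem setting, the sketch below generates k-sparse parity data (the label is the product of k hidden ±1 coordinates) and trains a small two-layer ReLU network with plain SGD; the width, loss, and learning rate are illustrative choices, not the configuration analyzed in the paper.

```python
import numpy as np
import torch
import torch.nn as nn

# k-sparse parity: inputs are uniform +/-1 vectors in d dimensions; the label is
# the product of the coordinates in a fixed subset S of size k (unknown to the learner).
d, k, n = 10, 3, 2048
rng = np.random.default_rng(0)
S = rng.choice(d, size=k, replace=False)
X = rng.choice([-1.0, 1.0], size=(n, d))
y = X[:, S].prod(axis=1)                      # labels in {-1, +1}

X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32)

# Small two-layer ReLU network trained with plain SGD on the hinge loss.
model = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(300):
    perm = torch.randperm(n)
    for i in range(0, n, 64):
        idx = perm[i:i + 64]
        out = model(X_t[idx]).squeeze(-1)
        loss = torch.clamp(1 - y_t[idx] * out, min=0).mean()   # hinge loss
        opt.zero_grad()
        loss.backward()
        opt.step()

with torch.no_grad():
    acc = (model(X_t).squeeze(-1).sign() == y_t).float().mean().item()
print(f"training accuracy on the {k}-sparse parity task: {acc:.3f}")
```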

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

no code implementations • 15 Feb 2024 • Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu

Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs).

Reinforcement Learning (RL) • Text-to-Image Generation

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

2 code implementations • 2 Jan 2024 • Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu

In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data.

Fast Sampling via De-randomization for Discrete Diffusion Models

no code implementations • 14 Dec 2023 • Zixiang Chen, Huizhuo Yuan, YongQian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu

Despite the success of diffusion models in continuous spaces, discrete diffusion models, which apply to domains such as text and natural language, remain under-studied and often suffer from slow generation speed.

Image Generation • Machine Translation • +1

Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

3 code implementations • 7 Nov 2023 • Yihe Deng, Weitong Zhang, Zixiang Chen, Quanquan Gu

While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped.
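
The title points to the paper's core idea of having the model rephrase a question before answering it; below is a hedged, illustrative prompt template in that spirit (the wording and the single-step formulation are assumptions, not the paper's exact prompt).

```python
def rephrase_and_respond_prompt(question: str) -> str:
    """Illustrative one-step prompt in the spirit of 'Rephrase and Respond':
    ask the model to restate the question in its own words before answering.
    The wording is a hypothetical example, not the paper's exact template."""
    return (
        f"{question}\n"
        "Rephrase and expand the question above to make it unambiguous, "
        "then answer the rephrased question."
    )

# Example usage: the returned string would be sent to an LLM chat endpoint.
print(rephrase_and_respond_prompt("Was Ludwig van Beethoven born in an even month?"))
```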

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

no code implementations • 12 Oct 2023 • Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett

Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters.

In-Context Learning • regression
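
For readers unfamiliar with this setup, here is a minimal sketch of how an in-context linear-regression task is typically posed: each task draws its own weight vector, and the model sees labelled (x, y) pairs followed by a query input it must label without any parameter update. The dimensions and noise level are illustrative, not those studied in the paper.

```python
import numpy as np

def make_icl_regression_task(d=5, n_context=10, noise=0.1, rng=None):
    """Draw one linear-regression task and format it as an in-context episode:
    n_context labelled (x, y) pairs followed by a query input whose label the
    model must predict without any parameter update. Sizes/noise are illustrative."""
    rng = rng or np.random.default_rng()
    w = rng.normal(size=d)                        # task-specific weight vector
    X = rng.normal(size=(n_context + 1, d))       # context inputs plus one query
    y = X @ w + noise * rng.normal(size=n_context + 1)
    context = list(zip(X[:n_context], y[:n_context]))   # demonstration pairs
    return context, X[-1], y[-1]                  # (context, query_x, query_y)

context, query_x, query_y = make_icl_regression_task(rng=np.random.default_rng(0))
print(len(context), query_x.shape, float(query_y))
```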

Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP

no code implementations • 2 Oct 2023 • Zixiang Chen, Yihe Deng, Yuanzhi Li, Quanquan Gu

Multi-modal learning has become increasingly popular due to its ability to leverage information from different data sources (e.g., text and images) to improve model performance.

Image Generation • Representation Learning • +1

Benign Overfitting for Two-layer ReLU Convolutional Neural Networks

1 code implementation • 7 Mar 2023 • Yiwen Kou, Zixiang Chen, Yuanzhou Chen, Quanquan Gu

We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.

Vocal Bursts Valence Prediction

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

no code implementations • 3 Mar 2023 • Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron ignoring constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation.

regression • Vocal Bursts Intensity Prediction
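
The contrast drawn in the abstract is between two update rules for ReLU regression. The sketch below writes out the standard GLM-tron step (no ReLU-derivative factor) next to a plain SGD step on the squared loss, with illustrative symmetric ±1 data; the step size, initialization, and evaluation are assumptions for illustration only.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def glmtron_step(w, x, y, lr):
    """Standard GLM-tron update for ReLU regression: a squared-loss-style step
    WITHOUT the ReLU derivative factor."""
    return w + lr * (y - relu(x @ w)) * x

def sgd_step(w, x, y, lr):
    """Plain SGD on 0.5 * (relu(<w, x>) - y)^2: the gradient carries the extra
    indicator 1{<w, x> > 0}, so the step vanishes whenever the neuron is inactive."""
    return w + lr * (y - relu(x @ w)) * float(x @ w > 0) * x

def risk(w, X, Y):
    return float(np.mean((relu(X @ w) - Y) ** 2))

# Illustrative well-specified data with symmetric +/-1 inputs: y = relu(<w*, x>).
rng = np.random.default_rng(0)
d, n, lr = 10, 5000, 0.02
w_star = rng.normal(size=d)
w0 = 0.01 * rng.normal(size=d)
w_glm, w_sgd = w0.copy(), w0.copy()
for _ in range(n):
    x = rng.choice([-1.0, 1.0], size=d)
    y = relu(x @ w_star)
    w_glm = glmtron_step(w_glm, x, y, lr)
    w_sgd = sgd_step(w_sgd, x, y, lr)

X_test = rng.choice([-1.0, 1.0], size=(2000, d))
Y_test = relu(X_test @ w_star)
print("GLM-tron test risk:", risk(w_glm, X_test, Y_test))
print("SGD test risk:     ", risk(w_sgd, X_test, Y_test))
```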

ISA-Net: Improved spatial attention network for PET-CT tumor segmentation

no code implementations • 4 Nov 2022 • Zhengyong Huang, Sijuan Zou, Guoshuai Wang, Zixiang Chen, Hao Shen, HaiYan Wang, Na Zhang, Lu Zhang, Fan Yang, Haining Wang, Dong Liang, Tianye Niu, Xiaohua Zhu, Zhanli Hu

In this paper, we propose a deep learning segmentation method based on multimodal positron emission tomography-computed tomography (PET-CT), which combines the high sensitivity of PET with the precise anatomical information of CT. We design an improved spatial attention network (ISA-Net) to increase the accuracy of PET or CT in detecting tumors; it uses multi-scale convolution operations to extract feature information, highlighting tumor-region location information while suppressing non-tumor regions.

Segmentation • STS • +1
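
A minimal PyTorch sketch of a multi-scale spatial attention block in the spirit described above: parallel convolutions at several scales are fused into a per-pixel attention map that re-weights the input features. The kernel sizes, channel counts, and use of 2D convolutions are illustrative assumptions, not the ISA-Net configuration.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialAttention(nn.Module):
    """Illustrative spatial attention block: parallel convolutions at several
    scales are fused into a per-pixel attention map that re-weights the input,
    emphasizing likely tumor locations and suppressing background. All
    hyperparameters here are assumptions, not the ISA-Net configuration."""
    def __init__(self, channels: int, scales=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2) for k in scales]
        )
        self.fuse = nn.Conv2d(channels * len(scales), 1, kernel_size=1)

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        attention = torch.sigmoid(self.fuse(multi_scale))    # (N, 1, H, W) in [0, 1]
        return x * attention                                  # highlight / suppress locations

feats = torch.randn(2, 32, 64, 64)    # e.g. fused PET-CT feature maps
print(MultiScaleSpatialAttention(32)(feats).shape)   # torch.Size([2, 32, 64, 64])
```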

A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning

no code implementations • 30 Sep 2022 • Zixiang Chen, Chris Junchi Li, Angela Yuan, Quanquan Gu, Michael I. Jordan

With the increasing need for handling large state and action spaces, general function approximation has become a key technique in reinforcement learning (RL).

reinforcement-learning • Reinforcement Learning (RL)

Towards Understanding Mixture of Experts in Deep Learning

2 code implementations • 4 Aug 2022 • Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li

To our knowledge, this is the first result towards formally understanding the mechanism of the MoE layer for deep learning.
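
For context, here is a minimal sketch of the object being analyzed, a mixture-of-experts layer: a gating network scores the experts for each input, and the output is the gate-weighted combination of expert outputs. The dense (all-expert) routing and layer sizes below are illustrative choices, not the setting studied in the paper.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal mixture-of-experts layer: a gating network scores the experts for
    each input, and the output is the gate-weighted combination of expert outputs.
    Dense (all-expert) routing and the layer sizes are illustrative choices."""
    def __init__(self, d_in: int, d_out: int, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_in, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_in, d_out) for _ in range(n_experts)])

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)                  # (N, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)    # (N, d_out, n_experts)
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)            # gate-weighted mixture

print(TinyMoE(d_in=16, d_out=8)(torch.randn(5, 16)).shape)   # torch.Size([5, 8])
```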

Benign Overfitting in Two-layer Convolutional Neural Networks

no code implementations • 14 Feb 2022 • Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).

Vocal Bursts Valence Prediction

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

no code implementations • NeurIPS 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu

In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima.

Self-training Converts Weak Learners to Strong Learners in Mixture Models

no code implementations • 25 Jun 2021 • Spencer Frei, Difan Zou, Zixiang Chen, Quanquan Gu

We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension.

Binary Classification
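
A minimal sketch of the iterative self-training loop described above: start from the pseudolabeler, relabel the unlabeled data with ŷ = sgn(⟨β_t, x⟩), and refit. The Gaussian-mixture data and the logistic-loss refitting step below are illustrative assumptions, not the exact procedure analyzed in the paper.

```python
import numpy as np

def self_train(beta_pl, X_unlabeled, n_rounds=10, lr=0.1, epochs=50):
    """Iterative self-training: at each round, pseudolabel the unlabeled data with
    the current linear classifier, y_hat = sign(<beta_t, x>), then refit beta by
    gradient descent on the logistic loss over the pseudolabeled set (the
    refitting procedure is an illustrative choice)."""
    beta = beta_pl.copy()
    for _ in range(n_rounds):
        y_hat = np.sign(X_unlabeled @ beta)                  # pseudolabels from current model
        for _ in range(epochs):
            margins = y_hat * (X_unlabeled @ beta)
            weights = y_hat / (1.0 + np.exp(margins))        # per-example -dloss/dmargin
            beta += lr * (X_unlabeled * weights[:, None]).mean(axis=0)
    return beta

# Illustrative Gaussian-mixture data: x = y * mu + noise, with y uniform on {-1, +1}.
rng = np.random.default_rng(0)
d, n = 20, 5000
mu = np.ones(d) / np.sqrt(d)
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + rng.normal(size=(n, d))

beta_pl = mu + 0.8 * rng.normal(size=d)          # weak but better-than-chance pseudolabeler
beta = self_train(beta_pl, X)
print("error of self-trained classifier:", float(np.mean(np.sign(X @ beta) != y)))
```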

Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games

no code implementations • 15 Feb 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu

To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}(dH\sqrt{T})$ lower bound on the regret.

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

no code implementations • NeurIPS 2020 • Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang

In this paper, we provide a generalized neural tangent kernel analysis and show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.

Learning Theory • Vocal Bursts Valence Prediction

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

no code implementations • ICLR 2021 • Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu

A recent line of research on deep learning focuses on the extremely over-parameterized setting, and shows that when the network width is larger than a high degree polynomial of the training sample size $n$ and the inverse of the target error $\epsilon^{-1}$, deep neural networks learned by (stochastic) gradient descent enjoy nice optimization and generalization guarantees.

Open-Ended Question Answering

Stein Neural Sampler

1 code implementation • 8 Oct 2018 • Tianyang Hu, Zixiang Chen, Hanxi Sun, Jincheng Bai, Mao Ye, Guang Cheng

We propose two novel samplers to generate high-quality samples from a given (un-normalized) probability density.
