Search Results for author: Jianfei Chen

Found 41 papers, 20 papers with code

SparseDM: Toward Sparse Efficient Diffusion Models

no code implementations 16 Apr 2024 Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu

Diffusion models have been extensively used in data generation tasks and are recognized as one of the best generative models.

Accelerating Transformer Pre-Training with 2:4 Sparsity

no code implementations 2 Apr 2024 Yuezhou Hu, Kang Zhao, Weiyu Huang, Jianfei Chen, Jun Zhu

Training large Transformers is slow, but recent innovations in GPU architecture give us an advantage.

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

no code implementations 19 Mar 2024 Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu

Moreover, for a standard transformer block, our method offers an end-to-end training speedup of 1.42x and a 1.49x memory reduction compared to the FP16 baseline.

Quantization

Efficient Backpropagation with Variance-Controlled Adaptive Sampling

1 code implementation 27 Feb 2024 Ziteng Wang, Jianfei Chen, Jun Zhu

On all the tasks, VCAS can preserve the original training loss trajectory and validation accuracy with an up to 73.87% FLOPs reduction of BP and 49.58% FLOPs reduction of the whole training process.

DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics

1 code implementation NeurIPS 2023 Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu

In this work, we propose a novel formulation towards the optimal parameterization during sampling that minimizes the first-order discretization error of the ODE solution.

Image Generation

Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting

no code implementations 18 Oct 2023 Guande He, Peng Cui, Jianfei Chen, WenBo Hu, Jun Zhu

Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in output answers compared to the corresponding pre-trained LMs.

Multiple-choice

Training Transformers with 4-bit Integers

1 code implementation NeurIPS 2023 Haocheng Xi, Changhao Li, Jianfei Chen, Jun Zhu

To achieve this, we carefully analyze the specific structures of activation and gradients in transformers to propose dedicated quantizers for them.

Image Classification Machine Translation +1

Stabilizing GANs' Training with Brownian Motion Controller

no code implementations 18 Jun 2023 Tianjiao Luo, Ziyu Zhu, Jianfei Chen, Jun Zhu

We theoretically prove that the training process of DiracGANs-BMC is globally exponentially stable and derive bounds on the rate of convergence.

Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models

1 code implementation 30 May 2023 Guande He, Jianfei Chen, Jun Zhu

In light of these observations, we evaluate the calibration of several methods that preserve pre-trained features and show that preserving pre-trained features can improve the calibration of fine-tuned language models.

Language Modelling Masked Language Modeling +1

Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs

1 code implementation 6 May 2023 Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu

The probability flow ordinary differential equation (ODE) of diffusion models (i.e., diffusion ODEs) is a particular case of continuous normalizing flows (CNFs), which enables deterministic inference and exact likelihood evaluation.

Ranked #1 on Image Generation on ImageNet 32x32 (bpd metric)

Image Generation

Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

3 code implementations 25 Apr 2023 Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu

The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate.

D4RL Image Generation +1

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

1 code implementation 2 Nov 2022 Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu

The commonly-used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples.

Text-to-Image Generation

GACT: Activation Compressed Training for Generic Network Architectures

1 code implementation 22 Jun 2022 Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint.

Fast Lossless Neural Compression with Integer-Only Discrete Flows

1 code implementation 17 Jun 2022 Siyu Wang, Jianfei Chen, Chongxuan Li, Jun Zhu, Bo Zhang

In this work, we propose Integer-only Discrete Flows (IODF), an efficient neural compressor with integer-only arithmetic.

Quantization

Maximum Likelihood Training for Score-Based Diffusion ODEs by High-Order Denoising Score Matching

1 code implementation 16 Jun 2022 Cheng Lu, Kaiwen Zheng, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu

To fill this gap, we show that the negative likelihood of the ODE can be bounded by controlling the first, second, and third-order score matching errors; and we further present a novel high-order denoising score matching method to enable maximum likelihood training of score-based diffusion ODEs.

Denoising

Deep Ensemble as a Gaussian Process Posterior

no code implementations 29 Sep 2021 Zhijie Deng, Feng Zhou, Jianfei Chen, Guoqiang Wu, Jun Zhu

Deep Ensemble (DE) is a flexible, feasible, and effective alternative to Bayesian neural networks (BNNs) for uncertainty estimation in deep learning.

Variational Inference

Implicit Normalizing Flows

1 code implementation ICLR 2021 Cheng Lu, Jianfei Chen, Chongxuan Li, Qiuhao Wang, Jun Zhu

Through theoretical analysis, we show that the function space of ImpFlow is strictly richer than that of ResFlows.

BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud

no code implementations 19 Jun 2020 Mong H. Ng, Kaahan Radia, Jianfei Chen, Dequan Wang, Ionel Gog, Joseph E. Gonzalez

Bird's-eye-view (BEV) is a powerful and widely adopted representation for road scenes that captures surrounding objects and their spatial locations, along with overall context in the scene.

Bird's-Eye View Semantic Segmentation Transfer Learning

VFlow: More Expressive Generative Flows with Variational Data Augmentation

1 code implementation ICML 2020 Jianfei Chen, Cheng Lu, Biqi Chenli, Jun Zhu, Tian Tian

Generative flows are promising tractable models for density modeling that define probabilistic distributions with invertible transformations.

Ranked #30 on Image Generation on CIFAR-10 (bits/dimension metric)

Density Estimation Image Generation +2

Stochastic Expectation Maximization with Variance Reduction

1 code implementation NeurIPS 2018 Jianfei Chen, Jun Zhu, Yee Whye Teh, Tong Zhang

However, sEM has a slower asymptotic convergence rate than batch EM, and requires a decreasing sequence of step sizes, which is difficult to tune.

Towards Training Probabilistic Topic Models on Neuromorphic Multi-chip Systems

no code implementations 10 Apr 2018 Zihao Xiao, Jianfei Chen, Jun Zhu

We also propose an extension to train pLSI and a method to prune the network to obey the limited fan-in of some NMSs.

Stochastic Optimization Topic Models

Stochastic Training of Graph Convolutional Networks

no code implementations ICLR 2018 Jianfei Chen, Jun Zhu

Previous attempts at reducing the receptive field size by subsampling neighbors do not have any convergence guarantee, and their receptive field size per node is still on the order of hundreds.

Population Matching Discrepancy and Applications in Deep Learning

no code implementations NeurIPS 2017 Jianfei Chen, Chongxuan Li, Yizhong Ru, Jun Zhu

In this paper, we propose population matching discrepancy (PMD) for estimating the distribution distance based on samples, as well as an algorithm to learn the parameters of the distributions using PMD as an objective.

Domain Adaptation

Stochastic Training of Graph Convolutional Networks with Variance Reduction

2 code implementations ICML 2018 Jianfei Chen, Jun Zhu, Le Song

Previous attempts at reducing the receptive field size by subsampling neighbors do not have a convergence guarantee, and their receptive field size per node is still on the order of hundreds.

ZhuSuan: A Library for Bayesian Deep Learning

1 code implementation 18 Sep 2017 Jiaxin Shi, Jianfei Chen, Jun Zhu, Shengyang Sun, Yucen Luo, Yihong Gu, Yuhao Zhou

In this paper we introduce ZhuSuan, a Python probabilistic programming library for Bayesian deep learning, which conjoins the complementary advantages of Bayesian methods and deep learning.

Probabilistic Programming regression

Scalable Inference for Nested Chinese Restaurant Process Topic Models

no code implementations 23 Feb 2017 Jianfei Chen, Jun Zhu, Jie Lu, Shixia Liu

Finally, we propose an efficient distributed implementation of PCGS through vectorization, pre-processing, and a careful design of the concurrent data structures and communication strategy.

Topic Models Variational Inference

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

no code implementations 8 Oct 2016 Kaiwei Li, Jianfei Chen, WenGuang Chen, Jun Zhu

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images.

Topic Models

Scaling up Dynamic Topic Models

1 code implementation 19 Feb 2016 Arnab Bhadury, Jianfei Chen, Jun Zhu, Shixia Liu

Dynamic topic models (DTMs) are very effective in discovering topics and capturing their evolution trends in time series data.

Time Series Time Series Analysis +1

Streaming Gibbs Sampling for LDA Model

no code implementations 6 Jan 2016 Yang Gao, Jianfei Chen, Jun Zhu

Streaming variational Bayes (SVB) is successful in learning LDA models in an online manner.

WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation

no code implementations 29 Oct 2015 Jianfei Chen, Kaiwei Li, Jun Zhu, WenGuang Chen

We then develop WarpLDA, an LDA sampler which achieves both the best O(1) time complexity per token and the best O(K) scope of random access.

Dropout Training for SVMs with Data Augmentation

no code implementations 10 Aug 2015 Ning Chen, Jun Zhu, Jianfei Chen, Ting Chen

Empirical results on several real datasets demonstrate the effectiveness of dropout training in significantly boosting the classification accuracy of both linear and nonlinear SVMs.

Data Augmentation Representation Learning

Big Learning with Bayesian Methods

no code implementations 24 Nov 2014 Jun Zhu, Jianfei Chen, Wen-Bo Hu, Bo Zhang

Explosive growth in data and availability of cheap computing resources have sparked increasing interest in Big learning, an emerging subfield that studies scalable machine learning algorithms, systems, and applications with Big Data.

Bayesian Inference BIG-bench Machine Learning +1

Dropout Training for Support Vector Machines

no code implementations 16 Apr 2014 Ning Chen, Jun Zhu, Jianfei Chen, Bo Zhang

To deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least squares (IRLS) algorithm by exploring data augmentation techniques.

Data Augmentation
