Search Results for author: Qi Meng

Found 39 papers, 10 papers with code

On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

no code implementations22 Mar 2024 Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen

This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.

Deciphering and integrating invariants for neural operator learning with various physical mechanisms

1 code implementation24 Nov 2023 Rui Zhang, Qi Meng, Zhi-Ming Ma

To this end, we propose Physical Invariant Attention Neural Operator (PIANO) to decipher and integrate the physical invariants (PI) for operator learning from the PDE series with various physical mechanisms.

Operator learning Self-Supervised Learning

Power-law Dynamic arising from machine learning

no code implementations16 Jun 2023 Wei Chen, Weitao Du, Zhi-Ming Ma, Qi Meng

We study a new kind of SDE that arose from research on optimization in machine learning; we call it the power-law dynamic because its stationary distribution cannot have a sub-Gaussian tail and instead obeys a power law.
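
As a hedged, one-dimensional illustration of how such a power-law tail can arise (our own toy example, not the paper's general setting), consider an SDE whose diffusion coefficient grows quadratically with the state; solving the stationary Fokker-Planck equation gives a power-law density rather than a Gaussian one:

```latex
% Toy 1-D SDE with state-dependent (quadratic) diffusion; k, sigma_0, sigma_1 > 0.
d\theta_t = -k\,\theta_t\,dt + \sqrt{\sigma_0 + \sigma_1\,\theta_t^2}\;dB_t
% The zero-flux stationary Fokker--Planck equation then yields
p(\theta) \;\propto\; \left(\sigma_0 + \sigma_1\,\theta^2\right)^{-(1 + k/\sigma_1)}
\;\sim\; |\theta|^{-2(1 + k/\sigma_1)} \quad (|\theta| \to \infty),
% a power-law tail; setting sigma_1 = 0 recovers the familiar Gaussian stationary density.
```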

O-GNN: Incorporating Ring Priors into Molecular Modeling

1 code implementation ICLR 2023 Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.

 Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)

Graph Regression Molecular Property Prediction +3

NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition

no code implementations20 Feb 2023 Xinquan Huang, Wenlei Shi, Qi Meng, Yue Wang, Xiaotian Gao, Jia Zhang, Tie-Yan Liu

Neural networks have shown great potential in accelerating the solution of partial differential equations (PDEs).

Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation

1 code implementation10 Feb 2023 Rui Zhang, Qi Meng, Rongchan Zhu, Yue Wang, Wenlei Shi, Shihua Zhang, Zhi-Ming Ma, Tie-Yan Liu

To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
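
To make the probabilistic-representation idea concrete, here is a minimal, hedged sketch (ours, not the MCNP Solver itself): for the 1-D heat equation u_t = 0.5 u_xx with initial condition g, the Feynman-Kac formula gives u(x, t) = E[g(x + W_t)], which an ensemble of random Brownian particles can estimate pointwise.

```python
import numpy as np

def heat_solution_mc(g, x, t, n_particles=100_000, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of u(x, t) for u_t = 0.5 * u_xx, u(x, 0) = g(x),
    using the Feynman-Kac representation u(x, t) = E[g(x + W_t)]."""
    increments = rng.normal(loc=0.0, scale=np.sqrt(t), size=n_particles)  # W_t ~ N(0, t)
    return np.mean(g(x + increments))

# Example: g(x) = sin(x); the exact solution is exp(-t/2) * sin(x).
x, t = 0.7, 0.5
estimate = heat_solution_mc(np.sin, x, t)
exact = np.exp(-t / 2) * np.sin(x)
print(f"MC estimate {estimate:.4f} vs exact {exact:.4f}")
```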

Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search

no code implementations31 Oct 2022 Zihan Wang, Qi Meng, HaiFeng Lan, Xinrui Zhang, Kehao Guo, Akshat Gupta

While Speech Emotion Recognition (SER) is a common application for popular languages, it continues to be a problem for low-resourced languages, i. e., languages with no pretrained speech-to-text recognition models.

Neural Architecture Search Speech Emotion Recognition

Provable Adaptivity in Adam

no code implementations21 Aug 2022 Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen

In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.

Attribute

Deep Random Vortex Method for Simulation and Inference of Navier-Stokes Equations

no code implementations20 Jun 2022 Rui Zhang, Peiyan Hu, Qi Meng, Yue Wang, Rongchan Zhu, Bingguang Chen, Zhi-Ming Ma, Tie-Yan Liu

To this end, we propose the \emph{Deep Random Vortex Method} (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation.

Neural Operator with Regularity Structure for Modeling Dynamics Driven by SPDEs

1 code implementation13 Apr 2022 Peiyan Hu, Qi Meng, Bingguang Chen, Shiqi Gong, Yue Wang, Wei Chen, Rongchan Zhu, Zhi-Ming Ma, Tie-Yan Liu

Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.

Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance when both the prior and the posterior are jointly optimized.

Generalization Bounds
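
Read as an update rule, and with notation that is ours rather than necessarily the paper's, the statement above can be sketched as follows (scaling constants omitted):

```latex
% SGLD-style step with anisotropic injected noise (sketch; constants omitted).
w_{t+1} = w_t - \eta\,\hat{\nabla} L(w_t) + \epsilon_t,
\qquad \epsilon_t \sim \mathcal{N}\!\left(0,\; \eta\,\Sigma_t\right),
% and, under the low-empirical-risk constraint with jointly optimized prior and posterior,
\Sigma_t^{\ast} \;\propto\; \Big(\mathbb{E}\big[\hat{\nabla} L(w_t)\,\hat{\nabla} L(w_t)^{\top}\big]\Big)^{1/2}.
```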

SE(3) Equivariant Graph Neural Networks with Complete Local Frames

1 code implementation26 Oct 2021 Weitao Du, He Zhang, Yuanqi Du, Qi Meng, Wei Chen, Bin Shao, Tie-Yan Liu

In this paper, we propose a framework to construct SE(3) equivariant graph neural networks that can approximate the geometric quantities efficiently.

Computational Efficiency
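
As a generic illustration of the local-frame idea (a standard frame construction, not necessarily the exact scheme of the paper), one can build an orthonormal frame from two relative position vectors via Gram-Schmidt and a cross product; projecting vector features onto that frame yields rotation-invariant scalars:

```python
import numpy as np

def local_frame(r_ij, r_ik):
    """Orthonormal frame from two relative position vectors (Gram-Schmidt + cross product).
    Rotating both inputs by R rotates the frame by R, so projections onto it are invariant."""
    e1 = r_ij / np.linalg.norm(r_ij)
    v = r_ik - np.dot(r_ik, e1) * e1          # remove the component along e1
    e2 = v / np.linalg.norm(v)
    e3 = np.cross(e1, e2)                     # completes a right-handed frame
    return np.stack([e1, e2, e3])             # shape (3, 3)

rng = np.random.default_rng(0)
r_ij, r_ik, feature = rng.normal(size=(3, 3))

# Invariant scalars: coordinates of the vector feature in the local frame.
scalars = local_frame(r_ij, r_ik) @ feature

# A random rotation leaves the scalars unchanged (up to floating point error).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))             # make it a proper rotation (det = +1)
scalars_rot = local_frame(R @ r_ij, R @ r_ik) @ (R @ feature)
print(np.allclose(scalars, scalars_rot))      # True
```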

Does Momentum Change the Implicit Regularization on Separable Data?

no code implementations8 Oct 2021 Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

The momentum acceleration technique is widely adopted in many optimization algorithms.

Incorporating NODE with Pre-trained Neural Differential Operator for Learning Dynamics

no code implementations8 Jun 2021 Shiqi Gong, Qi Meng, Yue Wang, Lijun Wu, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE.

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

1 code implementation11 Dec 2020 Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu

Besides GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their rapid training.

Dynamic of Stochastic Gradient Descent with State-Dependent Noise

no code implementations24 Jun 2020 Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Specifically, we show that in the local region around a local minimum, the covariance of the SGD noise is a quadratic function of the state.
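
As a purely illustrative sketch of the quantity being analyzed (our own, not from the paper), the SGD noise covariance at a fixed state can be estimated by sampling many minibatch gradients and forming the covariance of their deviations from the full gradient:

```python
import numpy as np

def sgd_noise_covariance(grad_fn, w, data, batch_size=32, n_samples=500,
                         rng=np.random.default_rng(0)):
    """Empirical covariance of the minibatch-gradient noise at a fixed state w.
    grad_fn(w, subset) should return the gradient of the loss on that subset."""
    full_grad = grad_fn(w, data)
    deviations = []
    for _ in range(n_samples):
        idx = rng.choice(len(data), size=batch_size, replace=False)
        deviations.append(grad_fn(w, data[idx]) - full_grad)
    deviations = np.stack(deviations)                      # (n_samples, dim)
    return deviations.T @ deviations / n_samples           # (dim, dim) covariance

# Example with a linear regression loss 0.5 * ||X w - y||^2 / n.
def grad_fn(w, subset):
    X, y = subset[:, :-1], subset[:, -1]
    return X.T @ (X @ w - y) / len(subset)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=1000)
data = np.hstack([X, y[:, None]])
cov = sgd_noise_covariance(grad_fn, w=np.zeros(5), data=data)
print(cov.shape)  # (5, 5)
```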

Interpreting Basis Path Set in Neural Networks

no code implementations18 Oct 2019 Juanping Zhu, Qi Meng, Wei Chen, Zhi-Ming Ma

Based on the basis path set, the G-SGD algorithm significantly outperforms the conventional SGD algorithm in optimizing neural networks.

Path Space for Recurrent Neural Networks with ReLU Activations

no code implementations25 Sep 2019 Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu

Optimization algorithms such as stochastic gradient descent optimize neural networks in the vector space of weights, which is not positively scale-invariant.

P-BN: Towards Effective Batch Normalization in the Path Space

no code implementations25 Sep 2019 Xufang Luo, Qi Meng, Wei Chen, Tie-Yan Liu

Hence, new algorithms that conduct optimization directly in the path space (which is proven to be PSI) were developed, such as Stochastic Gradient Descent (SGD) in the path space, and SGD in the path space was shown to be superior to SGD in the weight space.

G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations ICLR 2019 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process}?
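
A toy numeric check of the positive scale-invariance that motivates this question (our own example, not code from the paper): rescaling a ReLU unit's incoming weights by c > 0 and its outgoing weights by 1/c leaves both the network function and the path values (products of weights along input-to-output paths) unchanged:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def net(x, W1, W2):
    """Two-layer ReLU network: R^2 -> R."""
    return W2 @ relu(W1 @ x)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x = rng.normal(size=2)

# Positive rescaling of hidden unit 0: incoming weights * c, outgoing weights / c.
c = 2.5
W1s, W2s = W1.copy(), W2.copy()
W1s[0, :] *= c
W2s[:, 0] /= c

print(np.allclose(net(x, W1, W2), net(x, W1s, W2s)))   # True: same function
# Path values (product of weights along each input -> hidden -> output path) also match.
paths = np.einsum('oh,hi->hi', W2, W1)
paths_s = np.einsum('oh,hi->hi', W2s, W1s)
print(np.allclose(paths, paths_s))                      # True
```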

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

1 code implementation14 Mar 2019 Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu

In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in the settings of planning and learning.

Atari Games Q-Learning +2
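
For reference, a hedged sketch of the Boltzmann softmax aggregation underlying the DBS operator (a toy rendering; the schedule and analysis in the paper may differ): with inverse temperature beta, the operator forms a softmax-weighted average of action values, and increasing beta over iterations moves it from the mean toward the max.

```python
import numpy as np

def boltzmann_softmax(q, beta):
    """boltz_beta(q) = sum_a q_a * exp(beta * q_a) / sum_a exp(beta * q_a)."""
    w = np.exp(beta * (q - q.max()))          # subtract max for numerical stability
    return float(np.sum(q * w) / np.sum(w))

q = np.array([1.0, 2.0, 3.0])
for beta in [0.0, 1.0, 10.0, 100.0]:          # a DBS-style schedule increases beta over time
    print(beta, boltzmann_softmax(q, beta))   # moves from mean(q) = 2.0 toward max(q) = 3.0
```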

Positively Scale-Invariant Flatness of ReLU Neural Networks

no code implementations6 Mar 2019 Mingyang Yi, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

That is to say, a minimum with balanced values of basis paths is more likely to be flat and to generalize better.

Expressiveness in Deep Reinforcement Learning

no code implementations27 Sep 2018 Xufang Luo, Qi Meng, Di He, Wei Chen, Yunhong Wang, Tie-Yan Liu

Based on our observations, we formally define the expressiveness of the state extractor as the rank of the matrix composed of the representations.

Atari Games reinforcement-learning +2

A Convergent Variant of the Boltzmann Softmax Operator in Reinforcement Learning

no code implementations27 Sep 2018 Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Tie-Yan Liu

We then propose the dynamic Boltzmann softmax (DBS) operator to enable the convergence to the optimal value function in value iteration.

Atari Games Q-Learning +2

Target Transfer Q-Learning and Its Convergence Analysis

no code implementations21 Sep 2018 Yue Wang, Qi Meng, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu

In this paper, we propose to transfer the Q-function learned in the source task to serve as the target of Q-learning in the new task when certain safe conditions are satisfied.

Q-Learning Reinforcement Learning (RL) +1

Capacity Control of ReLU Neural Networks by Basis-path Norm

no code implementations19 Sep 2018 Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu

Motivated by this, we propose a new norm \emph{Basis-path Norm} based on a group of linearly independent paths to measure the capacity of neural networks more accurately.

Differential Equations for Modeling Asynchronous Algorithms

no code implementations8 May 2018 Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then we conduct a theoretical analysis of the convergence rates of the ASGD algorithm based on the continuous approximation.

$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations11 Feb 2018 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process}?

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

1 code implementation NeurIPS 2017 Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu

We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain with a much smaller data size.
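
A minimal sketch of the GOSS sampling idea referred to above (illustrative NumPy code, not the LightGBM implementation): keep the top-a fraction of instances by absolute gradient, randomly sample a b fraction of the rest, and up-weight the sampled small-gradient instances by (1 - a) / b so that gain estimates stay approximately unbiased.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=np.random.default_rng(0)):
    """Gradient-based One-Side Sampling: returns selected indices and their weights."""
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))
    top_idx = order[:int(a * n)]                                 # large-gradient instances: always kept
    rest = order[int(a * n):]
    sampled = rng.choice(rest, size=int(b * n), replace=False)   # small-gradient instances: subsampled
    idx = np.concatenate([top_idx, sampled])
    weights = np.concatenate([np.ones(len(top_idx)),
                              np.full(len(sampled), (1 - a) / b)])  # reweight to stay unbiased
    return idx, weights

g = np.random.default_rng(1).normal(size=1000)
idx, w = goss_sample(g)
print(len(idx), w[:3], w[-3:])   # 300 selected; weight 1.0 for the top, (1-a)/b = 8.0 for the rest
```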

Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling

no code implementations29 Sep 2017 Qi Meng, Wei Chen, Yue Wang, Zhi-Ming Ma, Tie-Yan Liu

First, we give a mathematical formulation for the practical data processing procedure in distributed machine learning, which we call data partition with global/local shuffling.

BIG-bench Machine Learning

A Communication-Efficient Parallel Algorithm for Decision Tree

no code implementations NeurIPS 2016 Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu

After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration.

Attribute
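
A rough sketch of a two-stage local/global voting scheme of the kind described above (our own simplified rendering; the exact aggregation and thresholds in the paper may differ): each machine nominates its locally best attributes, the most-nominated attributes become global candidates, and only those candidates need full histograms communicated.

```python
from collections import Counter

def vote_for_attributes(local_gains_per_machine, k=2):
    """Two-stage voting: each machine votes for its top-k attributes by local gain,
    then the (up to) 2k attributes with the most votes become global candidates."""
    votes = Counter()
    for gains in local_gains_per_machine:                   # gains: dict attribute -> local info gain
        top_k = sorted(gains, key=gains.get, reverse=True)[:k]
        votes.update(top_k)
    return [attr for attr, _ in votes.most_common(2 * k)]   # only these need full histograms

# Toy example with 3 machines and 5 attributes.
local_gains = [
    {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.2, "e": 0.3},
    {"a": 0.7, "b": 0.6, "c": 0.2, "d": 0.1, "e": 0.4},
    {"a": 0.8, "b": 0.1, "c": 0.6, "d": 0.3, "e": 0.2},
]
print(vote_for_attributes(local_gains, k=2))  # ['a', 'b', 'c'] -- only attributes that got votes
```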

Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction

no code implementations27 Sep 2016 Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu

The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.

Generalization Error Bounds for Optimization Algorithms via Stability

no code implementations27 Sep 2016 Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu

Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG).

BIG-bench Machine Learning

Asynchronous Stochastic Gradient Descent with Delay Compensation

no code implementations ICML 2017 Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu

We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD.
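
A hedged sketch of a delay-compensated update in the spirit of the abstract (the elementwise lambda * g * g Hessian approximation used below is an assumption of this sketch, not a confirmed detail of the paper): the worker computed its gradient at a stale copy w_backup, and the server corrects that gradient with a cheap Taylor-style term before applying it to the current parameters.

```python
import numpy as np

def delay_compensated_gradient(g_stale, w_current, w_backup, lam=0.04):
    """Compensate a stale gradient computed at w_backup before applying it at w_current.
    The diagonal Hessian approximation g * g (elementwise) is an assumption of this sketch."""
    return g_stale + lam * g_stale * g_stale * (w_current - w_backup)

# Server-side usage: w_current has moved on while the worker was computing g_stale at w_backup.
w_backup = np.array([0.5, -1.0, 2.0])
w_current = np.array([0.4, -0.9, 1.8])
g_stale = np.array([0.2, -0.1, 0.3])
eta = 0.1
w_next = w_current - eta * delay_compensated_gradient(g_stale, w_current, w_backup)
print(w_next)
```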
