no code implementations • 19 Nov 2022 • Lin Xiao, Pengyu Xu, Liping Jing, Xiangliang Zhang
In response, we propose a Pairwise Instance Relation Augmentation Network (PIRAN) to augment tail-label documents, balancing the tail labels against the head labels.
Multi-Label Text Classification
no code implementations • 4 Oct 2022 • Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
no code implementations • 3 Oct 2022 • Shicong Cen, Yuejie Chi, Simon S. Du, Lin Xiao
Multi-Agent Reinforcement Learning (MARL) -- where multiple agents learn to interact in a shared dynamic environment -- permeates across a wide range of critical applications.
no code implementations • 14 Jun 2022 • Aaron Defazio, Baoyu Zhou, Lin Xiao
The classical AdaGrad method adapts the learning rate by dividing by the square root of a sum of squared gradients.
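The update rule described above can be sketched in a few lines (a minimal diagonal-AdaGrad step, not any paper's implementation; `lr` and `eps` are illustrative hyperparameters):

```python
import numpy as np

def adagrad_step(w, grad, sum_sq, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad step: divide the base learning rate by the
    square root of the running sum of squared gradients, per coordinate."""
    sum_sq = sum_sq + grad ** 2
    w = w - lr * grad / (np.sqrt(sum_sq) + eps)
    return w, sum_sq

# Minimize f(w) = 0.5 * ||w||^2, whose gradient at w is simply w.
w = np.array([1.0, -2.0])
s = np.zeros_like(w)
for _ in range(200):
    w, s = adagrad_step(w, w, s)
```

Coordinates with larger accumulated gradient magnitude automatically receive smaller effective step sizes, which is the adaptivity the sentence above refers to.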
2 code implementations • 25 May 2022 • Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad
Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine learning, but have also grown in parameters and computational complexity, making them increasingly difficult to deploy in resource-constrained environments.
no code implementations • ACL 2022 • Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao, Xiang Ren, Wen-tau Yih
Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting.
no code implementations • 27 Apr 2022 • Samuel Horváth, Maziar Sanjabi, Lin Xiao, Peter Richtárik, Michael Rabbat
The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL).
2 code implementations • 8 Apr 2022 • Krishna Pillutla, Kshitiz Malik, Abdelrahman Mohamed, Michael Rabbat, Maziar Sanjabi, Lin Xiao
We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices.
no code implementations • 19 Jan 2022 • Lin Xiao
We consider infinite-horizon discounted Markov decision problems with finite state and action spaces and study the convergence rates of the projected policy gradient method and a general class of policy mirror descent methods, all with direct parametrization in the policy space.
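In the simplest instance of direct parametrization, a one-state MDP (i.e., a bandit), the projected policy gradient method reduces to a gradient step on the value followed by Euclidean projection onto the probability simplex. A minimal sketch with illustrative `gamma` and step size `eta` (not the paper's analysis constants):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

# One-state discounted MDP: V(pi) = <pi, r> / (1 - gamma), so the policy
# gradient is r / (1 - gamma); each iteration steps along it and projects
# back onto the simplex of action probabilities.
gamma, eta = 0.9, 0.1
r = np.array([1.0, 0.2, 0.5])
pi = np.ones(3) / 3
for _ in range(50):
    grad = r / (1.0 - gamma)
    pi = project_simplex(pi + eta * grad)
```

The iterates converge to the vertex of the simplex corresponding to the best action, consistent with the optimal deterministic policy.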
no code implementations • EMNLP 2021 • Mingyang Song, Liping Jing, Lin Xiao
Keyphrase extraction is a fundamental task in Natural Language Processing, which usually contains two main parts: candidate keyphrase extraction and keyphrase importance estimation.
no code implementations • 19 May 2021 • Andres Ladino, Lin Xiao, Kingsley Adjenugwhure, Nicolás Deschle, Gerdien Klunder
Simulation-based traffic impact assessment studies of advanced technologies such as truck platooning need to be carried out to ascertain their benefits for traffic efficiency, safety, and the environment.
1 code implementation • 24 Jan 2021 • Lin Xiao, Xiangliang Zhang, Liping Jing, Chi Huang, Mingyang Song
To address the challenge of insufficient training data on tail label classification, we propose a Head-to-Tail Network (HTTN) to transfer the meta-knowledge from the data-rich head labels to data-poor tail labels.
no code implementations • 12 Oct 2020 • Mingzhi Zheng, Dinghan Shen, Yelong Shen, Weizhu Chen, Lin Xiao
We prove, from a theoretical perspective, that the gradients derived from this new masking schema have a smaller variance and can lead to more efficient self-supervised training.
Ranked #1 on Sentence Classification on ACL-ARC
no code implementations • ACL 2020 • Boli Chen, Xin Huang, Lin Xiao, Liping Jing
Second, Hyperbolic Dynamic Routing (HDR) is introduced to aggregate hyperbolic capsules in a label-aware manner, so that the label-level discriminative information can be preserved along the depth of neural networks.
no code implementations • 30 May 2020 • Shoujin Wang, Longbing Cao, Liang Hu, Shlomo Berkovsky, Xiaoshui Huang, Lin Xiao, Wenpeng Lu
Most existing TBRSs recommend the next item by modeling only the intra-transaction dependency within the current transaction, while ignoring the inter-transaction dependency with recent transactions that may also affect the next item.
1 code implementation • 25 Feb 2020 • Pengchuan Zhang, Hunter Lang, Qiang Liu, Lin Xiao
We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods.
1 code implementation • IJCNLP 2019 • Lin Xiao, Xin Huang, Boli Chen, Liping Jing
Multi-label text classification (MLTC) aims to tag the most relevant labels for a given document.
Ranked #1 on Multi-Label Text Classification on AAPD
1 code implementation • NeurIPS 2019 • Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao
The use of momentum in stochastic gradient methods has become a widespread practice in machine learning.
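The momentum practice this line refers to can be sketched as a classical heavy-ball update on a toy quadratic (`lr` and `beta` are illustrative values, not settings from the paper):

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """Classical (heavy-ball) momentum: accumulate an exponentially
    weighted buffer of past gradients and step along the buffer."""
    v = beta * v + grad
    w = w - lr * v
    return w, v

# Minimize f(w) = 0.5 * w^2 (gradient w), starting from w = 5.0.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, v, w)
```

The buffer `v` smooths the gradient signal across steps, which is what makes the method popular for noisy stochastic gradients.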
no code implementations • 25 Sep 2019 • Pengchuan Zhang, Hunter Lang, Qiang Liu, Lin Xiao
We investigate statistical methods for automatically scheduling the learning rate (step size) in stochastic optimization.
no code implementations • NeurIPS 2019 • Hunter Lang, Pengchuan Zhang, Lin Xiao
Despite the development of numerous adaptive optimizers, tuning the learning rate of stochastic gradient methods remains a major roadblock to obtaining good practical performance in machine learning.
no code implementations • 29 Aug 2019 • Junyu Zhang, Lin Xiao
We consider multi-level composite optimization problems where each mapping in the composition is the expectation over a family of random smooth mappings or the sum of some finite number of smooth mappings.
no code implementations • 31 Jul 2019 • Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang
Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation.
no code implementations • NeurIPS 2019 • Junyu Zhang, Lin Xiao
We show that this method achieves the same orders of complexity as the best known first-order methods for minimizing expected-value and finite-sum nonconvex functions, despite the additional outer composition which renders the composite gradient estimator biased.
1 code implementation • 26 May 2019 • Boli Chen, Xin Huang, Lin Xiao, Zixin Cai, Liping Jing
The main reason is that the tree-likeness of the hyperbolic space matches the complexity of symbolic data with hierarchical structures.
1 code implementation • 24 May 2019 • Xin Huang, Boli Chen, Lin Xiao, Liping Jing
Extreme multi-label text classification (XMTC) aims at tagging a document with the most relevant labels from an extremely large-scale label set.
Ranked #1 on Multi-Label Text Classification on Amazon-12K
1 code implementation • NeurIPS 2018 • Bo Dai, Hanjun Dai, Niao He, Weiyang Liu, Zhen Liu, Jianshu Chen, Lin Xiao, Le Song
This flexible function class couples the variational distribution with the original parameters in the graphical models, allowing end-to-end learning of the graphical models by back-propagation through the variational distribution.
no code implementations • NeurIPS 2018 • Vikas K. Garg, Ofer Dekel, Lin Xiao
We present a new machine learning technique for training small resource-constrained predictors.
no code implementations • ICML 2018 • Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song
When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades.
no code implementations • NeurIPS 2017 • Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng
In sequential decision making, it is often important and useful for end users to understand the underlying patterns or causes that lead to the corresponding decisions.
no code implementations • 13 Oct 2017 • Lin Xiao, Adams Wei Yu, Qihang Lin, Weizhu Chen
Machine learning with big data often involves large optimization models.
no code implementations • ICML 2017 • Jialei Wang, Lin Xiao
We consider empirical risk minimization of linear predictors with convex loss functions.
no code implementations • ICML 2017 • Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou
Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy.
1 code implementation • NeurIPS 2015 • Jianshu Chen, Ji He, Yelong Shen, Lin Xiao, Xiaodong He, Jianfeng Gao, Xinying Song, Li Deng
We develop a fully discriminative learning approach for the supervised Latent Dirichlet Allocation (LDA) model using back-propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document.
no code implementations • 16 Jul 2015 • Amin Jalali, Maryam Fazel, Lin Xiao
We propose a new class of convex penalty functions, called \emph{variational Gram functions} (VGFs), that can promote pairwise relations, such as orthogonality, among a set of vectors in a vector space.
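One concrete penalty of this pairwise-relation flavor, illustrating the orthogonality example mentioned above (a sketch, not the paper's general VGF definition), is the sum of absolute inner products between distinct vectors:

```python
import numpy as np

def pairwise_orthogonality_penalty(X):
    """Sum of |<x_i, x_j>| over distinct columns of X: zero exactly when
    the columns are pairwise orthogonal, and positive otherwise."""
    G = X.T @ X                                  # Gram matrix of the columns
    return np.abs(G).sum() - np.abs(np.diag(G)).sum()

# Orthogonal columns incur no penalty; identical columns are penalized.
p_orth = pairwise_orthogonality_penalty(np.eye(3))
p_same = pairwise_orthogonality_penalty(np.ones((3, 2)))
```

Because the penalty depends on the vectors only through their Gram matrix, it is a natural example of the "variational Gram function" viewpoint described in the abstract.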
no code implementations • 1 Jan 2015 • Yuchen Zhang, Lin Xiao
We consider distributed convex optimization problems originated from sample average approximation of stochastic optimization, or empirical risk minimization in machine learning.
no code implementations • NeurIPS 2014 • Qihang Lin, Zhaosong Lu, Lin Xiao
We develop an accelerated randomized proximal coordinate gradient (APCG) method, for solving a broad class of composite convex optimization problems.
no code implementations • 10 Sep 2014 • Yuchen Zhang, Lin Xiao
We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors.
no code implementations • 19 Mar 2014 • Lin Xiao, Tong Zhang
We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping.
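The composite structure described here (an average of smooth losses plus a regularizer with a simple proximal mapping) can be illustrated with a variance-reduced proximal stochastic gradient loop, one common scheme for this problem class — a sketch on a toy lasso instance, not necessarily the paper's exact algorithm; `lam`, `lr`, and the epoch count are illustrative:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal mapping of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_svrg(A, b, lam=0.1, lr=0.01, epochs=20):
    """Minimize (1/n) * sum_i 0.5*(a_i.w - b_i)^2 + lam*||w||_1:
    take a full-gradient snapshot each epoch, then run corrected
    stochastic steps followed by the prox of the regularizer."""
    n, d = A.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = A.T @ (A @ w_snap - b) / n       # gradient at snapshot
        for _ in range(n):
            i = rng.integers(n)
            gi = A[i] * (A[i] @ w - b[i])            # component grad at w
            gi_snap = A[i] * (A[i] @ w_snap - b[i])  # ... at the snapshot
            v = gi - gi_snap + full_grad             # variance-reduced estimate
            w = soft_threshold(w - lr * v, lr * lam)
    return w

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
w_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
b = A @ w_true
w_hat = prox_svrg(A, b)
```

The correction term `gi - gi_snap + full_grad` keeps the stochastic gradient unbiased while shrinking its variance as the iterates approach the snapshot.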
no code implementations • 17 Oct 2013 • Tianbing Xu, Jianfeng Gao, Lin Xiao, Amelia Regan
We propose a voted dual averaging method for online classification problems with explicit regularization.
no code implementations • 25 Jun 2013 • Zhaosong Lu, Lin Xiao
When the problem under consideration is convex, we show that the expected objective values generated by RNBPG converge to the optimal value of the problem.
no code implementations • 21 May 2013 • Zhaosong Lu, Lin Xiao
In this paper we analyze the randomized block-coordinate descent (RBCD) methods proposed in [8, 11] for minimizing the sum of a smooth convex function and a block-separable convex function.
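A minimal RBCD sketch on a small quadratic (problem data, blocks, and step counts are illustrative, and the block-separable term is taken to be zero for simplicity):

```python
import numpy as np

def rbcd(Q, c, blocks, steps=500, seed=0):
    """Randomized block-coordinate descent for f(w) = 0.5*w'Qw - c'w:
    each step picks a block uniformly at random and takes a gradient
    step on that block only, scaled by the block's Lipschitz constant."""
    d = Q.shape[0]
    w = np.zeros(d)
    rng = np.random.default_rng(seed)
    # Block Lipschitz constants: top eigenvalue of each diagonal sub-block.
    L = [np.linalg.eigvalsh(Q[np.ix_(b, b)]).max() for b in blocks]
    for _ in range(steps):
        k = rng.integers(len(blocks))
        b = blocks[k]
        grad_b = Q[b] @ w - c[b]          # partial gradient on block b
        w[b] -= grad_b / L[k]
    return w

Q = np.array([[2.0, 0.5, 0.0, 0.0],
              [0.5, 1.5, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.2],
              [0.0, 0.0, 0.2, 1.0]])
c = np.array([1.0, -1.0, 2.0, 0.5])
blocks = [np.array([0, 1]), np.array([2, 3])]
w_hat = rbcd(Q, c, blocks)
```

Each iteration touches only one block of coordinates, which is what makes the method cheap per step on high-dimensional problems.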
no code implementations • NeurIPS 2009 • Lin Xiao
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as L1-norm for sparsity.
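For the L1 case mentioned here, the per-step minimization in dual averaging has a closed-form solution that truncates small coordinates exactly to zero. A sketch of l1-regularized dual averaging with illustrative parameters (`lam`, `gamma`), fed a fixed gradient stream for demonstration rather than gradients from a live learning task:

```python
import numpy as np

def l1_rda(grads_stream, lam=0.1, gamma=1.0):
    """L1-regularized dual averaging: keep a running average of all past
    gradients and, at each step, minimize the averaged linear model plus
    lam*||w||_1 plus a (gamma/(2*sqrt(t)))*||w||^2 proximity term."""
    g_sum, w = None, None
    for t, g in enumerate(grads_stream, start=1):
        g_sum = g if g_sum is None else g_sum + g
        g_bar = g_sum / t
        # Closed-form minimizer: soft-threshold the average gradient,
        # then scale; coordinates with |g_bar| <= lam stay exactly zero.
        shrunk = np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
        w = -(np.sqrt(t) / gamma) * shrunk
    return w

# Coordinate 1's average gradient (0.01) stays below lam, so RDA keeps
# it exactly at zero while coordinate 0 moves away from the origin.
stream = [np.array([0.5, 0.01]) for _ in range(100)]
w = l1_rda(stream)
```

Thresholding the *average* gradient, rather than each noisy gradient, is what lets the method produce exactly sparse iterates online.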