Search Results for author: Kannan Ramchandran

Found 63 papers, 18 papers with code

Toward a Theory of Tokenization in LLMs

no code implementations12 Apr 2024 Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran

In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data generating processes.

Language Modelling

Learning to Understand: Identifying Interactions via the Möbius Transform

no code implementations4 Feb 2024 Justin S. Kang, Yigit E. Erginbas, Landon Butler, Ramtin Pedarsani, Kannan Ramchandran

In the case where all interactions are between at most $t = \Theta(n^{\alpha})$ inputs, for $\alpha < 0.409$, we are able to leverage results from group testing to provide the first algorithm that computes the Möbius transform in $O(Kt\log n)$ sample complexity and $O(K\mathrm{poly}(n))$ time with vanishing error as $K \rightarrow \infty$.

Learning Theory
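
For context, the object computed here generalizes the classic subset Möbius transform. As a point of reference, a minimal dense baseline (a sketch, exponential in $n$, unlike the paper's sparse algorithm) stores the set function as a length-$2^n$ array indexed by bitmask:

```python
import numpy as np

def mobius_transform(f):
    """Dense Mobius transform of a set function in O(n * 2^n) time.

    f[s] is the value on the subset encoded by the bits of s; returns F
    such that f(S) = sum of F(T) over all T that are subsets of S.
    """
    F = np.array(f, dtype=float)
    n = F.size.bit_length() - 1
    for i in range(n):
        bit = 1 << i
        for s in range(F.size):
            if s & bit:
                F[s] -= F[s ^ bit]  # peel off the contribution without bit i
    return F
```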

Towards Optimal Statistical Watermarking

no code implementations13 Dec 2023 Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.
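
As an illustration of the coupling idea only (this toy uses a generic hash-based "green-list" scheme, not the paper's construction), a detector can seed a pseudo-random generator with the previous token and use a binomial tail test as the rejection region, making the Type I vs. Type II trade-off explicit:

```python
import math
import numpy as np

def detect_watermark(tokens, vocab_size, key=0, gamma=0.5, alpha=1e-3):
    """Toy watermark detector; the scheme and all parameters are assumptions."""
    hits = 0
    for prev, cur in zip(tokens, tokens[1:]):
        rng = np.random.default_rng(key ^ prev)  # PRG coupled to the context
        green = rng.random(vocab_size) < gamma   # keyed 'green' sublist
        hits += int(green[cur])
    n = len(tokens) - 1
    # Normal approximation to the binomial tail defines the rejection region:
    # a smaller alpha (Type I error) enlarges the threshold, raising Type II.
    z = (hits - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)
    return 0.5 * math.erfc(z / math.sqrt(2)) < alpha
```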

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

no code implementations30 Sep 2023 Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.

reinforcement-learning World Knowledge

MRI Reconstruction with Side Information using Diffusion Models

no code implementations26 Mar 2023 Brett Levac, Ajil Jalal, Kannan Ramchandran, Jonathan I. Tamir

This leads to an improvement in image reconstruction fidelity over generative models that rely only on a marginal prior over the image contrast of interest.

Anatomy MRI Reconstruction

Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits

no code implementations12 Feb 2023 Nived Rajaraman, Yanjun Han, Jiantao Jiao, Kannan Ramchandran

We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action.

Decision Making

The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning

no code implementations30 Jan 2023 Justin Kang, Ramtin Pedarsani, Kannan Ramchandran

We also formulate a heterogeneous federated learning problem for the platform with privacy level options for users.

Fairness Federated Learning

Efficiently Computing Sparse Fourier Transforms of $q$-ary Functions

1 code implementation15 Jan 2023 Yigit Efe Erginbas, Justin Singh Kang, Amirali Aghazadeh, Kannan Ramchandran

Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences.
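
For reference, the dense transform that the paper's sparse algorithm avoids is an axis-wise $q$-point DFT over $\mathbb{Z}_q^n$; a minimal sketch (the $1/q$ normalization per axis is an assumed convention):

```python
import numpy as np

def qary_fourier(f, q, n):
    """Dense Fourier transform of f: Z_q^n -> C, given as a length-q^n array
    in mixed-radix (row-major) order, computed one coordinate axis at a time;
    a sparse algorithm avoids touching all q^n entries."""
    F = np.asarray(f, dtype=complex).reshape([q] * n)
    for axis in range(n):
        F = np.fft.fft(F, axis=axis) / q  # q-point DFT along one coordinate
    return F.reshape(-1)
```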

Interactive Learning with Pricing for Optimal and Stable Allocations in Markets

no code implementations13 Dec 2022 Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran

Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback.

Collaborative Filtering Recommendation Systems

Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces

no code implementations5 Oct 2022 Amirali Aghazadeh, Nived Rajaraman, Tony Tu, Kannan Ramchandran

Data-driven machine learning models are being increasingly employed in several important inference problems in biology, chemistry, and physics which require learning over combinatorial spaces.

Interactive Recommendations for Optimal Allocations in Markets with Constraints

no code implementations8 Jul 2022 Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran

Recommendation systems, when employed in markets, play a dual role: they assist users in selecting their most desired items from a large pool, and they help in allocating a limited number of items to the users who desire them the most.

Collaborative Filtering Recommendation Systems

Neurotoxin: Durable Backdoors in Federated Learning

2 code implementations12 Jun 2022 Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph E. Gonzalez, Kannan Ramchandran, Prateek Mittal

In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs.

Backdoor Attack Federated Learning +1

Decentralized Competing Bandits in Non-Stationary Matching Markets

no code implementations31 May 2022 Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran, Tara Javidi, Arya Mazumdar

We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (\texttt{DNCB}), where the agents play (restrictive) successive elimination type learning algorithms to learn their preference over the arms.

Minimax Optimal Online Imitation Learning via Replay Estimation

1 code implementation30 May 2022 Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left( \min\left( H^{3/2}/N,\, H/\sqrt{N} \right) \right)$ dependency, under significantly weaker assumptions compared to prior work.

Continuous Control Imitation Learning

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

1 code implementation6 Feb 2022 Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney

Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics.

Model Selection

On the Value of Interaction and Function Approximation in Imitation Learning

no code implementations NeurIPS 2021 Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran

In contrast, when the MDP transition structure is known to the learner, such as in the case of simulators, we demonstrate fundamental differences compared to the tabular setting in terms of the performance of an optimal algorithm, Mimic-MD (Rajaraman et al., 2020), when extended to the function approximation setting.

Imitation Learning Multi-class Classification

Taxonomizing local versus global structure in neural network loss landscapes

1 code implementation NeurIPS 2021 Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper.

Model Selection for Generic Reinforcement Learning

no code implementations13 Jul 2021 Avishek Ghosh, Sayak Ray Chowdhury, Kannan Ramchandran

We propose and analyze a novel algorithm, namely \emph{Adaptive Reinforcement Learning (General)} (\texttt{ARL-GEN}) that adapts to the smallest such family where the true transition kernel $P^*$ lies.

Model Selection reinforcement-learning +1

Model Selection for Generic Contextual Bandits

no code implementations7 Jul 2021 Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran

We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption.

Model Selection Multi-Armed Bandits

Adaptive Clustering and Personalization in Multi-Agent Stochastic Linear Bandits

no code implementations15 Jun 2021 Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran

We show that, for any agent, the regret scales as $\mathcal{O}(\sqrt{T/N})$ if the agent is in a `well separated' cluster, or scales as $\mathcal{O}(T^{\frac{1}{2} + \varepsilon}/N^{\frac{1}{2} - \varepsilon})$ if its cluster is not well separated, where $\varepsilon$ is positive and arbitrarily close to $0$.

Clustering

LocalNewton: Reducing Communication Bottleneck for Distributed Learning

no code implementations16 May 2021 Vipul Gupta, Avishek Ghosh, Michal Derezinski, Rajiv Khanna, Kannan Ramchandran, Michael Mahoney

To enhance practicality, we devise an adaptive scheme to choose L, and we show that this reduces the number of local iterations in worker machines between two model synchronizations as the training proceeds, successively refining the model quality at the master.

Distributed Optimization

Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience

no code implementations17 Mar 2021 Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran

Moreover, we validate our theoretical findings with experiments using standard datasets and several types of Byzantine attacks, and obtain an improvement of $25\%$ with respect to first order methods in iteration complexity.

Federated Learning

Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally

no code implementations25 Feb 2021 Nived Rajaraman, Yanjun Han, Lin F. Yang, Kannan Ramchandran, Jiantao Jiao

We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al. (2020) which we prove to be computationally efficient.

Imitation Learning

BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature Selection in Sublinear Memory

1 code implementation26 Oct 2020 Amirali Aghazadeh, Vipul Gupta, Alex DeWeese, O. Ozan Koyluoglu, Kannan Ramchandran

We consider feature selection for applications in machine learning where the dimensionality of the data is so large that it exceeds the working memory of the (local) computing machine.

feature selection
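
The sublinear-memory ingredient is a count-sketch-style data structure; a minimal stand-alone version (the hash functions below are simplified placeholders, not BEAR's implementation):

```python
import numpy as np

class CountSketch:
    """Store approximate coordinates of a huge vector in rows x width memory."""

    def __init__(self, width, rows=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w, self.r = width, rows
        self.table = np.zeros((rows, width))
        self._a = rng.integers(1, 2**31 - 1, size=rows)
        self._b = rng.integers(0, 2**31 - 1, size=rows)

    def _bucket(self, j):
        return (self._a * j + self._b) % self.w              # per-row bucket

    def _sign(self, j):
        return 1 - 2 * ((self._a * (j + 7) + self._b) % 2)   # per-row +/-1

    def update(self, j, delta):
        self.table[np.arange(self.r), self._bucket(j)] += self._sign(j) * delta

    def query(self, j):
        vals = self.table[np.arange(self.r), self._bucket(j)] * self._sign(j)
        return np.median(vals)  # median across rows controls collision noise
```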

FastSecAgg: Scalable Secure Aggregation for Privacy-Preserving Federated Learning

no code implementations23 Sep 2020 Swanand Kadhe, Nived Rajaraman, O. Ozan Koyluoglu, Kannan Ramchandran

In this paper, we propose a secure aggregation protocol, FastSecAgg, that is efficient in terms of computation and communication, and robust to client dropouts.

Federated Learning Privacy Preserving
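
A minimal sketch of the generic pairwise-masking idea underlying secure aggregation (FastSecAgg's actual contribution, a fast secret-sharing scheme with dropout recovery, is not reproduced here): masks agreed by each client pair cancel in the server-side sum, so the server only learns the aggregate.

```python
import numpy as np

def mask_updates(updates, pair_seeds):
    """updates: list of equal-shape arrays; pair_seeds[i][j]: shared seed, i<j."""
    n = len(updates)
    masked = []
    for i in range(n):
        m = updates[i].astype(float).copy()
        for j in range(n):
            if i == j:
                continue
            seed = pair_seeds[min(i, j)][max(i, j)]
            mask = np.random.default_rng(seed).normal(size=m.shape)
            m += mask if i < j else -mask  # opposite signs cancel pairwise
        masked.append(m)
    return masked  # sum(masked) equals sum(updates) up to float error
```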

Utility-based Resource Allocation and Pricing for Serverless Computing

1 code implementation18 Aug 2020 Vipul Gupta, Soham Phade, Thomas Courtade, Kannan Ramchandran

As one of the fastest-growing cloud services, serverless computing provides an opportunity to better serve both users and providers through the incorporation of market-based strategies for pricing and resource allocation.

Distributed, Parallel, and Cluster Computing Computer Science and Game Theory

Boundary thickness and robustness in learning models

1 code implementation NeurIPS 2020 Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms.

Adversarial Defense Data Augmentation
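
A minimal sketch of one noise-augmented mixup step (the additive-Gaussian form here is an assumption; the paper evaluates several augmentations):

```python
import numpy as np

def noisy_mixup(x1, y1, x2, y2, alpha=1.0, noise_std=0.1, rng=None):
    """Mix a pair of examples, then thicken the boundary with input noise."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    x = x + rng.normal(0.0, noise_std, size=np.shape(x))  # noise augmentation
    y = lam * y1 + (1 - lam) * y2                          # soft label
    return x, y
```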

Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning

no code implementations14 May 2020 Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran

When a particular code is used in this framework, its block-length determines the computation load, dimension determines the communication overhead, and minimum distance determines the straggler tolerance.
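
A toy instance of the trade-off (a sketch; the paper constructs far more communication-efficient codes): replicating each of three gradient partitions on two of three workers doubles the per-worker computation load but tolerates any single straggler.

```python
import numpy as np

# Fractional-repetition placement: worker -> partitions it computes.
PLACEMENT = {0: (0, 1), 1: (1, 2), 2: (2, 0)}

def aggregate(partition_grads, alive):
    """Sum all partition gradients using only non-straggling workers."""
    recovered = {}
    for w in alive:
        for p in PLACEMENT[w]:
            recovered[p] = partition_grads[p]
    assert len(recovered) == len(partition_grads), "too many stragglers"
    return sum(recovered.values())

grads = {p: p * np.ones(4) for p in range(3)}
print(aggregate(grads, alive=[0, 2]))  # worker 1 straggles; still decodable
```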

Alternating Minimization Converges Super-Linearly for Mixed Linear Regression

no code implementations23 Apr 2020 Avishek Ghosh, Kannan Ramchandran

Furthermore, we compare AM with a gradient based heuristic algorithm empirically and show that AM dominates in iteration complexity as well as wall-clock time.

regression
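
A minimal serial sketch of the AM iteration for the two-component case (random initialization here; the paper's super-linear convergence guarantee assumes a suitably good starting point):

```python
import numpy as np

def am_mixed_lr(X, y, iters=50, seed=0):
    """Alternate between assigning samples to the better-fitting line
    and refitting both lines by least squares."""
    rng = np.random.default_rng(seed)
    b1, b2 = rng.normal(size=(2, X.shape[1]))
    for _ in range(iters):
        mask = np.abs(y - X @ b1) <= np.abs(y - X @ b2)  # assignment step
        if mask.all() or not mask.any():                 # degenerate split
            break
        b1 = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
        b2 = np.linalg.lstsq(X[~mask], y[~mask], rcond=None)[0]
    return b1, b2
```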

Serverless Straggler Mitigation using Local Error-Correcting Codes

1 code implementation21 Jan 2020 Vipul Gupta, Dominic Carrano, Yaoqing Yang, Vaishaal Shankar, Thomas Courtade, Kannan Ramchandran

Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation.

Distributed, Parallel, and Cluster Computing Information Theory

Communication-Efficient and Byzantine-Robust Distributed Learning with Error Feedback

no code implementations21 Nov 2019 Avishek Ghosh, Raj Kumar Maity, Swanand Kadhe, Arya Mazumdar, Kannan Ramchandran

Moreover, we analyze the compressed gradient descent algorithm with error feedback (proposed in \cite{errorfeed}) in a distributed setting and in the presence of Byzantine worker machines.

SeF: A Secure Fountain Architecture for Slashing Storage Costs in Blockchains

no code implementations28 Jun 2019 Swanand Kadhe, Jichan Chung, Kannan Ramchandran

In this paper, we propose an architecture based on 'fountain codes', a class of erasure codes, that enables any full node to 'encode' validated blocks into a small number of 'coded blocks', thereby reducing its storage costs by orders of magnitude.

Cryptography and Security Distributed, Parallel, and Cluster Computing Information Theory

Max-Affine Regression: Provable, Tractable, and Near-Optimal Statistical Estimation

no code implementations21 Jun 2019 Avishek Ghosh, Ashwin Pananjady, Adityanand Guntuboyina, Kannan Ramchandran

Max-affine regression refers to a model where the unknown regression function is modeled as a maximum of $k$ unknown affine functions for a fixed $k \geq 1$.

regression Retrieval
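
In code, the model reads $y \approx \max_{1 \le j \le k} (\langle a_j, x \rangle + b_j)$, and a natural alternating-minimization heuristic (a sketch; the paper's analysis pins down initialization and sample requirements) alternates between assigning each point to its active piece and refitting each piece by least squares:

```python
import numpy as np

def am_max_affine(X, y, k=2, iters=50, seed=0):
    """Fit y ~ max_j (<a_j, x> + b_j) by alternating minimization."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])          # append intercept column
    theta = rng.normal(size=(k, d + 1))
    for _ in range(iters):
        active = np.argmax(Xb @ theta.T, axis=1)  # active piece per point
        for j in range(k):
            idx = active == j
            if idx.any():
                theta[j] = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)[0]
    return theta
```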

Robust Federated Learning in a Heterogeneous Environment

no code implementations16 Jun 2019 Avishek Ghosh, Justin Hong, Dong Yin, Kannan Ramchandran

Then, leveraging the statistical model, we solve the robust heterogeneous Federated Learning problem \emph{optimally}; in particular our algorithm matches the lower bound on the estimation error in dimension and the number of data points.

Clustering Federated Learning

Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion

no code implementations9 May 2019 Orhan Ocal, Oguz H. Elibol, Gokce Keskin, Cory Stephenson, Anil Thomas, Kannan Ramchandran

Due to the use of a single encoder, our method can generalize to converting the voice of out-of-training speakers to speakers in the training dataset.

Voice Conversion

Cross-Entropy Loss Leads To Poor Margins

no code implementations ICLR 2019 Kamil Nar, Orhan Ocal, S. Shankar Sastry, Kannan Ramchandran

In this work, we study the binary classification of linearly separable datasets and show that linear classifiers could also have decision boundaries that lie close to their training dataset if cross-entropy loss is used for training.

Binary Classification

Gradient Coding Based on Block Designs for Mitigating Adversarial Stragglers

no code implementations30 Apr 2019 Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran

In this work, our goal is to construct approximate gradient codes that are resilient to stragglers selected by a computationally unbounded adversary.

OverSketched Newton: Fast Convex Optimization for Serverless Systems

1 code implementation21 Mar 2019 Vipul Gupta, Swanand Kadhe, Thomas Courtade, Michael W. Mahoney, Kannan Ramchandran

Motivated by recent developments in serverless systems for large-scale computation as well as improvements in scalable randomized matrix algorithms, we develop OverSketched Newton, a randomized Hessian-based optimization algorithm to solve large-scale convex optimization problems in serverless systems.

Distributed Optimization

Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples

no code implementations24 Jan 2019 Kamil Nar, Orhan Ocal, S. Shankar Sastry, Kannan Ramchandran

We show that differential training can ensure a large margin between the decision boundary of the neural network and the points in the training dataset.

Binary Classification

OverSketch: Approximate Matrix Multiplication for the Cloud

1 code implementation6 Nov 2018 Vipul Gupta, Shusen Wang, Thomas Courtade, Kannan Ramchandran

We propose OverSketch, an approximate algorithm for distributed matrix multiplication in serverless computing.

Distributed, Parallel, and Cluster Computing Information Theory
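
The core approximation, reduced to a serial sketch with a Gaussian sketch matrix (chosen here for clarity; OverSketch itself uses structured sketches with redundant blocks for straggler resilience):

```python
import numpy as np

def sketched_matmul(A, B, s, seed=0):
    """Approximate A @ B by compressing the shared inner dimension to s."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    S = rng.normal(size=(d, s)) / np.sqrt(s)  # E[S @ S.T] = I_d, so the
    return (A @ S) @ (S.T @ B)                # product is unbiased for A @ B
```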

Greedy Frank-Wolfe Algorithm for Exemplar Selection

2 code implementations6 Nov 2018 Gary Cheng, Armin Askari, Kannan Ramchandran, Laurent El Ghaoui

In this paper, we consider the problem of selecting representatives from a data set for arbitrary supervised/unsupervised learning tasks.

Dictionary Learning
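
A generic Frank-Wolfe loop over the probability simplex shows why the iterates stay sparse, which exemplar selection exploits: the linear minimization oracle returns a single vertex per step (a sketch; `grad_f` is an assumed gradient oracle and `x0` should lie on the simplex, e.g. the uniform point):

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, iters=100):
    """Minimize a smooth f over the simplex; at most one new nonzero per step."""
    x = x0.copy()
    for t in range(iters):
        g = grad_f(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0                # LMO over the simplex: a vertex
        x += (2.0 / (t + 2.0)) * (s - x)     # standard step-size schedule
    return x
```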

Rademacher Complexity for Adversarially Robust Generalization

1 code implementation29 Oct 2018 Dong Yin, Kannan Ramchandran, Peter Bartlett

For binary linear classifiers, we prove tight bounds for the adversarial Rademacher complexity, and show that the adversarial Rademacher complexity is never smaller than its natural counterpart, and it has an unavoidable dimension dependence, unless the weight vector has bounded $\ell_1$ norm.

BIG-bench Machine Learning

Online Scoring with Delayed Information: A Convex Optimization Viewpoint

no code implementations9 Jul 2018 Avishek Ghosh, Kannan Ramchandran

We argue that the error in the score estimate accumulated over $T$ iterations is small if the regret of the online convex game is small.

Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning

no code implementations14 Jun 2018 Dong Yin, Yudong Chen, Kannan Ramchandran, Peter Bartlett

In this setting, the Byzantine machines may create fake local minima near a saddle point that is far away from any true local minimum, even when robust gradient estimators are used.

Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates

1 code implementation ICML 2018 Dong Yin, Yudong Chen, Kannan Ramchandran, Peter Bartlett

In particular, these algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses.
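
Minimal versions of the two coordinate-wise robust aggregators this line of work analyzes, the median and the $\beta$-trimmed mean (a sketch; the distributed wrapper and the choice of $\beta$ are omitted):

```python
import numpy as np

def coordinate_median(grads):
    """grads: (num_workers, dim) array of worker gradients."""
    return np.median(grads, axis=0)

def trimmed_mean(grads, beta):
    """Drop the beta fraction of largest and smallest entries per coordinate,
    then average the rest; robust when fewer than beta*m workers are Byzantine."""
    m = grads.shape[0]
    k = int(np.floor(beta * m))
    s = np.sort(grads, axis=0)
    return s[k:m - k].mean(axis=0)
```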

Approximate Ranking from Pairwise Comparisons

no code implementations4 Jan 2018 Reinhard Heckel, Max Simchowitz, Kannan Ramchandran, Martin J. Wainwright

Accordingly, we study the problem of finding approximate rankings from pairwise comparisons.

A Sequential Approximation Framework for Coded Distributed Optimization

no code implementations24 Oct 2017 Jingge Zhu, Ye Pu, Vipul Gupta, Claire Tomlin, Kannan Ramchandran

As an application of the results, we demonstrate solving optimization problems using a sequential approximation approach, which accelerates the algorithm in a distributed system with stragglers.

Distributed Optimization

Gradient Diversity: a Key Ingredient for Scalable Distributed Learning

no code implementations18 Jun 2017 Dong Yin, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, Peter Bartlett

It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size.

Quantization

The Sample Complexity of Online One-Class Collaborative Filtering

1 code implementation ICML 2017 Reinhard Heckel, Kannan Ramchandran

We consider the online one-class collaborative filtering (CF) problem that consists of recommending items to users over time in an online fashion based on positive ratings only.

Collaborative Filtering Recommendation Systems

Active Ranking from Pairwise Comparisons and when Parametric Assumptions Don't Help

no code implementations28 Jun 2016 Reinhard Heckel, Nihar B. Shah, Kannan Ramchandran, Martin J. Wainwright

We first analyze a sequential ranking algorithm that counts the number of comparisons won, and uses these counts to decide whether to stop, or to compare another pair of items, chosen based on confidence intervals specified by the data collected up to that point.

Open-Ended Question Answering
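
The win-counting core is a Borda-style estimator; a minimal passive version (the paper's active algorithm layers confidence intervals and adaptive stopping on top of this):

```python
import numpy as np

def rank_by_wins(comparisons, n):
    """comparisons: iterable of (winner, loser) item indices."""
    wins = np.zeros(n)
    for winner, _ in comparisons:
        wins[winner] += 1
    return np.argsort(-wins)  # items ordered by estimated strength
```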

Speeding Up Distributed Machine Learning Using Codes

no code implementations8 Dec 2015 Kangwook Lee, Maximilian Lam, Ramtin Pedarsani, Dimitris Papailiopoulos, Kannan Ramchandran

We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling.

BIG-bench Machine Learning
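
A classic instance of the coded matrix-multiplication building block: a $(3,2)$ MDS code over row blocks lets the master recover $Ax$ from any two of three workers (a sketch; assumes $A$ has an even number of rows):

```python
import numpy as np

def coded_matvec(A, x, straggler):
    """Compute A @ x with 3 workers while ignoring one straggler."""
    A1, A2 = np.vsplit(A, 2)
    tasks = {0: A1 @ x, 1: A2 @ x, 2: (A1 + A2) @ x}    # worker jobs
    done = {w: r for w, r in tasks.items() if w != straggler}
    if 0 in done and 1 in done:
        return np.concatenate([done[0], done[1]])
    if 0 in done:                                        # worker 1 straggled
        return np.concatenate([done[0], done[2] - done[0]])
    return np.concatenate([done[2] - done[1], done[1]])  # worker 0 straggled
```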

An Active Learning Framework using Sparse-Graph Codes for Sparse Polynomials and Graph Sketching

no code implementations NeurIPS 2015 Xiao Li, Kannan Ramchandran

By writing the cut function as a polynomial and exploiting the graph structure, we propose a sketching algorithm to learn an arbitrary $n$-node unknown graph using only a few cut queries, which scales {\it almost linearly} in the number of edges and {\it sub-linearly} in the graph size $n$.

Active Learning

Fast and Efficient Sparse 2D Discrete Fourier Transform using Sparse-Graph Codes

no code implementations19 Sep 2015 Frank Ong, Sameer Pawar, Kannan Ramchandran

For the case when the spatial-domain measurements are corrupted by additive noise, our 2D-FFAST framework extends to a noise-robust version in sub-linear time of $O(k \log^4 N)$ using $O(k \log^3 N)$ measurements.

Information Theory Multimedia Systems and Control

SPRIGHT: A Fast and Robust Framework for Sparse Walsh-Hadamard Transform

3 code implementations26 Aug 2015 Xiao Li, Joseph K. Bradley, Sameer Pawar, Kannan Ramchandran

We consider the problem of computing the Walsh-Hadamard Transform (WHT) of some $N$-length input vector in the presence of noise, where the $N$-point Walsh spectrum is $K$-sparse with $K = O(N^{\delta})$ scaling sub-linearly in the input dimension $N$ for some $0<\delta<1$.
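
For contrast with SPRIGHT's sublinear approach, the dense fast Walsh-Hadamard transform touches every sample and runs in $O(N \log N)$ time (unnormalized, natural ordering):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = np.array(x, dtype=float)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # 2-point butterfly
        h *= 2
    return x
```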

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

no code implementations24 Jul 2015 Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan

We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.

Stochastic Optimization
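
A minimal serial SVRG loop, the recursion whose perturbed asynchronous iterates the paper analyzes (a sketch; `grad_i(w, i)` is an assumed per-example gradient oracle):

```python
import numpy as np

def svrg(grad_i, w0, n, lr=0.1, epochs=10, m=100, seed=0):
    """Serial SVRG: variance-reduced SGD around a periodic snapshot."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full = sum(grad_i(w_snap, i) for i in range(n)) / n  # snapshot gradient
        for _ in range(m):
            i = rng.integers(n)
            w = w - lr * (grad_i(w, i) - grad_i(w_snap, i) + full)
    return w
```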

Parallel Correlation Clustering on Big Graphs

no code implementations NeurIPS 2015 Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan

We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably.

Clustering
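
Both parallel algorithms emulate the serial pivot (KwikCluster) baseline, sketched here with `adj` as a symmetric boolean matrix of "+" edges:

```python
import numpy as np

def kwik_cluster(adj, seed=0):
    """Pick a random pivot, cluster it with its remaining '+' neighbors, repeat."""
    rng = np.random.default_rng(seed)
    remaining = set(range(len(adj)))
    clusters = []
    while remaining:
        pivot = int(rng.choice(sorted(remaining)))
        cluster = {pivot} | {v for v in remaining if adj[pivot][v]}
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```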

Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence

no code implementations6 May 2015 Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin J. Wainwright

Data in the form of pairwise comparisons arises in many domains, including preference elicitation, sporting competitions, and peer grading among others.

A robust sub-linear time R-FFAST algorithm for computing a sparse DFT

no code implementations1 Jan 2015 Sameer Pawar, Kannan Ramchandran

If the DFT $X$ of the signal $x$ has only $k$ non-zero coefficients (where $k < n$), can we do better?

When is it Better to Compare than to Score?

no code implementations25 Jun 2014 Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin Wainwright

When eliciting judgements from humans for an unknown quantity, one often has the choice of making direct-scoring (cardinal) or comparative (ordinal) measurements.
