Search Results for author: Sanjiv Kumar

Found 123 papers, 20 papers with code

Language Model Cascades: Token-level uncertainty and beyond

no code implementations15 Apr 2024 Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

While the principles underpinning cascading are well-studied for classification tasks - with deferral based on predicted class uncertainty favored theoretically and practically - a similar understanding is lacking for generative LM tasks.

Language Modelling

Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts

no code implementations14 Apr 2024 Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton

Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation.

SOAR: Improved Indexing for Approximate Nearest Neighbor Search

no code implementations NeurIPS 2023 Philip Sun, David Simcha, Dave Dopson, Ruiqi Guo, Sanjiv Kumar

This paper introduces SOAR: Spilling with Orthogonality-Amplified Residuals, a novel data indexing technique for approximate nearest neighbor (ANN) search.

HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

no code implementations14 Feb 2024 Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli

Autoregressive decoding with generative Large Language Models (LLMs) on accelerators (GPUs/TPUs) is often memory-bound where most of the time is spent on transferring model parameters from high bandwidth memory (HBM) to cache.

Tandem Transformers for Inference Efficient LLMs

no code implementations13 Feb 2024 Aishwarya P S, Pranav Ajit Nair, Yashas Samaga, Toby Boyd, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli

On the PaLM2 pretraining dataset, a tandem of PaLM2-Bison and PaLM2-Gecko demonstrates a 3.3% improvement in next-token prediction accuracy over a standalone PaLM2-Gecko, offering a 1.16x speedup compared to a PaLM2-Otter model with comparable downstream performance.

Efficient Stagewise Pretraining via Progressive Subnetworks

no code implementations8 Feb 2024 Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

RaPTr achieves better pre-training loss for BERT and UL2 language models while requiring 20-33% fewer FLOPs compared to standard training, and is competitive or better than other efficient training methods.

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

no code implementations24 Jan 2024 Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia Desalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar

In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and token replacement detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $\tau$ iterations, then transitions to standard SC loss.
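
A minimal sketch of the two-stage curriculum described above, shown only as stage-switching logic with made-up loss values and an illustrative mixing weight; it is not the paper's training code.

```python
def hybrid_curriculum_loss(sc_loss: float, rtd_loss: float,
                           step: int, tau: int, rtd_weight: float = 1.0) -> float:
    """Stage 1 (step < tau): hybrid SC + RTD objective; stage 2: standard SC only."""
    if step < tau:
        return sc_loss + rtd_weight * rtd_loss
    return sc_loss

# Toy usage with made-up loss values and an illustrative tau.
print(hybrid_curriculum_loss(2.3, 0.7, step=100, tau=1000))   # 3.0 (hybrid stage)
print(hybrid_curriculum_loss(2.3, 0.7, step=5000, tau=1000))  # 2.3 (SC-only stage)
```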

A Weighted K-Center Algorithm for Data Subset Selection

no code implementations17 Dec 2023 Srikumar Ramalingam, Pranjal Awasthi, Sanjiv Kumar

The success of deep learning hinges on enormous data and large models, which require labor-intensive annotation and incur heavy computation costs.

Rethinking FID: Towards a Better Evaluation Metric for Image Generation

2 code implementations30 Nov 2023 Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, Sanjiv Kumar

It is an unbiased estimator that does not make any assumptions on the probability distribution of the embeddings and is sample efficient.

Image Generation

It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models

no code implementations13 Oct 2023 Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar

Classical wisdom in machine learning holds that the generalization error can be decomposed into bias and variance, and these two terms exhibit a \emph{trade-off}.

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

no code implementations12 Oct 2023 Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal

Finally, in practical scenarios with models of varying sizes, first using distillation to boost the performance of the target model and then applying DistillSpec to train a well-aligned draft model can reduce decoding latency by 6-10x with minimal performance drop, compared to standard decoding without distillation.

Knowledge Distillation Language Modelling +1

What do larger image classifiers memorise?

no code implementations9 Oct 2023 Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels.

Image Classification Knowledge Distillation +2

Functional Interpolation for Relative Positions Improves Long Context Transformers

no code implementations6 Oct 2023 Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli

Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models.

Language Modelling Position

Think before you speak: Training Language Models With Pause Tokens

no code implementations3 Oct 2023 Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token.
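
Read together with the title, the idea is to give the model extra hidden vectors to manipulate before it commits to the next output token. A toy sketch of appending learnable <pause> tokens to the input, with hypothetical token ids:

```python
def append_pause_tokens(input_ids, pause_id, num_pauses=10):
    # Extra <pause> positions give the model num_pauses additional hidden
    # vectors per layer to manipulate before producing the answer tokens.
    return input_ids + [pause_id] * num_pauses

prompt = [101, 2054, 2003, 1996, 3437, 102]          # hypothetical token ids
print(append_pause_tokens(prompt, pause_id=32000))   # prompt followed by 10 pause ids
```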

GSM8K Question Answering

When Does Confidence-Based Cascade Deferral Suffice?

no code implementations NeurIPS 2023 Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn.

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

no code implementations13 May 2023 Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.

On student-teacher deviations in distillation: does it pay to disobey?

no code implementations NeurIPS 2023 Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network.

Knowledge Distillation

Plugin estimators for selective classification with out-of-distribution detection

no code implementations29 Jan 2023 Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar

Recent work on selective classification with OOD detection (SCOD) has argued for the unified study of these problems; however, the formal underpinnings of this problem are still nascent, and existing techniques are heuristic in nature.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Leveraging Importance Weights in Subset Selection

no code implementations28 Jan 2023 Gui Citovsky, Giulia Desalvo, Sanjiv Kumar, Srikumar Ramalingam, Afshin Rostamizadeh, Yunjuan Wang

In such a setting, an algorithm can sample examples one at a time but, in order to limit overhead costs, is only able to update its state (i.e., further train model weights) once a large enough batch of examples is selected.

Active Learning

Automating Nearest Neighbor Search Configuration with Constrained Optimization

no code implementations4 Jan 2023 Philip Sun, Ruiqi Guo, Sanjiv Kumar

The approximate nearest neighbor (ANN) search problem is fundamental to efficiently serving many real-world machine learning applications.

Quantization

Large Language Models with Controllable Working Memory

no code implementations9 Nov 2022 Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge.

Counterfactual World Knowledge

When does mixup promote local linearity in learned representations?

no code implementations28 Oct 2022 Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Towards this, we study two questions: (1) how does the Mixup loss that enforces linearity in the \emph{last} network layer propagate the linearity to the \emph{earlier} layers?

Representation Learning

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

no code implementations12 Oct 2022 Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse.

Decoupled Context Processing for Context Augmented Language Modeling

no code implementations11 Oct 2022 Zonglin Li, Ruiqi Guo, Sanjiv Kumar

Language models can be augmented with a context retriever to incorporate knowledge from large external databases.

Language Modelling Open-Domain Question Answering +2

Teacher Guided Training: An Efficient Framework for Knowledge Transfer

no code implementations14 Aug 2022 Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data.

Generalization Bounds Image Classification +4

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

no code implementations28 Jun 2022 Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with a similar level of recall.

ELM: Embedding and Logit Margins for Long-Tail Learning

no code implementations27 Apr 2022 Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners.

Contrastive Learning Long-tail Learning

Predicting on the Edge: Identifying Where a Larger Model Does Better

no code implementations15 Feb 2022 Taman Narayan, Heinrich Jiang, Sen Zhao, Sanjiv Kumar

Much effort has been devoted to making large and more accurate models, but relatively little has been put into understanding which examples are benefiting from the added complexity.

Robust Training of Neural Networks Using Scale Invariant Architectures

no code implementations2 Feb 2022 Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar

In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models.

Efficient Training of Retrieval Models using Negative Cache

2 code implementations NeurIPS 2021 Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

These models are typically trained by optimizing the model parameters to score relevant "positive" pairs higher than the irrelevant "negative" ones.
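
A minimal sketch of the in-batch softmax loss this sentence alludes to, scoring each query's positive document above the other documents in the batch; the paper's negative-cache machinery for scaling the pool of negatives is not shown.

```python
import numpy as np

def in_batch_contrastive_loss(query_emb, doc_emb):
    scores = query_emb @ doc_emb.T                   # [B, B] similarity matrix
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positives sit on the diagonal

q = np.random.default_rng(0).normal(size=(8, 64))
d = np.random.default_rng(1).normal(size=(8, 64))
print(in_batch_contrastive_loss(q, d))
```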

Information Retrieval Retrieval

When in Doubt, Summon the Titans: Efficient Inference with Large Models

no code implementations19 Oct 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification

Leveraging redundancy in attention with Reuse Transformers

1 code implementation13 Oct 2021 Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar

Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision.

In defense of dual-encoders for neural ranking

no code implementations29 Sep 2021 Aditya Krishna Menon, Sadeep Jayasumana, Seungyeon Kim, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query.

Information Retrieval Natural Questions +1

Less data is more: Selecting informative and diverse subsets with balancing constraints

no code implementations29 Sep 2021 Srikumar Ramalingam, Daniel Glasner, Kaushal Patel, Raviteja Vemulapalli, Sadeep Jayasumana, Sanjiv Kumar

Deep learning has yielded extraordinary results in vision and natural language processing, but this achievement comes at a cost.

When in Doubt, Summon the Titans: A Framework for Efficient Inference with Large Models

no code implementations29 Sep 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification

Model-Efficient Deep Learning with Kernelized Classification

no code implementations29 Sep 2021 Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

We investigate the possibility of using the embeddings produced by a lightweight network more effectively with a nonlinear classification layer.

Classification

Batch Active Learning at Scale

1 code implementation NeurIPS 2021 Gui Citovsky, Giulia Desalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar

The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources.

Active Learning

Teacher's pet: understanding and mitigating biases in distillation

no code implementations19 Jun 2021 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.

Image Classification Knowledge Distillation

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation

no code implementations16 Jun 2021 Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length.

Balancing Robustness and Sensitivity using Feature Contrastive Learning

no code implementations19 May 2021 Seungyeon Kim, Daniel Glasner, Srikumar Ramalingam, Cho-Jui Hsieh, Kishore Papineni, Sanjiv Kumar

It is generally believed that robust training of extremely large networks is critical to their success in real-world applications.

Contrastive Learning

Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

no code implementations12 May 2021 Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account.

Retrieval

Less is more: Selecting informative and diverse subsets with balancing constraints

no code implementations26 Apr 2021 Srikumar Ramalingam, Daniel Glasner, Kaushal Patel, Raviteja Vemulapalli, Sadeep Jayasumana, Sanjiv Kumar

Deep learning has yielded extraordinary results in vision and natural language processing, but this achievement comes at a cost.

Image Classification

RankDistil: Knowledge Distillation for Ranking

no code implementations AISTATS 2021 Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar

Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, the study of distillation is mostly confined to classification settings.

Document Ranking Knowledge Distillation

On the Reproducibility of Neural Network Predictions

no code implementations5 Feb 2021 Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar

By analyzing the relationship between churn and prediction confidences, we pursue an approach with two components for churn reduction.
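
A small illustration, assuming "churn" here means the fraction of examples on which two independently trained copies of a model disagree in their predicted class:

```python
import numpy as np

def churn(preds_a, preds_b):
    # Fraction of examples where the two models' predicted classes differ.
    return float(np.mean(np.asarray(preds_a) != np.asarray(preds_b)))

print(churn([0, 1, 2, 1], [0, 2, 2, 1]))  # 0.25
```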

Data Augmentation Image Classification

Overparameterisation and worst-case generalisation: friend or foe?

no code implementations ICLR 2021 Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Overparameterised neural networks have demonstrated the remarkable ability to perfectly fit training samples, while still generalising to unseen test samples.

Structured Prediction

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers

no code implementations NeurIPS 2020 Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

We propose sufficient conditions under which we prove that a sparse attention model can universally approximate any sequence-to-sequence function.

Modifying Memories in Transformer Models

no code implementations1 Dec 2020 Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar

In this paper, we propose a new task of \emph{explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts}.

Memorization

Coping with Label Shift via Distributionally Robust Optimisation

1 code implementation ICLR 2021 Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.

Semantic Label Smoothing for Sequence to Sequence Problems

no code implementations EMNLP 2020 Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps in label de-noising.

Machine Translation Translation

Learning discrete distributions: user vs item-level privacy

no code implementations NeurIPS 2020 Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$.

Federated Learning

Multi-Stage Influence Function

no code implementations NeurIPS 2020 Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

With this score, we can identify the pretraining examples in the pretraining task that contribute most to a prediction in the finetuning task.

Transfer Learning

Long-tail learning via logit adjustment

3 code implementations ICLR 2021 Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples.

Long-tail Learning

$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers

no code implementations NeurIPS 2020 Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

We propose sufficient conditions under which we prove that a sparse attention model can universally approximate any sequence-to-sequence function.

Why distillation helps: a statistical perspective

no code implementations21 May 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques.

Knowledge Distillation Retrieval

Can gradient clipping mitigate label noise?

1 code implementation ICLR 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Gradient clipping is a widely-used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum.
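
A minimal sketch of the operation under discussion, global-norm gradient clipping (the threshold is illustrative):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))  # global norm
    scale = min(1.0, max_norm / (total + 1e-12))                # shrink only if too large
    return [g * scale for g in grads]

grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]           # global norm = 13
print(clip_by_global_norm(grads, max_norm=1.0))
```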

Doubly-stochastic mining for heterogeneous retrieval

no code implementations23 Apr 2020 Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.

Retrieval Stochastic Optimization

Federated Learning with Only Positive Labels

1 code implementation ICML 2020 Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class.

Federated Learning Multi-class Classification

Robust Large-Margin Learning in Hyperbolic Space

no code implementations NeurIPS 2020 Melanie Weber, Manzil Zaheer, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar

In this paper, we present, to our knowledge, the first theoretical guarantees for learning a classifier in hyperbolic rather than Euclidean space.

Representation Learning

Does label smoothing mitigate label noise?

no code implementations ICML 2020 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.
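
The mixing described in this sentence, as a direct numerical illustration (the smoothing strength alpha is arbitrary):

```python
import numpy as np

def smooth_labels(one_hot, alpha=0.1):
    k = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha * np.full_like(one_hot, 1.0 / k)

y = np.eye(4)[2]           # one-hot label for class 2 of 4
print(smooth_labels(y))    # [0.025, 0.025, 0.925, 0.025]
```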

Learning with noisy labels

Adaptive Federated Optimization

5 code implementations ICLR 2021 Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan

Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data.
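
A hedged sketch combining the setting described above with the paper's title: the server averages client updates into a pseudo-gradient and applies an Adam-style adaptive step. Hyperparameters, client sampling, and local training are all omitted or illustrative.

```python
import numpy as np

def server_adaptive_step(w, client_deltas, m, v, lr=0.1, b1=0.9, b2=0.99, eps=1e-3):
    delta = np.mean(client_deltas, axis=0)        # averaged client update (pseudo-gradient)
    m = b1 * m + (1 - b1) * delta                 # first-moment estimate
    v = b2 * v + (1 - b2) * delta ** 2            # second-moment estimate
    return w + lr * m / (np.sqrt(v) + eps), m, v

w = np.zeros(4); m = np.zeros(4); v = np.zeros(4)
deltas = [np.random.default_rng(i).normal(size=4) for i in range(3)]
w, m, v = server_adaptive_step(w, deltas, m, v)
print(w)
```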

Federated Learning

Low-Rank Bottleneck in Multi-head Attention Models

no code implementations ICML 2020 Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Attention based Transformer architecture has enabled significant advances in the field of natural language processing.

Pre-training Tasks for Embedding-based Large-scale Retrieval

no code implementations ICLR 2020 Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus.

Information Retrieval Link Prediction +1

New Loss Functions for Fast Maximum Inner Product Search

no code implementations ICLR 2020 Ruiqi Guo, Quan Geng, David Simcha, Felix Chern, Phil Sun, Sanjiv Kumar

In this work, we focus directly on minimizing error in inner product approximation and derive a new class of quantization loss functions.

Benchmarking Quantization

Are Transformers universal approximators of sequence-to-sequence functions?

no code implementations ICLR 2020 Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models.

Why are Adaptive Methods Good for Attention Models?

no code implementations NeurIPS 2020 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the \emph{de facto} algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.

Multilabel reductions: what is my loss optimising?

no code implementations NeurIPS 2019 Aditya K. Menon, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Multilabel classification is a challenging problem arising in applications ranging from information retrieval to image tagging.

General Classification Information Retrieval +1

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

no code implementations NeurIPS 2019 Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy.

Attribute Classification +2

Learning to Learn by Zeroth-Order Oracle

1 code implementation ICLR 2020 Yangjun Ruan, Yuanhao Xiong, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh

In the learning to learn (L2L) framework, we cast the design of optimization algorithms as a machine learning problem and use deep neural networks to learn the update rules.

Adversarial Attack

Concise Multi-head Attention Models

no code implementations25 Sep 2019 Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Attention based Transformer architecture has enabled significant advances in the field of natural language processing.

Learning to Learn with Better Convergence

no code implementations25 Sep 2019 Patrick H. Chen, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh

We consider the learning to learn problem, where the goal is to leverage deep learning models to automatically learn (iterative) optimization algorithms for training machine learning models.

Why ADAM Beats SGD for Attention Models

no code implementations25 Sep 2019 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.

Online Hierarchical Clustering Approximations

no code implementations20 Sep 2019 Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar

The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue, and produce good quality clusters in practice.

Clustering

Accelerating Large-Scale Inference with Anisotropic Vector Quantization

3 code implementations ICML 2020 Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, Sanjiv Kumar

Based on the observation that for a given query, the database points that have the largest inner products are more relevant, we develop a family of anisotropic quantization loss functions.
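
A sketch of the anisotropic loss idea as I read this sentence: the quantization residual of a database point is split into components parallel and orthogonal to the point, and the parallel component (which most affects large inner products) is penalized more heavily. The weights below are illustrative.

```python
import numpy as np

def anisotropic_loss(x, x_quantized, eta_par=4.0, eta_orth=1.0):
    r = x - x_quantized
    x_unit = x / np.linalg.norm(x)
    r_par = np.dot(r, x_unit) * x_unit     # residual component along x
    r_orth = r - r_par                     # residual component orthogonal to x
    return eta_par * np.sum(r_par ** 2) + eta_orth * np.sum(r_orth ** 2)

x = np.array([1.0, 0.0]); xq = np.array([0.9, 0.1])
print(anisotropic_loss(x, xq))             # 0.05 with the illustrative weights
```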

Benchmarking Quantization

AdaCliP: Adaptive Clipping for Private SGD

1 code implementation20 Aug 2019 Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar

Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed.

BIG-bench Machine Learning Privacy Preserving

Sampled Softmax with Random Fourier Features

no code implementations NeurIPS 2019 Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method.
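
A minimal sketch of plain uniform sampled softmax as described in this sentence (not the paper's Random Fourier Feature sampler); the correction term for the sampling distribution and possible label collisions are ignored for brevity.

```python
import numpy as np

def sampled_softmax_loss(class_scores, label, num_sampled=5, rng=None):
    rng = rng or np.random.default_rng(0)
    negatives = rng.choice(len(class_scores), size=num_sampled, replace=False)
    classes = np.concatenate(([label], negatives))   # true class plus sampled negatives
    logits = class_scores[classes]
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                             # true class is at index 0

scores = np.random.default_rng(1).normal(size=1000)  # scores over 1000 classes
print(sampled_softmax_loss(scores, label=42))
```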

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise

1 code implementation5 Jun 2019 Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, Cho-Jui Hsieh

In this paper, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE) network, which naturally incorporates various commonly used regularization mechanisms based on random noise injection.

On the Convergence of Adam and Beyond

3 code implementations ICLR 2018 Sashank J. Reddi, Satyen Kale, Sanjiv Kumar

Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients.
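
A sketch of the update family this sentence refers to: gradients scaled by the square root of an exponential moving average of squared past gradients (Adam-style, with the usual default hyperparameters; the paper's AMSGrad fix additionally keeps a running maximum of the second-moment term).

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g                  # EMA of gradients
    v = b2 * v + (1 - b2) * g ** 2             # EMA of squared gradients
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.zeros(3); m = np.zeros(3); v = np.zeros(3)
for t in range(1, 4):
    w, m, v = adam_step(w, np.array([0.1, -0.2, 0.3]), m, v, t)
print(w)
```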

Stochastic Optimization

Local Orthogonal Decomposition for Maximum Inner Product Search

no code implementations25 Mar 2019 Xiang Wu, Ruiqi Guo, Sanjiv Kumar, David Simcha

More specifically, we decompose a residual vector locally into two orthogonal components and perform uniform quantization and multiscale quantization to each component respectively.

Quantization

Efficient Inner Product Approximation in Hybrid Spaces

no code implementations20 Mar 2019 Xiang Wu, Ruiqi Guo, David Simcha, Dave Dopson, Sanjiv Kumar

In this paper, we propose a technique that approximates the inner product computation in hybrid vectors, leading to substantial speedup in search while maintaining high accuracy.

Network Embedding

Escaping Saddle Points with Adaptive Gradient Methods

no code implementations26 Jan 2019 Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.

Adaptive Methods for Nonconvex Optimization

1 code implementation NeurIPS 2018 Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.

Stochastic Optimization

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

no code implementations ICLR 2019 Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh

The algorithm achieves an order of magnitude faster inference than the original softmax layer for predicting top-$k$ words in various tasks such as beam search in machine translation or next-word prediction.

Clustering Machine Translation +1

Stochastic Negative Mining for Learning with Large Output Spaces

no code implementations16 Oct 2018 Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar

Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e., computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.
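
A hedged sketch of the mining step named here: sample a random subset of labels, keep only the highest-scoring ("hardest") negatives, and compute the loss over the positive plus those mined negatives; the sizes are illustrative.

```python
import numpy as np

def mined_negatives(class_scores, label, num_sampled=100, num_kept=5, rng=None):
    rng = rng or np.random.default_rng(0)
    candidates = rng.choice(len(class_scores), size=num_sampled, replace=False)
    candidates = candidates[candidates != label]                    # drop the positive if sampled
    hardest = candidates[np.argsort(class_scores[candidates])[-num_kept:]]
    return class_scores[label], class_scores[hardest]               # positive score, hard negative scores

scores = np.random.default_rng(1).normal(size=10000)
pos, negs = mined_negatives(scores, label=123)
print(pos, negs)
```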

Retrieval

Privacy and Utility Tradeoff in Approximate Differential Privacy

no code implementations1 Oct 2018 Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar

We show that the multiplicative gap of the lower bounds and upper bounds goes to zero in various high privacy regimes, proving the tightness of the lower and upper bounds and thus establishing the optimality of the truncated Laplacian mechanism.

Optimal Noise-Adding Mechanism in Additive Differential Privacy

no code implementations26 Sep 2018 Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar

We derive the optimal $(0, \delta)$-differentially private query-output independent noise-adding mechanism for single real-valued query function under a general cost-minimization framework.

Loss Decomposition for Fast Learning in Large Output Spaces

no code implementations ICML 2018 Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar

For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space.

Word Embeddings

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

1 code implementation26 Jun 2018 Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi, Felix X. Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods.

Extreme Multi-Label Classification Multi-Label Learning +1

Nonlinear Online Learning with Adaptive Nyström Approximation

no code implementations21 Feb 2018 Si Si, Sanjiv Kumar, Yang Li

Use of nonlinear feature maps via kernel approximation has led to success in many online learning tasks.

Now Playing: Continuous low-power music recognition

no code implementations29 Nov 2017 Blaise Agüera y Arcas, Beat Gfeller, Ruiqi Guo, Kevin Kilgour, Sanjiv Kumar, James Lyon, Julian Odell, Marvin Ritter, Dominik Roblek, Matthew Sharifi, Mihajlo Velimirović

To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main application processor only when it is confident that music is present.

Learning Spread-out Local Feature Descriptors

2 code implementations ICCV 2017 Xu Zhang, Felix X. Yu, Sanjiv Kumar, Shih-Fu Chang

We propose a simple, yet powerful regularization technique that can be used to significantly improve both the pairwise and triplet losses in learning local feature descriptors.

Efficient Natural Language Response Suggestion for Smart Reply

no code implementations1 May 2017 Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-Hsuan Sung, Laszlo Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, Ray Kurzweil

This paper presents a computationally efficient machine-learned method for natural language response suggestion.

Stochastic Generative Hashing

2 code implementations ICML 2017 Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, Le Song

Learning-based binary hashing has become a powerful paradigm for fast search and retrieval in massive databases.

Retrieval

Distributed Mean Estimation with Limited Communication

no code implementations ICML 2017 Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan

Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication efficient algorithms for distributed mean estimation.

Quantization

Orthogonal Random Features

no code implementations NeurIPS 2016 Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar

We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error.
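
A sketch of the construction this sentence describes, under my reading of the abstract: random Fourier features for the Gaussian kernel, with the Gaussian projection matrix replaced by a random orthogonal matrix whose rows are rescaled to have chi-distributed norms. It assumes the number of features does not exceed the input dimension.

```python
import numpy as np

def orthogonal_random_features(X, num_features, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]                                                        # assumes num_features <= d
    Q, _ = np.linalg.qr(rng.normal(size=(d, num_features)))               # orthonormal columns
    norms = np.linalg.norm(rng.normal(size=(num_features, d)), axis=1)    # chi-distributed row norms
    W = (Q * norms).T / sigma                                             # rescaled orthogonal projections
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(num_features)

X = np.random.default_rng(1).normal(size=(5, 16))
Phi = orthogonal_random_features(X, num_features=16)
print((Phi @ Phi.T)[0, :3])   # approximates exp(-||x_i - x_j||^2 / (2 sigma^2))
```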

Fast Orthogonal Projection Based on Kronecker Product

no code implementations ICCV 2015 Xu Zhang, Felix X. Yu, Ruiqi Guo, Sanjiv Kumar, Shengjin Wang, Shih-Fu Chang

We propose a family of structured matrices to speed up orthogonal projections for high-dimensional data commonly seen in computer vision applications.

Image Retrieval Quantization

Spherical Random Features for Polynomial Kernels

no code implementations NeurIPS 2015 Jeffrey Pennington, Felix Xinnan X. Yu, Sanjiv Kumar

Among the commonly used kernels for nonlinear classification are polynomial kernels, for which low approximation error has thus far necessitated explicit feature maps of large dimensionality, especially for higher-order polynomials.

General Classification

On Binary Embedding using Circulant Matrices

no code implementations20 Nov 2015 Felix X. Yu, Aditya Bhaskara, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang

To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.
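
A sketch of the circulant projection described here: multiplying by a circulant matrix is a circular convolution, so it can be applied with the FFT in O(d log d), and the binary code is the sign pattern of the projection. The random sign flipping of the input reflects my reading of the construction; other details are omitted.

```python
import numpy as np

def circulant_binary_embedding(X, r, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    D = rng.choice([-1.0, 1.0], size=d)              # random sign flips on the input
    r_fft = np.fft.fft(r)                            # r defines the circulant matrix
    proj = np.fft.ifft(np.fft.fft(X * D, axis=1) * r_fft, axis=1).real
    return (proj >= 0).astype(np.int8)               # d-bit binary code per row

X = np.random.default_rng(1).normal(size=(4, 8))
r = np.random.default_rng(2).normal(size=8)
print(circulant_binary_embedding(X, r))
```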

Binary embeddings with structured hashed projections

no code implementations16 Nov 2015 Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski, Tony Jebara, Sanjiv Kumar, Yann Lecun

We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors.

LEMMA

Structured Transforms for Small-Footprint Deep Learning

no code implementations NeurIPS 2015 Vikas Sindhwani, Tara N. Sainath, Sanjiv Kumar

We consider the task of building compact deep learning pipelines suitable for deployment on storage and power constrained mobile devices.

Keyword Spotting speech-recognition +1

Learning to Hash for Indexing Big Data - A Survey

no code implementations17 Sep 2015 Jun Wang, Wei Liu, Sanjiv Kumar, Shih-Fu Chang

Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions.

Quantization based Fast Inner Product Search

no code implementations4 Sep 2015 Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, David Simcha

We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS).

Quantization

Fast Online Clustering with Randomized Skeleton Sets

no code implementations10 Jun 2015 Krzysztof Choromanski, Sanjiv Kumar, Xiaofeng Liu

To achieve fast clustering, we propose to represent each cluster by a skeleton set which is updated continuously as new data is seen.

Clustering Nonparametric Clustering +1

Compact Nonlinear Maps and Circulant Extensions

no code implementations12 Mar 2015 Felix X. Yu, Sanjiv Kumar, Henry Rowley, Shih-Fu Chang

This leads to much more compact maps without hurting the performance.

An exploration of parameter redundancy in deep networks with circulant projections

no code implementations ICCV 2015 Yu Cheng, Felix X. Yu, Rogerio S. Feris, Sanjiv Kumar, Alok Choudhary, Shih-Fu Chang

We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection.

Discrete Graph Hashing

no code implementations NeurIPS 2014 Wei Liu, Cun Mu, Sanjiv Kumar, Shih-Fu Chang

Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases.

Circulant Binary Embedding

no code implementations13 May 2014 Felix X. Yu, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang

To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.

On Learning from Label Proportions

1 code implementation24 Feb 2014 Felix X. Yu, Krzysztof Choromanski, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang

Learning from Label Proportions (LLP) is a learning setting, where the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known.
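
A generic illustration of the setting described in this sentence (not the paper's algorithm): with only bag-level proportions available, one simple surrogate compares each bag's mean predicted positive probability to its known proportion.

```python
import numpy as np

def bag_proportion_loss(pred_probs_per_bag, bag_proportions):
    # Squared gap between each bag's mean predicted probability and its known proportion.
    losses = [(np.mean(p) - rho) ** 2 for p, rho in zip(pred_probs_per_bag, bag_proportions)]
    return float(np.mean(losses))

bags = [np.array([0.9, 0.8, 0.2]), np.array([0.1, 0.3])]
print(bag_proportion_loss(bags, bag_proportions=[0.6, 0.2]))
```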

Marketing

$\propto$SVM for learning with label proportions

no code implementations4 Jun 2013 Felix X. Yu, Dong Liu, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang

We study the problem of learning with label proportions in which the training data is provided in groups and only the proportion of each class in each group is known.

Learning Binary Codes for High-Dimensional Data Using Bilinear Projections

no code implementations CVPR 2013 Yunchao Gong, Sanjiv Kumar, Henry A. Rowley, Svetlana Lazebnik

Recent advances in visual recognition indicate that to achieve good retrieval and classification accuracy on large-scale datasets like ImageNet, extremely high-dimensional visual descriptors, e.g., Fisher Vectors, are needed.

Classification Code Generation +4

Angular Quantization-based Binary Codes for Fast Similarity Search

no code implementations NeurIPS 2012 Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Such data typically arises in a large number of vision and text applications where counts or frequencies are used as features.

Quantization Retrieval +1

Ensemble Nystrom Method

no code implementations NeurIPS 2009 Sanjiv Kumar, Mehryar Mohri, Ameet Talwalkar

A crucial technique for scaling kernel methods to very large data sets reaching or exceeding millions of instances is based on low-rank approximation of kernel matrices.

regression
