Search Results for author: Randall Balestriero

Found 70 papers, 16 papers with code

Deep Networks Always Grok and Here is Why

no code implementations 23 Feb 2024 Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

Our local complexity measures the density of the so-called 'linear regions' (aka spline partition regions) that tile the DNN input space, and serves as a useful progress measure for training.
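
As a rough illustration of the idea (not the paper's estimator), the following numpy sketch counts distinct ReLU activation patterns, each of which indexes one linear region of a toy randomly initialized network, inside a small ball around a point; a higher count means a denser spline partition locally.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)   # toy 2-16-1 ReLU net
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def activation_pattern(x):
    # The sign pattern of the pre-activations identifies the linear region.
    return tuple((W1 @ x + b1 > 0).astype(int))

def local_complexity(x, radius=0.5, n_samples=2000):
    # Count unique regions hit by uniform samples in a ball around x.
    deltas = rng.normal(size=(n_samples, 2))
    deltas /= np.linalg.norm(deltas, axis=1, keepdims=True)
    deltas *= radius * rng.uniform(size=(n_samples, 1)) ** 0.5
    return len({activation_pattern(x + d) for d in deltas})

print(local_complexity(np.zeros(2)))  # more regions -> higher local complexity
```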

Learning by Reconstruction Produces Uninformative Features For Perception

no code implementations 17 Feb 2024 Randall Balestriero, Yann LeCun

Despite the interpretability of the reconstruction and generation, we identify a misalignment between learning by reconstruction and learning for perception.

Denoising Representation Learning
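
A minimal numpy sketch of this misalignment, under the standard assumption that an MSE-optimal rank-1 linear autoencoder recovers the top principal component: the reconstruction-optimal feature keeps the high-variance nuisance dimension and discards the low-variance dimension that actually predicts the label.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=5000)            # binary "perceptual" label
nuisance = 10.0 * rng.normal(size=5000)      # high variance, label-free
signal = 0.1 * (2 * y - 1)                   # low variance, fully predictive
X = np.stack([nuisance, signal], axis=1)

# Rank-1 PCA = best rank-1 linear autoencoder under MSE reconstruction.
X_c = X - X.mean(0)
_, _, Vt = np.linalg.svd(X_c, full_matrices=False)
z = X_c @ Vt[0]                              # learned 1-D feature

corr = np.corrcoef(z, y)[0, 1]
print(f"|corr(feature, label)| = {abs(corr):.3f}")  # ~0: uninformative for perception
```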

Fast and Exact Enumeration of Deep Networks Partitions Regions

no code implementations 20 Jan 2024 Randall Balestriero, Yann LeCun

One fruitful formulation of Deep Networks (DNs) enabling their theoretical study and providing practical guidelines to practitioners relies on Piecewise Affine Splines.

Characterizing Large Language Model Geometry Solves Toxicity Detection and Generation

1 code implementation 4 Dec 2023 Randall Balestriero, Romain Cosentino, Sarath Shekkizhar

We obtain in closed form (i) the intrinsic dimension in which the Multi-Head Attention embeddings are constrained to exist and (ii) the partition and per-region affine mappings of the per-layer feedforward networks.

Language Modelling Large Language Model

Training Dynamics of Deep Network Linear Regions

no code implementations 19 Oct 2023 Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

First, we present a novel statistic that encompasses the local complexity (LC) of the DN based on the concentration of linear regions inside arbitrary dimensional neighborhoods around data points.

Memorization

Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need

1 code implementation ICCV 2023 Vivien Cabannes, Leon Bottou, Yann LeCun, Randall Balestriero

Third, it provides a proper active learning framework yielding low-cost solutions to annotate datasets, arguably bridging the gap between the theory and practice of active learning based on simple-to-answer-by-non-experts queries of semantic relationships between inputs.

Active Learning Self-Supervised Learning

Towards Democratizing Joint-Embedding Self-Supervised Learning

1 code implementation 3 Mar 2023 Florian Bordes, Randall Balestriero, Pascal Vincent

Joint Embedding Self-Supervised Learning (JE-SSL) has seen rapid developments in recent years, due to its promise to effectively leverage large amounts of unlabeled data.

Data Augmentation Misconceptions +1

FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling

no code implementations 1 Mar 2023 Wei-Yin Ko, Daniel D'souza, Karina Nguyen, Randall Balestriero, Sara Hooker

Ensembling multiple Deep Neural Networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform a larger single model.

Data Augmentation Fairness

An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization

no code implementations 1 Mar 2023 Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann LeCun

In this paper, we provide an information-theoretic perspective on Variance-Invariance-Covariance Regularization (VICReg) for self-supervised learning.

Self-Supervised Learning Transfer Learning

SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

1 code implementation CVPR 2023 Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, Richard Baraniuk

In this paper, we go one step further by developing the first provably exact method for computing the geometry of a DN's mapping - including its decision boundary - over a specified region of the data space.
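
SplineCam itself is exact; for contrast, the approximate alternative it improves on is a brute-force grid rasterization, as in the hedged numpy sketch below (toy random network, resolution-limited boundary).

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(32, 2)), rng.normal(size=32)
w2, b2 = rng.normal(size=32), rng.normal()

def f(points):                               # scalar logit of a toy 2-32-1 ReLU net
    return np.maximum(W1 @ points.T + b1[:, None], 0).T @ w2 + b2

xs = np.linspace(-3, 3, 400)
gx, gy = np.meshgrid(xs, xs)
logits = f(np.stack([gx.ravel(), gy.ravel()], axis=1)).reshape(400, 400)
boundary = np.abs(np.diff(np.sign(logits), axis=1)) > 0   # sign changes ~ decision boundary
print(f"boundary pixels: {boundary.sum()}")
```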

Unsupervised Learning on a DIET: Datum IndEx as Target Free of Self-Supervision, Reconstruction, Projector Head

no code implementations 20 Feb 2023 Randall Balestriero

Costly, noisy, and over-specialized, labels are to be set aside in favor of unsupervised learning if we hope to learn cheap, reliable, and transferable models.

Self-Supervised Learning

The SSL Interplay: Augmentations, Inductive Bias, and Generalization

no code implementations 6 Feb 2023 Vivien Cabannes, Bobak T. Kiani, Randall Balestriero, Yann LeCun, Alberto Bietti

Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision.

Data Augmentation Inductive Bias +1

On minimal variations for unsupervised representation learning

no code implementations 7 Nov 2022 Vivien Cabannes, Alberto Bietti, Randall Balestriero

Unsupervised representation learning aims at describing raw data efficiently to solve various downstream tasks.

Representation Learning Self-Supervised Learning

ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

no code implementations 3 Nov 2022 Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim

Equipped with ImageNet-X, we investigate 2,200 current recognition models and study the types of mistakes as a function of a model's (1) architecture, e.g., transformer vs. convolutional, (2) learning paradigm, e.g., supervised vs. self-supervised, and (3) training procedures, e.g., data augmentation.

Data Augmentation

POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks

1 code implementation 2 Nov 2022 Randall Balestriero, Yann LeCun

In this paper we propose the first provable affine constraint enforcement method for DNNs that requires only minimal changes to a given DNN's forward pass, is computationally friendly, and leaves the optimization of the DNN's parameters unconstrained, i.e., standard gradient-based methods can be employed.
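
A simplified numpy sketch of the bias-shifting intuition, not the authors' exact implementation: pushing each unit's pre-activations to a single sign across the vertices of a convex region forces one activation pattern, hence one affine map, on the whole region.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 2)), rng.normal(size=8)),
          (rng.normal(size=(8, 8)), rng.normal(size=8))]
V = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])  # vertices of the constraint region

H = V
for i, (W, b) in enumerate(layers):
    pre = H @ W.T + b                        # pre-activations at every vertex
    agree = np.sign(pre.sum(0) + 1e-12)      # target sign per unit (sign of the mean)
    shift = np.where(agree > 0,
                     np.maximum(0, 1e-6 - pre.min(0)),
                     np.maximum(0, pre.max(0) + 1e-6))
    layers[i] = (W, b + agree * shift)       # store the constrained biases
    pre = pre + agree * shift
    assert (np.sign(pre) == np.sign(pre[0])).all()  # one pattern => affine on the region
    H = np.maximum(pre, 0)
```

Since each layer's pre-activations are affine on the region once the previous layers are, identical vertex signs extend to the whole convex hull, so the constrained network is a single affine map there.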

The Hidden Uniform Cluster Prior in Self-Supervised Learning

no code implementations 13 Oct 2022 Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas

A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN).

Clustering Representation Learning +1

RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank

no code implementations 5 Oct 2022 Quentin Garrido, Randall Balestriero, Laurent Najman, Yann LeCun

Joint-Embedding Self-Supervised Learning (JE-SSL) has seen rapid development, with the emergence of many method variations but only a few principled guidelines that would help practitioners successfully deploy them.

Self-Supervised Learning

Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations

no code implementations 29 Sep 2022 Grégoire Mialon, Randall Balestriero, Yann LeCun

Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output.

Domain Generalization Self-Supervised Learning
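
For concreteness, here is a minimal numpy sketch of the covariance term that VICReg-style methods regularize: summing squared off-diagonal entries of the embedding covariance, which the paper connects to pairwise independence of the representation dimensions.

```python
import numpy as np

def covariance_penalty(Z):
    # Z: (batch, dim) embeddings from the projector's output.
    Zc = Z - Z.mean(0)
    C = (Zc.T @ Zc) / (len(Z) - 1)
    off_diag = C - np.diag(np.diag(C))
    return (off_diag ** 2).sum() / Z.shape[1]

rng = np.random.default_rng(0)
print(covariance_penalty(rng.normal(size=(256, 8))))                         # small: decorrelated
print(covariance_penalty(np.repeat(rng.normal(size=(256, 1)), 8, axis=1)))   # large: collapsed
```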

Batch Normalization Explained

no code implementations 29 Sep 2022 Randall Balestriero, Richard G. Baraniuk

A critically important, ubiquitous, and yet poorly understood ingredient in modern deep networks (DNs) is batch normalization (BN), which centers and normalizes the feature maps.

Joint Embedding Self-Supervised Learning in the Kernel Regime

no code implementations 29 Sep 2022 Bobak T. Kiani, Randall Balestriero, Yubei Chen, Seth Lloyd, Yann LeCun

The fundamental goal of self-supervised learning (SSL) is to produce useful representations of data without access to any labels for classifying the data.

Self-Supervised Learning

What Do We Maximize in Self-Supervised Learning?

no code implementations 20 Jul 2022 Ravid Shwartz-Ziv, Randall Balestriero, Yann LeCun

In this paper, we examine self-supervised learning methods, particularly VICReg, to provide an information-theoretical understanding of their construction.

Self-Supervised Learning Transfer Learning

Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning

no code implementations 27 Jun 2022 Florian Bordes, Randall Balestriero, Quentin Garrido, Adrien Bardes, Pascal Vincent

This is a little vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last projector layer) should be the one to use for best generalization performance downstream.

Self-Supervised Learning Transfer Learning

Contrastive and Non-Contrastive Self-Supervised Learning Recover Global and Local Spectral Embedding Methods

no code implementations 23 May 2022 Randall Balestriero, Yann LeCun

Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations.

Self-Supervised Learning

The Effects of Regularization and Data Augmentation are Class Dependent

no code implementations 7 Apr 2022 Randall Balestriero, Leon Bottou, Yann LeCun

The optimal amount of DA or weight decay found from cross-validation leads to disastrous model performance on some classes, e.g., on ImageNet with a ResNet-50, the "barn spider" classification test accuracy falls from $68\%$ to $46\%$ merely by introducing random-crop DA during training.

Data Augmentation
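
The finding is easy to probe in any pipeline that reports a per-class breakdown; below is a hedged sketch (synthetic predictions standing in for two trained models) of the comparison the paper performs at scale.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes):
    return np.array([(y_pred[y_true == c] == c).mean() for c in range(n_classes)])

# In practice y_true and the two prediction arrays come from evaluating
# a model trained without and with the regularizer under study.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=1000)
preds_a = np.where(rng.random(1000) < 0.8, y_true, rng.integers(0, 5, 1000))
preds_b = preds_a.copy()
preds_b[y_true == 3] = rng.integers(0, 5, (y_true == 3).sum())  # one class degrades

delta = per_class_accuracy(y_true, preds_b, 5) - per_class_accuracy(y_true, preds_a, 5)
print(np.round(delta, 2))  # the mean shift is small, but class 3 collapses
```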

DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors

no code implementations 7 Apr 2022 Vishwanath Saragadam, Randall Balestriero, Ashok Veeraraghavan, Richard G. Baraniuk

DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks.

Hyperspectral Image Denoising Image Classification +2

projUNN: efficient method for training deep networks with unitary matrices

1 code implementation 10 Mar 2022 Bobak Kiani, Randall Balestriero, Yann LeCun, Seth Lloyd

In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability.
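
projUNN maintains unitarity with efficient low-rank updates; the naive reference operation it accelerates can be sketched as a full-SVD projection onto the closest orthogonal matrix (assumption: real-valued weights here).

```python
import numpy as np

def project_unitary(W):
    U, _, Vh = np.linalg.svd(W)
    return U @ Vh                        # closest orthogonal matrix in Frobenius norm

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
Q = project_unitary(W + 0.01 * rng.normal(size=(4, 4)))   # e.g. after a gradient step
print(np.allclose(Q @ Q.T, np.eye(4)))  # True: Q is orthogonal again
```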

Singular Value Perturbation and Deep Network Optimization

no code implementations 7 Mar 2022 Rudolf H. Riedi, Randall Balestriero, Richard G. Baraniuk

Building on our earlier work connecting deep networks with continuous piecewise-affine splines, we develop an exact local linear representation of a deep network layer for a family of modern deep networks that includes ConvNets at one end of a spectrum and ResNets, DenseNets, and other networks with skip connections at the other.

No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds

1 code implementation 4 Mar 2022 Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk

We propose to remedy such a scenario by introducing a maximal radius constraint $r$ on the clusters formed by the centroids, i.e., samples from the same cluster should not be more than $2r$ apart in terms of $\ell_2$ distance.

Clustering
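
A hedged numpy sketch of Lloyd-style iterations with a maximal radius $r$: points farther than $r$ from their nearest centroid are treated as outliers and excluded from the centroid updates. This illustrates the constraint, not the authors' exact algorithm.

```python
import numpy as np

def radius_kmeans(X, k, r, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]             # initial centroids
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - C[None], axis=2)    # (n, k) distances
        assign = d.argmin(1)
        inlier = d[np.arange(len(X)), assign] <= r          # enforce radius r
        for j in range(k):
            pts = X[(assign == j) & inlier]
            if len(pts):
                C[j] = pts.mean(0)
    return C, assign, inlier

X = np.vstack([np.random.default_rng(1).normal(size=(100, 2)),
               np.array([[8.0, 8.0]])])                     # one far outlier
C, assign, inlier = radius_kmeans(X, k=2, r=3.0)
print(inlier.sum(), "inliers of", len(X))                   # the outlier cannot stretch a cluster
```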

Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values

1 code implementation CVPR 2022 Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

We present Polarity Sampling, a theoretically justified plug-and-play method for controlling the generation quality and diversity of pre-trained deep generative networks (DGNs).

Image Generation Unconditional Image Generation

NeuroView-RNN: It's About Time

no code implementations 23 Feb 2022 CJ Barberan, Sina Alemohammad, Naiming Liu, Randall Balestriero, Richard G. Baraniuk

A key interpretability issue with RNNs is that it is not clear how each hidden state per time step contributes to the decision-making process in a quantitative manner.

Decision Making Time Series +1

Spatial Transformer K-Means

no code implementations 16 Feb 2022 Romain Cosentino, Randall Balestriero, Yanis Bahroun, Anirvan Sengupta, Richard Baraniuk, Behnaam Aazhang

This enables (i) the reduction of intrinsic nuisances associated with the data, reducing the complexity of the clustering task, increasing performance and producing state-of-the-art results, (ii) clustering in the input space of the data, leading to a fully interpretable clustering algorithm, and (iii) the benefit of convergence guarantees.

Clustering

A Data-Augmentation Is Worth A Thousand Samples: Exact Quantification From Analytical Augmented Sample Moments

no code implementations 16 Feb 2022 Randall Balestriero, Ishan Misra, Yann LeCun

We show that for a training loss to be stable under DA sampling, the model's saliency map (gradient of the loss with respect to the model's input) must align with the smallest eigenvector of the sample variance under the considered DA augmentation, hinting at a possible explanation for why models tend to shift their focus from edges to textures.

Data Augmentation
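
A toy numpy sketch of the central quantity (assumptions: a 1-D signal, circular-shift augmentation, and a random vector standing in for the model's saliency map): compute the augmented-sample covariance, take its smallest eigenvector, and measure the alignment the stability condition asks for.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 64))                   # toy sample
augs = np.stack([np.roll(x, rng.integers(-3, 4)) for _ in range(500)])  # shift DA

cov = np.cov(augs.T)                                        # augmented-sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)
v_min = eigvecs[:, 0]                                       # smallest-variance direction

saliency = rng.normal(size=64)                              # stand-in loss gradient
cosine = saliency @ v_min / (np.linalg.norm(saliency) * np.linalg.norm(v_min))
print(f"alignment = {cosine:.3f}")                          # near 0: unstable under this DA
```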

Learning in High Dimension Always Amounts to Extrapolation

no code implementations 18 Oct 2021 Randall Balestriero, Jerome Pesenti, Yann LeCun

The notion of interpolation and extrapolation is fundamental in various fields from deep learning to function approximation.

Vocal Bursts Intensity Prediction

MaGNET: Uniform Sampling from Deep Generative Network Manifolds Without Retraining

1 code implementation ICLR 2022 Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the data manifold and distribution.

Data Augmentation Domain Adaptation +2

NeuroView: Explainable Deep Network Decision Making

no code implementations 15 Oct 2021 CJ Barberan, Randall Balestriero, Richard G. Baraniuk

Each member of the family is derived from a standard DN architecture by vector quantizing the unit output values and feeding them into a global linear classifier.

Decision Making

Fast Jacobian-Vector Product for Deep Networks

no code implementations 1 Apr 2021 Randall Balestriero, Richard Baraniuk

Jacobian-vector products (JVPs) form the backbone of many recent developments in Deep Networks (DNs), with applications including faster constrained optimization, regularization with generalization guarantees, and adversarial example sensitivity assessments.

Max-Affine Spline Insights Into Deep Network Pruning

no code implementations 7 Jan 2021 Haoran You, Randall Balestriero, Zhihan Lu, Yutong Kou, Huihong Shi, Shunyao Zhang, Shang Wu, Yingyan Lin, Richard Baraniuk

In this paper, we study the importance of pruning in Deep Networks (DNs) and the yin & yang relationship between (1) pruning highly overparametrized DNs that have been trained from random initialization and (2) training small DNs that have been "cleverly" initialized.

Network Pruning

Sparse Multi-Family Deep Scattering Network

no code implementations 14 Dec 2020 Romain Cosentino, Randall Balestriero

The SMF-DSN enhances the DSN by (i) increasing the diversity of the scattering coefficients and (ii) improving its robustness with respect to non-stationary noise.

Translation

Enhanced Recurrent Neural Tangent Kernels for Non-Time-Series Data

2 code implementations 9 Dec 2020 Sina Alemohammad, Randall Balestriero, Zichao Wang, Richard Baraniuk

Kernels derived from deep neural networks (DNNs) in the infinite-width regime provide not only high performance in a range of machine learning tasks but also new theoretical insights into DNN training dynamics and generalization.

Time Series Time Series Analysis

Analytical Probability Distributions and Exact Expectation-Maximization for Deep Generative Networks

no code implementations NeurIPS 2020 Randall Balestriero, Sebastien Paris, Richard Baraniuk

Deep Generative Networks (DGNs) with probabilistic modeling of their output and latent space are currently trained via Variational Autoencoders (VAEs).

Anomaly Detection Imputation +1

Deep Autoencoders: From Understanding to Generalization Guarantees

no code implementations 20 Sep 2020 Romain Cosentino, Randall Balestriero, Richard Baraniuk, Behnaam Aazhang

Our regularizations leverage recent advances in the group of transformation learning to enable AEs to better approximate the data manifold without explicitly defining the group underlying the manifold.

Denoising

Ensembles of Generative Adversarial Networks for Disconnected Data

no code implementations 25 Jun 2020 Lorenzo Luzi, Randall Balestriero, Richard G. Baraniuk

They can be represented in two ways: with an ensemble of networks or with a single network with a truncated latent space.

The Recurrent Neural Tangent Kernel

no code implementations ICLR 2021 Sina Alemohammad, Zichao Wang, Randall Balestriero, Richard Baraniuk

The study of deep neural networks (DNNs) in the infinite-width limit, via the so-called neural tangent kernel (NTK) approach, has provided new insights into the dynamics of learning, generalization, and the impact of initialization.

Analytical Probability Distributions and EM-Learning for Deep Generative Networks

no code implementations NeurIPS 2020 Randall Balestriero, Sebastien Paris, Richard G. Baraniuk

Deep Generative Networks (DGNs) with probabilistic modeling of their output and latent space are currently trained via Variational Autoencoders (VAEs).

Anomaly Detection Imputation +1

Interpretable Super-Resolution via a Learned Time-Series Representation

no code implementations 13 Jun 2020 Randall Balestriero, Herve Glotin, Richard G. Baraniuk

We develop an interpretable and learnable Wigner-Ville distribution that produces a super-resolved quadratic signal representation for time-series analysis.

Super-Resolution Time Series +1

SymJAX: symbolic CPU/GPU/TPU programming

1 code implementation 21 May 2020 Randall Balestriero

SymJAX is a symbolic programming version of JAX simplifying graph input/output/updates and providing additional functionalities for general machine learning and deep learning applications.

BIG-bench Machine Learning

Max-Affine Spline Insights into Deep Generative Networks

1 code implementation 26 Feb 2020 Randall Balestriero, Sebastien Paris, Richard Baraniuk

We also derive the output probability density mapped onto the generated manifold in terms of the latent space density, which enables the computation of key statistics such as its Shannon entropy.

Disentanglement
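
A worked numpy example of the density statement for a single affine piece g(z) = A z + b (the paper's result applies per region of the DGN's spline partition): the latent density is divided by the volume change sqrt(det(A^T A)).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))              # one affine piece: 2-D latents -> 2-D manifold in R^3
b = rng.normal(size=3)
z = rng.normal(size=2)                   # latent sample, z ~ N(0, I)

p_z = np.exp(-0.5 * z @ z) / (2 * np.pi)             # standard normal pdf in 2-D
volume_change = np.sqrt(np.linalg.det(A.T @ A))      # generalized Jacobian determinant
p_x = p_z / volume_change                            # density at x = A z + b on the manifold
print(p_x)
```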

A Goodness of Fit Measure for Generative Networks

no code implementations 25 Sep 2019 Lorenzo Luzi, Randall Balestriero, Richard Baraniuk

We define a goodness of fit measure for generative networks that captures how well the network can generate the training data, which is necessary for learning the true data distribution.

Implicit Rugosity Regularization via Data Augmentation

no code implementations 28 May 2019 Daniel LeJeune, Randall Balestriero, Hamid Javadi, Richard G. Baraniuk

Deep (neural) networks have been applied productively in a wide range of supervised and unsupervised learning tasks.

Data Augmentation

The Geometry of Deep Networks: Power Diagram Subdivision

1 code implementation NeurIPS 2019 Randall Balestriero, Romain Cosentino, Behnaam Aazhang, Richard Baraniuk

The subdivision process constrains the affine maps on the (exponentially many) power diagram regions to greatly reduce their complexity.

A Max-Affine Spline Perspective of Recurrent Neural Networks

no code implementations ICLR 2019 Zichao Wang, Randall Balestriero, Richard Baraniuk

Second, we show that the affine parameter of an RNN corresponds to an input-specific template, from which we can interpret an RNN as performing a simple template matching (matched filtering) given the input.

L2 Regularization Template Matching

From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference

no code implementations ICLR 2019 Randall Balestriero, Richard G. Baraniuk

We show that, under a GMM, piecewise affine, convex nonlinearities like ReLU, absolute value, and max-pooling can be interpreted as solutions to certain natural "hard" VQ inference problems, while sigmoid, hyperbolic tangent, and softmax can be interpreted as solutions to corresponding "soft" VQ inference problems.

Quantization
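
The hard/soft correspondence can be checked numerically for the simplest pair the paper covers: ReLU is the hard max over the affine pieces {0, x}, softplus is its log-sum-exp softening, and sigmoid is the soft region assignment that replaces the hard step function.

```python
import numpy as np

x = np.linspace(-4, 4, 9)
relu = np.maximum(0, x)                       # hard max over the affine pieces {0, x}
softplus = np.log1p(np.exp(x))                # soft (log-sum-exp) version of the same max
hard_assign = (x > 0).astype(float)           # hard VQ assignment (step)
soft_assign = 1 / (1 + np.exp(-x))            # sigmoid: soft VQ assignment

print(np.round(softplus - relu, 3))           # gap shrinks as |x| grows
print(np.round(soft_assign - hard_assign, 3)) # sigmoid approaches the step away from 0
```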

A Spline Theory of Deep Learning

no code implementations ICML 2018 Randall Balestriero, Richard Baraniuk

This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization.

General Classification Memorization

Spline Filters For End-to-End Deep Learning

no code implementations ICML 2018 Randall Balestriero, Romain Cosentino, Herve Glotin, Richard Baraniuk

We propose to tackle the problem of end-to-end learning for raw waveform signals by introducing learnable continuous time-frequency atoms.

Mad Max: Affine Spline Insights into Deep Learning

no code implementations 17 May 2018 Randall Balestriero, Richard Baraniuk

For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input.

Clustering General Classification +2
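
The quoted property is easy to verify with autograd; below is a hedged torch sketch (toy network, and the nearby probe point is only likely to stay in the same spline region): conditioned on the input's region, the net computes f(x) = A(x) x + b(x), with A(x) recoverable as the input Jacobian.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(),
                          torch.nn.Linear(16, 1))
x = torch.randn(4)

A = torch.autograd.functional.jacobian(net, x)     # (1, 4): the region's slope
b = net(x) - A @ x                                 # the region's offset
x2 = x + 1e-3 * torch.randn(4)                     # nearby point, likely same region
print(torch.allclose(net(x2), A @ x2 + b, atol=1e-5))  # True within the region
```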

Semi-Supervised Learning Enabled by Multiscale Deep Neural Network Inversion

no code implementations 27 Feb 2018 Randall Balestriero, Herve Glotin, Richard Baraniuk

Deep Neural Networks (DNNs) provide state-of-the-art solutions in several difficult machine perceptual tasks.

Semi-Supervised Learning via New Deep Network Inversion

no code implementations 12 Nov 2017 Randall Balestriero, Vincent Roger, Herve G. Glotin, Richard G. Baraniuk

We exploit a recently derived inversion scheme for arbitrary deep neural networks to develop a new semi-supervised learning framework that applies to a wide range of systems and problems.

Deep Neural Networks

no code implementations 25 Oct 2017 Randall Balestriero, Richard Baraniuk

Deep Neural Networks (DNNs) are universal function approximators providing state-of-the-art solutions on a wide range of applications.

Image Classification Object Tracking +2

Linear Time Complexity Deep Fourier Scattering Network and Extension to Nonlinear Invariants

no code implementations 18 Jul 2017 Randall Balestriero, Herve Glotin

In this paper we propose a scalable version of a state-of-the-art deterministic time-invariant feature extraction approach based on consecutive changes of basis and nonlinearities, namely, the scattering network.

Neural Decision Trees

no code implementations 23 Feb 2017 Randall Balestriero

NDT is a decision-tree-style architecture in which each splitting node is an independent multilayer perceptron, allowing oblique decision functions, or arbitrary nonlinear decision functions if more than one layer is used.

Clustering

Robust Unsupervised Transient Detection With Invariant Representation based on the Scattering Network

no code implementations 23 Nov 2016 Randall Balestriero, Behnaam Aazhang

We present a sparse and invariant representation with low asymptotic complexity for robust unsupervised transient and onset zone detection in noisy environments.
