no code implementations • 12 Mar 2024 • Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin
When training deep neural networks, the phenomenon of dying neurons (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.
no code implementations • 20 Feb 2024 • Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi
Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases.
no code implementations • 12 Oct 2023 • Yuhan Helena Liu, Aristide Baratin, Jonathan Cornford, Stefan Mihalas, Eric Shea-Brown, Guillaume Lajoie
Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks.
no code implementations • 31 Jul 2023 • Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar
Sharpness-aware minimization (SAM) methods have gained increasing popularity by formulating the problem of minimizing both loss value and loss sharpness as a minimax objective.
1 code implementation • 18 Jul 2023 • Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar
Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models.
no code implementations • 3 Dec 2022 • JiHye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien
We approach the problem of improving robustness of deep learning algorithms in the presence of label noise.
1 code implementation • 19 Sep 2022 • Thomas George, Guillaume Lajoie, Aristide Baratin
Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called lazy training regime in which the network can be well approximated by its linearization around initialization.
no code implementations • 2 Jun 2022 • Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni
We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.
no code implementations • 29 Sep 2021 • Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville
In this work, we argue that representations induced by self-supervised learning (SSL) methods should both be expressive and learnable.
no code implementations • 10 Feb 2021 • James Vuckovic, Aristide Baratin, Remi Tachet des Combes
Attention is a powerful component of modern neural networks across a wide variety of domains.
1 code implementation • NeurIPS Workshop DL-IG 2020 • Aristide Baratin, Thomas George, César Laurent, R. Devon Hjelm, Guillaume Lajoie, Pascal Vincent, Simon Lacoste-Julien
We approach the problem of implicit regularization in deep learning from a geometrical viewpoint.
no code implementations • 6 Jul 2020 • James Vuckovic, Aristide Baratin, Remi Tachet des Combes
Attention is a powerful component of modern neural networks across a wide variety of domains.
no code implementations • 19 Oct 2018 • Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas
The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve.
no code implementations • ICML 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm
We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.
2 code implementations • ICLR 2019 • Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville
Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy.
no code implementations • 12 Jan 2018 • Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien
Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations, which are tiny perturbations of the input data purposely designed to fool a machine learning classifier.
21 code implementations • 12 Jan 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm
We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.