1 code implementation • ACL 2022 • Lihu Chen, Gael Varoquaux, Fabian Suchanek
State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models to generate vectors for unseen words, by learning the behavior of pre-trained embeddings using only the surface form of words. We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT) and makes it robust to OOV words with few additional parameters. Extensive evaluations demonstrate that our lightweight model achieves similar or even better performance than prior competitors, both on original datasets and on corrupted variants.
1 code implementation • 9 Feb 2024 • Riccardo Cappuzzo, Gael Varoquaux, Aimee Coelho, Paolo Papotti
We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks.
no code implementations • 11 Sep 2023 • Russell A. Poldrack, Christopher J. Markiewicz, Stefan Appelhoff, Yoni K. Ashar, Tibor Auer, Sylvain Baillet, Shashank Bansal, Leandro Beltrachini, Christian G. Benar, Giacomo Bertazzoli, Suyash Bhogawar, Ross W. Blair, Marta Bortoletto, Mathieu Boudreau, Teon L. Brooks, Vince D. Calhoun, Filippo Maria Castelli, Patricia Clement, Alexander L Cohen, Julien Cohen-Adad, Sasha D'Ambrosio, Gilles de Hollander, María de la Iglesia-Vayá, Alejandro de la Vega, Arnaud Delorme, Orrin Devinsky, Dejan Draschkow, Eugene Paul Duff, Elizabeth Dupre, Eric Earl, Oscar Esteban, Franklin W. Feingold, Guillaume Flandin, Anthony Galassi, Giuseppe Gallitto, Melanie Ganz, Rémi Gau, James Gholam, Satrajit S. Ghosh, Alessio Giacomel, Ashley G Gillman, Padraig Gleeson, Alexandre Gramfort, Samuel Guay, Giacomo Guidali, Yaroslav O. Halchenko, Daniel A. Handwerker, Nell Hardcastle, Peer Herholz, Dora Hermes, Christopher J. Honey, Robert B. Innis, Horea-Ioan Ioanas, Andrew Jahn, Agah Karakuzu, David B. Keator, Gregory Kiar, Balint Kincses, Angela R. Laird, Jonathan C. Lau, Alberto Lazari, Jon Haitz Legarreta, Adam Li, Xiangrui Li, Bradley C. Love, Hanzhang Lu, Camille Maumet, Giacomo Mazzamuto, Steven L. Meisler, Mark Mikkelsen, Henk Mutsaerts, Thomas E. Nichols, Aki Nikolaidis, Gustav Nilsonne, Guiomar Niso, Martin Norgaard, Thomas W Okell, Robert Oostenveld, Eduard Ort, Patrick J. Park, Mateusz Pawlik, Cyril R. Pernet, Franco Pestilli, Jan Petr, Christophe Phillips, Jean-Baptiste Poline, Luca Pollonini, Pradeep Reddy Raamana, Petra Ritter, Gaia Rizzo, Kay A. Robbins, Alexander P. Rockhill, Christine Rogers, Ariel Rokem, Chris Rorden, Alexandre Routier, Jose Manuel Saborit-Torres, Taylor Salo, Michael Schirner, Robert E. Smith, Tamas Spisak, Julia Sprenger, Nicole C. Swann, Martin Szinte, Sylvain Takerkart, Bertrand Thirion, Adam G. Thomas, Sajjad Torabian, Gael Varoquaux, Bradley Voytek, Julius Welzel, Martin Wilson, Tal Yarkoni, Krzysztof J. Gorgolewski
The Brain Imaging Data Structure (BIDS) is a community-driven standard for the organization of data and metadata from a growing range of neuroscience modalities.
1 code implementation • NeurIPS 2022 • Leo Grinsztajn, Edouard Oyallon, Gael Varoquaux
While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear.
no code implementations • NeurIPS 2021 • Marine Le Morvan, Julie Josse, Erwan Scornet, Gael Varoquaux
In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn.
no code implementations • NeurIPS Workshop AI4Scien 2021 • Gael Varoquaux
Science has progressed by reasoning on what models could not predict because they were missing important ingredients.
no code implementations • NeurIPS 2020 • Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gael Varoquaux
We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.
1 code implementation • NeurIPS 2019 • Meyer Scetbon, Gael Varoquaux
Here, we show that $L^p$ distances (with $p\geq 1$) between these distribution representatives give metrics on the space of distributions that are well-behaved to detect differences between distributions as they metrize the weak convergence.
1 code implementation • NeurIPS 2019 • David Sabbagh, Pierre Ablin, Gael Varoquaux, Alexandre Gramfort, Denis A. Engemann
We show that Wasserstein and geometric distances allow perfect out-of-sample prediction on the generative models.
no code implementations • 24 Sep 2018 • Russell A. Poldrack, Krzysztof J. Gorgolewski, Gael Varoquaux
We argue that openness and transparency are critical for reproducibility, and we outline an ecosystem for open and transparent science that has emerged within the human neuroimaging community.
1 code implementation • 31 Jul 2018 • Sergul Aydore, Bertrand Thirion, Gael Varoquaux
In many applications where collecting data is expensive, for example neuroscience or medical imaging, the sample size is typically small compared to the feature dimension.
1 code implementation • 19 Jan 2017 • Arthur Mensch, Julien Mairal, Bertrand Thirion, Gael Varoquaux
We present a matrix-factorization algorithm that scales to input matrices with huge numbers of both rows and columns.
no code implementations • NeurIPS 2016 • Elvis Dohmatob, Arthur Mensch, Gael Varoquaux, Bertrand Thirion
We propose a multivariate online dictionary-learning method for obtaining decompositions of brain images with structured and sparse components (aka atoms).
1 code implementation • NeurIPS 2015 • Danilo Bzdok, Michael Eickenberg, Olivier Grisel, Bertrand Thirion, Gael Varoquaux
Imaging neuroscience links human behavior to aspects of brain biology in ever-increasing datasets.
no code implementations • 16 Nov 2015 • Bertrand Thirion, Andrés Hoyos-Idrobo, Jonas Kahn, Gael Varoquaux
The use of brain images as markers for diseases or behavioral differences is challenged by small effect sizes and the ensuing lack of power, an issue that has incited researchers to rely more systematically on large cohorts.
no code implementations • 12 Dec 2014 • Alexandre Abraham, Elvis Dohmatob, Bertrand Thirion, Dimitris Samaras, Gael Varoquaux
Functional Magnetic Resonance Images acquired during resting state provide information about the functional organization of the brain by measuring correlations between brain areas.
no code implementations • NeurIPS 2013 • Yannick Schwartz, Bertrand Thirion, Gael Varoquaux
Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies.
no code implementations • NeurIPS 2010 • Gael Varoquaux, Alexandre Gramfort, Jean-Baptiste Poline, Bertrand Thirion
We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population.