Search Results for author: Thomas Hofmann

Found 95 papers, 36 papers with code

Language Imbalance Can Boost Cross-lingual Generalisation

2 code implementations 11 Apr 2024 Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel, Imanol Schlag

In controlled experiments on perfectly equivalent cloned languages, we observe that the existence of a predominant language during training boosts the performance of less frequent languages and leads to stronger alignment of model representations across languages.

Language Modelling

On the Effect of (Near) Duplicate Subwords in Language Modelling

2 code implementations 9 Apr 2024 Anton Schäfer, Thomas Hofmann, Imanol Schlag, Tiago Pimentel

In this paper, we study the impact of near duplicate subwords on LM training efficiency.

Language Modelling
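
One illustrative way to picture the duplication setup studied above (my sketch, not the paper's exact procedure): give every subword id a perfect twin and randomly split its occurrences between the two, so the model must learn the same distribution twice.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 1000
tokens = rng.integers(0, vocab_size, size=20)  # some tokenized text

# the twin of token t lives at id t + vocab_size; each occurrence
# is assigned to one of the two duplicates at random
coin = rng.integers(0, 2, size=tokens.shape)
duplicated = tokens + coin * vocab_size

print(tokens[:8])
print(duplicated[:8])  # same underlying text, spread over twin ids
```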

Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

no code implementations 12 Mar 2024 Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich structure of parameters contained within their optimization trajectories.

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

no code implementations 27 Feb 2024 Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto

In this work, we find empirical evidence that learning rate transfer can be attributed to the fact that, under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian (i.e., the sharpness) is largely independent of the width and depth of the network for a sustained period of training time.
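
The sharpness referenced above is the top eigenvalue of the loss Hessian, which can be estimated without ever materializing the Hessian. A minimal sketch using power iteration on Hessian-vector products (PyTorch; the toy objective and iteration count are illustrative assumptions):

```python
import torch

def sharpness(loss, params, iters=50):
    # largest Hessian eigenvalue via power iteration on Hessian-vector products
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = torch.tensor(0.0)
    for _ in range(iters):
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]
        hv = torch.autograd.grad(  # Hessian-vector product: d<grads, v>/dparams
            sum((g * u).sum() for g, u in zip(grads, v)),
            params, retain_graph=True)
        eig = sum((h * u).sum() for h, u in zip(hv, v))  # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig.item()

# toy check: the Hessian of this quadratic is diag(1, 3), so sharpness ~ 3
w = torch.tensor([1.0, 1.0], requires_grad=True)
print(sharpness(0.5 * (w[0] ** 2 + 3 * w[1] ** 2), [w]))
```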

A Language Model's Guide Through Latent Space

no code implementations 22 Feb 2024 Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann

Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time.

Novel Concepts
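
A minimal sketch of the steering mechanism the entry above describes -- adding a scaled concept vector to a layer's activations through a forward hook (PyTorch; the stand-in layer, the random concept vector, and the strength alpha are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 16
layer = nn.Linear(d_model, d_model)   # stand-in for a transformer block
concept = torch.randn(d_model)        # in practice: probed from hidden states
concept = concept / concept.norm()
alpha = 4.0                           # guidance strength

def steer(module, inputs, output):
    # returning a tensor from a forward hook replaces the layer's output
    return output + alpha * concept

handle = layer.register_forward_hook(steer)
print(layer(torch.randn(1, d_model)))  # activations nudged along the concept
handle.remove()
```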

Towards Meta-Pruning via Optimal Transport

1 code implementation 12 Feb 2024 Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts.

Neural Network Compression
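
For contrast, the conventional recipe described above fits in a few lines -- score hidden neurons by weight magnitude, drop the weakest, and splice the layers back together (PyTorch; the scoring rule and keep ratio are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1, fc2 = nn.Linear(8, 32), nn.Linear(32, 4)

# score each hidden neuron by the L2 norm of its incoming weights
scores = fc1.weight.norm(dim=1)
keep = scores.topk(k=16).indices.sort().values  # keep the 16 strongest

# rebuild smaller layers: slice rows of fc1 and matching columns of fc2
fc1_s, fc2_s = nn.Linear(8, 16), nn.Linear(16, 4)
with torch.no_grad():
    fc1_s.weight.copy_(fc1.weight[keep])
    fc1_s.bias.copy_(fc1.bias[keep])
    fc2_s.weight.copy_(fc2.weight[:, keep])
    fc2_s.bias.copy_(fc2.bias)

x = torch.randn(1, 8)
print(fc2_s(torch.relu(fc1_s(x))).shape)  # torch.Size([1, 4])
```

The accuracy loss incurred by such discard-based heuristics is what motivates fusing matched neurons via optimal transport instead.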

How Good is a Single Basin?

no code implementations 5 Feb 2024 Kai Lion, Lorenzo Noci, Thomas Hofmann, Gregor Bachmann

The multi-modal nature of neural loss landscapes is often considered to be the main driver behind the empirical success of deep ensembles.

Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architectures

1 code implementation 29 Jan 2024 Michael Hersche, Francesco Di Stefano, Thomas Hofmann, Abu Sebastian, Abbas Rahimi

Abstract reasoning is a cornerstone of human intelligence, and replicating it with artificial intelligence (AI) presents an ongoing challenge.

Attribute

Disentangling Linear Mode-Connectivity

no code implementations 15 Dec 2023 Gul Sena Altintas, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

Linear mode-connectivity (LMC) (or lack thereof) is one of the intriguing characteristics of neural network loss landscapes.

Linear Mode Connectivity

LIME: Localized Image Editing via Attention Regularization in Diffusion Models

no code implementations 14 Dec 2023 Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation.

Denoising · Semantic Segmentation +1

Recurrent Distance Filtering for Graph Representation Learning

no code implementations 3 Dec 2023 Yuhui Ding, Antonio Orvieto, Bobby He, Thomas Hofmann

Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing the information from distant nodes effectively.

Graph Representation Learning · Inductive Bias +1

Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

no code implementations 10 Nov 2023 Elior Benarous, Sotiris Anagnostidis, Luca Biggio, Thomas Hofmann

In this study, we investigate how neural networks exhibit shape bias during training on synthetic datasets, serving as an indicator of the synthetic data quality.

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

no code implementations 6 Nov 2023 Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

This leads to the notion of a 'compute-optimal' model, i.e., a model that allocates a given level of compute during training optimally to maximize performance.

Simplifying Transformer Blocks

1 code implementation 3 Nov 2023 Bobby He, Thomas Hofmann

A simple design recipe for deep Transformers is to compose identical building blocks.

Transformer Fusion with Optimal Transport

no code implementations 9 Oct 2023 Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

We flesh out an abstraction for layer alignment that can, in principle, generalize to arbitrary architectures; we apply it to the key ingredients of Transformers, such as multi-head self-attention, layer normalization, and residual connections, and discuss how to handle them via various ablation studies.

Image Classification · Language Modelling
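
A minimal sketch of the hard-alignment special case of such fusion -- matching the neurons of two models' corresponding layers with a one-to-one transport plan before averaging (numpy/scipy; the cost function and plain averaging are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
W_a = rng.normal(size=(64, 32))  # one layer of model A (neurons x inputs)
W_b = rng.normal(size=(64, 32))  # the same layer in model B

# cost of matching neuron i of A with neuron j of B
cost = np.linalg.norm(W_a[:, None, :] - W_b[None, :, :], axis=-1)
row, col = linear_sum_assignment(cost)  # optimal permutation (hard OT plan)

W_fused = 0.5 * (W_a + W_b[col])  # average in the aligned space
print(W_fused.shape)              # (64, 32)
```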

Towards guarantees for parameter isolation in continual learning

no code implementations 2 Oct 2023 Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

Deep learning has proved to be a successful paradigm for solving many challenges in machine learning.

Continual Learning

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

no code implementations NeurIPS 2023 Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width.

Deep Attention · Learning Theory

Scaling MLPs: A Tale of Inductive Bias

1 code implementation NeurIPS 2023 Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

We show that the performance of MLPs drastically improves with scale (95% on CIFAR10, 82% on CIFAR100, 58% on ImageNet ReaL), highlighting that a lack of inductive bias can indeed be compensated for.

Computational Efficiency · Inductive Bias +1

Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

no code implementations 4 Jun 2023 Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore.

Common Sense Reasoning · Question Answering +2

The Hessian perspective into the Nature of Convolutional Neural Networks

no code implementations 16 May 2023 Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf

While Convolutional Neural Networks (CNNs) have long been investigated, applied, and theorized, we aim to provide a slightly different perspective on their nature -- through the lens of their Hessian maps.

CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

1 code implementation 12 Apr 2023 Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore.

Question Answering · Visual Question Answering

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

1 code implementation CVPR 2023 Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann

In contrast to the natural capability of humans to learn new tasks sequentially, neural networks are known to suffer from catastrophic forgetting, where the model's performance on old tasks drops dramatically after being optimized for a new task.

Continual Learning

Random Teachers are Good Teachers

1 code implementation 23 Feb 2023 Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.

Data Augmentation · Self-Supervised Learning

Cosmology from Galaxy Redshift Surveys with PointNet

no code implementations 22 Nov 2022 Sotiris Anagnostidis, Arne Thomsen, Tomasz Kacprzak, Tilman Tröster, Luca Biggio, Alexandre Refregier, Thomas Hofmann

In this work, we aim to improve upon two-point statistics by employing a PointNet-like neural network to regress the values of the cosmological parameters directly from point cloud data.

The Curious Case of Benign Memorization

no code implementations 25 Oct 2022 Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

While such a memorization capacity seems worrisome, in this work we show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way, i.e., they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing.

Data Augmentation · Memorization
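
A minimal sketch of the nearest-neighbour probing used above as an evaluation -- classify each held-out point by the clean label of its closest training embedding (numpy; the synthetic clustered embeddings stand in for features learned under random labels):

```python
import numpy as np

rng = np.random.default_rng(0)
# placeholder embeddings: two separated clusters with true labels 0 / 1
train_x = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(5, 1, (100, 8))])
train_y = np.repeat([0, 1], 100)
test_x = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(5, 1, (20, 8))])
test_y = np.repeat([0, 1], 20)

# 1-NN probe: predict the label of the nearest training embedding
dists = np.linalg.norm(test_x[:, None] - train_x[None, :], axis=-1)
pred = train_y[dists.argmin(axis=1)]
print("probe accuracy:", (pred == test_y).mean())  # ~1.0 on this toy data
```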

Mastering Spatial Graph Prediction of Road Networks

no code implementations ICCV 2023 Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann

Accurately predicting road networks from satellite images requires a global understanding of the network topology.

Reinforcement Learning (RL)

OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters

1 code implementation 19 Jul 2022 Piera Riccio, Bill Psomas, Francesco Galati, Francisco Escolano, Thomas Hofmann, Nuria Oliver

Augmented Reality (AR) filters on selfies have become very popular on social media platforms for a variety of applications, including marketing, entertainment and aesthetics.

Marketing

How Tempering Fixes Data Augmentation in Bayesian Neural Networks

no code implementations 27 May 2022 Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

While data augmentation has been empirically recognized as one of the main drivers of this effect, a theoretical account of its role is largely missing.

Data Augmentation

Phenomenology of Double Descent in Finite-Width Neural Networks

no code implementations ICLR 2022 Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized.

Generalization Through The Lens Of Leave-One-Out Error

1 code implementation ICLR 2022 Gregor Bachmann, Thomas Hofmann, Aurélien Lucchi

Despite the tremendous empirical success of deep learning models to solve various learning tasks, our theoretical understanding of their generalization ability is very limited.

Generalization Bounds · Transfer Learning

On the effectiveness of Randomized Signatures as Reservoir for Learning Rough Dynamics

no code implementations 2 Jan 2022 Enea Monzio Compagnoni, Anna Scampicchio, Luca Biggio, Antonio Orvieto, Thomas Hofmann, Josef Teichmann

Many finance, physics, and engineering phenomena are modeled by continuous-time dynamical systems driven by highly irregular (stochastic) inputs.

LEMMA · Time Series +1

How to Query Language Models?

1 code implementation 4 Aug 2021 Leonard Adolphs, Shehzaad Dhuliawala, Thomas Hofmann

We apply this approach of querying by example to the LAMA probe and obtain substantial improvements of up to 37.8% for BERT-large on the T-REx data when providing only 10 demonstrations -- even outperforming a baseline that queries the model with up to 40 paraphrases of the question.
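
A minimal sketch of what querying by example can look like for a cloze-style probe -- show the model a few solved instances of the relation, then pose the query in the same format (the relation, demonstrations, and template are invented placeholders, not the paper's data):

```python
# k solved demonstrations of the target relation, then the actual query
demos = [("Paris", "France"), ("Berlin", "Germany"), ("Madrid", "Spain")]
query = "Rome"

prompt = "\n".join(f"{city} is the capital of {country}." for city, country in demos)
prompt += f"\n{query} is the capital of [MASK]."
print(prompt)  # feed to a masked LM and read off the [MASK] prediction
```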

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

no code implementations NeurIPS 2021 Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks.

Precise characterization of the prior predictive distribution of deep ReLU networks

no code implementations NeurIPS 2021 Lorenzo Noci, Gregor Bachmann, Kevin Roth, Sebastian Nowozin, Thomas Hofmann

Recent works on Bayesian neural networks (BNNs) have highlighted the need to better understand the implications of using Gaussian priors in combination with the compositional structure of the network architecture.

Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect

no code implementations NeurIPS 2021 Lorenzo Noci, Kevin Roth, Gregor Bachmann, Sebastian Nowozin, Thomas Hofmann

Testing the dataset curation hypothesis of Aitchison (2020), we show empirically that the CPE does not arise in a real curated data set but can be produced in a controlled experiment with varying curation strength.

Data Augmentation

Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks

no code implementations 7 Jun 2021 Antonio Orvieto, Jonas Kohler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi

This paper revisits the so-called vanishing gradient phenomenon, which commonly occurs in deep randomly initialized neural networks.

Uniform Convergence, Adversarial Spheres and a Simple Remedy

no code implementations 7 May 2021 Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Thomas Hofmann

On a specific dataset, it was observed that a neural network completely misclassifies a projection of the training data (the adversarial set), rendering any existing generalization bound based on uniform convergence vacuous.

Learning Generative Models of Textured 3D Meshes from Real-World Images

1 code implementation ICCV 2021 Dario Pavllo, Jonas Kohler, Thomas Hofmann, Aurelien Lucchi

Recent advances in differentiable rendering have sparked an interest in learning generative models of textured 3D meshes from image collections.

Pose Estimation

Generative Minimization Networks: Training GANs Without Competition

no code implementations 23 Mar 2021 Paulina Grnarova, Yannic Kilcher, Kfir Y. Levy, Aurelien Lucchi, Thomas Hofmann

Among known problems experienced by practitioners is the lack of convergence guarantees or convergence to a non-optimum cycle.

SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German

no code implementations 21 Mar 2021 Pelin Dogan-Schönberger, Julian Mäder, Thomas Hofmann

Swiss German is a dialect continuum whose natively acquired dialects significantly differ from the formal variety of the language.

Speech Synthesis

Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization

no code implementations 23 Feb 2021 Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith

Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers.

Numerical Integration
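
The cornerstone of that framework: explicit Euler with step size $h$ applied to the gradient flow $\dot{x} = -\nabla f(x)$ is exactly gradient descent, $x_{k+1} = x_k - h\,\nabla f(x_k)$. A minimal sketch on a one-dimensional quadratic (the objective and step size are illustrative):

```python
import numpy as np

grad = lambda x: 2 * x        # gradient of f(x) = x^2
x, h = np.array([5.0]), 0.1

for _ in range(50):
    x = x - h * grad(x)       # explicit Euler on dx/dt = -grad f(x)
print(x)                      # -> close to the minimizer at 0
```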

Batch normalization provably avoids ranks collapse for randomly initialised deep networks

no code implementations NeurIPS 2020 Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

Convolutional Generation of Textured 3D Meshes

1 code implementation NeurIPS 2020 Dario Pavllo, Graham Spinks, Thomas Hofmann, Marie-Francine Moens, Aurelien Lucchi

A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN.

BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward

no code implementations 5 Mar 2020 Florian Schmidt, Thomas Hofmann

Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination.

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

no code implementations 3 Mar 2020 Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

Controlling Style and Semantics in Weakly-Supervised Image Generation

1 code implementation ECCV 2020 Dario Pavllo, Aurelien Lucchi, Thomas Hofmann

We propose a weakly-supervised approach for conditional image generation of complex scenes where a user has fine control over objects appearing in the scene.

Conditional Image Generation

Mixing of Stochastic Accelerated Gradient Descent

no code implementations 31 Oct 2019 Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann

We study the mixing properties for stochastic accelerated gradient descent (SAGD) on least-squares regression.

Stochastic Optimization

Adversarial Training Generalizes Data-dependent Spectral Norm Regularization

no code implementations 25 Sep 2019 Kevin Roth, Yannic Kilcher, Thomas Hofmann

We establish a theoretical link between adversarial training and operator norm regularization for deep neural networks.

LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games

no code implementations 4 Sep 2019 Leonard Adolphs, Thomas Hofmann

We, however, consider the task of designing an agent that not only succeeds in a single game, but performs well across a whole family of games sharing the same theme.

Atari Games · Hierarchical Reinforcement Learning +3

Autoregressive Text Generation Beyond Feedback Loops

1 code implementation IJCNLP 2019 Florian Schmidt, Stephan Mandt, Thomas Hofmann

Autoregressive state transitions, where predictions are conditioned on past predictions, are the predominant choice for both deterministic and stochastic sequential models.

Sentence · Text Generation

Cosmological N-body simulations: a challenge for scalable generative models

1 code implementation 15 Aug 2019 Nathanaël Perraudin, Ankit Srivastava, Aurelien Lucchi, Tomasz Kacprzak, Thomas Hofmann, Alexandre Réfrégier

Our results show that the proposed model produces samples of high visual quality, although the statistical analysis reveals that capturing rare features in the data poses significant problems for the generative models.

Cosmological constraints with deep learning from KiDS-450 weak lensing maps

no code implementations 7 Jun 2019 Janis Fluri, Tomasz Kacprzak, Aurelien Lucchi, Alexandre Refregier, Adam Amara, Thomas Hofmann, Aurel Schneider

We present the cosmological results with a CNN from the KiDS-450 tomographic weak lensing dataset, constraining the total matter density $\Omega_m$, the fluctuation amplitude $\sigma_8$, and the intrinsic alignment amplitude $A_{\rm{IA}}$.

Cosmology and Nongalactic Astrophysics

Adversarial Training is a Form of Data-dependent Operator Norm Regularization

no code implementations NeurIPS 2020 Kevin Roth, Yannic Kilcher, Thomas Hofmann

We establish a theoretical link between adversarial training and operator norm regularization for deep neural networks.

Evaluating GANs via Duality

no code implementations ICLR 2019 Paulina Grnarova, Kfir. Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Thomas Hofmann, Andreas Krause

Generative Adversarial Networks (GANs) have shown great results in accurately modeling complex distributions, but their training is known to be difficult due to instabilities caused by a challenging minimax optimization problem.

The Odds are Odd: A Statistical Test for Detecting Adversarial Examples

1 code implementation 13 Feb 2019 Kevin Roth, Yannic Kilcher, Thomas Hofmann

We investigate conditions under which test statistics exist that can reliably detect examples that have been adversarially manipulated in a white-box attack.

A domain agnostic measure for monitoring and evaluating GANs

1 code implementation NeurIPS 2019 Paulina Grnarova, Kfir. Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Ian Goodfellow, Thomas Hofmann, Andreas Krause

Evaluations are essential for: (i) relative assessment of different models and (ii) monitoring the progress of a single model throughout training.

Learning and Evaluating Sparse Interpretable Sentence Embeddings

no code implementations WS 2018 Valentin Trifonov, Octavian-Eugen Ganea, Anna Potapenko, Thomas Hofmann

Previous research on word embeddings has shown that sparse representations, which can be either learned on top of existing dense embeddings or obtained through model constraints during training, have the benefit of increased interpretability: to some degree, each dimension can be understood by a human and associated with a recognizable feature in the data.

Sentence · Sentence Embedding +2

Cosmological constraints from noisy convergence maps through deep learning

no code implementations 23 Jul 2018 Janis Fluri, Tomasz Kacprzak, Aurelien Lucchi, Alexandre Refregier, Adam Amara, Thomas Hofmann

We find that, for a shape noise level corresponding to 8.53 galaxies/arcmin$^2$ and a smoothing scale of $\sigma_s = 2.34$ arcmin, the network is able to generate 45% tighter constraints.

Cosmology and Nongalactic Astrophysics

A Distributed Second-Order Algorithm You Can Trust

no code implementations ICML 2018 Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years.

Distributed Optimization · Second-order methods

Deep State Space Models for Unconditional Word Generation

no code implementations NeurIPS 2018 Florian Schmidt, Thomas Hofmann

Autoregressive feedback is considered a necessity for successful unconditional text generation using stochastic sequence models.

Text Generation · Variational Inference

Zero-Shot Dual Machine Translation

1 code implementation 25 May 2018 Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, Thomas Hofmann

Our method also obtains improvements in the setting where a small amount of parallel data for the zero-shot language pair is available.

Machine Translation · NMT +1

Hyperbolic Neural Networks

3 code implementations NeurIPS 2018 Octavian-Eugen Ganea, Gary Bécigneul, Thomas Hofmann

However, the representational power of hyperbolic geometry is not yet on par with Euclidean geometry, mostly because of the absence of corresponding hyperbolic neural network layers.

Graph Representation Learning · Natural Language Inference +2
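
Two Poincaré-ball primitives that hyperbolic layers are typically built from -- Möbius addition and the exponential map at the origin -- in a minimal numpy sketch (curvature fixed to $c = 1$; a generic transcription of the standard formulas, not the paper's full layers):

```python
import numpy as np

def mobius_add(x, y):
    # Möbius addition on the unit Poincaré ball (c = 1)
    xy, xx, yy = x @ y, x @ x, y @ y
    return ((1 + 2 * xy + yy) * x + (1 - xx) * y) / (1 + 2 * xy + xx * yy)

def expmap0(v):
    # exponential map at the origin: tangent vector -> point in the ball
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

p = expmap0(np.array([0.3, 0.4]))
q = expmap0(np.array([-0.2, 0.1]))
r = mobius_add(p, q)
print(r, np.linalg.norm(r) < 1)  # the result stays inside the ball
```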

Adversarially Robust Training through Structured Gradient Regularization

no code implementations 22 May 2018 Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations.

Local Saddle Point Optimization: A Curvature Exploitation Approach

1 code implementation 15 May 2018 Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems.

Hyperbolic Entailment Cones for Learning Hierarchical Embeddings

3 code implementations ICML 2018 Octavian-Eugen Ganea, Gary Bécigneul, Thomas Hofmann

Learning graph representations via low-dimensional embeddings that preserve relevant network properties is an important class of problems in machine learning.

Graph Embedding · Hypernym Discovery +2

Escaping Saddles with Stochastic Gradients

no code implementations ICML 2018 Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions.

Fast cosmic web simulations with generative adversarial networks

no code implementations 27 Jan 2018 Andres C. Rodriguez, Tomasz Kacprzak, Aurelien Lucchi, Adam Amara, Raphael Sgier, Janis Fluri, Thomas Hofmann, Alexandre Réfrégier

Computational models of the underlying physical processes, such as classical N-body simulations, are extremely resource intensive, as they track the action of gravity in an expanding universe using billions of particles as tracers of the cosmic matter distribution.

The best defense is a good offense: Countering black box attacks by predicting slightly wrong labels

no code implementations 15 Nov 2017 Yannic Kilcher, Thomas Hofmann

Black-box attacks on machine learning models occur when an attacker, despite having no access to the inner workings of a model, can successfully craft an attack by means of model theft.

Semantic Interpolation in Implicit Models

no code implementations ICLR 2018 Yannic Kilcher, Aurelien Lucchi, Thomas Hofmann

In implicit models, one often interpolates between sampled points in latent space.
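
A minimal sketch of the two common interpolants -- linear, and spherical (slerp), the latter often preferred for Gaussian latents because it keeps the interpolant's norm in the typical range of samples (numpy; the dimension and endpoints are illustrative):

```python
import numpy as np

def lerp(z0, z1, t):
    return (1 - t) * z0 + t * z1

def slerp(z0, z1, t):
    # interpolate along the great circle spanned by z0 and z1
    cos = z0 @ z1 / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z0, z1 = rng.normal(size=128), rng.normal(size=128)
for t in (0.0, 0.5, 1.0):
    print(t, np.linalg.norm(lerp(z0, z1, t)), np.linalg.norm(slerp(z0, z1, t)))
```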

Parametrizing filters of a CNN with a GAN

no code implementations ICLR 2018 Yannic Kilcher, Gary Becigneul, Thomas Hofmann

It is commonly agreed that the use of relevant invariances as a good statistical bias is important in machine learning.

Generative Adversarial Network

Flexible Prior Distributions for Deep Generative Models

no code implementations ICLR 2018 Yannic Kilcher, Aurelien Lucchi, Thomas Hofmann

We consider the problem of training generative models with deep neural networks as generators, i.e., to map latent codes to data points.

Generator Reversal

no code implementations 28 Jul 2017 Yannic Kilcher, Aurélien Lucchi, Thomas Hofmann

We consider the problem of training generative models with deep neural networks as generators, i.e., to map latent codes to data points.

Learning Aerial Image Segmentation from Online Maps

2 code implementations 21 Jul 2017 Pascal Kaiser, Jan Dirk Wegner, Aurelien Lucchi, Martin Jaggi, Thomas Hofmann, Konrad Schindler

We adapt a state-of-the-art CNN architecture for semantic segmentation of buildings and roads in aerial images, and compare its performance when using different training data sets, ranging from manually labeled, pixel-accurate ground truth of the same city to automatic training data derived from OpenStreetMap data from distant locations.

General Classification · Image Segmentation +2

Cosmological model discrimination with Deep Learning

no code implementations 17 Jul 2017 Jorit Schmelzle, Aurelien Lucchi, Tomasz Kacprzak, Adam Amara, Raphael Sgier, Alexandre Réfrégier, Thomas Hofmann

We find that our implementation of DCNN outperforms the skewness and kurtosis statistics, especially for high noise levels.

Accelerated Dual Learning by Homotopic Initialization

no code implementations 13 Jun 2017 Hadi Daneshmand, Hamed Hassani, Thomas Hofmann

Gradient descent and coordinate descent are well understood in terms of their asymptotic behavior, but less so in a transient regime often used for approximations in machine learning.

Stabilizing Training of Generative Adversarial Networks through Regularization

1 code implementation NeurIPS 2017 Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality but in order to work they require a careful choice of architecture, parameter initialization, and selection of hyper-parameters.

Image Generation
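
A minimal sketch of gradient-based discriminator regularization in the spirit of the entry above -- penalizing the squared norm of the discriminator's input gradient (PyTorch; the toy discriminator, the weighting gamma, and where the penalty is evaluated are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(64, 2, requires_grad=True)  # here: "real" samples
gamma = 2.0

logits = D(x)
# squared gradient norm of D w.r.t. its inputs; create_graph lets the
# penalty backpropagate into the discriminator's parameters
grad_x = torch.autograd.grad(logits.sum(), x, create_graph=True)[0]
penalty = grad_x.pow(2).sum(dim=1).mean()

d_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
(d_loss + 0.5 * gamma * penalty).backward()
print(float(penalty))
```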

Deep Joint Entity Disambiguation with Local Neural Attention

3 code implementations EMNLP 2017 Octavian-Eugen Ganea, Thomas Hofmann

We propose a novel deep learning model for joint document-level entity disambiguation, which leverages learned neural representations.

Entity Disambiguation

A Semi-supervised Framework for Image Captioning

1 code implementation 16 Nov 2016 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

We here propose a novel way of using such textual data by artificially generating missing visual information.

Image Captioning · Word Embeddings

Fully Character-Level Neural Machine Translation without Explicit Segmentation

2 code implementations TACL 2017 Jason Lee, Kyunghyun Cho, Thomas Hofmann

We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of BLEU score and human judgment.

Machine Translation · NMT +1

DynaNewton - Accelerating Newton's Method for Machine Learning

no code implementations 20 May 2016 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Solutions on this path are tracked such that the minimizer of the previous objective is guaranteed to be within the quadratic convergence region of the next objective to be optimized.

BIG-bench Machine Learning

Starting Small -- Learning with Adaptive Sample Sizes

no code implementations 9 Mar 2016 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set.

BIG-bench Machine Learning

Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

1 code implementation 8 Sep 2015 Octavian-Eugen Ganea, Marina Ganea, Aurelien Lucchi, Carsten Eickhoff, Thomas Hofmann

We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods.

Entity Disambiguation · Entity Linking +3

Variance Reduced Stochastic Gradient Descent with Neighbors

no code implementations NeurIPS 2015 Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

As a side-product we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms.

Memorization
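
A minimal sketch of one such memorization algorithm -- a SAGA-style update that stores one past gradient per example and uses the stored table to de-noise each stochastic step (numpy, on least squares; the step size and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]  # per-example gradient

w = np.zeros(d)
memory = np.array([grad_i(w, i) for i in range(n)])  # one slot per example
avg = memory.mean(axis=0)
lr = 0.02

for _ in range(20000):
    i = rng.integers(n)
    g = grad_i(w, i)
    w -= lr * (g - memory[i] + avg)  # variance-reduced direction
    avg += (g - memory[i]) / n       # keep the table average in sync
    memory[i] = g

print(np.linalg.norm(A.T @ (A @ w - b)) / n)  # average full gradient, near zero
```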

A Variance Reduced Stochastic Newton Method

no code implementations 28 Mar 2015 Aurelien Lucchi, Brian McWilliams, Thomas Hofmann

Quasi-Newton methods are widely used in practice for convex loss minimization problems.

Probabilistic Latent Semantic Analysis

3 code implementations 23 Jan 2013 Thomas Hofmann

Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas.

Information Retrieval · Retrieval
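
Underneath, a latent topic $z$ decouples documents and words, $P(d, w) = \sum_z P(z)\,P(d \mid z)\,P(w \mid z)$, fit with EM. A minimal sketch on a toy count matrix (numpy; the sizes, random initialization, and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, n_topics = 20, 50, 3
counts = rng.poisson(1.0, size=(n_docs, n_words)).astype(float)

Pz = np.full(n_topics, 1.0 / n_topics)                # P(z)
Pdz = rng.dirichlet(np.ones(n_docs), size=n_topics)   # P(d|z), one row per z
Pwz = rng.dirichlet(np.ones(n_words), size=n_topics)  # P(w|z)

for _ in range(100):
    # E-step: responsibilities P(z | d, w), shape (topics, docs, words)
    joint = Pz[:, None, None] * Pdz[:, :, None] * Pwz[:, None, :]
    post = joint / joint.sum(axis=0, keepdims=True)
    # M-step: re-estimate the factors from expected counts
    expected = post * counts[None]
    nz = expected.sum(axis=(1, 2))
    Pdz = expected.sum(axis=2) / nz[:, None]
    Pwz = expected.sum(axis=1) / nz[:, None]
    Pz = nz / nz.sum()

print(Pz)  # learned topic proportions
```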
