1 code implementation • 28 Feb 2024 • Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter
Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives like neural network training.
no code implementations • 9 Feb 2024 • Gresa Shala, André Biedenkapp, Josif Grabocka
We introduce Hierarchical Transformers for Meta-Reinforcement Learning (HTrMRL), a powerful online meta-reinforcement learning approach.
no code implementations • 6 Feb 2024 • Guri Zabërgja, Arlind Kadra, Josif Grabocka
In this paper, we introduce a large-scale empirical study comparing not only neural networks against gradient-boosted decision trees on tabular data, but also transformer-based architectures against traditional multi-layer perceptrons (MLPs) with residual connections.
1 code implementation • 6 Jun 2023 • Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter, Josif Grabocka
With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with the questions of which pretrained model to use and how to finetune it for a new dataset.
1 code implementation • 23 May 2023 • Sebastian Pineda Arango, Josif Grabocka
As a remedy, this paper proposes a novel neural architecture that captures the deep interaction between the components of a Machine Learning pipeline.
1 code implementation • 22 May 2023 • Arlind Kadra, Sebastian Pineda Arango, Josif Grabocka
Through extensive experiments, we demonstrate that our explainable deep networks are as accurate as state-of-the-art classifiers on tabular data.
no code implementations • 14 Apr 2023 • Mofassir ul Islam Arif, Mohsan Jameel, Josif Grabocka, Lars Schmidt-Thieme
We create phantom embeddings from a subset of homogeneous samples and use these phantom embeddings to decrease the inter-class similarity of instances in their latent embedding space.
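A heavily hedged sketch of one reading of this abstract: average the embeddings of a same-class ("homogeneous") subset into a phantom embedding, then penalize each instance's similarity to the phantoms of other classes. The paper's actual loss and sampling scheme may differ; all names below are illustrative.

```python
import torch
import torch.nn.functional as F

def phantom_loss(emb, labels, n_classes, subset=8):
    # Assumes every class appears in the batch.
    phantoms = []
    for c in range(n_classes):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        pick = idx[torch.randperm(len(idx))[:subset]]     # homogeneous subset
        phantoms.append(emb[pick].mean(dim=0))            # phantom embedding
    phantoms = torch.stack(phantoms)                      # (n_classes, d)
    sim = F.cosine_similarity(emb.unsqueeze(1), phantoms.unsqueeze(0), dim=-1)
    # Penalize each instance's highest similarity to another class's phantom.
    other = sim.scatter(1, labels.unsqueeze(1), float("-inf"))
    return other.max(dim=1).values.mean()

emb = torch.randn(64, 32)              # illustrative batch of embeddings
labels = torch.arange(64) % 4          # four classes, all present
loss = phantom_loss(emb, labels, n_classes=4)
```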
1 code implementation • 27 Mar 2023 • Abdus Salam Khazi, Sebastian Pineda Arango, Josif Grabocka
Automatically optimizing the hyperparameters of Machine Learning algorithms is one of the primary open questions in AI.
1 code implementation • 16 Jun 2022 • Ekrem Öztürk, Fabio Ferreira, Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka, Frank Hutter
Given a new dataset D and a low compute budget, how should we choose a pre-trained model to fine-tune to D, and set the fine-tuning hyperparameters without risking overfitting, particularly if D is small?
1 code implementation • 20 Feb 2022 • Martin Wistuba, Arlind Kadra, Josif Grabocka
Multi-fidelity (gray-box) hyperparameter optimization (HPO) techniques have recently emerged as a promising direction for tuning Deep Learning methods.
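For context, the sketch below shows generic successive halving, a standard multi-fidelity scheme that evaluates many configurations cheaply and promotes only the best to higher fidelities. It is not the method proposed in this paper; `sample_config` and `evaluate` are hypothetical placeholders.

```python
import random

def sample_config():
    # Hypothetical search space: learning rate and dropout.
    return {"lr": 10 ** random.uniform(-4, -1), "dropout": random.uniform(0.0, 0.5)}

def evaluate(config, epochs):
    # Placeholder: train `config` for `epochs` and return a validation loss.
    return random.random() / epochs  # stand-in for a real training run

def successive_halving(n_configs=27, min_epochs=1, eta=3):
    configs = [sample_config() for _ in range(n_configs)]
    epochs = min_epochs
    while len(configs) > 1:
        # Evaluate every surviving config at the current (cheap) fidelity.
        losses = [evaluate(c, epochs) for c in configs]
        # Keep the best 1/eta fraction and raise their budget by eta.
        ranked = sorted(zip(losses, range(len(configs))))
        keep = max(1, len(configs) // eta)
        configs = [configs[i] for _, i in ranked[:keep]]
        epochs *= eta
    return configs[0]

best = successive_halving()
```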
1 code implementation • ICLR 2022 • Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, Frank Hutter
Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points.
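A minimal sketch of that prior-fitting loop, under simplifying assumptions: a toy linear-function prior, a tiny Transformer encoder, and a squared-error stand-in for the paper's probabilistic objective.

```python
import torch
import torch.nn as nn

d_model = 64
embed = nn.Linear(2, d_model)        # embeds (x, y) pairs; the masked y is zeroed
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(d_model, 1)         # predicts the masked label
params = list(embed.parameters()) + list(encoder.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

for step in range(1000):
    # 1) Repeatedly draw a task (function) from the prior: here y = w * x + noise.
    w = torch.randn(32, 1, 1)                     # one task per set in the batch
    x = torch.rand(32, 8, 1)                      # 8 data points per set
    y = w * x + 0.01 * torch.randn_like(x)
    # 2) Mask one label (the last point of each set).
    y_in = y.clone()
    y_in[:, -1, :] = 0.0
    # 3) Predict the masked label from the set-valued input of the rest.
    h = encoder(embed(torch.cat([x, y_in], dim=-1)))
    pred = head(h[:, -1, :])
    loss = ((pred - y[:, -1, :]) ** 2).mean()     # squared-error stand-in for the
    opt.zero_grad(); loss.backward(); opt.step()  # paper's probabilistic objective
```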
1 code implementation • 14 Oct 2021 • Michael Ruchte, Josif Grabocka
These works also use Multi-Task Learning (MTL) problems to benchmark MOO algorithms, treating each task as an independent objective.
no code implementations • 29 Sep 2021 • Hadi Samer Jomaa, Sebastian Pineda Arango, Lars Schmidt-Thieme, Josif Grabocka
As a result, our novel DKLM can learn contextualized dataset-specific similarity representations for hyperparameter configurations.
1 code implementation • NeurIPS 2021 • Arlind Kadra, Marius Lindauer, Frank Hutter, Josif Grabocka
Tabular datasets are the last "unconquered castle" for deep learning, with traditional ML methods like Gradient-Boosted Decision Trees still performing strongly even against recent specialized neural architectures.
1 code implementation • 11 Jun 2021 • Sebastian Pineda Arango, Hadi S. Jomaa, Martin Wistuba, Josif Grabocka
Hyperparameter optimization (HPO) is a core problem for the machine learning community and remains largely unsolved due to the significant computational resources required to evaluate hyperparameter configurations.
1 code implementation • 24 Mar 2021 • Michael Ruchte, Josif Grabocka
Prior work either demands optimizing a new network for every point on the Pareto front or induces a large overhead in the number of trainable parameters by using hyper-networks conditioned on modifiable preferences.
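One common alternative to both — conditioning a single network on preference vectors fed as extra inputs and training with linear scalarization — can be sketched as follows. This is illustrative and not necessarily this paper's exact formulation; the two toy objectives are assumptions.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10 + 2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def loss_fit(pred, y):                 # objective 1: fit the targets
    return ((pred - y) ** 2).mean()

def loss_small(pred):                  # objective 2: e.g. prefer small outputs
    return pred.abs().mean()

for step in range(1000):
    x = torch.randn(128, 10)
    y = torch.randn(128, 1)
    # Sample a preference vector on the simplex and append it to the input.
    lam = torch.distributions.Dirichlet(torch.ones(2)).sample()
    pred = net(torch.cat([x, lam.expand(128, 2)], dim=-1))
    # Linear scalarization under the sampled preference.
    loss = lam[0] * loss_fit(pred, y) + lam[1] * loss_small(pred)
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, sweeping lam over the simplex traces an approximate Pareto
# front with this single trained network.
```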
no code implementations • 7 Feb 2021 • Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka
In contrast to existing models, DMFBS i) integrates a differentiable metafeature extractor and ii) is optimized using a novel multi-task loss, linking manifold regularization with a dataset similarity measure learned via an auxiliary dataset identification meta-task, effectively enforcing the response approximation for similar datasets to be similar.
1 code implementation • ICLR 2021 • Martin Wistuba, Josif Grabocka
Hyperparameter optimization (HPO) is a central pillar in the automation of machine learning solutions and is mainly performed via Bayesian optimization, where a parametric surrogate is learned to approximate the black box response function (e.g., validation error).
1 code implementation • 1 Jan 2021 • Michael Ruchte, Arber Zela, Julien Niklas Siems, Josif Grabocka, Frank Hutter
Neural Architecture Search (NAS) is one of the focal points for the Deep Learning community, but reproducing NAS methods is extremely challenging due to numerous low-level implementation details.
no code implementations • 1 Jan 2021 • Arlind Kadra, Marius Lindauer, Frank Hutter, Josif Grabocka
The regularization of prediction models is arguably the most crucial ingredient that allows Machine Learning solutions to generalize well on unseen data.
no code implementations • 1 Jan 2021 • Hadi Samer Jomaa, Lars Schmidt-Thieme, Josif Grabocka
Zero-shot hyper-parameter optimization refers to the process of selecting hyper-parameter configurations that are expected to perform well for a given dataset upfront, without access to any observations of the losses of the target response.
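A minimal sketch of a common zero-shot baseline consistent with this setup: rank a fixed pool of configurations by their average rank across meta-training datasets and recommend the best one upfront. This is a standard baseline, not necessarily the paper's method; `meta_losses` holds toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
meta_losses = rng.random((20, 50))    # 20 meta-datasets x 50 configs (toy data)

ranks = meta_losses.argsort(axis=1).argsort(axis=1)  # per-dataset ranks, 0 = best
avg_rank = ranks.mean(axis=0)
best_config = int(avg_rank.argmin())  # recommended without any target observations
print(f"zero-shot recommendation: config {best_config}")
```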
1 code implementation • 28 Oct 2019 • Rafael Rego Drumond, Lukas Brinkmeyer, Josif Grabocka, Lars Schmidt-Thieme
In this paper, we present HIDRA, a meta-learning approach that enables training and evaluating across tasks with any number of target variables.
1 code implementation • 30 Sep 2019 • Lukas Brinkmeyer, Rafael Rego Drumond, Randolf Scholz, Josif Grabocka, Lars Schmidt-Thieme
Parametric models, and particularly neural networks, require weight initialization as a starting point for gradient-based optimization.
no code implementations • 25 Sep 2019 • Jonas Falkner, Josif Grabocka, Lars Schmidt-Thieme
Compressed forms of deep neural networks are essential in deploying large-scale computational models on resource-constrained devices.
1 code implementation • 27 Jun 2019 • Hadi S. Jomaa, Josif Grabocka, Lars Schmidt-Thieme
More recently, methods have been introduced that build a so-called surrogate model to predict the validation loss for a specific hyperparameter setting, model, and dataset, and then sequentially select the next hyperparameter to test based on a heuristic function of the surrogate model's expected value and uncertainty, called the acquisition function (sequential model-based Bayesian optimization, SMBO).
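A minimal, self-contained instance of the SMBO loop described here, using a Gaussian-process surrogate and the expected-improvement acquisition on a 1-D toy problem:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):                                   # toy "validation loss"
    return np.sin(3 * x) + 0.1 * x ** 2

X = np.array([[0.5], [2.0]])                # initial observations
y = f(X).ravel()
cands = np.linspace(-3, 3, 200).reshape(-1, 1)

for _ in range(10):
    gp = GaussianProcessRegressor().fit(X, y)    # surrogate model
    mu, sigma = gp.predict(cands, return_std=True)
    best = y.min()
    # Expected improvement (for minimization): trades off expected value
    # against the surrogate's uncertainty.
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cands[ei.argmax()].reshape(1, -1)   # sequentially pick the next point
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print("best x:", X[y.argmin()], "best loss:", y.min())
```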
no code implementations • 24 Jun 2019 • Hadi S. Jomaa, Josif Grabocka, Lars Schmidt-Thieme
In classical Q-learning, the objective is to maximize the sum of discounted rewards by iteratively applying the Bellman equation as an update rule, in an attempt to estimate the action-value function of the optimal policy.
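A minimal tabular sketch of that Bellman update on a hypothetical chain environment (states, actions, and rewards are illustrative):

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    # Hypothetical chain dynamics: action 1 moves right; reward at the end.
    s_next = min(s + a, n_states - 1)
    return s_next, float(s_next == n_states - 1)

for episode in range(500):
    s = 0
    for t in range(100):                          # cap episode length
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = step(s, a)
        # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == n_states - 1:
            break
```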
1 code implementation • 27 May 2019 • Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka
As a data-driven approach, meta-learning requires meta-features that represent the primary learning tasks or datasets, and these are traditionally estimated as engineered dataset statistics that require expert domain knowledge tailored to every meta-task.
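For illustration, a few of the engineered dataset statistics this sentence refers to (simple, hand-picked examples; the paper's point is to learn meta-features instead):

```python
import numpy as np

def engineered_metafeatures(X, y):
    # X: (n_instances, n_features) array; y: (n_instances,) class labels.
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return {
        "n_instances": n,
        "n_features": d,
        "n_classes": len(counts),
        "class_entropy": float(-(p * np.log(p)).sum()),
        "mean_feature_std": float(X.std(axis=0).mean()),
        "mean_abs_feature_skew": float(np.abs(
            ((X - X.mean(0)) ** 3).mean(0) / np.maximum(X.std(0) ** 3, 1e-12)
        ).mean()),
    }
```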
no code implementations • 24 May 2019 • Josif Grabocka, Randolf Scholz, Lars Schmidt-Thieme
Ultimately, the surrogate losses are learned jointly with the prediction model via bilevel optimization.
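A minimal one-step-unrolled sketch of that bilevel scheme, with a toy linear model and an assumed parametric surrogate-loss family (the paper's actual parameterization may differ): the inner step updates the model on the surrogate loss, and the outer step backpropagates through that update into the surrogate's parameters.

```python
import torch

torch.manual_seed(0)
X = torch.randn(256, 5)
y = (X.sum(dim=1, keepdim=True) > 0).float()      # toy binary labels

w = torch.zeros(5, 1, requires_grad=True)         # prediction-model parameters
phi = torch.tensor([1.0, 0.0], requires_grad=True)  # surrogate-loss parameters

def surrogate(pred, target, phi):
    # Hypothetical smooth surrogate: phi mixes a logistic-style term
    # with a squared-margin term.
    margin = (2 * target - 1) * pred
    return (phi[0] * torch.nn.functional.softplus(-margin)
            + phi[1] * margin ** 2).mean()

inner_lr, outer_lr = 0.5, 0.05
for step in range(200):
    # Inner step, kept differentiable w.r.t. phi: w' = w - lr * grad_w surrogate.
    g_w = torch.autograd.grad(surrogate(X @ w, y, phi), w, create_graph=True)[0]
    w_unrolled = w - inner_lr * g_w
    # Outer step: a smooth stand-in for the target metric, evaluated at w',
    # differentiated w.r.t. the surrogate's parameters phi.
    outer = torch.nn.functional.binary_cross_entropy_with_logits(X @ w_unrolled, y)
    g_phi = torch.autograd.grad(outer, phi)[0]
    with torch.no_grad():
        phi -= outer_lr * g_phi                   # learn the surrogate loss
        w.copy_(w_unrolled)                       # commit the model update
```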
no code implementations • 25 Feb 2019 • Ahmed Rashed, Josif Grabocka, Lars Schmidt-Thieme
It can be formalized as a multi-relational learning task for predicting node labels based on their relations within the network.
no code implementations • 9 Feb 2019 • Shayan Jawed, Eya Boumaiza, Josif Grabocka, Lars Schmidt-Thieme
An active area of research is to increase the safety of self-driving vehicles.
2 code implementations • 20 Dec 2018 • Josif Grabocka, Lars Schmidt-Thieme
Research on time-series similarity measures has emphasized the need for elastic methods which align the indices of pairs of time series, and a plethora of non-parametric measures have been proposed for the task.
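For reference, the classic elastic measure alluded to here is dynamic time warping (DTW); a minimal sketch (not the measure proposed in this paper):

```python
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Elastic alignment: an index may advance in either series or both.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

x = np.sin(np.linspace(0, 2 * np.pi, 50))
y = np.sin(np.linspace(0, 2 * np.pi, 70))   # same shape, different length
print(dtw(x, y))
```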
no code implementations • 2 Nov 2017 • Dripta S. Raychaudhuri, Josif Grabocka, Lars Schmidt-Thieme
Time series shapelets are discriminative sub-sequences and their similarity to time series can be used for time series classification.
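A minimal sketch of the shapelet-to-series distance this sentence describes, i.e., the best match of the sub-sequence anywhere in the series; the paper's learning procedure is omitted, and the toy series below is illustrative.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    L = len(shapelet)
    # Slide the shapelet over the series and keep the best (smallest) match.
    dists = [
        np.mean((series[i:i + L] - shapelet) ** 2)
        for i in range(len(series) - L + 1)
    ]
    return min(dists)

series = np.sin(np.linspace(0, 4 * np.pi, 100))
shapelet = np.sin(np.linspace(0, np.pi, 20))     # a discriminative sub-sequence
feature = shapelet_distance(series, shapelet)    # one feature per shapelet
```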
no code implementations • 3 May 2015 • Josif Grabocka, Nicolas Schilling, Lars Schmidt-Thieme
We demonstrate that searching is non-optimal since the domain of motifs is restricted, and instead we propose a principled optimization approach able to find optimal motifs.
no code implementations • 17 Mar 2015 • Martin Wistuba, Josif Grabocka, Lars Schmidt-Thieme
A method for using shapelets with multivariate time series is proposed, and Ultra-Fast Shapelets is shown to be successful in comparison to state-of-the-art multivariate time series classifiers on 15 multivariate time series datasets from various domains.
no code implementations • 11 Mar 2015 • Josif Grabocka, Martin Wistuba, Lars Schmidt-Thieme
Time-series classification is an important problem for the data mining community due to the wide range of application domains involving time-series data.
no code implementations • 23 Dec 2013 • Josif Grabocka, Lars Schmidt-Thieme
Time-series classification is an important domain of machine learning and a plethora of methods have been developed for the task.
no code implementations • 24 Jul 2013 • Josif Grabocka, Martin Wistuba, Lars Schmidt-Thieme
The coefficients of the polynomial functions are converted to symbolic words via equivolume discretizations of the coefficients' distributions.
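A minimal sketch of that pipeline under assumed window length, polynomial degree, and alphabet size: fit a polynomial per (non-overlapping) window, then discretize each coefficient with equal-frequency (equivolume) bins so every symbol is roughly equally frequent.

```python
import numpy as np

def series_to_words(series, win=16, degree=3, alphabet=4):
    t = np.arange(win)
    segments = [series[i:i + win] for i in range(0, len(series) - win + 1, win)]
    coeffs = np.array([np.polyfit(t, seg, degree) for seg in segments])
    symbols = []
    for j in range(coeffs.shape[1]):
        col = coeffs[:, j]
        # Equivolume discretization: bin edges at quantiles of this
        # coefficient's distribution.
        edges = np.quantile(col, np.linspace(0, 1, alphabet + 1)[1:-1])
        symbols.append(np.digitize(col, edges))
    # One symbolic word (one symbol per coefficient) per window.
    return list(zip(*symbols))

series = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * np.random.randn(256)
print(series_to_words(series)[:5])
```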