no code implementations • 5 Mar 2024 • Manfred K. Warmuth, Wojciech Kotłowski, Matt Jones, Ehsan Amid
It is well known that the class of rotation-invariant algorithms is suboptimal even for learning sparse linear problems when the number of examples is below the "dimension" of the problem.
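As an illustration of this claim, the sketch below (a hypothetical toy experiment, not the paper's code) compares plain gradient descent, which is rotation invariant, against a multiplicative EGU-style update on a 1-sparse linear target with fewer examples than dimensions; the multiplicative update typically lands much closer to the sparse target.

```python
import numpy as np

# Hypothetical illustration (not the paper's code): recovering a 1-sparse
# linear target from n < d examples. Plain gradient descent is rotation
# invariant and converges to a dense minimum-norm solution, while a
# multiplicative (EGU-style) update concentrates weight on the sparse target.
rng = np.random.default_rng(0)
d, n = 64, 16                              # fewer examples than dimensions
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[3] = 1.0                            # 1-sparse target
y = X @ w_star

w_gd = np.zeros(d)                         # additive (GD) iterate
w_eg = np.full(d, 1.0 / d)                 # multiplicative (EGU) iterate
for _ in range(5000):
    w_gd -= 0.05 * X.T @ (X @ w_gd - y) / n
    w_eg *= np.exp(-0.05 * X.T @ (X @ w_eg - y) / n)

print("distance to sparse target, GD :", np.linalg.norm(w_gd - w_star))
print("distance to sparse target, EGU:", np.linalg.norm(w_eg - w_star))
```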
2 code implementations • 29 Jan 2024 • Erik Schultheis, Wojciech Kotłowski, Marek Wydmuch, Rohit Babbar, Strom Borman, Krzysztof Dembczyński
We consider the optimization of complex performance metrics in multi-label classification under the population utility framework.
2 code implementations • NeurIPS 2023 • Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński
As such, it is characterized by long-tail labels, i.e., most labels have very few positive instances.
no code implementations • 13 Feb 2022 • Ehsan Amid, Rohan Anil, Wojciech Kotłowski, Manfred K. Warmuth
We present the surprising result that randomly initialized neural networks are good feature extractors in expectation.
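A minimal sketch of the frozen-random-features idea (an assumed setup for illustration, not the authors' experiments): a randomly initialized one-hidden-layer ReLU network is never trained, and only a linear ridge readout is fit on its features.

```python
import numpy as np

# Hypothetical sketch (not the authors' experiments): a randomly initialized
# one-hidden-layer ReLU network is frozen, and only a linear ridge readout
# is trained on top of its features.
rng = np.random.default_rng(0)
d, width, n = 20, 512, 200
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # toy regression target

W = rng.standard_normal((d, width)) / np.sqrt(d)     # random, never trained
features = np.maximum(X @ W, 0.0)                    # frozen ReLU features

lam = 1e-3                                           # ridge readout, closed form
readout = np.linalg.solve(features.T @ features + lam * np.eye(width),
                          features.T @ y)
print("train MSE:", np.mean((features @ readout - y) ** 2))
```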
no code implementations • 5 Jul 2021 • Tim van Erven, Sarah Sachs, Wouter M. Koolen, Wojciech Kotłowski
If the outliers are chosen adversarially, we show that a simple filtering strategy on extreme gradients incurs only an $O(k)$ additive overhead compared to the usual regret bounds, and that this overhead is unimprovable; consequently, $k$ needs to be sublinear in the number of rounds for the regret to remain sublinear.
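One natural instantiation of a "filtering strategy on extreme gradients" is to simply skip rounds whose gradient norm exceeds a threshold; the sketch below illustrates that idea and is an assumption on our part, not necessarily the paper's exact strategy.

```python
import numpy as np

# Hypothetical sketch (not necessarily the paper's exact strategy): online
# gradient descent that skips any round whose gradient is "extreme", so a
# small number k of adversarial outlier rounds cannot derail the iterate;
# each skipped round contributes O(1) to the regret, giving O(k) overhead.
def ogd_filtered(gradients, dim, lr=0.1, threshold=10.0):
    w = np.zeros(dim)
    for g in gradients:
        if np.linalg.norm(g) > threshold:
            continue                  # filter out the extreme gradient
        w = w - lr * g
    return w
```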
no code implementations • 16 Oct 2020 • Manfred K. Warmuth, Wojciech Kotłowski, Ehsan Amid
It was conjectured that no neural network, of any structure and with arbitrary differentiable transfer functions at the nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a $d$-dimensional Hadamard matrix and the target is one of the features, i.e., very sparse.
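The hard problem can be written down concretely; below is a sketch using the standard Sylvester construction of the Hadamard matrix (the construction is standard, though the exact experimental setup here is assumed).

```python
import numpy as np

# The conjectured-hard problem, written out concretely (the Sylvester
# construction is standard; the exact experimental setup here is assumed).
def hadamard(d):
    # Sylvester construction; requires d to be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H

d = 64
X = hadamard(d)   # instances: the d rows of the Hadamard matrix
y = X[:, 5]       # target: a single feature, i.e., a 1-sparse linear function
```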
no code implementations • 20 Feb 2019 • Michał Kempka, Wojciech Kotłowski, Manfred K. Warmuth
We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors), and is compared against the best linear function (comparator) in hindsight.
no code implementations • 8 Feb 2019 • Wojciech Kotłowski, Gergely Neu
We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices.
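For context, the full-information online PCA protocol that this partial-feedback variant restricts is usually stated as follows (the standard formulation, assumed here as the reference point):

```latex
% Standard full-information online PCA (assumed reference formulation):
% at trial t the learner commits to a rank-k projection matrix P_t,
% the environment reveals x_t \in \mathbb{R}^d, and the learner suffers
% the quadratic compression loss
\[
  \ell_t(P_t) \;=\; \lVert x_t - P_t x_t \rVert^2 \;=\; x_t^\top (I - P_t)\, x_t ,
\]
% with regret measured against the best fixed rank-k projection in hindsight:
\[
  R_T \;=\; \sum_{t=1}^{T} \ell_t(P_t) \;-\; \min_{\mathrm{rank}(P) = k} \sum_{t=1}^{T} \ell_t(P).
\]
```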
no code implementations • 21 Feb 2018 • Dirk van der Hoeven, Tim van Erven, Wojciech Kotłowski
A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods.
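To make that relationship concrete, here is a minimal Online Mirror Descent sketch with the entropic mirror map (a textbook formulation, not code from the paper); with the squared-Euclidean mirror map the same template reduces exactly to Online Gradient Descent.

```python
import numpy as np

# Online Mirror Descent on the probability simplex with the entropic
# mirror map (an illustrative textbook sketch, not code from the paper).
# This yields the exponentiated-gradient / multiplicative-weights update;
# with the squared-Euclidean mirror map the same template is exactly
# Online Gradient Descent.
def omd_entropic(grad_fn, T, dim, lr):
    w = np.full(dim, 1.0 / dim)        # start at the uniform distribution
    for t in range(T):
        g = grad_fn(t, w)              # gradient of the round-t loss at w
        w = w * np.exp(-lr * g)        # mirror step in the dual (log) space
        w = w / w.sum()                # Bregman projection back to the simplex
    return w
```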
no code implementations • 23 Aug 2017 • Wojciech Kotłowski
We first give a negative result, showing that no algorithm can achieve a meaningful bound in terms of scale-invariant norm of the comparator in the worst case.
no code implementations • ICML 2017 • Krzysztof Dembczyński, Wojciech Kotłowski, Oluwasanmi Koyejo, Nagarajan Natarajan
Statistical learning theory is at an inflection point, enabled by recent advances in understanding and optimizing a wide range of metrics.
no code implementations • 14 Mar 2016 • Wojciech Kotłowski, Wouter M. Koolen, Alan Malek
We then prove that the Exponential Weights algorithm played over a covering net of isotonic functions has a regret bounded by $O\big(T^{1/3} \log^{2/3}(T)\big)$ and present a matching $\Omega(T^{1/3})$ lower bound on regret.
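The aggregation step is the classic Exponential Weights update; below is a minimal sketch over a finite set of experts, standing in for the covering net of isotonic functions (the net itself is an assumed detail here).

```python
import numpy as np

# Exponential Weights over a finite set of "experts" (here standing in for
# the functions in a covering net). Each expert's weight decays
# exponentially in its cumulative loss; the prediction is the
# weight-averaged expert prediction.
def exponential_weights(expert_preds, outcomes, eta):
    # expert_preds: array (T, N) of expert predictions; outcomes: array (T,)
    T, N = expert_preds.shape
    log_w = np.zeros(N)
    predictions = np.empty(T)
    for t in range(T):
        p = np.exp(log_w - log_w.max())           # stable softmax weights
        p /= p.sum()
        predictions[t] = p @ expert_preds[t]      # aggregated prediction
        losses = (expert_preds[t] - outcomes[t]) ** 2   # quadratic loss
        log_w -= eta * losses                     # exponential weight update
    return predictions
```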
no code implementations • 16 Jun 2015 • Wojciech Kotłowski, Manfred K. Warmuth
We develop a simple algorithm that needs $O(kn^2)$ time per trial and whose regret is off by only a small factor of $O(n^{1/4})$.
no code implementations • 27 Apr 2015 • Wojciech Kotłowski, Krzysztof Dembczyński
We show that the regret of the resulting classifier (obtained by thresholding $f$ at $\widehat{\theta}$), measured with respect to the target metric, is upper-bounded by the regret of $f$ measured with respect to the surrogate loss.
no code implementations • 5 Dec 2014 • Wojciech Kotłowski
First, a real-valued function is learned by minimizing a surrogate loss for binary classification, such as logistic loss, on the training sample.
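A hedged sketch of this two-step recipe in scikit-learn (illustrative choices throughout: logistic loss as the surrogate, F1 as the target metric, and a simple quantile grid for the threshold):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Two-step recipe (illustrative choices: logistic surrogate, F1 target):
# step 1 learns a real-valued scorer f by minimizing logistic loss;
# step 2 tunes a threshold on f's scores to optimize the target metric.
def two_step_plugin(X_train, y_train, X_val, y_val):
    clf = LogisticRegression().fit(X_train, y_train)   # minimize logistic loss
    scores = clf.decision_function(X_val)              # real-valued f(x)
    grid = np.quantile(scores, np.linspace(0.05, 0.95, 19))
    best = max(grid, key=lambda th: f1_score(y_val, (scores >= th).astype(int)))
    return clf, best                                   # scorer f and threshold
```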