Search Results for author: Sebastian Schelter

Found 14 papers, 9 papers with code

Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines"

no code implementations • 30 Apr 2024 • Stefan Grafberger, Paul Groth, Sebastian Schelter

Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings.

Hierarchical Forecasting at Scale

1 code implementation • 19 Oct 2023 • Olivier Sprangers, Wander Wadman, Sebastian Schelter, Maarten de Rijke

We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level.

Time Series

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

1 code implementation • 6 Jul 2023 • Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang

Although the multilinear extension has exponentially many terms, one key contribution of this paper is a polynomial-time algorithm that, given a retrieval-augmented model with an additive utility function and a validation set, exactly computes the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function.

Imputation, Question Answering, +1 more

On the Impact of Outlier Bias on User Clicks

1 code implementation • 1 May 2023 • Fatemeh Sarvi, Ali Vardasbi, Mohammad Aliannejadi, Sebastian Schelter, Maarten de Rijke

We therefore propose an outlier-aware click model that accounts for both outlier and position bias, called the outlier-aware position-based model (OPBM).

Counterfactual, Learning-To-Rank, +1 more

Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines

1 code implementation • 23 Apr 2022 • Bojan Karlaš, David Dao, Matteo Interlandi, Bo Li, Sebastian Schelter, Wentao Wu, Ce Zhang

We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training.

BIG-bench Machine Learning, Fairness

Efficiently Maintaining Next Basket Recommendations under Additions and Deletions of Baskets and Items

1 code implementation • 27 Jan 2022 • Benjamin Longxiang Wang, Sebastian Schelter

Our results show that our method provides constant update time efficiency with respect to an additional user basket in the incremental case, and linear efficiency in the decremental case where we delete existing baskets.

Next-basket recommendation, Sequential Recommendation

Understanding and Mitigating the Effect of Outliers in Fair Ranking

1 code implementation • 21 Dec 2021 • Fatemeh Sarvi, Maria Heuss, Mohammad Aliannejadi, Sebastian Schelter, Maarten de Rijke

We formalize outlierness in a ranking, show that outliers are present in realistic datasets, and present the results of an eye-tracking study, showing that users' scanning order and the exposure of items are influenced by the presence of outliers.

Fairness, Outlier Detection, +1 more

Parameter Efficient Deep Probabilistic Forecasting

1 code implementation • 6 Dec 2021 • Olivier Sprangers, Sebastian Schelter, Maarten de Rijke

However, these methods require a large number of parameters to be learned, which imposes high memory requirements on the computational resources for training such models.

Probabilistic Time Series Forecasting, Time Series

Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression

1 code implementation • 3 Jun 2021 • Olivier Sprangers, Sebastian Schelter, Maarten de Rijke

We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner.

Regression, Time Series Analysis

Analyzing and Predicting Purchase Intent in E-commerce: Anonymous vs. Identified Customers

no code implementations • 16 Dec 2020 • Mariya Hendriksen, Ernst Kuiper, Pim Nauts, Sebastian Schelter, Maarten de Rijke

In this paper, we focus on purchase prediction for both anonymous and identified sessions on an e-commerce platform.

Descriptive

A Comparison of Supervised Learning to Match Methods for Product Search

1 code implementation • 20 Jul 2020 • Fatemeh Sarvi, Nikos Voskarides, Lois Mooiman, Sebastian Schelter, Maarten de Rijke

As recent learning to match methods have made important advances in bridging the vocabulary gap for these traditional IR areas, we investigate their potential in the context of product search.

Attribute, Information Retrieval, +2 more

FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions

no code implementations • 28 Nov 2019 • Sebastian Schelter, Yuxuan He, Jatin Khilnani, Julia Stoyanovich

FairPrep is based on a developer-centered design, and helps data scientists follow best practices in software engineering and machine learning.

BIG-bench Machine Learning, Decision Making, +2 more

Doubly stochastic large scale kernel learning with the empirical kernel map

no code implementations • 2 Sep 2016 • Nikolaas Steenbergen, Sebastian Schelter, Felix Bießmann

With the rise of big data sets, the popularity of kernel methods declined and neural networks took over again.

Stochastic Optimization

Factorbird - a Parameter Server Approach to Distributed Matrix Factorization

no code implementations • 3 Nov 2014 • Sebastian Schelter, Venu Satuluri, Reza Zadeh

We present Factorbird, a prototype of a parameter server approach for factorizing large matrices with Stochastic Gradient Descent-based algorithms.
