Search Results for author: Arie van Deursen

Found 27 papers, 14 papers with code

An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets

1 code implementation • 22 Mar 2024 • Jonathan Katzy, Răzvan-Mihai Popescu, Arie van Deursen, Maliheh Izadi

Based on the findings of our study, which highlights the pervasive issue of license inconsistencies in large language models trained on code, our recommendation for both researchers and the community is to prioritize the development and adoption of best practices for dataset creation and management.

Language Modelling Large Language Model

Language Models for Code Completion: A Practical Evaluation

1 code implementation • 25 Feb 2024 • Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

InCoder outperformed the other models across all programming languages, highlighting the significance of training data and objectives.

Code Completion valid

McUDI: Model-Centric Unsupervised Degradation Indicator for Failure Prediction AIOps Solutions

1 code implementation • 25 Jan 2024 • Lorena Poenaru-Olaru, Luis Cruz, Jan Rellermeyer, Arie van Deursen

Due to the continuous change in operational data, AIOps solutions suffer from performance degradation over time.

Data vs. Model Machine Learning Fairness Testing: An Empirical Study

no code implementations • 15 Jan 2024 • Arumoy Shome, Luis Cruz, Arie van Deursen

We find a linear relationship between data and model fairness metrics when the distribution and the size of the training data changes.

Fairness

Traces of Memorisation in Large Language Models for Code

1 code implementation • 18 Dec 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

We find that large language models for code are vulnerable to data extraction attacks, like their natural language counterparts.

Code Completion

Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals

1 code implementation • 17 Dec 2023 • Patrick Altmeyer, Mojtaba Farmanbar, Arie van Deursen, Cynthia C. S. Liem

We formalise this notion of faithfulness through the introduction of a tailored evaluation metric and propose a novel algorithmic framework for generating Energy-Constrained Conformal Counterfactuals that are only as plausible as the model permits.

Conformal Prediction counterfactual

Is Your Anomaly Detector Ready for Change? Adapting AIOps Solutions to the Real World

1 code implementation • 17 Nov 2023 • Lorena Poenaru-Olaru, Natalia Karpova, Luis Cruz, Jan Rellermeyer, Arie van Deursen

Anomaly detection techniques are essential in automating the monitoring of IT systems and operations.

Anomaly Detection

On the Impact of Language Selection for Training and Evaluating Programming Language Models

no code implementations • 25 Aug 2023 • Jonathan Katzy, Maliheh Izadi, Arie van Deursen

The recent advancements in Transformer-based Language Models have demonstrated significant potential in enhancing the multilingual capabilities of these models.

Endogenous Macrodynamics in Algorithmic Recourse

1 code implementation • 16 Aug 2023 • Patrick Altmeyer, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, Cynthia C. S. Liem

Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata.

counterfactual valid

Explaining Black-Box Models through Counterfactuals

1 code implementation • 14 Aug 2023 • Patrick Altmeyer, Arie van Deursen, Cynthia C. S. Liem

We present CounterfactualExplanations.jl: a package for generating Counterfactual Explanations (CE) and Algorithmic Recourse (AR) for black-box models in Julia.

counterfactual Explainable artificial intelligence

Batching for Green AI -- An Exploratory Study on Inference

no code implementations • 21 Jul 2023 • Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen

In this study, we examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks for computer vision that were considered state-of-the-art at the time of their publication.

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

1 code implementation • 24 Apr 2023 • Tim van Dam, Maliheh Izadi, Arie van Deursen

For comments, we find that the models perform better in the presence of multi-line comments (again with small effect sizes).

Code Completion

Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI

no code implementations • 24 Mar 2023 • Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen

To expand the application of Green AI, we advocate for a shift in the design of deep learning models, by considering the trade-off between energy efficiency and accuracy.

Bayesian Optimisation

STACC: Code Comment Classification using SentenceTransformers

1 code implementation • 25 Feb 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

Code comments are a key resource for information about software artefacts.

Classification

Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge

no code implementations • 13 Feb 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge.

Inference Attack Language Modelling +2

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

1 code implementation • 4 Jan 2023 • Ali Al-Kaswan, Toufique Ahmed, Maliheh Izadi, Anand Ashok Sawant, Premkumar Devanbu, Arie van Deursen

While the automated summarisation of decompiled code can help Reverse Engineers understand and analyse binaries, current work mainly focuses on summarising source code, and no suitable dataset exists for this task.

Are Concept Drift Detectors Reliable Alarming Systems? -- A Comparative Study

no code implementations • 23 Nov 2022 • Lorena Poenaru-Olaru, Luis Cruz, Arie van Deursen, Jan S. Rellermeyer

We compare the performance of the most popular drift detectors belonging to two different concept drift detector groups, error rate-based detectors and data distribution-based detectors.

Management

An Empirical Study on Data Leakage and Generalizability of Link Prediction Models for Issues and Commits

no code implementations • 1 Nov 2022 • Maliheh Izadi, Pooya Rostami Mazrae, Tom Mens, Arie van Deursen

However, these approaches primarily focused on improving prediction accuracy on randomly-split datasets, with limited attention given to the impact of data leakage and the generalizability of the predictive models.

Link Prediction Transfer Learning

Code Smells for Machine Learning Applications

1 code implementation • 25 Mar 2022 • Haiyin Zhang, Luís Cruz, Arie van Deursen

Hence ensuring code quality is quintessential to avoid issues in the long run.

BIG-bench Machine Learning

Data Smells in Public Datasets

no code implementations • 15 Mar 2022 • Arumoy Shome, Luis Cruz, Arie van Deursen

The adoption of Artificial Intelligence (AI) in high-stakes domains such as healthcare, wildlife preservation, autonomous driving and criminal justice system calls for a data-centric approach to AI.

Autonomous Driving

Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

no code implementations • 4 Feb 2022 • Jiyang Zhang, Chandra Maddila, Ram Bairi, Christian Bird, Ujjwal Raizada, Apoorva Agrawal, Yamini Jhawar, Kim Herzig, Arie van Deursen

Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-accepted problem within the software engineering community.

Graph Representation Learning Management +1

"Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint

1 code implementation • 20 Jan 2022 • Bart van Oort, Luís Cruz, Babak Loni, Arie van Deursen

We also investigate the perceived importance of these project smells for proof-of-concept versus production-ready ML projects, as well as the perceived obstructions and benefits to using static analysis tools such as mllint.

Management

The Prevalence of Code Smells in Machine Learning projects

2 code implementations • 6 Mar 2021 • Bart van Oort, Luís Cruz, Maurício Aniche, Arie van Deursen

Manual analysis of these smells mainly showed that code duplication is widespread and that the PEP8 convention for identifier naming style may not always be applicable to ML code due to its resemblance with mathematical notation.

BIG-bench Machine Learning Management

An Exploratory Study of Log Placement Recommendation in an Enterprise System

no code implementations • 2 Mar 2021 • Jeanderson Cândido, Jan Haesen, Maurício Aniche, Arie van Deursen

In this paper, we study the log placement problem in the code base of Adyen, a large-scale payment company.

ConE: A Concurrent Edit Detection Tool for Large Scale Software Development

no code implementations • 16 Jan 2021 • Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, Arie van Deursen

We study half a year of changes made to six large repositories in Microsoft in which at least 1,000 pull requests are created each month.

Nudge: Accelerating Overdue Pull Requests Towards Completion

no code implementations • 25 Nov 2020 • Chandra Maddila, Sai Surya Upadrasta, Chetan Bansal, Nachiappan Nagappan, Georgios Gousios, Arie van Deursen

The key novelty of Nudge is that it succeeds in reducing pull request resolution time, while ensuring that developers perceive the notifications sent as useful, at the scale of thousands of repositories.

Action Detection Activity Detection

AI Lifecycle Models Need To Be Revised. An Exploratory Study in Fintech

no code implementations • 3 Oct 2020 • Mark Haakman, Luís Cruz, Hennie Huijgens, Arie van Deursen

Thus, the same development processes and standards in software engineering ought to be complied with in artificial intelligence systems.

Software Engineering 68T01 I.2.0; D.2.9
