Search Results for author: Arie van Deursen

Found 27 papers, 14 papers with code

An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets

1 code implementation · 22 Mar 2024 · Jonathan Katzy, Răzvan-Mihai Popescu, Arie van Deursen, Maliheh Izadi

Based on the findings of our study, which highlights the pervasive issue of license inconsistencies in large language models trained on code, our recommendation for both researchers and the community is to prioritize the development and adoption of best practices for dataset creation and management.

Language Modelling, Large Language Model

Language Models for Code Completion: A Practical Evaluation

1 code implementation · 25 Feb 2024 · Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

InCoder outperformed the other models across all programming languages, highlighting the significance of training data and objectives.

Code Completion, valid

McUDI: Model-Centric Unsupervised Degradation Indicator for Failure Prediction AIOps Solutions

1 code implementation · 25 Jan 2024 · Lorena Poenaru-Olaru, Luis Cruz, Jan Rellermeyer, Arie van Deursen

Due to the continuous change in operational data, AIOps solutions suffer from performance degradation over time.

Data vs. Model Machine Learning Fairness Testing: An Empirical Study

no code implementations · 15 Jan 2024 · Arumoy Shome, Luis Cruz, Arie van Deursen

We find a linear relationship between data and model fairness metrics when the distribution and the size of the training data changes.

Fairness

Traces of Memorisation in Large Language Models for Code

1 code implementation · 18 Dec 2023 · Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

We find that large language models for code are vulnerable to data extraction attacks, like their natural language counterparts.

Code Completion

Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals

1 code implementation · 17 Dec 2023 · Patrick Altmeyer, Mojtaba Farmanbar, Arie van Deursen, Cynthia C. S. Liem

We formalise this notion of faithfulness through the introduction of a tailored evaluation metric and propose a novel algorithmic framework for generating Energy-Constrained Conformal Counterfactuals that are only as plausible as the model permits.

Conformal Prediction, counterfactual

On the Impact of Language Selection for Training and Evaluating Programming Language Models

no code implementations · 25 Aug 2023 · Jonathan Katzy, Maliheh Izadi, Arie van Deursen

The recent advancements in Transformer-based Language Models have demonstrated significant potential in enhancing the multilingual capabilities of these models.

Endogenous Macrodynamics in Algorithmic Recourse

1 code implementation · 16 Aug 2023 · Patrick Altmeyer, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, Cynthia C. S. Liem

Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata.

counterfactual, valid

Explaining Black-Box Models through Counterfactuals

1 code implementation · 14 Aug 2023 · Patrick Altmeyer, Arie van Deursen, Cynthia C. S. Liem

We present CounterfactualExplanations.jl: a package for generating Counterfactual Explanations (CE) and Algorithmic Recourse (AR) for black-box models in Julia.

counterfactual, Explainable artificial intelligence

Batching for Green AI -- An Exploratory Study on Inference

no code implementations · 21 Jul 2023 · Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen

In this study, we examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks for computer vision that were considered state-of-the-art at the time of their publication.

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

1 code implementation · 24 Apr 2023 · Tim van Dam, Maliheh Izadi, Arie van Deursen

For comments, we find that the models perform better in the presence of multi-line comments (again with small effect sizes).

Code Completion

Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI

no code implementations · 24 Mar 2023 · Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen

To expand the application of Green AI, we advocate for a shift in the design of deep learning models toward one that considers the trade-off between energy efficiency and accuracy.

Bayesian Optimisation

Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge

no code implementations · 13 Feb 2023 · Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge.

Inference Attack, Language Modelling

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

1 code implementation · 4 Jan 2023 · Ali Al-Kaswan, Toufique Ahmed, Maliheh Izadi, Anand Ashok Sawant, Premkumar Devanbu, Arie van Deursen

While the automated summarisation of decompiled code can help Reverse Engineers understand and analyse binaries, current work mainly focuses on summarising source code, and no suitable dataset exists for this task.

Are Concept Drift Detectors Reliable Alarming Systems? -- A Comparative Study

no code implementations · 23 Nov 2022 · Lorena Poenaru-Olaru, Luis Cruz, Arie van Deursen, Jan S. Rellermeyer

We compare the performance of the most popular drift detectors from two different concept drift detector groups: error rate-based detectors and data distribution-based detectors.

Management

An Empirical Study on Data Leakage and Generalizability of Link Prediction Models for Issues and Commits

no code implementations · 1 Nov 2022 · Maliheh Izadi, Pooya Rostami Mazrae, Tom Mens, Arie van Deursen

However, these approaches primarily focused on improving prediction accuracy on randomly-split datasets, with limited attention given to the impact of data leakage and the generalizability of the predictive models.

Link Prediction, Transfer Learning

Code Smells for Machine Learning Applications

1 code implementation · 25 Mar 2022 · Haiyin Zhang, Luís Cruz, Arie van Deursen

Hence, ensuring code quality is essential to avoiding issues in the long run.

BIG-bench Machine Learning

Data Smells in Public Datasets

no code implementations · 15 Mar 2022 · Arumoy Shome, Luis Cruz, Arie van Deursen

The adoption of Artificial Intelligence (AI) in high-stakes domains such as healthcare, wildlife preservation, autonomous driving and the criminal justice system calls for a data-centric approach to AI.

Autonomous Driving

Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

no code implementations · 4 Feb 2022 · Jiyang Zhang, Chandra Maddila, Ram Bairi, Christian Bird, Ujjwal Raizada, Apoorva Agrawal, Yamini Jhawar, Kim Herzig, Arie van Deursen

Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-recognized problem within the software engineering community.

Graph Representation Learning, Management

"Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint

1 code implementation · 20 Jan 2022 · Bart van Oort, Luís Cruz, Babak Loni, Arie van Deursen

We also investigate the perceived importance of these project smells for proof-of-concept versus production-ready ML projects, as well as the perceived obstructions and benefits to using static analysis tools such as mllint.

Management

The Prevalence of Code Smells in Machine Learning projects

2 code implementations · 6 Mar 2021 · Bart van Oort, Luís Cruz, Maurício Aniche, Arie van Deursen

Manual analysis of these smells mainly showed that code duplication is widespread and that the PEP8 convention for identifier naming style may not always be applicable to ML code, because such code often resembles mathematical notation.

BIG-bench Machine Learning, Management

An Exploratory Study of Log Placement Recommendation in an Enterprise System

no code implementations · 2 Mar 2021 · Jeanderson Cândido, Jan Haesen, Maurício Aniche, Arie van Deursen

In this paper, we study the log placement problem in the code base of Adyen, a large-scale payment company.

ConE: A Concurrent Edit Detection Tool for Large Scale Software Development

no code implementations · 16 Jan 2021 · Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, Arie van Deursen

We study half a year of changes made to six large repositories at Microsoft in which at least 1,000 pull requests are created each month.

Nudge: Accelerating Overdue Pull Requests Towards Completion

no code implementations · 25 Nov 2020 · Chandra Maddila, Sai Surya Upadrasta, Chetan Bansal, Nachiappan Nagappan, Georgios Gousios, Arie van Deursen

The key novelty of Nudge is that it succeeds in reducing pull request resolution time, while ensuring that developers perceive the notifications sent as useful, at the scale of thousands of repositories.

Action Detection, Activity Detection

AI Lifecycle Models Need To Be Revised. An Exploratory Study in Fintech

no code implementations · 3 Oct 2020 · Mark Haakman, Luís Cruz, Hennie Huijgens, Arie van Deursen

Thus, the same development processes and standards used in software engineering ought to be complied with in artificial intelligence systems.

Software Engineering (MSC 68T01; ACM I.2.0, D.2.9)
