1 code implementation • 22 Mar 2024 • Jonathan Katzy, Răzvan-Mihai Popescu, Arie van Deursen, Maliheh Izadi
Based on the findings of our study, which highlights the pervasive issue of license inconsistencies in large language models trained on code, our recommendation for both researchers and the community is to prioritize the development and adoption of best practices for dataset creation and management.
1 code implementation • 25 Feb 2024 • Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen
InCoder outperformed the other models across all programming languages, highlighting the significance of training data and objectives.
1 code implementation • 25 Jan 2024 • Lorena Poenaru-Olaru, Luis Cruz, Jan Rellermeyer, Arie van Deursen
Due to the continuous change in operational data, AIOps solutions suffer from performance degradation over time.
no code implementations • 15 Jan 2024 • Arumoy Shome, Luis Cruz, Arie van Deursen
We find a linear relationship between data and model fairness metrics when the distribution and the size of the training data change.
1 code implementation • 18 Dec 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen
We find that large language models for code are vulnerable to data extraction attacks, like their natural language counterparts.
1 code implementation • 17 Dec 2023 • Patrick Altmeyer, Mojtaba Farmanbar, Arie van Deursen, Cynthia C. S. Liem
We formalise this notion of faithfulness through the introduction of a tailored evaluation metric and propose a novel algorithmic framework for generating Energy-Constrained Conformal Counterfactuals that are only as plausible as the model permits.
1 code implementation • 17 Nov 2023 • Lorena Poenaru-Olaru, Natalia Karpova, Luis Cruz, Jan Rellermeyer, Arie van Deursen
Anomaly detection techniques are essential in automating the monitoring of IT systems and operations.
no code implementations • 25 Aug 2023 • Jonathan Katzy, Maliheh Izadi, Arie van Deursen
The recent advancements in Transformer-based Language Models have demonstrated significant potential in enhancing the multilingual capabilities of these models.
1 code implementation • 16 Aug 2023 • Patrick Altmeyer, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, Cynthia C. S. Liem
Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata.
1 code implementation • 14 Aug 2023 • Patrick Altmeyer, Arie van Deursen, Cynthia C. S. Liem
We present CounterfactualExplanations.jl: a package for generating Counterfactual Explanations (CE) and Algorithmic Recourse (AR) for black-box models in Julia.
no code implementations • 21 Jul 2023 • Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen
In this study, we examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks for computer vision that were considered state-of-the-art at the time of their publication.
1 code implementation • 24 Apr 2023 • Tim van Dam, Maliheh Izadi, Arie van Deursen
For comments, we find that the models perform better in the presence of multi-line comments (again with small effect sizes).
no code implementations • 24 Mar 2023 • Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen
To expand the application of Green AI, we advocate for a shift in the design of deep learning models, by considering the trade-off between energy efficiency and accuracy.
1 code implementation • 25 Feb 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen
Code comments are a key resource for information about software artefacts.
no code implementations • 13 Feb 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen
In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge.
1 code implementation • 4 Jan 2023 • Ali Al-Kaswan, Toufique Ahmed, Maliheh Izadi, Anand Ashok Sawant, Premkumar Devanbu, Arie van Deursen
While the automated summarisation of decompiled code can help Reverse Engineers understand and analyse binaries, current work mainly focuses on summarising source code, and no suitable dataset exists for this task.
no code implementations • 23 Nov 2022 • Lorena Poenaru-Olaru, Luis Cruz, Arie van Deursen, Jan S. Rellermeyer
We compare the performance of the most popular drift detectors belonging to two different concept drift detector groups, error rate-based detectors and data distribution-based detectors.
no code implementations • 1 Nov 2022 • Maliheh Izadi, Pooya Rostami Mazrae, Tom Mens, Arie van Deursen
However, these approaches primarily focused on improving prediction accuracy on randomly-split datasets, with limited attention given to the impact of data leakage and the generalizability of the predictive models.
1 code implementation • 25 Mar 2022 • Haiyin Zhang, Luís Cruz, Arie van Deursen
Hence ensuring code quality is quintessential to avoid issues in the long run.
no code implementations • 15 Mar 2022 • Arumoy Shome, Luis Cruz, Arie van Deursen
The adoption of Artificial Intelligence (AI) in high-stakes domains such as healthcare, wildlife preservation, autonomous driving and the criminal justice system calls for a data-centric approach to AI.
no code implementations • 4 Feb 2022 • Jiyang Zhang, Chandra Maddila, Ram Bairi, Christian Bird, Ujjwal Raizada, Apoorva Agrawal, Yamini Jhawar, Kim Herzig, Arie van Deursen
Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-accepted problem within the software engineering community.
1 code implementation • 20 Jan 2022 • Bart van Oort, Luís Cruz, Babak Loni, Arie van Deursen
We also investigate the perceived importance of these project smells for proof-of-concept versus production-ready ML projects, as well as the perceived obstructions and benefits to using static analysis tools such as mllint.
2 code implementations • 6 Mar 2021 • Bart van Oort, Luís Cruz, Maurício Aniche, Arie van Deursen
Manual analysis of these smells mainly showed that code duplication is widespread and that the PEP8 convention for identifier naming style may not always be applicable to ML code due to its resemblance with mathematical notation.
no code implementations • 2 Mar 2021 • Jeanderson Cândido, Jan Haesen, Maurício Aniche, Arie van Deursen
In this paper, we study the log placement problem in the code base of Adyen, a large-scale payment company.
no code implementations • 16 Jan 2021 • Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, Arie van Deursen
We study half a year of changes made to six large repositories in Microsoft in which at least 1,000 pull requests are created each month.
no code implementations • 25 Nov 2020 • Chandra Maddila, Sai Surya Upadrasta, Chetan Bansal, Nachiappan Nagappan, Georgios Gousios, Arie van Deursen
The key novelty of Nudge is that it succeeds in reducing pull request resolution time, while ensuring that developers perceive the notifications sent as useful, at the scale of thousands of repositories.
no code implementations • 3 Oct 2020 • Mark Haakman, Luís Cruz, Hennie Huijgens, Arie van Deursen
Thus, the same development processes and standards in software engineering ought to be complied with in artificial intelligence systems.
Software Engineering 68T01 I.2.0; D.2.9