Search Results for author: Foutse Khomh

Found 65 papers, 41 papers with code

Introducing v0.5 of the AI Safety Benchmark from MLCommons

1 code implementation • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark.

Machine Learning Robustness: A Primer

no code implementations • 1 Apr 2024 • Houssem Ben Braiek, Foutse Khomh

This chapter explores the foundational concept of robustness in Machine Learning (ML) and its integral role in establishing trustworthiness in Artificial Intelligence (AI) systems.

Transfer Learning

Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code

1 code implementation • 14 Feb 2024 • Vahid Majdinasab, Amin Nikanjam, Foutse Khomh

Therefore, auditing code developed using LLMs is challenging, as it is difficult to reliably assert whether an LLM used during development has been trained on specific copyrighted code, given that we do not have access to the training datasets of these models.

Clone Detection

ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow Discussions

1 code implementation • 13 Feb 2024 • Leuson Da Silva, Jordan Samhi, Foutse Khomh

Since its release in November 2022, ChatGPT has shaken up Stack Overflow, the premier platform for developers' queries on programming and software development.

Language Modelling Large Language Model

Deep Learning Model Reuse in the HuggingFace Community: Challenges, Benefit and Trends

1 code implementation • 24 Jan 2024 • Mina Taraghi, Gianolli Dorcelus, Armstrong Foundjem, Florian Tambon, Foutse Khomh

Based on our qualitative analysis, we present a taxonomy of the challenges and benefits associated with PTM reuse within this community.

Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection

1 code implementation • 22 Dec 2023 • Xingfang Wu, Heng Li, Nobukazu Yoshioka, Hironori Washizaki, Foutse Khomh

When applied to the dataset we constructed with a recent Stack Overflow dump, our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and 68.9%, respectively.

Characterizing and Classifying Developer Forum Posts with their Intentions

1 code implementation • 21 Dec 2023 • Xingfang Wu, Eric Laufer, Heng Li, Foutse Khomh, Santhosh Srinivasan, Jayden Luo

The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy.

TAG

Studying the Practices of Testing Machine Learning Software in the Wild

1 code implementation • 19 Dec 2023 • Moses Openja, Foutse Khomh, Armstrong Foundjem, Zhen Ming Jiang, Mouna Abidi, Ahmed E. Hassan

Aims: To fill this gap, we perform the first fine-grained empirical study on ML testing practices in the wild, to identify the ML properties being tested, the followed testing strategies, and their implementation throughout the ML workflow.

Autonomous Driving Fairness

GIST: Generated Inputs Sets Transferability in Deep Learning

1 code implementation • 1 Nov 2023 • Florian Tambon, Foutse Khomh, Giuliano Antoniol

As the demand for verifiability and testability of neural networks continues to rise, an increasing number of methods for generating test sets are being developed.

Detection and Evaluation of bias-inducing Features in Machine learning

no code implementations • 19 Oct 2023 • Moses Openja, Gabriel Laberge, Foutse Khomh

In this study, we propose an approach for systematically identifying all bias-inducing features of a model to help support the decision-making of domain experts.

An Intentional Forgetting-Driven Self-Healing Method For Deep Reinforcement Learning Systems

1 code implementation • 23 Aug 2023 • Ahmed Haj Yahmed, Rached Bouchoucha, Houssem Ben Braiek, Foutse Khomh

Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.

Continual Learning reinforcement-learning

Deploying Deep Reinforcement Learning Systems: A Taxonomy of Challenges

1 code implementation • 23 Aug 2023 • Ahmed Haj Yahmed, Altaf Allah Abbassi, Amin Nikanjam, Heng Li, Foutse Khomh

In this paper, we propose an empirical study on Stack Overflow (SO), the most popular Q&A forum for developers, to uncover and understand the challenges practitioners faced when deploying DRL systems.

reinforcement-learning

On the Effectiveness of Log Representation for Log-based Anomaly Detection

1 code implementation • 17 Aug 2023 • Xingfang Wu, Heng Li, Foutse Khomh

We believe our comprehensive comparison of log representation techniques can help researchers and practitioners better understand the characteristics of different log representation techniques and provide them with guidance for selecting the most suitable ones for their ML-based log analysis workflow.

Anomaly Detection Log Parsing

Bug Characterization in Machine Learning-based Systems

1 code implementation • 26 Jul 2023 • Mohammad Mehdi Morovati, Amin Nikanjam, Florian Tambon, Foutse Khomh, Zhen Ming Jiang

Based on our results, fixing ML bugs is more costly and ML components are more error-prone, compared to non-ML bugs and non-ML components, respectively.

Bug fixing

An Empirical Study on Bugs Inside PyTorch: A Replication Study

no code implementations • 25 Jul 2023 • Sharon Chee Yin Ho, Vahid Majdinasab, Mohayeminul Islam, Diego Elias Costa, Emad Shihab, Foutse Khomh, Sarah Nadi, Muhammad Raza

Software systems are increasingly relying on deep learning components, due to their remarkable capability of identifying complex data patterns and powering intelligent behaviour.

Responsible Design Patterns for Machine Learning Pipelines

1 code implementation • 31 May 2023 • Saud Hakem Al Harbi, Lionel Nganyewou Tidjon, Foutse Khomh

In this paper, we propose a comprehensive framework incorporating RDPs into ML pipelines to mitigate risks and ensure the ethical development of AI systems.

Ethics Management

Leveraging Data Mining Algorithms to Recommend Source Code Changes

no code implementations • 29 Apr 2023 • AmirHossein Naghshzan, Saeed Khalilazar, Pierre Poilane, Olga Baysal, Latifa Guerrouj, Foutse Khomh

Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms.

On Codex Prompt Engineering for OCL Generation: An Empirical Study

no code implementations • 28 Mar 2023 • Seif Abukhalaf, Mohammad Hamdaqa, Foutse Khomh

We investigate the reliability of OCL constraints generated by Codex from natural language specifications.

Few-Shot Learning Prompt Engineering +5

Mutation Testing of Deep Reinforcement Learning Based on Real Faults

1 code implementation • 13 Jan 2023 • Florian Tambon, Vahid Majdinasab, Amin Nikanjam, Foutse Khomh, Giuliano Antoniol

This allows us to compare different mutation killing definitions based on existing approaches, as well as to analyze the behavior of the obtained mutation operators and their potential combinations called Higher Order Mutation(s) (HOM).

reinforcement-learning Reinforcement Learning (RL)

AmbieGen: A Search-based Framework for Autonomous Systems Testing

1 code implementation • 1 Jan 2023 • Dmytro Humeniuk, Foutse Khomh, Giuliano Antoniol

To address this challenge, we introduce AmbieGen, a search-based test case generation framework for autonomous systems.

Self-Driving Cars

Can Ensembling Pre-processing Algorithms Lead to Better Machine Learning Fairness?

no code implementations • 5 Dec 2022 • Khaled Badran, Pierre-Olivier Côté, Amanda Kolopanis, Rached Bouchoucha, Antonio Collante, Diego Elias Costa, Emad Shihab, Foutse Khomh

As machine learning (ML) systems get adopted in more critical areas, it has become increasingly crucial to address the bias that could occur in these systems.

Fairness

An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks

no code implementations • 28 Nov 2022 • Mohamed Raed El aoun, Lionel Nganyewou Tidjon, Ben Rombaut, Foutse Khomh, Ahmed E. Hassan

In this paper, we present a qualitative and quantitative analysis of the most frequent DL library combinations and the distribution of DL library dependencies across the ML workflow, and formulate a set of recommendations to (i) hardware builders for more optimized accelerators and (ii) library builders for more refined future releases.

Reliable Malware Analysis and Detection using Topology Data Analysis

1 code implementation • 3 Nov 2022 • Lionel Nganyewou Tidjon, Foutse Khomh

Next, we compare the different TDA techniques (i.e., persistence homology, tomato, TDA Mapper) and existing techniques (i.e., PCA, UMAP, t-SNE) using different classifiers including random forest, decision tree, xgboost, and lightgbm.

Intrusion Detection Malware Analysis +1

SmOOD: Smoothness-based Out-of-Distribution Detection Approach for Surrogate Neural Networks in Aircraft Design

no code implementations • 7 Sep 2022 • Houssem Ben Braiek, Ali Tfaily, Foutse Khomh, Thomas Reid, Ciro Guida

Hybrid surrogate optimization maintains high results quality while providing rapid design assessments when both the surrogate model and the switch mechanism for eventually transitioning to the HF model are calibrated properly.

Out-of-Distribution Detection

Physics-Guided Adversarial Machine Learning for Aircraft Systems Simulation

no code implementations • 7 Sep 2022 • Houssem Ben Braiek, Thomas Reid, Foutse Khomh

In the context of aircraft system performance assessment, deep learning technologies make it possible to quickly infer models from experimental measurements, with less detailed system knowledge than usually required by physics-based modeling.

An Empirical Study on the Usage of Automated Machine Learning Tools

1 code implementation • 28 Aug 2022 • Forough Majidi, Moses Openja, Foutse Khomh, Heng Li

Machine learning (ML) practitioners use AutoML tools to automate and optimize processes such as feature engineering, model training, and hyperparameter optimization.

Feature Engineering Hyperparameter Optimization +1

A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks

1 code implementation • 25 Aug 2022 • Paulina Stevia Nouwou Mindom, Amin Nikanjam, Foutse Khomh

In this paper, we empirically investigate the applications of carefully selected DRL algorithms on two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing.

reinforcement-learning Reinforcement Learning (RL)

Quality issues in Machine Learning Software Systems

1 code implementation • 18 Aug 2022 • Pierre-Olivier Côté, Amin Nikanjam, Rached Bouchoucha, Foutse Khomh

This empirical study aims to identify a catalog of bad-practices related to poor quality in MLSSs.

A Probabilistic Framework for Mutation Testing in Deep Neural Networks

1 code implementation • 11 Aug 2022 • Florian Tambon, Foutse Khomh, Giuliano Antoniol

Methods: In this work, we propose a Probabilistic Mutation Testing (PMT) approach that alleviates the inconsistency problem and allows for a more consistent decision on whether a mutant is killed or not.

DiverGet: A Search-Based Software Testing Approach for Deep Neural Network Quantization Assessment

no code implementations • 13 Jul 2022 • Ahmed Haj Yahmed, Houssem Ben Braiek, Foutse Khomh, Sonia Bouzidi, Rania Zaatour

Quantization is one of the most widely applied Deep Neural Network (DNN) compression strategies when deploying a trained DNN model on an embedded system or a cell phone.

Astronomy Quantization

Dev2vec: Representing Domain Expertise of Developers in an Embedding Space

1 code implementation • 11 Jul 2022 • Arghavan Moradi Dakhel, Michel C. Desmarais, Foutse Khomh

Moreover, our findings suggest that the "issue resolving history" of developers is the most informative source of information to represent the domain expertise of developers in embedding spaces.

Threat Assessment in Machine Learning based Systems

1 code implementation • 30 Jun 2022 • Lionel Nganyewou Tidjon, Foutse Khomh

Attacks from the AI Incident Database and the literature are used to identify vulnerabilities and new types of threats that were not documented in ATLAS.

BIG-bench Machine Learning

GitHub Copilot AI pair programmer: Asset or Liability?

1 code implementation • 30 Jun 2022 • Arghavan Moradi Dakhel, Vahid Majdinasab, Amin Nikanjam, Foutse Khomh, Michel C. Desmarais, Zhen Ming Jiang

In this paper, we study the capabilities of Copilot in two different programming tasks: (i) generating (and reproducing) correct and efficient solutions for fundamental algorithmic problems, and (ii) comparing Copilot's proposed solutions with those of human programmers on a set of programming tasks.

Program Synthesis

An Empirical Study of Challenges in Converting Deep Learning Models

1 code implementation • 28 Jun 2022 • Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhen Ming Jiang

Usually DL models are developed and trained using DL frameworks that have their own internal mechanisms/formats to represent and train DL models, and usually those formats cannot be recognized by other frameworks.

Bugs in Machine Learning-based Systems: A Faultload Benchmark

no code implementations • 24 Jun 2022 • Mohammad Mehdi Morovati, Amin Nikanjam, Foutse Khomh, Zhen Ming Jiang

Although most of these tools use bugs' lifecycle, there is no standard benchmark of bugs to assess their performance, compare them and discuss their advantages and weaknesses.

BIG-bench Machine Learning Fairness

Never trust, always verify : a roadmap for Trustworthy AI?

1 code implementation • 23 Jun 2022 • Lionel Nganyewou Tidjon, Foutse Khomh

In this paper, we examine trust in the context of AI-based systems to understand what it means for an AI system to be trustworthy and identify actions that need to be undertaken to ensure that AI systems are trustworthy.

Autonomous Vehicles Selection bias

Studying the Practices of Deploying Machine Learning Projects on Docker

no code implementations • 1 Jun 2022 • Moses Openja, Forough Majidi, Foutse Khomh, Bhagya Chembakottu, Heng Li

Studies have recently explored the use of Docker for deploying general software projects with no specific focus on how Docker is used to deploy ML-based projects.

BIG-bench Machine Learning Management

Fool SHAP with Stealthily Biased Sampling

1 code implementation • 30 May 2022 • Gabriel Laberge, Ulrich Aïvodji, Satoshi Hara, Mario Marchand, Foutse Khomh

SHAP explanations aim at identifying which features contribute the most to the difference in model prediction at a specific input versus a background distribution.

Fairness

The Different Faces of AI Ethics Across the World: A Principle-Implementation Gap Analysis

no code implementations • 12 May 2022 • Lionel Nganyewou Tidjon, Foutse Khomh

Next, we analyze the current level of AI readiness and current implementations of ethical AI principles in different countries, to identify gaps in the implementation of AI principles and their causes.

Ethics

A Search-Based Framework for Automatic Generation of Testing Environments for Cyber-Physical Systems

1 code implementation • 23 Mar 2022 • Dmytro Humeniuk, Foutse Khomh, Giuliano Antoniol

We compared three configurations of AmbieGen: one based on a single-objective genetic algorithm, one on a multi-objective genetic algorithm, and one on random search.

Machine Learning Application Development: Practitioners' Insights

no code implementations • 31 Dec 2021 • Md Saidur Rahman, Foutse Khomh, Alaleh Hamidi, Jinghui Cheng, Giuliano Antoniol, Hironori Washizaki

In this paper, we report on a survey that aimed to understand the challenges and best practices of ML application development.

BIG-bench Machine Learning

Silent Bugs in Deep Learning Frameworks: An Empirical Study of Keras and TensorFlow

1 code implementation • 26 Dec 2021 • Florian Tambon, Amin Nikanjam, Le An, Foutse Khomh, Giuliano Antoniol

This paper presents the first empirical study of Keras and TensorFlow silent bugs, and their impact on users' programs.

On Assessing The Safety of Reinforcement Learning algorithms Using Formal Methods

no code implementations • 8 Nov 2021 • Paulina Stevia Nouwou Mindom, Amin Nikanjam, Foutse Khomh, John Mullins

The increasing adoption of Reinforcement Learning in safety-critical systems domains such as autonomous vehicles, health, and aviation raises the need for ensuring their safety.

Autonomous Vehicles Q-Learning +2

An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Datasets

1 code implementation • 4 Nov 2021 • Gias Uddin, Yann-Gael Gueheneuc, Foutse Khomh, Chanchal K Roy

We report the results of an empirical study that we conducted to determine the feasibility of developing an ensemble engine by combining the polarity labels of stand-alone SE-specific sentiment detectors.

Sentiment Analysis

Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set

1 code implementation • 26 Oct 2021 • Gabriel Laberge, Yann Pequignot, Alexandre Mathieu, Foutse Khomh, Mario Marchand

In this work, instead of aiming at reducing the under-specification of model explanations, we fully embrace it and extract logical statements about feature attributions that are consistent across all models with good empirical performance (i.e., all models in the Rashomon Set).

Additive models Feature Importance +1

The challenge of reproducible ML: an empirical study on the impact of bugs

1 code implementation • 9 Sep 2021 • Emilio Rivera-Landos, Foutse Khomh, Amin Nikanjam

This study attempts to quantify the impact that the occurrence of bugs in a popular ML framework, PyTorch, has on the performance of trained models.

The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study

1 code implementation • 11 Aug 2021 • Mohammad Masudur Rahman, Foutse Khomh, Shamima Yeasmin, Chanchal K. Roy

We confirmed that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural language-only bug reports although they contain such queries.

Models of Computational Profiles to Study the Likelihood of DNN Metamorphic Test Cases

no code implementations • 28 Jul 2021 • Ettore Merlo, Mira Marhaba, Foutse Khomh, Houssem Ben Braiek, Giuliano Antoniol

We investigate the distribution of computational profile likelihood of metamorphic test cases with respect to the likelihood distributions of training, test and error control cases.

HOMRS: High Order Metamorphic Relations Selector for Deep Neural Networks

1 code implementation • 10 Jul 2021 • Florian Tambon, Giulio Antoniol, Foutse Khomh

Deep Neural Networks (DNN) applications are increasingly becoming a part of our everyday life, from medical applications to autonomous cars.

Uncertainty Quantification valid +1

Data Driven Testing of Cyber Physical Systems

no code implementations • 23 Feb 2021 • Dmytro Humeniuk, Giuliano Antoniol, Foutse Khomh

The most common approach for pre-deployment testing is to model the system and run simulations with models or software in the loop.

Mining API Usage Scenarios from Stack Overflow

no code implementations • 17 Feb 2021 • Gias Uddin, Foutse Khomh, Chanchal K Roy

Each task consists of a code example, the task description, and the reactions of developers towards the code example.

Software Engineering

Faults in Deep Reinforcement Learning Programs: A Taxonomy and A Detection Approach

1 code implementation • 1 Jan 2021 • Amin Nikanjam, Mohammad Mehdi Morovati, Foutse Khomh, Houssem Ben Braiek

To allow for the automatic detection of faults in DRL programs, we have defined a meta-model of DRL programs and developed DRLinter, a model-based fault detection approach that leverages static analysis and graph transformations.

Fault Detection OpenAI Gym +2

SIGMA : Strengthening IDS with GAN and Metaheuristics Attacks

no code implementations • 18 Dec 2019 • Simon Msika, Alejandro Quintero, Foutse Khomh

More specifically, we propose a new method named SIGMA that leverages adversarial examples to strengthen IDS against new types of attacks.

BIG-bench Machine Learning Intrusion Detection

DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks

no code implementations • 5 Sep 2019 • Houssem Ben Braiek, Foutse Khomh

To overcome these limitations, we propose DeepEvolution, a novel search-based approach for testing DL models that relies on metaheuristics to ensure maximum diversity in generated test cases.

Autonomous Vehicles Quantization

TFCheck : A TensorFlow Library for Detecting Training Issues in Neural Network Programs

no code implementations • 5 Sep 2019 • Houssem Ben Braiek, Foutse Khomh

In this paper, we examine training issues in ML programs and propose a catalog of verification routines that can be used to automatically detect the identified issues.

A Machine-learning Based Ensemble Method For Anti-patterns Detection

1 code implementation • 29 Jan 2019 • Antoine Barbez, Foutse Khomh, Yann-Gaël Guéhéneuc

In this paper, we present SMAD (SMart Aggregation of Anti-patterns Detectors), a machine-learning based ensemble method to aggregate various anti-patterns detection approaches on the basis of their internal detection rules.

BIG-bench Machine Learning
