Search Results for author: Ozalp Babaoglu

Found 4 papers, 1 paper with code

Online Fault Classification in HPC Systems through Machine Learning

no code implementations • 26 Oct 2018 • Alessio Netti, Zeynep Kiziltan, Ozalp Babaoglu, Alina Sirbu, Andrea Bartolini, Andrea Borghesi

As High-Performance Computing (HPC) systems strive towards the exascale goal, studies suggest that they will experience excessive failure rates.

Distributed, Parallel, and Cluster Computing

FINJ: A Fault Injection Tool for HPC Systems

1 code implementation • 26 Jul 2018 • Alessio Netti, Zeynep Kiziltan, Ozalp Babaoglu, Alina Sirbu, Andrea Bartolini, Andrea Borghesi

We present FINJ, a high-level fault injection tool for High-Performance Computing (HPC) systems, with a focus on the management of complex experiments.

Distributed, Parallel, and Cluster Computing

Towards Data-Driven Autonomics in Data Centers

no code implementations • 19 May 2015 • Alina Sîrbu, Ozalp Babaoglu

Continued reliance on human operators for managing data centers is a major impediment for them from ever reaching extreme dimensions.

Management
