An optimization system for the Maui Cluster Scheduler.

Abstract:

In our Physics Department we have a Linux cluster with about 200 CPUs, running Torque as resource manager and Maui as scheduler. Our farm has many peculiarities:

In this scenario, scheduling is a key point of the whole infrastructure: there are many needs to satisfy, and the requirements are often incompatible with one another. The number of variables that influence scheduling behaviour is so high that optimization is a hard task.
This project aims to find a way to optimize our farm in particular and to study scheduling optimization in general. To achieve these results, our work is divided into several areas:

People:

Coordinator:

Developers:

Status:

Project status:

Create an infrastructure for the storage and retrieval of farm workloads: Complete
Build a virtual farm (Xen) to simulate different farm structures: Complete
Configure a virtual machine with a simulator (the Maui simulator) to simulate both these "virtual farms" and our real farm: Complete
Build a set of metrics to evaluate how well our configurations work (both real and simulated): Complete
Validate the simulator (using Monte Carlo techniques): Partial
Build an infrastructure for large-scale simulation of many changing configurations: Complete
Build workload analysis tools: Complete
Build tools to evolve scheduler configurations via GA (Genetic Algorithm): Complete
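
As an illustration of the last item, here is a minimal sketch of how a genetic algorithm can evolve a scheduler configuration. It is not the project's actual tool. The parameter names, their ranges, and the toy_cost function are hypothetical stand-ins: in a real setup the cost would presumably come from a simulator run scored with one of the metrics mentioned above (for example, average job wait time).

```python
import random

# Hypothetical tunable parameters with allowed integer ranges.
# These names are illustrative only, not real Maui configuration keys.
PARAM_RANGES = {
    "queue_time_weight": (0, 100),
    "fairshare_weight": (0, 100),
    "job_size_weight": (0, 100),
}


def random_config(rng):
    """Draw one configuration uniformly at random from the allowed ranges."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def toy_cost(config):
    """Placeholder for a simulator run; lower is better.

    Assumption: squared distance from an arbitrary target, used only so
    that the example is runnable without a simulator.
    """
    target = {"queue_time_weight": 30, "fairshare_weight": 70, "job_size_weight": 10}
    return sum((config[k] - target[k]) ** 2 for k in target)


def crossover(a, b, rng):
    """Uniform crossover: each parameter is taken from one of the two parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def mutate(config, rng, rate=0.2, step=10):
    """Perturb each parameter with probability `rate`, clamped to its range."""
    child = dict(config)
    for name, (lo, hi) in PARAM_RANGES.items():
        if rng.random() < rate:
            child[name] = min(hi, max(lo, child[name] + rng.randint(-step, step)))
    return child


def evolve(cost, generations=50, pop_size=20, elite=4, seed=0):
    """Elitist GA: keep the best `elite` configs, refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:elite]
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=cost)


if __name__ == "__main__":
    best = evolve(toy_cost)
    print(best, toy_cost(best))
```

Because the best configurations are carried over unchanged in every generation, the best cost never gets worse. With a real simulator in place of toy_cost, each cost evaluation would be expensive, so caching results for configurations that have already been simulated would likely be worthwhile.
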

Documentation:

Papers:

Poster:

Talks:

Download:

Not yet available.

Links: