In our Physics Department we have a Linux cluster with about 200 CPUs running Torque/Maui as resource manager and scheduler. Our farm has many peculiarities:
In this scenario, scheduling is a key point of the whole infrastructure: there are many needs to satisfy, and the requirements are often incompatible with one another. The number of variables that influence scheduling behaviour is so high that optimization is a hard task.
This project aims to find a way to optimize our farm in particular and to study scheduling optimization in general. To achieve these results, our work is divided into several areas:
Coordinator:
Developers:
Project status:
| Create an infrastructure for the storage and retrieval of farm workloads. | Complete |
| Build a virtual farm (Xen) to simulate different farm structures. | Complete |
| Configure a virtual machine with a simulator (the Maui simulator) to simulate both these "virtual farms" and our real farm. | Complete |
| Build a set of metrics to evaluate how well our configurations work (both real and simulated). | Complete |
| Validate the simulator (using Monte Carlo techniques). | Partial |
| Build an infrastructure for large scale simulation of many changing configurations. | Complete |
| Build workload analysis tools. | Complete |
| Build tools to evolve scheduler configurations via GA (Genetic Algorithm). | Complete |
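
As an illustration of the kind of metrics the status table refers to, here is a minimal sketch in Python. The project's actual metrics are not listed on this page, so the three used here (mean wait time, mean bounded slowdown, and CPU utilization) are assumptions chosen because they are common in scheduler evaluation. The `Job` record and all function names are hypothetical and do not come from Torque, Maui, or the project's tools.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One finished job from a workload trace (hypothetical record layout)."""
    submit: float  # submission time, seconds
    start: float   # start time, seconds
    end: float     # completion time, seconds
    cpus: int      # number of CPUs the job occupied


def mean_wait(jobs):
    """Average time jobs spent queued before starting."""
    return sum(j.start - j.submit for j in jobs) / len(jobs)


def mean_bounded_slowdown(jobs, tau=10.0):
    """Average of (wait + run) / max(run, tau), floored at 1.

    tau (an assumed threshold, in seconds) keeps very short jobs
    from dominating the average.
    """
    total = 0.0
    for j in jobs:
        run = j.end - j.start
        wait = j.start - j.submit
        total += max(1.0, (wait + run) / max(run, tau))
    return total / len(jobs)


def utilization(jobs, total_cpus):
    """Fraction of available CPU-seconds used between first submit and last end."""
    span = max(j.end for j in jobs) - min(j.submit for j in jobs)
    busy = sum((j.end - j.start) * j.cpus for j in jobs)
    return busy / (span * total_cpus)


if __name__ == "__main__":
    # Two 2-CPU jobs submitted together on an assumed 4-CPU farm,
    # run one after the other.
    trace = [Job(0, 0, 100, 2), Job(0, 100, 200, 2)]
    print(mean_wait(trace), mean_bounded_slowdown(trace), utilization(trace, 4))
```

The same functions can be applied to a trace from the real farm and to one produced by the simulator, so that the two can be compared on equal terms.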
Not yet