The ENHANCE project addressed the allied problems of programmability, cost predictability and dynamic scheduling of parallel applications executing on Grid-like systems. We characterised such systems as those in which there is dynamic variation in the availability and performance of a heterogeneous pool of computing nodes. Our approach extended and integrated ideas and tools from two established research programmes: stochastic process algebras, and in particular PEPA, were used to model the behaviour of concurrent systems in which some aspects of behaviour are not precisely predictable, while a skeleton-based programming framework explicitly captured the parallel algorithmic patterns inherent in our applications. By modelling skeletons with PEPA, then populating the models with information gleaned from the system at run-time, we provided support for scheduling and dynamic rescheduling decisions. Our key observations were as follows. Firstly, the use of patterns can helpfully constrain the implementation challenge, by providing detailed knowledge of the overall evolution of an application's interactions. Secondly, patterns allow both more accurate cost prediction, which in turn informs good scheduling, and lighter-weight monitoring, because the pattern-based model tells us what the crucial aspects are for each instantiation. Thirdly, stochastic process algebra and its associated tools provide a soundly grounded, practical, and effective means of manipulating and evaluating the necessary performance models.
During the project we designed, populated and implemented an experimental programming and run-time framework. In our system, applications are developed in the normal way, with parallelism captured through calls to our Skeleton Library. At run-time, the resulting executable is deployed across all available processing nodes in the Resource Pool. For an application generating p processes, p of the nodes execute both the application code and our lightweight run-time support thread; the remaining nodes simply execute the support thread. During execution, the support threads collaborate to implement a monitor-evaluate-reschedule cycle, which may result in the migration of one or more processes to previously inactive nodes in response to deteriorating availability of the nodes originally chosen. At the heart of this cycle sits a process in which models of the application are constructed, candidate schedules are generated and evaluated, and selected adjustments to the current schedule are notified to the resource pool.
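To make the shape of this cycle concrete, the following is a minimal sketch in Python, not the project code: the node names, the toy cost model standing in for PEPA evaluation, and the exhaustive candidate generation are all assumptions made purely for exposition.

```python
"""Illustrative sketch (not the project code) of the monitor-evaluate-
reschedule cycle run by the support threads.  Node speeds, the cost model
and the candidate generator are deliberately simplistic stand-ins."""
import itertools
import random
import time

STAGE_COSTS = [4.0, 1.0, 2.0]          # work per pipeline stage (assumed units)
NODES = ["node0", "node1", "node2", "node3"]

def monitor(nodes):
    # Stand-in for NWS / load-average monitoring: effective speed of each node.
    return {n: random.uniform(0.2, 1.0) for n in nodes}

def predicted_runtime(schedule, speeds):
    # Crude cost model standing in for PEPA model evaluation:
    # a pipeline is limited by its slowest stage.
    return max(cost / speeds[node] for cost, node in zip(STAGE_COSTS, schedule))

def candidate_schedules(nodes, n_stages):
    # All injective mappings of stages to nodes (fine at this toy scale).
    return list(itertools.permutations(nodes, n_stages))

def reschedule_cycle(current, period=1.0, cycles=5):
    for _ in range(cycles):
        speeds = monitor(NODES)                                    # monitor
        best = min(candidate_schedules(NODES, len(STAGE_COSTS)),   # evaluate
                   key=lambda s: predicted_runtime(s, speeds))
        if predicted_runtime(best, speeds) < predicted_runtime(current, speeds):
            print(f"migrating: {current} -> {best}")               # reschedule
            current = best
        time.sleep(period)
    return current

if __name__ == "__main__":
    reschedule_cycle(current=tuple(NODES[:len(STAGE_COSTS)]))
```

In the real system the evaluate step is, of course, the PEPA model solution described below, and candidate generation must be considerably more selective than exhaustive enumeration.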
Our initial work focused on building an understanding of how best to structure our PEPA models to fit the project's requirements. We were able to demonstrate that PEPA's compositionality was a key feature, allowing us to separate the model into components representing the application (i.e. the skeleton), the processors, and the communication architecture. The evaluation time for small instances of these models was acceptable, since the skeletal constraints on the programming model produce a corresponding reduction in the size of the modelled state space. Subsequently, we have shown that the process of model generation can be fully automated for skeletal programs, including those which involve arbitrary nests of our implemented skeletons (pipeline, deal and farm), and even for programs in which the choice of skeletons within an application is delayed until run-time (in other words, the model itself is constructed dynamically). The scheme is extensible to a larger set of skeletons. To our knowledge this is an unprecedented achievement, and it amply demonstrates the power of combining skeletal program structure with PEPA's compositionality.
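To give a flavour of what such generation involves (this is not the actual output format of our generator), the sketch below emits a PEPA-like model for an n-stage pipeline in which the application, processor and network components are kept separate and composed by cooperation; the component names, the use of `infty` for PEPA's passive rate, and the simplified cooperation-set syntax are assumptions for illustration only.

```python
def pipeline_model(proc_rates, net_rates):
    """Emit a PEPA-like model (syntax simplified for illustration) for an
    n-stage pipeline: one component per stage, one per processor, one for
    the network, composed by cooperation on shared activities."""
    n = len(proc_rates)
    lines = []
    # Application components: one sequential component per pipeline stage.
    for i in range(1, n + 1):
        lines.append(f"Stage{i} = (move{i}, infty).(process{i}, infty)."
                     f"(move{i+1}, infty).Stage{i};")
    # Processor components: each supplies the rate of the stage mapped to it.
    for i, mu in enumerate(proc_rates, 1):
        lines.append(f"Proc{i} = (process{i}, {mu}).Proc{i};")
    # Network component: supplies the rates of the inter-stage data transfers.
    moves = " + ".join(f"(move{i}, {r}).Network" for i, r in enumerate(net_rates, 1))
    lines.append(f"Network = {moves};")
    # System equation: stages cooperate pairwise on shared move actions, then
    # with the processors on process actions and with the network on moves.
    app = "Stage1"
    for i in range(2, n + 1):
        app = f"({app} <move{i}> Stage{i})"
    procs = " <> ".join(f"Proc{i}" for i in range(1, n + 1))
    process_set = ",".join(f"process{i}" for i in range(1, n + 1))
    move_set = ",".join(f"move{i}" for i in range(1, n + 2))
    lines.append(f"System = ({app} <{process_set}> ({procs})) <{move_set}> Network;")
    return "\n".join(lines)

if __name__ == "__main__":
    # a three-stage pipeline: per-stage processing rates and per-link transfer rates
    print(pipeline_model(proc_rates=[2.0, 5.0, 3.0], net_rates=[10.0, 10.0, 10.0, 10.0]))
```

The same compositional pattern extends to deal and farm, and to arbitrary nests of skeletons, which is what makes fully automatic generation feasible.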
To complete the model, the generated structure must be parameterised with performance information. We have demonstrated that the challenge of obtaining these rates is orthogonal to the rest of the modelling process by implementing two schemes, one based on information gleaned from the Network Weather Service tool, the other using standard Unix load-monitoring utilities. In the latter case, we were able to show that the process of information gathering can be effectively interwoven with the early stages of computation: for a small number of initial scheduling cycles we deliberately move the application around the machine, gathering application-specific rate information transparently while still making real progress in the overall computation. This obviates the need for user-generated performance annotations or any other static performance prediction mechanism.
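As a simple illustration of the second scheme, the sketch below converts the standard Unix load average into an effective activity rate; the particular scaling rule and the calibration value are illustrative assumptions, not the formula used in our implementation.

```python
"""Illustrative conversion of Unix load information into PEPA activity rates.
The scaling rule (divide a calibrated unloaded rate by 1 + load per CPU) is a
deliberately simple stand-in for the project's scheme."""
import os

def observed_service_rate(unloaded_rate, ncpus=None):
    """Estimate the effective rate of a CPU-bound skeleton activity on this
    node, given the rate measured on the same node when it was unloaded."""
    ncpus = ncpus or os.cpu_count() or 1
    one_min_load, _, _ = os.getloadavg()   # standard Unix load averages
    return unloaded_rate / (1.0 + one_min_load / ncpus)

if __name__ == "__main__":
    # e.g. a stage calibrated at 5 tasks/second on an idle node
    print(f"effective rate: {observed_service_rate(5.0):.2f} tasks/s")
```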
At the core of our methodology is the requirement to solve PEPA models which represent candidate mappings of skeleton tasks to available resources. The standard PEPA solution tool is based upon the solution of continuous-time Markov chains (CTMCs). We have demonstrated the efficacy of an alternative approach which instead generates and solves sets of ordinary differential equations. This approach is applicable when the number of entities in the system becomes large, precisely the situation in which the CTMC method becomes too time-consuming. Our project is the first to report on this technique.
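The contrast between the two solution methods can be seen on a small textbook-style client/server example (not one of our skeleton models), solved here in the spirit of the fluid-flow approximation: instead of enumerating a CTMC whose state space grows rapidly with the number of components, we integrate a handful of ODEs over the counts of components. The model, rates and population sizes below are illustrative.

```python
"""Illustrative fluid (ODE) solution of a small PEPA-style model: clients
cycle between thinking and requesting, servers between idle and logging, and
the shared 'serve' activity proceeds at a rate bounded by whichever side has
fewer components available.  All rates and populations are invented."""
import numpy as np
from scipy.integrate import odeint

r_think, r_serve, r_log = 0.2, 1.0, 2.0   # illustrative activity rates

def derivatives(y, _t):
    c_think, c_req, s_idle, s_log = y
    serve_flow = r_serve * min(c_req, s_idle)   # shared 'serve' activity
    return [serve_flow - r_think * c_think,     # clients returning to think
            r_think * c_think - serve_flow,     # clients waiting for service
            r_log * s_log - serve_flow,         # servers becoming idle again
            serve_flow - r_log * s_log]         # servers logging

if __name__ == "__main__":
    y0 = [1000.0, 0.0, 50.0, 0.0]   # 1000 clients, 50 servers
    t = np.linspace(0.0, 60.0, 601)
    trajectory = odeint(derivatives, y0, t)
    print("approximate steady state:", trajectory[-1])
```

Even after aggregation by component counts, the underlying CTMC for these populations has tens of thousands of states, while the ODE system above has only four equations.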
In order to fully exploit the reschedulings proposed by our model evaluation engine it is essential to have an efficient procedure for skeleton process migration. We have demonstrated that the information implicit in the use of each skeleton can greatly reduce the cost of this task. Essentially, rather than having to migrate a full process in the traditional OS sense, we simply have to communicate any required user-space state between retiring and activated processors. Everything else is created afresh on the new processor. Our extendable algorithms implement this process for our set of skeletons. This was the first project to propose and demonstrate this application of skeleton-sourced structural information.
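A minimal sketch of the idea is given below, written with mpi4py purely for exposition; this is not our implementation's interface, and the ranks, tags and state contents shown are invented for the example.

```python
"""Illustrative skeleton-aware migration, written with mpi4py for exposition
only.  Rather than migrating a whole OS process, the retiring rank sends just
the user-space state of its skeleton stage; the activated rank creates all
other structures (buffers, communicators, skeleton wiring) afresh."""
from mpi4py import MPI

MIGRATE_TAG = 77                       # illustrative message tag
comm = MPI.COMM_WORLD

def retire(stage_state, successor_rank):
    """Run on the node giving up a skeleton stage: ship only user-space state."""
    comm.send(stage_state, dest=successor_rank, tag=MIGRATE_TAG)

def activate(predecessor_rank, rebuild_stage):
    """Run on the previously idle node taking the stage over."""
    stage_state = comm.recv(source=predecessor_rank, tag=MIGRATE_TAG)
    # Everything other than the received state is rebuilt locally from the
    # skeleton's own structural description.
    return rebuild_stage(stage_state)

if __name__ == "__main__":
    # toy two-rank demonstration:  mpirun -n 2 python migrate_sketch.py
    if comm.Get_rank() == 0:
        retire({"stage": 2, "items_done": 41}, successor_rank=1)
    elif comm.Get_rank() == 1:
        print(activate(0, rebuild_stage=lambda state: state))
```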
The various components, techniques and algorithms described above have been integrated into a proof-of-concept implementation. The application programmer writes a skeleton-based program and initiates a run (with the normal mpirun job starter) across the resource pool. Model generation, population and evaluation, together with the generation and implementation of the resulting scheduling and rescheduling decisions, are all automated.
We have experimented with our system on a dedicated cluster of workstations, into which we have injected artificially generated background load to simulate the characteristics of a grid. This is the only practical way to obtain repeatable experiments (for example, allowing comparison of runs with and without our mechanism).
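For concreteness, a background-load injector of the kind we mean can be as simple as the sketch below; the duty-cycle scheme and the parameter values are illustrative and do not describe our experimental harness.

```python
"""Illustrative background-load injector: burns CPU for a given fraction of
each period so that a run can be repeated under the same synthetic load."""
import time

def inject_load(duty_cycle=0.5, period=0.1, duration=60.0):
    """Keep one CPU busy for `duty_cycle` of every `period` seconds."""
    end = time.monotonic() + duration
    while time.monotonic() < end:
        busy_until = time.monotonic() + duty_cycle * period
        while time.monotonic() < busy_until:
            pass                                       # spin: consume CPU
        time.sleep((1.0 - duty_cycle) * period)        # idle for the rest

if __name__ == "__main__":
    inject_load(duty_cycle=0.7, period=0.1, duration=10.0)
```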