Grid computing is emerging as a major new capability for modern, high performance technical computing. Such Grids couple geographically distributed resources such as high performance computers, workstations, clusters, and scientific instruments. Traditional methods of producing software for Grids are inefficient and error prone, and will not allow the rapid deployment of new applications.
GriddLeS is a tool that facilitates the construction of complex Grid applications from legacy software components. We want to make use of the billions of lines of existing source code in exciting new Grid applications. To learn more, read on.
Computational and data Grids couple geographically distributed resources such as high performance computers, workstations, clusters, and scientific instruments. Accordingly, they have been proposed as the next generation computing platform for solving large-scale problems in science, engineering, and commerce. Unlike traditional high performance computing systems, such Grids provide more than just computing power, because they address issues of wide area networking, wide area scheduling and resource discovery in ways that allow many resources to be assembled on demand to solve large problems. Grid applications have the potential to allow real time processing of data streams from scientific instruments such as particle accelerators and telescopes in ways which are much more flexible and powerful than is currently available. A number of prototype applications have been built, and these demonstrate that the Grid computing paradigm holds much promise.
Of particular interest are applications, called Grid Workflows, that consist of a number of components, including computational models, distributed files, scientific instruments and special hardware platforms (such as visualisation systems). Importantly, such workflows are interconnected in a flexible and dynamic way to give the appearance of a single application that has access to a wide range of data and runs on a very high performance platform. Grid workflows have been specified for a number of different scientific domains, including gravitational wave physics and astronomy.
Much of the effort in Grid computing is being directed towards the construction of new applications, in many cases written from scratch. We are interested in building new applications, but from legacy components. In particular, we want to leverage the billions of lines of code embodied in existing scientific and engineering codes, by stitching them together into new Grid-aware applications.
Over the past 5 years we have constructed a software tool called Nimrod/G, which allows a user to migrate a particular class of applications to the Grid. Specifically, it automates the execution of parameter sweep applications (parameter studies) over global computational grids. Nimrod is particularly novel because it supports user-defined deadline and budget constraints for scheduling computations, and manages the supply and demand of resources in the Grid using an experimental computational economy. Thus, using Nimrod/G, we have demonstrated that it is possible to build specific Grid applications very easily and quickly for a niche class of problems, namely parameter sweeps. However, Nimrod/G cannot be used to build general grid workflows.
The GriddLeS environment provides a more general environment than Nimrod, one that facilitates the composition of arbitrary grid applications from legacy software. The underlying belief is that it is possible to take existing programs and Grid-enable them by providing a high level tool that facilitates the composition of complex systems from smaller, working components. A user of this environment interacts with a visual, graphical manipulation language to describe the interaction between programs, data sources, and IO devices such as shared scientific instruments.
One of the more important aspects of GriddLeS is the mechanism it uses to support communication between components. GriddLeS supports the construction of complete applications without source modification to the existing components. To achieve this we have overloaded the normal IO primitives in conventional languages so they support interprocess communication as well as file operations. This allows the individual components to behave as though they are operating in a conventional file system, whilst in fact they are sending and receiving data across a distributed grid infrastructure. The mechanism, called GridFiles, is very flexible and can be implemented by a range of different techniques, from file copy to IP sockets.
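As a rough illustration of the overloading idea (this is not GriddLeS source code, and the class name and interface here are invented for the sketch), a file-like wrapper can present the usual open/read/write interface while routing the bytes either to the local file system or across a socket, so the calling program cannot tell the difference:

```python
# Illustrative sketch only: a wrapper that behaves like an ordinary
# file handle, but can transparently route IO over a socket instead.
import socket

class GridFile:
    """Presents the familiar read/write interface; the backend is
    either a local file or a socket connection to a remote peer."""

    def __init__(self, name, mode, remote_addr=None):
        if remote_addr is None:
            # Ordinary local file, exactly as the legacy code expects.
            self._backend = open(name, mode)
            self._sock = None
        else:
            # Same interface, but the bytes cross the network.
            self._sock = socket.create_connection(remote_addr)
            self._backend = self._sock.makefile(mode)

    def read(self, n=-1):
        return self._backend.read(n)

    def write(self, data):
        return self._backend.write(data)

    def close(self):
        self._backend.close()
        if self._sock:
            self._sock.close()
```

Because the legacy component only ever sees `read` and `write`, the decision between file and socket can be deferred to configuration time rather than being compiled in.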
GridFiles makes use of a “File Multiplexer”, as shown here. This routine replaces the normal file IO library for a particular language, and allows the system to redirect file IO requests dynamically to local files, remote files or remote processes. In the latter case, a file multiplexer on the writer machine is linked with a symmetric file multiplexer on the reader machine. The multiplexer handles the synchronisation of readers and writers, and thus supports quite complex interprocess communication patterns. It is also possible to cache the data being transmitted between components.
Normal file IO primitives are intercepted by the File Multiplexer, and these are processed by the Local File Client, the Remote File Client or the Grid Buffer Client, depending on whether the file reference is for a local file, a remote file or an inter-process socket, respectively.
The GNS Client is responsible for resolving the local file names specified in the OPEN calls, and for mapping these to either local files, remote files, remote replicated files or remote processes. The File Multiplexer treats the GNS as a read only database, and matches up multiple OPEN calls. The GNS is loaded by a separate process responsible for configuring a grid application. Each entry in the GNS indicates what should happen when a particular file is opened on a particular resource. For example, if the file is to remain local to the resource, then the GNS simply stores the local file name. However, if the file is to be read from a remote resource, the full pathname of the remote file is stored in the GNS entry. If a Grid Buffer is required, then the local file name is mapped onto a Grid Buffer identifier.
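A minimal sketch of this style of name resolution (the table entries, resource names and binding kinds below are all hypothetical, not GriddLeS internals) might look like the following, where each entry tells the File Multiplexer which client should service an OPEN call:

```python
# Hypothetical GNS-style lookup table: (resource, local file name)
# pairs map to a binding that selects the servicing client.
LOCAL, REMOTE, BUFFER = "local", "remote", "buffer"

# Example table, loaded by the separate configuration process.
gns = {
    ("nodeA", "input.dat"):  (LOCAL,  "/scratch/input.dat"),
    ("nodeA", "bc.dat"):     (REMOTE, "gsiftp://nodeB/data/bc.dat"),
    ("nodeA", "stream.dat"): (BUFFER, "buffer-42"),
}

def resolve(resource, filename):
    """Return (client_kind, target) for an OPEN call; files absent
    from the table default to ordinary local access under their
    original name."""
    return gns.get((resource, filename), (LOCAL, filename))
```

The read-only nature of the table at run time is what lets the multiplexer match up multiple OPEN calls consistently: every component that opens the same name on the same resource gets the same binding.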
The Local File Client simply passes the calls onto the local file system, using the file name as resolved by the GNS. The Remote File Client connects to a GridFTP server on the remote machine, and passes back blocks of the file as required. Note that the GridFTP server is a standard part of the Globus distribution, not a special component of GriddLeS. The Grid Buffer Client is responsible for implementing inter-process communication. It connects to a corresponding Grid Buffer Server on the other host, and sends blocks of data for each local WRITE call. At the other end of the socket, the Grid Buffer Client reads blocks by making calls to the local Grid Buffer Server. A cache file can be stored at either the sending end of a Grid Buffer connection or the receiving end.
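The essential semantics of a Grid Buffer can be sketched in a few lines (this is an assumed, simplified model, not the GriddLeS implementation): WRITE calls append blocks, and READ calls block until a matching block arrives, which is how readers and writers are kept synchronised.

```python
# Simplified model of a Grid Buffer: a thread-safe block queue.
# Readers block until the writer has produced the next block.
import queue

class GridBuffer:
    def __init__(self):
        self._blocks = queue.Queue()

    def write_block(self, data):
        # Writer side: each WRITE call contributes one block.
        self._blocks.put(data)

    def read_block(self, timeout=None):
        # Reader side: blocks until data is available (or times out).
        return self._blocks.get(timeout=timeout)
```

In the real system the two ends sit on different hosts and talk over a socket, but the blocking-read discipline shown here is what lets an unmodified reader simply "read past end of file" and wait for more data.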
The GRS Client is used to implement the GriddLeS Data Replication Service. When an application opens a replicated file, the GRS makes an actual binding to one of those replicas. It determines the most appropriate one by measuring the available bandwidth to each replica (using tools such as the Network Weather Service), and it dynamically switches source during program execution should the bandwidth change. The GRS has been designed to support a variety of replication services, but the current implementation uses the Storage Resource Broker from SDSC.
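The selection step itself is simple to state; the sketch below is a hypothetical reduction of the idea (the function names and the probing interface are invented), binding to whichever replica currently measures fastest and re-evaluating later if conditions change:

```python
# Hypothetical replica selection in the spirit of the GRS: probe the
# bandwidth to each replica and bind to the fastest one. In practice
# the measurements would come from a monitor such as the Network
# Weather Service rather than a callable passed in by hand.
def pick_replica(replicas, measure_bandwidth):
    """replicas: list of replica locations.
    measure_bandwidth: callable returning an estimate (e.g. MB/s)."""
    return max(replicas, key=measure_bandwidth)
```

Re-running the same selection periodically during execution is what allows the source to be switched mid-run when a link degrades.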
GriddLeS can be combined with Grid workflow packages such as Kepler, a public domain grid workflow system. Using Kepler, it is possible to specify advanced Grid workflows, and GriddLeS provides the IO mechanism that allows the components to communicate in flexible ways.
GridRod is another related development. Grid Workflows are emerging as practical programming models for solving large e-science problems on the Grid. However, it is typically assumed that the workflow components either read and write data to conventional files, which are copied from one execution stage to another, or are tightly coupled using IPC libraries such as MPI or distributed streaming. More flexible communication can be achieved by overloading conventional READ and WRITE operations with advanced IO mechanisms such as sockets, streams and pipes, as is done in the GriddLeS environment. Such flexibility allows the pipelining of temporally dependent components or, in contrast, the delaying of tightly coupled computations based on the current resource availability and network connectivity. However, it also makes the workflow harder to schedule, because the communication mode may not be decided until run time. A new scheduling model has been proposed that leverages this communication flexibility and allows us to generate dynamic runtime schedules. The scheduler in this case not only allocates components to distributed Grid resources, but also specifies the inter-component communication mechanism (socket, pipe, etc.). The current model is implemented as a dynamic workflow scheduling tool called GridRod, which harnesses Nimrod/G's Grid services and GriddLeS web services.
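The scheduler's channel decision can be caricatured as follows (a deliberately simplified sketch; the real GridRod model weighs resource availability and network connectivity, and the rule names here are illustrative):

```python
# Toy version of the channel choice a dynamic scheduler makes for
# each producer/consumer pair, alongside placing them on resources.
def choose_channel(producer_host, consumer_host, overlap_in_time):
    if not overlap_in_time:
        return "file"    # stages run at different times: stage data
                         # through a file copied between them
    if producer_host == consumer_host:
        return "pipe"    # co-located and concurrent: local pipe
    return "socket"      # concurrent on different hosts: pipeline
                         # the stream across the network
```

The point of deferring this choice to run time is that the same workflow description can be executed as a batch of file-coupled stages when resources are scarce, or as a fully pipelined stream when they are not.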
GriddLeS is currently being applied to a range of applications, including Atmospheric Sciences and Computational Mechanics.
Figure 1 shows a Grid workflow application that performs atmospheric modelling. This sample application takes temperature and pressure data from a variety of instruments, such as satellites and airborne and seaborne sensors, and feeds these to a range of different numerical models. In particular, data is assimilated into a general circulation model of the atmosphere (1), which computes the flow fields across the entire globe. This global model in turn drives the boundaries of a regional weather model (2) which produces more accurate wind vectors and temperature and pressure fields over a limited area. These values are in turn streamed into a variety of pollution models, such as a photo-chemical pollution model (3), a particle dispersion model (4) and a bush fire model (5). Each application addresses some particular aspect of the atmosphere in isolation, but when linked together they interact and provide a rich set of data ranging from weather to pollution. For example, a bush fire generates particles that must be dispersed, and also increases various precursors that affect photo-chemical pollution. If the fire is severe enough, it actually affects the regional weather. Accordingly, the different models need to interchange data at various times.
In the Grid, static data sources, such as pollution inventories and vegetation maps, required by the various computational models might be distributed geographically, but copies may be available at more than one site. This means that when models are scheduled to the various machines in the Grid, the location of the closest data also needs to be taken into account.
Figure 1 - Atmospheric Sciences Grid Workflow
This application considers computer models of thin plates containing holes and subject to cyclical loading. The models assume pre-existing cracks normal to the hole profile and use the Jones method of crack dynamics to estimate the number of cycles required for these cracks to spread from an initial length to some final length. Our aim is to determine the hole shapes that will maximize the life of the worst (least cycles) crack. Previous work has shown that optimizing for life in this way may give different results from optimizing for stress on the hole boundary. The picture on the right shows the stress distribution in the plate for a particular hole shape.
In order to complete the computations, we need to execute a pipeline of 5 programs, as shown on the right. CHAMMY takes a formula for a hole shape, depending on several parameters, and generates points on the boundary of that hole. The programs MAKES_SF_FILES and OBJECTIVE transform data from one phase to the next. PAFEC is a finite element code that computes the stress tensors in the meshed design. FAST is a crack propagation code that computes the number of cycles before a number of independently placed cracks reach a certain length.
Traditionally, the entire pipeline has been executed on one computer, with intermediate results passed using files. Importantly, some files are passed from one phase to another, whereas other files are simply read from the file system. The final output, RESULT.DAT, contains the life of the design, defined as the minimum number of cycles for any of the cracks to reach a certain length. The GriddLeS File Multiplexer and GNS are flexible enough to map some files to the local file system, whilst linking writer-reader file chains into direct socket connections.
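The distinction between the two kinds of files can be computed mechanically. In the sketch below, the stage names come from the pipeline above, but the individual file names are invented for illustration: any file written by one stage and read by another is a writer-reader chain that GridFiles can turn into a socket, while everything else stays on disk.

```python
# Classify a pipeline's files: inter-stage files become socket
# connections, read-only inputs and final outputs stay local.
def classify_files(writes, reads):
    """writes/reads: dicts mapping stage name -> set of file names."""
    written = {f for fs in writes.values() for f in fs}
    read = {f for fs in reads.values() for f in fs}
    return {f: ("socket" if f in written and f in read else "local")
            for f in written | read}
```

Feeding this classification into the GNS-style table is then enough to pipeline the chained stages while leaving static inputs untouched.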
|Interprocess Communication||Jagan Kommineni and Philip Chan|
|Computational scheduling and QoS||Shahaan Ayyub and Tim Ho|
|Data replication and QoS||Tim Ho and Shahaan Ayyub|
|Application Deployment and Grid Services||Wojtek Goscinski|
|Applications development||Jagan Kommineni, Tom Peachey and Tim Ho|
|W/F specification and execution, parametric W/F Instrumentation integration|
|Chan, P. and Abramson, D. "A Programming Framework for Incremental Data Distribution in Iterative Applications." In: Proc. of the 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA-2008). 10-12 December 2008. Sydney, Australia. pp 244 – 251, IEEE Press ISBN 978-0-7695-3471-8.||Abstract||chan-abramson-Incremental.pdf|
|Chan, P. and Abramson, D., “Persistence and Communication State Transfer in an Asynchronous Pipe Mechanism”, The 13th International Conference on Parallel and Distributed Systems (ICPADS 07), Hsinchu, Taiwan, December 5-7, 2007.||Abstract||icpads2007.pdf|
|Kommineni, J., Abramson, D. and Tan, J. “Communication over a Secured Heterogeneous Grid with the GriddLeS runtime environment”, 2nd IEEE International Conference on e-Science and Grid Computing. Dec. 4- 6, 2006, Amsterdam, Netherlands.||Abstract||eScience2006Jagan.pdf|
|Goscinski, W and Abramson, D. “Legacy Application Deployment over Heterogeneous Grids using Distributed Ant”, IEEE Conference on e-Science and Grid Computing, Melbourne, Dec 2005||Abstract||DistAnt2.pdf|
|Kommineni, J and Abramson, D. “Building Virtual Applications for the GRID with Legacy Components”, in “Advances in Grid Computing - EGC 2005, European Grid Conference”, Springer Lecture Notes in Computer Science (LNCS 3470), Amsterdam, The Netherlands, February 14-16, 2005. pp 961 – 971. Edited by P.M.A. Sloot, A.G. Hoekstra, T. Priol, A. Reinefeld, M. Bubak.||Abstract|
|Ho, T. and Abramson, D. “The GriddLeS Data Replication Service”, IEEE Conference on e-Science and Grid Computing, Melbourne, Dec 2005.||Abstract||GriddLeSReplication.pdf|
|Abramson, D. and Kommineni, J., “A Flexible IO Scheme for Grid Workflows”. IPDPS-04, Santa Fe, New Mexico, April 2004||Abstract||FM.pdf|
|Chan, P and Abramson, D. " Persistence and Communication State Transfer in an Asynchronous Pipe Mechanism”, International Journal of Grid and High Performance Computing, Vol. 1, Issue 3, 2009, Pages: 18-36.||Abstract||IJGHPC2009_F.pdf|
- Abramson, D. and Kommineni, J., “Interprocess Communication in GriddLeS: Grid Enabling Legacy Software”. Technical report, School of Computer Science and Software Engineering, Monash University. "First GriddLeS Paper 2003"
- Abramson, D. and Kommineni, J., “A Flexible IO Scheme for Grid Workflows”. IPDPS-04, Santa Fe, New Mexico, April 2004.
- Abramson, D., Kommineni, J., McGregor, J. and Katzfey, J. “An Atmospheric Sciences Workflow and its Implementation with Web Services”, Future Generation Computer Systems, 21 (2005), pp 69 – 78. Also appeared in The International Conference on Computational Sciences, ICCS04, Krakow Poland, June 6 – 9, 2004
- Kommineni, J and Abramson, D. “Building Virtual Applications for the GRID with Legacy Components”, in “Advances in Grid Computing - EGC 2005, European Grid Conference”, Springer Lecture Notes in Computer Science (LNCS 3470), Amsterdam, The Netherlands, February 14-16, 2005. pp 961 – 971. Edited by P.M.A. Sloot, A.G. Hoekstra, T. Priol, A. Reinefeld, M. Bubak
- Abramson, D., Kommineni, J. and Altintas, I. “Flexible IO services in the Kepler Grid Workflow Tool”, 1st IEEE Conference on e-Science and Grid Computing, Melbourne, Dec 2005.
- Ho, T. and Abramson, D. “The GriddLeS Data Replication Service”, 1st IEEE Conference on e-Science and Grid Computing, Melbourne, Dec 2005.
- Ho, T. and Abramson, D. “A Unified Data Grid Replication Framework”, 2nd IEEE International Conference on e-Science and Grid Computing. Dec. 4- 6, 2006, Amsterdam, Netherlands.
- Kommineni, J., Abramson, D. and Tan, J. “Communication over a Secured Heterogeneous Grid with the GriddLeS runtime environment”, 2nd IEEE International Conference on e-Science and Grid Computing. Dec. 4- 6, 2006, Amsterdam, Netherlands.
- Ho, T. and Abramson, D., “Active Data: Supporting the Grid Data Life Cycle”, CCGrid 2007, Brazil.