e-Science and Grids

What is a Grid?

The Grid has been seen to represent:

  • Infrastructure (“middleware” & “services”) for establishing, managing, and evolving multi-organizational federations
  • A dynamic, autonomous, domain-independent facility
  • On-demand, ubiquitous access to computing, data, and services
  • Mechanisms for creating and managing workflow within such federations
  • New capabilities constructed dynamically and transparently from distributed services
  • Service orientation and virtualization

Nature of e-Science

This new discipline of Grid-powered e-Science allows scientists to interact efficiently and effectively with each other, their instruments and their data, even across geographic separations, thereby ameliorating the tyranny of distance that often hinders research. Data can be captured, shared, interpreted and manipulated more efficiently, more reliably and on a far greater scale than previously possible. Data can be presented for interpretation in new ways using scientific visualization techniques and advanced data mining algorithms. These new technologies enable new insights to be derived and exploited. The data may also drive simulation models that support prediction and “what-if” analyses. The models and their results may be archived for later use and analysis, and shared securely and reliably with scientific collaborators across the globe. The resulting network of people and devices is empowered to interact more productively and to undertake experiments and analyses that would otherwise be impossible.

Software engineering for the Grid

In spite of tremendous advances in middleware and internet software standards, creating Grid applications that harness geographically disparate resources is still difficult and error-prone. Programmers are presented with a range of middleware services, a raft of legacy software tools that do not address the distributed nature of the Grid, and many other incompatible development tools that often deal with only part of the Grid programming problem. As a result, a scientist might start with an idea for an innovative experiment but quickly become distracted by technical details that have little to do with the task at hand. Moreover, the highly distributed, heterogeneous and unreliable nature of the Grid makes software development extremely difficult. If we are to capitalize on the enormous potential offered by Grid computing, we must find more efficient and effective ways of developing Grid-based applications.

Software development life cycle

A critical ingredient for success in e-Science is appropriate Grid-enabled software, which, to date, has lagged behind the high-performance computers, data servers, instruments and networking infrastructure. All software follows a lifecycle, from development through execution and back again (see Diagram). Grid software is no exception, although the details of the various lifecycle phases differ enough to make traditional tools and techniques inappropriate. For example, traditional software development tools rarely support the creation of virtual applications in which the components are distributed across multiple machines. In the Grid, these types of virtual applications are the norm. Likewise, traditional methods of debugging software do not scale to the size and heterogeneity of the infrastructure found in the Grid. Here, we identify four distinct phases of importance: development, deployment, testing and debugging, and execution.

Grid Middleware

Nimrod, the hunter (of compute resources)

e-Science applications use services that are exposed by both the platform infrastructure and middleware such as Globus and Unicore. In our experience, whilst powerful, these services are typically too low level for many e-Science applications. As a result, there is a significant ‘semantic gap’ between the applications and the middleware, because the application needs are not matched by the underlying middleware services. Moreover, these services do not support the software lifecycle, thereby making software development difficult and error-prone. To solve these problems, we propose a new hierarchy, as shown on our Home Page. The existing middleware is renamed lower middleware, and an upper middleware layer is inserted. This upper middleware layer is designed to narrow the semantic gap between existing middleware and applications. Importantly, it hosts a range of interoperating tools that will form the e-Scientist's workbench, thus supporting the major phases of the software development lifecycle as well as the applications themselves. See Projects for links to our Middleware projects.
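
To make the layering described above more concrete, the following is a minimal Python sketch of how an upper middleware layer might hide low-level steps behind a single experiment-level call. Everything in it is an assumption made for illustration: the class names `LowerMiddleware` and `UpperMiddleware`, the methods `stage_in`, `submit`, `fetch_result` and `run_experiment`, the site names, and the toy "computation" are all hypothetical. It does not reproduce the API of Globus, Unicore, Nimrod or any of our tools, and the lower layer here is an in-memory stand-in rather than a real remote service.

```python
from dataclasses import dataclass, field


@dataclass
class LowerMiddleware:
    """In-memory stand-in for low-level services (hypothetical, not a real API)."""
    name: str
    staged: dict = field(default_factory=dict)
    jobs: dict = field(default_factory=dict)

    def stage_in(self, filename: str, data: str) -> None:
        # Copy input data to the (simulated) remote resource.
        self.staged[filename] = data

    def submit(self, filename: str) -> int:
        # Run a toy job on a staged file and return its job id.
        # The "computation" simply upper-cases the input text.
        job_id = len(self.jobs) + 1
        self.jobs[job_id] = self.staged[filename].upper()
        return job_id

    def fetch_result(self, job_id: int) -> str:
        # Retrieve the output of a finished job.
        return self.jobs[job_id]


class UpperMiddleware:
    """Hypothetical upper layer: one call per experiment, not per low-level step."""

    def __init__(self, resources):
        self.resources = list(resources)

    def run_experiment(self, inputs: dict) -> dict:
        results = {}
        for i, (filename, data) in enumerate(sorted(inputs.items())):
            # Spread the inputs over the available resources in round-robin order.
            resource = self.resources[i % len(self.resources)]
            resource.stage_in(filename, data)
            job_id = resource.submit(filename)
            results[filename] = (resource.name, resource.fetch_result(job_id))
        return results


workbench = UpperMiddleware([LowerMiddleware("site-a"), LowerMiddleware("site-b")])
outcome = workbench.run_experiment({"run1.txt": "alpha", "run2.txt": "beta"})
print(outcome)
```

In this sketch the scientist only calls `run_experiment`; staging, submission and result retrieval stay inside the upper layer. A real upper middleware layer would also have to deal with failures, security and heterogeneous resources, none of which is modelled here.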