
Introduction

Grids enable scientific computing at an unprecedented scale in terms of computing and storage capacity, resource integration and scientific collaboration.

Recent years have seen countless Grid projects and initiatives, ranging from small experimental testbeds to large production infrastructures. [any good reference?]

High Energy Physics (HEP) is one of the application domains driving the development of one of the largest-scale Grid infrastructures - the WLCG (Worldwide LHC Computing Grid) - built for the needs of the LHC (Large Hadron Collider) experiments at CERN.

Other communities, such as bio-informatics, medical physics, earth observation and telecommunications, have joined the WLCG and created the EGEE (Enabling Grids for E-sciencE) Grid - the world's largest production Grid infrastructure to date.

The EGEE Grid must reach its full operational capacity by the startup of the LHC experiments at CERN (end of 2007). In contrast to the many small-scale, experimental Grid installations which serve as research playgrounds for future usage, EGEE applications must be deployed now, in a robust, reliable and performant production Grid of unprecedented scale. To understand the challenge of enabling applications on a large production Grid, one must first understand the idiosyncrasies of the development of a large Grid infrastructure itself.

The development of large production Grids (EGEE example)

The development of a large production Grid in the scientific community is driven by a long-term negotiation process in a complex organizational matrix of resource providers (computing centers), research institutes and users (experiments).

Hardware resources (servers, storage devices, scientific instruments, networks) reside in different administrative domains and are controlled by different local policies. A Virtual Organization (VO) brings a subset of resources into a single virtual administration domain and allows global policies to be set which are mapped onto the local ones. However, the ultimate control of the resources is retained by the resource providers, which require accountability and traceability of the user activities.

The deployment and maintenance of the middleware is at the heart of the operations of a large production Grid. The deployment process is not centralized, because resource providers have local schedules and obligations towards their local, non-Grid communities. This affects the maintenance of the Grid protocols and services and the introduction of new ones. Operational changes, such as placing certain local resources under the sole control of a VO (for example the VOBox), must typically go through a negotiation process.

New middleware components and operational changes propagate very slowly in a large Grid, and it may take several months until the entire system is updated. In addition, the baseline middleware evolves in parallel with the deployment process, and the application-specific functionalities requested by different VOs may conflict. Therefore application-specific functionality is typically introduced not at the generic middleware level but at a higher level, in the application-specific middleware.

The advantage of application-specific middleware is that it is developed and deployed within the VO boundaries, thus at a smaller scale but on a shorter timescale, "in sync" with the needs of the VO community.

The most prominent examples are the production systems of the LHC experiments, such as DIRAC in LHCb, AliEn in ALICE and PanDA in ATLAS. These systems form permanent overlays above the regular infrastructure and rely on glide-in techniques. They implement many features, such as a centralized task queue and improved reliability, which are necessary for large data productions. However, multi-user pilot jobs make per-user accounting impossible, which is a serious drawback for the Grid resource providers. Additionally, such an approach can only be afforded by relatively large VOs, because they have enough resources to support the central services, maintain the VOBoxes, etc.; it is certainly not an option for smaller communities. Moreover, data production is a centrally managed activity, very different from data analysis, and efforts to enable analysis on the production systems are still ongoing.

Enabling and supporting applications in a large production Grid

Significant effort goes into enabling and supporting applications. This includes testing the baseline infrastructure, developing and testing application-specific middleware, developing user interfaces [Ganga] and gathering monitoring information [Dashboard] for operational fine-tuning and infrastructure debugging. Nevertheless, user communities face several problems.

Improvement of the Quality of Service

Large variations in the performance and reliability of the EGEE Grid are confirmed by independent observations [ARDA] and everyday user experience. This may be explained by the fact that a large distributed system such as the Grid is inherently dynamic: resources are constantly reconfigured, added and removed; the total number of failures is correlated with the size of the system; and user activities are not coordinated, so the load may change rapidly. Still, there is a common feeling in the user community that the current Grid infrastructures do not provide the required level of Quality of Service (QoS).

Similarly to classical batch systems, the Grid is designed to optimize the throughput of long, non-interactive jobs. One of the reasons for this is that the underlying Computing Elements are batch farms. The overhead of hierarchical Grid scheduling (in the WMS->CE->BS chain) is large and has been measured [C.Germain-Renaud et al.: Scheduling for Responsive Grids]. However, a large number of applications, from physics data analysis to medical image processing, require responsiveness and interactivity: for such applications the time to obtain the first results matters more than the aggregate throughput, and queuing delays of many minutes or hours make interactive-style work impractical.
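A minimal way to quantify this overhead, using the notation of the slot model sketched in the notes at the end of this page (the interpretation of the individual terms below is an assumption), is to decompose the queuing delay of a job along the hierarchical scheduling chain:

\[ t_q = t_{RB} + t_{CE} + t_{BS} \]

where t_{RB} is the delay introduced by the Resource Broker (WMS), t_{CE} the delay at the Computing Element and t_{BS} the waiting time in the local batch system queue. For a job with execution time t_{exec} the total turnaround time is t_q + t_{exec}, so the relative overhead t_q / (t_q + t_{exec}) grows as jobs become shorter, which is exactly the regime of interactive analysis and short jobs.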

Predictability is another important aspect: Grid users need to reliably estimate the time in which the various stages of processing are completed (including the total turnaround time). Because of the performance and reliability variations, the hierarchical scheduling and the heterogeneity of the resources, predictability is harder to achieve on the Grid than in classical batch farms.

Improving the Quality of Service is one of the most urgent issues for running applications successfully on the Grid.

(Adding new capabilities, such as interactivity, is equally important.)

Exploitation of application execution patterns

In traditional batch farms jobs are monolithic and typically unrelated executables. Limited features exist to support applications beyond the simple batch model. Some systems, such as LSF, allow the synchronized execution of a group of jobs (typically MPI) when the appropriate number of resources becomes available. Other systems, such as Condor DAGMan, allow dependencies between jobs to be introduced using meta-scheduling. Similarly, in the Grid it is possible to specify special job types (such as MPI) or to use meta-scheduling for job dependencies and workflows. The newest versions of the gLite middleware support bulk job submission, which reduces the submission time and allows the common part of the input sandbox to be shared between multiple jobs. Bulk submission improves the performance of job splitting, i.e. the submission of a large number of similar jobs which read different parts of a dataset.
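As an illustration, data-driven job splitting may be sketched as follows (a minimal, hypothetical example: the file names and the submit() helper are placeholders and do not correspond to any actual middleware API):

# Minimal sketch of data-driven job splitting: one subjob per chunk of input files.
# submit() is a placeholder for the actual submission mechanism (e.g. bulk
# submission in gLite); it is not a real API call.

def split_dataset(files, chunk_size):
    """Partition the list of input files into chunks of at most chunk_size."""
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]

def submit(executable, input_files):
    """Placeholder: submit one subjob processing the given input files."""
    print("submitting", executable, "on", len(input_files), "files")

dataset = ["data_%03d.root" % i for i in range(100)]   # hypothetical input dataset
for chunk in split_dataset(dataset, chunk_size=10):
    submit("analysis.exe", chunk)                      # 10 similar subjobs in total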

On the other hand, many applications exhibit striking structural similarities. Data-driven parallelism is very common in physics analysis applications (ATLAS/Athena). Independent event simulation with Monte Carlo methods is used in medicine (radiotherapy). Workflow and parameter sweep applications are common in bio-informatics and exemplified by docking algorithms (AutoDock). Iterative patterns and pipelines are present in many image processing applications (Xmipp).
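The parameter sweep pattern, for instance, amounts to running the same executable over the cartesian product of a few parameter ranges. A minimal sketch is given below; the parameter names and the submit() helper are purely illustrative assumptions, not the actual inputs of AutoDock or any other tool:

# Minimal sketch of the parameter sweep pattern: one job per point in the
# cartesian product of the parameter ranges. All names are illustrative.

import itertools

def submit(executable, params):
    """Placeholder for the actual submission mechanism."""
    print("submitting", executable, "with", params)

sweep = {
    "temperature": [0.5, 1.0, 2.0],     # hypothetical parameter ranges
    "ligand": ["L1", "L2", "L3"],
    "seed": range(5),
}

names = list(sweep)
for values in itertools.product(*(sweep[n] for n in names)):
    submit("docking.exe", dict(zip(names, values)))    # 3*3*5 = 45 jobs in total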

Because of the lack of convenient tools, users build more-or-less ad-hoc solutions to manage their applications in a better way. However, even the more structured efforts suffer from several drawbacks. For example, MPI-BLAST is an MPI-flavored version of the Basic Local Alignment Search Tool (BLAST), a widely used tool for searching protein and DNA databases for sequence similarities [NCBI]. BLAST is a typical Master/Worker application and the appropriate management and bookkeeping layer has been implemented in MPI, and later ported to the MPICH-G2 environment. There are a number of problems with such an approach (a user-level alternative is sketched after the list below):

  • the mainstream Grid installations limit the support for MPI jobs to a single Grid cluster, so the Grid capabilities cannot be fully exploited;
  • MPI was designed for building complex parallel applications rather than job management layers, so the cost of development is relatively high;
  • the management layer must be constantly maintained and adapted to the changing Grid environment.
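For comparison, the sketch below shows the same Master/Worker idea expressed at the user level in plain Python instead of MPI. It is an illustrative assumption only: local processes stand in for Grid worker jobs, and in a real deployment the workers would be ordinary Grid jobs pulling tasks from a central queue.

# Minimal user-level Master/Worker sketch (illustrative): the master holds a set
# of independent tasks and the workers pull tasks until none are left.
# Local processes stand in for Grid worker jobs.

from multiprocessing import Pool

def worker(task):
    """Placeholder for the application payload, e.g. one BLAST query."""
    query, database = task
    return (query, "result-for-%s-vs-%s" % (query, database))

if __name__ == "__main__":
    tasks = [("query_%02d" % i, "nr") for i in range(20)]    # hypothetical task list
    with Pool(processes=4) as pool:                          # 4 local "workers"
        for query, result in pool.imap_unordered(worker, tasks):
            print(query, "->", result)                       # master-side bookkeeping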

Exploiting these recurrent application patterns in a structured and generic way would have an important impact on enabling new applications on the Grid.

Environment transition

The Grid is only one of the environments used in the everyday work of scientists. Development and testing of algorithms and formulas is an intrinsic part of many types of scientific research, such as particle physics. A typical pattern is to develop and debug an algorithm locally, then to test it on a larger scale using local computing resources and locally available data, before harnessing the full computational power of the Grid. The transition between the local and Grid environments may also happen in the other direction, as the research process may involve multiple algorithm development cycles.

Some user communities possess local resources which are not part of the Grid: sometimes due to a lack of human resources to maintain a Grid site, sometimes due to a conservative policy in embracing new trends. Clearly, if such users had the possibility to mix local resources with Grid ones, it would be beneficial not only for them but also, in the long term, for the whole Grid community.

User interfaces such as [Ganga] facilitate the transition between the local and Grid environments and provide the necessary aid to the users. However, they do not provide the means to integrate local and Grid resources into a single, usable computing platform.

A lightweight method of integrating Grid and local resources to run applications in a robust way would benefit smaller user groups as well as, in the long run, the whole Grid community.

Defining the viewpoint

Given the discussion above, we can define the viewpoint for the rest of this dissertation. We look from the perspective of a user of the world's largest production Grid. We try to maximize the user's return on investment in Grid technology by:

  • providing improved Quality of Service and increased application performance and reliability on the Grid;
  • enabling capabilities currently unavailable on the Grid, such as interactivity;
  • reducing the user's investment (time) needed to interface applications so that they can take advantage of the improved QoS and new capabilities.

As explained in the previous sections, it is not practical to expect the middleware release cycle to be coupled to the applications' activity schedules. Therefore many of the generic middleware concepts which have been proposed to handle hard (deterministic) Quality of Service, such as [G-QoSm], [GARA] or advance reservation systems [G-RSVPM], are of little use in our environment. We must probably resort to a more pragmatic, less ambitious statistical or best-effort Quality of Service, implemented in higher-level middleware on top of the unmodified generic middleware.
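As an illustration of what a best-effort policy at the user level could look like, the sketch below resubmits a job which has not started running within a deadline. The submit(), status() and cancel() callables are hypothetical placeholders, not functions of gLite or any real middleware; the sketch only conveys the resubmission idea:

# Sketch of a user-level best-effort QoS policy (hypothetical API): if a job does
# not start running within a deadline, cancel it and submit a fresh copy.
# submit(), status() and cancel() are placeholders for real middleware calls.

import time

DEADLINE = 600   # seconds a job may wait in the queue before being replaced
POLL = 30        # polling interval in seconds

def run_with_resubmission(job_description, submit, status, cancel, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        job_id = submit(job_description)
        waited = 0
        while waited < DEADLINE:
            state = status(job_id)
            if state in ("RUNNING", "DONE"):
                return job_id                # this attempt is making progress
            if state == "FAILED":
                break                        # resubmit immediately
            time.sleep(POLL)
            waited += POLL
        cancel(job_id)                       # stuck in the queue: abandon this copy
    raise RuntimeError("no attempt started within the deadline")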

QoS References:

[GARA]

[G-QoSm] R. Al-Ali et al., "QoS Support for High-Performance Scientific Grid Applications", 2004 IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004).

[G-RSVPM] Zeng Wandan, Chang Guiran, Zhang Dengke, Zheng Xiuying, "G-RSVPM: A Grid Resource Reservation Model", First International Conference on Semantics, Knowledge and Grid (SKG 2005), November 2005.


Objectives

- accommodation of short and very short jobs within a batch-oriented infrastructure

- improvement of the efficiency (QoS) of the standard, batch-oriented infrastructure in the production environment

- accommodation of different application types (embarrassingly parallel applications are the most common, but also iterative, workflow, parameter sweep and divide-and-conquer): minimize the development and integration time

- multiple application deployment models: it is not possible to envisage a single deployment model for all applications and VOs - a flexible helper is needed which could accommodate the differences (for example: the software area in Athena or Geant4 versus a dynamic software repository on storage elements)

- decouple application-specific functionality from the execution patterns - provide a

Roadmap

- study and analyse the performance of the EGEE Grid - try to understand if anything can be done at the end-user level

MPI-BLAST: a management layer added for a data-parallel problem, implemented in MPI

Related Work

Nimrod, Condor-G

Characteristics

User-level scheduling and a component bus (reactor/proactor?).

Slot model for grid computing from end-user perspective: - tq, tq, p - hierarchical scheduling in the Grid: tq = tRB + tCE + tBS

QoS: - partial turnaround time - variation and predictability of job execution - fault-tolerance and reliability

Liabilities: - fair-share

Design and concepts

Feasibility studies

Activities

Experiments

Conclusions

Quotes

The development of user-level middleware will be critical to the wide-spread use of and application performance on the Computational Grid. [AppLes]


-- JakubMoscicki - 16 Dec 2006
