Workplan for the first year

The gLite job management services have already been used in production environments, and some new functionality addressing issues raised by users and administrators has already been identified (see below). Other requests will be raised by existing and new user communities over the course of the project, and it will be important to address them promptly.

One already identified area where work is needed is the design and implementation of general and adaptive feedback mechanisms. The High Throughput Computing paradigm typically involves a scenario where a given, estimated processing power is made available and sustained by the computing environment over a medium/long period of time. Consequently, the performance goals generally target maximizing resource utilization to obtain the expected throughput, rather than minimizing the run time of individual jobs. However, there are use cases which this model does not immediately fit. This is the case, for example, for computations that can be split into uniform, independent jobs (which can therefore be executed in parallel), but which must all complete successfully for the overall computation to be considered complete. Such scenarios require a Minimum Completion Time (MCT) dynamic scheduling policy, in which no job should remain stuck in a non-terminal Grid state. In the gLite job submission environment, where jobs are submitted to computing elements through the Workload Management System, this means implementing an adaptive feedback mechanism to migrate jobs stuck in blocking queues. A first solution implementing a general and adaptive feedback mechanism in the gLite job management services is planned for the first EMI major release.
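Purely as an illustration of the behaviour such a mechanism is meant to automate (not of the WMS-internal implementation), a user-level workaround with the standard gLite UI commands could look roughly like the sketch below; the job description file job.jdl, the one-hour threshold and the parsing of the command output are simplifications and placeholders:

    #!/bin/bash
    # Illustrative sketch only: resubmit a job that has been stuck in the
    # "Scheduled" state (i.e. queued on a CE) for more than one hour.
    # The job identifier is passed as the first argument.
    JOBID="$1"
    THRESHOLD=3600   # seconds a job may stay queued before being migrated
    START=$(date +%s)
    while true; do
        STATE=$(glite-wms-job-status "$JOBID" | grep "Current Status" | awk '{print $3}')
        case "$STATE" in
            Done*|Aborted|Cancelled|Cleared)
                break ;;                         # terminal states, nothing to do
            Scheduled)
                NOW=$(date +%s)
                if [ $((NOW - START)) -gt "$THRESHOLD" ]; then
                    # Cancel the queued job and submit it again, letting the
                    # WMS match it to a different computing element.
                    glite-wms-job-cancel --noint "$JOBID"
                    JOBID=$(glite-wms-job-submit -a job.jdl | grep "^https://")
                    START=$(date +%s)
                fi ;;
        esac
        sleep 60
    done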

Another area where work is needed is the improvement of parallel and MPI support. Although the EGEE projects (the projects in which the gLite middleware was developed and used) were initially focused on the HEP community, other communities joined later, such as chemistry, biology and medical imaging. In these communities the use of parallel programs is quite common. While the execution of parallel jobs is already supported by the gLite job management services, there are still some open issues, such as the inability to fully describe, and allocate accordingly, the resources in a multi-core environment. At the end of the EGEE-III project, the EGEE MPI Working Group produced a document with a set of recommendations for the middleware to address these shortcomings. These recommendations will be implemented in the CREAM-CE and in the gLite WMS for the first EMI release.

Implementation

Provision of a general adaptive feedback mechanism in the WMS

  • Work in the WMS is in progress. The plan is to release it with WMS version 3.3.
  • Support in the LB is being provided with LB 2.1.

Improvement of parallel and MPI support

08 Sept 2010

As a zeroth-order solution, the idea is to support the new attributes (smpgranularity, etc.) in the CREAM-CE within the cerequirements attribute.

E.g.: cerequirements = "smpgranularity == 4";

For this purpose, the xxx_local_submit_attributes.sh scripts (for LSF and PBS/Torque) have been implemented.
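As a rough, hypothetical sketch of what such a hook can do (assuming that the values coming from the cerequirements expression are exposed to the script as shell variables with the same names, and that whatever the script prints on stdout is inserted among the directives of the generated batch submit file), pbs_local_submit_attributes.sh could contain something like:

    #!/bin/sh
    # Hypothetical sketch for pbs_local_submit_attributes.sh.
    # Assumption: smpgranularity arrives here as a shell variable set from
    # the JDL cerequirements expression; the echoed line is meant to end up
    # among the #PBS directives of the generated submit file.
    if [ -n "$smpgranularity" ]; then
        echo "#PBS -l nodes=1:ppn=$smpgranularity"
    fi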

Some changes were also needed in the xxx_submit.sh scripts.

These scripts were given to some MPI users for testing and validation.

They can be tested via direct job submission to CREAM (i.e. with the glite-ce-job-submit command).
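For reference, a direct submission test might look like the following; the CE endpoint and queue name are placeholders to be replaced with real values:

    $ cat test_mpi.jdl
    [
      Executable = "/bin/hostname";
      StdOutput = "out.txt";
      StdError = "err.txt";
      OutputSandbox = { "out.txt", "err.txt" };
      cerequirements = "smpgranularity == 4";
    ]
    $ glite-ce-job-submit -a -r <cream-ce-host>:8443/cream-pbs-<queue> test_mpi.jdl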

Once these scripts are validated, the logic will be moved inside CREAM and BLAH (which will need to be modified), so that the new attributes can be used as first-level JDL attributes, e.g. cpunumber = 4;
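Just to give an idea (the exact attribute names will follow the MPI WG recommendations and may differ), a minimal JDL of this kind would then become possible:

    [
      Executable = "/bin/hostname";
      CpuNumber = 4;
      SMPGranularity = 4;
    ]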

-- MassimoSgaravatto - 06-Sep-2010
