Nr | Task | Who | Priority | Needed when | Verified | Status | Details | ETA |
---|---|---|---|---|---|---|---|---|
1. Installation | ||||||||
1.1 | Package dependencies defined | Luigi (JRA1) - Sara, Simone (SA1) | 10 | before first certification | No | Done according to development team. To be confirmed by SA3 | ||
1.2 | No redundant packages | Luigi (JRA1) - Sara, Simone (SA1) | 5 | a.s.p. | No | Done according to development team. To be confirmed by SA3 | - | |
1.3 | Common packages, including external packages, versions should be consistent with other node types | (JRA1) - Sara, Simone (SA1) | 4 | a.s.p. | No | Done according to development team. To be confirmed by SA3 | - | |
1.4 | The file locations should follow the standard convention | Luigi, Alvise (JRA1) | 6 | before first rollout | No | Done according to development team (logs moved to /opt/glite/var/log). To be confirmed by SA3 | ||
1.5 | Build on ETICS for SL4 with VDT-1.6 | (JRA1) | 10 | before first rollout | OK | Done | - | |
2. Configuration | ||||||||
2.1 | YAIM will be used and should be compatible with the component centric YAIM architecture and only configure what is needed | Sara, Simone, Cristina (SA1) | 10 | before first rollout | No | Done according to development team. To be verified by SA3 | ||
3. Documents | ||||||||
3.1 | Release notes | Luigi, Alvise (JRA1) | 10 | before PPS | No | Done according to development team. To be confirmed by SA3 | CREAM release notes published at: http://grid.pd.infn.it/cream/field.php?n=Main.ReleaseNotes | - |
3.2 | User guide for the clients | (JRA1) - (SA3) | 8 | before PPS | No | Done according to development team. To be confirmed by SA3 | For submissions to CREAM via WMS no specific guide is needed (i.e. the WMS guide is the proper documentation), since the user does not need to know the CE type. For direct submissions to CREAM (i.e. bypassing the WMS) a CREAM user guide and a CREAM JDL guide are available on the CREAM web site (http://grid.pd.infn.it/grid). | |
3.3 | Basic guide for operations covering the different deployment scenarios | (SA3) - (SA1) - (JRA1) | 8 | before prod | No | In progress | Besides the documentation for the YAIM based installation and configuration, some documentation targeted to sysadmins is available on the CREAM web site (http://grid.pd.infn.it/cream) | - |
4. Functionality | ||||||||
4.1 | Accounting system, APEL has to work | Alessio, Elisabetta (SA3) | 10 | before PPS | No | In progress | This was tested for LSF. The records get properly accounted, but it looks like there is a bug in APEL (#30041). Tests to be done for Torque. | |
4.2 | Information system, BDII will be used and should be able to publish VO tag (gridftp server is needed) and other runtime environment, correctly publish static and dynamic information using glue schema (version >= 1.3), sanity check | Cristina, Sara, Simone (SA1) | 10 | before PPS | No | Done according to development team. To be confirmed by SA3 | There isn't anything specific to CREAM. It is exactly the same stuff used in LCG CE and gLite CE. Done when Task 2.1 is done | |
4.3 | Security, proxy with VOMS extension has to be supported, CRL update | Luigi (JRA1) | 9 | before PPS | No | Done according to development team. To be confirmed by SA3 | - | - |
4.4 | Job submission through WMS and CLI on UI | Luigi (JRA1) | 9 | before PPS | Yes | Done | Job submission to CREAM is already possible via the WMS and also by interacting directly with CREAM (i.e. bypassing the WMS). An "official" CREAM CLI exists | - |
4.5 | Job submission through Condor-G | Massimo, Francesco, Luigi (JRA1) - Condor | 7 | later | No | In progress | Some work was done. The Condor developers need to be contacted again, since the CREAM interface had to be changed | |
4.6 | Batch system support, start with Torque and LSF, Condor and SGE later | Alessio, Elisabetta, Mezzadri, Prelz (SA3) - Luigi (JRA1) | 8 | before PPS | No | In progress | The interaction with the batch system is fully managed by BLAH, which already supports Torque/PBS and LSF (submissions to these batch systems via CREAM have been verified). The BLAH BLParser is being reimplemented, also to facilitate porting to new batch systems. This modification required some changes in the CREAM code as well. A first implementation of the new BLAH BLParser supporting Condor has been done. Basic tests have been done at PIC (submissions via WMS and via CREAM-CLI) and it seems to work (the only problem seen so far is that the ReallyRunning event is not logged by the LRMS: to be investigated). PIC people are going to do more tests. Once the new BLAH model proves to be reliable, it will also be used for LSF and PBS. | - |
4.7 | Support passing parameters to the batch systems | Luigi, Alvise (JRA1) - Elisabetta (SA3) | 7 | later | No | Done according to development team. To be confirmed by SA3 | CREAM implements this feature via BLAH, in the same way as the gLite CE. The JDL 'Requirements' attributes listed as 'CeForwardParameters' in the WMS conf. file are forwarded to BLAH (as 'CERequirements' in the classad sent to BLAH). The "local" scripts, invoked by the BLAH submission scripts, then have to be properly customized by the local sysadmin. This is explained in patch https://savannah.cern.ch/patch/?func=detailitem&item_id=1044 | |
4.8 | Support stdout and stderr monitoring | Luigi, Paolo (JRA1) | 5 | later | No | Done according to development team. To be confirmed by SA3 | Supported via 'Job perusal', for jobs submitted to CREAM via WMS and also directly from UI | |
4.9 | Support MPI | Luigi, Paolo (JRA1) - Barbera (NA4) | 5 | later | No | Done according to development team. To be confirmed by SA3 | MPI jobs supported for jobs submitted to CREAM via WMS and also directly from UI. Implemented the new functionality requested by the MPI WG of TCG | |
4.10 | Proxy renewal | Alvise, Moreno, Luigi (JRA1) Alessio, Elisabetta (SA3) | 10 | before PPS | No | In progress | Done. Known issue: from time to time BLAH reports that the proxy renewal operation succeeded although the proxy was not actually renewed. | |
4.11 | Support more than 5000 simultaneous jobs, less than 0.5% jobs fail due to CE | (JRA1) | 9 | before PPS | Yes | In progress | This was demonstrated in the CREAM tests done in the summer (see the test results). Being re-tested with the redesigned CREAM-ICE | - |
5. Operations | ||||||||
5.1 | Port list | (JRA1) - (SA3) | 10 | before certification | No | Done according to development team. To be confirmed by SA3 | List published in http://grid.pd.infn.it/cream/field.php?n=Main.PortsUsedInACREAMCE | 24/09/2007 |
5.2 | Long time unattended running, more than 5 days, eventually extend to 1 month | (JRA1) - (SA3) | 8 | later | No | In progress | To be tested | - |
5.3 | Logfile rotation | Sara, Simone, Cristina (SA1) | 7 | before prod | No | Done according to development team. To be confirmed by SA3 | CREAM and CEMon log file rotation implemented via log4j. For the other log files (glexec, blah) log rotation implemented within YAIM | |
5.4 | Audit trace management | Luigi (JRA1) | 10 | before PPS | No | Done according to development team. To be confirmed by SA3 | All the accesses are properly logged in the CREAM and glexec log files (the verbosity can be tuned) | |
5.5 | All services should be up after rebooting, and less than 0.5% jobs lost | Paolo (JRA1) | 6 | later | No | Blocked | This was already demonstrated during the summer tests: very few jobs were lost when the service was restarted. However, there is a known issue that occurs just after the restart of the service (bug #22437). The new VOMS (1.8) software is supposed to address this issue. Its integration requires some changes in the CREAM code (being done), but it must first be integrated in util-java, authz-framework and delegation-java | |
5.6 | Clean up pool accounts for dynamic mapping | Sara, Simone, Cristina (SA1) - (JRA1) | 10 | before prod | No | Done according to development team. To be confirmed by SA3 | Done by lcg-expiregridmapdir cron job | - |
5.7 | Clean up obsolete and temporary files, especially the files under the home directories of pool accounts | Alessio, Elisabetta (SA3) | 5 | before prod | No | Done according to development team. To be confirmed by SA3 | Done by cleanup-grid-accounts cron job | - |
5.8 | SAM monitoring integration | Sara, Simone, Cristina (SA3) | 8 | later | No | In progress | Need to contact the SAM people to understand in detail what has to be done (e.g. are there templates that can be reused?). This will start when task 2.1 is done | |
5.9 | Verify that no serious memory leaks are present | Alvise (JRA1) | 9 | before prod | No | In progress | CREAM and ICE seem OK. There is a memory leak in classad.jar. The Condor people have been pinged many times for a new jclassad with this leak fixed. For the time being, classad.jar needs to be replaced with a patched one as a post-install task. Serious leaks in ICE have been fixed, but some are still there. A "suicidal" patch has been implemented (under test) | |
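
The acceptance thresholds in tasks 4.11 and 5.5 (less than 0.5% of jobs failed or lost) reduce to a simple ratio check. The following is only a minimal sketch: the function name `meets_threshold` and the job counts in the usage lines are hypothetical illustrations, not figures from the CREAM tests.

```python
def meets_threshold(total_jobs: int, bad_jobs: int, max_rate: float = 0.005) -> bool:
    """Return True if the share of failed/lost jobs is strictly below max_rate.

    max_rate defaults to 0.005, i.e. the 0.5% limit quoted in tasks 4.11 and 5.5.
    """
    if total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    if not 0 <= bad_jobs <= total_jobs:
        raise ValueError("bad_jobs must be between 0 and total_jobs")
    return bad_jobs / total_jobs < max_rate


# Hypothetical counts, for illustration only.
print(meets_threshold(5000, 24))  # 24 of 5000 is 0.48%
print(meets_threshold(5000, 25))  # 25 of 5000 is exactly 0.5%, not strictly below
```

The comparison is strict because the requirement is worded as "less than 0.5%"; a site that reads the limit as inclusive would use `<=` instead.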