---++ CE Check List

These notes describe the check list for a release candidate CE.

| *Nr* | *Task* | *Who* | *Priority* | *Needed when* | *Verified* | *Status* | *Details* | *ETA* |
| *1. Installation* |||||||||
| 1.1 | Package dependencies defined | Luigi (JRA1) - Sara, Simone (SA1) | 10 | before first certification | In progress | In progress | Done, but being checked with task 2.1 |  |
| 1.2 | No redundant packages | Luigi (JRA1) - Sara, Simone (SA1) | 5 | a.s.p. | No | Done according to development team. To be confirmed by SA3 |  | - |
| 1.3 | Common packages, including external packages, versions should be consistent with other node types | (JRA1) - Sara, Simone (SA1) | 4 | a.s.p. | No | Done according to development team. To be confirmed by SA3 |  | - |
| 1.4 | The file locations should follow the standard convention | Luigi, Alvise (JRA1) | 6 | before first rollout | No | In progress | Verifying standard conventions in the developers guide | 24/09/2007 |
| 1.5 | Build on ETICS for SL4 with VDT-1.6 | (JRA1) | 10 | before first rollout | OK | Done | As of Nov. 12, 2007, all *.ce modules and ice build properly. See ETICS build reports | - |
| *2. Configuration* |||||||||
| 2.1 | YAIM will be used and should be compatible with the component centric YAIM architecture and only configure what is needed | Sara, Simone, Cristina (SA1) | 10 | before first rollout | No | In progress | Three known non-blocking problems: see here. Di is going to try the installation at CERN to double check that there aren't other issues. If there are none, the installation procedure can also be used at other external sites. Still to complete the documentation on the TWiki at CERN |  |
| *3. Documents* |||||||||
| 3.1 | Release notes | Luigi, Alvise (JRA1) | 10 | before PPS | No | Done according to development team. To be confirmed by SA3 | CREAM release notes published at: http://grid.pd.infn.it/cream/field.php?n=Main.ReleaseNotes and updated whenever a new version is released | - |
| 3.2 | User guide for the clients | (JRA1) - (SA3) | 8 | before PPS | No | Done according to development team. To be confirmed by SA3 | For submissions to CREAM via WMS no specific guide is needed (i.e. the WMS guide is the proper documentation) since knowing the CE type is not important. For direct submissions to CREAM (i.e. bypassing the WMS) a CREAM user guide along with a CREAM JDL guide is available on the CREAM web site (http://grid.pd.infn.it/grid). |  |
| 3.3 | Basic guide for operations covering the different deployment scenarios | (SA3) - (SA1) - (JRA1) | 8 | before prod | No | In progress | Besides the documentation for the YAIM-based installation and configuration, some documentation targeted at sysadmins is available on the CREAM web site (http://grid.pd.infn.it/cream) under "Administrator Guides". This is being augmented | - |
| *4. Functionality* |||||||||
| 4.1 | Accounting system, APEL has to work | Alessio, Elisabetta (SA3) | 10 | before PPS | No | In progress | This was tested for LSF. The records get properly accounted, but it looks like there is a bug in APEL (#30041). Tests to be done for Torque. |  |
| 4.2 | Information system, BDII will be used and should be able to publish VO tag (gridftp server is needed) and other runtime environment, correctly publish static and dynamic information using glue schema (version >= 1.3), sanity check | Cristina, Sara, Simone (SA1) | 10 | before PPS | No | In progress | There isn't anything specific to CREAM. It is exactly the same stuff used in LCG CE and gLite CE. Done when task 2.1 is done | 01/10/2007 |
| 4.3 | Security, proxy with VOMS extension has to be supported, CRL update | Luigi (JRA1) | 9 | before PPS | No | Done according to development team. To be confirmed by SA3 | - | - |
| 4.4 | Job submission through WMS and CLI on UI | Luigi (JRA1) | 9 | before PPS | Yes | OK | Job submission to CREAM is already possible via the WMS and also by interacting directly with CREAM (i.e. bypassing the WMS). An "official" CREAM CLI exists | - |
| 4.5 | Job submission through Condor-G | Massimo, Francesco, Luigi (JRA1) - Condor | 7 | later | No | In progress | The integration of CREAM and Condor-G has already started; some simple jobs have been correctly submitted to CREAM (problem with output sandbox transferring); basic Condor-G->CREAM operations implemented (to be tested). This will have to be revised when the new CREAM software (see task 4.10) is ready. CEMon integration for asynchronous notification of job status changes still to be done. | 31/10/2007 |
| 4.6 | Batch system support, start with torque and LSF, Condor and SGE later | Alessio, Elisabetta, Mezzadri, Prelz (SA3) - Luigi (JRA1) | 8 | before PPS | No | In progress | The interaction with the batch system is fully managed by BLAH, which already supports Torque/PBS and LSF (submissions to these batch systems via CREAM have been verified). The BLAH BLParser is being reimplemented, also to facilitate the porting to new batch systems. This modification will require some changes in the CREAM code as well. A first implementation of this new BLAH BLParser supporting Condor is expected by end of November. The teams responsible for Condor and SGE support have been informed that customizing the current implementation of the code doesn't make much sense since, as said above, the BLAH BLParser is being redesigned | - |
| 4.7 | Support passing parameters to the batch systems | Luigi, Alvise (JRA1) - Elisabetta (SA3) | 7 | later | No | Done according to development team. To be confirmed by SA3 | CREAM implements this feature via BLAH, in the same way as the gLite CE. The JDL 'Requirements' attributes listed as 'CeForwardParameters' in the WMS configuration file are forwarded to BLAH (as 'CERequirements' in the classad sent to BLAH). The "local" scripts, invoked by the BLAH submission scripts, then have to be customized by the local sysadmin. This is explained in patch https://savannah.cern.ch/patch/?func=detailitem&item_id=1044 and in https://twiki.cern.ch/twiki/bin/view/EGEE/INFN_Test_Results. For direct submissions to the CREAM CE, the CREAM JDL 'CERequirements' attribute can be used, as documented in the CREAM JDL guide | 11/10/2007 |
| 4.8 | Support stdout and stderr monitoring | Luigi, Paolo (JRA1) | 5 | later | No | Done according to development team. To be confirmed by SA3 | Supported via 'Job perusal', for jobs submitted to CREAM via WMS and also directly from UI |  |
| 4.9 | Support MPI | Luigi, Paolo (JRA1) - Barbera (NA4) | 5 | later | No | In progress | MPI jobs supported for jobs submitted to CREAM via WMS and also directly from UI. Still to implement the new functionality requested by the MPI WG of TCG |  |
| 4.10 | Proxy renewal | Alvise, Moreno, Luigi (JRA1) Alessio, Elisabetta (SA3) | 10 | before PPS | No | In progress | In the current implementation of CREAM/ICE, proxy renewal is implemented, but there are known problems occurring when the load on the system is high. This is being addressed now. It required a code redesign in both ICE and CREAM (e.g. in CREAM a DB will be used for the backend). This work is also going to improve the scalability and the efficiency of the system, but is taking longer than originally expected |  |
| 4.11 | Support more than 5000 simultaneous jobs, less than 0.5% jobs fail due to CE | (JRA1) | 9 | before PPS | Yes | OK | This was demonstrated in the CREAM tests done in the summer (see the test results). This will have to be re-demonstrated when the on-going redesign of the system (see task 4.10) is done | - |
| *5. Operations* |||||||||
| 5.1 | Port list | (JRA1) - (SA3) | 10 | before certification | No | Done according to development team. To be confirmed by SA3 | List published in http://grid.pd.infn.it/cream/field.php?n=Main.PortsUsedInACREAMCE and communicated to John White for its inclusion in org.glite.site-info.ports/doc/middleware-ports.txt | 24/09/2007 |
| 5.2 | Long time unattended running, more than 5 days, eventually extend to 1 month | (JRA1) - (SA3) | 8 | later | No | In progress | To be done when the new software (see task 4.10) is ready | - |
| 5.3 | Logfile rotation | Sara, Simone, Cristina (SA1) | 7 | before prod | No | Done according to development team. To be confirmed by SA3 | CREAM and CEMon log file rotation implemented via log4j. For the other log files (glexec, blah) log rotation implemented within YAIM |  |
| 5.4 | Audit trace management | Luigi (JRA1) | 10 | before PPS | No | Done according to development team. To be confirmed by SA3 | All the accesses are properly logged in the CREAM and glexec log files (the verbosity can be tuned) | 10/10/2007 |
| 5.5 | All services should be up after rebooting, and less than 0.5% jobs lost | Paolo (JRA1) | 6 | later | No | Blocked | Still failing the first connection after start-up, waiting for feedback from MSWG | 1 week after being unblocked |
| 5.6 | Clean up pool accounts for dynamic mapping | Sara, Simone, Cristina (SA1) - (JRA1) | 10 | before prod | No | Done according to development team. To be confirmed by SA3 | Done by lcg-expiregridmapdir cron job | - |
| 5.7 | Clean up obsolete and temporary files, especially the files under the home directories of pool accounts | Alessio, Elisabetta (SA3) | 5 | before prod | No | Done according to development team. To be confirmed by SA3 | Done by cleanup-grid-accounts cron job | - |
| 5.8 | SAM monitoring integration | Sara, Simone, Cristina (SA3) | 8 | later | No | In progress | Need to contact SAM people to understand in detail what has to be done (e.g. are there some templates that can be considered?). This will start when task 2.1 is done | unknown |
| 5.9 | Verify that no serious memory leaks are present | Alvise (JRA1) | 9 | before prod | No | In progress | CREAM seems ok. Memory leaks in ICE likely due to leaks in globus and gridsite libraries. Temporary fix is to implement the “harakiri patch” (more or less the “suicidal patch” used in WMProxy). ICE memory usage reduction being done as well |  |
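
Tasks 4.11 and 5.5 both set a numeric acceptance threshold (more than 5000 simultaneous jobs, less than 0.5% of jobs failed or lost). A minimal sketch of how such a threshold could be evaluated from the counts of a test run is given here. The function names and the idea of feeding it plain job counts are assumptions made for illustration; they are not part of CREAM, ICE or any test suite mentioned in these notes.

```python
from fractions import Fraction

# Thresholds quoted in tasks 4.11 and 5.5 of the check list.
MAX_FAILURE_RATE = Fraction(5, 1000)   # "less than 0.5%"
MIN_SIMULTANEOUS_JOBS = 5000           # "more than 5000 simultaneous jobs"


def failure_rate(total_jobs: int, failed_jobs: int) -> Fraction:
    """Return the exact fraction of jobs that failed (hypothetical helper)."""
    if total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    if not 0 <= failed_jobs <= total_jobs:
        raise ValueError("failed_jobs must be between 0 and total_jobs")
    return Fraction(failed_jobs, total_jobs)


def meets_load_criterion(total_jobs: int, failed_jobs: int) -> bool:
    """True if the run had more than 5000 jobs and under 0.5% failures.

    Assumption: total_jobs is the number of jobs that were in the CE
    simultaneously, and failed_jobs counts only failures due to the CE.
    """
    return (total_jobs > MIN_SIMULTANEOUS_JOBS
            and failure_rate(total_jobs, failed_jobs) < MAX_FAILURE_RATE)
```

Exact fractions are used instead of floats so that a run sitting exactly on the 0.5% boundary is rejected, since the requirement says "less than". For example, `meets_load_criterion(6000, 29)` is true, while `meets_load_criterion(6000, 30)` is false because 30/6000 is exactly 0.5%.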

Test results for LCG-CE on SL4: LCG-CE

Test results for gLite-CE on SL3: gLite-CE SL3

Test results for cream on SL3: cream SL3

-- Main.markusw - 09 Aug 2007 -- DiQing - 09 Aug 2007

Topic revision: r39 - 2007-11-22 - MassimoSgaravatto
 