Release Planning

31/01/06: LCG-2_7_0 released.

The list can be sorted by clicking on the appropriate table header.

Timetable

  • Mon 9th Jan - tag and begin local testing of installations and upgrades on mini testbeds, complete documentation
  • Mon 16th Jan - release to 3 ROCs for a week of further testing
  • Mon 23rd Jan - incorporate results of ROC testing and release ASAP

The List

This is the proposed list for LCG-2_7_0.

| *Category* | *Item* | *Responsible* | *Priority (1-5)* | *Complete (%)* | *Comments* |
| Critical Bug Check | RB, BDII, CE, PX, SE_classic, R-GMA, GFAL, LFC, SE_DPM | Oliver | 5 | 100 | LCG Operations: severity 5, no release blockers. JRA1 Middleware (priority critical, severity critical): 23 bugs, of which 16 are for components we are deploying (all R-GMA); the R-GMA team reports that only one is genuinely critical (inconsistent archivers), and it is present in 1.4 anyway |
| OSG | Components needed for OSG interoperation | Laurence | 5 | 100 | None required |
| Docs | Install Guide & YAIM docs | Antonio and Alessandro | 5 | 100 | Done |
| Docs | Upgrade Guide | Antonio and Alessandro | 5 | 100 | https://twiki.cern.ch/twiki/bin/view/LCG/LfcVirtualIdsAndVOMS, https://twiki.cern.ch/twiki/bin/view/LCG/MultiDomainDpm, R-GMA upgrade procedure |
| Docs | Site Test Guide | Laurence | 5 | 100 | Unchanged |
| Docs | quickUI, Tar-Dist-Use | Oliver | 5 | 100 | Unchanged |
| Docs | Release Notes | Oliver/Markus | 5 | 100 | |
| Docs | Site Setup | Laurence | 5 | 100 | Add specs for production top-level BDIIs |
| VOMS | New server/client version & paraphernalia | Maria/Louis | 5 | 100 | http://cern.ch/dimou/lcg/voms/voms-lcg2-2_7_0.html contains release notes, the new "vomses" and "edg-mkgridmap.conf"; the VOMS server is to be a meta-package |
| VOMS | lcas/lcmaps group configuration via YAIM | Maarten/Oliver | 2 | 100 | Important to CMS; issue now understood; YAIM updates done; gridmap file for DPM/LFC done too (mapping DN to VO) |
| VO-BOX | The VO-BOX is published in the info system; we need a mechanism that allows the VOs on the box to add their own key-value pairs | Patricia | 3 | 100 | In cert |
| VO-BOX | Include all UI functionality (i.e. RPMs and configuration) on a VO-BOX | Simone/Oliver/Louis | 4 | 100 | In cert |
| Info system | Clean-up of the info providers to make full use of the new GLUE schema | Laurence | 2 | 100 | S. Burke to advise; ongoing work; will not be complete by 2.7.0, but whatever is ready can go in |
| Info system | Add the GlueSchemaLocation for experiment-installed software | Oliver | 3 | 100 | RPMs in cert; YAIM configures the GIP plugin wrapper |
| Info system | New GIP configuration / YAIM | Oliver | 5 | 100 | Under test, in cert |
| Info system | Jeff Templon's ETT (the NIKHEF ETT RPMs) | Jeff | 5 | 100 | YAIM done; RPMs in cert |
| LFC/DPM | New LFC methods for performance improvement | Jean-Philippe | 5 | 100 | Done |
| LFC/DPM | Multi-domain DPM | Jean-Philippe | 3 | 100 | Code written, testing OK; installation from scratch has been tested successfully; migration of an existing DPM has been scripted and is in test; migration needs no DB schema change but does need a DB content update; DPM 1.4.5 in cert, awaiting LFC |
| LFC/DPM | VOMS-enabled LFC | Jean-Philippe | 4 | 100 | In certification; server done; needs a schema change, so cannot be an update |
| LFC/DPM | lcg-info-dynamic-dpm | Graeme, Laurence | 3 | 100 | In cert |
| lcg_utils/GFAL | New version: many bug fixes, improved error messages | James | 4 | 100 | Ready for cert |
| Backup | Backup mechanism for the mission-critical DBs at the T2s and smaller centers (mainly DPM, local LFCs and dCache internal DBs), which are critical when lost. It has to be simple, with an option to send the backup up the chain to the corresponding T1 center (data management tools?). Probably supplied in the first instance as a HOWTO | Piotr | 2 | 100 | Document available here: MysqlReplicationAndBackup. Please check for mistakes. A sketch follows this table |
| Separation of state and processing | RB, MyProxy | Yvan | 1 | 100 | Config with a single active node in test and working; not yet on the cert testbed; docs and RPMs referenced in the release notes |
| StdOut/Err Mon | Update, reflecting the input we receive from users (some feedback has already been given) | Patricia/Di | 3 | 100 | In cert; usable (?) but a number of bugs remain unfixed; Di on the case; one day (hopefully) for something usable |
| StdOut/Err Mon | Ensure these are off by default | Patricia | 5 | 100 | |
| Job Mon | Ensure it is off by default; update based on user input; modify the R-GMA tools to take advantage of the new exceptions and error checking | Laurence | 4 | 100 | Requires a few mods and testing; one day of effort |
| Job Status Mon | Add to the release notes that this should be turned off at the RB level if required by local policy/law | Oliver | 5 | 100 | |
| RB | We need to enable queries such as "show me the state of all jobs of my VO"; these are the queries the experiments would like to see | Laurence | 3 | 100 | The job status monitoring already provides this |
| RB | New WMS packages with bug fixes | David | 5 | 100 | In cert, working |
| RB | Sandbox: add a smart mechanism that limits the output sandbox size on the RB (a mechanism for the input sandbox is already in place; the recently observed jobs with stderr files > 2 GB can bring down any RB). The mechanism should work like this: the limit has to be configurable; sort all the files in reverse order by size; transfer all the files that fit into the limit; for the remaining files, transfer the first and last 100 KB plus a note on the original size of the file. A sketch follows this table | David | 4 | 100 | In cert, and deployed on our own RBs |
| RB | Sandbox cleanup cron | Maarten/David | 3 | 100 | In cert |
| Security | Pool account recycling | Maarten | 2 | 100 | expire-gridmapdir in cert; cleanup-grid-accounts in cert |
| Security | Signed RPM distribution | Louis | 3 | 100 | Done for new RPMs |
| VO management via YAIM | A web-based tool that displays all VOs, each with a short description and a comment by the ROC managers of the site's region. The site selects the VOs and assigns shares (in case the site uses a fair-share scheduler); the tool then creates the VO-dependent information for YAIM. A clear distinction between pilot VOs and others has to be made | Dimitar/Oliver | 3 | 100 | Installed at CERN for permanent hosting: https://lcg-sft.cern.ch/yaimtool/yaimtool.py. Basic testing complete |
| VO management via YAIM | If the web tool is not ready, make sure info for as many VOs as possible is shipped with YAIM; ensure GEANT4 and UNOSAT are there | Oliver | - | 100 | All VOs with registered contacts have been asked for their YAIM variables; YAIM is distributed with a separate file of VO information |
| VO management via YAIM | Default VOs are the 4 LHC experiments + biomed + dteam | Oliver | 3 | 100 | MIS removed |
| Monitoring | Remove the GridFTP monitor from the CE | Louis | 4 | 100 | Cannot use 'obsoletes' because that breaks combined nodes; mentioned in the release notes; this should now be on SEs only (not the CE or RB) |
| R-GMA | Inclusion of the latest R-GMA (gLite 1.5) | Laurence | 4 | 100 | Latest RPMs in testing with the job status tables; the new RPMs fix recent memory problems and currently look like an improvement on 1.4; an upgrade procedure is required, as 1.4 clients cannot communicate with a 1.5 server; turn off non-authenticated connectors |
| FTS Clients | Ensure we deploy the correct FTS clients | Gavin | 5 | 100 | In cert |
| d-Cache | New d-Cache, 1.6.6-4 | Maarten, Owen, Oliver | 1 | 100 | Simpler configuration; bug fixes; pnfs now on PostgreSQL; billing/audit database; audit for all transactions except srm-copy; best with PostgreSQL 8.1; the Rutherford Tier 1 has been running this since Dec 05; supported with a patched 2.6.0 YAIM for now, Owen's work to come a little later |
| SFT | Latest SFT RPMs | Piotr | 2 | 100 | |

Priority: 1 is low, 5 is high.
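
A minimal sketch of the output sandbox limit described in the "RB Sandbox" item above, walking through the four steps in shell. It is a sketch only, under assumed names: transfer_file stands in for the RB's real transfer step, and SANDBOX_DIR and the 100 MB limit are illustrative, not the actual WMS implementation.

<verbatim>
#!/bin/sh
# Sketch of the proposed output sandbox policy (illustrative names only).
LIMIT=$((100 * 1024 * 1024))   # the limit has to be configurable (bytes)
CHUNK=$((100 * 1024))          # first/last 100K kept for oversized files
used=0
for f in $(ls -Sr "$SANDBOX_DIR"); do          # sort in reverse order by size
    size=$(stat -c %s "$SANDBOX_DIR/$f")
    if [ $((used + size)) -le "$LIMIT" ]; then
        transfer_file "$SANDBOX_DIR/$f"        # fits within the limit
        used=$((used + size))
    else
        # Too large: ship the first and last 100K plus a note on the true size.
        head -c "$CHUNK" "$SANDBOX_DIR/$f" > "$SANDBOX_DIR/$f.head"
        tail -c "$CHUNK" "$SANDBOX_DIR/$f" > "$SANDBOX_DIR/$f.tail"
        echo "original size of $f: $size bytes" > "$SANDBOX_DIR/$f.note"
        transfer_file "$SANDBOX_DIR/$f.head" "$SANDBOX_DIR/$f.tail" "$SANDBOX_DIR/$f.note"
    fi
done
</verbatim>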
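
For the "Backup" item, a nightly cron job along the following lines is the sort of thing the HOWTO describes. The database names (cns_db/dpm_db), paths and T1 destination below are assumptions; the MysqlReplicationAndBackup document remains the reference.

<verbatim>
#!/bin/sh
# Sketch of a nightly MySQL dump shipped up the chain to the T1 (illustrative).
STAMP=$(date +%Y%m%d)
BACKUP_DIR=/var/lib/mysql-backups
T1_DEST=backup.t1.example.org:/backups/$(hostname)   # assumed destination
mkdir -p "$BACKUP_DIR"
for db in cns_db dpm_db; do                  # e.g. the LFC and DPM databases
    mysqldump --single-transaction -u root -p"$MYSQL_PASSWORD" "$db" \
        | gzip > "$BACKUP_DIR/$db-$STAMP.sql.gz"
done
# Option to send the backup up the chain to the corresponding T1 center.
scp "$BACKUP_DIR"/*-"$STAMP".sql.gz "$T1_DEST/"
</verbatim>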

The multi-VO FTS service will be released independently using the gLite distribution.

No longer proposed for 2.7.0, under discussion for subsequent releases

| *Category* | *Item* | *Responsible* | *Priority (1-5)* | *Complete (%)* | *Comments* |
| VO-BOX | List of trusted domains/networks for iptables configuration | Romain | 4 | - | Does this need to be redelegated? |
| Info system | Update the info providers to publish versions for each service | Laurence | 3 | 0 | Lots of work; needs to be done per service; not started |
| Info system | Add a key-value pair to the info provider that declares a service as being part of production. This is already published for some services, but YAIM is not configuring it | Laurence | 3 | 0 | Not started; lots of work; needs a new GLUE attribute |
| Info system | VOMS server info provider | Laurence | 3 | 0 | Not started; will not be ready for 2.7.0 |
| LFC/DPM | srm-copy | Jean-Philippe | - | 50 | Not ready yet; still possible depending on priorities |
| LFC/DPM | Using the internal catalogue as a local file catalogue | Jean-Philippe | - | - | No code change was required, but it is untested; not ready for 2.7.0 |
| LFC/DPM | VOMS-enabled DPM | Jean-Philippe | 0 | 0 | Alpha; needs a schema change, so cannot be an update |
| LFC/DPM | Read-only LFC (replication) | Jean-Philippe | 0 | 0 | Oracle -> Oracle replication in test on the DB testbed; Oracle -> MySQL replication currently unavailable |
| LFC/DPM | Python and Perl interfaces to DPM | Jean-Philippe | 3 | 90 | Missing a couple of new methods for Perl/Python |
| Separation of state and processing | CE | Andrey | - | - | Will not be ready for 2.7.0 |
| Job monitoring tools (job mon) | Clarify that the reported CPU and wall-clock times are correct; in particular the double-zero values should be understood | Maarten | 2 | 0 | Not started; not for 2.7.0 |
| Security | Make the fork jobmanager properly authorized, as previously discussed | Maarten/David | 4 | 0 | Could be ready if made top priority; testing is crucial; would not cover all fork jobmanager abuses but would be a big improvement; can be put out as an upgrade |
| BDII | The top-level BDIIs should be published as services by the site BDII | Laurence | 3 | 0 | Not started; difficult |
| BDII | We need an info provider that can add a few values reflecting the load on the node | Laurence | 3 | 50 | Script done, but has to wait for the BDIIs to be published as a service |
| Monitoring | End-to-end monitoring (e2e monit, former WP7) on the MON box; must be enabled by a switch in site-info.def | Martin Swaney, James | 1 | 50 | RPMs have been delivered; not ready; not critical |
| VDT/Globus | Upgrade to a more recent version of VDT and Globus; this should be synchronized with OSG and gLite | Maarten | - | - | When? This will be important |
| MPI | Improve MPI support | - | 1 | 0 | This will become important for gLite 3 |
| Torque | Torque 2 is available | - | 0 | 0 | See bug #14260. A couple of new directories are needed, the init.d scripts have changed names, and the RPMs are organized differently; the config files are unchanged; a jobmanager patch is required |

Other wishlist stuff, to be tabulated

  1. All the gLite components ready for the road in time. This depends on how well gLite 1.4.1/1.5 performs on the pre-production service.
    • Goal: gLite WLM (RB, CE, and UI)
    • Modifications needed for interoperation (gLite WLM (broker and UI) + LCG-2 CEs and WNs)
    • WLM (broker):
      • Information provider
      • Verification that broker info file is there
      • Jobwrapper scripts
      • Logging and Bookkeeping and monitoring to R-GMA
      • Tarballisation of gLite UI
      • State externalization (can be done later, needed only for building a resilient system)
  2. MyProxy server consolidation
    • This service is becoming more and more used and we need to find ways to manage the access.
    • A VO for services could be introduced, with roles to allow fine-grained control. There is currently a limit of 16 roles per VO; we should check how this can be changed. No mapping is involved; the roles are just there to simplify the control configuration.
  3. MySQL node type
    • All services except the R-GMA MON box and the RB can use one DB located on one node.
  4. Add squid as a service for ATLAS and CMS on the T1s, motivation has been provided by Rod Walker.
    • We in ATLAS have several use cases for sites having a caching web proxy. I would think other VOs may make use of it too.
      • Ad hoc user code/data distribution.
      • Conditions/geometry/calibration data, either flat file or FronTier.
      • A proxy helps with private-network clusters.
    • The installation needed would be standard, except for an increase in the maximum size of cached objects. An environment variable LCG_HTTP_PROXY could be set; squid would be used only if this is copied to HTTP_PROXY, so it would have no impact on other users (see the sketch after this list).
    • Configuration
      • max cached object of say 2GB
      • cache size of 50GB for one VO
      • default cache turnover policy
      • location, location, location. Maybe SE?
    • NOTE: The above is based on input from Rod Walker
  5. Pilot jobs - A service on the CE is needed to allow pilot jobs on the WN to announce a change of user for the running job. LHCb and other VOs submit pilot jobs that pull user jobs from their own task queues. To allow the site to make the final decision on who can run, and to produce proper traces, the VO's framework needs to contact the gatekeeper and request that it accept or reject the change of user.
  6. Pool account clean-up cron job - Maarten and Jeff
  7. VO-BOX - Job submission capability via Condor; only ALICE was interested, and they no longer seem to be...
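
For the squid proposal in item 4, the non-default parts of the configuration might look as follows. This is a sketch only: the config path, cache sizes and proxy host are illustrative, and squid is only used if a job copies LCG_HTTP_PROXY to HTTP_PROXY itself.

<verbatim>
# Sketch of the squid settings outlined in item 4 (illustrative values).
cat >> /etc/squid/squid.conf <<'EOF'
maximum_object_size 2097152 KB                 # cached objects up to ~2 GB
cache_dir ufs /var/spool/squid 51200 16 256    # ~50 GB of cache for one VO
EOF
# Jobs opt in by copying this to HTTP_PROXY; other users are unaffected.
export LCG_HTTP_PROXY=http://squid.example.org:3128
</verbatim>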