PPS Pilot Follow-up Meeting Minutes Thu 11 Dec 2008

  • Date: Thu 11 Dec 2008
  • Agenda: n/a
  • Description: Pilot of SLC5 WN at CERN-PROD
  • Chair: Antonio Retico


  • PPS: Antonio Retico
  • CERN: Ulrich Schwickerath, Harry Renshall
  • CMS: Andrea Sciabà
  • Alice: apologies
  • LHCb: apologies
  • Atlas: absent

NOTE: The purpose of this meeting is to agree on a short- and medium-term timeline for the pilot activities.

Review of action items (tasks)

Status of the subtasks of TASK:8350 (see the PPS tracker).

No open tasks to discuss

Status and results of the pilot service (by VOs and sites)

Harry expresses a concern about the CERN libraries, which are not included in the SLC5 distribution.

Ulrich: I raised the potential issue with the OS providers, but apparently including them would cause problems (the binaries would not work). Alice has, however, assured us that the OS works perfectly with their applications. Perhaps they ship the libraries themselves.

Ulrich: We will deploy only what comes with the operating system: one version of python (the default, 2.5) and one of gcc (4.1). I have no RPMs that could be installed without conflicting with these versions, which are required by the OS.

Andrea: No problems have been reported with that, but I don't think CMS has actually tried to compile the application software. For running, it is fine. Has this been discussed with the people in charge of the Applications Area?

Ulrich: Not with me.

Andrea: CMS tested the SLC5 WNs with ProdAgent submitting Monte Carlo jobs. 55 events were reconstructed with 11 jobs. Having more resources available would help to build a statistically significant sample. I am going to ask the CRAB developers to reconfigure their application to include the SLC5 WNs. In that respect, an official communication to CMS would help.

Harry: There is a statement from IT that the migration to SLC5 is going to happen next year in any case and that the ramp-up will start in January. This probably needs more publicity, for example by sending a detailed plan in advance to info-experiments.

Ulrich proposed his timeline for the upgrade of the system:

  • Before Christmas we want to test the WN upgrade we have just received, which includes VDT 1.10 and Java 1.6. The experiments will be asked to try it just before Christmas. A briefing with Roberto Santinelli (LHCb) revealed that DIRAC3 currently cannot cope with the published value glueCEStateStatus='Preproduction' because of very subtle problems. Therefore, the only way to allow all the experiments to test is to reserve two days for LHCb before Christmas, during which ce110 will publish 'Production'.

NOTE: The proposed days for LHCb to test are Tuesday 16 and Wednesday 17 December.

CMS has no objections and will inform the testers accordingly.

Antonio: are you still waiting for something from the integration team as far as the rpms are concerned?

Ulrich: There is nothing to wait for from the integration team. The quattorization of the nodes is complete. (UPDATE 12/12: A first test went through. One problem was spotted related to gssklog: automatic grabbing does not work, possibly because of the VDT update; this needs to be checked. This affects jobs that need access to AFS.)

  • We plan to gradually increase the resources in January. We will receive new nodes in January that will be allocated directly to the SLC5 subcluster. We are thinking about draining two CEs during the Christmas break and converting them to SLC5. After the Christmas break we will publish glueCEStateStatus='Production'. New nodes will go to SLC5 only; later, existing SLC4 resources will be gradually migrated.

Andrea: How will the SLC5 nodes be identified with respect to the OS?

Ulrich: By the OS version.

Andrea: CRAB and ProdAgent currently don't select on the OS. It can be done eventually, in case of problems. Is there a standard for how a site should publish the OS now?

Antonio: Technically there isn't, but a gstat test is applied and monitored in production, so in practice it is a value the sites have to comply with.

Harry: The method is exactly the same as the one used for the migration from SLC3 to SLC4.

Recommendations for release and deployment

The following timeline was discussed and agreed by the participants:

  • by the end of this week: upgrade to the new WN version
  • next week: test by the experiments, with two days (Tuesday and Wednesday) reserved for LHCb
  • Xmas break: draining two CEs from the production set
  • January: ramp-up of the resources with new nodes (to be delivered in January)
  • end of January: formal opening of the production SLC5 subcluster (end of the pilot; ce110 is released for other tests)

Decision about termination/extension of the pilot

The pilot phase could end at the end of January (see the timeline above).

Next check point: 8th January 2009


