Pilot Follow-up Meeting Minutes Fri 18 Dec 2009

  • Date: Fri 18 Dec 2009
  • Agenda: 76281
  • Description: Pilot of glexec/Argus : Check-point
  • Chair: Antonio Retico
  • Home: PilotServiceArgus

Attendance

  • Operations/SA1: Antonio Retico
  • Certification/SA3: Gianni Pucciani
  • Development/JRA1: Christoph Witzig, Chad La Joie
  • SRCE: Emir Imamagic, Nikola Garafolic
  • Switch: Apologies
  • Cesnet: -
  • FZK: Apologies
  • INFN-CNAF: Andrea Ceccanti
  • SAM/Nagios: -
  • CMS:
  • ATLAS:
  • Alice:
  • LHCb: -
  • WLCG (Pilot Jobs Working Group): Maarten Litmaath

Review of action items (tasks)

SA1/SA3 tasks

Status of the subtasks of TASK:12720 (see them in the PPS tracker).

Notes:

The repository at CNAF is now fully operational. Closed TASK:12721.

New tasks for the sites to start installation will be submitted

Other tasks


  • Assigned to: GiuseppeMIsurelli
  • Due date: 2010-02-19
  • Description: Provide a report describing the issues CNAF is facing in installing glexec on the WNs.
    INFN-T1 is experiencing stability problems with GPFS interacting with the WN-on-demand system adopted locally at the resource center. Since they decided to provide virtual WNs for the pilot, the issue consequently affects the deployment of the glexec WN component at the site.
  • Closed: 2010-02-19
  • Notify: MaartenLitmaath

  • Assigned to: GianniPucciani
  • Due date: 2010-02-19
  • Description: Provide a functional specification of the glexec tests being implemented at SRCE.

  • Assigned to: ChadLaJoie
  • Due date: 2010-02-03
  • Description: Provide instructions on how to preserve local policies when upgrading the Argus server to a newer version, both in an e-mail to the sites and in PATCH:3536.
    This was done on the 2nd of February and is now available at https://savannah.cern.ch/patch/?3536
  • Closed: 2010-03-02
  • Notify: MaartenLitmaath

Actions related to development that had yet to be started were closed, because those items are now tracked through bugs and patches.

Timeline for installation at CNAF T1 presented by Andrea:
We are converging on CNAF T1 being the first site to offer a large-scale installation to the experiments. They are currently working on the installation of the glexec WN at the T1 and hope to finish before Christmas, or alternatively by the 6th of January, in order to be ready by the 15th. If the preliminary tests now under way succeed, they are OK to use Argus 1.1 once it reaches the status "Ready for Certification".

Status and results of the pilot service (by VOs and sites)

Proposal by Jose Caballero on tests to be done by admins/developers

  1. Check that none of the problems we found with gLExec/SCAS appear again:
    verify that gLExec/Argus work fine with proxies that have expired and been re-generated, and with proxies with at least two levels of delegation.
  2. Check that requests for different end users arriving very close in time each get the right response, i.e. that the new IDs are not swapped and that both users do not incorrectly get the same ID.
  3. Check that a sustained request rate works OK. I think that has been done already.
  4. Check what happens with peaks: a huge number of requests within the same time window.
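As an illustration only, the checks in points 2 and 4 could be sketched along the following lines. Everything here is an assumption made for the example: `FakeMapper` is a hypothetical in-memory stand-in for the authorization service, the DNs and pool account names are invented, and a real test would replace `mapper.authorize` with an actual call through gLExec or the PEP client.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class FakeMapper:
    """Hypothetical stand-in for the service: maps a user DN to a pool account."""

    def __init__(self, pool):
        self._free = list(pool)
        self._leases = {}
        self._lock = threading.Lock()

    def authorize(self, dn):
        # The lock makes lookup-and-assign atomic, so two DNs arriving
        # together cannot be handed the same pool account.
        with self._lock:
            if dn not in self._leases:
                self._leases[dn] = self._free.pop(0)
            return self._leases[dn]


def check_mappings(authorize, dns, repeats=20, workers=16):
    """Send interleaved requests in parallel and return the problems found."""
    requests = [dn for _ in range(repeats) for dn in dns]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the requests.
        results = list(pool.map(authorize, requests))

    accounts_by_dn = defaultdict(set)
    dns_by_account = defaultdict(set)
    for dn, account in zip(requests, results):
        accounts_by_dn[dn].add(account)
        dns_by_account[account].add(dn)

    problems = []
    for dn, accounts in accounts_by_dn.items():
        if len(accounts) > 1:  # the ID changed between requests (swap)
            problems.append(f"{dn} received several accounts: {sorted(accounts)}")
    for account, owners in dns_by_account.items():
        if len(owners) > 1:  # two users incorrectly share one ID
            problems.append(f"{account} shared by: {sorted(owners)}")
    return problems


# Invented example identities and pool accounts.
dns = [f"/DC=example/CN=user{i}" for i in range(8)]
mapper = FakeMapper(pool=[f"pilot{i:03d}" for i in range(8)])
problems = check_mappings(mapper.authorize, dns)
print(problems or "no swapped or shared IDs detected")
```

Raising `repeats` and `workers` gives a rough approximation of the peak scenario in point 4. A sustained-rate test (point 3) would additionally need pacing between requests, which this sketch does not attempt.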

SWITCH

Update sent off-line by Alessandro on the points above

Point 1) Alessandro tested with a proxy retrieved from the myproxy server through delegation, and it worked fine.

He also verified that it correctly fails with expired proxies.

Alessandro: Points 2 and 3 are trickier, but I believe they were to some extent tested before and during the certification phase (see below).

Gianni: I think points 2, 3 and 4 have already been covered during the certification tests (https://twiki.cern.ch/twiki/bin/view/Main/ArgusCertification). The load tests used pep-cli rather than glexec as the client, but the Argus behavior does not change.

As for point 4, although it was also part of the mentioned tests, this is in fact what we should carry out during the pilot phase.

Antonio mentioned a private conversation with Sanjay Padhi (CMS), who said he would be inclined to carry out stress testing of the services as part of the standard testing of the CMS applications. The developer will be asked for more details at the next kick-off.

CNAF

Andrea and Christoph reported that the newer version of the PEPd (the one allowing client authentication) is currently being deployed and tested at CNAF.

They are aiming to have the T1 ready to support large-scale usage by the 15th of January, in order to favour the experiments that are more advanced in their usage of pilot jobs.

Antonio remarked that all changes applied at the sites should be tracked and the corresponding versions should be made available to the other sites through the pilot repository at CNAF

Andrea reassured him saying that this is under control between Danilo and himself.

Status and results of the development (by developers)

not discussed

Open Issues (by VOs, sites, deployment teams)

List of Open bugs and relevant decisions

The issue related to client authentication (seen as an obstacle to the deployment in production) is tracked by

  • BUG:59709 (59718): [ARGUS] PEPd should allow only cert-chain as Subject attribute

The installation of Argus and glexec should now proceed at the other sites.

Question (Antonio): Which version of Argus should we ask the sites to deploy? The certified one, or version 1.1 corresponding to patch 3536? The point was briefly discussed between Maarten, Christoph and Antonio. The relevant points of the discussion are:

  • There are changes in the configuration between the certified version and the new one (1.1). In order to enable client authentication, Argus needs to be re-configured via YAIM (not available yet).
  • A simple rpm upgrade from version 1.0 to 1.1 nevertheless results in a working system (although with client authentication not set).
  • The decision was to leave the sites free to choose which version they prefer to run, according to their preferences/needs with respect to client authentication. In this regard, feedback from CNAF is highly welcome (e.g. special configuration instructions needed to complete the set-up, eventually to be documented in the pilot twiki).

Recommendations for release and deployment

Decision about termination/extension of the pilot

Current planning

Initial plan

| Task | Owner | Start Date | Due Date | Status |
| Set-up repositories and documentation | SA1, SA3, CNAF | 23-Nov-09 | 24-Nov-09 | Done |
| Preliminary installation (ARGUS, WN, CE) | SWITCH | 25-Nov-09 | 27-Nov-09 | Done |
| Core installations (ARGUS, WN, CE) | FZK, SRCE, CESNET, CNAF | 30-Nov-09 | 10-Dec-09 | In progress |

Constraints and milestones

  • kick-off with sites: 25-Nov (11 AM CET)
  • 1st site technically available for Experiments to test (SWITCH): 1-Dec
  • kick-off with experiments: 1-Dec (11 AM CET)
  • All sites technically available for Experiments to test: 15-Jan
  • Indicative start of Alice developments to integrate glexec: 18-Jan
  • Indicative start of CMS developments to integrate glexec: 15-Feb
  • END of activity (proposed): 31-Mar

The milestones were reviewed.

All connected sites confirmed that the work is on track and that they foresee no problems in meeting the first deadline (15th Jan). In particular:

SRCE (Emir)

Argus installed on SL5. Glexec on WNs in progress (a few issues obliged them to remove two WNs from production)

FZK (Christoph)

Communication in progress between Chad and Angela (now on leave).

Other

Christoph also reported that he is in contact with Raul Lopes from Brunel, UK (UKI-LT2-Brunel), who is willing to participate in the testing.

Antonio will get in touch with him in order to include him in the loop.

AOB

Christoph thanked all the sites for keeping the work going so close to Christmas.

Exchange of Christmas wishes


Topic revision: r2 - 2009-12-18 - unknown
 