SA1/SA3 tasks
Status of the subtasks of TASK:12720 (see them in the PPS tracker).
Notes:
Repository at CNAF is now fully operational. Closed TASK:12721.
New tasks for the sites to start installation will be submitted.
other tasks
Provide a report describing the issues CNAF is facing in installing glexec on the WNs. INFN-T1 is experiencing stability problems with GPFS when it interacts with the WN-on-demand system adopted locally at the resource center. Since they decided to provide virtual WNs for the pilot, the issue consequently affects the deployment of the glexec WN component at the site.
Provide instructions on how to preserve local policies during the upgrade of the Argus server to a newer version, both in an e-mail to the sites and in PATCH:3536. This was done on the 2nd of February. The instructions are now at https://savannah.cern.ch/patch/?3536
Actions on development work still to be started were closed, because those items are now tracked through bugs and patches.
Timeline for installation at CNAF T1 presented by Andrea:
We are converging on CNAF T1 being the first site to offer a large-scale installation to the experiments. They are currently working on the installation of the glexec WN at the T1 and hope to finish before Christmas, or alternatively by the 6th of January, in order to be ready by the 15th. If the preliminary tests now under way succeed, they are OK to use Argus 1.1 once it reaches the status "Ready for Certification".
Status and results of the pilot service (by VOs and sites)
Proposal by Jose Caballero on tests to be done by admins/developers
Check that none of the problems we found with gLExec/SCAS appear again: verify that gLExec/Argus work correctly with proxies that have expired and been re-generated, and with proxies with at least two levels of delegation.
Check that requests for different end users arriving very close together in time get the right response, i.e. that the new IDs are not swapped and that the two users are not incorrectly given the same ID.
Check that a sustained request rate works OK. I think that has been done already.
Check what happens with peaks: a huge number of requests within the same time window.
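The second and fourth checks above could be scripted along the following lines. This is only a minimal sketch: `StubMapper` is a hypothetical stand-in for the real gLExec/Argus mapping call (it is not part of either product), and the subject DNs and the `pilotNNN` ID format are invented for illustration. An admin would replace `map_user` with a function that actually invokes the service under test.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class StubMapper:
    """Hypothetical stand-in for the user-mapping call (NOT a real Argus/gLExec API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pool = {}

    def map_user(self, subject_dn):
        # Hand out one local ID per subject DN, reusing it on later requests.
        with self._lock:
            if subject_dn not in self._pool:
                self._pool[subject_dn] = f"pilot{len(self._pool) + 1:03d}"
            return self._pool[subject_dn]


def check_burst(map_user, subjects, repeats=20, workers=16):
    """Send a burst of near-simultaneous requests and return the problems found."""
    requests = [dn for dn in subjects for _ in range(repeats)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the requests.
        answers = list(pool.map(map_user, requests))

    seen = {}  # subject DN -> set of local IDs it was given
    for dn, local_id in zip(requests, answers):
        seen.setdefault(dn, set()).add(local_id)

    problems = []
    owners = {}  # local ID -> first subject DN that received it
    for dn, ids in seen.items():
        if len(ids) != 1:
            problems.append(f"{dn} received several IDs: {sorted(ids)}")
        for local_id in ids:
            if local_id in owners and owners[local_id] != dn:
                problems.append(f"{local_id} shared by {owners[local_id]} and {dn}")
            owners.setdefault(local_id, dn)
    return problems


mapper = StubMapper()
problems = check_burst(mapper.map_user, ["/DC=org/CN=user-a", "/DC=org/CN=user-b"])
print(problems or "no swapped or shared IDs")
```

Raising `repeats` and `workers` turns the same script into a crude peak test. Whether a thread pool on a single client can generate a realistic peak is an open question that the sketch does not settle.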
SWITCH
Update sent off-line by Alessandro on the points above
Point 1) Alessandro tested with a proxy as "retrieved" by the myproxy server through delegation and it worked fine
He also verified that it correctly fails with expired proxies.
Alessandro: Points 2 and 3 are trickier, but I believe they were somehow tested before and during the certification phase (see below).
Gianni: I think points 2, 3 and 4 have already been covered during the certification tests: https://twiki.cern.ch/twiki/bin/view/Main/ArgusCertification
The load tests were done using pep-cli as the client rather than glexec, but the Argus behavior does not change.
As for point 4, although it was also part of the mentioned tests, this is in fact what we should carry out during the pilot phase…
Antonio mentioned a private conversation with Sanjay Padhi (CMS), who said he would be inclined to carry out stress testing of the services as part of the standard testing of the CMS applications. More details will be requested from the developer at the next kick-off.
CNAF
Andrea and Christoph reported that the newer version of the PEPd (the one allowing client authentication) is being deployed and tested at CNAF these days.
They are aiming to have the T1 ready to support large-scale usage by the 15th of January, in order to favour the experiments that are more advanced in their use of pilot jobs.
Antonio remarked that all changes applied at the sites should be tracked and the corresponding versions should be made available to the other sites through the pilot repository at CNAF
Andrea reassured him saying that this is under control between Danilo and himself.
Status and results of the development (by developers)
not discussed
Open Issues (by VOs, sites, deployment teams)
List of Open bugs and relevant decisions
The issue related to client authentication (seen as an obstacle to the deployment in production) is tracked by
BUG:59709 (59718): [ARGUS] PEPd should allow only cert-chain as Subject attribute
The installation of Argus and glexec should now proceed at the other sites.
Question (Antonio): Which version of Argus should we ask the sites to deploy? The certified one, or version 1.1 corresponding to patch 3536? The point was briefly discussed between Maarten, Christoph and Antonio. The relevant points of the discussion are:
there are changes in the configuration between the certified version and the new one (1.1). In order to enable client authentication, Argus needs to be re-configured via YAIM (not available yet)
a simple rpm upgrade from version 1.0 to 1.1 nevertheless results in a working system (although without client authentication set)
The decision was to leave the sites free to choose which version they prefer to run, according to their preferences/needs with respect to client authentication. In this regard feedback from CNAF is highly welcome (e.g. any special configuration instructions needed to complete the set-up, possibly to be documented in the pilot twiki).
Recommendations for release and deployment
Decision about termination/extension of the pilot
Current planning
Initial plan
| Task | Owner | Start Date | Due Date | Status |
| Set-up repositories and documentation | SA1, SA3, CNAF | 23-Nov-09 | 24-Nov-09 | Done |
| Preliminary installation (ARGUS, WN, CE) | SWITCH | 25-Nov-09 | 27-Nov-09 | Done |
| Core installations (ARGUS, WN, CE) | FZK, SRCE, CESNET, CNAF | 30-Nov-09 | 10-Dec-09 | In progress |
Constraints and milestones
kick-off with sites: 25-Nov (11 AM CET)
1st site technically available for Experiments to test (SWITCH): 1-Dec
kick-off with experiments: 1-Dec (11 AM CET)
All sites technically available for Experiments to test: 15-Jan
Indicative start of Alice developments to integrate glexec: 18-Jan
Indicative start of CMS developments to integrate glexec: 15-Feb
END of activity (proposed): 31-Mar
The milestones were reviewed.
All connected sites confirmed that the work is on track and that they see no problems in meeting the first deadline (15th Jan). In particular:
SRCE (Emir)
Argus installed on SL5. Glexec on WNs in progress (a few issues obliged them to remove two WNs from production)
FZK (Christoph)
Communication in progress between Chad and Angela (now on leave)
Other
Christoph also reported that he is in contact with Raul Lopes from Brunel, UK (UKI-LT2-Brunel), who is willing to participate in the testing.
Antonio will get in touch with him in order to include him in the loop
AOB
Christoph wants to thank all the sites for keeping the work going so close to Christmas
Exchange of Christmas wishes