Pilot Follow-up Meeting Minutes Wed 17 Mar 2010

  • Date: Wed 17 Mar 2010
  • Agenda: 87203
  • Description: Pilot of glexec/Argus : Check-point
  • Chair: Antonio Retico
  • Home: PilotServiceArgus


  • Operations/SA1: Antonio Retico, Mario David
  • Certification/SA3: Gianni Pucciani
  • Development/JRA1: Christoph Witzig
  • SRCE: Apologies
  • Switch: Alessandro Usai
  • Cesnet: -
  • FZK: Unable to connect (at a conference)
  • INFN-CNAF: Giuseppe Misurelli
  • IFIC: -
  • SAM/Nagios: -
  • CMS: Apologies
  • ATLAS: -
  • Alice: Apologies
  • LHCb: -
  • WLCG (Pilot Jobs Working Group): Maarten Litmaath

NOTE: For the second time, the meeting on EVO had problems (no audio for many of the participants); the conference was moved to the Alcatel system, which will also be used for the next check-point.

Review of action items (tasks)

SA1/SA3 tasks

Status of the subtasks of TASK:12720 (see them in the PPS tracker).

TASK:13188 (Installation at CESNET): no news.
TASK:13599 (Installation at CNAF): this is now done (see the INFN-CNAF report below). The task will be closed and replaced by a "service management" task.

other tasks

Assigned to Due date Description State Closed Notify  
GiuseppeMisurelli 2010-02-19 Provide a report describing the issues CNAF is facing in installing glexec on the WNs.

INFN-T1 is experiencing GPFS stability problems in the interaction with the WN-on-demand system adopted locally at the resource centre.
Since the site decided to provide virtual WNs for the pilot, this issue is also affecting the deployment of the glexec WN component at the site.

2010-02-19 MaartenLitmaath

Assigned to Due date Description State Closed Notify  
GianniPucciani 2010-02-19 Provide functional specification of glexec tests being implemented at SRCE
ChadLaJoie 2010-02-03 Provide instructions on how to preserve local policies when upgrading the Argus server to a newer version, both in an e-mail to the sites and in PATCH:3536

The e-mail to the sites was sent on 2 February.
The instructions are now also available at https://savannah.cern.ch/patch/?3536

2010-03-02 MaartenLitmaath


Status and results of the pilot service (by VOs and sites)

CMS (Claudio) sent a report via e-mail:
-----Original Message-----
From: Claudio Grandi 
Sent: Wednesday, March 17, 2010 9:36 AM
To: Antonio Retico; Maarten Litmaath
Subject: No progress on ARGUS testing for CMS

Antonio, Maarten,
unfortunately I didn't get more news from Igor apart from a couple of failed attempts to submit pilots to CNAF (proxy expired because of other production activities going on at the Tier-1).

I fear that CMS cannot promise to do more on that as the component that is interested in using glexec on WNs is the US one and I need to rely on them for the testing but cannot control their priorities.

I apologize for that.

                                  Cheers, Claudio

ALICE (Patricia, via e-mail): nothing to report from the developers.
ATLAS: Absent.

Antonio will contact ATLAS again to inform them that two large installations are now available, at KIT and CNAF.

FZK (Angela, via e-mail): no user activity to report.
IFIC (Javier, via e-mail): Absent.
CESNET: Absent.
SRCE (Nikola, via e-mail):
-----Original Message-----
From: Nikola Garafolic [mailto:nikola.garafolic@srce.hr] 
Sent: Wednesday, March 17, 2010 1:26 PM
To: Antonio Retico
Subject: Re: [Reminder] pilot of glexec/Argus: check-point

Dear Antonio,

I am unable to participate in today's meeting. Colleague Emir is also 
unable, since he is on business trip.

Concerning our site, I still have not received any response that would 
help me with user mapping error, and could not map on other CE, I think 
cream CE.


INFN-CNAF: As reported above, the installation is now complete. Antonio asked for the corresponding end-points to be listed in the layout section of the pilot twiki page and for the task to be closed; he will then open a service management task to be left open for the duration of the pilot. Giuseppe agreed to do this, but observed that the whole farm is now glexec-enabled, so there are no particular end-points apart from the production ones.

Maarten pointed out that, for the time being, no changes to the queue configuration are needed for the CMS glideins to succeed. Long queues should be manageable in Condor through the well-established mechanism of proxy renewal or, if necessary, long-lived proxies (CMS is allowed proxies with a lifetime of up to 8 days).

SWITCH: nothing to report
Operations: Antonio

Antonio observed that the new version of Argus (1.1) is about to go to production (release scheduled for next week). He suggested that, as no major developments of the software are expected in the near future, the pilot installations could start referring directly to the production repository, thereby freeing the resources at CNAF. He will update the twiki page accordingly once the software is available in the production repository.

Giuseppe observed that, while this is true for Argus, the repository at CNAF is still needed for the client versions on the WNs, as the clients currently in production are not compatible with Argus 1.1.

This observation triggered a discussion on whether it is appropriate to release a server to production while the available clients can talk only to the earlier version. The final decision (confirmed after the meeting by an e-mail exchange with the developers and the certification team) was to delay the deployment of the Argus server and ship it in the next release, together with the recently certified new version of the clients (see Recommendations for release and deployment), in order to avoid potential misunderstandings by sites in production.

Status and results of the development (by developers)

not covered

Open Issues (by VOs, sites, deployment teams)


Recommendations for release and deployment

Following the discussion above, this recommendation is made to the release team:
  • hold the release to production of patch #3536 (Argus server 1.1), excluding it from the release currently in preparation.

Decision about termination/extension of the pilot

next check point: 17-Mar-2010 at 16.00 CET



