Pilot Follow-up Meeting Minutes Wed 17 Mar 2010

  • Date: Wed 17 Mar 2010
  • Agenda: 87203
  • Description: Pilot of glexec/Argus : Check-point
  • Chair: Antonio Retico
  • Home: PilotServiceArgus

Attendance

  • Operations/SA1: Antonio Retico, Mario David
  • Certification/SA3: Gianni Pucciani
  • Development/JRA1: Christoph Witzig, Andrea Ceccanti
  • SRCE: Apologies
  • Switch: Alessandro Usai
  • Cesnet: -
  • FZK: Trouble connecting (at a conference)
  • INFN-CNAF: Giuseppe Misurelli
  • IFIC: -
  • SAM/Nagios: -
  • CMS: Apologies
  • ATLAS: -
  • Alice: Apologies
  • LHCb: -
  • WLCG (Pilot Jobs Working Group): Maarten Litmaath

NOTE: For the second time the meeting on EVO had problems (no audio for many of the participants), so the conference was moved to the Alcatel system. The Alcatel system will be used for the next check point as well.

Review of action items (tasks)

SA1/SA3 tasks

Status of the subtasks of TASK:12720 (see them in the PPS tracker).

  • TASK:13188 (Installation at CESNET): No news.
  • TASK:13599 (Installation at CNAF): This is now done; see the report later. The task will be closed and replaced by a "service management" task.

Other tasks


  • Assigned to: GiuseppeMIsurelli
  • Due date: 2010-02-19
  • Description: Provide a report describing the issues faced by CNAF in installing glexec on the WNs.
  • Report: INFN-T1 is experiencing a stability problem with GPFS when it interacts with the "WN on demand" system adopted locally at the resource center. Since the site decided to provide virtual WNs for the pilot, the issue consequently affects the deployment of the glexec WN component at the site.
  • Closed: 2010-02-19
  • Notify: MaartenLitmaath

  • Assigned to: GianniPucciani
  • Due date: 2010-02-19
  • Description: Provide functional specification of the glexec tests being implemented at SRCE.

  • Assigned to: ChadLaJoie
  • Due date: 2010-02-03
  • Description: Provide instructions on how to preserve local policies during the upgrade of the Argus server to a newer version, both in an e-mail to the sites and in PATCH:3536.
  • Report: The e-mail was sent on the 2nd of February. The instructions are now also available at https://savannah.cern.ch/patch/?3536
  • Closed: 2010-03-02
  • Notify: MaartenLitmaath

Notes:

Status and results of the pilot service (by VOs and sites)


CMS (Claudio) sent a report via e-mail:
CMS made a couple of attempts to submit pilots to CNAF, but they failed (the proxy expired because of other production activities going on at the Tier-1).


ALICE (Patricia, via e-mail): Nothing to report from the developers.
ATLAS: Absent.

Antonio will contact them again to inform them that two large installations are now available, at KIT and CNAF.


FZK (Angela, via e-mail): No user activity to report.
IFIC (Javier, via e-mail): Absent
CESNET: Absent
SRCE (Nikola, via e-mail):
-----Original Message-----
From: Nikola Garafolic [mailto:nikola.garafolic@srce.hr] 
Sent: Wednesday, March 17, 2010 1:26 PM
To: Antonio Retico
Subject: Re: [Reminder] pilot of glexec/Argus: check-point

Dear Antonio,

I am unable to participate in today's meeting. Colleague Emir is also 
unable, since he is on business trip.

Concerning our site, I still have not received any response that would 
help me with user mapping error, and could not map on other CE, I think 
cream CE.

Regards,
Nikola

INFN-CNAF: As reported above, the installation is now complete. Antonio asked Giuseppe to list the corresponding end points in the pilot twiki page (layout section) and then close the task. Antonio will then open a service management task, to be left open for the duration of the pilot. Giuseppe said that he will do it. He observed, however, that the whole farm is now glexec-enabled, so there are no particular end points apart from the production ones.

Maarten pointed out that, for the time being, no changes to the queue configuration are needed to make the CMS glideins succeed. Long queues should be manageable in Condor via the well-established means of proxy renewal or, if necessary, long-lived proxies (CMS is allowed to have proxies of up to 8 days).
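The reasoning behind Maarten's remark can be illustrated with a small check: a pilot succeeds without proxy renewal only if the proxy is still valid when the job ends, i.e. after the time spent in the queue plus the run time. The sketch below is purely illustrative and is not part of any Condor or glexec tooling; the function name, the safety margin and the example durations are assumptions, and only the 8-day ceiling comes from the discussion above.

```python
from datetime import timedelta

# From the minutes: CMS is allowed proxies of up to 8 days.
MAX_CMS_PROXY_LIFETIME = timedelta(days=8)


def proxy_outlives_job(proxy_lifetime, queue_wait, run_time,
                       margin=timedelta(hours=1)):
    """Return True if a proxy created at submission is still valid at job end.

    Hypothetical helper: 'margin' is an assumed safety buffer, not a value
    taken from the meeting.
    """
    if proxy_lifetime > MAX_CMS_PROXY_LIFETIME:
        raise ValueError("proxy lifetime exceeds the 8-day limit")
    return queue_wait + run_time + margin <= proxy_lifetime


# Assumed example: a short proxy stuck behind a long queue expires in time
# to break the job, so renewal (or a longer proxy) would be needed.
short_proxy_ok = proxy_outlives_job(
    timedelta(hours=12), timedelta(hours=10), timedelta(hours=4))

# Assumed example: an 8-day proxy covers 3 days of queueing plus a 2-day run.
long_proxy_ok = proxy_outlives_job(
    timedelta(days=8), timedelta(days=3), timedelta(days=2))

print(short_proxy_ok, long_proxy_ok)
```

With these assumed durations the short proxy fails the check while the 8-day proxy passes, which matches the two remedies mentioned above: renew the proxy while the job waits, or start with a longer-lived one.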


SWITCH: nothing to report
Operations: Antonio

Antonio observed that the new version of Argus (1.1) is about to reach production (release scheduled for next week). He suggested that, as no major developments in the software are expected in the near future, the pilot installations could start to refer to the production repository directly, thereby releasing the resources at CNAF. He will update the twiki page accordingly once the software is available in the production repository.

Giuseppe observed that, while this is true for Argus, the repository at CNAF is still needed for the client versions on the WNs, as the current ones are not compatible with Argus 1.1.

That observation triggered a discussion on whether it is appropriate to release a server to production while the available clients can talk only to the earlier version. The final decision (corroborated by an e-mail exchange with the developers and the certification team after the meeting) was to delay the deployment of the Argus server and to release it in the next release together with the recently certified new version of the clients (see the recommendations for release and deployment), in order to avoid potential confusion for sites in production.


Status and results of the development (by developers)

Christoph proposed that, as some sites are now up and running with Argus, a test of the central OSCT banning list could be started in the pilot.

The proposal was accepted. Antonio suggested that Christoph send a request to that effect to the e-mail list, together with the configuration instructions for the sites to follow.

Christoph (off-line): Please note that the global banning should be done with Argus 1.1, as the OSCT policy is in the Argus 1.1 format. We will distribute information next week on how this can be enabled.

He points out that the issue between versions 1.0 and 1.1 for global banning/glexec is a one-time issue that will NOT occur again in future version upgrades.

Open Issues (by VOs, sites, deployment teams)

none

Recommendations for release and deployment

After the discussion above, the following recommendations are made to the release team:
  • hold the release to production of patch #3536 (Argus server 1.1), excluding it from the release currently in preparation
  • verify patch #3434 (glexec with glite-security-lcmaps-plugins-c-pep-1.0.3-1.sl5.x86_64) and include it in the next bundle for staged roll-out on 3.2

Antonio made a quick analysis of the status of the current releases and estimated that, proceeding this way and if no major issues occur, the delivery of Argus + clients to the production repository could be expected no later than the 12th of April.

Decision about termination/extension of the pilot

Antonio noted that the experiments' work on glexec/Argus at the sites is not very focused (the experiments have good reasons for that). The outcome doesn't seem worth the cost of maintaining the pilot service as it is now, so he proposes the following line for the near future:

  • extend the end date to mid-April (in order to make sure that the points related to the release to production are correctly sorted out)
  • ask the sites currently supporting the pilot to sign up as early adopters for glexec/Argus in the framework of the staged roll-out, in order to keep them up to date with the latest production releases
  • follow the progress of the experiments at the different levels using regular slots at the GDB (as was done with CREAM after the end of the pilot)

Nobody objected to this idea during the meeting. It will be confirmed at the next check point (30th March) unless major objections are raised by the experiments.

Next check point: 30-Mar-2010 at 16.00 CET

AOB

none


Topic revision: r6 - 2010-03-18 - MaartenLitmaath