Pilot Follow-up Meeting Minutes Tue 02 Feb 2010

  • Date: Tue 02 Feb 2010
  • Agenda: 83267
  • Description: Pilot of glexec/Argus : Check-point
  • Chair: Antonio Retico
  • Home: PilotServiceArgus

Attendance

  • Operations/SA1: Antonio Retico
  • Certification/SA3: Gianni Pucciani
  • Development/JRA1: Chad La Joie
  • SRCE: Nikola Garafolic
  • Switch: Alessandro Usai
  • Cesnet: -
  • FZK: Apologies
  • INFN-CNAF: Giuseppe Misurelli
  • SAM/Nagios: -
  • CMS: Claudio Grandi
  • ATLAS:
  • Alice: -
  • LHCb: -
  • WLCG (Pilot Jobs Working Group): Maarten Litmaath

Review of action items (tasks)

SA1/SA3 tasks

Status of the subtasks of TASK:12720 (see them in the PPS tracker).

Other tasks


Assigned to Due date Description State Closed Notify  
GiuseppeMIsurelli 2010-02-19 Provide a report describing the issues being faced by CNAF for the installation of glexec on the WNs.

INFN-T1 is experiencing a stability problem with GPFS when it interacts with the WN-on-demand system adopted locally at the resource centre.
Since they decided to provide virtual WNs for the pilot, the issue consequently affects the deployment of the glexec WN component at the site.

2010-02-19 MaartenLitmaath

Assigned to Due date Description State Closed Notify  
GianniPucciani 2010-02-19 Provide functional specification of glexec tests being implemented at SRCE
ChadLaJoie 2010-02-03 Provide instructions on how to preserve local policies during the upgrade of the Argus server to a newer version, both in an e-mail to the sites and in PATCH:3536

This was done on the 2nd of February.
This is now done at https://savannah.cern.ch/patch/?3536

2010-03-02 MaartenLitmaath

Notes:

Status and results of the pilot service (by VOs and sites)


CNAF T1: Giuseppe: the installation of glexec at the T1 is suspended; we are waiting for the upgrade of the Argus server to the new version 1.1 (PATCH:3536, currently in certification). Once that is done we will proceed. We hope to be ready by the end of this week.
  • Antonio: a mirror at CNAF should be set up for the new patch
  • Giuseppe: is it really needed? All the configurations are now in ETICS
  • Antonio: continue mirroring, because the patch holds extra information (namely release notes and installation instructions) which we want to build up during the pilot life cycle
  • Gianni: with the new release process the certification repository is not there anymore, so the mirroring must be done using ETICS as a source anyway
  • Antonio: it is not important where the SW is taken from, but that the documentation (the PATCH) is correctly filed
  • An e-mail was sent to Danilo to organise the mirroring of PATCH:3536
  • CNAF-T1 will upgrade first, the others will follow
  • Chad offered to write an e-mail to the sites explaining how to dump and preserve, during the upgrade, any local policies which sites may have set up. This is needed only for this particular update and not as a general practice. Gianni pointed out that this should be written in the release notes of the patch. Chad will do both things (ACTION). (Note: Chad updated the release notes soon after the meeting.)
  • Andrea Ceccanti (off-line, after the meeting) specified that an update is also needed on the glexec side (WN). The corresponding meta-packages should be ready by Thursday

The deadline for the installation at the T1 is set to the 10th of February


SRCE: Nikola reported that the installation is finished. He will fill in the list of the services in https://twiki.cern.ch/twiki/bin/view/EGEE/PilotServiceArgus#Pilot_Layout , then he will close the installation task and the "operations" task will be opened.
FZK: (report sent by Angela) The installation at GridKa is ready but I definitely need all the used roles for the pilot. For now the following roles have permission to use gLExec:
  • /cms/Role=pilot
  • /atlas/Role=pilot
  • /atlas/Role=production
  • /lhcb/Role=pilot
I guess that usatlas wants also their role to be accepted. What about alice? Do they have a dedicated VOMS role?

The new VOBOX for Alice is ready: test-mw-fzk.gridka.de. (to be added to the pilot page).

More information about the reloading of policies would be good. If it really has no impact on performance, the default value should be decreased to at least 30 minutes. A discussion followed on whether to decrease the default refresh time:

  • Maarten: actually the 4-hour limit was perhaps set by looking at what is currently done for the CRL updates (hours), but with this new system we could aim to do better
  • Chad: a reload of the policies from the PEP now requires ~25/30 ms, which may grow to one second considering network overhead. I don't see any problems in using a refresh time > 15 min. However, I would like the JSPG to give an indication in this respect
  • Maarten: one concern may be the load on the centralised JSPG PEP server if the rate is too high
  • The decision was made to open a bug report (Angela will do it, ACTION) and use it as the reference for the discussion
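
The trade-off discussed above can be put into rough numbers. The following is only a back-of-the-envelope sketch: it takes the worst-case reload time of one second quoted by Chad, and assumes a purely hypothetical number of clients refreshing from a central server, spread evenly over the refresh interval. Neither the function names nor the client count come from the meeting.

```python
# Rough estimate of the cost of refreshing policies at different intervals.
# From the minutes: a reload takes ~25/30 ms, up to ~1 s with network overhead.
# Assumption (not from the minutes): the number of clients below is invented.

def reload_overhead(refresh_interval_s: float, reload_time_s: float) -> float:
    """Fraction of wall-clock time spent reloading policies."""
    return reload_time_s / refresh_interval_s


def central_request_rate(n_clients: int, refresh_interval_s: float) -> float:
    """Average requests per second seen by a central server, assuming each
    client refreshes once per interval and refreshes are evenly spread."""
    return n_clients / refresh_interval_s


WORST_CASE_RELOAD_S = 1.0    # upper bound quoted in the discussion
HYPOTHETICAL_CLIENTS = 300   # illustrative assumption only

for label, interval_s in [("4 h", 4 * 3600), ("30 min", 30 * 60), ("15 min", 15 * 60)]:
    overhead = reload_overhead(interval_s, WORST_CASE_RELOAD_S)
    rate = central_request_rate(HYPOTHETICAL_CLIENTS, interval_s)
    print(f"{label:>6}: overhead {overhead:.4%}, central load {rate:.3f} req/s")
```

Under these assumptions, even a 15-minute refresh with a one-second reload keeps the time spent reloading at roughly 0.1%, which is consistent with Chad's remark; the load on a central server scales linearly with the number of clients, which is the concern Maarten raised.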

CESNET: Absent. Chad observed that from the last e-mails they sent to the list it is not entirely clear whether CESNET is really implementing the scenario they are supposed to. In particular he was confused by questions regarding the lcg-CE, which Argus doesn't support. Antonio will follow up and try to clarify.
INFN-CNAF: nothing to report
SWITCH: nothing to report
Operations: Antonio

PATCH:3076 "New release of Argus service SL5/x86_64" was authorised today for deployment to production (should happen either this Thursday or next Monday). At that point the nodes could be pointed to the production repository. This would probably be pointless, though, because by that date the new version of Argus, 1.1 (PATCH:3536), is expected for the pilot service.

Two bugs were opened to request the enabling of Glue 2.0 publishing for the SCAS and Argus back-ends.
Specifically, for Argus it is BUG:62206.
The workaround referenced in the bug has been inserted among the deployment instructions on the twiki page. It should be applied by our sites. In fact CMS has repeatedly requested a means to detect glexec instances in production (currently not available). Giving a hint about where the back-ends are would already be something.

  • Antonio (answering Giuseppe): this corresponds to a best practice currently in use for all the services packaged in glite. This information is in general useful for monitoring purposes. The information which the Argus server would provide is essentially limited to something like "I am an Argus server", so it shouldn't represent a security concern.
  • Maarten remarked that there are different opinions about what the grid services should publish in the glue schema. The drawback of all services publishing by default is of course that the weight of the information in the IS grows without a clear use case for every record. In particular he reported that there is a discussion in progress now in the technical working group about how the glexec capability should be represented in the information system (another option is e.g. to tag this information in the CE runtimeEnvironment). This topic was bundled with others in a questionnaire sent to the sites. These discussions include other topics as well (e.g. the nature and number of roles which should be configured by a site in order to fully support a VO). Maarten is currently working on a table summarising all the recommendations for future requirements.
  • For the time being the decision is made to ask all the sites in the pilot to run the workaround on their Argus server in order to allow them to be published (and counted). An analogous action will be promoted for SCAS at the level of the operations meeting. Antonio will follow up on both things.

CMS: Claudio confirmed that CMS is planning to start integration tests in mid-February against CNAF-T1 and all the sites that will be offering glexec capability at that point (also based on SCAS and GUMS). Actually this work is already in progress at some OSG sites.
Alice: Absent. Maarten witnessed that Alice is active on the adaptation of their code, because he has received requests for help on this topic from the developers. (Off-line, after the meeting.) Upon request from Antonio, Patricia confirmed that she is currently studying the documentation for the integration of glexec calls within the Alice software. In that respect she complained about the absence of a real user guide for glexec. She will eventually post further requests for support to the pilot e-mail list.

Status and results of the development (by developers)

not covered

Open Issues (by VOs, sites, deployment teams)

List of Open bugs and relevant decisions

  • All sites will soon be requested to upgrade to the new version of Argus (PATCH:3536). CNAF-T1 will be the first, the others will follow
  • All sites are requested to apply the workaround in BUG:62206 in order for the Argus servers to start being published in the information system.

Recommendations for release and deployment

none

Decision about termination/extension of the pilot

next check point: 16-Feb-2010 at 14.00 CET

AOB

none


Topic revision: r2 - 2010-02-02 - unknown