WLCG-OSG-EGEE Operations Meeting Minutes, Mon 23 Feb 2009

Summary

  • The 64-bit SL5 worker node has been running at CERN for some time without problems.
    It will be packaged and released as soon as possible. 32-bit versions of SL5 services
    will only be worked on upon request and if there is sufficient effort.
    The preference is for the SL5 version of the middleware to be 64-bit only.

  • With BNL and FNAL completing their move out of the EGEE infrastructure and into
    the OSG infrastructure, a meeting has been set up to finalize the details of how
    trouble tickets will be dealt with.

Attendance

EGEE

  • Asia Pacific ROC: ShuTing Liao
  • Central Europe ROC: Malgorzata Krakowian
  • OCC / CERN ROC: John Shade, Antonio Retico, Nick Thackray, Steve Traylen, Maite Barosso
  • French ROC: Pierre Girard, Osman Aidel, Rolf Rummler
  • German/Swiss ROC: Angela Poschlad
  • Italian ROC: Alessandro Cavalli
  • Northern Europe ROC: Ron Trompert
  • Russian ROC: Lev Shamardin, Victor Edneral
  • South East Europe ROC: Kostas Koumantaros, Ioannis Liabotis
  • South West Europe ROC: Kai Neuffer, Gonzalo Merino
  • UK/Ireland ROC: Jeremy Coles
  • GGUS: Torsten Antoni
  • GOCDB: Gilles Mathieu

WLCG

  • WLCG Service Coordination: Harry Renshall, Jamie Shiers

WLCG Tier 1 Sites

  • ASGC: ShuTing Liao
  • BNL: Absent
  • CERN site: Ewan Roche
  • FNAL: Joe Kaiser
  • FZK: Angela Poschlad
  • IN2P3: Pierre Girard
  • INFN: Alessandro, Alfredo
  • NDGF: Leif
  • PIC: Gonzalo
  • RAL: Gareth Smith
  • SARA/NIKHEF: Absent
  • TRIUMF: Absent

LHC Experiments

  • ATLAS: Alessandro di Girolamo
  • LHCb: Roberto Santinelli
  • CMS: Absent
  • ALICE: Absent

Feedback on Last Week's Minutes

None was given.

EGEE Items

Grid Operator Hand Over on Duty

              Primary Team    Secondary Team
  From        ROC DECH        ROC SouthEast Europe
  To          ROC CERN        ROC France

  • Report from SouthEast Europe:
    • Problems Encountered during shift:
      • New alarms were again raised for nodes that were already in scheduled downtime (SD).
        This was fixed during the week.
      • The new version of the SAM page at https://lcg-sam.cern.ch:8443/sam/sam.py?... looks more attractive, but unfortunately it is less clear and harder to deal with cases where an alarm is in ERROR but the last SAM test shows that the corresponding service is still OK.
        Please submit requests for changes through GGUS.
  • Report from DECH:
    • Problems Encountered during shift:
      • GGUS ticket: GGUS:46448. Site USCMS-FNAL-WC1 is an OSG site, so alarms should not be raised for it. However, this happened this week when the site started to publish its resources in a resource group. It seems to be fixed now.
      • GGUS ticket: GGUS:46448. The FTS-infosites alarm on fts-t1import.cern.ch is failing because the middleware does not foresee the production scenario currently in use at CERN. The developers are aware, a bug has been opened (Savannah bug #46083), and the alarm will persist until the bug is fixed.
        STEVE: This is now fixed properly. If any more problems are seen, raise another ticket.

Sites Considered For Suspension

None this week.

PPS Reports and Issues

Issues from EGEE ROCs and general information can be found at: https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps

Pilot service of WMS 3.1: in progress

gLite Release News

gLite 3.1 PPS Update 44 has passed the deployment test and is now being installed by the remaining PPS sites. The update contains:
  • A new version of the CREAM CE (PATCH:2667, PATCH:2669). Among other things, this version provides:
    1. A short-term proxy renewal solution in the CREAM-based CE
    2. Fixes, in particular for BUG:44712 (problem with the LCMAPS configuration file used for glexec), which currently affects ALICE
  • [YAIM] glite-yaim-core 4.0.6 with many bug fixes (PATCH:2636, PATCH:2697)
  • [BDII] Default DB cache size reduced to 50 MB for x86_64 (PATCH:2679)
  • [WN] A new glite-wn-info command, designed to be executed on the WN by a job submitter. It returns information about that worker node for use in a grid context (PATCH:2757, PATCH:2758)

The release of gLite 3.1 Update 41 to production is in preparation. The update, scheduled for 25 February, will contain:

  • An update to WMS 3.1 with numerous bug fixes
    ROBERTO (LHCb): Will CERN-PROD deploy this update? If so, when?
    EWAN: Usually within the week following the release to the production repositories.
  • A new version of the CREAM CE (PATCH:2667, PATCH:2669). Among other things, this version provides:
    1. A short-term proxy renewal solution in the CREAM-based CE
    2. Fixes, in particular for BUG:44712 (problem with the LCMAPS configuration file used for glexec), which currently affects ALICE

EGEE Items From ROC Reports

  • SWE ROC: We would like to know the status of the new gLite "authorization framework", i.e. the "framework to identify local T2 users" at a site.
    This will be dealt with next week.
  • UKI ROC: We received a GGUS ticket (GGUS:46475) but believe that ticketing should not apply in this case, and are still waiting for feedback on this. The CE concerned is flagged as "Not in Production" in the GOCDB; monitoring is turned on for troubleshooting purposes during commissioning. Our understanding is that GGUS ticketing does not apply in these circumstances.
    JOHN S.: Ideally SAM should not raise alarms in this case. Will look into honouring the "not in production" flag. Will check whether a bug already exists and open one if not. ACTION on John.

Grid Service Interventions

See the links on the agenda page.

SL5

OLIVER: A version of the 64-bit SL5 WN has been running under production conditions at CERN. This "testing" is now finished, and the patch will be built and put into certification. It will be released as gLite 3.2.
In gLite 3.2 on SL5, 64-bit services and clients will be prioritized. 32-bit versions will be done where there is a need and the resources are available to do them.
The LHC experiments have declared (in the LCG Architects Forum) that they want 64 bit SL5 rolled out as soon as possible.
Antonio: Has any formal meeting been held with the experiments to get final sign off?
Oliver: No, although the LCG Architects Forum could be seen as such.
Antonio: I will do this through the PPS pilot so we can close the pilot.

WLCG Items

WLCG issues coming from ROC reports

  1. DECH: FZK-LCG2: A new FTS instance (FTS 2.1) is in production. The two instances will run in parallel for some time until all experiments have switched to the new instance. The new service name is fts-fzk.gridka.de.
  2. DECH: A CMS user with VOMS group /cms/dcms cannot run jobs at CERN and various other sites, see https://gus.fzk.de/ws/ticket_info.php?ticket=46019. Not supporting this group (and probably many other groups) makes no sense; otherwise the groups are a waste. In my opinion, when a site supports a VO it should use wildcards to ensure support for all user proxies. If it does not use wildcards, the queues in the information system should be published only for the supported groups and roles. Is there a standard way to deal with this situation? Or is it possible to exclude specific groups or roles in the information system (blacklist)?
    Steve T. is looking into this and all information will be put into the ticket.
    EWAN: This might not be technically possible for the WMS. Looking into this.

Upcoming WLCG Service Interventions

Consult the links on the agenda page.

WLCG Service Coordination

Nothing to add.

ATLAS Service

Nothing to add.

ALICE Service

Nothing to add.

CMS Service

Nothing to add.

LHCb Service

Nothing to add.

OSG Items

Discussion of open tickets for OSG
  • Maria: GGUS:45094. Felipe Silva is very unresponsive.
    Rob: Added some more contact names to the ticket.

  • The date has now been set for the meeting to discuss streamlining the handling of tickets for FNAL and BNL.

Newly Created Action Items

  • Assigned to: John Shade
  • Due date: 2009-03-09
  • Description: When an individual service at a site is marked as "not in production" in the GOCDB, but the site is "in production", SAM continues to test the service. This is not the intended functionality. Check whether a bug is already outstanding on this, and if not, create one.
  • Update 27/2/09: It turns out that GridView does not synchronize on that particular GOCDB field, so it is not available to SAM. The recommended workaround is to create a scheduled downtime: tests will still run, but no tickets will be raised. The requested functionality will be in the new Aggregated Topology Provider, and GOCDB will have a production attribute associated with each service.
  • Update 9/3/09: closed during the meeting.
  • Closed: 2009-03-10

Review of Open Action Items

Open Action Items

Id | Submitter | Description | Creation | Due | Assigned To

Actions Closed in Last 20 Days

Id | Submitter | Description | Creation | Due | Assigned To | Closed

AOB

  • JOHN S: We would like to review this meeting, so we will be sending out a questionnaire to get everybody's feedback.

Next Meeting

The next meeting will be on Monday, 2 March 2009, at 15:00 UTC (16:00 Swiss local time).

  • Attendees can join from 14:45 UTC (15:45 Swiss local time) onwards.
  • The meeting will start promptly at 15:00 UTC (16:00 Swiss local time).
  • The WLCG section will start at the fixed time of 15:30 UTC (16:30 Swiss local time).
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0148141


Topic revision: r4 - 2009-03-10 - AntonioRetico
 