
WLCG-OSG-EGEE Ops' Minutes Monday 13th October 2008

Summary

  • Process for making old versions of middleware services obsolete is under review. Document is attached to the agenda. Comments welcome.
  • Alice requested that all LCG RBs being run for them are replaced by gLite 3.1 WMSs.
  • Alice asked for as many sites as possible to deploy the CREAM CE now. Alice will start using them.

Attendance

EGEE

Asia Pacific ROC: Absent
Central Europe ROC: Malgorzata Krakowian
OCC / CERN ROC: John Shade, Antonio Retico, Nick Thackray, Steve Traylen, Maite Barroso
French ROC: Absent
German/Swiss ROC: Absent
Italian ROC: Absent
Northern Europe ROC: Ron Trompert
Russian ROC: Absent
South East Europe ROC: Kostas Koumantaros
South West Europe ROC: Kai Neuffer
UK/Ireland ROC: Jeremy Coles
GGUS: Torsten Antoni, Guenter Grien

WLCG

WLCG Service Coordination: Harry Renshall

WLCG Tier 1 Sites

ASGC: Absent
BNL: Absent
CERN site: Absent
FNAL: Absent
FZK: Absent
IN2P3: Absent
INFN: Absent
NDGF: Absent
PIC: Kai
RAL: Derek Ross
SARA/NIKHEF: Ron Trompert
TRIUMF: Absent

LHC Experiments

ATLAS: Alessandro di Girolamo
LHCb: Roberto Santinelli
CMS: Absent
ALICE: Apologies received

Feedback on Last Week's Minutes

None was given.

EGEE Items

Grid Operator Hand Over on Duty

        Primary Team        Secondary Team
  From  ROC UK/I            ROC Russia
  To    ROC Asia Pacific    ROC Central Europe

  • Nothing to report.

PPS Reports

PPS Report & Issues

  • The issues with the BDII observed after gLite 3.1 Update 33 were analysed by Laurence Field.
    • The issue reported by CERN ROC, tracked with BUG:42799, was narrowed down to a race condition which could be solved by doubling the value of the configuration variable "GIP_TIMEOUT" in YAIM. (Similar configuration may be needed for other BDIIs used for site certification purposes, which may include unresponsive sites.)
    • Another issue reported in BUG:42799 is addressed by PATCH:2519, which will be released with gLite 3.1 Update 34, scheduled for Thursday 15th.

gLite Release News
Now in Production
  • 2008-10-10: gLite 3.1 Update 33 was released to production
    The update contains:
    • Proxy Server: The service now publishes the glite release version
    • lcg-CE: New updates on LCG CE improvement packages
    • glite-BDII: Updated version that fixes a number of outstanding issues, improves the configuration and provides some additional features. This update affects ALL service nodes
    • Clients (UI, WN, VOBOX): removal of obsoleted DM packages
    • New version of YAIM core with a number of bug fixes

Release notes in: http://glite.web.cern.ch/glite/packages/R3.1/updates.asp. Immediately after the release, a defect in the information published by the BDII was found at the CERN ROC. The issue, which is apparently due to a race condition, is currently under investigation. Until the analysts have narrowed down the cause, the service has been removed from the production repository and is not available to sites. More details in the broadcast sent to the sites: https://cic.gridops.org/index.php?section=cod&page=broadcastretrieval&step=2&typeb=C&idbroadcast=36541

Now in PPS
  • 2008-10-07: after deployment testing PPS sites are now upgrading to gLite 3.1 PPS Update 37. This update contains:
    • LB: A number of non-critical bug fixes (PATCH:1803 , PATCH:2380)
    • glite-UI: glite-brokerinfo added
    • glite-BDII: obsoleted lcg-info packages removed
    • glite-VOMS (oracle and mysql): obsoleted lcg-info packages removed
    • glite-PX:
      • The information provider now publishes much richer information about how the MyProxy service can be used.
      • Obsoleted lcg-info packages removed

Release notes in https://twiki.cern.ch/twiki/bin/view/EGEE/PPSReleaseNotes_310_PPS_Update37. Deployment test reports in: http://www.cern.ch/pps/index.php?dir=./release/testreports/gLite3.1.0/gLite3.1.0-PPS-UPDATE37/

Soon in Production
  • 2008-10-09: Release of gLite 3.1 Update 34 to production in preparation.
    The update, to be released next Thursday Friday, will contain:
    • LB: A number of non-critical bug fixes (PATCH:1803 , PATCH:2380)
    • glite-UI: glite-brokerinfo added
    • glite-BDII: obsoleted lcg-info packages removed
    • glite-VOMS (oracle and mysql): obsoleted lcg-info packages removed
    • FTS for SL4:

EGEE Items From ROC Reports

  • UKI: No data available in ROC (or site) report(s) for the failures from SAM framework section.
    Osman: Discussed with SAM and it's an Oracle problem (now fixed). Not clear why this suddenly became very slow. DB team looking into it.

  • Jeremy: Also seeing problems with downtimes making it into reports. Will submit a GGUS ticket.

gLite 3.1 update 33, BDII

Details on the changes of gLite 3.1 update 33 for the BDII.
Dear colleagues, the status of gLite 3.1 Update 33 is as follows:
  1. The glite-BDII (top-level BDII) meta-rpm for Update 33 was removed on Friday. At the same time the previous meta-rpm was changed to require exactly the previous version (3.9.1-5) of the BDII RPM. Sites that already upgraded their top-level BDIIs before these changes may want to downgrade (but see below). Resource and site BDIIs were not seen to display the instabilities described in Savannah bug BUG:42727, therefore the meta-rpms for other node types have not been changed. The top-level BDII instability is being looked into with high priority.
  2. The "chown" problem reported by Michel Jouvin does not affect sites that use YAIM for their configurations. A fix for this problem has been coded and a new BDII version is being certified. It is expected to be released to the production system this week

gLite 3.0 services to be obsoleted

A reminder to all that the following middleware services will be made obsolete by next week unless any objections are received!
  • gLite 3.0 glite-SE_classic
  • gLite 3.0 glite-VOBOX
  • gLite 3.0 glite-WMS
  • gLite 3.0 glite-PX
  • gLite 3.0 glite-MON

Proposed process for removing SA1 support for old gLite services

Nick presented a process for making old middleware services obsolete. It will be reviewed by the ROC managers and, when approved, it will be implemented immediately. Everyone is invited to give feedback (the sooner the better). The document is attached to the agenda.

WLCG Items

WLCG issues coming from ROC reports

  • France: TEAM/ALARM tickets for T1s: how do the LHC experiments choose between these two types of ticket?
    ATLAS:
    • ALARM tickets are for problems concerning the T0 (mainly problems at a T1 that block data acceptance from the T0).
    • TEAM tickets are for all other problems of importance (mainly T1<->T2 transfers for the moment).
      Currently in discussion: if the problem is not acknowledged by the site before 2 PM the following day, then an ALARM ticket is sent.
    Could CMS, ALICE and LHCb clarify when they use each type of ticket?

LHCb: Team tickets are normal tickets that any of the LHCb "team" can see and act on.
ATLAS: The above statement is correct for ATLAS.
The other experiments were not present.

  • Maite: Will ask for clarification on the usage of these tickets at next week's meeting.
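
The ATLAS escalation rule under discussion above (a problem not acknowledged before 2 PM the following day triggers an ALARM ticket) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the minutes do not say which timezone "2 PM" refers to, so the sketch simply reuses the timezone of the ticket timestamp.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Assumption: "2PM" is read in the same timezone as the ticket timestamps.
ESCALATION_TIME = time(14, 0)


def escalation_deadline(opened: datetime) -> datetime:
    """Return 14:00 on the day after the TEAM ticket was opened."""
    next_day = opened.date() + timedelta(days=1)
    return datetime.combine(next_day, ESCALATION_TIME).replace(tzinfo=opened.tzinfo)


def should_send_alarm(opened: datetime,
                      acknowledged_at: Optional[datetime],
                      now: datetime) -> bool:
    """True once the deadline has passed without an acknowledgement before it."""
    deadline = escalation_deadline(opened)
    if acknowledged_at is not None and acknowledged_at < deadline:
        return False
    return now >= deadline
```

For a TEAM ticket opened on Monday at 17:30, the deadline in this sketch is Tuesday at 14:00.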

Status of the WMS for Alice

Alice wants to fully replace the RBs and use only the WMS in production at all sites. In Alice's computing model it is recommended (not mandatory) that sites provide a local WMS, though they understand that for some T2 sites this can be very difficult. Alice would like to ask the T1 sites, and in general all sites providing RBs to Alice, to migrate to the WMS. The first target sites are NIKHEF and CCIN2P3.
  • NIKHEF: is providing 2 RBs but no WMS yet.
  • IN2P3: no WMS there supporting Alice. In France there are only 2, at T2 sites: datagrid.cea.fr and lal.in2p3.fr. Alice would like to request IN2P3 to also provide one.

  • IN2P3 (Pierre): No plans yet but will discuss internally.
  • SARA (Ron): Shouldn't be a problem.
  • Pierre: Does Alice want a dedicated WMS?
    Maite: We will check, but in principle no. Should be OK to use an already existing WMS.

CREAM CE for Alice (& PPS pilot service)

Alice would like to start using the CREAM CE in production. To do this, Alice has the following requirements on sites:
  • Keep current LCG CE and install CREAM CE on another box.
  • Install a 2nd VObox to point to the CREAM CE. VOBox can be in a virtual machine if the site is short of boxes.
  • Point the CREAM CE to the standard Alice production queue.
  • Need a GridFTP server somewhere on the site.

This request also presents another opportunity: any sites that wish to support Alice with the CREAM CE could also support the testing of the new ICE-enabled WMS, simply by installing the latest version of the CREAM CE (available in the PPS repositories) rather than the version currently in the production repositories. Sites wishing to do this would also need to configure CMS as a VO on their site; no other action is needed on the part of the site.

Any sites that are interested should contact occ-grid-support@cern.ch. Installation instructions for CREAM CE will be provided.

Alice would like to ask that all LCG tier-1s (which support the Alice VO) contribute to this task. Alice would also like to invite as many tier-2 sites as possible to join in.

Upcoming WLCG Service Interventions

Many interventions scheduled this week. Please consult the URLs above for details.

ATLAS Service

The site is LPNHE (part of GRIF):
It is in downtime (https://goc.gridops.org/downtime/list?id=10455542) but no RSS feed entry has been sent about it.
feed://cic.gridops.org/index_rssflow.php?service=downtime_vo&vo=atlas This could be useful for the CIC people to tune the RSS feed, which is how the experiments retrieve information about downtimes.

  • This will be investigated by Osman (CIC Portal).

ALICE Service

No report.

CMS Service

No report.

LHCb Service

  • Any comments from sites concerning last week's request about the gridmap file for LHCb? If not, I will proceed by formulating an EGEE broadcast for all sites to implement this "safe" mapping in case of VOMS mapping failure.

No comments from the sites or ROCs. Could everyone please read last week's minutes on this issue!

  • EGEE downtime announcement procedure:
    1. Announcement of scheduled downtime with a mail "Announcement" at least 24h in advance as in the MoU.
    2. Start of downtime (scheduled and unscheduled) as of the time when it starts with a mail "Start" (with correct time!)
    3. End of downtime: mail "End" (with correct time)
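
The three-step procedure above can be sketched as a small check that lists which mails a downtime should produce. This is a hypothetical illustration only: the `Downtime` record and `expected_mails` function are assumptions and do not reflect the GOCDB or CIC Portal data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# "at least 24h in advance as in the MoU"
MIN_NOTICE = timedelta(hours=24)


@dataclass
class Downtime:
    """Hypothetical downtime record (not a GOCDB schema)."""
    start: datetime
    end: datetime
    scheduled: bool
    announced_at: Optional[datetime] = None


def expected_mails(dt: Downtime) -> List[Tuple[datetime, str]]:
    """Return (send time, mail type) pairs in chronological order."""
    mails = []
    if dt.scheduled:
        # Step 1 applies to scheduled downtimes only.
        if dt.announced_at is None or dt.start - dt.announced_at < MIN_NOTICE:
            raise ValueError("scheduled downtime needs at least 24h notice")
        mails.append((dt.announced_at, "Announcement"))
    # Steps 2 and 3 apply to scheduled and unscheduled downtimes alike.
    mails.append((dt.start, "Start"))
    mails.append((dt.end, "End"))
    return mails
```

An unscheduled downtime yields only the "Start" and "End" mails, each stamped with the actual time.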

ACTION: OCC will put this enhancement request into the GOCDB and CIC Portal.

  • (From Philippe) In the last couple of days we have been receiving update notifications from GGUS for tickets that, according to the web page, were not updated at all (e.g. #41707: the last update was October 3rd but we also got mails recently). Why does this happen?
Maria Dimou: We are working on this.
Diana Bosio: Could be a bug. We are looking into it.

WLCG Service Coordination

Nothing to report this week.

OSG Items

No points to raise.

Action Items

Newly Created Action Items

Assigned to: Main.OCC
Due date: 2008-11-17
Description: OCC to put an enhancement request into the GOCDB and CIC Portal for the following:
EGEE downtime announcement procedure:
1. Announcement of scheduled downtime with a mail "Announcement" at least 24h in advance as in the MoU.
2. Start of downtime (scheduled and unscheduled) as of the time when it starts with a mail "Start" (with correct time!)
3. End of downtime: mail "End" (with correct time)
Update 3 Nov: OCC has entered this enhancement request into the GOC DB "shopping list" in Savannah (https://savannah.cern.ch/support/?105977).
Close the item.
Closed: 2008-11-07

Review of Open Action Items

Open Action Items

Id  Submitter  Description  Creation  Due  Assigned To

Actions Closed in Last 20 Days

Id  Submitter  Description  Creation  Due  Assigned To  Closed

Next Meeting

The next meeting will be Monday, 20th October 2008 at 16:00 UTC+2.

  • Attendees can join from 15:45 UTC+2 onwards.
  • The meeting will start promptly at 16:00 UTC+2.
  • The WLCG section will start at the fixed time of 16:30 UTC+2.
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0157610


Topic revision: r3 - 2008-11-07 - NickThackray