WLCG-OSG-EGEE Ops' Minutes Mon 24 Nov 2008


A number of regions reported high load and apparently non-optimal use of resources by the biomed VO, which is currently running some challenges. Examples: heavily loaded LFCs in France, and multiple worker nodes in the UK accessing identical files at remote SEs.



  • Asia Pacific ROC:
  • Central Europe ROC: Małgorzata Krakowian
  • OCC / CERN ROC: John Shade, Antonio Retico, Nick Thackray, Steve Traylen, Diana, Maria
  • French ROC: Osman, Pierre
  • German/Swiss ROC: Angela Poschlad, Wen Mei
  • Italian ROC:
  • Northern Europe ROC:
  • Russian ROC: Lev Sharardin
  • South East Europe ROC: Kai
  • South West Europe ROC: Kostas
  • UK/Ireland ROC: Jeremy Coles
  • GGUS: Torsten
  • GOCDB: Gilles Mathieu


  • WLCG Service Coordination: Harry Renshall, Jamie Shiers

WLCG Tier 1 Sites

  • ASGC:
  • CERN site: Sophie
  • FNAL: Catalin Dumitrescu
  • FZK: Angela Poschlad
  • IN2P3: Pierre
  • INFN: Alessandro, Alfredo
  • NDGF: Vera
  • PIC: Kai
  • RAL: Gareth Smith
  • SARA/NIKHEF: Absent
  • TRIUMF: Absent

LHC Experiments

  • ATLAS: Alessandro di Girolamo
  • LHCb: Roberto Santinelli
  • CMS: Daniele Bonacorsi
  • ALICE: absent

Feedback on Last Week's Minutes

Sites were asked to enter their names in the web interface when calling in. Nick will add the link to the web interface in the minutes and agenda.

Assigned to Due date Description State Closed Notify  
NicholasThackray 2008-12-02 Nick to add hyperlink to agenda and minutes template for the Alcatel meeting call back.

Update: link was always there, but now uses a font for the blind.

2008-12-02

EGEE Items

Grid Operator Hand Over on Duty

  • Primary team, from: Central Europe
  • Secondary team, from: Asia/Pacific

  • GGUS:42124: WEIZMANN-LCG2 is working with the APEL supporters on fixing APEL. Since work is active, the escalation level of this ticket should be reduced by the current COD.

PPS Reports

  • OpsMeetingPps
  • Hydra testing: ROCs are asked whether they wish to help with certifying, testing or using Hydra.

EGEE Items From ROC Reports

ROC France
INFORMATION IN2P3-CC: The central LFC for the Biomed VO is currently overloaded due to a growth in Biomed activity. Although the hardware was upgraded as an emergency measure on Friday, the problem is still there. It might be due to a limit on the number of simultaneous connections between the LFC and the Oracle DB. We will contact LFC support to find a good (and scalable) solution. Sorry for the inconvenience.
Comments from Pierre
using a bad method against the LFC.
A Biomed user's activity has caused site instabilities by repeatedly transferring the same 2.8GB file to WNs across EGEE from a single UK site SE. After being ticketed, the user produced more replicas, but there is concern about this data distribution model and the bandwidth stress. For a related GGUS ticket see: GGUS:43489. The user responded quickly. We may be seeing signs of the limit of the standard submission approach/model: "We are submitting theses jobs with the native EGEE command glite-wms-job-submit . These grid jobs are then accessing the 2.8GB data file through the command lcg-cp . So we didn't decide neither where the jobs are scheduled nor which file-replicate is used by these jobs. The EGEE middleware is deciding." Because of the I/O limitations the Biomed jobs are often quite inefficient.
Comments from Jeremy
Results are given for one site, but in fact all UK sites are being hit hard, e.g. by lots of requests for the same files. The user has adjusted, but more could be done.
Comments from Nick
We can contact the VO to have them change their habits.
Comments from Jeremy
The VO may just say the middleware is deficient.
Comments from Nick
Contact the Biomed VO. Action on Nick.
UKI-NORTHGRID-LANCS-HEP saw a problem with a recent WN update: GGUS:43473. The ticket seems to bounce around without anybody really knowing how to help! The point to note is that it is likely a site problem, but the site/ROC has struggled to understand it because it (looks like it) requires middleware expert help. The site will try a reinstall with 64-bit gLite to try to remove the 64/32-bit incompatibilities, but no real understanding of the problem has been reached.
Comment from SteveTraylen
Steve will look some more at GGUS:43473.
Site availability does not take into account SRM V2 systems. As a result the overall RAL availability is dependent on a dCache service which is no longer considered a front line service. SRM V2 not being in the overall availability figures is a problem with the monitoring, not the site. Update: The WLCG Management Board decided on Tuesday to use SRMv2 in the availability calculations as of December (in lieu of the SRMv1 tests). This will be discussed with the EGEE ROC Managers to ask them to ratify this.
Comment from JohnShade
As of Wednesday, WLCG will consider the SRMv2 tests as the important ones. Should we do the same for EGEE? (See John's email, and stick it in ).
On the topic of SAM, has there been any progress on centrally identifying common problems seen in SAM? On 19th November from 18:00-21:00 UK time a number of sites saw the same (top-level BDII?) problem. It would save much time if these errors could be automatically flagged as possibly due to an offsite problem.
Comment from John
We shall try and send broadcasts in such situations.
Comment from Jeremy
This would help to avoid wasted, duplicated effort.
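One possible way to flag such errors automatically is sketched below. This is only an illustration, not an existing SAM feature: the function name `flag_common_failures`, the record layout (site, test, time), the site names, the test name `bdii-query`, the one-hour window and the three-site threshold are all assumptions made for the example. The idea is simply to group failures of the same test into fixed time windows and report any window in which several distinct sites fail together.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def flag_common_failures(failures, window=timedelta(hours=1), min_sites=3):
    """Group failures by (time window, test) and return the groups in which
    at least `min_sites` distinct sites failed the same test.

    `failures` is an iterable of (site, test, timestamp) tuples.
    All names and thresholds here are hypothetical.
    """
    buckets = defaultdict(set)
    for site, test, when in failures:
        # Align the timestamp to the start of its fixed-size window.
        start = EPOCH + ((when - EPOCH) // window) * window
        buckets[(start, test)].add(site)
    return sorted(
        (start, test, sorted(sites))
        for (start, test), sites in buckets.items()
        if len(sites) >= min_sites
    )


# Hypothetical records loosely modelled on the 19th November incident.
failures = [
    ("SITE-A", "bdii-query", datetime(2008, 11, 19, 18, 10)),
    ("SITE-B", "bdii-query", datetime(2008, 11, 19, 18, 40)),
    ("SITE-C", "bdii-query", datetime(2008, 11, 19, 18, 55)),
    ("SITE-A", "job-submit", datetime(2008, 11, 19, 9, 5)),
]
flagged = flag_common_failures(failures)
for start, test, sites in flagged:
    print(f"{start:%Y-%m-%d %H:%M} {test}: possibly offsite, seen at {', '.join(sites)}")
```

A flagged window does not prove an offsite cause; it only suggests that a broadcast may be worth sending before each site investigates separately.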

gLite Release News

  • OpsMeetingGliteReleases
  • Last Week
    • New VOMS.
    • New SunGridEngine job managers.
    • New dCache
  • Next Week
    • New FTS, includes fixes for bouncycastle updates.
    • Later there will be PATCH:2417 with a similar fix for the CREAM CE.

Java Bouncy Castle problems

Extract from broadcast: A few days ago jpackage updated bouncycastle to version 1.41. This version causes problems for several glite nodes as it places the jars in a new directory. The glite developers are currently working on patches to solve this issue. For the time being please make sure that your site DOES NOT UPGRADE to bouncycastle 1.41. Node types affected by this problem:

  • glite-UI
  • glite-MON
  • glite-CREAM
  • glite-FTS_oracle
  • glite-WN
  • glite-TORQUE_utils
  • glite-LSF_utils
  • glite-CONDOR_utils
  • glite-VOMS_mysql
  • glite-VOMS_oracle
  • glite-VOBOX
  • lcg-CE

WLCG Items

Upcoming WLCG Service Interventions

  • Interventions
  • Monday December 1st: possibly a VOMS intervention for LHC VOs. A broadcast will be sent if this is going to happen.
    • This is going to happen, transparent intervention now entered in the GOCDB.

ATLAS Service

BNL<->CNAF network problems solved. Took a couple of weeks.

ALICE Service

CMS Service

Everything quite smooth, ramping down from exercises of the past few weeks.

LHCb Service


WLCG Service Coordination

OSG Items

All tickets are recent (from the last week) and none are at escalation stage.

Action Items

Newly Created Action Items

Review of Open Action Items

Open Action Items

Id Submitter Description Creation Due Assigned To

Actions Closed in Last 20 Days

Id Submitter Description Creation Due Assigned To Closed

Next Meeting

The next meeting will be Monday, dd mmm 2007 15:00 UTC (16:00 Swiss local time).

  • Attendees can join from 14:45 UTC (15:45 Swiss local time) onwards.
  • The meeting will start promptly at 15:00 UTC (16:00 Swiss local time).
  • The WLCG section will start at the fixed time of 15:30 UTC (16:30 Swiss local time).
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0157610

Topic revision: r3 - 2008-12-02 - JohnShade