Week of 130513

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The SCOD rota for the next few weeks is at ScodRota

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web: Operations Web

General Information

  • General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
  • GGUS Information: GgusInformation
  • LHC Machine Information: Sharepoint site - LHC Page 1


Monday

Attendance:

  • local: (MariaD, AndreaS, Maarten, Felix, Marc (LHCb), Ben Jones (PES), Luca (DB), Dave (Dashboards), Xavi (Storage))
  • remote: (Xavier (KIT), John Kelly (RAL), Tommaso Boccali (CMS), Paolo Franchini (CNAF), Rob Quick (OSG), Onno (NL_T1))

Experiments round table:

  • ATLAS reports (raw view) - Nobody present or connected!
    • Central services
      • NTR
    • T0
      • CERN_8CORE GGUS:93955 Missing release cannot be installed because the CERN_8CORE resource is associated to a strange (CMS?) CE: creamtest001.cern.ch - sftcms

  • CMS reports (raw view) -
    • In general a calm happy weekend.
    • Preparing reprocessing of the 2011 data at CERN; still waiting for validated conditions.
    • Reprocessing of MC occurring at T1 at low level; new MC at T2s but at a quite low rate.
    • CMS-only Tier-2s are encouraged to make the transition to SL6; sites shared with other VOs need to wait until June 1 to do this.
    • Problems pending: GGUS:93965 is still open (SL6 WNs with broken MW environment). The ticket says a solution has been found and is waiting to be deployed in production. It is quite urgent on our side, since CERN is currently blacklisted for analysis. Given the latest update to the ticket, it should be solved by now.
    • We had an issue with the Site Status Board not showing any service as critical. The ticket is Savannah:137445 and the issue seems understood now.
    • Central PhEDEx agents (only in DEBUG) were down over the weekend; Site Readiness will be corrected by hand. Agents have been back online since late morning.
The problem with VO=cms not being defined is now solved, although GGUS:93965 is still in status "In progress" as these notes are being written. CMS will verify whether the solution is satisfactory.

  • ALICE -
    • CNAF: high job throughput and efficiency during the weekend, after the switch to Torrent late on Friday afternoon, thanks!

  • LHCb reports (raw view) -
    • Incremental stripping campaign in progress and MC productions ongoing
    • T0: (GGUS:93975) voms-proxy-init does not work on lxplus(sl6)+cvmfs. Need to update the certs in CVMFS
    • T1:
      • GridKa - Two broken Tape Libraries. Experts aware. Xavier (GridKa) confirmed the 1st tape lib is now back and they are working on the 2nd one.
      • SARA - Sporadic failures on CEs in SUM monitor with brokerhelper: No compatible Resources. Known BDII problems maybe? (GGUS:94016). Onno (SARA) confirmed they are working on the problem.
    • Other : (GGUS:93966) Request to GOCDB to allow fine-grained SE reporting by sites. Current suggestion is to have different end points for Tape and Disk. Maarten suggested they bring this to the next WLCG Ops Coord Meeting this Thursday.

Sites / Services round table:

  • ASGC: The fts database migration is complete and the service is back.
  • BNL: not connected
  • IN2P3: ntr
  • PIC: not connected
  • FNAL: not connected
  • OSG: ntr
  • NL_T1: nothing to add
  • CNAF: ntr
  • NDGF: not connected
  • RAL: ntr
  • KIT: nothing to add

  • CERN-PROD: SLC6 capacity at CERN is now 7000 slots (about 70 kHS06). Resolving issues as they are reported. Major issue of 1.5 GB virtual memory limit: fix is rolling out today. Capacity available to Grid via ce208.cern.ch.

  • GGUS: Slides for tomorrow's MB are attached to this page and file ggus-tickets.xls is up-to-date and attached to twiki WLCGOperationsMeetings.

AOB:

Thursday

Attendance:

  • local: (MariaD, Maarten, Felix, Marc (LHCb), David, Jarka (ATLAS))
  • remote: (Xavier (KIT), Gareth (RAL), David Mason (CMS), Paolo Franchini (CNAF), Rob Quick (OSG), Ronald (NL_T1), Rolf (IN2P3), Pepe (PIC))

Experiments round table:

  • ATLAS reports (raw view) -
    • Central services
      • NTR
    • T0
      • NTR
      • CERN_8CORE GGUS:93955 (under investigation within ATLAS): Missing release cannot be installed because the CERN_8CORE resource is associated to a strange (CMS?) CE: creamtest001.cern.ch - sftcms

  • CMS reports (raw view) -
    • Generally calm -- some minor issues to report
    • GGUS:94029 -- One of our users escaped the reservation -- apologies for any confusion; we have since dealt with him ;-) There was a discussion on why the username and DN of the submitter of this GGUS ticket correspond to 2 different people (Hector and Kyle). Rob gave the answer: Footprints allows an OIM user to open a ticket in the name of another user so that the latter receives all the updates. So the ticket was actually opened by Kyle.
    • GGUS:93965 -- VO_CMS_SW_DIR not set on ce208 -- solved on Monday.
    • GGUS:94075 -- Release area full at IN2P3; solved on the site side, awaiting confirmation from CMS (we are working on that).
    • GGUS:94104 -- Seeing possible SL6-specific job failures at CERN. We are also investigating on our side, but the ticket is there as a heads-up.
    • LXPLUS access -- There were reports on Tuesday evening of trouble accessing LXPLUS, but that appears to have been related to a brief AFS server outage.

  • ALICE -
    • KISTI: many jobs quickly failed due to the absence of SL6 versions of certain analysis packages on the ALICE SW distribution servers
      • the missing packages will be added and job brokering improvements will be looked into

  • LHCb reports (raw view) -
    • Incremental stripping campaign in progress and MC productions ongoing
    • T0: (GGUS:93975) voms-proxy-init does not work on lxplus(sl6)+cvmfs. Need to update the certs in CVMFS
    • T1:
      • GridKa - Tape library still broken, holding up the staging of jobs. Any news? Xavier confirmed the tape library is still broken, but the queues are not full, so there should be no connection between the two. Marc will check again why staging appears to be slow.
      • IN2P3 - Some issues this morning with a user filling up /tmp. Resolved quickly.
    • Other : (GGUS:93966) Request to GGUS to allow fine-grained SE reporting by sites. Current suggestion is to have different end points for Tape and Disk. MariaD asked how many TEAMers/ALARMers may belong to more than one experiment, based on Marc's case GGUS:94084. Maarten's answer was "not many". The issue will be mentioned at the WLCG Ops Coord Meeting WLCGOpsMinutes130516.

Sites / Services round table:

  • ASGC: ntr
  • BNL: not connected
  • IN2P3: Pre-announcement of a long outage on June 11. Databases, BDII and VOboxes will respond, but there will be neither batch nor dCache. One of the services may remain down until June 12 inclusive.
  • PIC: ntr
  • FNAL: not connected
  • OSG: GGUS:93650 was opened because time-outs are observed again between CERN and OSG BDIIs.
  • NL_T1: the EL6 upgrade at Nikhef will take place on Wednesday or Thursday.
  • CNAF: ntr
  • NDGF: not connected
  • RAL: There will be an outage next Tuesday, already published in GOCDB. Neither CASTOR nor batch work will be available.
  • KIT: nothing to add.

  • CERN: Only Dashboards were represented with ntr.

AOB:

  • ATTENTION: next meeting on Tuesday, May 21!
Topic attachments
  • ggus-data.ppt (r1, 2347.5 K, 2013-05-13 12:10, MariaDimou): Final GGUS ALARM drills for the 20130514 WLCG MB
Topic revision: r12 - 2013-05-16 - MariaDimou
 