Week of 130715

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The scod rota for the next few weeks is at ScodRota

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web: Operations Web

General Information

  • General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
  • GGUS Information: GgusInformation
  • LHC Machine Information: Sharepoint site - LHC Page 1



  • local: Ben, Eddie, Eva, Luc, Maarten, Robert
  • remote: Alexander, David, Kyle, Lisa, Marc, Pepe, Ron, Saverio, Sonia, Tiju, Vladimir, Wei-Jen, Xavier

Experiments round table:

  • CMS reports -
    • Continuing 2011 legacy rereco activity and some Upgrade MC generation
    • Several issues open at IN2P3:
      • GGUS:95654 SAM CE briefly red due to job submission timeout on Jul 11, then green until midnight Jul 15.
      • GGUS:95704 Data staging -- leaving ticket open until staged
      • GGUS:95720 File read error
    • Two tickets against RAL:
    • One against CNAF: GGUS:95698 -- CVMFS black hole WN -- "The wn have been closed before this tkt was filed."
    • One against CERN: GGUS:95713 -- SAM SRM errors on Jul 13, green since then.

  • ALICE -
    • NTR

  • LHCb reports -
    • Incremental stripping campaign in progress and MC productions ongoing
    • T0:
      • CERN: Wrongly terminated FTS transfers (GGUS:95642)
    • T1:
      • GridKa: currently no staging problems, but we have no information about what was changed (GGUS:95135)
        • Xavier: communication with the tape system was broken for all VOs since Fri evening 22:00 CEST, fixed this morning, cause unknown; the ticket will be updated when more is known
      • PIC: aborted pilots (GGUS:95730), solved by removing all *_sl5 queues from the configuration.

Sites / Services round table:

  • ASGC - ntr
  • CNAF - ntr
  • FNAL - ntr
  • IN2P3
    • ticket GGUS:95726 was wrongly assigned to us; the problem was due to an expired DDM proxy
  • KIT
    • Wed-Thu July 24-25 site downtime, jobs will be drained on Tue
    • Tue July 23 additional downtime for LHCb SE
    • Fri July 26 additional downtime for CMS SE
  • NLT1 - ntr
  • OSG
    • the GGUS alarm re-test went OK; the original issue is understood
  • PIC
    • because of high electricity costs we will stop 25% of our CPU resources until the start of August
  • RAL - ntr

  • dashboards - ntr
  • databases
    • tomorrow morning at 10:00 CEST: transparent intervention on integration DBs to activate encryption and checksumming in the Oracle network layer
  • GGUS: data and instructions for tomorrow's MB are attached.
  • grid services - ntr




  • local: Eddie, Eva, Jan, Ken, Luc, Maarten
  • remote: Jeremy, John, Kyle, Marc, Matteo, Michael, Roger, Ronald, Wei-Jen, Xavier

Experiments round table:

  • ATLAS reports -
    • T0/Central services
      • atlascops: voatlas161 node unreachable (INC:342109), rebooted OK
      • "No such file or directory" (550) errors at CERN-EOS (GGUS:95715 & INC:338842). User contacted to recreate the files
        • Jan: to be clear, those files have never been on EOS
    • T1
      • BNL file recovery progressing
      • RAL: disk server problem. The server successfully completed its rebuild and the files can be accessed
        • John: the bad disk server has been drained successfully and will be tested

  • CMS reports -
    • Continuing 2011 legacy rereco activity and some Upgrade MC generation, everything pretty quiet
    • Several tickets open at RAL
    • There currently seems to be a problem with CMSSW installations on various nodes at FNAL (SAV:138771)
    • Yesterday the CASTOR T1TRANSFER service was degraded for some hours, but everything seems OK now.
      • Jan: activity spikes appear to have led to SRM timeouts and subsequent aborts, though there may also have been an issue with CASTOR DB performance; we will look further into it
    • INC:340403 was opened on Tuesday about a machine with high load. It was fixed yesterday, not sure by whom.

  • ALICE -
    • CNAF: a plan has been developed for re-staging 2010 data (400k files) to check for corrupted files (GGUS:95073) and have such cases fixed, while avoiding contention with reprocessing campaigns: thanks!

Sites / Services round table:

  • ASGC - ntr
  • BNL - ntr
  • CNAF - ?
  • GridPP - ntr
  • IN2P3 - ntr
  • KIT
    • reminder of various downtimes next week, as announced already and recorded in GOCDB
  • NDGF
    • network maintenance 18:00-20:00 UTC today, some pools may be affected
  • NLT1 - ntr
  • OSG - ntr
  • PIC - ntr
  • RAL
    • CVMFS 2.1.12 has been deployed on all WN
    • CASTOR upgrades for ALICE and LHCb foreseen for next Tue, but not yet decided

  • dashboards - ntr
  • databases - ntr
  • storage
    • srm-lhcb has had various core dumps lately; a patch applied on Tue seems to have cured the problem. The other instances will be patched next Tue, which should be transparent


Topic attachments
  • ggus-data.ppt (r3, 2314.0 K, 2013-07-11 15:59, MariaDimou): Draft GGUS slides for the 2013/07/16 WLCG MB.
  • ggus-slides-template.ppt (r1, 2241.5 K, 2013-07-10 17:10, MariaDimou): GGUS-related slides' template and instructions for the WLCG MB
Topic revision: r7 - 2013-07-18 - MaartenLitmaath