Week of 180730

WLCG Operations Call details

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Portal
  • Whenever a topic that requires information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod@cern.ch so that the SCOD can make sure the relevant parties have time to collect the required information, or can invite the right people to the meeting.

Best practices for scheduled downtimes



  • local:
  • remote:

Experiments round table:

  • ATLAS reports -
    • variable production between 270-350k slots
      • almost 100k from CERN-P1 for most of week (but new installation and scalability issues)
      • problems with some big sites and their local storage
      • problems with grid CE for CERN-T0 resources (update & bouncycastle)
      • updates related to the pilot causing job failures
      • issues with production input file transfer rate
    • overloaded rucio readers & problems with automatic midnight restarts (new implementation in pipeline)
    • data reprocessing campaign - more inputs transferred from tape
    • storage at INFN-T1 unstable and down several times during last week
    • automatic downtime synchronization from OIM/GOCDB to AGIS not working for some SEs (requires manual blacklisting)

  • CMS reports -
    • smooth sailing, with complete utilization of resources.
    • lower pressure on CERN resources (T0 + HLT) is likely in the coming days, in order to empty a production buffer that is getting a bit out of control
    • we deployed the new CMSSW release @ T0, which is expected to give physics-grade samples from now until the end of the run
    • we started preparing the mixing library for MC 2018, already close to the 500M events needed. Main activities will switch from the tails in 2017 MC to 2018 MC.

  • ALICE -
    • Normal activity on average until the weekend
    • Central services task queue DB HW problem Sat afternoon
      • Fixed Sat late evening
    • Lowish activity also on Sunday due to a big production having a bad TTL
      • It was set too large to match any resources
      • Fixed Sunday late evening

Sites / Services round table:

  • ASGC:
  • BNL:
  • CNAF:
  • EGI:
  • FNAL:
  • IN2P3:
  • KISTI:
  • KIT:
  • NDGF: The Bluegrass site (Triolith's replacement) is still not up. The work on it is postponed until August. Until then we are missing ca. 1/4 of the computational power of the NDGF-T1 site.
  • NL-T1: This morning we suddenly saw major errors in our dCache environment and transfers started to fail. We tried to solve this by restarting dCache. Unfortunately dCache refused to start again and it took us several hours to find a workaround. We are now back online. We have contacted dCache support to investigate this matter.
  • NRC-KI:
  • OSG:
  • PIC:
  • RAL: Last week's power testing went more or less OK.
  • TRIUMF: Started replicating the data to new storage at the new data centre; this will likely finish within one month.

  • CERN computing services:
  • CERN storage services:
  • CERN databases: NTR
  • GGUS:
  • Monitoring:
  • MW Officer:
  • Networks: GGUS:135962 - Transfers from FNAL to DESY failing due to timeouts. Re-testing was done today, but it did not show any obvious network issues.
  • Security: NTR


Topic revision: r13 - 2018-07-30 - MarianBabik