Week of 180903

WLCG Operations Call details

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Portal
  • Whenever a particular topic requiring information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod@cern.ch, so that the SCOD can make sure the relevant parties have time to collect the required information, or can invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Kate (WLCG, DB, chair), Maarten (WLCG, ALICE), Nils (computing), Borja (monitoring), Enrico (storage), Ivan (ATLAS)
  • remote: David B (IN2P3), John K (RAL), Onno (NL-T1), Christoph (CMS), Sang Un (KISTI), Dave M (FNAL), Marcelo (CNAF), Vladimir (LHCb)

Experiments round table:

  • ATLAS reports -
    • Production:
      • 295 / 302 k running slots
      • Moving IT sites from APF to Harvester submission

  • CMS reports -
    • CPU usage ~250k cores over the last week
      • ~200k cores for production
      • ~50k cores for analysis, much less over the weekend due to an infrastructure problem, solved on Monday
    • Problem with cleaning of T0 streamer files being investigated

  • ALICE -
    • NTR

  • LHCb reports -
    • Activity
      • Data reconstruction for 2018 data
      • User and MC jobs
    • Site Issues
      • RAL: Failing disk server, resulting in jobs failing to get input data

Sites / Services round table:

  • ASGC: nc
  • BNL: nc
  • CNAF:
    • ATLAS: Transfers failing (GGUS:136905) - 20K+ synchronous requests per minute.
      • This is unusual behavior for us.
      • Most of these requests are statusPtP and statusPtG, which are served directly from the frontend.
      • Did ATLAS change anything in the workflow recently? Do other T1s observe the same thing?
    • LHCb: We have a CVMFS issue on some WNs (GGUS:136959), under investigation.
Ivan commented that the Harvester submission change was implemented only on Monday, so it was not related to the issue. No changes will be implemented before the issue is solved. Data transfer experts also denied any changes; no other site has such issues. Marcelo remarked that advance warning is needed if ATLAS needs more resources. Further discussion to be continued via GGUS.

  • CERN computing services:
    • CERN batch farm reboots for L1TF done, servers up and running again. (Ref. OTG:0045525)
    • Cloud hypervisor rebooting campaign starting on the 12th of September. (Ref. OTG:0045522)
  • CERN storage services:
    • EOS ATLAS slowdown on Tue 28th August. Details in OTG:0045624
    • EOS ATLAS unavailable during the night from Saturday to Sunday. Details in OTG:0045724
    • EOS CMS: Slowdowns reported. Might be due to a routing issue in the CC network
  • CERN databases:
    • COMPASS COMPR database patching on Wednesday. Details in OTG:0045729
    • CMS CMSR database memory replacement on Wednesday. Details in OTG:0045727
  • GGUS: NTR
  • Monitoring:
    • Draft reports for the Aug availability sent around
  • MW Officer:
  • Networks: NTR
  • Security:

AOB:

Topic revision: r22 - 2018-09-04 - MaartenLitmaath
 