Week of 180514

WLCG Operations Call details

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Portal
  • Whenever a particular topic requiring information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod@cern.ch. This allows the SCOD to make sure that the relevant parties have time to collect the required information, or to invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Borja (monit), Gavin (comp), Ivan (ATLAS), Julia (WLCG), Maarten (WLCG, ALICE), Remi (Storage), Vincent (Security)
  • remote: Alexander V (NLT1), Dave (FNAL), Di (TRIUMF), Jens (NDGF), John (RAL), Marcelo (CNAF), Sang Un (KISTI), Victor (JINR), Xavier (KIT)

Experiments round table:

  • ATLAS reports -
    • Production - 297k / 330k slots
      • Full steam T0 utilization
      • A bug in the pilot for the Titan HPC led to wrong output GUIDs. The problem is fixed and the produced files are being corrected.
      • Running out of simulation drained 100k cores in the last 12 hours. Recovering.

  • CMS reports -
    • average 245k core usage (200k for prod)
    • Couple of issues
      • GGUS:135079 and GGUS:135036 towards IN2P3 need some follow-up.
      • RAL file access issues in production to be sorted out (Echo migration?).
      • Caltech stageout impacting production.

  • ALICE -
    • NTR

Sites / Services round table:

  • ASGC: nc
  • BNL: NTR
  • CNAF: Due to a necessary upgrade of the new Huawei SE, LHCb data had to be moved to the even newer Huawei SE, which caused some unavailability for writing to it during the weekend.
  • EGI: nc
  • FNAL: NTR
  • IN2P3: IN2P3-CC will be in maintenance on Tuesday, June 12th. As usual, details will be available one week before the event.
  • JINR:
    • Short drop of main link to Moscow on 11.05. Resolved.
    • CEs stopped working after a reboot because accumulated updates turned out to be incompatible with the CREAM-CE and ARGUS software. Resolved around noon on Sunday.
  • KISTI: NTR
  • KIT: NTR
  • NDGF: A few small downtimes for pool restarts.
  • NL-T1: NTR
  • NRC-KI: nc
  • OSG: nc
  • PIC: nc
  • RAL: NTR
  • TRIUMF: NTR

  • CERN computing services: NTR
  • CERN storage services:
    • EOSLHCB GridFTP gateways are being saturated by file transfers to CASTORLHCB; the experiment has been informed.
  • CERN databases: nc
  • GGUS:
    • A new release is planned for Wed this week
      • Release notes
        • Some changes have been implemented for OSG support units, as announced recently
      • A downtime has been scheduled for 06:00-07:30 UTC
      • Test alarms will be submitted as usual
  • Monitoring:
    • Investigating why downtimes for service flavour HTCONDOR-CE are not being correctly accounted for in SAM reports.
  • MW Officer: NTR
  • Networks: NTR
  • Security: NTR

AOB:

  • ATTENTION: the next meeting is on Tuesday, May 22!
Topic revision: r21 - 2018-05-14 - MaartenLitmaath
 