Week of 190826

WLCG Operations Call details

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Portal
  • Whenever a particular topic requiring information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod@cern.ch so that the SCOD can make sure the relevant parties have time to collect the required information, or invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Olga (Computing), Maarten (ALICE), Michal (ATLAS), Julia (WLCG), Vladimir (LHCb), Miro (DB, Chair)
  • remote: Di (TRIUMF), Xavier (KIT), Mike (ASGC), Xin (BNL), Marcelo (CNAF), Sang-Un (KISTI), Jens (NDGF), Dave (FNAL), Christoph (CMS)

Experiments round table:

  • ATLAS reports -
    • Activities:
      • pilot2/singularity migration converging, remaining sites/queues handled one by one
      • ongoing reprocessing campaign, i.e. intense tape staging
        • sub-optimal staging performance at PIC and FZK - to be followed up with the sites
    • Issues
      • Transfers from INFN-T1 tapes were failing with "Communication error on send" (GGUS:142805)
        • networking issue fixed
      • Transfers from IN2P3-CC tapes were failing with "Changing file state because request state has changed" (GGUS:142818)
        • FTS channel was saturated
      • Transfers from BNL-ATLAS are failing with "File is unavailable" (GGUS:142841)
      • urgent jobs stuck at ANALY_BNL_MCORE
        • the queue has only one running job
        • email thread with US people started

  • CMS reports -
    • Reprocessing of Run2 data is ongoing
      • Will involve restaging of RAW from tape

  • ALICE -
    • NTR

  • LHCb reports -
    • Activity:
      • MC, user jobs and data restripping.
      • Massive staging at all T1s
    • Issues:
      • RAL:
        • GGUS:142350; still issues accessing files on ECHO. Under investigation.
      • CNAF:

Sites / Services round table:

  • ASGC:
    • Downtime for DPM DB update (T1/T2):
      • 2019-08-26T01:00:00 ~ 2019-08-26T10:00:00 (UTC)
      • Finished smoothly; the downtime is over and the site is starting to receive jobs again

  • BNL: Recent issues with the dCache Chimera name server have caused intermittent transfer failures; under investigation. (GGUS:142841)
  • CNAF: Back in action with full electrical continuity since Wednesday afternoon
    • Issues with the main router on restart. This router was already scheduled to be replaced at the beginning of September
    • The StoRM issues mentioned by LHCb and ATLAS (also in tickets) seemed to be related to the router issue
  • EGI: NC
  • FNAL: NTR
  • IN2P3: NC
  • JINR: dCache and Enstore upgraded to the latest version (5.2.3)
  • KISTI: NTR
  • KIT:
  • NDGF: NTR
  • NL-T1: NC
  • NRC-KI: NC
  • OSG: NC
  • PIC: NC
  • RAL: NC
  • TRIUMF: NTR

  • CERN computing services:
    • BDII service degraded over the weekend. Issue now resolved. (OTG:0051959)
    • Reduced capacity in HTCondor next Monday/Tuesday (OTG:0051883) due to planned intervention (OTG:0051379).
  • CERN storage services: NTR
  • CERN databases: NTR
  • GGUS: NTR
  • Monitoring: NTR
  • MW Officer: NTR
  • Networks: NTR
  • Security: NTR

AOB:

Topic revision: r15 - 2019-08-26 - MiroslavPotocky
 