Week of 181217

WLCG Operations Call details

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Portal
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod@cern.ch, so that the SCOD can make sure the relevant parties have time to collect the required information, or can invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Borja (Monitoring), Maarten (ALICE, GGUS), Marian (Networks), Julia (WLCG), Marcelo (LHCb), Miro (Chair, Databases)
  • remote: Christoph (CMS), Di (TRIUMF), John (RAL), Xavier (KIT), Peter (ATLAS), Sang-Un (KISTI), Victor (JINR), Xin (BNL), Dave (FNAL), Jens (NDGF)

Experiments round table:

  • ATLAS -
    • Very quiet and stable operations
      • BNL FTS upgrade looks good; traffic is being switched back to that instance
      • no campaigns planned over Christmas or New Year
      • thanks to all sites for their collaboration over 2018 and Merry Christmas from ATLAS Computing

  • CMS -
    • CMS continues to prepare for activities over Xmas holidays
      • Various MC campaigns
      • Continue (ideally finish) HI 'prompt' RECO and archiving to tape
    • Thanks for all support in 2018 - have relaxing holidays and good start into the coming year

  • ALICE -
    • High activity on average
    • Expectations for the end-of-year break:
      • steady MC production
      • raw data reconstruction
      • analysis probably at a lower level than usual
    • Thanks to all sites and experts for a great year to end Run 2 with!
    • Season's greetings and best wishes for 2019 !

  • LHCb -
    • Activity
      • Data reconstruction for 2018 data
      • User and MC jobs
      • Staging data for reprocessing in 2019
    • Site Issues
      • SARA: Ticket opened during the weekend concerning tape migration issues, quickly fixed on Saturday night. Thanks a lot!
    • Thanks to all sites for this great year!

Sites / Services round table:

  • ASGC: nc
  • BNL: BNL FTS server was upgraded to version 3.8.1 on Dec 13th.
  • CNAF: NTR
  • EGI: nc
  • FNAL: nc
  • IN2P3: nc
  • JINR: NTR
  • KISTI: Finished long shutdown. No data loss reported. Checks are still ongoing.
  • KIT:
    • Updated dCache behind atlassrm-fzk.gridka.de to 3.2.39. At the same time we enabled IPv6 on the entire storage element, updated the PostgreSQL database backend to version 10.6 and introduced high availability for the SRM endpoint. Additionally, we changed the name of the SRM endpoint to atlassrm-kit.gridka.de, while atlassrm-fzk has been turned into a CNAME for it. Our ATLAS contact is trying to convince ATLAS to switch the name in their frameworks, too.
    • As is common practice, GridKa will continue running on a best-effort basis through Christmas and New Year (from the 24th of December until the 2nd of January). The on-call service remains available, as usual, to handle critical errors, whether detected by our own monitoring facilities or reported through GGUS alarm tickets.
  • NDGF: nc
  • NL-T1: nc
  • NRC-KI: nc
  • OSG: nc
  • PIC: nc
  • RAL: NTR
  • TRIUMF: NTR

  • CERN computing services:
  • CERN storage services:
  • CERN databases: NTR
  • GGUS:
    • For the end-of-year break: GGUS is monitored by a system connected to the on-call service. In case of total GGUS unavailability the on-call engineer (OCE) at KIT will be informed and will take appropriate action. If GGUS is available but there is a problem with the workflow (e.g. ALARM to CERN doesn't generate email notification to the operators), then WLCG should submit an ALARM ticket, notifying site FZK-LCG2 (DE-KIT), which triggers a phone call to the OCE. As a last resort, the FZK-LCG2 emergency e-mail or telephone number published in the GOCDB can be contacted.
  • Monitoring: NTR
  • MW Officer:
  • Networks: NTR
  • Security: NTR

AOB:

Season's Greetings!

  • Many THANKS for all your effort in making 2018 and the whole of Run 2 a great success for WLCG!
    • Further challenges and opportunities await us in 2019 and LS2!
Topic revision: r15 - 2018-12-17 - MiroslavPotocky
 