WLCG Operations Coordination Minutes, April 7, 2022

Main points

Agenda

https://indico.cern.ch/event/1146515/

Attendance

  • local:
  • remote: Alessandra D (Napoli), Alessandro (ATLAS), Andrew (TRIUMF), Borja (monitoring), Dave M (FNAL), David Cameron (ATLAS), David Cohen (Technion), David South (ATLAS), Dirk (T0), Eric G (databases), Eva (databases), Federico (LHCb), Gavin (T0), Giuseppe (CMS), Igor (ATLAS), Joerg (ATLAS), Julia (WLCG), Maarten (ALICE + WLCG), Marc (CMS), Marian (networks + monitoring), Matt (Lancaster), Miltiadis (ATLAS + WLCG), Priscilla (ATLAS), Silvia (ATLAS), Stephan (CMS), Thomas (DESY), Zach (ATLAS)
  • apologies:

Operations News

  • the next meeting is planned for May 5

Special topics

Impact on WLCG of the war in Ukraine - update

  • the WLCG position is in line with the position of CERN,
    decided in the CERN Council week of March 21-25 and
    summarized here
  • the impact of potential further measures is being studied,
    in preparation for the Council week in June
  • some LHCONE connections to Russian sites have been halted
  • jobs from users in Russian institutes are no longer run at some sites

Announcement about experiment Oracle online database service for RUN3

  • Eva:
    • we will organize a meeting with experiment online representatives
      to allow support concerns to be discussed
    • as there have been almost no incidents with the online databases
      in the past years, we want to go to a best-effort support level
    • in the past we had a piquet service, but mind that with such a
      support level we still could not guarantee that problems would
      be solved quickly, as they may depend on other services that
      do not have similar (expensive) support levels

  • Alessandro:
    • as we already have dedicated meetings between the DB group and
      the experiments, we do not have to discuss that in this meeting
  • Joerg:
    • for ATLAS online the relevant contacts are the TDAQ experts
  • Marc:
    • the DB group know who the contacts are in CMS and LHCb
  • Eva:
    • and also in ALICE

  • Alessandro:
    • in this meeting we can check if the WLCG alarm procedure is still valid
  • Eva:
    • that procedure has not changed and will keep working as before
  • Eric G:
    • the CC operator remains the first contact for Oracle online incidents

Publishing of the WLCG Privacy Notice for WLCG services

see the presentation

Discussion

  • Thomas:
    • at DESY we have created data processing descriptions for services
    • can we use a template for the WLCG privacy notice requirements?
  • Julia:
    • if a site has its own privacy notice requirements,
      it can deal with those similarly to how CERN implements RoPOs
    • for WLCG, a customizable privacy notice will become available

Discussion about readiness for RUN3 from the operations perspective

  • see the experiment input
  • there were no other points mentioned during the discussion

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiment input regarding readiness for RUN3 from the operations perspective

ALICE

  • We have no concerns or suggestions at this time

ATLAS

  • No big changes in meetings or procedures required
  • Some concerns about Oracle online support: a dedicated support procedure (with phone numbers) should be clarified

CMS

  • we are ok with the WLCG Ops meetings on Monday (please do not shift to a later time)
  • we have no support procedure concern
  • we have no suggestion for Run 3 (scaling exercises still planned for Tier-0 together with the HLT/OnlineCloud)

LHCb

  • The only changes really specific to Run 3 that are still missing are in the area of pit export.
  • Regular data challenges and FEST activities are organized to keep track of progress
  • Oracle Online is critical

Experiments Reports

ALICE

  • Mostly business as usual, no major incidents
    • High analysis activity until mid March in preparation for Quark Matter 2022, April 4-10
  • Run-3 preparations continuing
    • ~95% of the computing capacity is running JAliEn jobs
      • VOboxes at a few sites have not been switched yet
    • Fraction of 8-core jobs being ramped up for Run-3 workflows
      • Most sites should only receive 8-core jobs during Run 3

ATLAS

  • Smooth running with 500-600k cores
    • Main campaign is preparation of Run 3 MC samples
  • Open question: how to coordinate the update of CEs to support submission from HTCondor 10 (ARC CE REST and HTCondorCE token support)
  • Sites can now turn off GridFTP storage if desired
    • Only Glasgow and some US T3s still do not support WebDAV; Rucio multi-hop will be used with those sites
  • Tape challenge very successful, to follow up with dedicated tests at a couple of sites
  • Thanks to the sites for timely delivery of 2022 pledges

Discussion:

  • Maarten:
    • a campaign is foreseen this spring for sites to upgrade their CEs
      to versions compatible with HTCondor 10, when we have converged on
      how the CEs should be configured for token support etc.
    • at the moment there are ad-hoc instructions for ATLAS and CMS
    • we need to ensure X509 keeps working for other VOs

  • Julia:
    • an update on SRR is given in the dCache upgrade TF news

CMS

  • Offline and Computing Week last week and CMS Week this week
  • running smoothly with 310-410k cores
    • usual production/analysis split of 3:1
    • 20-70k cores of non-pledged contribution
    • production activity mainly Run 2 ultra-legacy Monte Carlo
  • Tier-0 activities
    • cosmic ray data taking of Run 3 ongoing
  • waiting on Python 3 version/port of HammerCloud (HC)
  • WebDAV commissioning ongoing
    • Tier-0 and Tier-1: all sites ready/in production
    • Tier-2: all but two sites ready/in production
    • Tier-3: ten sites ready/in production, 14 sites working on it, on hold/no reply from three sites
    • CERNBox is an important destination without a WebDAV endpoint
  • Token commissioning for HTCondor CEs in progress
    • waiting for HTCondor interface for ARC CEs
  • RAL team investigating WebDAV access failures during high ECHO activity, GGUS:156337
  • issues with EOS Erasure Coding at CERN and Vienna, GGUS:156195

LHCb

  • Running at 150-170k cores, no major issues
  • Data/Tape challenges are quite an effort but proving to be useful

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

  • A meeting has been organized with APEL, GRACC (OSG accounting) and HTCondor developers to discuss the changes required to support two benchmarks (HS06 and HepScore) in parallel in the accounting workflow.
  • The status of the work to support two benchmarks in the accounting workflow was presented at the GDB in March.

dCache upgrade TF

  • Several dCache sites have enabled SRR, publishing it through the dCache frontend with the recent dCache versions. This is the recommended option, since it avoids operational problems, for example when experiment areas get full. We (WLCG ops) are polishing the documentation with the help of the dCache experts and of sites that have already been through this exercise, and will move on to the upgrade campaign as soon as possible. So far ATLAS has not noticed any problems with SRR published with this approach.

Information System Evolution TF

  • Validation of the network info in CRIC is ongoing. 90 tickets were submitted, 66 are done, 24 still to go.

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

  • New XRootD Monitoring components
    • The two new components are ready for setting up a test bed
      • The initial aim is to use the CERN EOS ALICE servers, so that the data can be compared with those reported by MonALISA
      • Once we are sure about data validation, further sites will be contacted for deployment of the new flow
  • Agreed on the minimum required schema for transfers to be meaningful
    • Discussions were held with the different interested developers and agreement was reached on a first draft of the defined fields
      • A document with the draft will be circulated to the experiments to reach a final agreement
    • Started a campaign requesting experiments to fill in the activities spreadsheet prepared to be used as the base

Discussion:

  • Julia:
    • a presentation is foreseen for the May GDB

Network Throughput WG


WG for Transition to Tokens and Globus Retirement

Action list

Creation date | Description | Responsible | Status | Comments

Specific actions for experiments

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

Specific actions for sites

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

AOB

-- JuliaAndreeva - 2022-04-01
Topic revision: r16 - 2022-04-30 - MaartenLitmaath