WLCG Operations Coordination Minutes, March 3, 2022

Main points

Agenda

https://indico.cern.ch/event/1133785/

Attendance

  • local:
  • remote: Alessandra D (Napoli), Andrea (WLCG), Andrew (TRIUMF), Borja (monitoring), Christoph (CMS), Concezio (LHCb), David Cameron (ATLAS + ARC), David Cohen (Technion), Eric (IN2P3), Giuseppe (CMS), Julia (WLCG), Maarten (ALICE + WLCG), Matt (Lancaster), Max (KIT), Miltiadis (WLCG), Panos (WLCG), Pepe (PIC), Romain (WLCG), Shawn (AGLT2 + networks), Simone (WLCG), Stephan (CMS), Thomas (DESY)
  • apologies:

Operations News

  • the next meeting is planned for April 7

Special topics

Impact of the war in Ukraine on WLCG

see the presentation

Discussion

  • Simone:
    • the CERN Council will meet on March 8, more news after that

  • Thomas:
    • what will the mailing list be used for?
  • Simone:
    • the list is there for sites to inform us about
      policies by which they have to abide and
      for advice on how to implement them
    • the list is not for sites to get news;
      other channels will be used for that

  • Romain:
    • matters can also be escalated via security contacts

Tokens & Globus update

see the presentation

Discussion

  • Thomas:
    • what about other VOs in EGI?
    • might we need to set up ARC CEs to keep supporting X509 for them?
  • Maarten:
    • EGI have launched a survey for other VOs to consider these matters
    • some may imitate WLCG and move to IAM
    • others may switch to EGI Check-in
    • grid services will need to support multiple token providers
    • if those providers share a common basis, that should not be difficult
    • after the meeting:
      • the AuthZ WG has EGI Check-in reps
      • the common basis is the AARC Blueprint Architecture
  • Thomas:
    • VOs like ILC and Belle-II use DIRAC and hence should be fine?
  • Maarten:
    • indeed, as DIRAC will be made to work for LHCb anyway
    • DIRAC is also the default framework for small VOs in EGI
  • Concezio:
    • a DIRAC release supporting tokens is in the making

  • David Cameron:
    • HTCondor-G can still use X509 with the ARC CE REST interface
      • after the meeting: the slides have been corrected
    • will the development for the longer token lifetimes be ready on time?
  • Maarten:
    • to be followed up in the AuthZ WG

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Mostly business as usual, no major incidents
  • Run-3 preparations continuing
    • ~90% of the VOboxes switched from legacy AliEn to new JAliEn services
    • Fraction of 8-core jobs to be ramped up for Run-3 workflows
      • Most sites should only receive 8-core jobs during Run 3

ATLAS

  • Mostly smooth running with an average of 700k cores
    • This may go down soon with fewer opportunistic resources available (EuroHPCs and HLT farm)
  • Run 2 data and MC reprocessing campaigns effectively done, just following up on a few remaining problematic tasks
  • Tape challenge starting on 14 March for two weeks
  • Still issues with out-of-date storage reporting at dCache sites

Discussion

  • Julia:
    • the dCache SRR instabilities are being looked into

CMS

  • running smoothly with 320-400k cores
    • usual production/analysis split of 3:1
    • up to 95k cores of non-pledged contribution (40k on average)
    • utilization of US HPC allocations on track/ahead of schedule
    • production activity mainly Run 2 ultra-legacy Monte Carlo
    • new large pile-up library being made; I/O limits of site storage were reached, resulting in CPU inefficiencies
  • Tier-0 activities
    • successful large scale test (P5-->Meyrin, processing, writing to tape)
      • 9.1 GB/s processing reached (48 hour average), enough even for HeavyIon
    • cosmic ray data taking for Run 3 commissioning started
  • upgrade of HammerCloud test jobs for Run 3 software/input datasets on hold
    • need Python 3 version/port of HC, developer estimate: several weeks
  • WebDAV commissioning ongoing
    • SRM+WebDAV ready at all but one Tier-1 site
    • endpoint check/commissioning at Tier-3 sites in progress
  • CMSWeb service upgraded to accept tokens
  • Token commissioning for HTCondor CEs in progress
    • waiting for HTCondor interface for ARC CEs
  • preparing for the tape challenge later in March
    • successful transfer tests to PIC and FNAL
  • Thanks to all sites that have already made their 2022 pledge available!

LHCb

  • Running at 150-170k cores, no major issues
    • lots of WebDAV transfer failures involving GridKa (GGUS:156238)
      • side effect of CMS "putting storage systems to their limits"
  • Transfers from P8 to CERN Tier0 performed this week
    • more than 2PB transferred over two days
    • 1.6x nominal throughput sustained
    • some further optimisations possible from LHCb online
    • a couple of issues with CTA (imbalance between the nodes, and wrong archival reports) being followed up
    • plan is to keep data on EOS and use them as input for the next data challenge (3rd and 4th week of March)
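
As a rough cross-check of the P8 transfer figures above, the sketch below converts the quoted volume into an average rate. It assumes decimal units (1 PB = 10^6 GB) and takes "more than 2PB over two days" as exactly 2 PB in 48 hours, so the result is only a lower bound on the average rate. The function name is illustrative and not part of any LHCb tooling.

```python
# Back-of-the-envelope conversion of a transferred volume into an average rate.
# Assumption: decimal units (1 PB = 1e6 GB); the minutes do not state which convention was used.
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_gb_per_s(volume_pb: float, days: float) -> float:
    """Return the average transfer rate in GB/s for volume_pb moved over the given number of days."""
    volume_gb = volume_pb * 1e6  # 1 PB = 1e6 GB in decimal units
    return volume_gb / (days * SECONDS_PER_DAY)


# "More than 2PB transferred over two days", taken here as exactly 2 PB in 48 hours.
rate = average_rate_gb_per_s(2.0, 2.0)
print(f"{rate:.1f} GB/s")  # prints "11.6 GB/s"
```

The same helper can be applied to other volumes or durations by changing the two arguments.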

Discussion

  • Stephan:
    • regarding the fallout from CMS activities:
      • the production team were too optimistic about how many
        of those "heavy" jobs could be run concurrently, sorry!
      • the last ones should finish by the weekend
      • such productions will be controlled better from now on

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

  • Meeting with the experts to discuss the status of preparations for integrating the new benchmark into the accounting workflow. The status will be presented at the GDB next week.

Information System Evolution TF

  • Validation of the network information in CRIC is progressing well but still ongoing.

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

  • New XRootD Monitoring components
    • XRootD Shoveler is ready to be used to send data to CERN AMQ from non-OSG sites
    • XRootD Collector patches are being developed and tested
    • Both components need to be ready before the first test flows can be established at non-OSG sites
  • Agreed on the minimum required schema for transfers to be meaningful
    • Meeting with dCache developers held to discuss viability of required fields
      • Agreement for some of these fields (activity and vo) to be discussed on a higher level as "scitags" since they are needed for other purposes as well
    • Follow-up discussions with other developers (XRootD, MonALISA, ...) to be planned/held
  • Defined a first draft of the "Network Monitoring" template, to be filled in by T1s in the near future
    • First iteration will be done with AGLT2 to check how complete it is and what improvements are needed

Network Throughput WG

WG for Transition to Tokens and Globus Retirement

see the special topic

Action list

Creation date Description Responsible Status Comments

Specific actions for experiments

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

Specific actions for sites

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

AOB

Topic revision: r15 - 2022-03-08 - MaartenLitmaath
 