WLCG Operations Coordination Minutes, Oct 20, 2022

Highlights

Agenda

https://indico.cern.ch/event/1208295/

Attendance

  • local: Maarten (ALICE + WLCG), Stephan (CMS)
  • remote: Alessandra D (Napoli), Dave M (FNAL), David B (IN2P3-CC), David C (ATLAS + ARC), Doug (BNL), Eric F (IN2P3), Giuseppe (CMS), Henryk (LHCb), Julia (WLCG), Marçal, Matt D (Lancaster), Panos (WLCG), Petr (ATLAS + Prague), Renato (EGI)
  • apologies:

Operations News

  • the next meeting is planned for Dec 1

Special topics

DPM migration campaign

see the presentation

Discussion

  • Renato:
    • the responsive sites promised to work on the migration,
      at the latest next year

  • Julia:
    • do sites that want to move to EOS know who to contact?
  • Renato:
    • they can contact the EOS team (link)

  • David C:
    • those 12 unresponsive sites may never reply: what will we do then?
    • the EOL of DPM support is tied to the EOL of CentOS 7:
      would sites be allowed to run unsupported services?
  • Renato:
    • we will need to follow up if and when that happens
    • those services would keep working for a while longer

  • Julia:
    • are there unresponsive WLCG sites?
  • Renato:
    • yes: will send the list after the meeting (done)

EGI top level BDII deployment status

The presentation needed to be postponed.

Discussion

  • Julia:
    • CERN wants to decommission the legacy lcg-bdii.cern.ch service
    • EGI have set up the lcg-bdii.egi.eu service as its replacement
    • customers of the legacy service should switch to the new service
    • we intend to discuss this further in our next meeting

  • Maarten:
    • stability checks of the new service have been running for 2 weeks
    • stress tests have been running since the weekend
    • though there were a few errors we may want to look into,
      the new service looks sufficiently ready to take over

  • Petr:
    • when can we switch off our BDII services?
  • Julia:
    • EGI depend on information served by the BDII

  • Petr:
    • when will the BDII be released for EL9?
  • Maarten:
    • will follow up with EGI Operations, who maintain the BDII product

  • Henryk:
    • LHCb still have some usage of the BDII
  • Julia:
    • we are looking into enhancing the CRIC API for that (see below)

  • Renato:
    • the catch-all BDII remains available, only the hostname will change
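
For clients that keep a top-level BDII endpoint in their configuration, the hostname switch discussed above could look roughly like the sketch below. It is only an illustration: the helper name `switch_bdii_endpoints` is hypothetical, and the comma-separated `host:port` format and port 2170 in the usage example are assumptions, not something stated in the meeting.

```python
# Hypothetical helper, not part of any WLCG or EGI tool: rewrite a
# comma-separated list of "host[:port]" BDII endpoints so that the
# legacy top-level BDII is replaced by the new one.
LEGACY_HOST = "lcg-bdii.cern.ch"  # legacy service named in the minutes
NEW_HOST = "lcg-bdii.egi.eu"      # replacement service named in the minutes


def switch_bdii_endpoints(value: str) -> str:
    """Return `value` with the legacy host replaced and duplicates dropped."""
    updated = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split off an optional ":port" suffix and keep it unchanged.
        host, sep, port = entry.partition(":")
        if host.lower() == LEGACY_HOST:
            host = NEW_HOST
        candidate = f"{host}{sep}{port}"
        if candidate not in updated:
            updated.append(candidate)
    return ",".join(updated)


# Usage (port 2170 is an assumed value for illustration):
print(switch_bdii_endpoints("lcg-bdii.cern.ch:2170"))
```

Entries that do not name the legacy host are left untouched, so the same call is safe to run on a configuration that has already been switched.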

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

ATLAS

  • Mostly smooth running with 500-700k slots including 200-300k from HPC (Karolina EuroHPC is back in action)
  • Sites deploying EOS should set it up to use dynamic mapping based on VOMS roles and not CERN-style hard-coded gridmap files
  • The record LHC fill over the weekend of 24-25 Sept caused some problems in data export - a single dataset of 1.3PB and 270k files created at 6GB/s
  • Performed some studies during the LHC downtime on usage of HLT farm for grid jobs focussing on how we can use it between fills
  • CAs issuing host certificates need to update signature algorithms to move away from SHA-1, which is not supported in CentOS Stream 9
  • We would like tape buffer space reporting for CTA as we have for other tape systems, but this seems to be difficult
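
For a sense of scale of the record-fill dataset mentioned above, the quoted figures can be turned into an average file size and a writing time. This is back-of-the-envelope arithmetic only; it assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and that the 6GB/s rate was sustained throughout, neither of which is stated in the report.

```python
# Figures quoted in the ATLAS report; decimal units are an assumption.
DATASET_BYTES = 1.3e15    # 1.3 PB
N_FILES = 270_000         # 270k files
RATE_BYTES_PER_S = 6e9    # 6 GB/s, assumed constant

avg_file_gb = DATASET_BYTES / N_FILES / 1e9
hours_at_rate = DATASET_BYTES / RATE_BYTES_PER_S / 3600

print(f"average file size: {avg_file_gb:.2f} GB")        # 4.81 GB
print(f"time at the quoted rate: {hours_at_rate:.1f} h")  # 60.2 h
```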

CMS

  • smooth running, few issues at CMS
  • utilized between 350k and 450k cores during last month
    • usual production/analysis split of 75% and 25%
    • significant contribution from HPCs, up to 50k cores
    • largest production activities Run 2 ultra-legacy Monte Carlo and Run 3
  • waiting on python3 version/port of HammerCloud
  • preparing migration of our DPM sites to other storage technologies
  • had a software compatibility issue that we initially thought we could resolve with a few site configuration changes; this led to a few tickets that were cancelled soon afterwards
  • found out that Argus neither supports tokens nor has plans to support them, GGUS:159270

Discussion

  • Maarten:
    • the Argus service is only used by some of the sites
    • though conceptually nice, it has never played a critical role
    • the code is complex to support use cases that never materialized
    • it would be a big effort to add support for tokens
    • client services would need to have new callout codes as well
    • for global banning, a much simpler framework could be devised instead
    • Argus remains supported for existing use cases on CentOS 7
  • Stephan:
    • it would be good to indicate Argus is in maintenance mode
  • Maarten:
    • will follow up

LHCb

  • Smooth running around 100K jobs, mostly MC generation
  • Small number of production requests, HLT Farm switched off
  • No data processing yet
  • Significant progress on a couple of problematic and long-standing tickets

Task Forces and Working Groups

Collecting input regarding MONIT

Discussion

  • Julia:
    • thanks to the experiments for having provided very useful input to the MONIT team

GDPR and WLCG services

Discussion

  • Julia:
    • we will first finish the T1 campaign before starting with the T2 sites

Accounting TF

  • Discussion at the HEP-SCORE task force meeting covered new benchmark deployment scenarios and their implications for the accounting infrastructure. It will be continued at the WLCG workshop in November, with the aim of reaching a conclusion and then probably redefining the requirements for changes to the accounting workflow
  • A draft specification of an accounting record that supports reporting of several benchmarks has been proposed by Adrian Coveney from the APEL team

Discussion

  • Renato:
    • is there a deadline for the new benchmark?
  • Julia:
    • we hope to start using the new benchmark by April next year,
      but a decision about the timeline has not been taken yet
    • it is not yet clear whether we need to support the old and
      the new benchmark concurrently or whether we can
      apply a conversion factor instead
    • this will be further discussed at the WLCG WS in November
    • supporting 2 concurrent benchmarks would imply more work
      for the ARC and HTCondor CE devs as well

dCache upgrade TF

  • Julia:
    • only the SRR deployment still needs to be completed
    • ATLAS already switched to it

Information System Evolution TF

  • Following a request from LHCb, the CRIC team is enabling a new feature in CRIC that will provide LHCb with information on which services support LHCb. CRIC will use the BDII as an information source where possible and will allow service usage information to be updated directly in CRIC when it is missing in the BDII. DIRAC can then use the CRIC API to get this information and no longer needs to query the BDII directly.
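
The fallback described above (BDII where available, otherwise information entered directly in CRIC) can be sketched as follows. This is a minimal illustration, not the CRIC implementation: the function names, the dictionary layout and the example hostnames are all hypothetical, and the precedence of BDII data over manual entries is an assumption drawn from the wording "where possible".

```python
from typing import Dict, List, Optional, Set

# Hypothetical data model: service endpoint -> set of supported VOs.
# A value of None in `bdii_info` means the BDII publishes nothing usable
# for that service.
BdiiInfo = Dict[str, Optional[Set[str]]]
ManualInfo = Dict[str, Set[str]]


def supported_vos(service: str, bdii_info: BdiiInfo, manual: ManualInfo) -> Set[str]:
    """Prefer BDII data; fall back to manually entered data if it is missing."""
    vos = bdii_info.get(service)
    if vos is None:
        vos = manual.get(service, set())
    return vos


def services_for_vo(vo: str, bdii_info: BdiiInfo, manual: ManualInfo) -> List[str]:
    """List all known services that support `vo`, sorted by name."""
    services = set(bdii_info) | set(manual)
    return sorted(s for s in services if vo in supported_vos(s, bdii_info, manual))


# Usage with made-up endpoints:
bdii = {"ce1.example.org": {"lhcb", "atlas"}, "ce2.example.org": None}
manual = {"ce2.example.org": {"lhcb"}, "se3.example.org": {"cms"}}
print(services_for_vo("lhcb", bdii, manual))
```

A client such as DIRAC would then only consume the merged result and would not need to know which source each entry came from.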

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

  • XRootD monitoring flow is being validated with help from ALICE
    • Numbers look good so far, although direct comparison is hard due to different timestamps being used
  • OSG were asked to deploy the CERN-adapted components for further data validation
    • No news on this matter

Network Throughput WG


WG for Transition to Tokens and Globus Retirement

Action list

Creation date | Description | Responsible | Status | Comments

Specific actions for experiments

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

Specific actions for sites

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

AOB

-- JuliaAndreeva - 2022-10-05
Topic revision: r12 - 2022-10-21 - MaartenLitmaath
 