WLCG Operations Coordination Minutes, Feb 4, 2021

Highlights

Agenda

https://indico.cern.ch/event/999296/

Attendance

  • local:
  • remote: Adrian (APEL), Alessandra (Napoli), Alessandro Di G (ATLAS), Alessandro P (EGI), Andreas H (DESY-ZN), Andreas P (KIT), Andrew (TRIUMF), Benjamin (EGI), Catalin (EGI), Christoph (CMS), Concezio (LHCb), Daniel (security), David Cameron (ATLAS), David Cohen (Technion), David South (ATLAS), Federico (LHCb), Gavin (CERN computing), Giuseppe (CMS), Hannah (CERN SSO), Julia (WLCG), Maarten (ALICE + WLCG), Marian (networks + monitoring), Matt (Lancaster), Panos (WLCG), Paolo (CERN SSO), Thomas (DESY-HH)
  • apologies:

Operations News

  • the next meeting is planned for March 4

Special topics

Remaining dependencies on lcg-bdii.cern.ch

see the APEL presentation

ALICE

None

ATLAS

None

CMS

None

LHCb

LHCb/DIRAC queries lcg-bdii.cern.ch:2170 to get CE info

Discussion

  • Alessandro P:
    • most sites have been using lcg-bdii.cern.ch by default
    • they can switch to the top-level BDIIs of their NGIs instead
    • or they can define the accounting message broker explicitly

  • Adrian: the explicit definition has been supported for quite a while

  • Julia:
    • sites probably should just do that
    • will EGI still consider setting up a top-level BDII then?

  • Federico: we use the BDII to discover new CEs as well as CE details

  • Maarten:
    • there are risks associated with running on resources discovered in the BDII
    • normally VOs should first validate the resources they want to entrust with jobs

  • Federico:
    • we use the BDII to discover if our list of CEs for a site must be updated
    • we also use it to discover opportunistic resources for MC simulation jobs

  • Thomas: our HTCondor CEs are not published in the BDII

  • Julia: the GOCDB also has semi-static info

  • Federico: it lacks the queues and whether single- or multi-core jobs should be used

  • Gavin: what do other experiments do?

  • Julia: they first run tests and negotiate with the service providers

  • Adrian: in principle the GOCDB could be enhanced

  • Maarten:
    • we already looked into that in the past years
    • missing functionality was put into CRIC instead

  • Julia: CRIC can bridge the gap, but site admins would have to update it

  • Andreas P:
    • the information in the BDII cannot always be considered reliable
    • that is why there are systems like AGIS and CRIC
    • other DIRAC VOs do not have to be affected by the decommissioning of the CERN BDII

  • Alessandro P:
    • DIRAC can use a list of BDII services provided by big NGIs
    • note: EGI needs site- and top-level BDII services for various use cases

  • Julia:
    • DIRAC can use other BDII instances
    • sites can adjust their APEL configuration or use a different BDII
    • we will follow up on these conclusions

  • Federico:
    • I can change the default in DIRAC, which will then be taken by all DIRAC VOs
    • the definition can be a list of BDIIs

  • Catalin: if CERN stops its top-level BDII, might other NGIs follow?

  • Maarten:
    • other NGIs have other communities to support
    • running a BDII may be one of the requirements

  • Catalin: EGI may set up a catch-all BDII later this year

Experiment use cases for altsecurityidentities in XLDAP service

see the presentation

ALICE

None

ATLAS

Only through CRIC

CMS

  • certificate mapping to CERN user name for CMS Grid tools
  • require CERN (or WLCG) foreign certificate registration and mapping to CERN usernames and email addresses with well defined API (not necessarily LDAP)

LHCb

  • Federico: LHCb does not use that functionality

Discussion

  • Hannah: we are trying to move away from certificates

  • Giuseppe: could the functionality be implemented in CRIC?

  • Panos:
    • CMS do not want to have the information mixed into their instance
    • we could set up a separate instance or use the WLCG CRIC for it
    • as ATLAS also have a dependency, it does not look VO-specific

  • Paolo:
    • we implemented the functionality only to support CERN SSO use cases
    • it was never meant for storing external certificate details

  • Hannah: what would be an argument against using CRIC?

  • Julia: CRIC was meant for topology use cases, not authorization

  • Panos: it could be done

  • Julia: we need to discuss this internally

  • Christoph: could the functionality be added to the new SSO?

  • Paolo:
    • it would imply significant extra development effort
    • we currently depend a lot on the legacy infrastructure for it

  • Hannah: the current schema is even specific to Windows

  • Julia: what is the timeline for the new SSO to replace the legacy system?

  • Hannah: a few more years before the old back-end can be stopped

  • Julia: that is very similar to the timeline for phasing out certificates!

  • Alessandro Di G: how can we keep the functionality?

  • Hannah:
    • as it is related to the grid, maybe IAM could handle it instead?
    • please bring it up in the Authorization WG

  • Maarten: we need to optimize the overall effort spent on this legacy use case

  • Julia: we will take it offline

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • High to very high activity on average in the last weeks.
    • New record reached on Jan 24 and 31: 176k concurrent jobs.
  • No major problems.

ATLAS

  • Stable running over the last two months, including the Christmas break
  • Improvements in the upgrade software mean far fewer jobs with very high memory requirements
  • A problem with the Swiss CA affected data transfers to/from Uni Bern and CSCS
  • TPC: ATLAS will concentrate only on HTTP as a protocol; all xrootd TPC transfers have been stopped
    • Migration status: dCache: 17, DPM: 23, StoRM: 1, EOS: 1, Xrootd: 2, Total: 44

CMS

  • CMS collaboration meeting this week
  • running smoothly at around 340k cores
    • KIT, CNAF and RAL contributed beyond pledge
    • usual production/analysis split of 3:1
    • main processing activities:
      • Run 2 ultra-legacy Monte Carlo
      • Run 2 pre-UL Monte Carlo
    • on track or beyond on HPC allocation use
      • sustained contribution from US HPCs
  • prefer a CentOS replacement with longevity, i.e. a support cycle of well over 5 years
  • no BDII dependence
  • require CERN (or WLCG) foreign certificate registration and mapping to CERN usernames and email addresses with well defined API (not necessarily LDAP)

LHCb

  • Federico: essentially NTR

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

  • Discussing with the APEL developers the integration of the new benchmark into the accounting flow

Archival Storage WG

Containers WG

CREAM migration TF

Details here

Summary:

  • 90 tickets
  • 57 done: 29 ARC, 27 HTCondor, 1 none
  • 8 sites plan for ARC, 6 are considering it
  • 11 sites plan for HTCondor, 6 are considering it, 5 consider using SIMPLE
  • 1 ticket without reply

Discussion

  • Marian:
    • when can CREAM support be switched off in ETF?
    • already done for CMS

  • Maarten:
    • EGI are ticketing sites that did not replace their CREAM CEs yet
    • sites have until the end of Feb to migrate without penalty
    • if a few more weeks can be tolerated, let's support CREAM until March

  • ATLAS, LHCb: no objections

  • Julia:
    • CREAM support in ETF can be switched off at the start of March
    • the remaining open tickets should be updated with that information

  • Maarten: OK

dCache upgrade TF

  • Almost done. 37 out of 41 instances migrated to 5.2.15 or higher

DPM upgrade TF

  • 34 out of 49 DPM sites have migrated to DPM 1.14 and enabled macaroons

StoRM upgrade TF

  • 10 out of 24 sites upgraded to 1.11.19

Information System Evolution TF

  • CMS CRIC has been upgraded to the latest release
  • WLCG CRIC has been upgraded to the latest release.
    • The MONIT team is planning to switch to the CRIC API instead of the experiments' VOfeeds.
    • Improved home page
  • Discussed with the network and perfSONAR experts the functionality needed in CRIC for it to become a WLCG network topology source
  • Migration of AGIS to ATLAS CRIC is ongoing

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

Network Throughput WG


  • perfSONAR infrastructure - 4.3.2 is the latest release
  • WLCG/OSG Network Monitoring Platform
    • Discussing with the CRIC team the possibility of using it to store the aggregated perfSONAR topology (GOCDB/OSG/NREN/etc.)
    • Work on publishing directly from perfSONAR toolkits - tests are ongoing
    • An issue was identified with central configuration (psconfig/PWA), which is being investigated in collaboration with perfSONAR developers (psconfig degraded for now)
  • The EU project ARCHIVER will use perfSONAR to test cloud connectivity
  • WLCG Network Throughput Support Unit: see twiki for summary of recent activities.

Traceability WG

Action list

Creation date | Description | Responsible | Status | Comments

Specific actions for experiments

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

Specific actions for sites

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

AOB

Topic revision: r12 - 2021-02-09 - MaartenLitmaath