WLCG Operations Coordination Minutes, Nov 5, 2020





  • local:
  • remote: Andrew (TRIUMF), Christoph (CMS), Concezio (LHCb), Dave M (FNAL), David B (IN2P3-CC), David C (ATLAS), Federico (LHCb), Giuseppe (CMS), Hector (CMS), Maarten (ALICE + WLCG), Marian (monitoring + networks), Matt D (Lancaster), Panos (WLCG), Paolo (CMS), Renato (LHCb + CBPF + ROC_LA), Stephan (CMS), Thomas (DESY)
  • apologies:

Operations News

  • the next meeting is planned for Dec 3

Special topics

ETF update

see the presentation


  • Thomas: what about the history of past tests for debugging?
  • Marian:
    • all useful outputs are sent to Monit, which keeps the history
    • ETF only has results from the current job and the previous job
  • Maarten: more logs on ETF and/or Monit to be considered only when really needed

  • Maarten: the CREAM EOL might be postponed by a few months
    • it was tied to the end of EOSC-hub, which got extended, AFAIU
    • clarified after the meeting: the CREAM EOL remains Dec 2020!

  • Renato: some ROC_LA sites may be unable to migrate before next year

  • David C: ATLAS may need to support CREAM for a few months more

  • Stephan: for CMS the Dec timeline remains OK
  • Marian: that will allow HTCondor to be upgraded already on the CMS instances

  • Maarten:
    • self-subscription for notifications is not urgent
    • people can send an e-mail or open a ticket
    • maybe it can be done via the flexible GUI of CRIC?
  • Marian: will check with the CRIC devs

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Normal to high activity on average.
    • Increased use of Singularity (through JAliEn), first at CERN.
  • No major problems.
  • CERN: migration from CASTOR to CTA.
    • Many thanks to IT-ST group!
  • F@H on the grid has been stopped since Oct 5.

ATLAS

  • Stable Grid production in the past weeks with up to ~450k concurrently running grid job slots with the usual mix of MC generation, simulation, reconstruction, derivation production and user analysis, including ~90k slots from the HLT/Sim@CERN-P1 farm and ~10k slots from BOINC. Occasional additional peaks of ~70k job slots from HPCs.
  • Started medium-term planning for Grid workloads in the next 6-12 months. Run-3 simulation will start towards the end of next year.
  • Deletion of obsolete data from T1 DATATAPE (13 PB) completed over the first two weeks of October
  • Deletion from MCTAPE (33 PB) will start at the end of November
  • Migration to CRIC is almost complete, chasing down the last AGIS users
  • World-readable data on DPM: the issue was brought up a year ago and still needs some action
  • 1/3 of DPM sites still run SLC6: will there be a campaign to upgrade them?
  • TPC migration: 8 dCache and 14 DPM sites now use HTTP-TPC
    • At the end of the year we plan to switch off HTTP-TPC on the following sites, which do not support macaroons, so that larger sites can be included:
      • FMPhI-UNIBA DPM - upgraded, not working yet, being worked on
      • NCG-INGRID-PT StoRM - still configured via YAIM, being worked on
      • UKI-SOUTHGRID-OX-HEP DPM - agreed to move part of the storage to XCache, so not sure how much to push them, if at all
      • INFN-MILANO-ATLASC StoRM - unanswered ticket
      • GRIF-LAL DPM - Emmanouil is currently taking over admin duties here
      • SE-SNIC-T2 dCache - unanswered ticket
      • Australia-ATLAS DPM - ticket effectively unanswered, perhaps assigned to the wrong address
      • TR-10-ULAKBIM DPM - upgraded, not working yet, ticket ongoing


  • Maarten:
    • we will follow up on the DPM HTTP read access issue
    • please send example endpoints via e-mail

  • Maarten:
    • we had no intention to follow up on services remaining too long on SL6
    • we consider the OS a standard Linux system administration matter
    • site admins must not count on reminders or campaigns for OS upgrades
    • we do such things only for grid middleware and the WN environment
      • e.g. to have the desired support for Singularity or experiment SW
    • production services should run supported MW on a supported OS

CMS

  • running smoothly at around 260k cores
    • usual production/analysis split of 4:1
    • main processing activities:
      • Run 2 ultra-legacy Monte Carlo
      • Run 2 pre-UL Monte Carlo
    • HPC allocation use on track or ahead of schedule
  • migration to Rucio progressing well
    • in the tail of dataset synchronization due to bug
    • data recall from tape for analysis working
    • CTA testing ongoing
    • need FTS v3.10 to guarantee that data are written to tape
  • migration of CREAM-CEs continuing
    • 0/11/3 Tier-1/2/3 sites with CREAM-CE(s) remaining
  • VM migration for CERN hardware decommissioning ongoing
    • good progress
    • requested/received extensions for a few machines
  • SL6 migration ongoing

LHCb

  • smooth running at ~110k cores, MC-dominated
  • migration to CTA ongoing
    • EOS/CTA read/write tests performed
    • data transfers from the pit to be exercised in the near future
  • some cleanup of ETF tests needed (e.g. machine-job-features)

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

Archival Storage WG

Containers WG

CREAM migration TF

Details here


  • 90 tickets
  • 32 done: 16 ARC, 16 HTCondor
  • 14 sites plan for ARC, 11 are considering it
  • 20 sites plan for HTCondor, 10 are considering it, 7 consider using SIMPLE
  • 2 tickets without reply

dCache upgrade TF

DPM upgrade TF

StoRM upgrade TF

Information System Evolution TF

IPv6 Validation and Deployment TF

Detailed status here.


MW Readiness WG

Network Throughput WG

  • An update on WG activities and plans will be presented at a WLCG Operations Coordination meeting (tentatively in Dec)
  • perfSONAR infrastructure - 4.3.0 was released this week
    • Release notes https://www.perfsonar.net/releasenotes-2020-11-02-4-3-0.html
    • The release focused on the Python 3 migration, but also contains important changes in PWA and adds new testing tools: ethr and s3 benchmark
    • A bug impacting around 36 nodes in OSG/WLCG (out of 166 that auto-updated) was identified and reported to the developers yesterday; a bug-fix release (4.3.1) is in the works
  • WLCG/OSG Network Monitoring Platform
    • Work on publishing directly from perfSONAR toolkits - testing started for USATLAS/USCMS sites
    • AGLT2 had a major outage due to air conditioning failure this week, which impacted some of the central services (psconfig, psetf, psmad)
  • EU project ARCHIVER plans to use perfSONAR to test cloud connectivity
  • WLCG Network Throughput Support Unit: see the twiki for a summary of recent activities.

Traceability WG

Action list

Creation date Description Responsible Status Comments

Specific actions for experiments

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

Specific actions for sites

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments


Topic revision: r12 - 2020-11-09 - MaartenLitmaath