DRAFT

WLCG Operations Coordination Minutes, Nov 5, 2020

Highlights

Agenda

https://indico.cern.ch/event/970604/

Attendance

  • local:
  • remote:
  • apologies:

Operations News

Special topics

ETF update

see the presentation

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Normal to high activity on average.
    • Increased use of Singularity (through JAliEn), first at CERN.
  • No major problems.
  • CERN: migration from CASTOR to CTA.
    • Many thanks to IT-ST group!
  • F@H stopped on the grid since Oct 5.

ATLAS

  • Stable Grid production in the past weeks with up to ~450k concurrently running grid job slots with the usual mix of MC generation, simulation, reconstruction, derivation production and user analysis, including ~90k slots from the HLT/Sim@CERN-P1 farm and ~10k slots from BOINC. Occasional additional peaks of ~70k job slots from HPCs.
  • Started medium-term planning for Grid workloads in the next 6-12 months. Run-3 simulation will start towards the end of next year.
  • Deletion of obsolete data from T1 DATATAPE (13PB) was completed over the first two weeks of October.
  • Deletion from MCTAPE (33PB) will start at the end of November.
  • Migration to CRIC is almost complete, chasing down the last AGIS users
  • World-readable data on DPM: the issue was brought up a year ago and still needs some action
  • 1/3 of DPM sites still run SLC6; will there be a campaign to upgrade them?
  • TPC migration: 8 dCache, 14 DPM now using HTTP-TPC
    • We plan to switch off HTTP-TPC on the following sites not supporting macaroons at the end of the year to allow including larger sites:
    • FMPhI-UNIBA DPM - upgraded, not working yet, being worked on
    • NCG-INGRID-PT StoRM - still using YAIM to configure, being worked on
    • UKI-SOUTHGRID-OX-HEP DPM - agreed to move part of the storage to XCache, so it is not clear how much to push them, if at all
    • INFN-MILANO-ATLASC StoRM - unanswered ticket
    • GRIF-LAL DPM - Emmanouil is currently taking over admin duties here
    • SE-SNIC-T2 dCache - unanswered ticket
    • Australia-ATLAS DPM - ticket effectively unanswered, perhaps assigned to the wrong address
    • TR-10-ULAKBIM DPM - upgraded, not working yet, ongoing ticket

CMS

  • running smoothly at around 260k cores
    • usual production/analysis split of 4:1
    • main processing activities:
      • Run 2 ultra-legacy Monte Carlo
      • Run 2 pre-UL Monte Carlo
    • on track or beyond on HPC allocation use
  • migration to Rucio progressing well
    • in the tail of dataset synchronization due to a bug
    • data recall from tape for analysis working
    • CTA testing ongoing
    • need FTS v3.10 to guarantee data written to tape
  • migration of CREAM-CEs continuing
    • 0/11/3 Tier-1/2/3 sites with CREAM-CE(s) remaining
  • VM migration for CERN hardware decommissioning ongoing
    • good progress
    • requested/received extensions for a few machines
  • SL 6 migration ongoing

LHCb

  • smooth running at ~110k cores, MC-dominated
  • migration to CTA ongoing
    • read/write tests EOS/CTA performed
    • data transfers from the pit to be exercised in the near future
  • some cleanup of ETF tests needed (e.g. machine-job-features)

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

Archival Storage WG

Containers WG

CREAM migration TF

Details here

Summary:

  • 90 tickets
  • 32 done: 16 ARC, 16 HTCondor
  • 14 sites plan for ARC, 11 are considering it
  • 20 sites plan for HTCondor, 10 are considering it, 7 consider using SIMPLE
  • 2 tickets without reply
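
As a quick cross-check of the summary above, the following minimal sketch computes the fraction of tickets already done. It uses only the figures quoted in the list; the variable names are illustrative and not part of any WLCG tooling.

```python
# Figures copied from the CREAM migration summary above.
total_tickets = 90
done_by_target = {"ARC": 16, "HTCondor": 16}

# Tickets closed so far, and what is still open (planned, considering, or unanswered).
done_total = sum(done_by_target.values())
remaining = total_tickets - done_total
done_fraction = done_total / total_tickets

print(f"done: {done_total}/{total_tickets} ({done_fraction:.1%})")
print(f"still open: {remaining}")
```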

dCache upgrade TF

DPM upgrade TF

StoRM upgrade TF

Information System Evolution TF

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

MW Readiness WG

Network Throughput WG


  • An update on WG activities and plans will be presented at WLCG ops coordination (tentatively in December)
  • perfSONAR infrastructure - 4.3.0 was released this week
    • Release notes https://www.perfsonar.net/releasenotes-2020-11-02-4-3-0.html
    • The release focused on the python3 migration, but also contains important changes in PWA; new testing tools were added: ethr and s3 benchmark
    • A bug was identified and reported to the developers yesterday; it impacts around 36 nodes in OSG/WLCG (out of 166 that auto-updated); a bug-fix release (4.3.1) is in the works
  • WLCG/OSG Network Monitoring Platform
    • Work on publishing directly from perfSONAR toolkits - testing started for USATLAS/USCMS sites
    • AGLT2 had a major outage due to air conditioning failure this week, which impacted some of the central services (psconfig, psetf, psmad)
  • EU project ARCHIVER plans to use perfSONAR to test cloud connectivity
  • WLCG Network Throughput Support Unit: see twiki for summary of recent activities.
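
To put the perfSONAR 4.3.0 bug in proportion, here is a minimal sketch of the share of auto-updated nodes affected. Both numbers come from the notes above; the affected count is approximate ("around 36"), so the result is an estimate only, and the variable names are illustrative.

```python
# Figures copied from the perfSONAR item above; the affected count is approximate.
affected_nodes = 36
auto_updated_nodes = 166

# Share of auto-updated nodes hit by the bug, and those apparently unaffected.
affected_fraction = affected_nodes / auto_updated_nodes
unaffected_nodes = auto_updated_nodes - affected_nodes

print(f"affected: ~{affected_fraction:.1%} of auto-updated nodes")
print(f"unaffected: ~{unaffected_nodes}")
```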

Traceability WG

Action list

Creation date Description Responsible Status Comments

Specific actions for experiments

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

Specific actions for sites

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

AOB

Topic revision: r8 - 2020-11-05 - MaartenLitmaath
 