DRAFT

WLCG Operations Coordination Minutes, month day, year

Highlights

Agenda

Attendance

  • local:
  • remote:
  • apologies:

Operations News

Special topics

CNAF outage

Upgrade of storage at the WLCG sites to versions required for TPC

Review of the situation at the T1 sites

| Site | Storage implementation and version | Target version | Needs upgrade | Planned date | Comment |
| BNL-ATLAS | dCache 3.0.11, 4.2.22 | 5.2.0 | YES |  |  |
| FZK-LCG2 | dCache 3.2.39 & Xrootd 4.8.4 | 5.2.0 & 4.9.1 | YES |  |  |
| IN2P3-CC | dCache 4.2.34-1 & Xrootd 4.6.11 | 5.2.0 & 4.9.1 | YES |  |  |
| INFN-T1 | StoRM 1.11.15 & Xrootd 3.3.2-1, 4.8.4 | 1.11.15 & 4.9.1 | YES |  | StoRM is OK, Xrootd needs upgrade |
| JINR-T1 | dCache 5.2.3 | 5.2.0 | NO |  |  |
| KR-KISTI | EOS 4.4.23 & Xrootd 4.8.4 | 4.4.39 & 4.9.1 | YES |  |  |
| NDGF-T1 | dCache 5.2.2 | 5.2.0 | NO |  |  |
| NIKHEF-ELPROD | DPM 1.9.0 | 1.12.1 | YES |  |  |
| SARA-MATRIX | dCache 4.2.34 | 5.2.0 | YES |  |  |
| RAL-LCG2 | Castor & ECHO | ? | ? |  | For the current Echo version, TPC via xrootd is working; HTTP support via the XrootD plugin is being implemented. Separately, work is ongoing to get HTTP TPC transfers to S3 buckets in Echo working (via DynaFed). |
| RRC-KI | dCache 3.2.* & EOS 4.2.29 | 5.2.0 & 4.4.39 | YES |  |  |
| TRIUMF-LCG2 | dCache 4.2.39 | 5.2.0 | YES |  |  |
| Taiwan-LCG2 | DPM 1.12.1 | 1.12.1 with DOME | YES |  | Version is fine, reconfiguration for DOME is required |
| USCMS-FNAL | dCache ? | 5.2.0 | ? |  |  |
| pic | dCache 4.2.32 | 5.2.0 | YES |  | The plan is to upgrade all PostgreSQL databases and ZooKeeper before the end of 2019 and also to check the new dCache 4.2 features (macaroons, dCache alarms, HA and QoS). Testing of 5.2 will then start in 2020, with the upgrade to 5.2 planned for spring 2020. |
Link to CRIC for T1s storage info

Tier1 site support teams: if you have not yet checked and updated your information in CRIC, please do so as soon as possible.

Detailed instructions on how to proceed
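The upgrade flag in the T1 table above largely follows from comparing the deployed version with the target version. A minimal Python sketch of that comparison is given below, assuming plain dotted numeric versions. The helper names `parse_version` and `needs_upgrade` are illustrative only and are not part of CRIC or any WLCG tool; entries such as `3.2.*` or `?` are deliberately rejected and would need manual handling.

```python
import re


def parse_version(text):
    """Turn '4.2.34-1' into (4, 2, 34); a packaging suffix after '-' is ignored."""
    core = text.split("-", 1)[0]
    if not re.fullmatch(r"\d+(\.\d+)*", core):
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in core.split("."))


def needs_upgrade(deployed, target):
    """True if the deployed version is older than the target version."""
    return parse_version(deployed) < parse_version(target)


# A few dCache rows from the table, compared against the 5.2.0 target
DCACHE_TARGET = "5.2.0"
for site, deployed in [
    ("JINR-T1", "5.2.3"),
    ("NDGF-T1", "5.2.2"),
    ("SARA-MATRIX", "4.2.34"),
    ("IN2P3-CC", "4.2.34-1"),
]:
    flag = "YES" if needs_upgrade(deployed, DCACHE_TARGET) else "NO"
    print(site, deployed, flag)
```

A pure version comparison does not capture every case: Taiwan-LCG2 already runs the target DPM version but is still flagged because a reconfiguration for DOME is required.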

New accounting workflow

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • NTR

ATLAS

  • Smooth Grid production over the past weeks with ~320k concurrently running grid job slots and the usual mix of MC generation, simulation, reconstruction, derivation production and user analysis, plus a dedicated reprocessing campaign (see below). In addition, ~90k job slots came from the HLT/Sim@CERN-P1 farm when it was not used for TDAQ purposes. There were some periods of additional HPC contributions, with peaks of ~50k concurrently running job slots running simulation using EventService.
  • A special reprocessing campaign of 2018 data (~7 PB, 3 million files) has been running since August 8th using the data carousel setup. This requires a stage-in of all RAW inputs from tape at the Tier1s. A few notes and possible future improvements:
    • Expected throughput: staging 7 PB in 2 weeks requires 5.8 GB/s overall, or 580 MB/s for a Tier1 with a 10% share.
    • INFN-T1 could not be used due to its ~2-week downtime; CERN CTA was used very successfully instead.
    • Contention was observed in the data export along the tape -> disk buffer -> data disk path at TRIUMF and IN2P3-CC, since tape staging was faster than copying the data away from the disk buffer. There is room for improvement on the FTS and dCache side; more news expected from the FTS experts.
    • Suboptimal tape staging performance observed at FZK and PIC
    • Tier1s: are the file pins respected on the disk tape buffer?
    • Improve the ATLAS WFMS task release threshold w.r.t. the optimal fraction of input files available on disk after staging from tape
  • Request to dCache: please have a look at and implement the following space token writing feature request, which is critical for TPC and non-gridftp stores: https://github.com/dCache/dcache/issues/3920
  • CentOS7 site migration is still not finished (see https://twiki.cern.ch/twiki/bin/view/AtlasComputing/CentOS7Deployment?sortcol=2;table=2;up=0#sorted_table ). The analysis queues at 13 sites have been set to job-broker-off mode, and the same will be done for the corresponding production queues on September 15th if there is no visible progress.
  • Now in the tail of the switch to the new PanDA worker node pilot version 2 + singularity. Only CentOS7 queues are being moved. Excluding CERN-P1, the job slots converted/not converted are: ~300k pilot2, ~260k pilot2+singularity, ~20k pilot1 (still to be migrated to pilot2+singularity+CentOS7)
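The expected staging throughput quoted above for the reprocessing campaign can be checked with a short calculation. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a constant rate over the full two weeks; the function name `required_rate` is illustrative only.

```python
PB = 10**15  # decimal petabyte, in bytes (assumption)
GB = 10**9
MB = 10**6


def required_rate(volume_bytes, days):
    """Average rate in bytes/s needed to move volume_bytes within the given number of days."""
    return volume_bytes / (days * 24 * 3600)


total = required_rate(7 * PB, 14)   # whole campaign: 7 PB in 2 weeks
share = 0.10 * total                # a Tier1 holding 10% of the data

print(f"overall:   {total / GB:.1f} GB/s")   # overall:   5.8 GB/s
print(f"10% Tier1: {share / MB:.0f} MB/s")   # 10% Tier1: 579 MB/s
```

The result agrees with the rounded figures in the notes (5.8 GB/s and 580 MB/s). These are averages; sustained rates would have to be higher to absorb downtimes or contention in the buffer.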

CMS

  • Rather smooth running over the summer
    • No major site issues; some (longer than usual) delays due to site staff being on vacation
  • Main activities
    • Reconstruction of 'parked' b-physics events ~75% done
    • Reconstruction of HI data almost finished
    • Started full reconstruction of Run2 data and MC
      • This will keep us busy for most of LS2
    • All of these require quite some tape staging
  • Migration to CRIC continues
    • Legacy CMS siteDB now frozen

LHCb

  • Smooth production of both Monte Carlo and real data
    • No major site issues
    • CNAF outage slightly delayed our re-processing campaign of Run1 data, but now catching up
  • LHCb is starting to use containers, which brings the following requirements (LHCb VO card updated accordingly)
    • CentOS7
    • Singularity
    • user namespaces
    • /cvmfs/cernvm-prod.cern.ch (in addition to /cvmfs/lhcb.cern.ch, /cvmfs/lhcb-condb.cern.ch, /cvmfs/grid.cern.ch) on the worker nodes
    • We will be ticketing sites that do not yet meet these requirements, starting with Tier1s, then Tier2s.
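As a rough illustration of how a site might self-check a worker node against the container requirements listed above, here is a minimal Python sketch. It is not an LHCb tool: the function names are hypothetical, the check on `/proc/sys/user/max_user_namespaces` is only one common indicator that unprivileged user namespaces are enabled, and the OS release (CentOS7) is not checked. The LHCb VO card remains the authoritative reference.

```python
import os
import shutil

# CVMFS repositories named in the LHCb requirements above
REQUIRED_CVMFS = [
    "/cvmfs/cernvm-prod.cern.ch",
    "/cvmfs/lhcb.cern.ch",
    "/cvmfs/lhcb-condb.cern.ch",
    "/cvmfs/grid.cern.ch",
]


def missing_cvmfs(repos=REQUIRED_CVMFS, is_dir=os.path.isdir):
    """Return the repositories that are not visible as directories on this node."""
    return [repo for repo in repos if not is_dir(repo)]


def user_namespaces_enabled(path="/proc/sys/user/max_user_namespaces"):
    """True if the kernel limit on user namespaces is readable and greater than zero."""
    try:
        with open(path) as handle:
            return int(handle.read().strip()) > 0
    except (OSError, ValueError):
        return False


def check_node():
    """Collect the individual checks into one report dictionary."""
    return {
        "singularity": shutil.which("singularity") is not None,
        "user_namespaces": user_namespaces_enabled(),
        "missing_cvmfs": missing_cvmfs(),
    }


if __name__ == "__main__":
    print(check_node())
```

The `is_dir` and `path` parameters exist only so the checks can be exercised off-node, for example in a unit test.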

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

  • See presentation of the new accounting workflow

Archival Storage WG

Containers WG

CREAM migration TF

dCache upgrade TF

DPM upgrade TF

Information System Evolution TF

  • The MONIT team has started integration with CRIC to remove the dependency on REBUS

IPv6 Validation and Deployment TF

Detailed status here.

Machine/Job Features TF

Monitoring

  • Status report presentation is planned for the October meeting

MW Readiness WG

Network Throughput WG


Squid Monitoring and HTTP Proxy Discovery TFs

  • Nothing to report

Traceability WG

Action list

Creation date Description Responsible Status Comments

Specific actions for experiments

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

Specific actions for sites

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

AOB

Topic revision: r19 - 2019-08-29 - ConcezioBozzi
 