WLCG Operations Coordination Minutes, July 2, 2020




Attendance

  • local:
  • remote: Alberto (monitoring), Andrew (TRIUMF), Borja (monitoring), Concezio (LHCb), Dave (FNAL), David (Technion), Felix (ASGC), Gavin (T0), Giuseppe (CMS), Horst (Oklahoma), Johannes (ATLAS), Maarten (ALICE + WLCG), Matt (Lancaster), Nikolay (monitoring), Pedro (monitoring), Stephan (CMS), Vincent (security)
  • apologies:

Operations News

  • The next meeting is planned for Sep 3

  • Closure of the Machine/Job Features (MJF) TF was discussed:
  • Concezio:
    • the MJF functionality is not critical for LHCb at WLCG sites
    • we do depend on it for Vac and cloud setups, though
  • Maarten:
    • WLCG Ops Coordination is concerned with activities that potentially are
      of interest to more than 1 experiment
    • for MJF that looked to be the case a few years ago, but in the end only LHCb
      wanted to pursue that functionality
    • we have simply moved the TF to the page listing the closed TFs (done);
      all its materials remain available for continued use

Special topics

WLCG Critical Services proposal followup

  • There were no objections to going ahead with the proposed changes,
    which can still be fine-tuned as we implement them
  • To be finalized by autumn

CERN Grid CA OCSP incident

  • After a scheduled intervention on June 24, the CERN Grid CA OCSP service
    became inaccessible from outside CERN (OTG:0057432)
  • Requests to the service were dropped by the CERN perimeter firewall
  • A CREAM CE will try to check a client certificate's status via OCSP
    if an OCSP endpoint is advertised in the certificate details (see the sketch below)
    • It appears that other CE flavors rely on CRLs only and simply ignore OCSP services
  • Checks of CERN Grid CA certificates were then hanging until a timeout was reached
  • The CREAM client code would time out first, thus failing job submissions that
    used CERN Grid CA certificates
  • This affected the 4 experiments and, through the SAM tests, sites running CREAM
    • Some A/R recomputations may be needed
  • The service was restored about 24h later, on June 25
  • Some improvements are foreseen to make a recurrence much less probable

  • Further details here
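
A certificate advertises its OCSP responder, if any, in its Authority Information Access (AIA) extension. As a minimal sketch (not the CREAM implementation) of how a client can discover such an endpoint, the following Python snippet uses the cryptography package; the certificate file name is a hypothetical example:

    from cryptography import x509
    from cryptography.x509.oid import AuthorityInformationAccessOID

    # Hypothetical path to a client certificate issued by the CERN Grid CA
    with open("usercert.pem", "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())

    # Look for the AIA extension; if it is absent, there is no OCSP endpoint to try
    try:
        aia = cert.extensions.get_extension_for_class(
            x509.AuthorityInformationAccess).value
    except x509.ExtensionNotFound:
        aia = []

    # Collect any OCSP responder URLs; a verifier such as a CREAM CE may
    # contact one of these to check the certificate's revocation status
    ocsp_urls = [d.access_location.value for d in aia
                 if d.access_method == AuthorityInformationAccessOID.OCSP]
    print(ocsp_urls or "no OCSP endpoint advertised; rely on CRLs instead")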


  • Maarten: the trouble was that the requests did not fail quickly, but hung until a timeout
    (illustrated below); as far as we know, a fast failure such as a refused connection
    is not fatal to the check

  • Stephan: is only CREAM affected? we saw a ticket implicating an XRootD service
  • Maarten: please send me the details and I will look into the need for followup
    • after the meeting: it probably was a mistaken inference
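
The impact of the firewall behaviour can be seen with a plain TCP connect: a rejected connection fails almost immediately, whereas a silently dropped one blocks until the client's own timeout expires. A minimal sketch using only the Python standard library (the host name is hypothetical):

    import socket

    def probe(host, port, timeout=5.0):
        """Attempt a TCP connection and report how it failed, if it did."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return "open"
        except ConnectionRefusedError:
            # e.g. a firewall REJECT or a closed port: the client learns immediately
            return "refused quickly (not fatal for the caller)"
        except socket.timeout:
            # a firewall DROP: the client hangs for the full timeout, as in this incident
            return "dropped: waited %.0f s before giving up" % timeout
        finally:
            s.close()

    # Hypothetical endpoint; the real responder URL comes from the certificate's AIA extension
    print(probe("ocsp.example.ch", 80))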

SAM migration progress

see the presentation

  • Also see the presentation planned for next week's GDB meeting
  • There were no comments

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Mostly business as usual, no major issues

ATLAS

  • Stable Grid production with up to ~380k concurrently running grid job slots with the usual mix of MC generation, simulation, reconstruction, derivation production and user analysis, including ~45k slots from the HLT/Sim@CERN-P1 farm and ~15k slots from Boinc. Occasional additional peaks of 200k job slots from HPCs.
  • Continuing with about 60k job slots used for Folding@Home jobs since 4 April. 50% from ~55 different grid sites via opt-in and 50% at CERN-P1
  • No other major issues apart from the usual storage- or transfer-related problems at sites
  • Finishing the grand unification of production and analysis queues in PanDA in the coming days
  • All systems recovered quickly from the Oracle/DBonDemand downtime last Saturday; it would be appreciated if such downtimes could be avoided over the weekend next time
  • CTA in production for ATLAS since Monday - still fixing some issues in Rucio/middleware

CMS

  • Covid-19 compute contributions being returned to experiment use
  • main processing activities:
    • Run 2 ultra-legacy Monte Carlo
    • Run 2 pre-UL Monte Carlo
  • migration to Rucio ongoing
    • production of nanoAOD samples configured for PhEDEx being bumped up to complete more quickly

LHCb

  • still running Folding@Home (F@H) on part of the HLT farm
  • large MC requests are coming up, so we are going to reduce this Covid-19-related activity
  • processing (small) samples of lead-lead collisions and lead-neon fixed-target collisions
  • the grid was drained in preparation for the CERN Oracle/DBOD outage of last Saturday; DIRAC services and agents were switched off, then on again after the outage; everything went extremely smoothly

Discussion on F@H reductions

  • Maarten: it is perfectly defensible to ramp down resources for F@H,
    as we have already done a lot and we cannot neglect our own duties

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

Archival Storage WG

Containers WG

CREAM migration TF

Details here


  • 90 tickets
  • 14 done: 7 ARC, 7 HTCondor
  • 16 sites plan for ARC, 15 are considering it
  • 20 sites plan for HTCondor, 14 are considering it, 8 consider using SIMPLE
  • 14 tickets on hold, to be continued in the coming weeks / months
  • 7 tickets without reply
    • response times possibly affected by COVID-19 measures

dCache upgrade TF

DPM upgrade TF

StoRM upgrade TF

Information System Evolution TF

IPv6 Validation and Deployment TF

Detailed status here.

Machine/Job Features TF


MW Readiness WG

Network Throughput WG

Traceability WG

Action list

Creation date Description Responsible Status Comments

Specific actions for experiments

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

Specific actions for sites

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

