WLCG Operations Coordination Minutes, October 11th, 2018

Highlights

Agenda

https://indico.cern.ch/event/757611/

Attendance

Operations News

Special topics

Middleware News

Important notice concerning the support of TLS v1.2 on WLCG

  • On Sep 21 a Globus update in the EPEL repositories made TLS v1.2
    the only version supported for security handshakes in GSI.
    • The affected package is globus-gssapi-gsi-13.10.
  • Unfortunately, a significant number of grid services in WLCG
    were not ready for that change and started running into failures.
  • We therefore asked for the minimum supported version to be set
    to TLS v1.0 again and arranged for services like the FTS either not to
    apply the Globus update yet, or to adjust /etc/grid-security/gsi.conf:
       MIN_TLS_PROTOCOL=TLS1_VERSION_DEPRECATED
       
  • Version globus-gssapi-gsi-14.7-2 has that temporary workaround
    and should soon become available in EPEL.
    • It is currently present in the EPEL testing repositories.
  • In the meantime we would like all potentially affected services
    to be checked and updated as needed.
  • Such services may directly depend on Globus themselves,
    but could also be based on Java instead.
  • Of particular concern are SRM, GridFTP, CE and Argus services.
    • SRM services listen on port 8443 (dCache), 8444 (StoRM) or 8446 (DPM).
    • The CREAM CE service listens on port 8443.
    • GridFTP services used by CREAM, ARC and SE head nodes listen on port 2811,
      while the port may be unpredictable on SE disk servers.
    • Argus listens on port 8154.
  • To test SRM, CREAM, Argus or any other HTTPS service, please run
    a command like this (a sketch for checking several endpoints at once
    follows this list):
       openssl s_client -tls1_2 -connect HOST:PORT 2>&1 < /dev/null |
          egrep '^New|Protocol|known|Bad|refused|route'
       
  • The following output is a sign of failure:
       New, (NONE), Cipher is (NONE)
       
  • To test a GridFTP server, one needs a valid VOMS or grid proxy:
       env GLOBUS_GSSAPI_MIN_TLS_PROTOCOL=TLS1_2_VERSION uberftp HOST pwd
       
  • If any of those commands fails due to the TLS v1.2 requirement,
    please update Java/Globus on the affected service to a recent version,
    restart the service and try again.
  • We will need to set the deadline for TLS v1.2 support to early 2019
    and will let you know when the timeline becomes clearer.
  • Please report issues you encounter through the usual channels.
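
For sites with many services to verify, the checks above can be wrapped in a small loop. The following is a minimal sketch, assuming a placeholder list of host:port endpoints (replace them with your own services); it flags any endpoint that fails the forced TLS v1.2 handshake:

       #!/bin/bash
       # Minimal sketch: check a list of HOST:PORT endpoints for TLS v1.2 support.
       # The entries below are placeholders; replace them with your own services.
       ENDPOINTS="srm.example.org:8446 cream.example.org:8443 argus.example.org:8154"

       for ep in $ENDPOINTS; do
           out=$(openssl s_client -tls1_2 -connect "$ep" 2>&1 < /dev/null)
           # A successful handshake reports a negotiated cipher; a failure shows
           # "Cipher is (NONE)" (the failure signature above) or no cipher line at all.
           if echo "$out" | grep -q 'Cipher is' && \
              ! echo "$out" | grep -q 'Cipher is (NONE)'; then
               echo "OK:   $ep"
           else
               echo "FAIL: $ep (no TLS v1.2 handshake or unreachable)"
           fi
       done

Endpoints reported as FAIL should be updated and restarted as described above, then re-tested.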

Tier 0 News

  • CERN would like to ask the experiments how much notice they would need before the majority of batch resources at CERN are changed to CC7, assuming any intervention would take a couple of weeks to roll out.

An action for the experiments has been created.

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Normal activity levels on average
  • No major issues

ATLAS

  • Smooth Grid production over the last weeks with ~300k concurrently running grid job slots. Additional HPC contributions with peaks of ~50k concurrently running job slots and ~10k jobs from Boinc.
  • Commissioning of the Harvester submission system via PanDA is ongoing on the Grid. The CERN, TW, ES, IT and UK clouds have largely been migrated.
  • Heavy-ion throughput tests from CERN Point 1 to EOS, to tape and to 3 Tier 1s all worked fine.
  • The first part of the tape carousel R&D campaign at the Tier 1s, using 200-300 TB of AOD, is finished. Stage-in rates from 300 MB/s to 3 GB/s have been observed at the different sites.

CMS

  • LHC running well and CMS is collecting good data, two more weeks of p-p running
  • heavy-ion P5-->EOS rate test successful on day two
  • finalizing software and operation model for heavy-ion run in November
  • stability of the EOS FUSE mount improved, but read issues are still encountered (e.g. on 2018-Oct-10)
  • two CMS EOS crashes in the last two weeks (both on Thursdays)
  • Fermilab FTS issue traced to slow CERN-->Fermilab transfers, being investigated, GGUS:137632
  • switched from the 2017 Monte Carlo configuration to 2018 MC as the dominant workflow
  • compute systems busy at above 200k cores, with the usual mix of about 75% production and 25% analysis

LHCb

  • Operations as usual, nothing specific to report

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

Archival Storage WG

Update on providing tape info

PLEASE CHECK AND UPDATE THIS TABLE

| Site   | Info enabled | Plans | Comments |
| CERN   | YES | | |
| BNL    | YES | | |
| CNAF   | YES | | Space accounting info is integrated in the portal. Other metrics are on the way. |
| FNAL   | YES | | |
| IN2P3  | YES | | Space accounting info is integrated in the portal. Other metrics are on the way. |
| JINR   | YES | | |
| KISTI  | YES | | KISTI has been contacted. Will work on it in the second half of September. |
| KIT    | YES | | |
| NDGF   | NO  | | NDGF has a distributed storage, which complicates the task. Discuss with NDGF the possibility of doing the aggregation on the storage space accounting server side. Should be accomplished by the end of the year. |
| NLT1   | YES | | Almost done; waiting for the opening of the firewall, a matter of a couple of days. |
| NRC-KI | YES | | |
| PIC    | YES | | Space accounting info is integrated in the portal. Other metrics are on the way. |
| RAL    | YES | | Space accounting info is integrated in the portal. Other metrics are on the way. |
| TRIUMF | YES | | |

One can see all sites integrated in storage space accounting for tapes here.

Information System Evolution TF

  • Ongoing discussion on the publishing of the CE configuration via a JSON file. More details can be found here
  • Storage Resource Reporting implementation by all WLCG storage middleware providers is progressing. More details here
  • The next WLCG IS Evolution Task Force meeting will take place on the 18th of October. It will continue the discussion of the JSON file structure for CE configuration publishing. UK sites will present their first experience with publishing the CE description in JSON format.

IPv6 Validation and Deployment TF

Detailed status here.

Machine/Job Features TF

Monitoring

MW Readiness WG

Network Throughput WG


Squid Monitoring and HTTP Proxy Discovery TFs

  • LHC@Home is now almost completely switched to using openhtc.io (Cloudflare) cached cvmfs & CMS Frontier services instead of using squids at CERN & Fermilab (except for a small trickle of jobs accessing only /cvmfs/grid.cern.ch). Web Proxy Auto Discovery (WPAD) is used to discover squids when LHC@Home jobs are run at WLCG sites.
  • Plans are being made to integrate a shoal service (for dynamically registering squids) with the WLCG WPAD service. This is intended for squids running in clouds serving WLCG jobs. We will also exclude the dynamically registered squids from being treated as worker nodes in the failover monitor.
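
For context, WPAD clients discover proxies by fetching a proxy auto-config (PAC) file from a well-known URL, conventionally /wpad.dat on the WPAD host; the PAC file returns the squid(s) registered for the client's network. A quick manual check (the host name below is an assumption based on common WLCG usage; substitute the one your jobs use) could look like:

       # Fetch the PAC file served at the conventional /wpad.dat path and
       # inspect which proxies it would return for this client's network.
       curl -s http://wlcg-wpad.cern.ch/wpad.dat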

Traceability WG

Container WG

Action list

| Creation date | Description | Responsible | Status | Comments |
| 03 Nov 2016 | Review VO ID Card documentation and make sure it is suitable for multicore | WLCG Operations | In progress | GGUS:133915 |
| 07 Jun 2018 | GDPR policy implementation across WLCG and experiment services | WLCG Operations + experiments | Ongoing | Details here |

Specific actions for experiments

| Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments |
| 13 Sep 2018 | moving most of CERN batch to CC7 | all | - | 11 Oct | | how much advance warning needed? |

Specific actions for sites

| Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments |

AOB

-- JuliaAndreeva - 2018-10-08
