Globus retirement considerations in WLCG
Introduction
In the WLCG Management Board meeting of March 17, 2020, there was a discussion
about a possible retirement timeline for the remaining WLCG dependencies on Globus:
Prompted by the plans and timelines that already exist in OSG:
For WLCG, removing the dependency on GridFTP is being tackled in the TPC WG
of the DOMA project:
Currently, GridFTP is also being used for job submissions to CREAM instances,
which have almost all been decommissioned by now, and ARC CE instances,
which already support HTTPS as an alternative.
While it looks viable for WLCG not to depend on GridFTP by the end of 2021,
can we actually remove Globus as a build dependency from the various storage
service implementations (see below) that currently make use of it?
This question also has to take into account that other communities may be
unable to replace X509 certificates with tokens as "quickly" as WLCG hopes
to do. The affected implementations may therefore need to retain code that
is able to deal with X509 somehow. Such code may depend on Globus GSI today.
Furthermore, the transitional X509 + VOMS support in the new WLCG AAI
(for services or workflows that do not support WLCG tokens yet)
may rely on the RCauth online CA, which makes use of a
MyProxy server to communicate with its HSM.
More on GSI: besides its use in conjunction with GridFTP and SRM,
WLCG already has a critical dependency on it through MyProxy. For as long
as we need the latter, it may not be a big deal to support GSI in addition,
though in theory, MyProxy could be made independent of it.
Another very important dependency on GSI comes through GSI-OpenSSH,
which is used by VOBOX instances to give login access to privileged members
of supported VOs.
WG for Transition to Tokens and Globus Retirement
The WG for Transition to Tokens and Globus Retirement has been mandated to
coordinate the many parties involved in the transition from X509 to
WLCG tokens and the gradual phaseout of dependencies on Globus.
Current support
The remaining relevant parts of Globus and a few related products are currently
being maintained by the Grid Community Forum, with contributions from several partners:
- EGI, particularly NIKHEF
- OSG, particularly U Wisconsin-Madison for builds and releases
- NorduGrid, particularly Uppsala
- HPC Center Stuttgart
- ...
Currently supported components:
- GSI
- To deal with X509 in many products
- A critical dependency of the other components
- GridFTP
- Currently the main workhorse protocol for data transfers
- The DOMA TPC WG expects it to be replaced by WebDAV in the course of 2021
- Also used for job submissions to ARC (and CREAM)
- MyProxy
- The standard solution for storing long-lived proxies
- GSI-OpenSSH
- SSH extension supporting X509 proxies for authentication
- Critical for access to WLCG VOBOX instances
- UberFTP
- A handy GridFTP client tool
- In WLCG only used via CREAM, as far as we know
Development aspects
Most of the code is essentially stable. Some fixes may be needed for bugs
that may be encountered, e.g. on CentOS 8. The biggest concern
was probably the postponed support for TLS v1.3, which has been
available since 3 September 2021. However, if we were forced to start
using that version, further debugging effort might be required from experts
in GSI code and/or other MW involved.
Products that are not affected
- dCache
- XRootD
- EOS
- Echo
- StoRM (WebDAV/HTTPS is independent of Globus)
- DPM (WebDAV/HTTPS is independent of Globus)
Affected products
Grid Community Forum products
- Globus GridFTP server & client
- GSI libraries
- GSI-OpenSSH
- MyProxy server & client
- UberFTP
Argus
- Indirect dependency on GSI through the argus-gsi-pep-callout plugin that is typically used by CE services to refer to Argus
- Such call-outs serve components that themselves depend on GSI
ARC CE
Info provided by Balazs Konya:
- Once the gridftp-jobplugin job submission is dropped (e.g. because either the
EMI-ES or the REST interface is used instead) and third-party storage servers
do not require GridFTP, ARC is Globus-free.
- Nothing in ARC pulls in Globus dependencies by default;
all Globus dependencies are separated out in modular packaging.
DPM
Info provided by Fabrizio Furano:
- Globus is expected to be ejectable with a little tinkering.
- For HTTP it uses a library called libgridsite, which does not depend on Globus.
FTS
Info provided by Mihai Patrascoiu:
- FTS has the following as a build dependency:
- Regarding gfal2, of course, the dependencies revolve around the SRM and GridFTP plugins:
- srm-ifce-devel
- globus-gass-copy-devel
GFAL2
- Only the plugin libraries for GridFTP and SRM depend on GSI.
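As a rough illustration of what that statement means for clients, the sketch below classifies transfer URLs by scheme to flag those that would be routed to a GSI-dependent plugin. The scheme sets and the function name `needs_gsi_plugin` are assumptions made for this example, not part of GFAL2: `gsiftp` and `srm` are taken to map to the GridFTP and SRM plugins, while `https`, `davs` and `root` are assumed to map to Globus-free plugins.

```python
from urllib.parse import urlsplit

# Assumption for illustration: URL schemes served by the GridFTP and SRM
# plugins, which are the only GFAL2 plugins said to depend on GSI.
GSI_DEPENDENT_SCHEMES = {"gsiftp", "srm"}

# Assumption for illustration: schemes served by plugins without GSI.
GLOBUS_FREE_SCHEMES = {"https", "davs", "root"}


def needs_gsi_plugin(url: str) -> bool:
    """Return True if the URL scheme is in the GSI-dependent set."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in GSI_DEPENDENT_SCHEMES:
        return True
    if scheme in GLOBUS_FREE_SCHEMES:
        return False
    # Unknown schemes are rejected rather than silently guessed.
    raise ValueError(f"unclassified URL scheme: {scheme!r}")


def gsi_dependent_urls(urls):
    """Return the subset of URLs that would still need a GSI-dependent plugin."""
    return [u for u in urls if needs_gsi_plugin(u)]
```

A site or experiment could run such a check over its configured endpoints (the hostnames used would be its own) to estimate how many transfers still rely on the GridFTP or SRM plugins.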
HTCondor
- X509 proxy support depends on GSI.
- HTCondor-G job submission to ARC CEs currently depends on GridFTP and GSI.
- HTCondor CE authorization often depends on Globus call-outs to LCMAPS or Argus.
LCMAPS
Info provided by Mischa Sallé:
- A (potential) dependency of various relevant products:
- ARC, HTCondor CE (and CREAM)
- E.g. for VOMS mappings or GSI call-outs to Argus
- Globus GridFTP server, for VOMS mappings using GSI call-out (i.e. the lcas-lcmaps-gt4-interface)
- Used in that way by StoRM
- GSI-OpenSSH, ditto
- Not on the WLCG VOBOX yet (currently edg-mkgridmap is still used instead)
- XRootD plugin
- At least OSG, (US)ATLAS, (US)CMS and GridPP appear to depend on it
- Other products?
- LCMAPS (minus its Globus interface) could be made independent of GSI
- Note: the Globus interface is probably only used by the lcas-lcmaps-gt4-interface, which itself depends on Globus
StoRM
Info provided by Andrea Ceccanti:
- The StoRM SRM frontend depends on CGSI-gSOAP and Globus.
- We are working on an alternative that won't require those dependencies.
- Not sure we will be ready by the end of 2021, though.
- The StoRM WebDAV service does not depend on Globus at all.
- We are going to keep SRM, which will be needed for tape interaction at CNAF.
WLCG VOBOX
- GSI-OpenSSH server + client
- MyProxy client
Others
Affected experiments
ALICE
- No (recent) dependencies on Globus in offline frameworks.
- Critical VOBOX dependencies on MyProxy and GSI-OpenSSH.
ATLAS
- PanDA:
- The Proxy Cache mechanism requires a credential uploaded to CERN's MyProxy server and fetched by the PanDA server using the MyProxy client (link to Harvester documentation)
- X509 authentication via mod_gridsite
- Harvester:
- ARC-CE submission
- ARC supports a GridFTP interface built on Globus and an HTTPS interface based on SOAP/XML
- Most sites still use the GridFTP interface
- aCT can submit to the HTTPS interface as well as GridFTP
- HTCondor-G currently can only submit to GridFTP interface
- Medium-term plan is for ARC to develop a new REST HTTP interface and HTCondor-G to develop a REST client. ETA mid 2021.
- HTCondor-CE submission
- Version 4+ supports WLCG token/SciToken and GSI based authentication
- Harvester to HTCondor-CE submission demo: ATLASPANDA-505
- Rucio:
- No direct dependency on Globus
- Nota bene
- X509 authentication relies on mod_gridsite and voms-proxy-init for VOMS
- Is there a dependency for those on Globus libraries? I didn't see any.
- Those do not depend on Globus
- Dependency on GFAL for clients, which includes a Globus plugin for gsiftp/srm support
- GlobusOnline support is an optional transfer tool and requires Globus SDK
- Tape access:
- Replacing SRM/GridFTP at tape endpoints is out of scope of the WLCG DOMA TPC working group
- dCache developers are aware of SRM/GridFTP disappearing, but have no concrete plans yet
- After some development, WLCG DOMA TPC is replacing SRM/GridFTP with SRM/HTTPS at dCache and StoRM sites.
- TAPE REST API subgroup for longer term common solution to tape access
- Tools:
- arcproxy: no dependency on Globus
CMS
- SRMv2, GridFTP and gsiftp are used for data transfer (we are in the process of switching to different protocols and will have a better view at the end of 2020)
- X509 certificate authentication is used in many places
- Global Pool (i.e. all job submissions for production and analysis), SAM, GGUS, etc. (there is work ongoing on CMS side; we'll have a better view after capability tokens are in common use, next year)
- CRAB (tool for submission of user jobs to the Grid) depends on MyProxy in order to be able to submit jobs (including allowing proxy renewal for running jobs by HTCondor) and to move files with user credentials to user home directories at various T2s. Generally speaking, MyProxy can only be decommissioned after all services stop using X509 authentication.
- CRAB submission from the central server to schedds uses X509-authenticated HTCondor APIs.
- CRAB will be some work - we'll likely have to carry around both tokens and proxies for a while here.
- Several production services inside CMSWEB also rely on MyProxy to periodically refresh credentials for internal and external communication.
- CMS currently uses Rucio via X509 and all that ATLAS wrote on that applies here as well
- The glideinWMS 3.7.1 will have the ability for all the internal communications on the global pool to be done via tokens instead of GSI. This will allow the global pool to be upgraded to be GSI-free.
- For submitting to CEs, HTCondor is following the progress in ARC toward non-GSI submission. HTCondor-CE already works.
- The Workload Management eco-system (including WMAgent) relies on X509 certificates to authenticate to other central services (mostly CMSWEB), so it would need to be adapted to work with tokens.
- WMAgent job submission - via condor - also makes use of X509 certificates and VOMS roles attached to it (in order to access specific storage paths), so tokens would need to be adopted as well.
- A first prerequisite is to enable this on the HTCondor side and start working to submit jobs with tokens from there.
CMS intends to follow the published schedule. It would be useful to ask middleware developers to provide roadmaps on their progress similar to what OSG has done.
LHCb
- DIRAC by itself does not depend on Globus, but the middleware that DIRAC uses does (those middleware products are also covered in this same wiki entry).
- DIRAC does not depend on MyProxy
Other communities
WLCG sites need to support other communities that may not all be able to move
to tokens on the same timescale as WLCG. That could imply that some effort
would need to be found, possibly coordinated with EGI, to maintain at least
the GSI-related components, assuming that SRM and GridFTP can already
be replaced with WebDAV and/or XRootD, as both are supported by GFAL2.
In principle X509 proxies do not imply the use of GSI, as there are other
libraries (e.g. GridSite) that support X509, but in practice there may be hard
dependencies on GSI for X509 in other SW that is vital to other communities.
In fact, even some of the LHC experiments might be facing such issues
and have to re-implement the handling of X509 in some of their SW,
unless GSI remained supported for a number of years more.
External documents