This document is now being maintained in MS-Word. The latest version is:

Old text follows

Usability Issues in WLCG Security

We start with a discussion of the current usability challenges related to security in the WLCG, as identified by the TEG participants. We divide the list into problems typically seen by the scientific researchers who make up the WLCG's users and problems seen by WLCG administrators.

From the Researcher's Perspective

  • Credential management: Some WLCG users have trouble with credential management throughout the lifecycle, including obtaining a credential (and the time this requires) and maintaining it securely without losing it or forgetting the encryption pass phrase. The problem is compounded when a user undertakes multiple activities that require different VOMS attributes: this requires juggling multiple proxies, each with different VOMS attributes, and brings the risk of mixing them up, accidentally overwriting them, etc.

  • Incoherent proxy storage on complex systems: Proxy credentials are by default stored in /tmp, which on some systems (e.g. clusters) is not shared across the whole system, so a proxy created in one place may not be available to activities running elsewhere on the same system.

  • Lack of web authentication: X.509 credentials are difficult to import into or export from browsers, making it very difficult to use a given credential for both web and command-line activities. Additionally, support for RFC 3820 proxy certificates by web services is weak.

  • Lack of internationalization: Support for internationalization of names in X.509 distinguished names is not complete and use of non-ASCII characters can cause failures.
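
To make the proxy storage issue above concrete, the following is a minimal sketch of how a client might decide where to look for a proxy. It assumes the convention used by common grid client tools, in which the X509_USER_PROXY environment variable overrides a per-user default file in /tmp; the function name resolve_proxy_path is hypothetical and only for illustration.

```python
import os


def resolve_proxy_path(environ=None, uid=None):
    """Return the path where a proxy credential is expected to live.

    Assumed convention (not stated in the text above): X509_USER_PROXY
    takes precedence, otherwise the proxy is /tmp/x509up_u<uid>.
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get("X509_USER_PROXY")
    if explicit:
        # An explicit location can point at a shared file system,
        # which avoids the node-local /tmp problem on clusters.
        return explicit
    uid = os.getuid() if uid is None else uid
    # Node-local default: only visible on the machine that created it.
    return f"/tmp/x509up_u{uid}"
```

Pointing the environment variable at a directory that is shared across the system is one way a site could work around node-local /tmp, at the cost of having to protect that directory appropriately.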

From the Administrator's Perspective

  • Managing revocation: Revocation of certificates is accomplished through certificate revocation lists (CRLs), which have finite lifetimes (typically in the range of a week to a month). When a CRL expires, software still using that CRL will fail safe, rejecting all authentication attempts. Timely updates of CRLs are therefore an operational necessity: any failure of a certificate authority to produce a new CRL, or of a relying party to obtain and properly install it, results in a service outage.

  • Expired host and service certificates: Certificates used by hosts and services have finite lifetimes, typically a year or a small number of years. After they expire, clients attempting to use those hosts and services will experience failures. Currently administrators are responsible for "manually" monitoring these certificates and renewing them before expiration to provide continuous service. A cluster with tens or hundreds of certificates may pose issues of scale.

  • Managing authorization policies: Authorization is controlled by policies that are encoded in a number of different places: grid-mapfiles, CA signing policies, VOMS, GUMS, etc. Authoring and maintaining all of these to coherently represent the mission of the projects and VOs is a complicated, error-prone task.

  • Client authorization of hosts and services: (Placeholder, to be completed.) Clients expect services to identify themselves with host certificates by default and therefore may need special configuration to accept anything else, viz. a service certificate.

  • Inconsistent user banning mechanisms: It is occasionally necessary to "ban" a user, or suspend their ability to access resources, due to a suspected or confirmed security incident, misbehaviour of the user or their software, or other administrative reasons. This differs from certificate revocation, which is meant to globally revoke a user's authentication credentials and may not be an appropriate vehicle due to its scope or longer-term authentication effects. As with the more general problem of managing authorization policies, software services differ in how they allow for banning, or whether they allow it at all, leading to an inconsistent, error-prone process.

  • Mixing of authentication and authorization: Proxy certificates, through their use of impersonation (allowing one entity to appear as another entity) to support delegation, are inherently a mix of two security concepts: authentication (identification of an entity) and authorization (what that entity is allowed to do). This creates usability problems such as masking of a delegatee's identity in logging (they get logged as the delegator) and confusion between the use of entity names and VOMS attributes in authorization policies.

  • Lacking debugging and forensics: Debugging of failures is a challenge because of the multiple software stacks with different error propagation characteristics and capabilities, and because of the distributed nature of the system (and the resulting logs) and its user community. This makes it hard to access the log and configuration data needed for debugging (assuming it exists) and to interpret it, as logging across different software components is not standardized in terms of content, syntax or semantics.

  • Inconsistent proxy certificate implementations: Inconsistent implementations of support for proxy certificates lead to incompatibilities and difficulty in other administrative tasks, such as infrastructure upgrades. An example is the lack of RFC 3820 proxy certificate support in some software, leading to difficulties in removing dependencies on weak cryptographic algorithms.

  • X.509 validation overhead: Currently the validation of X.509 credentials is done for each network connection within the WLCG. Each of these validations is equivalent to the validation done for a TLS/SSL handshake as part of a web HTTPS connection. While Google has done significant work in minimizing this overhead, the aggregate load across the WLCG may still be significant.
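
The fail-safe behaviour described under "Managing revocation" can be sketched as follows. This is an illustrative model only, not the logic of any particular middleware: the function names and the two-day warning window are assumptions, and the validity interval is passed in directly rather than parsed from a real CRL file.

```python
from datetime import datetime, timedelta, timezone


def crl_status(this_update, next_update, now, warn_window=timedelta(days=2)):
    """Classify a CRL by its validity interval [this_update, next_update)."""
    if now < this_update:
        return "not-yet-valid"
    if now >= next_update:
        return "expired"
    if next_update - now <= warn_window:
        # Still usable, but a refresh is overdue from an operations view.
        return "expiring-soon"
    return "valid"


def may_authenticate(this_update, next_update, now):
    """Fail-safe rule: anything other than a currently valid CRL
    causes every authentication attempt to be rejected."""
    return crl_status(this_update, next_update, now) in ("valid", "expiring-soon")


# Example with a one-week CRL lifetime (hypothetical dates).
issued = datetime(2012, 3, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=7)
```

A monitoring job built on the "expiring-soon" state would give administrators a window in which to fetch a fresh CRL before the fail-safe rule turns a stale file into an outage.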

Possible Improvements

We now turn from problem identification to some possible improvements to deal with those problems. We sort the possible improvements into short-term, defined as something that could be put into place in less than 2 years, and long-term, or something that would require more than 2 years to put into place.

Short-Term

In general, effort should be made to "hide" PKI/X.509 from the end-users of the system as much as possible. Options here include:

  • Minimizing the enrollment process by leveraging existing sources of identity and authentication, for example by using existing member authentication services either directly (e.g., the IGTF MICS profile) or through federated identity (e.g., the CILogon service).

  • Minimizing the credential management process through the use of short-lived credentials. Short-lived credentials (i.e., those with lifetimes of less than 10^6 seconds, or roughly 11.6 days), as defined by the IGTF SLCS profile, combined with existing authentication mechanisms, offer users easy-to-obtain credentials that are in effect "disposable," removing the burdens of long-term curation, migration between computers, renewal, etc.
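
As a small worked check of the short-lived threshold quoted above, the sketch below converts 10^6 seconds to days and tests whether a credential's validity period falls under it. The names SLCS_MAX_LIFETIME and is_short_lived are hypothetical, and the validity dates are supplied by the caller rather than read from a real certificate.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the text: lifetimes of less than 10^6 seconds.
SLCS_MAX_LIFETIME = timedelta(seconds=10**6)

# 10^6 s / 86400 s per day is about 11.57 days.
SLCS_MAX_DAYS = SLCS_MAX_LIFETIME / timedelta(days=1)


def is_short_lived(not_before, not_after):
    """True if the validity period is strictly below the threshold."""
    return (not_after - not_before) < SLCS_MAX_LIFETIME


# Hypothetical validity periods for illustration.
start = datetime(2012, 5, 1, tzinfo=timezone.utc)
eleven_days = start + timedelta(days=11)
twelve_days = start + timedelta(days=12)
```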

Other improvements for researchers include:

  • Tools for multiple credentials. Some users need to juggle multiple credentials with different attributes or from different CAs for connecting to different services with different trust policies, or for acting in different roles. A set of tools or improvements to existing client software to allow users to maintain multiple "roles" each with a different credential and switch easily between those roles would be beneficial.
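
One possible shape for such a role-switching tool is sketched below. It is an assumption-laden illustration, not an existing client: the RoleStore class, the one-file-per-role layout ("<role>.pem" in a private directory) and the use of the X509_USER_PROXY environment variable to select the active credential are all hypothetical choices made for the example.

```python
import os
from pathlib import Path


class RoleStore:
    """Keeps one proxy file per named role in a single directory."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)

    def proxy_path(self, role):
        # Reject names that could escape the store directory.
        if not role or "/" in role or role.startswith("."):
            raise ValueError(f"invalid role name: {role!r}")
        return self.base_dir / f"{role}.pem"

    def roles(self):
        """List the roles for which a proxy file currently exists."""
        return sorted(p.stem for p in self.base_dir.glob("*.pem"))

    def activate(self, role, environ=None):
        """Point the (assumed) X509_USER_PROXY variable at the role's proxy."""
        environ = os.environ if environ is None else environ
        path = self.proxy_path(role)
        if not path.is_file():
            raise FileNotFoundError(f"no proxy stored for role {role!r}")
        environ["X509_USER_PROXY"] = str(path)
        return path
```

Because each role has its own file, creating a proxy for one role cannot overwrite the proxy for another, which addresses the mix-up risk described earlier.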

From the administrator perspective, short-term improvements include:

  • Tools for service credential lifecycle management. Develop a set of supporting tools to help monitor and renew host and service credentials to prevent expiration failure and reduce workload during normal operation.
    • Automatic renewal may not be possible in many cases, but a cron job could at least alert the administrator in time.

  • Improved revocation. In addition to short-term tools for improved CRL management, an analysis should be undertaken of the trade-off between availability and security in how software "fails safe" when it encounters an expired CRL. Allowing software to be configured to continue functioning with an expired CRL while producing strong warnings may be desirable in some situations and would reduce availability failures.

  • Standards for logging. Having a set of defined use cases, derived requirements and ultimately standards for logging would aid in a variety of activities from debugging to cybersecurity incident response.

  • Usability evaluation. Having the usability of grid security and the relevant tools evaluated by an expert in usability and human-computer interaction could identify improvements or new tools that would improve ease of use.
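
The cron-based alert suggested under service credential lifecycle management could look roughly like the sketch below. It assumes the expiry dates have already been collected as strings in the "notAfter" format that Python's ssl module reports (for example from SSLSocket.getpeercert()); the function names, the 30-day threshold and the host names are hypothetical.

```python
import ssl
import time

SECONDS_PER_DAY = 86400.0


def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate expires (negative if expired).

    not_after uses the ssl module's text form, e.g. 'Jan 11 00:00:00 2024 GMT'.
    """
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def expiring_soon(certs, threshold_days=30, now=None):
    """Return (name, days_left) pairs at or below the threshold,
    most urgent first. certs maps a host or service name to notAfter."""
    report = []
    for name, not_after in certs.items():
        remaining = days_until_expiry(not_after, now)
        if remaining <= threshold_days:
            report.append((name, remaining))
    return sorted(report, key=lambda item: item[1])


# Hypothetical inventory for illustration.
inventory = {
    "ce01.example.org": "Jan 11 00:00:00 2024 GMT",
    "se01.example.org": "Jan 10 00:00:00 2025 GMT",
    "ui01.example.org": "Dec 22 00:00:00 2023 GMT",
}
```

Run daily from cron, a script like this would only need to mail the returned list to the administrator when it is non-empty; how the inventory is gathered across tens or hundreds of hosts is left open here.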

Long-Term

Some of the long-term improvements include:

  • A coherent grid security implementation library. Having a coherent implementation of grid security features (e.g., RFC 3820 Proxy Certificates) in appropriate languages and frameworks would address many incompatibility challenges being faced today (e.g., difficulties in migrating to modern hash algorithms because of lack of support in some software stacks). A challenge here is the lack of a generic security library across multiple languages - each language has its own de facto standard(s) (e.g., OpenSSL, Bouncy Castle) that are not coordinated. Having a single grid security library, with implementations in the most popular languages and a standard test suite, would address this incoherency and make development of new grid-based applications both easier and more secure.

  • Consider re-implementation leveraging experiences. Over the past fifteen years, much has been learned about how global-scale grids operate in practice. Rather than the incremental improvements described to this point, a next-generation design may be considered. Aspects to consider include: re-approaching impersonation, with alternatives that separate authentication and authorization more cleanly in the delegation process; pushing public key cryptography to the outer edges and using some sort of session token (e.g., as in the Condor CEDAR implementation) for transaction authentication; and implementing role and group authorization, particularly in the context of the growth of data.
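
To illustrate the session-token idea in general terms, the sketch below authenticates individual messages with a shared key that would be established once, after a single full public-key handshake. This is a generic HMAC construction chosen for illustration and is not a description of the Condor CEDAR protocol; the function names are hypothetical, and a real design would also need key expiry and replay protection.

```python
import hashlib
import hmac
import secrets


def new_session_key():
    """Random key, assumed to be agreed once after an X.509 handshake."""
    return secrets.token_bytes(32)


def sign(session_key, message):
    """Cheap per-transaction authenticator: HMAC-SHA256 over the message."""
    return hmac.new(session_key, message, hashlib.sha256).hexdigest()


def verify(session_key, message, tag):
    """Constant-time comparison avoids leaking how many characters matched."""
    return hmac.compare_digest(sign(session_key, message), tag)
```

The point of the design is cost: certificate chain validation happens once at the edge, while each later transaction needs only a symmetric-key computation.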

-- VonWelch - 05-Mar-2012

Topic attachments

  • WLCG-TEG-Security-Usability-v0.1.docx (r1, 126.9 K, 2012-04-12 16:39, VonWelchExternal): Usability and Security write up v0.1 (word version)
  • WLCG-TEG-Security-Usability-v0.1.pdf (r1, 123.4 K, 2012-04-12 16:40, VonWelchExternal): Usability and Security write up v0.1 (pdf version)
  • WLCG-TEG-Security-Usability-v0.2.docx (r1, 125.3 K, 2012-04-17 20:43, VonWelchExternal): Usability and Security write up v0.2 (word version)
  • WLCG-TEG-Security-Usability-v0.2.pdf (r1, 124.0 K, 2012-04-17 20:43, VonWelchExternal): Usability and Security write up v0.2 (pdf version)
  • WLCG-TEG-Security-Usability-v0.3.docx (r1, 138.3 K, 2012-04-30 22:34, VonWelchExternal): Usability and Security write up v0.3 (word version)
  • WLCG-TEG-Security-Usability-v0.3.pdf (r1, 131.1 K, 2012-04-30 22:35, VonWelchExternal): Usability and Security write up v0.3 (pdf version)
  • WLCG-TEG-Security-Usability-v0.4.docx (r1, 38.1 K, 2012-05-03 17:11, VonWelchExternal): Usability and Security write up v0.4 (word version)
  • WLCG-TEG-Security-Usability-v0.4.pdf (r1, 132.4 K, 2012-05-03 17:12, VonWelchExternal): Usability and Security write up v0.4 (pdf version)
Topic revision: r10 - 2012-05-03 - VonWelchExternal
 