Security models for Grid data management.
There are three models for discussion, each with pros and cons.
The three models differ in the place where the authorization decision is made and enforced:
- Model 1 (SRM-Model): authorization decision and enforcement are both done in the underlying Storage Resource Layer (i.e. the SRM)
- Model 2 (IO-Model): authorization decision and enforcement are both done in the Access Protocol Layer (i.e. in the I/O and gsiftp services)
- Model 1.5: the authorization decision is made outside the Storage Resource Layer, and enforcement is done in the underlying Storage Resource Layer
Model 2 is described in detail in the gLite architecture. (Model 1 was not considered there due to the disadvantages of that model; see below.)
In the discussion below we denote:
- o these are conclusions or comments on the actual statement
- + these are advantages
- - these are disadvantages
Definitions
- SE: Storage Element, which has a management (SRM), transfer (gridftp or https) and access protocol (rfio, dcap or local file) service
- SRM: Storage Resource Manager service
Model 1: SE decision and enforcement
Model 1: The SE makes the file authorization decision and enforces it.
1.1 The SRM provides posix ACLs.
- + well-defined semantics
- + straightforward to implement
- - not standardized by the SRM v1.1 standard yet, but included in SRMv2 and SRMv3
- - not implemented by all SRMs yet (exception is LCG DPM)
- - difficult to keep synchronized across SRMs, especially if a whole directory tree has to be created to replicate a file. This is problematic because the ownership of higher-level directories may differ from that of the actual file; in addition, the principal replicating the data may not be the owner of the file at all. Perhaps this necessitates a specific 'replicate' ACL (no longer POSIX) and a dedicated replication service (could be built into FTS)?
1.2 The SRM namespace is relevant - it has to hold the authorization semantics.
- o the SURL directory names will have to be known and managed explicitly by the users.
- - only SRM v2 provides a full set of namespace management methods
- - if users/groups are changed or deleted, each SRM has to be updated individually for all files owned by the given principal, which is potentially an expensive operation
- +/- the SURL namespace has semantics now which can be made use of
1.3 Writing data into SE
The user has to write into a namespace where they are authorized to write.
- o on file creation the directory has to exist and be writable by the user
- - VOs have to be given new namespaces on allocation as an administrative extra operation
1.4 Reading data from SE
The user's credentials are checked against the ACLs, and reading is allowed only as far as those ACLs permit.
- + it doesn't matter what protocol is used to access the SRM: the authorization decision is always made by the SRM, based on the SURL namespace
1.5 Consistency of security information across sites
- - a dedicated consistency service will have to be provided that makes sure that replicas across SRMs have the same authorization information, see also comment in point 1.1
- o the LFN namespace in a Grid catalog would also need to be synchronized with the SURL namespace for the security semantics to make sense. Probably there should be a strong tie between the LFN and the SURL in this case (maybe just a difference in prefix?)
- o GUIDs don't carry any security semantics as the SURLs carry the authoritative authorization information locally on the SRM level. Logical File Names and GUIDs are only convenience information on top of the SURL namespace and can completely be decoupled.
- - Renaming and changing ACLs will become very expensive operations.
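The consistency check that such a service would have to perform can be sketched as follows. This is only an illustration: the function name, the string encoding of ACL entries, and the example SURLs are all hypothetical, and a real service would have to query each SRM for its ACLs rather than receive them in a dictionary.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: each replica SURL maps to the set of ACL
# entries the local SRM holds for it, encoded here as plain strings.
ReplicaAcls = Dict[str, FrozenSet[str]]


def diverging_replicas(reference_surl: str, replica_acls: ReplicaAcls) -> List[str]:
    """Return the SURLs whose ACL differs from the reference replica's ACL."""
    reference = replica_acls[reference_surl]
    return sorted(
        surl for surl, acl in replica_acls.items() if acl != reference
    )


# Illustrative data only: three replicas of one file on three SRMs.
EXAMPLE_ACLS: ReplicaAcls = {
    "srm://se1.example.org/vo/data/f1": frozenset({"user:alice:rw", "group:vo:r"}),
    "srm://se2.example.org/vo/data/f1": frozenset({"user:alice:rw", "group:vo:r"}),
    "srm://se3.example.org/vo/data/f1": frozenset({"user:alice:rw"}),
}

out_of_sync = diverging_replicas("srm://se1.example.org/vo/data/f1", EXAMPLE_ACLS)
```

In Model 1 every ACL change or rename has to be followed by a check and repair of this kind on each SRM that holds a replica, which is why those operations become expensive.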
Model 2: VO decision and enforcement
Model 2: A Grid access service makes the file authorization decision
and enforces this decision. This access service is tied to
a protocol, e.g. glite-io or xrootd.
2.1 ACLs are stored in the Grid File Authorization Service based on GUIDs.
- o The SURLs in an SRM do not need to carry any semantic information, i.e. their namespace is irrelevant for authorization.
- o LFN-GUID mappings are decoupled and can be stored in a grid file catalog. The LFN namespace can be secured in the catalog but only the GUID ACLs count.
- + rename and change acl operations are cheap.
- o the SURL does not have to be in sync with the LFN namespace, which means that orphaned files cannot be associated with anyone based on just the SURL. For accounting purposes the grid access door has to be used.
2.2 These are NTFS-like ACLs, i.e. not POSIX.
The difference is in the traversal of the hierarchy: it is irrelevant in the NTFS
case, where only the final, leaf ACL counts.
- + ACL synchronization across sites does not depend on the local SRM semantics.
- - The FAS has to be distributed and synchronized separately.
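The traversal difference can be illustrated with a small sketch. The ACL table, the permission letters, and both function names are hypothetical and are not taken from any SRM or FAS implementation; the POSIX-like check is also simplified to search permission on each ancestor directory plus read permission on the file.

```python
from typing import Dict, Iterator, Set, Tuple

# Hypothetical ACL table: path -> set of (principal, permission) pairs.
# "x" stands for search (traverse) permission, "r" for read permission.
AclTable = Dict[str, Set[Tuple[str, str]]]


def ancestors(path: str) -> Iterator[str]:
    """Yield each ancestor directory of path, top-level directory first."""
    parts = path.strip("/").split("/")
    for i in range(1, len(parts)):
        yield "/" + "/".join(parts[:i])


def posix_like_can_read(acls: AclTable, user: str, path: str) -> bool:
    """Simplified POSIX-like rule: every ancestor must be traversable."""
    for directory in ancestors(path):
        if (user, "x") not in acls.get(directory, set()):
            return False
    return (user, "r") in acls.get(path, set())


def leaf_only_can_read(acls: AclTable, user: str, path: str) -> bool:
    """NTFS-like rule as described above: only the leaf ACL counts."""
    return (user, "r") in acls.get(path, set())


# Illustrative data: alice may read the file but not traverse /vo/private.
EXAMPLE: AclTable = {
    "/vo": {("alice", "x")},
    "/vo/private": set(),
    "/vo/private/file": {("alice", "r")},
}

posix_result = posix_like_can_read(EXAMPLE, "alice", "/vo/private/file")
leaf_result = leaf_only_can_read(EXAMPLE, "alice", "/vo/private/file")
```

With this data the POSIX-like check denies the read because of the intermediate directory, while the leaf-only check grants it. The leaf-only rule is what lets ACLs attach to GUIDs without reference to any directory tree.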
2.3 Writing into the storage has to go through the grid access door.
The back door (direct native I/O to the storage device) has to be
secured somehow (e.g. a grid service user may own all the data in the SRM,
and that user in turn runs the grid access door).
- - each storage device has to be interfaced with the Grid access service - gsiftp and an i/o as default, in the gLite model it's glite i/o.
- +/- Replication including all security data is possible via a dedicated replication service (FPS). After the data transfer it has to update the local FAS information with the proper full ACLs. This again is a 'superuser' operation which only the dedicated replication service should be allowed to perform. So here we end up with a network of trusted grid services that are allowed to do 'superuser' operations on the SRM grid access doors. With X509 this can get complicated, as both the service and the user credentials have to be passed at connection time.
2.4 Reading data from the storage has to be authorized by the FAS
- - the data authorization is potentially complicated: the access door has to do one of three things to authorize access:
- a callout to the FAS for authorizing the given user's credentials for the operation
- trust the service that is trying to read the data on the user's behalf as a superuser service (like the FPS)
- accept a token that has been signed by the FAS that the user is indeed allowed to perform the requested operation (CAS model)
- - this may be an expensive operation in terms of performance
- + it's consistent across different SRM implementations and does not need any synchronization of SRM information
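The three options listed above can be sketched as a single decision function in the access door. Everything here is an assumption made for illustration: the `Request` fields, the `authorize` function, the order in which the options are tried, and the two callables that stand in for token verification and the FAS callout.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class Request:
    principal: str               # identity presented to the access door
    guid: str                    # file the request refers to
    operation: str               # e.g. "read"
    token: Optional[str] = None  # FAS-signed token, if the caller has one


def authorize(
    req: Request,
    trusted_services: Set[str],
    verify_token: Callable[[Request], bool],
    fas_callout: Callable[[Request], bool],
) -> str:
    """Return the rule that granted access, or "denied".

    The order of the checks is an assumption of this sketch.
    """
    # Option: trust a superuser service (like the FPS) acting for the user.
    if req.principal in trusted_services:
        return "trusted-service"
    # Option: accept a token signed by the FAS (CAS model).
    if req.token is not None:
        return "token" if verify_token(req) else "denied"
    # Option: call out to the FAS with the user's credentials.
    return "fas-callout" if fas_callout(req) else "denied"
```

Only the last branch contacts the FAS on every request, which is where the performance cost noted above comes from.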
2.5 Synchronization of data across sites and between access domains
- - in this model data that has been put into the storage through other means than the grid access door has to be 'moved' to the grid authorization domain explicitly - this may be an administrative operation
- - back-door changes cannot be tracked and will destroy consistency
- + consistency across sites is as good as the FAS distribution allows. Distributing the authorization catalog is probably an easier problem than agreeing with all SRM implementations on the authorization semantics.
- + even SRMs with very limited security capabilities (e.g. unix) can serve as a fully secure grid storage service.
Model 1.5: VO decision, SE enforcement
It is possible to merge these two models if we do the following:
- make the authorization decision in the VO affiliated File Authorization Service (FAS)
- enforce the decision in the site affiliated Storage Element (SE)
- make the native i/o adhere to the grid door concept and provide the same semantics for additional doors (gsiftp)
- trust the grid services to do special operations on behalf of the user (replication use case where all authz information has to be copied).
There are still two ways to implement this model:
Push mode
Before any operation happens, the user acquires an authorization token from the FAS
and passes this token to the Storage Element (or to any intermediate service, like
FTS).
The Storage Element evaluates the token, trusting the FAS that it made the right
decision, and grants or denies access to the file.
- + read performance is the same as the native performance of the SE, there is no intermediate service, which could create a bottleneck
- o SE has to trust the FAS
- o SE has to understand the authorization token (standardization?!)
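A minimal sketch of the push mode follows. It assumes an HMAC over a shared secret as a stand-in for the FAS signature, purely to keep the example self-contained; a real deployment would more likely use a public-key signature so that the SE cannot mint tokens itself. The token layout, the function names, and the claim names are all hypothetical, since the token format is exactly the open standardization question noted above.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: FAS and SE share this secret. Illustration only.
FAS_SECRET = b"shared-secret-for-illustration-only"


def _sign(payload: str) -> str:
    return hmac.new(FAS_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def fas_issue_token(user: str, guid: str, operation: str,
                    lifetime_s: int = 300, now: Optional[float] = None) -> str:
    """FAS side: make the decision once and record it in a signed token."""
    issued = int(time.time() if now is None else now)
    claims = {"user": user, "guid": guid, "op": operation,
              "exp": issued + lifetime_s}
    payload = json.dumps(claims, sort_keys=True)
    return payload + "." + _sign(payload)


def se_check_token(token: str, user: str, guid: str, operation: str,
                   now: Optional[float] = None) -> bool:
    """SE side: verify the signature, then enforce what the token says."""
    current = time.time() if now is None else now
    payload, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return False  # not issued by the FAS, or modified in transit
    claims = json.loads(payload)
    return (claims["user"] == user and claims["guid"] == guid
            and claims["op"] == operation and claims["exp"] > current)
```

The SE never contacts the FAS in this flow. It only needs the verification key and an agreed token format.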
Pull mode
Upon any request from the user (or from an intermediate service, like FTS, which
acts on the user's behalf), the SE checks the permission in the FAS.
- + read performance is the same as the native performance of the SE, there is no intermediate service, which could create a bottleneck
- o SE has to trust the FAS
- o SE has to be able to communicate with the VO FAS service
- - an SE serving multiple VOs would have to communicate with multiple File Authorization Services, depending on the user's VO
This would make the SEs very grid-like already and has the advantage of
forcing a uniform synchronizable ACL semantics on all SEs.
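The pull mode, including the need for an SE to pick the right FAS for the user's VO, can be sketched as below. The `StorageElement` class, the `make_stub_fas` helper, and the VO names are hypothetical; each FAS is modelled as a plain callable, whereas a real SE would make a remote call.

```python
from typing import Callable, Dict, Set, Tuple

# A FAS client answers: may this user perform this operation on this GUID?
FasClient = Callable[[str, str, str], bool]


def make_stub_fas(grants: Set[Tuple[str, str, str]]) -> FasClient:
    """Build an in-memory stand-in for one VO's File Authorization Service."""
    def check(user: str, guid: str, operation: str) -> bool:
        return (user, guid, operation) in grants
    return check


class StorageElement:
    """SE that enforces decisions pulled from the FAS of the user's VO."""

    def __init__(self, fas_by_vo: Dict[str, FasClient]) -> None:
        self.fas_by_vo = fas_by_vo

    def is_allowed(self, vo: str, user: str, guid: str, operation: str) -> bool:
        fas = self.fas_by_vo.get(vo)
        if fas is None:
            return False  # no FAS known for this VO: deny
        return fas(user, guid, operation)  # decision made by the VO's FAS


# Illustrative setup: one SE serving two VOs, each with its own FAS.
se = StorageElement({
    "atlas": make_stub_fas({("alice", "guid-1", "read")}),
    "cms": make_stub_fas({("bob", "guid-2", "write")}),
})
```

The `fas_by_vo` table is the part that grows with the number of VOs an SE serves, which is the disadvantage listed for pull mode.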
--
AkosFrohner - 31 May 2006