DRAFT DRAFT DRAFT

This is the EGEE-III requirements list, initially populated with the remaining requirements from EGEE-II. The TMB regularly assesses the requirements, and new requirements can be added after discussion in the TMB.

Index Issue/Requirement Res Estimated Cost Comments Status
100 Security, authorization, authentication - - - -
101 (was 101, 308, 501, 502, 503, 513, 572) VOMS groups and roles used by all middleware;
Support for priorities based on VOMS groups/roles;
All user level commands must extract VO information from a VOMS proxy;
Services must provide access control based on VOMS groups/roles. Critical is fine grained control to: files, queues and metadata;
Ability to control access to data based on the schema with a granularity of entries and fields
JRA1,SA3
tracked in:
2926, 2927, 2928, 2935
Short term solution for job priorities is to use VOViews.
Available on the LCG-RB. Code frozen for gLite WMS (wait for the first post-gLite 3.0 release of WMS)
See: Jeff T. document
Longer term use GPBOX to define and propagate VO policies to sites. A prototype targeted at the gLite 3.0 infrastructure is available but not integrated and tested.
O(10) groups, O(3) roles. ATLAS: O(25) groups, O(3) roles.
LHCb: the item is not clear; it should be specified for each service separately.
ATLAS remarked that this should work without a central service.
Sites: VOMS should be used by users too; a non-VOMS proxy means no special roles or privileges at the site. -
103 Automatic handling of proxy renewal Sec. - users should not need to know which server to use to register their proxies for a specific service LHCb: item should be split by services -
103a proxy renewal within the service JRA1 - - -
103b establishment of trust between the service and myproxy Sec
tracked in 2929
     
103c find the right myproxy server SA3 (configuration)    
104 Automatic renewal of Kerberos credentials via the GRID      
105 Framework and recommendations for developing secure experiment specific services JRA1   including delegation and renewal LHCb: this should include certification of security frameworks already developed by experiments;
this is also a requirement from the VO boxes group Sites: agree this is a requirement.
 
110 Information System - - -  
111 (was 111, 130) Stable access to static information;
No direct access to the information system for any operation
a) caching of endpoint information in the clients and
b) not to need to go to the information system if the information is already available elsewhere (e.g. through parameters)

tracked in 3069
  service end points, service characteristics
stressed by LHCb LHCb: 124,128,130 to be merged Sites: should be addressed by split of IS into static and dynamic parts, currently discussed within GD
-
120 Storage Management - - - -
122 Same semantics for all SEs SRM group -   Sites: isn't this agreed part of Witzig proposal?
125 Disk quota management storage solution group - at group and user level  
126 Verification of file integrity after replication JRA1
tracked in 2932
  checksum (on demand), file size  
200 Data Management - - -  
210 File Transfer Service - - -  
213 Real-time monitoring of errors       has to be "parser friendly" and indicate common conditions (destination down..)  
217b SRM interface integrated to allow specification of lifetime, pinning, etc.   LHCb: Different types of storages should have different SE's, pinning is important here storage type done in 217; this lists the remaining work
218 Priorities, including reshuffling        
240 Grid File Catalogue - - -  
243 Support for the concept of a Master Copy     master copies can't be deleted LHCb: can not be deleted the same way as other replicas  
244 Pool interface to LFC pool   access to file specific meta data, maybe via a RDD like service LHCb: Pool to use gfal which will be interfaced to LFC  
250 Performance:   - - -  
260 Grid Data Management Tools   - - -
262a POSIX file access based on LFN tracked in 2936      
262b including "best replica" selection based on location, prioritization, current load on networks     research problem    
263 file access libs. have to access multiple LFC instances for load balancing and high availability      
264 reliable registration service     supporting ACL propagation and bulk operations  
265 reliable file deletion service that verifies that actions have taken place and is performant       Sites: ATLAS is asking us for this.  
266 Staging service for sets of files       LHCb: item not clear  
300 Workload Management   - - -
302 Single RB end point with automatic load balancing JRA1/SA3 Design required. Major issue. No estimate    
303 No loss of jobs or results due to temporary unavailability of an RB instance     Standard linux HA available now (Two machines, one dies, one takes over)
Multiple RB plus network file system (N RBs using NFS/AFS shared disk, hot swap RBs to replace failed ones with same name, IP, certs, status, jobs within minutes): 1 FTE*month (JRA1/WM) + N (~3) months test
   
307 Latency for job execution and status reporting has to be proportional to the expected job duration JRA1   Support for SDJ at the level of middleware is in the first post gLite 3.0 release of WMS.    
310 Interactive access to running jobs JRA1   job file perusal in gLite 3.0.
Basic functionality such as top and ls needs ~1 FTE month. More design is needed for full functionality.
and commands on the WN like: top, ls, and cat on individual files  
311 (was 311, 405) All CE services have to be accessible directly by user code; Computing element open to VO specific services on all sites tracked in 3072 CEMon already in gLite 3.0. A CREAM prototype targeted at the gLite 3.0 infrastructure is available but not integrated and tested. Sites: what does this mean?  
312 Direct access to status of the computing resource (number of jobs/VO ...) JRA1 Using CEMon (in gLite 3.0) Sites: don't we already have this in the info system? Sites are likely to reject any proposal to query the computing element directly, as it is yet another way to poll the system to death. This is why we have an information system. Users will have to accept that the information may be O(1 min) old.  
313 (was 313, 314) Allow agents on worker nodes to steer other jobs; Mechanism for changing the identity (owner) of a running job on a worker node
tracked in 3073
glexec available as a library in gLite 3.0
O(1 FTE month) to have it as a service usable on the WNs, but it is a security issue to be decided by sites
Sites: what does "agents on WN steer other jobs" mean?  
320 Monitoring Tools   - 0 - - -
321 Transfer traffic monitor     0    
322 SE monitoring, statistics for file opening and I/O by file/dataset, abstract load figures     0    
324 Publish subscribe to logging and bookkeeping, and local batch system events for all jobs of a given VO     0    
330 Accounting   - 0 - - -
331 By site, user, and group based on proxy information   all applications
tracked in 2941
Suitable log files from LRMS on LCG and gLite CE in first post-gLite 3.0 release. DGAS provides the needed privacy and granularity. APEL provides an easy collection and representation mechanism for aggregate information. DGAS; all applications should check whether the currently available information is enough  
333 Storage Element accounting     0  
400 Other Issues     0 have been grouped under Deployment Issues and partially deals with services provided at certain sites
402 Different SE classes, MSS, Disk with access for production managers, public disks storage     0    
500 From Cal's list   0 - - -
510 Short Deadline Jobs SDJ WG 0 - - -
511 The release should support SDJ at the level of the batch systems     0 required for glite 3.0
512 The resource broker has to be able to identify resources that support SDJs     In first post-gLite 3.0 release of WMS as far as 511/406 are satisfied required for glite 3.0. BUG:31278
514 Modify system to ensure shortest possible latency for SDJs     design needed longer term
520 MPI MPI WG     Use cases: Running large scale parallel applications on the grid effectively
521b Publication of the maximum number of CPUs that can be used by a single job   NA4
tracked in 2938
0 required for glite 3.0
530 Disk Space Specification     Handled with information pass-through via BLAH. Available as a prototype in the first post-gLite 3.0 release. Would need at least 1 FTE month for each supported batch system to use it. Usecases: Jobs need scratch space, shared between nodes (MPI) or local and will fail if this resource is not available
531 Specification of required shared disk space     As in 530 required for glite 3.0. Needs deployment of CREAM CE + plug-ins
532 Specification of required local scratch disk space     As in 530 required for glite 3.0
540 Publication of software availability and location     0 Use cases: applications use certain software packages frequently. Not all have standard locations or versions.
541 (was 541,542) Publication of the Java and Python version; Mechanism to find the required versions of those packages     0 required for glite 3.0; discussion not conclusive yet Sites: note this is an old HEPCAL requirement
550 Priorities for jobs Job Priorities WG 0  
551 Users should be able to specify the relative priority of their jobs     0 required for glite 3.0
552 A VO should be able to specify the relative priority of jobs     0 required for glite 3.0. Groups can have different priorities, but VO control is not available
553 VO and user priorities must be combined sensibly by the system to define an execution order for queued jobs     0 required for glite 3.0
580 Encryption Key Server     0 Usecases: data can be highly sensitive and must be encrypted to control access
581 Ability to retrieve an encryption key based on a file id     0  
582 Ability to do an M/N split of keys between servers to ensure that no single server provides sufficient information to decrypt files   0  
583 Access to these keys must be controlled by ACLs   0  
590 Software License Management   0  
591 Ability to obtain licenses for a given package from a given server   0  
592 Access to the server must be controlled via ACLs based on grid certificates   0  
592 The system should know about the availability of licenses and start jobs only when a license is available   0  
600 Database Access DB access WG 0 Use cases: Application data resides in relational and XML DBs. Applications need access to this data based on grid credentials Sites: a very oft requested feature by non-HEP users!
601 Basic access control based on grid credentials   NA4
tracked in 2937
0 NA4 to evaluate ogsa-dai
602 Fine-grained control at table, row and column level   0  
603 Replication mechanism for databases   0  
604 Mechanism to federate distributed servers (each server contains a subset of the complete data)   0 Sites: a very oft requested feature by non-HEP users (esp. biobanking)
701 OutputData support in JDL   From Savannah bug #22564  
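
Requirement 126 asks for verification of file integrity after replication by checksum (on demand) and file size. A minimal sketch of what such a check could look like is given below. It assumes both copies are reachable as local paths and uses SHA-256 from the Python standard library; the function names `file_fingerprint` and `replica_matches` are hypothetical and are not part of any gLite or LFC interface, and the checksum algorithm actually used by a storage element may differ.

```python
import hashlib
import os


def file_fingerprint(path, algorithm="sha256", chunk_size=1 << 20):
    """Return (size_in_bytes, hex_digest) for the file at path.

    The file is read in chunks so that large files do not need to
    fit in memory. The algorithm name is an assumption of this sketch.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def replica_matches(source_path, replica_path, with_checksum=True):
    """Compare a replica against its source.

    The cheap size comparison always runs; the checksum comparison
    runs only on demand (with_checksum=True).
    """
    if os.path.getsize(source_path) != os.path.getsize(replica_path):
        return False
    if not with_checksum:
        return True
    return file_fingerprint(source_path) == file_fingerprint(replica_path)
```

The size check is kept separate because it is cheap and catches truncated transfers, while the checksum is the more expensive step that the requirement lists as on demand.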

-- ErwinLaure - 11 Jun 2008

Topic revision: r2 - 2008-06-11 - ErwinLaure
 