DRAFT DRAFT DRAFT

This is a merger of Flavia's list of issues that the experiments have supported and Cal's list of requirements. The table contains fields in which the users can assign priorities and the developers can express estimated costs. Each VO should distribute a total of 100 "priority points" between the different issues. The estimated costs are given in person-weeks.
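
As a rough illustration of how the filled-in table might be evaluated (the page itself does not prescribe an aggregation rule), the sketch below checks that each VO hands out exactly 100 priority points and then ranks issues by total points per person-week of estimated cost. All VO allocations, issue indices and costs in it are invented for the example.

<verbatim>
# Illustrative only: the aggregation rule (points per person-week) is an
# assumption, not something specified on this page.
points = {
    # hypothetical allocations; each VO must distribute exactly 100 points
    "ATLAS": {101: 40, 121: 35, 304: 25},
    "LHCb":  {101: 20, 130: 50, 241: 30},
}
cost_pw = {101: 4, 121: 8, 130: 2, 241: 6, 304: 12}  # estimated cost in person-weeks

# Sanity check: every VO hands out exactly 100 priority points.
for vo, alloc in points.items():
    assert sum(alloc.values()) == 100, f"{vo} allocated {sum(alloc.values())} points"

# Sum the points each issue received across all VOs.
totals = {}
for alloc in points.values():
    for issue, pts in alloc.items():
        totals[issue] = totals.get(issue, 0) + pts

# Issues ordered by total points per person-week of estimated effort.
ranking = sorted(totals, key=lambda i: totals[i] / cost_pw[i], reverse=True)
print(ranking)
</verbatim>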

Under the heading "Origin" you can find whether an item came from Flavia's list or from Cal's (FD/NA4). You might notice that there are no security or deployment issues in this list; these will be prioritized by different means.

A few of the NA4 requirements have a considerable overlap with the issues summarized in Flavia's list. The difference is mostly that the NA4 requirements are more general.

Index Issue/Requirement Origin ALICE ATLAS CMS LHCb Biomed NA4 Estimated Cost Comments
100 Security, authorization, authentication FD - - - - - - - -
101 VOMS groups and roles used by all middleware FD 0 0 0 0 0 0 0 O(10) groups O(3) roles
102 VOMS supporting user metadata FD 0 0 0 0 0 0 0 see list for details
103 Automatic handling of proxy renewal FD 0 0 0 0 0 0 0 users should not need to know which server to use to register their proxies for a specific service
104 Automatic renewal of Kerberos credentials via the GRID FD 0 0 0 0 0 0 0  
105 Framework and recommendations for developing secure experiment specific services FD 0 0 0 0 0 0 0 including delegation and renewal
110 Information System FD - - - - - - - -
111 Stable access to static information FD 0 0 0 0 0 0 0 service end points, service characteristics
112 Identical GLUE schema for gLite and LCG FD 0 0 0 0 0 0 0  
120 Storage Management FD - - - - - - - -
121 SRM used by all Storage Elements FD 0 0 0 0 0 0 0 SRM as specified in the Baseline Services Working Group Report
122 Same semantic for all SEs FD 0 0 0 0 0 0 0  
123 Smooth migration from SRM v1 to v2, gfal and FTS should hide differences FD 0 0 0 0 0 0 0  
124 Direct access to SRM interfaces FD 0 0 0 0 0 0 0 SRM client libs.
125 Disk quota management FD 0 0 0 0 0 0 not before 3Q 2006 (CASTOR, dCache, DPM) at group and user level
126 Verification of file integrity after replication FD 0 0 0 0 0 0 0 checksum (on demand), file size
127 Verification that operations have had the desired effect at fabric level FD 0 0 0 0 0 0 0  
128 Highly optimized SRM client tools FD 0 0 0 0 0 0 0  
129 Python binding for SRM client tools FD 0 0 0 0 0 0 0  
130 No direct access to the information system for any operation FD 0 0 0 0 0 0 0 stressed by LHCb
200 Data Management FD - - - - - - - -
210 File Transfer Service FD - - - - - - - -
211 FTS clients on all WNs and VOBOXes FD 0 0 0 0 0 0 0  
212 Retry until explicitly stopped FD 0 0 0 0 0 0 0  
213 Real-time monitoring of errors FD 0 0 0 0 0 0 0 has to be "parser friendly" and indicate common conditions (destination down, ...)
214 Automatic file transfers between any two sites on the Grid FD 0 0 0 0 0 0 0 not linked to a catalogue, file specified via SURL
215 Central entry point for all transfers FD 0 0 0 0 0 0 0  
216 FTS should handle proxy renewal FD 0 0 0 0 0 0 0  
217 SRM interface integrated to allow specification of storage type, lifetime, pinning, etc. FD 0 0 0 0 0 0 0  
218 Priorities, including reshuffling FD 0 0 0 0 0 0 0  
219 Support for VO specific plug-ins FD 0 0 0 0 0 0 0  
230 File Placement Service FD - - - - - - - -
231 FPS plug-ins for VO specific agents FD 0 0 0 0 0 0 0  
232 FPS should handle routing FD 0 0 0 0 0 0 0  
233 FPS should handle replication FD 0 0 0 0 0 0 0 choosing the sources automatically
234 FPS should handle transfers to multiple destinations FD 0 0 0 0 0 0 0  
240 Grid File Catalogue FD - - - - - - - -
241 LFC as global and local catalogue with a peak access rate of 100Hz FD 0 0 0 0 0 0 0  
242 Support for replica attributes: tape, pinned, disk, etc. FD 0 0 0 0 0 0 0  
243 Support for the concept of a Master Copy FD 0 0 0 0 0 0 0 master copies can't be deleted
244 Pool interface to LFC FD 0 0 0 0 0 0 0 access to file specific meta data, maybe via a RDD like service
250 Performance FD - - - - - - - -
251 Emphasis on read access FD 0 0 0 0 0 0 0  
252 Unauthenticated read-only instances FD 0 0 0 0 0 0 0  
253 Bulk operations FD 0 0 0 0 0 0 0  
260 Grid Data Management Tools FD - - - - - - - -
261 lcg-utils available in production FD 0 0 0 0 0 0 0  
262 POSIX file access based on LFN FD 0 0 0 0 0 0 0 including "best replica" selection based on location, prioritization, current load on networks
263 file access libs. have to access multiple LFC instances for load balancing and high availability FD 0 0 0 0 0 0 0  
264 reliable registration service FD 0 0 0 0 0 0 0 supporting ACL propagation and bulk operations
265 reliable file deletion service that verifies that actions have taken place and is performant FD 0 0 0 0 0 0 0  
266 Staging service for sets of files FD 0 0 0 0 0 0 0  
300 Workload Management FD - - - - - - - -
301 Configuration that defines a set of primary RBs to be used by the VO for load balancing, and allows defining alternative sets to be used in case the primary set is not available FD 0 0 0 0 0 0 0  
302 Single RB end point with automatic load balancing FD 0 0 0 0 0 0 0  
303 No loss of jobs or results due to temporary unavailability of an RB instance FD 0 0 0 0 0 0 0  
304 Handling of 10**6 jobs/day FD 0 0 0 0 0 0 0  
305 Using the information system in the match making to send jobs to sites hosting the input files AND providing sufficient resources FD 0 0 0 0 0 0 0  
306 Better input sandbox management (caching of sandboxes) FD 0 0 0 0 0 0 0  
307 Latency for job execution and status reporting has to be proportional to the expected job duration FD 0 0 0 0 0 0 0  
308 Support for priorities based on VOMS groups/roles FD 0 0 0 0 0 0 0 ATLAS remarked that this should work without a central service
309 RB should reschedule jobs in the internal task queue FD 0 0 0 0 0 0 0  
310 Interactive access to running jobs FD 0 0 0 0 0 0 0 and commands on the WN like: top, ls, and cat on individual files
311 All CE services have to be accessible directly by user code FD 0 0 0 0 0 0 0  
312 Direct access to status of the computing resource (number of jobs/VO ...) FD 0 0 0 0 0 0 0  
313 Allow agents on worker nodes to steer other jobs FD 0 0 0 0 0 0 0  
314 Mechanism for changing the identity (owner) of a running job on a worker node FD 0 0 0 0 0 0 0  
320 Monitoring Tools FD - - - - - - - -
321 Transfer traffic monitor FD 0 0 0 0 0 0 0  
322 SE monitoring, statistics for file opening and I/O by file/dataset, abstract load figures FD 0 0 0 0 0 0 0  
323 Scalable tool for VO specific information (job status/errors/...) FD 0 0 0 0 0 0 0  
324 Publish/subscribe to logging and bookkeeping, and to local batch system events, for all jobs of a given VO FD 0 0 0 0 0 0 0  
330 Accounting FD - - - - - - - -
331 By site, user, and group based on proxy information FD 0 0 0 0 0 0 0 DGAS
332 Accounting by VO-specified tag that identifies certain activities. These could be MC, Reconstruction, etc. FD 0 0 0 0 0 0 0  
333 Storage Element accounting FD 0 0 0 0 0 0 0  
400 Other Issues FD 0 0 0 0 0 0 0 these have been grouped under Deployment Issues and partially deal with services provided at certain sites
401 Read-only mirrors of LFC service at several T1 centers updated every 30-60 minutes FD 0 0 0 0 0 0 0  
402 Different SE classes: MSS, disk with access for production managers, public disk storage FD 0 0 0 0 0 0 0  
403 XROOTD at all sites FD 0 0 0 0 0 0 0  
404 VOBOX at all sites FD 0 0 0 0 0 0 0 requested by ALICE, ATLAS and CMS; LHCb requested it at T1s and some T2s
405 Computing element open to VO specific services on all sites FD 0 0 0 0 0 0 0 including direct access to information bypassing the information system (CREAM and CEMon)
406 dedicated queues for short jobs FD 0 0 0 0 0 0 0  
407 Standardized CPU time limits FD 0 0 0 0 0 0 0  
408 Tool to manage VO specific site dependent environments FD 0 0 0 0 0 0 0  
409 Rearranging priorities of jobs in the local queue FD 0 0 0 0 0 0 0 ATLAS: Requirement for a priority system including local queues at the sites, able to rearrange the priority of jobs already queued at each single site in order to take care of new high-priority jobs being submitted. Such a system requires some deployment effort, but essentially no development, since such a feature is already provided by most of the batch systems and is a local implementation, not a Grid one.
500 From Cal's list NA4 - - - - - - -  
501 All user level commands must extract VO information from a VOMS proxy NA4 0 0 0 0 0 0 0 required for glite 3.0
502 Membership in multiple organizations must work correctly NA4 0 0 0 0 0 0 0 required for glite 3.0
503 Services must provide access control based on VOMS groups/roles; fine-grained control of access to files, queues and metadata is critical NA4 0 0 0 0 0 0 0 required after glite 3.0
510 Short Deadline Jobs NA4 - - - - - - -  
511 The release should support SDJ at the level of the batch systems NA4 0 0 0 0 0 0 0 required for glite 3.0
512 The resource broker has to be able to identify resources that support SDJs NA4 0 0 0 0 0 0 0 required for glite 3.0
513 SDJ resource access should be controlled via ACLs NA4 0 0 0 0 0 0 0 after glite 3.0
514 Modify system to ensure shortest possible latency for SDJs NA4 0 0 0 0 0 0 0 longer term
520 MPI NA4 0 0 0 0 0 0 0 Use cases: Running large scale parallel applications on the grid effectively
521 Use a batch system that can handle the "CPU count problem" NA4 0 0 0 0 0 0 0 required for glite 3.0. This problem arises because of a scheduling mismatch in the versions of maui/torque used by default. The end result is that typically an MPI job can only use half of the CPUs available at a site, yet the broker will happily schedule jobs that require more on the site; these jobs will never run (see the sketch after the table).
522 Publication of the maximum number of CPUs that can be used by a single job NA4 0 0 0 0 0 0 0 required for glite 3.0
523 Publication of whether the home directories are shared (alternatively, transparently move sandboxes to all allocated nodes) NA4 0 0 0 0 0 0 0 required for glite 3.0
524 Ability to run code before/after the job wrapper invokes "mpirun" NA4 0 0 0 0 0 0 0 required after glite 3.0. This will allow compilation and setup of the job by the user
530 Disk Space Specification NA4 0 0 0 0 0 0 0 Use cases: jobs need scratch space, either shared between nodes (MPI) or local, and will fail if this resource is not available
531 Specification of required shared disk space NA4 0 0 0 0 0 0 0 required for glite 3.0
532 Specification of required local scratch disk space NA4 0 0 0 0 0 0 0 required for glite 3.0
540 Publication of software availability and location NA4 0 0 0 0 0 0 0 Use cases: applications use certain software packages frequently; not all have standard locations or versions.
541 Publication of the Java and Python version NA4 0 0 0 0 0 0 0 required for glite 3.0
542 Mechanism to find the required versions of those packages NA4 0 0 0 0 0 0 0 required for glite 3.0
550 Priorities for jobs NA4 0 0 0 0 0 0 0  
551 Users should be able to specify the relative priority of their jobs NA4 0 0 0 0 0 0 0 required for glite 3.0
552 A VO should be able to specify the relative priority of jobs NA4 0 0 0 0 0 0 0 required for glite 3.0
553 VO and user priorities must be combined sensibly by the system to define an execution order for queued jobs NA4 0 0 0 0 0 0 0 required for glite 3.0
560 Job Dependencies NA4 0 0 0 0 0 0 0 Use cases: applications often require workflows with dependencies
561 Ability to specify arbitrary (non-circular) dependencies between jobs inside a set of jobs NA4 0 0 0 0 0 0 0 required after glite 3.0
562 Ability to query the state and control such jobs as a unit NA4 0 0 0 0 0 0 0 required after glite 3.0
563 Ability to query and control the sub jobs NA4 0 0 0 0 0 0 0 required after glite 3.0
570 Metadata Catalogue NA4 0 0 0 0 0 0 0 Use cases: identify datasets based on metadata information
571 Ability to add metadata according to user defined schema NA4 0 0 0 0 0 0 0  
572 Ability to control access to data based on the schema with a granularity of entries and fields NA4 0 0 0 0 0 0 0  
573 Ability to distribute metadata over a set of servers NA4 0 0 0 0 0 0 0  
580 Encryption Key Server NA4 0 0 0 0 0 0 0 Use cases: data can be highly sensitive and must be encrypted to control access
581 Ability to retrieve an encryption key based on a file id NA4 0 0 0 0 0 0 0  
582 Ability to do an M/N split of keys between servers to ensure that no single server provides sufficient information to decrypt files NA4 0 0 0 0 0 0 0  
583 Access to these keys must be controlled by ACLs NA4 0 0 0 0 0 0 0  
590 Software License Management NA4 0 0 0 0 0 0 0  
591 Ability to obtain licenses for a given package from a given server NA4 0 0 0 0 0 0 0  
592 Access to the server must be controlled via ACLs based on grid certificates NA4 0 0 0 0 0 0 0  
593 The system should know about the availability of licenses and start jobs only when a license is available NA4 0 0 0 0 0 0 0  
600 Database Access NA4 0 0 0 0 0 0 0 Use cases: application data resides in relational and XML databases; applications need access to this data based on grid credentials
601 Basic access control based on grid credentials NA4 0 0 0 0 0 0 0  
602 Fine-grained control at table, row and column level NA4 0 0 0 0 0 0 0  
603 Replication mechanism for databases NA4 0 0 0 0 0 0 0  
604 Mechanism to federate distributed servers (each server contains a subset of the complete data) NA4 0 0 0 0 0 0 0  
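
A small illustrative sketch of the MPI "CPU count problem" mentioned in the 520 series above: if sites published the maximum number of CPUs a single job can actually use, the broker (or the user) could filter out sites that advertise enough total CPUs but can never satisfy the request. The attribute names and site data below are hypothetical and are not taken from the real GLUE schema.

<verbatim>
# Hypothetical site records: "total_cpus" is what the information system
# advertises today, "max_cpus_per_job" is the value the publication
# requirement above asks sites to expose.  Names and numbers are invented.
sites = [
    {"name": "SITE-A", "total_cpus": 64,  "max_cpus_per_job": 32},
    {"name": "SITE-B", "total_cpus": 128, "max_cpus_per_job": 128},
]

def candidate_sites(requested_cpus):
    """Return only sites where an MPI job of this size can actually start."""
    return [s["name"] for s in sites if s["max_cpus_per_job"] >= requested_cpus]

# A 64-CPU job matches SITE-A on total CPUs, but because of the scheduler
# mismatch only 32 can be used there, so the job would queue forever.
print(candidate_sites(64))   # ['SITE-B']
</verbatim>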

-- Main.markusw - 21 Dec 2005

-- Main.markusw - 06 Jan 2006 filled in the HEP and NA4 issues and requirements. There are still duplicates, especially in the area of short jobs

-- Main.markusw - 09 Jan 2006
