Index | Issue/Requirement | Origin | ALICE | ATLAS | CMS | LHCb | Biomed | NA4 | Sum | Res | Estimated Cost | Comments | Status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100 | Security, authorization, authentication | FD | - | - | - | - | - | - | 0 | - | - | - | |
101 (was 101, 308, 501, 502, 503, 513, 572) | VOMS groups and roles used by all middleware; support for priorities based on VOMS groups/roles; all user-level commands must extract VO information from a VOMS proxy; services must provide access control based on VOMS groups/roles (fine-grained control of files, queues, and metadata is critical); ability to control access to data based on the schema with a granularity of entries and fields | FD, NA4 | 5 | 12 | 12 | 0 | 18 | 14 | 61 | JRA1, SA3; tracked in 2926 | Short-term solution for job priorities is to use VOViews, available on the LCG RB; code frozen for gLite WMS (wait for the first post-gLite 3.0 release of WMS), see Jeff T.'s document. Longer term, use GPBOX to define and propagate VO policies to sites; a prototype targeted at the gLite 3.0 infrastructure is available but not integrated and tested. | O(10) groups, O(3) roles. ATLAS: O(25) groups, O(3) roles. LHCb: item not clear, should be specified for each service separately. ATLAS remarked that this should work without a central service. Sites: VOMS should be used by users too; a non-VOMS proxy means no special roles or privileges at a site. (A minimal FQAN-parsing sketch appears after this table.) | CMS: Ongoing |
103 | Automatic handling of proxy renewal | FD | 5 | 2 | 1 | 3 | 0 | 5 | 16 | Sec. | 0 | users should not need to know which server to use to register their proxies for a specific service. LHCb: item should be split by service | CMS: done TCG:discussed |
103a | proxy renewal within the service | | | | | | | | 16 | JRA1 | | done | TCG:discussed |
103b | establishment of trust between the service and MyProxy | | | | | | | | 16 | Sec; tracked in 2929 | | | TCG:discussed |
103c | find the right MyProxy server | | | | | | | | 16 | SA3 (configuration) | | | TCG:discussed |
104 | Automatic renewal of Kerberos credentials via the GRID | FD | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | |||
105 | Framework and recommendations for developing secure experiment-specific services | FD, VOB | 0 | 3 | 1 | 4 | 0 | 0 | 8 | | 0 | including delegation and renewal. LHCb: this should include certification of security frameworks already developed by the experiments; this is also a requirement from the VO Boxes group. Sites: agree this is a requirement. | CMS: ongoing TCG:discussed |
110 | Information System | FD | - | - | - | - | - | - | 0 | - | - | - | |
111 (was 111, 130) | Stable access to static information; no direct access to the information system for any operation: a) caching of endpoint information in the clients, and b) no need to go to the information system if the information is already available elsewhere (e.g. through parameters) | FD | 0 | 6 | 0 | 6 | 0 | 0 | 12 | tracked in 3069 | 0 | service endpoints and service characteristics stressed by LHCb. LHCb: 124, 128, 130 to be merged. Sites: should be addressed by the split of the IS into static and dynamic parts, currently discussed within GD | CMS: ongoing (new info system?) |
120 | Storage Management | FD | - | - | - | - | - | - | 0 | - | - | - | |
122 | Same semantics for all SEs | FD | 5 | 5 | 5 | 4 | 0 | 0 | 19 | SRM group | 0 | Sites: isn't this an agreed part of the Witzig proposal? | |
125 | Disk quota management | FD | 1 | 2 | 5 | 1 | 0 | 8 | 0 | not before 3Q 2006 (CASTOR, dCache, DPM) | at group and user level | to be discussed in the storage solution group | |
126 | Verification of file integrity after replication | FD | 1 | 5 | 3 | 2 | 0 | 0 | 11 | JRA1; tracked in 2932 | 0 | checksum (on demand), file size | CMS: done TCG:discussed |
200 | Data Management | FD | - | - | - | - | - | - | 0 | - | - | - | |
210 | File Transfer Service | FD | - | - | - | - | - | - | 0 | - | - | - | |
213 | Real-time monitoring of errors | FD | 1 | 5 | 2 | 0 | 0 | 0 | 8 | | 0 | has to be "parser friendly" and indicate common conditions (destination down, ...) | CMS: ongoing TMB discussed |
217b | SRM interface integrated to allow specification of lifetime, pinning, etc. | FD | 2 | 2 | 2 | 1 | 0 | 0 | 7 | | 0 | LHCb: different types of storage should have different SEs; pinning is important here | TCG:discussed; storage type done in 217, this lists the remaining work |
218 | Priorities, including reshuffling | FD | 0 | 2 | 2 | 0 | 0 | 0 | 4 | | 0 | | CMS: not done TMB discussed |
240 | Grid File Catalogue | FD | - | - | - | - | - | - | 0 | - | - | - | |
243 | Support for the concept of a Master Copy | FD | 0 | 0 | 2 | 2 | 0 | 0 | 4 | | 0 | master copies can't be deleted. LHCb: cannot be deleted in the same way as other replicas | TMB discussed |
244 | POOL interface to LFC | FD | 0 | 5 | 5 | 0 | 0 | 0 | 10 | POOL | 0 | access to file-specific metadata, maybe via an RDD-like service. LHCb: POOL to use GFAL, which will be interfaced to LFC | TMB discussed |
250 | Performance: | FD | - | - | - | - | - | - | 0 | - | - | - | ? |
260 | Grid Data Management Tools | FD | - | - | - | - | - | - | 0 | - | - | - | |
262a | POSIX file access based on LFN | FD | 1 | 0 | 0 | 4 | 0 | 5 | 10 | NA4; tracked in 2936 | 0 | | ? |
262b | including "best replica" selection based on location, prioritization, and current load on networks | | | | | | | | 0 | | | research problem | CMS: open |
263 | file access libraries have to access multiple LFC instances for load balancing and high availability | FD | 1 | 0 | 0 | 2 | 0 | 0 | 3 | | 0 | | ? |
264 | reliable registration service | FD | 0 | 1 | 0 | 0 | 0 | 0 | 1 | | 0 | supporting ACL propagation and bulk operations | ? |
265 | reliable file deletion service that verifies that actions have taken place and is performant | FD | 1 | 5 | 0 | 2 | 0 | 0 | 8 | | 0 | Sites: ATLAS is asking us for this. | ? |
266 | Staging service for sets of files | FD | 0 | 2 | 0 | 0 | 0 | 0 | 2 | | 0 | LHCb: item not clear | ? |
300 | Workload Management | FD | - | - | - | - | - | - | 0 | - | - | - | |
302 | Single RB endpoint with automatic load balancing | FD | 0 | 0 | 0 | 2 | 0 | 0 | 2 | | Design required; major issue; no estimate | | CMS: Open |
303 | No loss of jobs or results due to temporary unavailability of an RB instance | FD | 0 | 5 | 2 | 1 | 0 | 0 | 8 | | Standard Linux HA is available now (two machines; if one dies, the other takes over). Multiple RBs plus a network file system (N RBs using an NFS/AFS shared disk; hot-swap RBs replace failed ones with the same name, IP, certs, status, and jobs within minutes): 1 FTE-month (JRA1/WM) + N (~3) months of testing | | CMS: Open/Ongoing |
307 | Latency for job execution and status reporting has to be proportional to the expected job duration | FD | 0 | 0 | 2 | 0 | 0 | 0 | 2 | | Support for SDJ at the middleware level is in the first post-gLite 3.0 release of WMS. | | ? |
310 | Interactive access to running jobs | FD | 0 | 2 | 2 | 0 | 0 | 0 | 4 | | Job file perusal is in gLite 3.0. Basic functionality like top and ls needs ~1 FTE-month; more design is needed for full functionality. | and commands on the WN like top, ls, and cat on individual files | CMS: open |
311 (was 311, 405) | All CE services have to be accessible directly by user code; computing element open to VO-specific services at all sites | FD | 0 | 0 | 0 | 7 | 0 | 0 | 7 | tracked in 3072 | CEMon is already in gLite 3.0. A CREAM prototype targeted at the gLite 3.0 infrastructure is available but not integrated and tested. | Sites: what does this mean? | CMS: Ongoing |
312 | Direct access to the status of the computing resource (number of jobs/VO, ...) | FD | 0 | 0 | 0 | 4 | 0 | 0 | 4 | | Using CEMon (in gLite 3.0) | Sites: don't we already have this in the info system? Sites are likely to reject any proposal to query the computing element directly; it is yet another way to poll the system to death, and this is why we have an information system. Users will have to accept that the information may be O(1 min) old. | CMS: Ongoing |
313 (was 313, 314) | Allow agents on worker nodes to steer other jobs; mechanism for changing the identity (owner) of a running job on a worker node | FD | 5 | 0 | 1 | 8 | 3 | 0 | 17 | tracked in 3073 | glexec is available as a library in gLite 3.0; O(1 FTE-month) to have it as a service usable on the WNs, but it is a security issue to be decided by the sites | Sites: what does "agents on WN steer other jobs" mean? | CMS: ongoing |
320 | Monitoring Tools | FD | - | - | - | - | - | - | 0 | - | - | - | |
321 | Transfer traffic monitor | FD | 1 | 1 | 1 | 1 | 0 | 0 | 4 | | 0 | | CMS: open TMB discussed |
322 | SE monitoring: statistics for file opening and I/O by file/dataset, abstract load figures | FD | 0 | 0 | 0 | 2 | 0 | 0 | 2 | | 0 | | CMS: open TMB discussed |
324 | Publish/subscribe to logging and bookkeeping, and local batch system events for all jobs of a given VO | FD | 0 | 0 | 2 | 0 | 0 | 0 | 2 | | 0 | | CMS: ongoing TMB discussed |
330 | Accounting | FD | - | - | - | - | - | - | 0 | - | - | - | |
331 | By site, user, and group, based on proxy information | FD | 3 | 5 | 1 | 0 | 0 | 5 | 14 | all applications; tracked in 2941 | Suitable log files from the LRMS on the LCG and gLite CEs in the first post-gLite 3.0 release. DGAS provides the needed privacy and granularity. APEL provides an easy collection and representation mechanism for aggregate information. | DGAS; all applications should check whether the currently available information is enough | CMS: open TMB discussed |
333 | Storage Element accounting | FD | 0 | 2 | 1 | 0 | 0 | 2 | 5 | | 0 | | CMS: open |
400 | Other Issues | FD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | these have been grouped under Deployment Issues and partially deal with services provided at certain sites | |
402 | Different SE classes: MSS, disk with access for production managers, public disk storage | FD | 0 | 2 | 1 | 4 | 0 | 0 | 7 | | 0 | | CMS: Done TCG:discussed |
500 | From Cal's list | NA4 | - | - | - | - | - | - | 0 | - | - | - | |
510 | Short Deadline Jobs | NA4 | - | - | - | - | - | - | 0 | - | - | - | |
511 | The release should support SDJ at the level of the batch systems | NA4 | 0 | 0 | 0 | 0 | 4 | 0 | 4 | | 0 | required for gLite 3.0 | |
512 | The resource broker has to be able to identify resources that support SDJs | NA4 | 0 | 0 | 0 | 0 | 4 | 0 | 4 | | In the first post-gLite 3.0 release of WMS, as far as 511/406 are satisfied | required for gLite 3.0. BUG:31278 | |
514 | Modify the system to ensure the shortest possible latency for SDJs | NA4 | 0 | 0 | 1 | 0 | 5 | 0 | 6 | | design needed | longer term | |
520 | MPI | NA4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | Use cases: running large-scale parallel applications on the grid effectively | |
521b | Publication of the maximum number of CPUs that can be used by a single job | NA4 | 0 | 0 | 0 | 0 | 2 | 5 | 7 | NA4; tracked in 2938 | 0 | required for gLite 3.0 | |
530 | Disk Space Specification | NA4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | Handled with information pass-through via BLAH; available as a prototype in the first post-gLite 3.0 release. Would need at least 1 FTE-month for each supported batch system to use it. | Use cases: jobs need scratch space, shared between nodes (MPI) or local, and will fail if this resource is not available | |
531 | Specification of required shared disk space | NA4 | 0 | 0 | 0 | 0 | 0 | 5 | 5 | | As in 530 | required for gLite 3.0. Needs deployment of the CREAM CE + plug-ins | |
532 | Specification of required local scratch disk space | NA4 | 0 | 0 | 0 | 0 | 1 | 5 | 6 | | As in 530 | required for gLite 3.0 | |
540 | Publication of software availability and location | NA4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | Use cases: applications use certain software packages frequently; not all have standard locations or versions. | |
541 (was 541, 542) | Publication of the Java and Python version; mechanism to find the required versions of those packages | NA4 | 0 | 0 | 0 | 0 | 4 | 4 | 8 | | 0 | required for gLite 3.0; discussion not conclusive yet. Sites: note this is an old HEPCAL requirement | |
550 | Priorities for jobs | NA4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Job Priorities WG | 0 | ||
551 | Users should be able to specify the relative priority of their jobs | NA4 | 0 | 0 | 1 | 0 | 3 | 2 | 6 | | 0 | required for gLite 3.0 | |
552 | A VO should be able to specify the relative priority of jobs | NA4 | 0 | 5 | 1 | 0 | 0 | 2 | 8 | | 0 | required for gLite 3.0. Groups can have different priorities, but VO control is not available | |
553 | VO and user priorities must be combined sensibly by the system to define an execution order for queued jobs | NA4 | 0 | 0 | 0 | 0 | 2 | 2 | 4 | | 0 | required for gLite 3.0 | |
580 | Encryption Key Server | NA4 | 0 | 0 | 0 | 0 | 0 | 4 | 4 | | 0 | Use cases: data can be highly sensitive and must be encrypted to control access | |
581 | Ability to retrieve an encryption key based on a file id | NA4 | 0 | 0 | 0 | 0 | 6 | 0 | 6 | 0 | |||
582 | Ability to do an M/N split of keys between servers to ensure that no single server provides sufficient information to decrypt files | NA4 | 0 | 0 | 0 | 0 | 3 | 0 | 3 | 0 | |||
583 | Access to these keys must be controlled by ACLs | NA4 | 0 | 0 | 0 | 0 | 6 | 0 | 6 | 0 | |||
590 | Software License Management | NA4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||
591 | Ability to obtain licenses for a given package from a given server | NA4 | 0 | 0 | 0 | 0 | 2 | 3 | 5 | | 0 | | |
592 | Access to the server must be controlled via ACLs based on grid certificates | NA4 | 0 | 0 | 0 | 0 | 2 | 3 | 5 | | 0 | | |
593 | The system should know about the availability of licenses and start jobs only when a license is available | NA4 | 0 | 0 | 0 | 0 | 1 | 2 | 3 | | 0 | | |
600 | Database Access | NA4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | Use cases: application data resides in relational and XML DBs; applications need access to this data based on grid credentials. Sites: a very often requested feature by non-HEP users! | |
601 | Basic access control based on grid credentials | NA4 | 0 | 5 | 0 | 0 | 3 | 5 | 13 | NA4; tracked in 2937 | 0 | NA4 to evaluate OGSA-DAI | |
602 | Fine-grained control at table, row and column level | NA4 | 0 | 0 | 0 | 0 | 3 | 5 | 8 | 0 | |||
603 | Replication mechanism for data bases | NA4 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | |||
604 | Mechanism to federate distributed servers (each server contains a subset of the complete data) | NA4 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | | 0 | Sites: a very often requested feature by non-HEP users (esp. biobanking) | |
701 | OutputData support in JDL | Savannah | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | From Savannah bug #22564 | |
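
Item 101 above asks every service to enforce access control based on the VOMS groups and roles carried in the user's proxy. The sketch below is purely illustrative and not part of the tracked middleware: it reads the proxy's FQANs by shelling out to voms-proxy-info (the exact `--fqan`/`--file` options are an assumption about the installed VOMS clients), splits each FQAN into its group and role, and checks them against a hypothetical per-queue ACL.

```python
# Illustrative sketch only, not part of the tracked middleware: check the VOMS
# groups/roles carried in a proxy against a per-resource ACL, as item 101 asks
# services to do. An FQAN looks like "/cms/production/Role=lcgadmin/Capability=NULL".
import subprocess


def proxy_fqans(proxy_file=None):
    """List the FQANs of a VOMS proxy; the voms-proxy-info options used here are assumed."""
    cmd = ["voms-proxy-info", "--fqan"]
    if proxy_file:
        cmd += ["--file", proxy_file]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def parse_fqan(fqan):
    """Split an FQAN into its group path and Role attribute (Capability is ignored)."""
    parts = [p for p in fqan.split("/") if p]
    group = "/" + "/".join(p for p in parts if "=" not in p)
    role = next((p.split("=", 1)[1] for p in parts if p.startswith("Role=")), "NULL")
    return group, role


def is_authorized(fqans, acl):
    """acl holds (group, role) pairs; role 'NULL' grants access to any member of the group."""
    for fqan in fqans:
        group, role = parse_fqan(fqan)
        if (group, role) in acl or (group, "NULL") in acl:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical ACL for one queue: any /cms/production member, or /atlas with Role=production.
    queue_acl = {("/cms/production", "NULL"), ("/atlas", "production")}
    print(is_authorized(proxy_fqans(), queue_acl))
```

A production service would additionally need hierarchical group matching and deny rules; the sketch only shows the FQAN bookkeeping.
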
Index | Issue/Requirement | Origin | ALICE | ATLAS | CMS | LHCb | Biomed | NA4 | Sum | Res | Estimated Cost | Comments | Status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
102 | VOMS supporting user metadata | FD | 3 | 0 | 0 | 2 | 0 | 0 | 5 | 0 | see list for details | Done | |
112 | Identical GLUE schema for gLite and LCG | FD | 0 | 5 | 0 | 0 | 0 | 0 | 5 | | In the first post-gLite 3.0 release of WMS and gLite CE | | Done |
121 | SRM used by all Storage Elements | FD | 5 | 5 | 5 | 4 | 0 | 0 | 19 | SA1 (being put in place) | 0 | SRM as specified in the Baseline Services Working Group Report | Done |
123 | Smooth migration from SRM v1 to v2; GFAL and FTS should hide the differences | FD | 1 | 5 | 5 | 4 | 0 | 0 | 15 | JRA1; tracked in 2930 | 0 | LHCb: 121, 122, 123 should be merged into one; 4 points are counted for this set | Done |
124 (was 124, 128) | Direct access to SRM interfaces; highly optimized SRM client tools | FD | 3 | 7 | 5 | 8 | 0 | 0 | 23 | SA3; tracked in 2931 | 0 | SRM client libraries | Done |
127 | Verification that operations have had the desired effect at fabric level | FD | 0 | 0 | 0 | 2 | 0 | 0 | 2 | | 0 | LHCb: 126, 127 to be merged. Sites: what does this mean? | Obsolete |
129 | Python binding for SRM client tools | FD | 0 | 5 | 0 | 1 | 0 | 0 | 6 | 0 | Obsolete | ||
211 | FTS clients on all WNs and VOBOXes | FD | 5 | 5 | 0 | 4 | 0 | 0 | 14 | SA3 | 0 | for ALICE and LHCb only in VOBOXes | Done |
212 | Retry until explicitly stopped | FD | 3 | 5 | 3 | 0 | 0 | 0 | 9 | JRA1 | 0 | will see gradual improvements in error handling; no specific action | Obsolete, issue resolved |
214 | Automatic file transfers between any two sites on the Grid | FD | 5 | 5 | 0 | 4 | 0 | 0 | 14 | JRA1 | 0 | not linked to a catalogue; file specified via SURL. LHCb: this should be handled by FTS | Done |
215 (was 215, 232) | Central entry point for all transfers; FPS should handle routing | FD | 1 | 5 | 0 | 8 | 0 | 0 | 14 | JRA1; tracked in 2933 | 0 | LHCb: 214, 215 to be merged | Obsolete |
216 | FTS should handle proxy renewal | FD | 5 | 1 | 3 | 3 | 0 | 0 | 12 | JRA1; tracked in 2933 | 0 | | Done |
217 | SRM interface integrated to allow specification of storage type, lifetime, pinning, etc. | FD | 2 | 2 | 2 | 1 | 0 | 0 | 7 | | 0 | LHCb: different types of storage should have different SEs; pinning is important here | Done |
219 | Support for VO specific plug-ins | FD | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | Done | ||
230 | File Placement Service | FD | - | - | - | - | - | - | 0 | - | - | ATLAS comment: is it not included now in the FTS specs? | FPS is covered by FTS plus plug-ins; obsolete |
231 | FPS plug-ins for VO specific agents | FD | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | obsolete | ||
233 | FPS should handle replication | FD | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | choosing the sources automatically | obsolete | |
234 | FPS should handle transfers to multiple destinations | FD | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | LHCb: 233,244 to be merged | obsolete | |
241 | LFC as global and local catalogue with a peak access rate of 100 Hz | FD | 5 | 5 | 5 | 4 | 0 | 0 | 19 | testing | 0 | Sites: it would be good to clarify the roles of the global and local catalogues; they seem to get out of sync. | ATLAS use case: done in their framework |
242 | Support for replica attributes: tape, pinned, disk, etc. | FD | 1 | 1 | 3 | 0 | 0 | 0 | 5 | 0 | Obsolete | ||
251 | Emphasis on read access | FD | 3 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | Done | ||
252 | Unauthenticated read-only instances | FD | 3 | 2 | 0 | 4 | 0 | 0 | 9 | 0 | Sites: will reject because it opens a DoS vector. If users insist, they get zero guarantee about downtime. | obsolete | |
253 | Bulk operations | FD | 3 | 2 | 0 | 4 | 0 | 0 | 9 | 0 | Sites: optional async bulk deletes from SRM server? | done | |
261 | lcg-utils available in production | FD | 5 | 0 | 0 | 4 | 0 | 0 | 9 | 0 | Done | ||
301 | Configuration that defines a set of primary RBs to be used by the VO for load balancing and allows defining alternative sets to be used in case the primary set is not available | FD | 5 | 0 | 0 | 2 | 0 | 0 | 7 | | UIs may be configured to use a number of RBs (chosen randomly); doing this in a smarter way will take ~1 FTE-month, but needs specifications | | obsolete |
304 | Handling of 10^6 jobs/day | FD | 3 | 5 | 10 | 0 | 0 | 0 | 18 | testing | bulk match-making: 6 FTE-months; upgrade to a new version of Condor (first post-gLite 3.0 release); use the CREAM CE (a prototype targeted at the gLite 3.0 infrastructure is available but not integrated and tested) | LHCb: this is a metric, not a task | obsolete |
305 | Using the information system in the matchmaking to send jobs to sites hosting the input files AND providing sufficient resources | FD | 0 | 2 | 0 | 0 | 0 | 0 | 2 | done | | | Done |
306 | Better input sandbox management (caching of sandboxes) | FD | 0 | 1 | 2 | 0 | 0 | 0 | 3 | | Sandbox as URL (gsiftp) in gLite 3.0; download from an HTTP server is already possible. Some work needed for uploads: 2 FTE-months | | Done |
309 | RB should reschedule jobs in the internal task queue | FD | 0 | 0 | 2 | 0 | 0 | 0 | 2 | Should be merged with 101 | obsolete | ||
323 | Scalable tool for VO-specific information (job status/errors/...) | FD | 0 | 0 | 2 | 0 | 0 | 0 | 2 | | 0 | Sites: is this not the dashboard? | CMS: open; moved to done thanks to the dashboards |
332 | Accounting by VO specified tag that identifies certain activities. These could be MC, Reconstruction, etc. | FD | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | |||
401 | Read-only mirrors of LFC service at several T1 centers updated every 30-60 minutes | FD | 0 | 0 | 0 | 3 | 0 | 0 | 3 | 0 | Sites: already provided by 3D project. | Done | |
403 | XROOTD at all sites | FD | 5 | 0 | 0 | 0 | 0 | 0 | 5 | closed | | done by LCG. Sites: not an EGEE requirement | ? |
404 | VOBOX at all sites | FD | 5 | 2 | 0 | 2 | 0 | 0 | 9 | VOBox WS | 0 | requested by ALICE, ATLAS, CMS; LHCb requested T1s and some T2s. ATLAS comment: awaiting conclusions of the NIKHEF workshop. Sites: not an EGEE requirement | done |
406 | dedicated queues for short jobs | FD | 1 | 5 | 1 | 0 | 0 | 0 | 7 | SDJ WG | | should be merged with 511 | working SDJs have been demonstrated; sites that want to support this are able to do so |
407 | Standardized CPU time limits | FD | 3 | 5 | 0 | 3 | 0 | 0 | 11 | JRA1; tracked in 2942 | 0 | Sites: more important are standards for publishing the information; for example, if one publishes a CPU time limit, this should be actual time and not some scaled time. | obsolete |
408 | Tool to manage VO-specific, site-dependent environments | FD | 0 | 0 | 1 | 0 | 0 | 0 | 1 | | 0 | Sites: this could mean almost anything; what does it actually mean? | CMS: obsolete |
409 | Rearranging priorities of jobs in the local queue | FD | 0 | 2 | 0 | 0 | 0 | 0 | 2 | | should be merged with 101 | ATLAS: requirement for a priority system including local queues at the sites, able to rearrange the priority of jobs already queued at each site in order to take care of newly submitted high-priority jobs. Such a system requires some deployment effort but essentially no development, since this feature is already provided by most batch systems and is a local implementation, not a Grid one. | CMS: obsolete. Sites: decided to move to long term (Witzig recommendations); moved to 'done/obsolete' due to the widespread use of pilot jobs |
410 | Package management | VOB | - | - | - | - | - | - | 0 | - | - | according to the requirements document (need to link!); simple, bare-bones implementation initially. Sites: also an old GAG requirement | obsolete |
521a | Use a batch system that can handle the "CPU count problem" | NA4 | 0 | 0 | 0 | 0 | 3 | 5 | 8 | | 0 | required for gLite 3.0. This problem arises because of a scheduling mismatch in the versions of Maui/Torque used by default; the end result is that typically an MPI job can only use half of the CPUs available at a site, yet the broker will happily schedule jobs that require more at the site. These jobs will never run. | done |
522 | Publication of whether the home directories are shared (alternatively, transparently move sandboxes to all allocated nodes) | NA4 | 0 | 0 | 0 | 0 | 3 | 0 | 3 | JRA1; tracked in 2939 | 0 | required for gLite 3.0 | done |
523 | Ability to run code before/after the job wrapper invokes "mpirun" | NA4 | 0 | 0 | 0 | 0 | 2 | 5 | 7 | JRA1; tracked in 2940 | A job prologue executed before the job will be available in the first post-gLite 3.0 release of WMS; an epilogue is also possible, but the developers need to know the required semantics | required after gLite 3.0. This will allow compilation and setup of the job by the user | done |
560 | Job Dependencies | NA4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | Use cases: applications often require workflows with dependencies | done via DAGs |
561 | Ability to specify arbitrary (non-circular) dependencies between jobs inside a set of jobs | NA4 | 0 | 0 | 0 | 0 | 2 | 2 | 4 | | Done via DAGs | required after gLite 3.0; a minimal dependency-ordering sketch follows this table | |
562 | Ability to query the state and control such jobs as a unit | NA4 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | | 0 | required after gLite 3.0 | done via DAGs |
563 | Ability to query and control the sub-jobs | NA4 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | | 0 | required after gLite 3.0 | done via DAGs |
570 | Metadata Catalogue | NA4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | Use cases: identify datasets based on metadata information | done via AMGA |
571 | Ability to add metadata according to user defined schema | NA4 | 0 | 0 | 0 | 0 | 6 | 5 | 11 | SA3 | 0 | done via AMGA | |
573 | Ability to distribute metadata over a set of servers | NA4 | 0 | 0 | 0 | 0 | 3 | 0 | 3 | 0 | done via AMGA |
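
Items 560-563 were closed via WMS DAG jobs. As a purely illustrative aside (the job names are made up and this is not JDL syntax), the sketch below shows the property those items rely on: any set of jobs with non-circular dependencies can be put into a valid execution order, and a cycle can be detected and rejected up front.

```python
# Illustrative sketch only: put a set of jobs with arbitrary (non-circular)
# dependencies into a valid run order and reject cycles, as items 560-563 require.
# This mirrors what the WMS does when it expands a DAG job; it is not JDL syntax.
from collections import deque


def topological_order(dependencies):
    """dependencies maps each job to the set of jobs it must wait for."""
    jobs = set(dependencies) | {d for deps in dependencies.values() for d in deps}
    indegree = {job: 0 for job in jobs}
    dependents = {job: [] for job in jobs}
    for job, deps in dependencies.items():
        indegree[job] = len(deps)
        for dep in deps:
            dependents[dep].append(job)

    ready = deque(sorted(job for job, missing in indegree.items() if missing == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for waiter in dependents[job]:
            indegree[waiter] -= 1
            if indegree[waiter] == 0:
                ready.append(waiter)

    if len(order) != len(jobs):
        raise ValueError("circular dependency detected")  # item 561 forbids cycles
    return order


if __name__ == "__main__":
    # Hypothetical four-job workflow: two reco jobs follow a simulation, a merge follows both.
    deps = {"reco_a": {"simulation"}, "reco_b": {"simulation"}, "merge": {"reco_a", "reco_b"}}
    print(topological_order(deps))  # ['simulation', 'reco_a', 'reco_b', 'merge']
```
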