Data Management and Storage Management TEG joint LHC questions list
Draft list:
- We assert that we have a working system and do not need a new one, but that the cost and complexity of the current system exceed what is needed. Would the experiment agree with this statement?
- Which elements are being used, and how? Which have been added, and why?
- Which components failed to deliver the (larger) functionality that was needed? Does that mean we can deprecate and retire these components, or was the functionality not truly needed?
- If the middleware were simplified and functionality removed, would the experiment have the resources to adapt to the change?
- What is the experiment's interest in revising data placement models? What kinds of revisions have already been made?
- What is the experiment's interest in data federations?
- What sort of assumptions does the experiment need to make for data federations to work?
- Could you work directly with clustered file systems at smaller sites?
- Could you work directly with cloud file systems (i.e., GET/PUT)? Assume "cloud file systems" implies REST-like APIs, transfer via HTTP, no random access within files, no third-party transfer, and possibly no file/directory structure. See Amazon S3 for inspiration. (A sketch of this access style follows this list.)
- How thoroughly does the experiment use space management today?
- Does your framework support HTTP-based access?
- Is the archive / disk split an agreed-upon strategy in your experiment? Can HSM be dropped?
- What role do you see for Data Federation versus Data Push? Is federation useful at smaller sites?
- For smaller sites, would caching work for your experiment?
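
To make the cloud file system question above concrete, here is a minimal sketch of the GET/PUT access style it describes: whole-object transfers over plain HTTP, a flat bucket/key namespace, and no random access or third-party transfer. The host and bucket names are hypothetical, only the Python standard library is used, and a real S3-like service would additionally require signed/authenticated requests.

    import http.client

    HOST = "objectstore.example.org"  # hypothetical endpoint

    def put_object(bucket, key, data):
        # Upload the whole object in a single request; there are no
        # partial writes or in-place updates in this model.
        conn = http.client.HTTPConnection(HOST)
        conn.request("PUT", "/%s/%s" % (bucket, key), body=data)
        resp = conn.getresponse()
        resp.read()
        conn.close()
        return resp.status

    def get_object(bucket, key):
        # Download the whole object; the bucket/key pair stands in for
        # a file/directory hierarchy.
        conn = http.client.HTTPConnection(HOST)
        conn.request("GET", "/%s/%s" % (bucket, key))
        resp = conn.getresponse()
        data = resp.read()
        conn.close()
        return resp.status, data

Any framework that can express its I/O as "fetch whole file, process, store whole file" maps onto this directly; anything relying on partial reads inside large files does not.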
Additions, based on conversations with the Storage TEG:
- Security / VOMS: what are the experiments' expectations and needs?
- Where do the experiments see their namespace management "philosophy" evolving in the future?
- What volume of data do the experiments plan to move / store (and what do they currently move / store)?
- What kind of file access performance are the experiments expecting (and do they expect it to change)? What WAN data transfer rates? (A back-of-envelope illustration follows this list.)
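
To give a sense of scale for the WAN question (illustrative numbers only, not a statement of any experiment's plans): sustaining a transfer of 1 PB per week corresponds to roughly 8e15 bits / 6.05e5 s ≈ 13 Gb/s of continuous WAN bandwidth, before any allowance for protocol overhead, retries, or peak-to-average ratios.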
Additions, based on conversations with the MB:
Data management
- Can we get rid of SRM in front of our storage services? (A question also for the Storage group.)
- Why do we need GridFTP? Why can't we use HTTP like the rest of the world? (NB: "because it can't do xx" is not a good answer; we are good at developing something from scratch and bad at adding functionality to open-source software. See the sketch after this list.)
- Can we agree on a layered data management and storage architecture? This would help factorise the various issues and allow piecemeal replacement of layers. To be defined together with the Storage group.
- Is the HSM model needed? (See also Dirk's GDB slides.) Suggested answer: no?
- How will we manage a global namespace, and whose problem is it: the experiments'? Is there a continued need for the LFC?
- Access control - what is really needed? The simpler the better!
- As a general comment, we would like a clearer division of responsibilities between the experiments and the infrastructure(s): if the experiments want to manage most of the complexity themselves, then we should simplify the services that the infrastructure provides.
- Posed as a question: Where should the complexity and intelligence lie - the experiments or the infrastructure? How do you view the balance now, and how would you like to see this change (if at all)?
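
As background to the GridFTP/HTTP question above, a minimal sketch (Python standard library, hypothetical URL) of random access within a remote file over plain HTTP, using the standard Range request header; a server that supports byte ranges answers 206 Partial Content with only the requested bytes:

    import urllib.request

    # Hypothetical URL; any HTTP server that honours byte ranges will do.
    url = "http://storage.example.org/data/run123/events.root"

    # Request bytes 1024-2047 only, via the standard HTTP Range header.
    req = urllib.request.Request(url, headers={"Range": "bytes=1024-2047"})
    with urllib.request.urlopen(req) as resp:
        chunk = resp.read()
        # Status 206 (Partial Content) confirms the range was honoured.
        print(resp.status, len(chunk))

This addresses the "HTTP can't do random access" objection; features such as third-party copy would still need to be evaluated separately.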