Data Management and Storage Management TEG joint LHC questions list

Draft list:

  • We assert that we have a working system and do not need a new one, but that the cost and complexity of the current system exceed what is needed. Would the experiment agree with this statement?
  • What elements are being used, and how? Which have been added and why?
  • Which components failed to deliver the (larger) functionality needed? Does that mean we can deprecate and obsolete these components, or was the functionality not truly needed?
  • If the middleware dropped complexity and removed functionality, does the experiment have the resources to adapt to the change?
  • What is the experiment's interest in revising data placement models? What kinds of revisions have gone through?
    • What is the experiment's interest in data federations?
  • What sort of assumptions does the experiment need to make for data federations to work?
  • Could you work directly with clustered file systems at smaller sites?
  • Could you work directly with cloud file systems (i.e., GET/PUT)? Assume "cloud file systems" implies REST-like APIs, transfer via HTTP, no random access within files, no third-party transfer, and possibly no file/directory structure. See Amazon S3 for inspiration.
  • How thoroughly does the experiment use space management today?
  • Does your framework support HTTP-based access?
  • Is the archive / disk split an agreed-upon strategy in your experiment? Can HSM be dropped?
  • What role do you see for Data Federation versus Data Push? Is this useful at smaller sites?
  • For smaller sites, would caching work for your experiment?
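
To make the "cloud file systems" assumptions in the question above concrete, here is a minimal Python sketch of what a client sees under that model. Everything in it is hypothetical and for illustration only: `FlatObjectStore`, `read_range`, and `copy_between` are invented names, and the store is an in-memory stand-in rather than any real service or API. It shows whole-object GET/PUT on a flat key space, with partial reads and store-to-store copies both routed through the client.

```python
class FlatObjectStore:
    """Hypothetical in-memory stand-in for a GET/PUT-only object store."""

    def __init__(self):
        # Flat mapping of key -> bytes. A "/" in a key is just a character;
        # there is no real file/directory structure.
        self._objects = {}

    def put(self, key, data):
        # Whole-object write: an existing object is replaced, never patched.
        self._objects[key] = bytes(data)

    def get(self, key):
        # Whole-object read: no byte-range (random) access is offered.
        return self._objects[key]

    def list_keys(self, prefix=""):
        # Listing by key prefix is the closest thing to a directory listing.
        return sorted(k for k in self._objects if k.startswith(prefix))


def read_range(store, key, offset, length):
    # Without random access, the client fetches the whole object
    # and slices it locally.
    return store.get(key)[offset:offset + length]


def copy_between(src, dst, key):
    # Without third-party transfer, the data flows through the client:
    # GET from the source, then PUT to the destination.
    dst.put(key, src.get(key))
```

Under these assumptions, a framework that reads a few bytes from the middle of a file pays for the whole object, which is the kind of cost the question asks the experiment to consider.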

Additions, based on conversations with Storage TEG:

  • Security / VOMS - what are the experiments' expectations / needs?
  • Where do the experiments see their namespace management "philosophy" evolving in the future?
  • What volume of data do the experiments plan to move / store (and what do they currently move / store)?
  • What kind of file access performance are the experiments expecting (are any changes expected?), and what WAN data transfer rates?

Additions, based on conversations with the MB: Data management

  • Can we get rid of SRM in front of our storage services? (question also for Storage group)
  • Why do we need GridFTP? Why can't we use HTTP like the rest of the world? (NB: "because it can't do xx" is not a good answer - we are good at developing something from scratch and bad at adding functionality to open source software.)
  • Can we agree on a layered data management and storage architecture? This will help factorise various issues and help piecemeal replacement of layers. To be defined together with the Storage group.
  • Is the HSM model needed (etc.; see Dirk's GDB slides)? NO?
  • How will we manage a global namespace - whose problem is it? The experiments'? Is there a continued need for LFC?
  • Access control - what is really needed? The simpler the better!
  • As a general comment, we would like a clearer division of responsibilities between experiments and infrastructure(s). I.e., if the experiments want to manage most of the complexity, then we should simplify the services that the infrastructure provides.
    • Posed as a question: Where should the complexity and intelligence lie - the experiments or the infrastructure? How do you view the balance now, and how would you like to see this change (if at all)?
Topic revision: r1 - 2012-01-12 - BrianBockelman