Storage Management TEG: Questionnaire Level 1 - Paul Millar
This twiki is to collect the input of Paul Millar. Please answer the questions below. For more information, please refer to the Storage TEG main twiki.
Question 1
- In your view, what are the 3 main current issues in Storage Management (SM)?
My answer:
- The hiatus in cross-experiment, cross-technology storage-management development.
There seems to be a feeling that, with the development of SRM v2.2, the work was done: the only remaining tasks were for storage technologies to implement the standard and for experiments to adopt it.
With the benefit of hindsight, this was certainly naive. Initial use-case capture is always incomplete and inaccurate, and technology is constantly changing, so there is no reason for such activity to stop.
This has led experiments to move into new areas on their own initiative, investigating and creating ad hoc solutions.
- There is a mismatch between what the SRM provides and what the experiments need. This applies both to the specification itself (it contains functionality the experiments do not use and lacks functionality they need) and to the SRM implementations from the storage providers.
The main issue here is a lack of engagement and communication between the experiments and those developing and implementing the software.
The SRM should not be thrown away; rather, we need to develop better communication. This would allow more agile development that exploits common requirements. Implementing these common requirements in core components (such as the storage software) would take work away from end-users.
- Deciding which actor is responsible for certain functionality.
There is an increasing tendency for end-users (experiments) to micro-manage storage systems, requiring control over aspects that should be delegated to the sys-admin or, in some cases, delegated by the sys-admin to the storage system itself.
What the end-users control, the end-users can break. Problems range from simply "doing the wrong thing" to not reacting fast enough to changes in user activity profiles. The result can be poor performance or unnecessary outages.
There should be a dialogue between the actors to establish who is responsible for which parts of the overall system, including whether end-users provide "hints" that the storage system may act on, or explicit commands.
Question 2
- What is the greatest future challenge which would greatly impact the SM sector?
My answer:
- There is really only one currently: the movement away from Monarc. This has resulted in:
- The move away from active use of tape towards more ad hoc network usage.
- The move towards a federated storage solution.
The challenges are:
- to avoid ad hoc solutions and instead deliver solutions that are applicable to multiple experiments;
- to avoid being tied to particular software;
- to adopt standards as much as possible, reducing the workload and so delivering what people need faster;
- to develop more sophisticated networking models within storage implementations, and better interaction between storage and networking, so that a storage system can deliver a file with the minimal number of hops and can allocate and configure network paths (cf. the lambda project).
- In the future, the current issues surrounding disk vs tape will likely resurface as SSD vs magnetic hard disk. The same questions will likely reappear: how do experiments optimise their use of storage systems so that their analysis data is (or is more likely to be) on SSD rather than magnetic hard disk? Do experiments provide "hints" about which files are likely to be read (== prestaging), or explicit control for the duration of the analysis (== pins), etc.?
Question 3
- What is your site/experiment/middleware currently working on in SM?
My answer:
- Scalable SRM: providing multiple SRM front-ends that clients can be load-balanced across, removing a single point of failure.
- Adopting standards and driving their adoption elsewhere: NFS v4.1/pNFS and WebDAV.
- Investigating cloud storage management APIs: Amazon S3 vs CDMI vs other proprietary standards. What benefits do they bring?
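The scalable-SRM item rests on having several equivalent SRM front-ends so that no single one is a point of failure. As a rough, client-side illustration only (not a description of how any particular SRM implementation distributes load, which may equally be done server-side or via DNS), the sketch below rotates requests across a list of front-ends and fails over when one is unreachable. The endpoint names, `FrontEndPool` and the `send` callable are all hypothetical.

```python
import itertools
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when no front-end could serve the request."""


class FrontEndPool:
    """Spread requests over equivalent front-ends, failing over on error.

    Hypothetical sketch: endpoint names and the `send` callable are
    placeholders, not part of any real SRM client library.
    """

    def __init__(self, endpoints: Sequence[str]):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._endpoints = list(endpoints)
        # Endless 0, 1, ..., n-1, 0, 1, ... sequence of starting positions.
        self._starts = itertools.cycle(range(len(self._endpoints)))

    def call(self, send: Callable[[str], T]) -> T:
        n = len(self._endpoints)
        start = next(self._starts)  # rotate so load is shared
        errors: List[Tuple[str, Exception]] = []
        for offset in range(n):
            endpoint = self._endpoints[(start + offset) % n]
            try:
                return send(endpoint)
            except ConnectionError as err:
                # This front-end is unreachable: try the next one.
                errors.append((endpoint, err))
        raise AllEndpointsFailed(errors)


# Usage with a fake transport in which one front-end is down.
pool = FrontEndPool(["srm1.example.org", "srm2.example.org", "srm3.example.org"])


def fake_send(endpoint: str) -> str:
    if endpoint == "srm2.example.org":
        raise ConnectionError("front-end unavailable")
    return endpoint


print([pool.call(fake_send) for _ in range(4)])
# ['srm1.example.org', 'srm3.example.org', 'srm3.example.org', 'srm1.example.org']
```

Rotating the starting point shares the load; trying the remaining front-ends in turn is what removes the single point of failure. After a failure the next healthy front-end absorbs extra requests, so a real deployment would probably add health checks.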
Question 4
- What are the big developments that you would like to see from your site/experiment/storage system in the next 5 years?
My answer:
- Establishing a group that takes a long-term interest in storage management. Such a group would:
- Take a long, hard look at what functionality is really needed; for example, what are people really using in SRM, what is missing.
- Analyse the experiments' data workflows. There may be operations that are common to two or more experiments; such operations would be candidates for moving into the storage system.
- Are there activities that are difficult for the experiments to do in a robust and efficient fashion? These are further candidates for moving functionality into the storage system.
Chained operations are an example of such difficult operations: for example, obtaining, for some set of files, a list of TURLs for the data that is currently available online. This requires first discovering which data is online, then obtaining TURLs for those files. Rather than doing this within the experiment's framework, the storage systems could be extended to provide this functionality.
- Switch to using WebDAV for basic namespace operations. Provide any functionality missing from WebDAV by either:
- extending WebDAV (new HTTP methods and/or custom properties), or
- providing a RESTful API.
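To make the chained operation concrete, here is a minimal Python sketch against a hypothetical in-memory storage system. Only the locality values are taken from SRM v2.2 (its TFileLocality type); `MockStorage`, its methods, `client_side_chain` and the door URL are invented for illustration. It contrasts the two requests an experiment framework has to make today with a single combined call that a storage system could offer instead.

```python
from enum import Enum
from typing import Dict, List


class Locality(Enum):
    # Three of the file-locality values defined by SRM v2.2 (TFileLocality).
    ONLINE = "ONLINE"
    NEARLINE = "NEARLINE"
    ONLINE_AND_NEARLINE = "ONLINE_AND_NEARLINE"


def is_online(loc: Locality) -> bool:
    """A file is readable without staging if it has a disk copy."""
    return loc in (Locality.ONLINE, Locality.ONLINE_AND_NEARLINE)


class MockStorage:
    """Hypothetical in-memory storage system standing in for SRM calls."""

    def __init__(self, files: Dict[str, Locality], door: str):
        self._files = files  # file path (standing in for a SURL) -> locality
        self._door = door    # hypothetical transfer endpoint prefix

    def locality(self, surls: List[str]) -> Dict[str, Locality]:
        # Chain step 1: report where each file currently is.
        return {s: self._files[s] for s in surls}

    def get_turls(self, surls: List[str]) -> Dict[str, str]:
        # Chain step 2: hand out a transfer URL (TURL) for each file.
        return {s: self._door + s for s in surls}

    def online_turls(self, surls: List[str]) -> Dict[str, str]:
        # Hypothetical combined operation: the storage system filters and
        # issues TURLs in what the client sees as a single request.
        return self.get_turls([s for s in surls if is_online(self._files[s])])


def client_side_chain(storage: MockStorage, surls: List[str]) -> Dict[str, str]:
    """What the experiment framework must do without server support.

    Two separate requests; against a real system, a file could be
    migrated off disk between the first and the second.
    """
    localities = storage.locality(surls)
    online = [s for s, loc in localities.items() if is_online(loc)]
    return storage.get_turls(online)


storage = MockStorage(
    {"/data/a": Locality.ONLINE,
     "/data/b": Locality.NEARLINE,
     "/data/c": Locality.ONLINE_AND_NEARLINE},
    door="https://door.example.org",
)
files = ["/data/a", "/data/b", "/data/c"]
print(client_side_chain(storage, files))
# {'/data/a': 'https://door.example.org/data/a', '/data/c': 'https://door.example.org/data/c'}
```

Both paths give the same answer here; the point of moving the logic into the storage system is that it can see locality and issue TURLs consistently, without the extra round trip and the race window on the client side.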
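The WebDAV proposal could look roughly like the following: basic namespace operations map onto standard WebDAV methods (RFC 4918), and storage-specific information is exposed as a custom property. This is a sketch under assumptions: the `locality` property and its namespace `http://example.org/ns/storage` are hypothetical, and only request-body construction and response parsing are shown, with no network I/O.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

DAV = "DAV:"
# Hypothetical namespace for a storage-specific custom property.
STORAGE_NS = "http://example.org/ns/storage"
ET.register_namespace("d", DAV)
ET.register_namespace("s", STORAGE_NS)

# Basic namespace operations and the WebDAV (RFC 4918) method for each.
NAMESPACE_METHODS = {
    "list": "PROPFIND",   # sent with a "Depth: 1" header
    "stat": "PROPFIND",   # sent with a "Depth: 0" header
    "mkdir": "MKCOL",
    "rename": "MOVE",     # target given in the Destination header
    "copy": "COPY",
    "delete": "DELETE",
}


def propfind_body() -> bytes:
    """PROPFIND body asking for size, type and the custom locality property."""
    root = ET.Element(f"{{{DAV}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV}}}prop")
    ET.SubElement(prop, f"{{{DAV}}}getcontentlength")
    ET.SubElement(prop, f"{{{DAV}}}resourcetype")
    ET.SubElement(prop, f"{{{STORAGE_NS}}}locality")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def parse_multistatus(xml_text: str) -> Dict[str, Dict[str, Any]]:
    """Turn a 207 Multi-Status reply into {href: {property: value}}."""
    ns = {"d": DAV, "s": STORAGE_NS}
    result: Dict[str, Dict[str, Any]] = {}
    for response in ET.fromstring(xml_text).findall("d:response", ns):
        href = response.findtext("d:href", namespaces=ns)
        entry: Dict[str, Any] = {}
        for propstat in response.findall("d:propstat", ns):
            status = propstat.findtext("d:status", namespaces=ns) or ""
            if " 200 " not in status:
                continue  # skip properties the server could not return
            prop = propstat.find("d:prop", ns)
            size = prop.findtext("d:getcontentlength", namespaces=ns)
            if size is not None:
                entry["size"] = int(size)
            rtype = prop.find("d:resourcetype", ns)
            if rtype is not None:
                entry["collection"] = rtype.find("d:collection", ns) is not None
            locality = prop.findtext("s:locality", namespaces=ns)
            if locality is not None:
                entry["locality"] = locality
        result[href] = entry
    return result


SAMPLE = """<d:multistatus xmlns:d="DAV:" xmlns:s="http://example.org/ns/storage">
  <d:response>
    <d:href>/data/</d:href>
    <d:propstat>
      <d:prop><d:resourcetype><d:collection/></d:resourcetype></d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/data/run1.root</d:href>
    <d:propstat>
      <d:prop>
        <d:getcontentlength>1048576</d:getcontentlength>
        <d:resourcetype/>
        <s:locality>NEARLINE</s:locality>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>"""

print(parse_multistatus(SAMPLE))
# {'/data/': {'collection': True}, '/data/run1.root': {'size': 1048576, 'collection': False, 'locality': 'NEARLINE'}}
```

A directory listing would then be a PROPFIND with `Depth: 1` carrying this body. A property of this kind is one way the locality information needed for the chained operations described above could be exposed without a new protocol.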
Question 5
- In your experience and area of competence, what are the (up to) 3 main successes in SM so far?
My answer:
- We have a grid infrastructure that basically works and that allows scientific work.
- The SRM. Despite its (many) faults, it is an almost universally available storage protocol (EOS being the exception).
Question 6
- In your experience and area of competence, what are the (up to) 3 main failures or things you would like to see changed in SM so far?
My answer:
I have only one core failure: the halt of storage-management activities within WLCG. This has led to stagnation and knock-on problems.
That's it!
Thanks! Feel free to edit again at any time, until the date of the kick-off meeting.
--
DanieleBonacorsi - November 2011