Overarching questions

• It has been asserted that we have a working system and do not need a new system, but the current system (cost and complexity) exceeds what is needed. Would the experiment agree with this statement?

Yes, it works and we have seen that with early LHC data. It has been somewhat resource intensive to use and maintain, and with the LHC shutdown now is a good time to reassess, particularly as we now know better what we need. Obviously this working system should not be allowed to break in the effort to design a new one. ATLAS would like to see most effort focused on standardisation, simplification and robustness rather than fancy new features.

• What elements are being used, and how? Which have been added and why?

SRM, FTS, GFAL (as used by lcg-utils and the deletion service), lcg-utils and LFC. The LFC is being consolidated into a single LFC at CERN, and will probably be fully integrated into ATLAS data management in the coming years (under "Rucio").

Most things which have been added are in the experiment layer and address issues of consistency, data lifecycle management, data discovery and aggregation (into "datasets").

• Which components failed to deliver (larger) functionality needed? Does that mean we can deprecate and obsolete these components, or was it not truly needed?

SRM promised much, but the functionality we use is pretty much restricted to list, read and write (it doesn't even do overwrite properly!). It does not protect the storage properly, introduces delays (async calls, particularly painful for small files) and could easily be replaced by simpler and more standard components. However, a tape recall interface is required by ATLAS. Space tokens are currently used by ATLAS, but this is very crude space partitioning and could be dropped relatively easily (we have much more useful internal accounting between activities). Information about storage, such as free space, would also need to be provided if SRM were made obsolete.

FTS is working, but its channel architecture is clearly outdated, with a far more connected transfer topology being used by ATLAS. ATLAS maintains its own very large and deep transfer queue on top of FTS because the point-to-point architecture of FTS does not allow source switching (you could almost say we do just-in-time channel selection on top of FTS, plus maintain our own priority queuing system). We will use FTS for 2012; whether a new FTS is used depends on the functionality delivered. For transfers we could use third-party WebDAV, xrdcp or whatever, but a transfer service on top of that is needed, either from the experiment or from FTS. A set of requirements from ATLAS for such a service could be provided. FTS is also currently being used to protect the SE from its propensity to commit suicide under load. This functionality could be transferred to the storage system.
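The idea of a priority queue with just-in-time source selection, as described above, can be sketched roughly as follows. This is only an illustration and not the ATLAS implementation: the class name, the site names and the "least-loaded channel" rule are all assumptions made for the example.

```python
import heapq
import itertools


class TransferQueue:
    """Hypothetical experiment-side queue; the source is chosen only at dispatch time."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so that requests with equal priority leave in submission order.
        self._counter = itertools.count()

    def submit(self, filename, destination, sources, priority):
        # Lower number means more urgent (an assumption of this sketch).
        entry = (priority, next(self._counter), filename, destination, list(sources))
        heapq.heappush(self._heap, entry)

    def dispatch(self, channel_load):
        """Pop the most urgent request and pick the least-loaded source for it.

        channel_load maps (source, destination) to the number of active transfers.
        Returns (filename, source, destination), or None if the queue is empty.
        """
        if not self._heap:
            return None
        _, _, filename, destination, sources = heapq.heappop(self._heap)
        source = min(sources, key=lambda s: channel_load.get((s, destination), 0))
        return filename, source, destination


queue = TransferQueue()
queue.submit("file_a", "SITE_X", ["T1_A", "T1_B"], priority=2)
queue.submit("file_b", "SITE_X", ["T1_A"], priority=1)
load = {("T1_A", "SITE_X"): 10, ("T1_B", "SITE_X"): 1}
print(queue.dispatch(load))  # the more urgent file_b goes first
print(queue.dispatch(load))  # file_a is sent from the less loaded source
```

Because the source is picked only when a request leaves the queue, a request is never tied to a congested channel at submission time, which is the property a fixed point-to-point channel cannot offer.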

• If the middleware dropped complexity and removed functionality, does the experiment have the resources to adapt to the change?

Broadly, yes. We require only very basic functionality and improvements to reliability and robustness will help us in the longer term.

• Where should the complexity and intelligence lie - the experiments or the infrastructure? How do you view the balance now, and how would you like to see this change (if at all)?

It could be said that complexity is not a desirable feature of a data management system (and intelligence is not necessarily a good design goal either). We feel the main thing for middleware to concentrate on is transferring and storing files robustly: it should work well and reliably almost all of the time. If it hits an error, automatic recovery would be desirable (e.g., self-healing storage via the federation). If it is overloaded and cannot serve a request it should say so clearly, and its clients should understand and respect this. There are concepts which we see as belonging to the experiment layer, including datasets and metadata. For metadata we have seen that our systems can scale to very high query rates; we are not sure this would be the case for (more general) infrastructure-provided solutions.
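A minimal sketch of the "say so clearly, and clients respect it" behaviour described above. The StorageBusy exception and its retry_after field are hypothetical names invented for this example; they do not correspond to any existing storage or middleware API.

```python
import time


class StorageBusy(Exception):
    """Hypothetical signal from an overloaded storage system."""

    def __init__(self, retry_after):
        super().__init__(f"storage busy, retry after {retry_after}s")
        self.retry_after = retry_after  # seconds the server asks the client to wait


def fetch_with_backoff(fetch, max_attempts=5, sleep=time.sleep):
    """Call fetch(); when the storage reports overload, wait as requested and retry.

    After max_attempts busy responses the last StorageBusy is re-raised, so the
    caller sees a clear failure instead of a silent hang.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except StorageBusy as busy:
            if attempt == max_attempts:
                raise
            sleep(busy.retry_after)
```

The sleep function is passed in as a parameter only so that the waiting can be replaced in tests; a real client would use the default.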

State of play: resource use

• What volume of data do experiments plan to move/store (up to the 2017 timescale) (and what do they currently move/store).

[Numbers have been requested and will be added here soon]

Probably this will not go up considerably in the future. ATLAS already ships quite a lot of data around compared to other experiments, so there may be changes in workload that could reduce that and balance any increases in data or analysis activity.

• What kind of file access performance are they expecting (any changes expected in next 5 years?) - what WAN data transfer rates?

Reconstruction is currently CPU limited. This may improve by a factor two or more in the coming years due to code improvements (though working against this is the effect of pileup). Even so, it would probably not reach storage system limits.

For analysis, however, we currently request that a site provide at least 5 MB/s of bandwidth per core, though pure ROOT-based jobs could achieve much more than this. Many of the limitations are probably in the applications themselves (and work is ongoing to improve that through the ROOT IO forum and others). Certain improvements, however, particularly in the case of sparse reading of events or branches, require some features of the underlying storage, such as a decent direct access protocol, vector reading etc. The protocol also needs to be robust, of course, to avoid job failures when reading directly.
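To give a feel for what the per-core request means at site level, here is a small back-of-the-envelope calculation. The figure of 1000 cores is a made-up example, and the conversion assumes decimal units (1 MB = 10^6 bytes).

```python
MB_PER_S_PER_CORE = 5  # the per-core figure quoted above


def site_bandwidth(cores, mb_per_s_per_core=MB_PER_S_PER_CORE):
    """Return (MB/s, Gbit/s) needed to feed all analysis cores at the same time."""
    total_mb_per_s = cores * mb_per_s_per_core
    total_gbit_per_s = total_mb_per_s * 8 / 1000  # 8 bits per byte, 1000 Mbit per Gbit
    return total_mb_per_s, total_gbit_per_s


print(site_bandwidth(1000))  # (5000, 40.0)
```

So a hypothetical site running 1000 analysis cores at the requested rate would need about 5 GB/s, or 40 Gbit/s, out of its storage.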

For WAN transfer rates: ATLAS requires "T2Ds" (i.e. Tier2s connected to Tier1s outside their "Cloud") to have 5 MB/s to and from all but 2 of the ATLAS T1s, and 5 MB/s to other T2Ds. This requirement was 10 MB/s but has been relaxed a bit recently.

File access: SRM

• Is the archive / disk split an agreed-upon strategy in your experiment? Can HSM be dropped?

Yes, for ATLAS tape and disk are separated and there is only ever explicit, managed recall from tape. HSM is still useful because you want to process data within the HSM system, rather than having to manage a cache of data which is explicitly staged to an 'external' disk area.

• Can we get rid of SRM in front of our storage services? We would need an agreed tape recall interface. The current system provides the same interface whether the backend is dCache, TSM/StoRM or CASTOR. Robust information (for example on space usage) would also need to be provided.

• Can we get rid of SRM for disk-only storage? Yes, and xrootd and http(s) are viable alternatives. But, as above, managing space usage would need to be provided by some interface. The current system (DQ2) embeds SRM and ATLAS transfers use FTS, so moving in the short term is quite difficult. But a replacement system (Rucio), expected in ~13 months' time, will be able to work with or without SRM.

• How thoroughly does the experiment use space management today?

We use SRM space tokens, but this is mainly for partitioning we can now manage ourselves. All space tokens map to a base path in the storage system, so this runs alongside a namespace separation.

File access: Efficient Data Placement

• What is the experiment's interest in revising data placement models? What kinds of revisions have gone through?

Data placement is for data access. We now have a mixture of pre-placed data (because we know from experience it will be popular) and dynamic data placement based on popularity. We will almost certainly keep this model, but continue to refine it.
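As a rough illustration of popularity-driven placement alongside pre-placed data, the selection step could look something like this. The function name, the access-log format and the simple threshold rule are all assumptions for the sketch; the actual ATLAS popularity algorithm is not described here.

```python
from collections import Counter


def choose_datasets_to_replicate(access_log, pre_placed, threshold):
    """Pick datasets that deserve an extra replica.

    access_log: iterable of dataset names, one entry per recorded access.
    pre_placed: set of dataset names that are already distributed in advance.
    threshold:  minimum number of accesses before a dataset counts as popular.
    """
    counts = Counter(access_log)
    return sorted(
        name
        for name, accesses in counts.items()
        if accesses >= threshold and name not in pre_placed
    )


log = ["ds1", "ds2", "ds1", "ds3", "ds1", "ds2", "ds3", "ds3"]
print(choose_datasets_to_replicate(log, pre_placed={"ds3"}, threshold=3))  # ['ds1']
```

Here ds3 is popular but already pre-placed, and ds2 has not reached the threshold, so only ds1 is selected.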

• What is the experiment's interest in data federations?

We see them as being a useful extension to current models with many good features so worth pursuing, but we will be able to work fine with non-federated storage.

• What sort of assumptions does the experiment need to make for data federations to work?

There is a question over whether the data management system should manage what is in the federated storage or not. With Rucio it would be possible to do so, but there is a potential conflict if sites still have the right to delete it. It seems difficult to accommodate a middle ground, though there is disagreement within ATLAS on this.

• For smaller sites, would caching work for your experiment?

A pure cache based on improving data access at a site should be managed by the site - they should use that as they see fit. If 'cache' is taken to mean dynamic placement of useful data at a site then we do do this already. We should be quite precise about what's meant here.

File access: WAN protocols

• Do you need gridftp? Can you use http?

Currently almost all WAN transfers are managed via FTS, which is strongly bound to gridftp. We have experimented with http(s) and it is very desirable from the point of view of having standard clients such as wget/curl able to interact with grid storage.

We have a trial web based download service for users which has received very positive feedback.

• Does your framework already support HTTP-based access?

Not in a mature state.

File access: Clouds

• Could you work directly with cloud file systems (i.e., GET/PUT)? Assume "cloud file systems" implies REST-like APIs, transfer via HTTP, no random access within files, or at least limited byte-range access, no third-party transfer, and possibly no file/directory structure. See Amazon S3 for inspiration.

For job-level data access we can manage this. Lack of third-party transfers would be a major problem for moving large amounts of data into such a system. It is not clear whether an out-of-cloud storage system could push data into the cloud, which would make this more acceptable.

File access: Local

• Data access protocols - do you need our own special ones (xroot, rfio, dcap)? Do you see that http or WebDAV could be used here too?

We can work with anything supported by ROOT (=> all of the above). The most promising ones to develop for us would be:

xroot - for its excellent level of robustness for clients;

file - because it's a no-brainer for the client and benefits from VFS caching;

http/webdav could be used, but are probably more useful at the WAN level.

• Could you work directly with clustered file systems at smaller sites?

We already do.

Security / access controls:

• Security / VOMS - what are your expectations / needs?

Access to ATLAS data must be secured. However, the security layer should be as non-intrusive as possible and should have only a small impact on performance. We certainly would like read access to work without VOMS extensions. Other credential systems are usually more convenient for users, e.g., using krb tokens to access data in EOS.

• Access control - what is really needed?

Read access only to collaboration members.

Write access should be managed in a very broad way, e.g., a clear separation between user and production data. Group data should not be deletable by users; we have that already, and it can be related to the namespace. We don't mind relying on sites to get it right, but back doors need to be closed.

More sophisticated authorisation should come from the experiment layers, e.g., we may want replicas to have multiple 'owners' in DDM, but clearly this does not map to any posix concept (even with ACLs there is only one 'owner').

Related to this, we don't want the storage system to implement quotas. All files are owned by the data management system, and a model in which multiple people lock replicas with rules doesn't map onto per-user storage quotas.
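The mismatch between "multiple owners" and a single posix owner can be illustrated with a small sketch. The Replica class and the account names are hypothetical and only meant to show the idea of rule-based locking; this is not the DDM or Rucio data model.

```python
class Replica:
    """Hypothetical replica record kept by the data management layer."""

    def __init__(self, name):
        self.name = name
        self._rules = set()  # accounts that currently want this replica kept

    def add_rule(self, account):
        self._rules.add(account)

    def remove_rule(self, account):
        self._rules.discard(account)

    @property
    def deletable(self):
        # The replica may be removed only once nobody holds a rule on it.
        return not self._rules


replica = Replica("some_file")
replica.add_rule("user_a")
replica.add_rule("group_b")
replica.remove_rule("user_a")
print(replica.deletable)  # False: group_b still holds a rule
```

On the storage itself the file has one owner (the data management system), so a storage-side quota has no sensible single account to charge the file to.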

Namespace:

• Where do you see your namespace management "philosophy" evolving in the future?

We want to simplify the relationship between our 'namespace' and the SURL on the site. This is to increase efficiency and robustness. However the intention is to do this using a deterministic mapping between the logical file name and local file name, not with directory structure. This is to cope with overlapping datasets and ensure a uniform spread of files in directories.
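One way such a deterministic mapping could be built is to derive the path from a hash of the logical name, so that any client can compute the location without a catalogue lookup and files spread evenly over directories. The sketch below is an assumption made for illustration: the choice of MD5, the two-level layout and the "scope" argument are not taken from the text above.

```python
import hashlib


def deterministic_path(scope, name):
    """Map a logical file name to a site-relative path using only the name itself.

    The first four hex digits of the digest select two directory levels, so
    files are spread over 256 * 256 directories regardless of which datasets
    they belong to.
    """
    digest = hashlib.md5(f"{scope}:{name}".encode("utf-8")).hexdigest()
    return f"{scope}/{digest[0:2]}/{digest[2:4]}/{name}"


path = deterministic_path("user.example", "file_0001.root")
print(path)
```

Since the path depends only on the logical name, a file that belongs to several overlapping datasets still has exactly one location at a site.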

• How will we manage a global namespace - whose problem is it? The experiments'? Is there a continued need for LFC?

ATLAS plans no global namespace 'concept'. See comment above for SURLs. We do need a replica catalog, but we do not use or plan to use the LFC's logical namespace.

Data searches will be done using metadata at the dataset level.

-- WahidBhimji - 18-Nov-2011

Topic revision: r5 - 2012-01-23 - WahidBhimji
 