J. Jensen review comments (extracted from email)

First of all, I am happy that EMI has a data storage and management area. As head of the GridPP storage and data management group I welcome this, and I am glad to see it in good hands, and I look forward to collaborating with EMI.

As another general comment, I welcome the strong emphasis on interoperability and open standards and working with standards bodies. I also welcome the migration plan to GLUE 2.0. Note that this also depends on components outside EMI, so it should be coordinated (and interoperation tested) with those components.

I must correct one mistake: it was not at all a "flaw in the design" that catalogues and storage elements were left "unsynchronised"; it was an intentional design decision which dates back to the European DataGrid's Architecture Task Force. The rationale behind the decision (as I recall) was (a) that services should be loosely coupled, and catalogues and SEs are distinct services; (b) that cyclic dependencies in the data management hierarchy should be avoided, eg SEs depending on catalogues depending on SEs (it was thought that a tree-like hierarchy would provide the better architecture); and finally (c) that the grid should be resilient and able to recover from failure.

How is feedback obtained from sysadmins and users? (Is usability an issue for users: how many users access SEs directly?) Also, many projects are now using gLite, eg WLCG, EUIndiaGrid and suchlike; how are they consulted?

How is work prioritised and triaged (eg planned work combined with requests from outside)?

The words "global parallel filesystem" (section 2) make me slightly nervous; what are the aims behind "global"? Do you foresee eg jobs at one site accessing files directly at other sites (as is currently being discussed by WLCG), or will jobs be moved to the site where the data is, or files "prestaged" to the site where the job will run (eg by a job wrapper)? A "global filesystem" sounds like a free-for-all of data access across sites, and it is probably better to have more intelligent data management.

I do not see portability addressed. I know moving from SL4 to SL5 was painful, and this was just from one version of SL to another! I know some people who tried to port DPM to Solaris, and failed (they got stuck with the VOMS components). The requirements and ambitions for portability should be addressed.

Is there a release plan, quality control, and pre-release testing, or is this part of a wider EMI plan? There should probably be a reference. However, testing SEs and data management is specific, so there should be a documented test suite which includes stress tests and must be passed prior to release.

Specific comments:

I noted that the "state of the art" (section 3) only describes the state of the art within the EMI middleware. That is fine, but the title is then slightly misleading.

Interesting to see dCache implement CDMI. Is there a wider role for CDMI, and if so, what?

Similarly, ByteIO was mentioned, which I also thought was an interesting aspect. Which role is foreseen for ByteIO?

I noted, of course, that the non-standardised xroot is not in the roadmap.

Is any work foreseen on WAN transfer protocols? Will GridFTP (in its current implementation) remain the only common protocol? Is there a roadmap for GridFTP v2?

Incidentally, I also noted iRODS, Apache, and Hadoop mentioned in connection with Unicore. Are these included in the EMI plans?

The harmonisation goals (4.1) sound quite ambitious, yet it is not clear which components will be harmonised (or perhaps it is not clear that the work described in 4.1.* is the same as "harmonisation"). For example, should ARGUS be a single PDP for all middleware? Should they share libraries? Should the code be refactored and shared between the three stacks? That's roughly how I would interpret "harmonisation".

Some things are probably more useful to harmonise than others, eg authorisation and accounting, and deployment frameworks. You address this in section 4.2 to some extent.

Incidentally, which deployment frameworks will be supported? Is this within a wider roadmap of EMI?

Which monitoring frameworks will be supported? As far as GridPP is concerned, we are happy with just Nagios, as is already in the plan.

4.1.1. In GridPP, we have run some synchronisation checks and found few discrepancies, mostly due to intentional deletions. I am not convinced it is worth having tight coupling between the services, although I am quite convinced it is worth having the option to check synchronisation and address any issues.

For example, many of our users use different catalogues, and a synchronisation between DPM and LFC would be a hindrance rather than a help (unless it's optional of course).

Incidentally, in the same cases we also checked file integrities against those calculated by the SEs.
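
A minimal sketch of what such an optional, loosely coupled check could look like is given below. It assumes each service can be dumped independently to a mapping from file path to checksum, and that file integrity is compared via those checksums. The dump format, the function `compare_dumps` and the report field names are hypothetical; they do not correspond to any existing DPM or LFC interface.

```python
from dataclasses import dataclass, field


@dataclass
class SyncReport:
    """Discrepancies between a catalogue dump and a storage element dump."""
    missing_on_se: list = field(default_factory=list)      # in catalogue, not on SE
    dark_data: list = field(default_factory=list)          # on SE, not in catalogue
    checksum_mismatch: list = field(default_factory=list)  # in both, checksums differ


def compare_dumps(catalogue, storage):
    """Compare two independent dumps, each a dict of path -> checksum (or None).

    Nothing is modified on either side: the result is only a report,
    so the two services stay loosely coupled (hypothetical helper).
    """
    report = SyncReport()
    for path, cat_sum in catalogue.items():
        if path not in storage:
            report.missing_on_se.append(path)
            continue
        se_sum = storage[path]
        # Only compare when both sides actually recorded a checksum.
        if cat_sum is not None and se_sum is not None and cat_sum != se_sum:
            report.checksum_mismatch.append(path)
    report.dark_data = sorted(set(storage) - set(catalogue))
    report.missing_on_se.sort()
    report.checksum_mismatch.sort()
    return report


if __name__ == "__main__":
    # Made-up example entries, for illustration only.
    catalogue = {"/vo/a": "ad01", "/vo/b": "ad02", "/vo/c": None}
    storage = {"/vo/a": "ad01", "/vo/b": "ffff", "/vo/d": "ad04"}
    print(compare_dumps(catalogue, storage))
```

Because the check only reads two dumps and produces a report, it can be run on demand by a site or a VO, and intentional deletions can be reviewed by a person rather than being "repaired" automatically.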

4.1.3. SRMs are actually interoperating, so it would be interesting to see more details about the proposed work. Perhaps at OGF30 (hint)? A number of issues are known, and work should be agreed/coordinated with SRM implementations outside EMI.

4.1.4. is a good idea (and has been discussed before).

4.1.5. is also a good idea; hopefully this can be integrated with the rest of the software so jobs running on a WN can automatically pick up files over NFS.

4.2.2 is a bit thin for two very complex topics (maintainability and usability). If you plan to address this properly, these will need a lot of expansion!

-- PatrickFuhrmann - 15-Nov-2010


Topic revision: r1 - 2010-11-15 - PatrickFuhrmann
 