EMI Data Client Consolidation F2F Meeting, Feb. 11, 2011, 09:30-13:00

Present: Ricardo R., Zsolt, Zsombor, David, Jon

Where: 28 R-014 @ CERN

Jon starts with a presentation

  • overlapping functionality
  • one of the goals of EMI is to reduce the amount of code to maintain
  • PEB needs very good reasons to keep both libraries
  • Goal for this meeting: prepare a document before the Vilnius meeting, with an agreement and a high-level plan
  • we need to evaluate current libs/clients
  • investigate which features of the libs/clients are actually used
    • David asks: is there any group within EMI (EGI?) that can help investigate the users' needs?
  • identify scenarios and analyse - efforts, costs, benefits, risks, dependencies
  • get a decision from PEB/PTs/EGI?
  • implement!
  • possible scenarios: ARC/gLite libs/clients replacing each other in all kinds of combinations - which ones make any sense? replace ARC libraries with gLite libraries and rewrite the ARC clients, or vice versa? implement something completely new? and then support three sets of libraries?
  • ARC's point of view: ARC does not want unusual/non-portable external dependencies - it can use gLite libraries if that does not bloat the code too much - ARC of course feels that the ARC clients/libraries already do everything that is actually needed
  • ARC data libs are used on the server side also
    • Zsolt: GFAL is currently used only on the client side, but now FTS will use it too!
    • Ricardo: the jobs go to the worker node and use the client libraries to download files, so the libraries are not only used on the UI; they are also used in some monitoring solutions
  • do we have metrics that allow a cost estimate?
  • what about NFS4.1? does it make a big part of the data libraries useless?
    • Ricardo: we need tools to query and manipulate the LFC - it does not seem possible to use only NFS4.1
    • David: getting NFS4.1 everywhere will probably take a long time (after EMI)
    • Ricardo: we can have EMI-open/read/close: at first we channel them to the ARC/gLite libraries, then later they could do NFS stuff...?

Discussions

lcg-utils does a lot of direct catalog manipulation

the lcg-utils CLI calls the lcg-utils API, which uses GFAL to do the actual file transfer. GFAL does everything, e.g. creating catalog aliases

people are familiar with the POSIX-like way of dealing with data; that's why they like GFAL, because it feels the same

GFAL has a python interface, which the researchers probably use; Zsolt has the feeling that they do not use it and probably use RFIO and dcap directly, while Ricardo has the feeling that GFAL is actually used more than that. the jobs don't expect any data to be present in an NFS-mounted area; they expect to use some remote I/O, or they ask in the JDL for the files to be staged in

ARC jobs usually specify the files to stage in, but they can do remote I/O if the WNs can access the internet at all. the ARC client libraries are normally not available on the WNs (if they are available and the WNs have an internet connection, this is probably advertised)

in ARC you can do full job lifecycle almost without using arccp, arcls or arcrm

if NFS4.1 were available, would the ARC CE mount the storages? it would probably be the decision of the site owner; it could be something like a runtime environment. but if the WN has access to the storage through NFS4.1, do we still need data libraries on the WN? if the job specifies LFNs, then it is not enough to have NFS4.1 mounted, because you need to resolve the LFNs to SURLs and then figure out which SURL is mounted where - so we need some data libraries to do that - and at the end the results should be registered...
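The resolution chain described above (LFN to SURLs, then SURL to a local mount) can be sketched as follows. This is only an illustration under stated assumptions: the catalog is a plain dictionary standing in for an LFC lookup, the mount table is an invented site configuration, and all names (`CATALOG`, `MOUNTS`, `locate`, the example hosts and paths) are hypothetical and not part of any ARC or gLite library.

```python
from typing import Dict, List, Optional

# Hypothetical catalog content: LFN -> replica SURLs (stands in for an LFC query).
CATALOG: Dict[str, List[str]] = {
    "lfn:/grid/vo/data/file1": [
        "srm://se1.example.org/vo/data/file1",
        "srm://se2.example.org/vo/data/file1",
    ],
}

# Hypothetical site configuration: SURL prefix -> local NFS4.1 mount point.
MOUNTS: Dict[str, str] = {
    "srm://se2.example.org/vo/": "/nfs/se2/vo/",
}


def resolve_lfn(lfn: str, catalog: Dict[str, List[str]]) -> List[str]:
    """Step 1: resolve a logical file name to its replica SURLs."""
    return catalog.get(lfn, [])


def surl_to_local_path(surl: str, mounts: Dict[str, str]) -> Optional[str]:
    """Step 2: map a SURL to a local path if its storage is mounted here."""
    for prefix, mount_point in mounts.items():
        if surl.startswith(prefix):
            return mount_point + surl[len(prefix):]
    return None


def locate(lfn: str, catalog: Dict[str, List[str]],
           mounts: Dict[str, str]) -> Optional[str]:
    """Return the local path of the first mounted replica, or None."""
    for surl in resolve_lfn(lfn, catalog):
        path = surl_to_local_path(surl, mounts)
        if path is not None:
            return path
    return None
```

With the example tables, `locate("lfn:/grid/vo/data/file1", CATALOG, MOUNTS)` skips the replica on `se1` (not mounted) and returns the path under `/nfs/se2/vo/`. Registering the job's results back in the catalog, which the notes also mention, is not covered by this sketch.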

ARC does not have its own catalogue - ARC users who need a catalogue are using LFC now - they can rely on the ARC data libraries to use LFC when staging the job's input/output in and out, but for other catalogue operations they have to use lfc-*; it seems they are happy with that - Matties does packages for lfc-*

for ARC it is more typical that the users install the clients on their own machines, while for gLite there are lots of UI machines and users more typically just log into those

What about UNICORE? Java? Somebody made a Java implementation of the GFAL interface. We should ask them and the PEB what the plan is for UNICORE with respect to the data clients.

Ricardo's summary of the ARC data libraries: they offer file access, but not in a remote I/O way - only get and put - they are used by the user tools and the middleware - all the catalogue interaction needed is handled (you can specify an LFN), but you have to specify the LFC URL; you cannot discover LFCs - the client tools are basically cp, ls and rm, with all kinds of options to do fancy stuff - but they have fewer features than lfc-*/lcg-*

The POSIX-like interface is provided by GFAL; it hides the underlying rfio and dcap libraries, and xroot could be added. GFAL can be installed without dcap and rfio - then it is only able to use file://
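The idea of one POSIX-like entry point that hides optional protocol libraries can be sketched like this. It is not GFAL's actual API: the class `PosixLikeClient`, its methods and the plugin mechanism are hypothetical names chosen for illustration, and only the `file` scheme is implemented, mirroring an installation without dcap and rfio.

```python
from typing import BinaryIO, Callable, Dict, List
from urllib.parse import unquote, urlparse

# An opener takes a URL and returns a readable binary file-like object.
Opener = Callable[[str], BinaryIO]


def open_local(url: str) -> BinaryIO:
    """Built-in handler for file:// URLs."""
    return open(unquote(urlparse(url).path), "rb")


class PosixLikeClient:
    """Hypothetical facade: one open() call, protocol chosen by URL scheme."""

    def __init__(self) -> None:
        # Without extra plugins only file:// is available.
        self._openers: Dict[str, Opener] = {"file": open_local}

    def register(self, scheme: str, opener: Opener) -> None:
        """Add a protocol plugin (e.g. a wrapper around a dcap or rfio library)."""
        self._openers[scheme] = opener

    def supported_schemes(self) -> List[str]:
        return sorted(self._openers)

    def open(self, url: str) -> BinaryIO:
        scheme = urlparse(url).scheme
        try:
            opener = self._openers[scheme]
        except KeyError:
            raise ValueError(f"no plugin installed for scheme {scheme!r}") from None
        return opener(url)
```

The caller always uses `open()` and never sees which library serves the request; adding xroot would mean one more `register()` call rather than a change in user code.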

consolidating the catalog libraries: currently the ARC data libraries use the LFC libraries to do the catalog magic

consolidating the SRM libraries: the ARC SRM library has no external dependencies (only the ARC HED libraries); GFAL uses its own SRM library, which could maybe be replaced with the ARC one (but then the ARC HED libraries are needed! is that too much?)

ARC does not use rfio or dcap, because they are remote I/O; ARC only supports gridftp (and http(s/g)), and uses the official globus gridftp client library

GFAL does not support https yet (dCache supports https)

(when httpg goes to https, then what about delegation? - myproxy? - srmCopy? - FTS can use srmCopy to ask the SRMs to do the copy... is this relevant for the data libraries?)

possibly GFAL could use the ARC http client code to add http to the supported protocols?

xroot (dcap?) supports vector reads, where you say which blocks you want to read and it preloads them - POSIX has something like that - currently GFAL does not support this
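To make the term concrete, a vector read takes a list of (offset, length) blocks in one call instead of one seek/read pair per block. The sketch below does this on a local file descriptor with `os.pread` (available on Unix); it is not the xroot or GFAL interface, and the function name `vector_read` is hypothetical. A remote implementation would presumably send the whole block list in a single request so the server can preload it.

```python
import os
from typing import List, Tuple


def vector_read(fd: int, chunks: List[Tuple[int, int]]) -> List[bytes]:
    """Read several (offset, length) blocks from fd, in the order given.

    os.pread does not move the file position, so the blocks can be
    requested in any order without explicit seeks.
    """
    return [os.pread(fd, length, offset) for offset, length in chunks]
```

A caller would open the file with `os.open(path, os.O_RDONLY)` and pass the list of blocks it needs, for example the byte ranges of selected records in a large file.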

SRM end user tools - dCache has them, we should use those

LFC end user tools - do they support direct replica manipulation? (it seems they lack a way to add a replica) - lcg-utils has replica management, but if the lfc tools could do that, it would not be needed.

we also should consider the language: C and C++

we made a table of common functionality (photo taken), we should collect this into a document, and then let the PTs evaluate?

action item: Jon will do everything :), and will ask us if he needs help - Ricardo is willing to put effort into getting a summary

-- JonKerrNilsen - 09-Feb-2011

Topic attachments
datalib_consolidation_f2f_CERN110211.pdf (r1, 26.8 K, 2011-02-18 16:50, JonKerrNilsen): Intro slides for F2F meeting
Topic revision: r3 - 2011-02-18 - JonKerrNilsen
 