LHCb Bookkeeping Working Group Meeting

Date and Location

25 May 2007


Attendees

Andrew, Markus, Michael (phone), Niko, Olivier C., Olivier D., Philippe (chair, notes)




ATLAS and CMS bookkeeping (Andrew)

See slides attached to the twiki page

ATLAS doesn't have an intermediate entity between files and datasets (which can be as large as 300,000 files). A job can only run on a site that holds the complete dataset. This is a problem for large datasets.

CMS, on the other hand, has the concept of a "file block" (typically 10 to 20 GBytes) whose files all travel together. This makes it easier to find eligible sites.
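The difference in granularity can be illustrated with a minimal sketch. It is not ATLAS or CMS code: the replica catalogue, the file names and the placement of files at sites are all made up for illustration, and `sites_with_all` is a hypothetical helper.

```python
from typing import Dict, List, Set


def sites_with_all(files: List[str], replicas: Dict[str, Set[str]]) -> Set[str]:
    """Return the sites that hold a replica of every file in `files`."""
    if not files:
        return set()
    common = set(replicas.get(files[0], set()))
    for lfn in files[1:]:
        common &= replicas.get(lfn, set())
    return common


# Hypothetical replica catalogue: file name -> sites holding a copy.
replicas = {
    "f1": {"CERN", "RAL"},
    "f2": {"CERN", "RAL"},
    "f3": {"CERN", "IN2P3"},
    "f4": {"CERN", "IN2P3"},
}

dataset = ["f1", "f2", "f3", "f4"]
blocks = [["f1", "f2"], ["f3", "f4"]]  # assumed grouping into "file blocks"

# Dataset granularity: only sites holding the complete dataset are eligible.
whole_dataset_sites = sites_with_all(dataset, replicas)

# Block granularity: each block is matched to sites independently.
per_block_sites = [sites_with_all(block, replicas) for block in blocks]
```

With this made-up placement only one site can host a job on the whole dataset, while each block has two eligible sites.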

Both experiments de-reference the datasets to files before submission (e.g. in Ganga for ATLAS).


The WG considers the concept of datasets very useful. Our computing model, though, is such that there is probably no need for solutions as extreme as those in ATLAS (who are in any case not happy with what they have). Datasets would be important at the level of users: they make a query with certain criteria (e.g. MC signal for B->J/PsiK0s, brand 2008) and get a dataset name, which they subsequently use when defining their input data in ganga.

It is then up to ganga and DIRAC to translate this dataset name into files, to split the job into runnable sub-jobs (the data-location-driven splitter is already there in ganga) and to submit it. Although all DSTs are supposed to be available everywhere, this strategy would allow some files to be missing at some sites without the user having to care.
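A minimal sketch of a location-driven split is given here for illustration. It is not the ganga splitter and uses no ganga or DIRAC API: the function `split_by_location`, its `max_files_per_job` parameter and the replica dictionary are assumptions made for the example.

```python
from collections import defaultdict


def split_by_location(files, replicas, max_files_per_job=2):
    """Group files by the set of sites holding them, then cut each group
    into sub-jobs of at most `max_files_per_job` files.

    Returns (subjobs, missing): `missing` lists files with no known replica.
    """
    groups = defaultdict(list)
    missing = []
    for lfn in files:
        sites = frozenset(replicas.get(lfn, ()))
        if sites:
            groups[sites].append(lfn)
        else:
            missing.append(lfn)

    subjobs = []
    for sites, lfns in groups.items():
        for start in range(0, len(lfns), max_files_per_job):
            subjobs.append(
                {"sites": sorted(sites), "files": lfns[start:start + max_files_per_job]}
            )
    return subjobs, missing


# Hypothetical dataset content and replica locations.
files = ["a.dst", "b.dst", "c.dst", "d.dst", "e.dst"]
replicas = {
    "a.dst": {"SiteA", "SiteB"},
    "b.dst": {"SiteA", "SiteB"},
    "c.dst": {"SiteA", "SiteB"},
    "d.dst": {"SiteA"},  # missing at SiteB; the user does not need to care
}

subjobs, missing = split_by_location(files, replicas)
```

Each sub-job carries the sites where all of its files are present, so a file missing at one site only narrows the choice of sites for the sub-job that contains it.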

The dereferencing must be done before job submission, as otherwise changes in the dataset structure (e.g. new files) might occur that would prevent doing it on the WN: a reference to files 11 to 20 of a dataset might change meaning and cause different sub-jobs to process the same file. Dereferencing on the WN would only work if the dataset is complete and closed (it could then save time at submission).
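The failure mode can be reproduced with a small, self-contained sketch. The file names, the helper `resolve_range` and the assumption that a new file sorts ahead of the existing entries are all illustrative.

```python
def resolve_range(dataset_files, first, last):
    """Resolve a reference 'files first..last' (1-based, inclusive)."""
    return dataset_files[first - 1:last]


# Dataset content at submission time: file_01 .. file_20.
snapshot = ["file_%02d" % i for i in range(1, 21)]

# Sub-job A resolves its reference "files 1 to 10" against the snapshot.
job_a = resolve_range(snapshot, 1, 10)

# Assumption: a new file is then added and is listed before the existing ones.
grown = ["file_00"] + snapshot

# Sub-job B resolves "files 11 to 20" later, on the worker node.
job_b_late = resolve_range(grown, 11, 20)

# Files processed twice, and files never processed, with late resolution.
overlap = set(job_a) & set(job_b_late)
skipped = set(snapshot) - set(job_a) - set(job_b_late)

# Resolving both references before submission avoids the problem.
job_b_early = resolve_range(snapshot, 11, 20)
overlap_early = set(job_a) & set(job_b_early)
```

In this example the late resolution makes both sub-jobs process `file_10` and leaves `file_20` unprocessed, whereas the two lists resolved at submission time are disjoint and cover the snapshot.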

It is believed that the proposal to define datasets in the LFC as a directory of symbolic links to real entries is an attractive possibility. In order to have a consistent view of datasets (bookkeeping, DIRAC, ganga), a meeting will be organised between the WG and the DIRAC and ganga developers (PhC).
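The idea of a dataset as a directory of symbolic links can be sketched on a local POSIX filesystem, used here only as an analogy. No LFC call is involved, and the directory layout and file names are invented for the example.

```python
import os
import tempfile

# Temporary tree standing in for the catalogue namespace (assumption).
root = tempfile.mkdtemp()
storage = os.path.join(root, "prod")
dataset_dir = os.path.join(root, "datasets", "my_selection")
os.makedirs(storage)
os.makedirs(dataset_dir)

# "Real entries": three placeholder files.
for name in ("a.dst", "b.dst", "c.dst"):
    with open(os.path.join(storage, name), "w") as handle:
        handle.write("placeholder")

# The dataset is a directory containing links to a subset of the real entries.
for name in ("a.dst", "c.dst"):
    os.symlink(os.path.join(storage, name), os.path.join(dataset_dir, name))


def dereference(directory):
    """Translate a dataset directory into the real paths its links point to."""
    return sorted(
        os.path.realpath(os.path.join(directory, entry))
        for entry in os.listdir(directory)
    )


resolved = dereference(dataset_dir)
```

Listing the dataset directory and following the links yields the file list that would be handed to the job splitter, and the real entries stay in a single place.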

End of the meeting at 10:55

-- Main.phicharp - 25 May 2007

Topic revision: r1 - 2007-05-29 - PhilippeCharpentierSecondary
