Grid and data management

What can you expect from the grid?


You will need to be familiar with Python. After that, you will need to do at least the following tutorials before starting the DaVinci tutorial.

You need at least one DaVinci job that runs on the grid and produces some output files.


This tutorial corresponds to the slides last shown at the LHCb week.

1. Let's resubmit with some changes

I assume you have a previous successful job called jobs(6), with a number of subjobs, which we will copy and mess with below.

for js in j.subjobs:

for js in j.subjobs:

for js in j.subjobs:
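The bodies of the loops above are not preserved on this page. As a rough sketch only, a loop of this kind might pick out the failed subjobs, optionally adjust each one, and resubmit it. The sketch assumes each subjob exposes a status string and a resubmit() method, as Ganga jobs do; the function name resubmit_failed and the tweak callback are hypothetical and are not taken from the original tutorial.

```python
def resubmit_failed(job, tweak=None):
    """Resubmit every failed subjob of `job`; return the subjobs that were resubmitted.

    `tweak` is an optional, hypothetical callback applied to each failed
    subjob before resubmission (for example, to change a backend setting).
    """
    resubmitted = []
    for js in job.subjobs:
        if js.status != 'failed':
            continue  # leave completed or still-running subjobs alone
        if tweak is not None:
            tweak(js)
        js.resubmit()
        resubmitted.append(js)
    return resubmitted
```

Inside a Ganga session this could be called as resubmit_failed(jobs(6)), again assuming the attributes named above.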


2. Let's move some data

I assume you have a previous successful job called jobs(7) which we will copy and mess with below. I assume it makes a file called DVnTuples.root.

In [1]: j=jobs(7).copy()
In [2]: j.outputsandbox
Out[2]: ['DVHistos.root', 'DVnTuples.root']
In [3]: j.outputsandbox=['DVHistos.root']
In [4]: j.outputdata=['DVnTuples.root']
In [5]: j.outputdata.location='GridTutorial'
In [6]: j.submit()

Then copy that data around:

In [1]: ds=j.backend.getOutputDataLFNs()
In [2]: ds.replicate('CERN-USER')
In [3]: ds[0].download('/tmp/')
In [4]: afile=PhysicalFile('/tmp/DVnTuples.root')
In [5]: dscp=afile.upload('/lhcb/user/<u>/<uname>/GridTutorial/DVnTuples.root')
In [6]: j.backend.getOutputData()
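The upload target in In [5] has to be filled in with your own user area. A minimal sketch of building such a path in plain Python, assuming that <u> stands for the first letter of <uname>; the helper name user_lfn is hypothetical and is not part of Ganga:

```python
import posixpath

def user_lfn(username, *parts):
    """Build a path under /lhcb/user/<u>/<uname>/ from a user name and path parts.

    Assumes <u> is the first letter of the user name.
    """
    if not username:
        raise ValueError("username must be non-empty")
    return posixpath.join("/lhcb/user", username[0], username, *parts)
```

For example, user_lfn('jbloggs', 'GridTutorial', 'DVnTuples.root') gives '/lhcb/user/j/jbloggs/GridTutorial/DVnTuples.root', where jbloggs is a made-up user name.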

3. Where are the data?

In [1]: j.application.outputdir
In [2]: j.peek()
In [3]: ds=j.backend.getOutputDataLFNs() 
In [4]: reps=ds.getReplicas()
In [5]: reps[ds[0].name]

4. Using the Ganga Box

Before you delete the job, and if you want to keep the output for some time, why not put the LFNs in your Ganga Box?

In [1]: ds=j.backend.getOutputDataLFNs()
In [2]: box.add(ds, 'Output LFNs')
In [3]: j.remove()
In [4]: box  # print the content of the box

5. Cleanup

To see how much space you are using, you need to go to a new shell and start Dirac:

$ SetupProject LHCbDirac
$ dirac-dms-storage-usage-summary --Dir /lhcb/user/<u>/<username> 

You can remove all copies of these files from within Ganga:

In [1]: ds=j.backend.getOutputDataLFNs()
In [2]: for d in ds:
  ....:    d.remove()

And find out what's left over from Dirac, and exterminate it:

$ SetupProject LHCbDirac
$ dirac-dms-user-lfns  
$ dirac-dms-remove-files <a-list-of-lfns>

6. Advanced cleanup

Load the output of dirac-dms-user-lfns into an LHCbDataset:

for i in range(len(files)):
  while('//') in files[i]:
files=['LFN:'+f for f in files]
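The snippet above is incomplete on this page: the step that reads the list and the body of the while loop are missing. A self-contained sketch of the clean-up it appears to perform, assuming the intent is to collapse doubled slashes and prefix each path with 'LFN:'; the function name normalise_lfns is hypothetical:

```python
def normalise_lfns(lines):
    """Return 'LFN:'-prefixed names with runs of slashes collapsed to one."""
    cleaned = []
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        while '//' in path:
            path = path.replace('//', '/')
        cleaned.append('LFN:' + path)
    return cleaned
```

It accepts any iterable of strings, so an open file handle on <a-list-of-lfns> can be passed directly. Turning the result into an LHCbDataset is left to Ganga and is not shown here.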

Then go through the datasets you want to keep, subtracting from this list:

for ds2 in box:
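The body of this loop is also missing here. The idea of the subtraction can be sketched in plain Python, with lists of LFN strings standing in for the datasets; the function name lfns_to_remove is hypothetical, and a real Ganga session would work on dataset objects instead:

```python
def lfns_to_remove(all_lfns, keep_datasets):
    """Return the LFNs from all_lfns that appear in none of keep_datasets.

    The order of all_lfns is preserved.
    """
    keep = set()
    for ds2 in keep_datasets:
        keep.update(ds2)  # collect every LFN that must survive
    return [lfn for lfn in all_lfns if lfn not in keep]
```

Building the keep set first means each candidate LFN is checked once, however many datasets are being kept.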

Then remove them:

for df in ds_diff:


As with the above example, most things can be reduced to a line or two using Ganga Utilities. It's simple to get working and can save you a lot of time.

(1) is just gu.subjob_resubmit(j) to resubmit with changing the settings, and j=gu.resplit(j) to make a new job from the input data of the failed subjobs.

(6) is ds=gu.dataset_from_file('<a-list-of-lfns>'); keep=gu.boxLFNs(); ds_diff=ds.difference(keep); to get the list of files to remove.

-- RobLambert - 25-Nov-2010

Topic revision: r2 - 2010-11-30 - RobLambert