Grid and data management

What can you expect from the grid?

Prerequisites:

You will need to be familiar with Python. After that, you will need to complete AT LEAST the following tutorials before starting the DaVinci tutorial.

You need at least one DaVinci job which works on the grid and makes some output files.

Slides:

This tutorial corresponds to the slides last shown at the LHCb week here

1. Let's resubmit, changing things as we go

I assume you have a previous successful job called jobs(6), with a number of subjobs, which we will copy and mess with below.

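# Resubmit every subjob unchanged
j=jobs(6)
for js in j.subjobs:
    js.resubmit()

# Resubmit every subjob with double the CPU time
for js in j.subjobs:
    js.backend.settings['CPUTime']=js.backend.settings['CPUTime']*2
    js.resubmit()

# Resubmit every subjob, banning the CE it ran on
for js in j.subjobs:
    js.backend.settings['BannedSites']=[js.backend.actualCE]
    js.resubmit()

# Copy the job, give it the input data of subjob 6.1 and split it again
j=j.copy()
j.inputdata=jobs('6.1').inputdata
j.splitter.filesPerJob=1+len(j.inputdata)//10
j.submit()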
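
The filesPerJob expression above is plain arithmetic: it aims for at most about ten subjobs, however many input files there are. A minimal sketch in plain Python (no Ganga needed) shows the numbers it produces. The helper names files_per_job and n_subjobs are hypothetical and not part of Ganga. The sketch uses // so that the division stays an integer division under both Python 2 and Python 3.

```python
def files_per_job(n_files, target_subjobs=10):
    """Files per subjob, chosen so that at most target_subjobs subjobs result."""
    return 1 + n_files // target_subjobs


def n_subjobs(n_files, per_job):
    """Number of subjobs needed to cover n_files with per_job files each."""
    return (n_files + per_job - 1) // per_job  # ceiling division


# Print: number of input files, files per subjob, resulting number of subjobs
for n in (5, 95, 100, 1234):
    per_job = files_per_job(n)
    print(n, per_job, n_subjobs(n, per_job))
```

For small datasets (fewer than ten files) this gives one file per subjob. For larger ones the subjob count never exceeds ten.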

2. Let's move some data

I assume you have a previous successful job called jobs(7) which we will copy and mess with below. I assume it makes a file called DVnTuples.root.

In [1]: j=jobs(7).copy()
In [2]: j.outputsandbox
Out[2]: ['DVHistos.root', 'DVnTuples.root']
In [3]: j.outputsandbox=['DVHistos.root']
In [4]: j.outputdata=['DVnTuples.root']
In [5]: j.outputdata.location='GridTutorial'
In [6]: j.submit()

Then copy that data around

In [1]: ds=j.backend.getOutputDataLFNs()
In [2]: ds.replicate('CERN-USER')
In [3]: ds[0].download('/tmp/')
In [4]: afile=PhysicalFile('/tmp/DVnTuples.root')
In [5]: dscp=afile.upload('/lhcb/user/<u>/<uname>/GridTutorial/DVnTuples.root')
In [6]: j.backend.getOutputData()
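
The upload target above follows the pattern /lhcb/user/<u>/<uname>/..., where I am assuming <u> is the first letter of the user name. A small sketch in plain Python of how such a path could be assembled is shown here. The helper user_lfn and the user name jbloggs are hypothetical, used only for illustration.

```python
def user_lfn(username, *parts):
    """Build /lhcb/user/<u>/<uname>/..., assuming <u> is the first letter."""
    if not username:
        raise ValueError("username must not be empty")
    return "/".join(["/lhcb/user", username[0], username] + list(parts))


# Hypothetical user name, matching the file uploaded in the tutorial
print(user_lfn("jbloggs", "GridTutorial", "DVnTuples.root"))
```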

3. Where are the data?

In [1]: j.application.outputdir
In [2]: j.peek()
In [3]: ds=j.backend.getOutputDataLFNs() 
In [4]: reps=ds.getReplicas()
In [5]: reps[ds[0].name]

4. Using the Ganga Box

Before you delete the job, and if you want to keep the output for some time, why not put the LFNs in your Ganga Box?

In [1]: ds=j.backend.getOutputDataLFNs()
In [2]: box.add(ds, str(j.id)+' '+j.name+' Output LFNs')
In [3]: j.remove()
In [4]: box #print the content of the box

5. Cleanup

Following GridStorageQuota.

To see how much space you are using, you need to go to a new shell and start Dirac:

$ SetupProject LHCbDirac
$ dirac-dms-storage-usage-summary --Dir /lhcb/user/<u>/<username> 

You can remove all copies of files from a given dataset within Ganga:

In [1]: ds=j.backend.getOutputDataLFNs()
In [2]: for d in ds:
  ....:    d.remove()

And find out what's left over from Dirac, and exterminate it:

$ SetupProject LHCbDirac
$ dirac-dms-user-lfns  
$ dirac-dms-remove-files <a-list-of-lfns>

6. Advanced cleanup

Load the list of LFNs produced by dirac-dms-user-lfns into an LHCbDataset:

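# Read one LFN per line from the file
f=open('<a-list-of-lfns>')
files=f.read().strip().split('\n')
f.close()
# Collapse any repeated slashes in each LFN
for i in range(len(files)):
  while '//' in files[i]:
    files[i]=files[i].replace('//','/')
# Prefix each entry with 'LFN:' and build the dataset
files=['LFN:'+name for name in files]
ds=LHCbDataset(files)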
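
The string clean-up in this step can be tried outside Ganga. The following is a minimal sketch in plain Python, working on a text string instead of a file and stopping before the LHCbDataset call. The function normalise_lfns and the sample paths are hypothetical. Unlike the loop above, it also skips blank lines in the middle of the list.

```python
import re


def normalise_lfns(text):
    """Turn the text of an LFN list into clean 'LFN:/...' strings."""
    lfns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        line = re.sub(r"/{2,}", "/", line)  # collapse repeated slashes
        lfns.append("LFN:" + line)
    return lfns


# Hypothetical sample input with doubled slashes and a blank line
sample = (
    "/lhcb/user/j/jbloggs//GridTutorial/a.root\n"
    "\n"
    "/lhcb/user/j/jbloggs///b.root\n"
)
print(normalise_lfns(sample))
```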

Then go through the datasets you want to keep, subtracting from this list:

ds_diff=ds
for ds2 in box:
  ds_diff=ds_diff.difference(ds2)
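
The subtraction is an ordinary set difference applied once per dataset you want to keep. Here is a sketch with plain Python sets standing in for the datasets. This is only an analogy for illustration, not the LHCbDataset API, and the LFN strings are made up.

```python
# Everything currently stored on the grid (made-up LFNs)
all_lfns = {"LFN:/a.root", "LFN:/b.root", "LFN:/c.root", "LFN:/d.root"}

# Datasets to keep, standing in for the contents of the Ganga Box
kept_datasets = [
    {"LFN:/a.root"},
    {"LFN:/c.root", "LFN:/x.root"},  # x.root is not on the grid; harmless
]

# Subtract each kept dataset in turn; what remains is safe to remove
to_remove = set(all_lfns)
for kept in kept_datasets:
    to_remove = to_remove.difference(kept)

print(sorted(to_remove))
```

It is worth printing and checking the remaining list before removing anything, since removal cannot be undone.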

Then remove them:

for df in ds_diff:
  df.remove()

Tips

As with the above example, most things can be reduced to a line or two using Ganga Utilities. It's simple to get working and can save you a lot of time.

(1) is just gu.subjob_resubmit(j), which resubmits and changes the settings automatically, and j=gu.resplit(j), which makes a new job from the input data of the failed subjobs.

(6) is ds=gu.dataset_from_file('<a-list-of-lfns>'); keep=gu.boxLFNs(); ds_diff=ds.difference(keep); to get the list of files to remove.

-- RobLambert - 25-Nov-2010

Topic revision: r5 - 2010-12-02 - RobLambert
 