-- RobCurrie - 2015-04-13

Ganga LHCb 6.1 Information

This page contains some important information for those looking to use Ganga 6.1 for LHCb.

Main changes in Ganga 6.1

The main changes in Ganga 6.1 for LHCb users can be summarized as follows:

  • New queues interface for performing computation in the background (see the sketch after this list).

  • Jobs now make use of inputfiles and inputdata to request data on a given backend.

  • DiracFile has new features/abilities not present in 6.0.

  • !!! Deprecation of job.inputsandbox !!!

  • Deprecation of j.backend.inputSandboxLFNs; users should now use j.inputfiles for all file objects.

  • More use of IGangaFile filetypes. This means all of the file objects that could be used in outputfiles can now be used in inputfiles.

  • Using a newer LHCbDirac version when submitting jobs.

  • DIRAC ApplicationStatus information is now available by default through the backend in Ganga; it is accessible through job.backend.extraInfo. More information from other backends will follow, and hopefully this will be added to the jobs summary in the near future.

  • *Experimental* Parallel job submission. This is recommended for experts and brave people, as it hasn't been fully debugged yet, but all that is required to enable it is to set:
j.parallel_submit = True

  • Improved Startup/Shutdown

  • Code improvements and some light optimizations
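
A minimal sketch of the new queues interface mentioned in the first bullet above, assuming a hypothetical user-defined function myBackgroundTask; queues.add hands the function to a background worker thread inside the Ganga session:

def myBackgroundTask():
    # hypothetical work to be done without blocking the Ganga prompt
    print 'running in the background'

queues.add(myBackgroundTask)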

Features not in Ganga 6.1

This release does not yet support CMake. This is coming in a future 6.1.x release.

Using Ganga 6.1 for LHCb

To try using Ganga 6.1 with LHCb you can set up the environment with:

SetupProject ganga v601r8

This will set up the environment to use Ganga 6.1.

This version makes use of LHCbDirac v8r1.

Expected Forward / Backward Compatibility

Ganga 6.0 jobs in Ganga 6.1

All jobs created by Ganga 6.0 should be readable by Ganga 6.1.

Ganga 6.1 should be able to resubmit any job created by Ganga 6.0.

Ganga 6.1 should be able to copy any job created by Ganga 6.0 and create a new Ganga 6.1 job.

Ganga 6.1 jobs in Ganga 6.0

Ganga 6.1 jobs are mostly readable/monitorable in Ganga 6.0.

Jobs containing objects NOT present in the 6.0 release will NOT be readable by the 6.0 Ganga release.

Ganga 6.0 cannot be relied upon to resubmit Ganga 6.1 jobs.

Ganga 6.0 cannot be relied upon to correctly copy Ganga 6.1 jobs.

Converting job scripts from Ganga 6.0 to Ganga 6.1

Many scripts written for Ganga 6.0 will NOT run in Ganga 6.1 without some modification. (Thankfully, in most cases the only change required is to use j.inputfiles instead of j.inputsandbox.)

The general rule to make a script work is to replace inputsandbox with inputfiles and correct your filetypes as appropriate.

It is recommended you follow this guide and carefully watch the Ganga feedback.

Examples for Ganga scripts

These Ganga scripts should demonstrate some important differences between Ganga 6.0 and 6.1.

Example 6.0 Ganga Job:

j=Job()
j.backend=Dirac()
j.inputsandbox=['/tmp/someFile']
j.inputdata=LHCbDataset(['/some/LFN'])
j.outputfiles=[LocalFile('someOutputFile')]
j.submit()

  • Here the file 'someFile' is sent to (LHCb)Dirac using a string object in the inputsandbox.
  • Here the LFN '/some/LFN' is requested using a LogicalFile.

Equivalent 6.1 Ganga Job:

j=Job()
j.backend=Dirac()
j.inputfiles=[LocalFile('/tmp/someFile')]
j.inputdata=LHCbDataset([DiracFile(lfn='/some/LFN')])
j.outputfiles=[LocalFile('someOutputFile')]
j.submit()

  • Here the 'someFile' file is sent to (LHCb)Dirac using a LocalFile object in inputfiles.
  • Here the LFN '/some/LFN' is requested using a DiracFile.

Example 6.0 Ganga Job 2:

j=Job(application=Bender(), backend=Dirac())
j.application.module = '/tmp/myModule.py'
datatmp=BKQuery('/LHCb/Collision12/Beam4000GeV-VeloClosed-MagUp/Real Data/Reco14/Stripping20/90000000/DIMUON.DST', dqflag=['OK']).getDataset()
j.inputdata = datatmp
j.inputsandbox = [ '/tmp/myUniqueFile' ]
j.outputfiles = [ LocalFile('myUniqueFile') ]
j.splitter = SplitByFiles( filesPerJob = 100 )
j.submit()

  • Here datatmp is an LHCbDataset which contains LogicalFile objects.
  • Here 'myUniqueFile' is passed as a string into the inputsandbox.

Equivalent 6.1 Ganga Job 2:

j=Job(application=Bender(), backend=Dirac())
j.application.module = '/tmp/myModule.py'
datatmp=BKQuery('/LHCb/Collision12/Beam4000GeV-VeloClosed-MagUp/Real Data/Reco14/Stripping20/90000000/DIMUON.DST', dqflag=['OK']).getDataset()
j.inputdata = datatmp
j.inputfiles = [ LocalFile('/tmp/myUniqueFile') ]
j.outputfiles = [ LocalFile('myUniqueFile') ]
j.splitter = SplitByFiles( filesPerJob = 100 )
j.submit()

Example 6.0 Ganga Job 3:

j=Job(application=DaVinci(), backend=Dirac())
j.inputsandbox = ['/tmp/mySandboxFile']
j.backend.inputSandboxLFNs = ['/my/custom/LFN']
j.application.optsfile = [ File('/my/opts/file.exit') ]
datatmp=BKQuery('/LHCb/Collision12/Beam4000GeV-VeloClosed-MagUp/Real Data/Reco14/Stripping20/90000000/DIMUON.DST', dqflag=['OK']).getDataset()
j.inputdata = datatmp
j.splitter = SplitByFiles( filesPerJob = 100 )
j.submit()

Equivalent 6.1 Ganga Job 3:

j=Job(application=DaVinci(), backend=Dirac())
j.inputfiles = [LocalFile('/tmp/mySandboxFile'), DiracFile(lfn='/my/custom/LFN')]
j.application.optsfile = [ File('/my/opts/file.exit') ]
datatmp=BKQuery('/LHCb/Collision12/Beam4000GeV-VeloClosed-MagUp/Real Data/Reco14/Stripping20/90000000/DIMUON.DST', dqflag=['OK']).getDataset()
j.inputdata = datatmp
j.splitter = SplitByFiles( filesPerJob = 100 )
j.submit()

This example uses inputSandboxLFNs. In Ganga 6.1, inputSandboxLFNs is deprecated and users can simply assign DiracFile objects to the inputfiles attribute. NB: The optsfile cannot currently be set via a LocalFile.

Deprecation of LogicalFile and PhysicalFile

Firstly, scripts should remove all references to LogicalFile and PhysicalFile objects.

The appropriate transforms to use are:

File object in Ganga 6.0    Equivalent File object in Ganga 6.1
LogicalFile                 DiracFile
PhysicalFile                LocalFile, MassStorageFile
If you attempt to run a job with a LogicalFile, Ganga will, where possible, create a DiracFile for you behind the scenes. You should still replace LogicalFile with DiracFile in your scripts.

If you attempt to run a job with a PhysicalFile, Ganga will attempt to replace the object with a LocalFile. This is potentially dangerous and you should check that this is what you expect!
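
As a concrete illustration of these transforms, a minimal sketch (the paths are hypothetical placeholders):

# Deprecated Ganga 6.0 objects ...       # ... and their Ganga 6.1 replacements
LogicalFile('/some/LFN')                 # -> DiracFile(lfn='/some/LFN')
PhysicalFile('/tmp/some/local.file')     # -> LocalFile('/tmp/some/local.file')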

New Ganga FileTypes

Ganga 6.1 allows users to make use of new filetypes when submitting jobs through the inputfiles interface. These filetypes are:

  • DiracFile: a file stored, or to be stored, on a Storage Element (SE) managed by the Dirac system; it is identified and accessed through an LFN.
  • LocalFile: a file stored locally which can be accessed using the standard UNIX commands, e.g. cp, mv, rm, ...
  • MassStorageFile: a file stored on a mass storage system which can be accessed through a custom set of tools, e.g. a file stored on CASTOR is accessible through nsls, rfcp, nsmkdir, ...
  • GoogleFile: a file stored on the Google file system; still a work in progress.
  • CERNBoxFile: not yet ready, but will handle files stored on CERN's CERNBox storage system.
NB: Not all file objects are yet supported on all backends for inputfiles and outputfiles.
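
A minimal sketch of mixing these file types in a single job; the file names and the LFN are hypothetical, and, as the NB above says, not every file type is supported on every backend:

j = Job(backend=Dirac())
j.inputfiles = [
    LocalFile('/tmp/steering.txt'),             # shipped with the job from local disk
    DiracFile(lfn='/lhcb/user/a/b/input.root')  # requested from a Dirac SE
]
j.outputfiles = [
    LocalFile('summary.txt'),    # returned to the local machine
    DiracFile('*.dst'),          # uploaded to a Dirac SE
    MassStorageFile('big.root')  # copied to mass storage
]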

Dirac Job Splitting in Ganga 6.1

Ganga 6.1 offers a new configuration option to choose the default algorithm for splitting jobs over their input files.

By default the splitting is performed by a new splitter written within Ganga, which gives the user some additional control over how their jobs are split and reduces redundancy by producing far fewer subjobs per dataset.

This can be configured by changing the 'SplitByFilesBackend' variable in the .gangarc. (The default algorithm from Ganga 6.0 is 'splitInputData'.)
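
For example, to revert to the Ganga 6.0 behaviour you could set the following in your .gangarc (a sketch: the option name and the 'splitInputData' value are those given above, while the [LHCb] section name is an assumption):

[LHCb]
SplitByFilesBackend = splitInputData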

DiracFile in Ganga 6.1

DiracFile in 6.1 allows much greater control: a local file can be uploaded through the 'put' method, and the file can then be replicated across multiple SEs through the 'replicate' method.
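
A minimal sketch of this workflow (the local path and the SE name 'CERN-USER' are hypothetical):

df = DiracFile('/tmp/myFile.root')
df.put()                   # upload the local file to a Dirac SE and register an LFN
df.replicate('CERN-USER')  # replicate the uploaded file to a further SE
print df.lfn               # the LFN assigned on upload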

Care should be taken when constructing a DiracFile object, as the object can be initialized from a single string argument. The following are some ways to construct a DiracFile:

DiracFile(lfn='/some/valid/LFN.ext')
# is the same as:
DiracFile('LFN:/some/valid/LFN.ext')

# This produces a DiracFile describing a local file
DiracFile('/my/Local/File.ext')
# Without the 'LFN:' prefix this too is treated as a local file, one which doesn't exist!!!
DiracFile('/lhcb/some/invalid/file.ext')

LHCbTasks in Ganga 6.1

This needs to be written properly. The following is written by an early user, not an expert!

You may also want to consult this page: https://www.gridpp.ac.uk/wiki/Guide_to_Ganga#Using_Tasks_for_Automated_Submission_and_Job_Chaining

Ganga now supports Tasks (actually since before 6.1), which provide a lot of nice functionality for chaining together multi-step jobs (and more?).

A Task contains a list of Transforms, which will typically run in sequence, and a 'float' member, which is the number of Jobs the Task will have running at any time.

A Transform has most of the same attributes as a Job, and owns a list of Units (which are automatically generated). They can also own input bookkeeping queries (as opposed to a Job which just owns a list of files), and should be able to intelligently [re-]run new Jobs when the input data changes.

Each Unit of a Transform will correspond to a Job appearing in the `jobs' list (named something like "T2:1 U0" for unit 0 of transform 1 of task 2).

The following is an example of a Task containing two Transforms, the first of which creates a DST which is used as the input to the second.

t = LHCbTask(
 float = 100,
 name = '...'
)
trans1 = LHCbTransform(
 name = 'My first Transform',
 backend = Dirac(),
 files_per_unit = 1000,
 application = Moore(...),
 splitter = SplitByFiles(
  filesPerJob = 50,
  ignoremissing = True
 ),
 outputfiles = [
  DiracFile('triggered.dst'),
 ]
 )

t.appendTransform(trans1)

datalink = TaskChainInput(
 include_file_mask = [ '\.dst$' ],
 input_trf_id = trans1.getID()
 )

trans2 = LHCbTransform(
 name = 'My second Transform',
 backend = Dirac(),
 files_per_unit = 100,
 application = DaVinci(...),
 splitter = SplitByFiles(
  filesPerJob = 1,
  ignoremissing = False
 ),
 outputfiles = [
  LocalFile('*.root')
 ]
 )
trans2.addInputData(datalink)
t.appendTransform(trans2)

bkk = BKQuery(path = '...')
# trans1.addQuery(bkk)  <-- This functionality is broken in Ganga, currently (11th April 2018).  Instead do:
trans1.addInputData( bkk.getDataset() )

Just as you type `jobs' to see your Jobs, `tasks' shows your Tasks. To start the Task created above you would type `t.run()', sit back and wait. `tasks.table()' and `t.overview()' will give more information about the status of your Task.

If the input query contains 2000 files then 2 Units will be created (files_per_unit=1000). Once a Unit has completed for transform 0, a corresponding Unit will be created for transform 1.

You can stop a Task using `t.pause()'.

A list of gotchas etc will take shape at: GangaTasksFAQ
