NikhefLocalSoftware
This is the LHCb Nikhef group page describing locally available software
Pre-requisites and outcomes
You are assumed to know what SLC6 is, how to use a Unix platform, and to have done the lessons from the LHCb starterkit.
This TWiki discusses how you can use the LHCb software at Nikhef most efficiently.
Introduction
LHCb software is open source: technically speaking, anybody in the world can obtain and use our software, but it can also be technically difficult to get it to work.
For that reason there are many tools and tutorials, and at every site some local expertise is on hand to help you out. Don't wander blindly through directory structures; if you get stuck, ask somebody in the group for help!
- Who manages our software? Who should I call if there's a problem?
GerhardRaven loves to help if there are any problems with the local software installation.
If necessary they will contact the helpdesk on your behalf, or the correct people in the Grid community directly.
- Where is LHCb software installed?
LHCb software is typically installed at:
- CERN build machines, maintained by core software
- CERNVM, maintained by core software
- GRID sites, maintained by LHCb Dirac and the local site administrators
- External institutes, maintained by individuals at those institutes
- Laptops/Desktops of users, maintained by those users
A complete stack of LHCb software requires many tens of GB of disk space per installation.
- Software installations at Nikhef
At Nikhef the LHCb software is available through:
- CVMFS, the local grid installation at /cvmfs/lhcb.cern.ch/ (this is the recommended source).
- The LHCb software environment
To setup the LHCb software environment using the local software at Nikhef:
- source /project/bfys/lhcb/sw/setup.sh , which uses the CVMFS-installed software
- You can, for example, add that line to your .bashrc if you always want the LHCb environment.
- Availability
- The CVMFS software installation is mounted on the StoomBoot machines and on most local SLC6 desktops.
- The AFS software installation is available on desktop machines, but not on StoomBoot.
- For more information on using these machines, see NikhefCPUResources
- Which software, exactly?
- On the NFS mount only a subset of the software is installed; installation there is done manually. Contact Gerhard or Roel to install anything you need.
- On the CVMFS and AFS mounts, all LHCb software that has not yet been archived is available. It is installed centrally by the LHCb Core Software team.
- OK, but how do I actually run it?
- Right, it is assumed you've done the software tutorials
- It's a good idea to run everything inside Ganga, since that is the simplest approach and you only need to remember one command.
- Local Ganga and GRID certificates
- It's not necessary for you to do anything special to pick up the local Ganga version, or to use Ganga at Nikhef.
- It is, however, required to make sure you have the right VOMS settings. This can be done by adding the following to your .bashrc:
export X509_CERT_DIR=/cvmfs/lhcb.cern.ch/etc/grid-security/certificates
export X509_VOMS_DIR=/cvmfs/lhcb.cern.ch/etc/grid-security/vomsdir
- Ganga is installed along with the rest of the LHCb software, and is available once you have sourced the environment (above).
- Ganga is run by simply calling ganga .
- EOS
- EOS is not available as a mount on the StoomBoot nodes, but you can access files in ROOT via the xrootd protocol.
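For example, from Python/ROOT a file on EOS can be opened directly over xrootd; this is a minimal sketch, and the file path and tree name below are only illustrations, not real data:
import ROOT
# open a (hypothetical) ntuple on LHCb EOS via the xrootd protocol
f = ROOT.TFile.Open("root://eoslhcb.cern.ch//eos/lhcb/user/s/someuser/MyTuple.root")
if f and not f.IsZombie():
    tree = f.Get("DecayTree")   # "DecayTree" is just an example tree name
    print(tree.GetEntries())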
-> Ganga and local resources
- Ganga will use the local software
- Ganga can run and manage applications and jobs for you on the local NikhefCPUResources
- Ganga can keep files for you in the many available places listed in NikhefDiskResources
- You don't need any special configuration to use Ganga with StoomBoot, the Grid, or anything else.
- However, some tweaks and specializations in the UI are useful depending on how you work.
- Once the environment is sourced, the alias GangaNikhef is available to set up and run Ganga in one command.
- For tweaking the UI, take a look at my .gangarc, for example ~rlambert/.gangarc
-> Ganga productivity tips
- As Ganga needs time to submit and finalise jobs, it is advisable to run it inside a screen (or tmux) or VNC session, to make sure it does not get interrupted if e.g. your ssh connection drops or you log out. On lxplus this requires a valid AFS and Kerberos token: with pagsh.krb -c 'kinit && screen' it is sufficient to run kinit once a day to renew them; k5reauth is recommended for longer-running programs.
- Output files that are too large to keep in your ganga directory but not suitable for grid storage (e.g. ntuples) can be sent to a "mass storage", e.g. EOS at CERN (the default settings) or /data/bfys when running Ganga at Nikhef, by declaring them to be a MassStorageFile, e.g.
j.outputfiles=[MassStorageFile(namePattern="*.root", outputfilenameformat="MyTuples/{jid}/{sjid}/{fname}")]
To use a normal filesystem, e.g. /data/bfys , set the following option in your .gangarc:
MassStorageFile = {'fileExtensions': [''], 'uploadOptions': {'path': '/data/bfys/username', 'cp_cmd': 'cp', 'ls_cmd': 'ls', 'mkdir_cmd': 'mkdir'}, 'backendPostprocess': {'LSF': 'WN', 'Dirac': 'client', 'PBS': 'WN', 'LCG': 'client', 'Interactive': 'WN', 'Localhost': 'WN', 'CREAM': 'client'}}
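As an illustration, a minimal Ganga job using this mechanism might look as follows; this is only a sketch, and the application version, options file and splitting settings are placeholders, not a recommendation:
# sketch of a hypothetical DaVinci ntuple job whose ROOT output goes to mass storage
j = Job(name="mytuples")
j.application = DaVinci(version="v36r5")         # placeholder version
j.application.optsfile = "MyDaVinciOptions.py"   # placeholder options file
j.backend = Dirac()
j.splitter = SplitByFiles(filesPerJob=20)        # placeholder splitting
j.outputfiles = [MassStorageFile(namePattern="*.root",
                                 outputfilenameformat="MyTuples/{jid}/{sjid}/{fname}")]
j.submit()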
- Submitting jobs with many subjobs can take a long time. This can be parallelised with the user queues: queues.add(j.submit) will add the submission task to a queue, which is processed by a pool of worker threads (the number can be changed in the .gangarc), so you can submit a number of jobs at the same time and meanwhile keep using the interactive shell. The queues command shows the status of the queues (by default 3 user queues and 3 monitoring queues; the latter handle job finalisation).
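A short sketch of how this can be used in a Ganga session; the selection of jobs is only illustrative:
# queue the submission of all freshly created jobs; the shell stays responsive
for j in jobs.select(status="new"):   # illustrative selection
    queues.add(j.submit)
queues   # inspect the user and monitoring queues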
- Ganga can resubmit failed subjobs automatically (for an acceptable fraction of failures and up to a few times). This can be activated by setting the do_auto_resubmit attribute of the job to True . You can also do this manually with the command jobs(i).subjobs.select(status='failed').resubmit() .
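For example, for a hypothetical job with id 42:
j = jobs(42)                # illustrative job id
j.do_auto_resubmit = True   # let Ganga retry a limited fraction of failed subjobs
# or resubmit the failed subjobs by hand:
jobs(42).subjobs.select(status="failed").resubmit()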
- The monitoring loop goes sequentially through the subjobs of each master job, so it may take a while before it picks up the status of all of them. For grid jobs, the job monitor page in the LHCb Dirac portal provides a more up-to-date overview of what your jobs are doing (you may need to log in with your grid certificate by clicking on "secure connection" in the settings section first). In case the monitoring loop gets stuck, you can try calling reactivate() or exit and restart Ganga. If a job completed successfully in Dirac but failed to finalise in Ganga (e.g. something went wrong when downloading the output), you can try sj.backend.reset() : Ganga will then revert the job status to "submitted", ask Dirac for the status and try to finalise it again.
- If you disagree with Ganga on what qualifies a job as successful or failed, you can add a checker to the list of postprocessors , e.g. for DaVinci ntuple jobs something like FileChecker(files=["stdout"], filesMustExist=True, checkMaster=False, checkSubjobs=True, failIfFound=False, searchStrings=["INFO No more events in event selection", "INFO NTuples saved successfully"])
- The job listing can be changed to show e.g. the number of submitted, running, completed and failed jobs - which is quite useful when running over large datasets - by redefining a method in the .ganga.py startup file. An example is attached to this page (first version by Manuel Schiller, modified by Pieter David).
- Managing your checked-out code
Use cases
- OK, so you're now an LHCb software developer and you've spotted a bug: where do you want to keep your code?
- Or you're working within an analysis team and need common fitting software: where do you want to keep that code?
By default:
- The environment variable $User_release_area (in Ganga the user_release_area attribute of the application) tells the LHCb software where to find your packages.
- If this variable is not set, it defaults to ~/cmtuser .
Think about using AFS for this:
- You may wish to run your code on both lxplus and Nikhef; this is easy if your cmtuser is a softlink to your AFS directory:
- ln -s /afs/cern.ch/user/<a>/<another>/cmtuser cmtuser
The 1 GB available on AFS will allow you to keep many packages and versions in your cmtuser area, but not entire release stacks.
- For that you need somewhere with tens of GB of space, such as /project/bfys/<uname> (i.e. not AFS).
- AFS, Kerberos, and lxplus
Kerberos:
- An authentication system using tokens stored on your local filesystem, similar to a grid proxy
- AFS uses the Kerberos authentication system.
- Kerberos can be used to ssh into CERN without needing to type your password every time
The relevant commands are:
- kinit : create a ticket that can be forwarded
- aklog : authenticate yourself to an AFS cell using your Kerberos tokens
- klist : list your Kerberos tokens
- kdestroy : delete your Kerberos tokens
- Full instructions on using Kerberos can be found here
- Kerberos can be used to ssh to CERN with the Kerberos token directly
e.g.:
- kinit <yourusername>@CERN.CH
- klist
- kdestroy
- klist
Connecting to CERN via ssh can be done with Kerberos.
Communicating with the svn repository can also be done with Kerberos.
Getting local access to AFS can be done with Kerberos:
- AFS is an authenticated file system, authenticated with Kerberos:
- kinit <yourusername>@CERN.CH
- aklog
- cd /afs/cern.ch/user/<u>/<username>
NikhefLocalSoftwareManagement
See NikhefLocalSoftwareManagement for management instructions on the local software.
--
RobLambert - 24-Oct-2011