-- FlorianFeldhaus - 04 Aug 2009

DIRAC Site

[Figure: overview.jpeg]

Introduction 

The DIRAC site enables institutes and universities with little manpower to make their resources available for processing LHCb grid jobs. A DIRAC site is not a full-fledged grid site. It is only intended to lower the burden of committing resources to LHCb and to make it possible to submit and monitor LHCb jobs on local resources with the standard grid tools.

At the moment, only the Torque batch system is supported. This will change in the near future.

Prerequisites for a DIRAC site

To create a DIRAC site, the following prerequisites have to be fulfilled:

  • A head node with a valid host certificate signed by an accepted CA (see List of CAs accepted for WLCG)
  • A Torque Batch System with some execution nodes (worker nodes)
  • The head node needs to be able to submit jobs to the batch system and monitor running and queued jobs
To submit test jobs and monitor your site, you need to be a registered LHCb grid user.
It is highly recommended to have an HTTP proxy server that can cache larger files (10 MB to 1 GB) and has enough free storage (more than 20 GB).
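The prerequisites above can be checked on the head node with a small script. The following is a minimal sketch and not part of DIRAC: the function names are made up for this example, and the default certificate path /etc/grid-security/hostcert.pem is only an assumption, so pass the real path as an argument. It checks that qsub and qstat are in the PATH and uses openssl x509 -checkend to make sure the host certificate does not expire within the next 30 days.

```shell
#!/bin/bash
# Pre-flight check for a DIRAC site head node: a minimal sketch, not part of
# DIRAC. Function names are hypothetical; the default certificate path is an
# assumption and should be replaced by the real location of your certificate.

# have_command NAME: succeed if NAME can be found in PATH
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# cert_valid_days FILE DAYS: succeed if the PEM certificate in FILE is
# readable and will still be valid DAYS days from now
cert_valid_days() {
    local cert="$1" days="$2"
    [ -r "$cert" ] || return 1
    # -checkend takes a number of seconds and fails if the certificate
    # expires within that time
    openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null
}

# check_head_node [CERT]: report on the Torque client tools and the host
# certificate; returns non-zero if any check failed
check_head_node() {
    local cert="${1:-/etc/grid-security/hostcert.pem}"
    local status=0 cmd
    for cmd in qsub qstat; do
        if have_command "$cmd"; then
            echo "OK:   $cmd found at $(command -v "$cmd")"
        else
            echo "FAIL: $cmd not found in PATH"
            status=1
        fi
    done
    if cert_valid_days "$cert" 30; then
        echo "OK:   $cert is valid for at least 30 more days"
    else
        echo "FAIL: $cert is missing, unreadable or expires within 30 days"
        status=1
    fi
    return "$status"
}
```

Source the file and run e.g. check_head_node /PATH/TO/YOUR/CERTIFICATES/hostcert.pem. It prints one OK or FAIL line per check.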

Installation of a DIRAC site

For the first part of the installation you need root permissions. It is possible to install a DIRAC site without root permissions, but this requires some workarounds. This manual assumes bash as the shell; some commands need to be adapted for other shells.
  1. Register your site and your host certificate with one of the DIRAC admins (e.g. Stuart Paterson or Ricardo Graciani Diaz).
  2. Install the OS (e.g. Scientific Linux 4.x) and configure Torque: the head node must be able to submit jobs using qsub and to monitor them using qstat.
  3. Create a user account dirac and make sure its home directory /home/dirac exists.
  4. Check that umask is either 0022 or 0002
    umask
  5. Create the install directory and some required subdirectories

    mkdir /opt/dirac
    chmod 755 /opt/dirac
    mkdir /opt/dirac/sbin

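  6. Set up runsvdir so that it is started when the machine boots and supervises the services linked in /opt/dirac/startup. Create the file /opt/dirac/sbin/runsvdir-start with the following content:

    #!/bin/bash
    # Load the DIRAC environment (created by the install script in step 16)
    source /opt/dirac/bashrc
    # Take down all services that are still supervised from a previous run
    RUNSVCTRL=$(which runsvctrl)
    chpst -u dirac "$RUNSVCTRL" d /opt/dirac/startup/*
    killall runsv svlogd
    # Start runsvdir as user dirac, supervising every service in /opt/dirac/startup
    RUNSVDIR=$(which runsvdir)
    exec chpst -u dirac "$RUNSVDIR" -P /opt/dirac/startup 'log:  DIRAC runsv'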
  7. Make the script executable

    chmod +x /opt/dirac/sbin/runsvdir-start
  8. To make sure the services are restarted when the machine reboots, add the following lines to /etc/inittab:
    echo -e '\n#Run DIRAC services and agents\nSV:123456:respawn:/opt/dirac/sbin/runsvdir-start' >> /etc/inittab
  9. Create the DIRAC3 link

    ln -s /opt/dirac/pro /opt/DIRAC3
    chown -h dirac /opt/DIRAC3
  10. Copy the host certificate and key

    mkdir -p /opt/dirac/etc/grid-security
    cp -r /PATH/TO/YOUR/CERTIFICATES/host* /opt/dirac/etc/grid-security
  11. Set the permissions of the host certificate and key

    chmod 400 /opt/dirac/etc/grid-security/hostkey.pem
    chmod 644 /opt/dirac/etc/grid-security/hostcert.pem
  12. Make sure that everything is owned by dirac

    chown -R dirac /opt/dirac
  13. Become the dirac user

    su - dirac
  14. Get the DIRAC site install scripts

    cd /home/dirac
    wget http://lhcbproject.web.cern.ch/lhcbproject/dist/DIRAC3/DIRAC-scripts-HEAD.tar.gz
    tar xzf DIRAC-scripts-HEAD.tar.gz
    rm DIRAC-scripts-HEAD.tar.gz
  15. Display the options accepted by the install script

    /home/dirac/scripts/install_dirac_site.sh -h

    This gives you an overview of the options you can (or must) set:

    -n  --name       SiteName  Set Site Name (mandatory)
    -v  --version    Version   DIRAC Version to install (mandatory)
    -L  --LogLevel   LogLevel  LogLevel for installed Components
    -P  --path       Path      Site Installation PATH (default: /opt/dirac)
    -Q  --Queue      Queue     Batch System submit Queue (default: default)
    -E  --ExecQueue  Queue     Batch System executing Queue (default: same as Queue)
    -U  --User       UserName  User executing the script (default: dirac)
    -p  --platform   Platform  Use Platform instead of local one
    -h  --help                 Print this
    • SiteName
      This is the site name which has been appointed to your site by the DIRAC administrators
    • Version
      This is the DIRAC version you want to install on the head node. To get the latest development version, use HEAD.
    • LogLevel
      LogLevel can be one of the following (listed from least to most information):
      • ERROR
      • WARN
      • ALWAYS
      • INFO
      • VERBOSE
      • DEBUG
      It is recommended to set the LogLevel to INFO.
    • Path
      This is the base path for the installation. If you used a directory other than /opt/dirac in the steps above, you must specify it here.
    • Queue
      Default queue to use for submission. If you want to use multiple queues, have a look at Advanced configuration.
    • ExecQueue
      If jobs are executed in a different queue than the submission queue, this queue must be configured here. It is then used to retrieve the running and queued jobs.
    • UserName
      The user that runs the DIRAC site. This user must be able to submit jobs (e.g. with qsub for Torque) and to monitor them (e.g. with qstat for Torque).
    • Platform
      If a platform other than the one discovered by platform.py should be used, it can be configured here. 
  16. Execute the install script with at least the following options:
    /home/dirac/scripts/install_dirac_site.sh -n $YOUR_SITE_NAME -v $DIRAC_VERSION -Q $QUEUE
  17. Configure the CEs you specified when running install_dirac_site.sh. The available options are shown by:

    source /opt/dirac/bashrc
    dirac-config-ce -h
  18. Test the TaskQueueDirector agent by starting it manually:

    /opt/dirac/runit/WorkloadManagement/TaskQueueDirector/run
  19. If the agent works, stop it again (e.g. with Ctrl-C) and create the corresponding link in /opt/dirac/startup:

    cd /opt/dirac/startup
    ln -s /opt/dirac/runit/WorkloadManagement/TaskQueueDirector/ WorkloadManagement_TaskQueueDirector
  20. Reboot the machine, or run the following as root:

    su -
    /opt/dirac/sbin/runsvdir-start
  21. It is highly recommended to rotate the logfile of the TaskQueueDirector with logrotate, as the agent writes several MB of log output per day. Create /etc/logrotate.d/dirac with, for example, the following configuration:

    /opt/dirac/runit/WorkloadManagement/TaskQueueDirector/log/current {
        daily
        rotate 30
        copytruncate
        compress
        notifempty
        missingok
    }

    This configuration tells logrotate to copy the logfile once a day, compress the copy and truncate the original logfile in place (copytruncate). Only the 30 most recent rotated logfiles are kept, i.e. about 30 days of logs (rotate 30). If the logfile is empty, it is not rotated (notifempty); if it does not exist, logrotate skips it without an error (missingok).
    To check that the configuration works, force a rotation and check that current.1.gz has been created:

    logrotate -f /etc/logrotate.d/dirac
    ls /opt/dirac/runit/WorkloadManagement/TaskQueueDirector/log/
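Some of the steps above are easy to get subtly wrong; for example, re-running step 8 appends a second entry to /etc/inittab. The sketch below bundles a few checks for steps 4, 8, 11 and 20. It is not part of DIRAC and all function names are hypothetical. It uses the GNU stat -c syntax, and the service check assumes a runit version that provides the sv command (older runit versions only ship runsvctrl and runsvstat).

```shell
#!/bin/bash
# Post-install checks for a DIRAC site head node: a minimal sketch, not part
# of DIRAC. All function names are hypothetical; paths follow this page.

# umask_ok: succeed if the current umask is 0022 or 0002 (step 4)
umask_ok() {
    case "$(umask)" in
        0022|0002|022|002) return 0 ;;
        *) return 1 ;;
    esac
}

# cert_perms_ok [DIR]: succeed if hostkey.pem has mode 400 and hostcert.pem
# has mode 644 (step 11); stat -c %a is GNU coreutils syntax
cert_perms_ok() {
    local dir="${1:-/opt/dirac/etc/grid-security}"
    [ "$(stat -c %a "$dir/hostkey.pem" 2>/dev/null)" = "400" ] &&
        [ "$(stat -c %a "$dir/hostcert.pem" 2>/dev/null)" = "644" ]
}

# add_inittab_entry [FILE]: idempotent version of step 8, only appends the
# runsvdir entry if exactly that line is not present yet
add_inittab_entry() {
    local file="${1:-/etc/inittab}"
    local entry='SV:123456:respawn:/opt/dirac/sbin/runsvdir-start'
    if grep -qxF "$entry" "$file" 2>/dev/null; then
        return 0
    fi
    printf '\n#Run DIRAC services and agents\n%s\n' "$entry" >> "$file"
}

# not_running: filter "sv status" output (stdin) down to the lines that do
# not start with "run: ", i.e. services that are down, failing or unsupervised
not_running() {
    grep -v '^run: ' || true
}

# check_services [DIR]: run as the dirac user after step 20 (assumes sv exists)
check_services() {
    local dir="${1:-/opt/dirac/startup}" problems
    problems=$(sv status "$dir"/* 2>&1 | not_running)
    if [ -n "$problems" ]; then
        echo "$problems"
        return 1
    fi
    echo "all services in $dir are running"
}
```

The root checks (cert_perms_ok, add_inittab_entry) are meant to be run as root before step 13, check_services as the dirac user once runsvdir has been started.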

Advanced configuration

Updating a DIRAC site

To update a DIRAC site follow these steps:

  1. Get the latest version of the DIRAC site install script
    cd /home/dirac
    cp scripts/install_dirac_site.sh ~/install_dirac_site.sh.old
    wget http://lhcbproject.web.cern.ch/lhcbproject/dist/DIRAC3/DIRAC-scripts-HEAD.tar.gz
    tar xzf DIRAC-scripts-HEAD.tar.gz
    rm DIRAC-scripts-HEAD.tar.gz
  2. Find out the configuration parameters your site was installed with (site name, queue, etc.)
  3. Execute the install script with at least the following options
    /home/dirac/scripts/install_dirac_site.sh -n $YOUR_SITE_NAME -v $DIRAC_VERSION
  4. Load the DIRAC environment into your shell
    source /opt/dirac/bashrc
  5. For each CE you configured, run
    dirac-config-ce
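The install script is run both in step 16 of the installation and in step 3 of the update. The wrapper below is a hypothetical convenience and not part of DIRAC: it refuses to run without the mandatory -n and -v values, checks the LogLevel against the levels listed in step 15 of the installation (defaulting to the recommended INFO), only passes -Q when a queue is given and, unless DRY_RUN=0 is set, only prints the command it would execute. The installer path is the one used on this page.

```shell
#!/bin/bash
# Hypothetical wrapper around install_dirac_site.sh: a minimal sketch that
# validates the options before building the command line. With DRY_RUN=1
# (the default) it only prints the command instead of running it.

INSTALLER=/home/dirac/scripts/install_dirac_site.sh
VALID_LOGLEVELS="ERROR WARN ALWAYS INFO VERBOSE DEBUG"

# valid_loglevel LEVEL: succeed if LEVEL is one of the documented levels
valid_loglevel() {
    local l
    for l in $VALID_LOGLEVELS; do
        [ "$1" = "$l" ] && return 0
    done
    return 1
}

# run_install SITE VERSION [QUEUE] [LOGLEVEL]
run_install() {
    local site="$1" version="$2" queue="$3" loglevel="${4:-INFO}"
    if [ -z "$site" ] || [ -z "$version" ]; then
        echo "usage: run_install SITE VERSION [QUEUE] [LOGLEVEL]" >&2
        return 1
    fi
    if ! valid_loglevel "$loglevel"; then
        echo "invalid LogLevel: $loglevel" >&2
        return 1
    fi
    local cmd=("$INSTALLER" -n "$site" -v "$version" -L "$loglevel")
    # without -Q the script's own default queue applies
    if [ -n "$queue" ]; then
        cmd+=(-Q "$queue")
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, run_install DIRAC.Example.de HEAD default prints the full command for a (hypothetical) site DIRAC.Example.de; DRY_RUN=0 run_install DIRAC.Example.de HEAD default actually executes it.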

Running a Test Job

Using lxplus

  1. Log on to lxplus
  2. Set up the LHCb environment
    LbLogin
  3. Set up DIRAC
    SetupDirac
  4. If not already done, change the DIRAC setup to LHCb-Development in your local ~/.dirac.cfg (this overwrites an existing ~/.dirac.cfg; remember to remove the setting again later!)
    printf 'DIRAC\n{\n  Setup = LHCb-Development\n}\n' > ~/.dirac.cfg
  5. Create a test job
    dirac-lhcb-run-test-job -p Gauss -v v37r2 -m wms -n TestJob -g
  6. Fix the Gauss options file by setting the DDDB and CondDB tags
    sed -i '40iLHCbApp().DDDBtag   = "MC09-20090602"\nLHCbApp().CondDBtag = "sim-20090402-vc-md100"' Gauss_v37r2_Wms/OptsGaussv37r2.py
  7. Modify the job so that it runs on a DIRAC site and only at your site (DIRAC.Dortmund.de is the example site name; replace it with your own):
    platform=DIRAC
    site=DIRAC.Dortmund.de
    sed -i "12ij.setPlatform('$platform')\nj.setDestination('$site')" Gauss_v37r2_Wms/DiracAPI_Gauss_v37r2_Wms.py
  8. Submit the job to DIRAC

    python Gauss_v37r2_Wms/DiracAPI_Gauss_v37r2_Wms.py
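Step 7 inserts the two lines before line 12 of the generated API script, so running it twice adds them twice. A minimal idempotent variant is sketched below. The function name is hypothetical, it relies on GNU sed (sed -i and \n in inserted text), and line 12 is simply the position used above, so check it against your generated DiracAPI_Gauss_v37r2_Wms.py.

```shell
#!/bin/bash
# Idempotent variant of step 7: a minimal sketch (hypothetical function).
# Requires GNU sed for in-place editing and for "\n" in inserted text.

# patch_job_script FILE PLATFORM SITE [LINE]: insert setPlatform and
# setDestination before LINE (default 12) unless the destination line exists
patch_job_script() {
    local file="$1" platform="$2" site="$3" line="${4:-12}"
    local dest="j.setDestination('${site}')"
    if grep -qF "$dest" "$file"; then
        echo "already patched: $file"
        return 0
    fi
    sed -i "${line}i j.setPlatform('${platform}')\n${dest}" "$file"
}
```

Usage: patch_job_script Gauss_v37r2_Wms/DiracAPI_Gauss_v37r2_Wms.py DIRAC DIRAC.Dortmund.de, with your own site name instead of DIRAC.Dortmund.de.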

Monitoring a DIRAC site

Detailed information on the activities of the DIRAC site can be found in the local logfile of the TaskQueueDirector on the head node:

tail -f /opt/dirac/runit/WorkloadManagement/TaskQueueDirector/log/current
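Besides following the log, it can be useful to get a quick count of messages per log level. The sketch below assumes that each message line contains its level as " LEVEL:" (for example "... WorkloadManagement/TaskQueueDirector  INFO: ..."). This format is an assumption, so compare it with your own logfile and adjust the pattern if needed. The function names are hypothetical.

```shell
#!/bin/bash
# Summarise the TaskQueueDirector logfile by log level: a minimal sketch.
# ASSUMPTION: the level appears in each message line as " LEVEL:".

DEFAULT_LOG=/opt/dirac/runit/WorkloadManagement/TaskQueueDirector/log/current

# count_levels [FILE]: print "<count> <LEVEL>" lines, most frequent first
count_levels() {
    local log="${1:-$DEFAULT_LOG}"
    grep -oE ' (ERROR|WARN|ALWAYS|INFO|VERBOSE|DEBUG):' "$log" |
        tr -d ' :' | sort | uniq -c | sort -rn
}

# recent_errors [FILE] [N]: print the last N (default 10) ERROR lines
recent_errors() {
    local log="${1:-$DEFAULT_LOG}" n="${2:-10}"
    grep -E ' ERROR:' "$log" | tail -n "$n"
}
```

Running count_levels without arguments reads the default logfile; recent_errors "" 20 shows the last 20 errors from it.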

For a global overview of the numbers of running, failed and done jobs, the DIRAC Site Monitor can be used (a valid user certificate registered in the LHCb VO is required) at:

Troubleshooting

If the following exception occurs:

exceptions.ImportError:liblapack.so.3: cannot open shared object file: No such file or directory

install the lapack package. On Scientific Linux this can be done with:

su -
yum install lapack
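Whether the library is visible to the dynamic linker can be checked beforehand with ldconfig -p, which prints the linker cache. This is a minimal sketch with hypothetical function names; it looks for the exact library name from the error message above and falls back to /sbin/ldconfig when ldconfig is not in the PATH of a normal user.

```shell
#!/bin/bash
# Check for liblapack.so.3 in the dynamic linker cache: a minimal sketch.

# lib_in_cache NAME: read "ldconfig -p" output on stdin and succeed if a
# library with exactly this name is listed (first field of each entry)
lib_in_cache() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

check_lapack() {
    local ldconfig
    ldconfig=$(command -v ldconfig || echo /sbin/ldconfig)
    if "$ldconfig" -p 2>/dev/null | lib_in_cache liblapack.so.3; then
        echo "liblapack.so.3 found"
    else
        echo "liblapack.so.3 missing, install the lapack package"
        return 1
    fi
}
```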
Topic revision: r14 - 2009-09-21 - FlorianFeldhaus