
Revision 1 - 2008-12-10 - GreigCowan


Running jobs directly on the site WN

Motivation

Often there are problems in accessing the site's shared software area, due to NFS or AFS breaking, which means that the job cannot find the applications it needs and therefore fails. As a site admin or T1 LHCb contact, you may want to run an LHCb application directly on the site WNs in order to check that everything is working. Note that the NFS/AFS problems are often only seen when many tens or hundreds of jobs simultaneously try to access the same software area: the load on the NFS servers increases, leading to failures (e.g. stale NFS file handles). Following the steps below may therefore result in a successful job even though the site still has problems under load.

Additionally, it can happen that NFS is incorrectly configured on a single machine at a site, in which case only jobs landing on that machine will fail. This can be identified by a single node churning through many jobs, all of which fail.
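As a hedged illustration, this per-node failure pattern can be spotted by tallying hostnames across job logs. The log layout below (one `Host:` line and one `Status:` line per log) is an assumption invented for the sketch, not a real LHCb or DIRAC log format; adapt the grep patterns to whatever your site's pilot or job logs actually contain.

```shell
# Sketch only: fabricated log layout for illustration; real logs differ.
logdir=$(mktemp -d)
printf 'Host: wn042.example.org\nStatus: Failed\n' > "$logdir/job1.log"
printf 'Host: wn042.example.org\nStatus: Failed\n' > "$logdir/job2.log"
printf 'Host: wn007.example.org\nStatus: Done\n'   > "$logdir/job3.log"

# Hosts of failed jobs, most frequent first: a single node dominating
# this list is the "one bad machine" signature described above.
grep -l 'Status: Failed' "$logdir"/*.log \
  | xargs grep -h '^Host:' \
  | sort | uniq -c | sort -rn
```

For the fabricated sample above, the listing shows the failing node (wn042) with a count of 2 while the healthy node does not appear at all.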

Environment setup script

In order to get the same environment as a job running on the site, log onto the WN and source the script below. You will have to modify the paths at the start to point to your shared software area.

# General setup environment
export HEPSOFT=/path/to/the/shared/software/area
export LHCB_DIR=$HEPSOFT/lhcb-soft
export LHCBRELEASES=$LHCB_DIR/lhcb
export MYSITEROOT=$LHCB_DIR
export CMTCONFIG=slc4_ia32_gcc34
export LCG_release_area=$MYSITEROOT/lcg/external

export LD_LIBRARY_PATH=$MYSITEROOT/cern/usr/lib:$LD_LIBRARY_PATH

export EMACSDIR=$MYSITEROOT/lhcb/TOOLS/Tools/Emacs/pro

export CVS_RSH=ssh

# Only when starting an interactive session
if [[ $TERM != "dumb" || $ENVIRONMENT == "BATCH" ]]
   then
   source $MYSITEROOT/scripts/ExtCMT.sh
fi

if [ -f $HOME/.hepix/cern-user-name ]
    then
    export cernuser=`cat $HOME/.hepix/cern-user-name`
else
    export cernuser=$USER
fi

export GETPACK_USER=$cernuser

if [ -d $HOME/cmtuser ]
    then
    export User_release_area=$HOME/cmtuser
else
    echo "LHCb local-setup.sh: you don't have a cmtuser directory. I'll create one for you."
    mkdir $HOME/cmtuser
    export User_release_area=$HOME/cmtuser
fi

#
#Old-style alias definitions
#
unalias getpack > /dev/null 2>&1
alias getpack="export USER=$cernuser; $MYSITEROOT/scripts/getpack -f ssh"

unalias DaVinciEnv > /dev/null 2>&1
alias DaVinciEnv="source $MYSITEROOT/scripts/ProjectEnv.sh DaVinci"

unalias GangaEnv > /dev/null 2>&1
alias GangaEnv="source /exports/work/physics_ifp_ppe/ganga/install/etc/setup-lhcb.sh"


unalias SetupProject > /dev/null 2>&1
alias SetupProject="source $MYSITEROOT/scripts/SetupProject.sh"

unalias setenvDaVinci > /dev/null 2>&1
alias setenvDaVinci="source $MYSITEROOT/scripts/setenvProject.sh DaVinci"

unalias setenvGauss > /dev/null 2>&1
alias setenvGauss="source $MYSITEROOT/scripts/setenvProject.sh Gauss"

Running the job

Once the environment is correct, you can run the job by doing:

$ gaudirun.py LHCbApplicationOptionsFile.py

You can get LHCbApplicationOptionsFile.py from somewhere like /afs/cern.ch/lhcb/software/releases/DAVINCI/DAVINCI_v21r0/Phys/DaVinci/options/DaVinci.py (for a basic DaVinci job); look in the Gauss directory for the Gauss options files, and so on.
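A small wrapper (a sketch with an invented name, `run_lhcb_job`) can make the common failure mode explicit: if `gaudirun.py` is not on the PATH, the environment script was probably not sourced, or the software area is not mounted on this WN.

```shell
# Sketch: run an options file, but fail with a clear message when the
# application entry point is missing (the typical NFS/AFS symptom).
run_lhcb_job() {
    if ! command -v gaudirun.py > /dev/null 2>&1; then
        echo "gaudirun.py not found: source the environment script first and check the software area mount" >&2
        return 1
    fi
    gaudirun.py "$1"
}

# run_lhcb_job DaVinci.py
```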

If the application fails to execute in this environment, that is a good indication that Grid jobs running on the site will also have problems.

-- GreigCowan - 10 Dec 2008
