AutoPyFactory
Introduction
ATLAS, one of the experiments at the LHC at CERN, is one of the largest
users of grid computing infrastructure. As this infrastructure is now
a central part of the experiment's computing operations, considerable
effort has been made to use this technology as efficiently and
effectively as possible, including extensive use of pilot-job-based
frameworks.
In this model the experiment submits 'pilot' jobs to sites without a
payload. When these jobs begin to run, they contact a central service
to pick up a real payload to execute.
The first generation of pilot factories were usually specific to a
single VO, and were tightly bound to that VO's particular architecture.
A second generation of factories is now emerging: more flexible, not
tied to any particular VO, and providing features beyond pilot
submission (such as monitoring, logging, profiling, etc.)
AutoPyFactory has a modular design and is highly configurable. It is
able to send different types of pilots to sites, to exploit
different submission mechanisms, and to adapt to the different
characteristics of queues at sites. It has excellent integration with
the PanDA job submission framework, tying pilot flows closely to the
amount of work the site has to run. It is able to gather information
from many sources in order to configure itself correctly for a site,
and its decision logic can easily be updated.
Integrated into AutoPyFactory is a very flexible system for delivering
both generic and specific wrappers which can perform many useful
actions before starting the end-user scientific application, e.g.,
validation of the middleware, node profiling and diagnostics,
monitoring, and selecting the end-user application that best fits
the resource.
AutoPyFactory now also has a robust monitoring system, and we show how
this has helped set up a reliable pilot factory service for ATLAS.
AutoPyFactory
Design
Status Plugins
PandaWMS
Condor
Sched Plugins
NullPlugin
TrivialPlugin
SimplePlugin
SimpleNqPlugin
Submission Plugins
CondorGT2
CondorLocal
Deployment
Deployment using RPM
Installation as root via RPMs is now quite simple. These instructions assume Red Hat
Enterprise Linux 5.X (and derivatives) and the system Python 2.4.3. Other distributions and higher
Python versions should work with some extra effort.
1) Install and enable a supported batch system. Condor is the currently supported default.
Software is available from
http://www.cs.wisc.edu/condor/. Condor/Condor-G setup and
configuration are beyond the scope of this documentation. Ensure that Condor is working
properly before proceeding.
2) Install a grid client and set up the grid certificate+key under the user APF will run as.
Please read the CONFIGURATION documentation regarding the proxy.conf file, so you can see what
will be needed. Make sure the voms-proxy-* commands work properly.
3) Add the racf-grid YUM repo to your system
rpm -ivh
http://dev.racf.bnl.gov/yum/grid/production/rhel/5Client/x86_64/racf-grid-release-0.9-1.noarch.rpm
The warning about NOKEY is expected. This release RPM sets up YUM to point at our
repository, and installs the GPG key with which all our RPMs are signed. By default
the racf-grid-release RPM sets our production repository to enabled (see
/etc/yum.repos.d/racf-grid-production.repo ). If you are testing APF and want to run
a pre-release version, enable the racf-grid-development or racf-grid-testing repository.
4) If you will be performing
local batch system submission (as opposed to remote submission
via grid interfaces), you must confirm that the account you will be submitting as exists on
the batch cluster.
5) Install the APF RPM:
yum install autopyfactory
This performs several setup steps that otherwise would need to be done manually:
-- Creates 'apf' user that APF will run under.
-- Enables the factory init script via chkconfig.
-- Pulls in the panda userinterface Python library RPM from our repository.
-- Pulls in the python-simplejson RPM from the standard repository.
6) Configure APF queues/job submission as desired. Read the CONFIGURATION documentation in
order to do this. Be sure to configure at least one queue in order to test functionality.
7) Start APF:
/etc/init.d/factory start
8) Confirm that everything is OK:
-- Check to see if APF is running:
/etc/init.d/factory status
-- Look at the output of ps to see that APF is running under the expected user, e.g.:
ps aux | grep factory | grep -v grep
This should show who it is running as, and the arguments in
/etc/sysconfig/factory.sysconfig:
apf 22106 1.3 0.1 318064 12580 pts/2 Sl 17:13 0:00 /usr/bin/python /usr/bin/factory.py --conf /etc/apf/factory.conf --debug --sleep=60 --runas=apf --log=/var/log/apf/apf.log
-- Tail the log output and look for problems.
tail -f /var/log/apf/apf.log
-- Check that jobs are being submitted by whatever account APF is using by
executing condor_q manually:
condor_q | grep apf
Deployment in a user's home directory
User installation assumes that APF will be installed in the user's home directory using the
standard Python distutils setup commands. It assumes that the prerequisites have already
been installed and properly configured, either within the user's home directory or on
the general system.
Prerequisites:
-- Python
-- Condor (Condor-G)
-- Panda Client library
-- simplejson
Monitoring
Are APF logs available through the panda monitor? Yes, they are! APF sets the GTAG environment variable, which is passed by the pilot through to PanDA and shows up in the monitor. Look at almost any job running at a European site, e.g.,
http://panda.cern.ch:80/server/pandamon/query?job=1209939301
Under "pilotID" you have the link to the stdout logfile. Something which could be improved is to also automatically link to the stderr file (s/out/err) and to the condor log file (s/out/log).
This should be wrapper independent (it is passed in as part of the condor-set environment). If you are not getting this, something needs to be checked.
On the factory side there are two important variables which control this:
baseLogDir = /disk/panda/factory/auto/logs
baseLogDirUrl =
http://svr017.gla.scotgrid.ac.uk/factory/auto/logs
baseLogDir is the local physical disk path; baseLogDirUrl is the http URL prefix.
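The path-to-URL mapping can be sketched as follows; the two values are the examples above, but the helper function is hypothetical and not part of APF:

```python
# Sketch of how a local pilot log path maps to its public URL.
# BASE_LOG_DIR / BASE_LOG_DIR_URL mirror the example configuration above;
# log_url() itself is an illustrative helper, not APF code.
import posixpath

BASE_LOG_DIR = "/disk/panda/factory/auto/logs"
BASE_LOG_DIR_URL = "http://svr017.gla.scotgrid.ac.uk/factory/auto/logs"

def log_url(local_path):
    """Translate a path under baseLogDir into its baseLogDirUrl counterpart."""
    if not local_path.startswith(BASE_LOG_DIR):
        raise ValueError("path is not under baseLogDir: %s" % local_path)
    relative = local_path[len(BASE_LOG_DIR):].lstrip("/")
    return posixpath.join(BASE_LOG_DIR_URL, relative)

print(log_url("/disk/panda/factory/auto/logs/2011-02-25/pilot.out"))
# -> http://svr017.gla.scotgrid.ac.uk/factory/auto/logs/2011-02-25/pilot.out
```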
What's missing from the panda monitor: an overview of the pilots going to a site, so that one knows whether the site is broken or the factories serving it have died, etc.
Where is this information now? In Peter's monitor! See the last talk at the software week for some details (I gave the talk, but the content was all his).
So, should this be in the panda monitor or not? It should be cross-linked from the monitor, but the key point was to keep this in an independent database rather than add load to Oracle. It's monitoring, not accounting: some losses are ok, and the information is thrown away after a week.
Any factory can dispatch calls to the factory monitor, just by defining
monitorURL =
http://py-dev.lancs.ac.uk/mon/
in their configuration. Peter has been gradually ramping up the number of factories to test scaling, so he can report on how well that's going.
In the end this should move to CERN, but we had the (usual) problems obtaining and configuring a machine for it, so this hasn't progressed much.
This really is a shifter tool as well. It's used to help diagnose problems with site infrastructure and to submit tickets (especially when pilots don't start or abort before the payload can be executed).
PanDA wrappers refactoring
Motivation
The first piece of code that the PanDA system submits to sites via the various job submission mechanisms is called the "pilot wrapper". This is the first code executed on the worker node; it performs some environment checks and downloads from a valid URL the next piece of code needed to continue operations, called the "pilot" in PanDA nomenclature.
This pilot wrapper is not unique. There is a multiplicity of versions for this part of the system, depending for example on the final pilot type and the grid flavor.
This multiplicity forces the maintenance of several pieces of software even though they share a common purpose.
On the other hand, for practical reasons, these pilot wrappers are
implemented in BASH, with a consequent lack of flexibility and inherent difficulty in implementing complicated operations.
One practical case is the need to generate weighted random numbers in order to pick a specific development version of the ATLAS code only a given percentage of the time. Generating weighted random numbers is cumbersome in BASH.
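As an illustration of why Python makes this kind of logic easier, a minimal weighted random choice might look like the sketch below; the version names and weights are made up for the example:

```python
# Illustrative sketch: weighted random choice of a pilot code version,
# the kind of logic that is awkward in BASH but simple in Python.
# Version names and weights are examples, not real configuration.
import random

def pick_version(weighted_versions, rng=random):
    """Pick one version; weights are relative and need not sum to 100."""
    total = sum(weight for _, weight in weighted_versions)
    threshold = rng.uniform(0, total)
    cumulative = 0.0
    for version, weight in weighted_versions:
        cumulative += weight
        if threshold <= cumulative:
            return version
    return weighted_versions[-1][0]  # guard against float rounding

# e.g. run the release-candidate pilot only 5% of the time
versions = [("pilotcode", 95), ("pilotcode-rc", 5)]
print(pick_version(versions))
```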
Finally, a new pilot submission mechanism, called AutoPyFactory, has entered the scene. In its first version, this tool submitted a specific ad-hoc pilot wrapper, with a different set of input options and different formats. Moreover, this specific pilot wrapper is only valid for ATLAS on EGEE, and is unusable for other purposes or on OSG sites.
This discrepancy adds to the multiplicity of pilot wrapper versions,
and complicates the deployment of AutoPyFactory as a submission tool to replace the already existing AutoPilot.
A final reason is that these wrappers require some improvements. One example is the absence of proper validation of the number and format of the input options. Given that these improvements are important, it will always be easier to introduce and maintain them in a single piece of code than in several.
For these reasons it was agreed that a refactoring of the different pilot wrappers was needed. The proposal is to create a single pilot wrapper, implemented in BASH, performing the minimum amount of checking. This unique code should be valid for any kind of final application, grid flavor, submission tool, etc. In particular, it will allow the easy deployment of AutoPyFactory as the pilot submission tool.
After checking for the presence of the programs required to continue,
and setting up the corresponding grid environment if needed,
a second piece of code will be downloaded from a valid URL to continue
operations. This second code will be written in Python, which allows more complex operations to be implemented more easily, improving maintainability and scalability. This will require reimplementing in Python all the BASH code from the multiple pilot wrappers, except for the operations already handled by the new unified wrapper. Finally, in this second step, the final payload code to be run will be chosen, downloaded, and executed.
wrapper.sh
A generic panda wrapper with minimal functionality
input options:
- pandasite
- pandaqueue
- pandagrid
- pandaproject
- pandaserverurl
- pandawrappertarballurl
- pandaspecialcmd
- pandaplugin
- pandapilottype
- pandaloglevel
where
- pandasite is the panda site
- pandaqueue is the panda queue
- pandagrid is the grid flavor, i.e. OSG or EGEE (or gLite). The reason to include it as an input option, instead of letting the wrapper discover the current platform by itself, is to be able to distinguish between these two scenarios:
- (i) running on a local cluster
- (ii) running on the grid, but the setup file is missing.
(ii) is a failure and should be reported, whereas (i) is fine.
Another reason to include grid as an option in this very first wrapper
is that, for sites running condor as the local batch system,
the $PATH environment variable is set up only after sourcing the
OSG setup file. And only with $PATH properly set up
is it possible to run tools such as curl/wget
to download the rest of the files, or python to execute them.
- pandaproject will be the VO in almost all cases, but not necessarily when several groups share the same VO. An example is the OSG VO, shared by CHARMM, Daya, the OSG ITB testing group...
- pandaserverurl is the URL of the PanDA server instance
- pandawrappertarballurl is the base URL of the python tarball to be downloaded
- pandaspecialcmd is a special command to be executed, for some specific reason, just after sourcing the grid environment but before doing anything else. This was triggered by the need to execute the command
$ module load <module_name>
at NERSC after sourcing the OSG grid environment.
- pandaplugin is the plug-in module with the code corresponding to the final wrapper flavor.
- pandapilottype is the actual pilot code to be executed at the end.
- pandaloglevel can be debug or info (default).
Some input options have a default value:
- URL="http://www.usatlas.bnl.gov/~caballer/panda/wrapper-devel/"
- SERVERURL="https://pandaserver.cern.ch:25443/server/panda"
Note:
before the input options are parsed, they must be re-tokenized
so that whitespace within a value
(e.g. --specialcmd='module load osg')
creates no confusion and is not taken as a separator
between different input options.
The format in the condor submission file (or JDL) to express
multi-word values is:
arguments = "--in1=val1 ... --inN=valN --cmd=""module load osg"""
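The quoting problem can be illustrated in Python with the standard shlex module, which splits like a shell and keeps quoted values whole; the option strings below are illustrative only:

```python
# Illustration of the re-tokenization problem: naive whitespace
# splitting tears apart an option value containing spaces, while
# shlex.split() respects the quoting and keeps the value whole.
import shlex

raw = "--in1=val1 --cmd='module load osg' --loglevel=debug"

naive = raw.split()        # splits inside 'module load osg'
tokens = shlex.split(raw)  # quoting is respected

print(naive)
# -> ['--in1=val1', "--cmd='module", 'load', "osg'", '--loglevel=debug']
print(tokens)
# -> ['--in1=val1', '--cmd=module load osg', '--loglevel=debug']
```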
This first wrapper performs basic actions:
(1) check the environment, and the availability of basic programs
- curl
- python
- tar
- zip
(2) downloads a first tarball with python code
and passes all input options to this code.
With these options, the python code will download
a second tarball with the final pilot code.
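These two steps can be sketched in Python as follows. Only the default tarball URL comes from the text; the helper names and execution details are assumptions written for a current Python, not APF's actual code:

```python
# Sketch of the wrapper's final actions: fetch the python tarball,
# unpack it, and forward every original input option to wrapper.py.
# download()/build_command()/fetch_and_run() are illustrative names.
import os
import subprocess
import sys
import tarfile
import urllib.request

DEFAULT_TARBALL_URL = (
    "http://www.usatlas.bnl.gov/~caballer/panda/wrapper-devel/wrapper.tar.gz")

def download(url, workdir):
    """Fetch the tarball and unpack it into workdir."""
    local = os.path.join(workdir, "wrapper.tar.gz")
    urllib.request.urlretrieve(url, local)
    with tarfile.open(local) as tar:
        tar.extractall(workdir)
    return os.path.join(workdir, "wrapper.py")

def build_command(wrapper_py, options):
    """Forward every original input option, untouched, to wrapper.py."""
    return [sys.executable, wrapper_py] + list(options)

def fetch_and_run(workdir, options, url=DEFAULT_TARBALL_URL):
    wrapper_py = download(url, workdir)
    return subprocess.call(build_command(wrapper_py, options))
```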
plug-ins architecture
This is the suggested architecture:
AutoPyFactory ---> wrapper.sh ---> wrapper.py
A preliminary diagram (to be improved) is this
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxqY2FiYWxsZXJvaGVwfGd4OjdhM2IxN2EyNGMwNGMxZDU
wrapper.sh downloads a tarball (wrapper.tar.gz), untars it, and invokes
wrapper.py. The content of the tarball is something like this:
- wrapper.py
- wrapperutils.py
- lookuptable.conf
- plugins/base.py
- plugins/<pilottype1>.py
- plugins/<pilottypeN>.py
The different plug-ins correspond to the different wrapper flavors, so far
written in BASH (for example, trivialWrapper.sh, atlasProdPilotWrapper.sh,
atlasProdPilotWrapperCERN.sh, atlasProdPilotWrapperUS.sh, etc.).
All of these wrappers share a lot of common
functionality, with only small differences between them.
To take advantage of that, the different wrapper flavors will be implemented
as plug-ins.
All these plug-ins will be included in a directory inside the tarball.
The plug-ins will be classes inheriting from base.py, and so will implement
different variations of the same methods (things like download(), execute(),
etc.). That allows wrapper.py to invoke the same methods irrespective of
which plug-in is being used.
The common features will be implemented in the base class base.py,
and the differences will be implemented in the corresponding plug-in.
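A minimal sketch of this class hierarchy follows. Only the names base.py, download() and execute() come from the text; the subclass names and method bodies are illustrative assumptions:

```python
# Sketch of the plug-in scheme: a base class (plugins/base.py) fixing the
# shared call sequence, with flavor-specific subclasses overriding only
# what differs. Bodies are placeholders, not real pilot logic.
class BasePlugin(object):
    """Common wrapper behaviour; would live in plugins/base.py."""

    def download(self):
        # fetch the pilot tarball for this pilottype
        raise NotImplementedError

    def execute(self):
        # launch the downloaded pilot code
        raise NotImplementedError

    def run(self):
        # wrapper.py drives the same sequence whatever the plug-in is
        self.download()
        self.execute()

class TrivialPlugin(BasePlugin):
    """Illustrative flavor, e.g. for the ITB Robot sites."""
    def download(self):
        print("downloading trivialPilot tarball")
    def execute(self):
        print("running trivialPilot")

class AtlasProdPilotPlugin(BasePlugin):
    """Illustrative flavor, e.g. for the ATLAS production sites."""
    def download(self):
        print("downloading pilotcode tarball")
    def execute(self):
        print("running pilotcode")

# wrapper.py invokes whichever plug-in was chosen via the same interface
for plugin in (TrivialPlugin(), AtlasProdPilotPlugin()):
    plugin.run()
```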
How to decide the right plugin
The current mechanism to choose the right plugin is implemented by inspecting a lookup table like this one:
# ------------------------------------------------------------------------------------------------------------------------------------------
# SITE QUEUE GRID PROJECT PLUGIN PILOTTYPE
# ------------------------------------------------------------------------------------------------------------------------------------------
# --- ATLAS T1 sites ---
BNL_CVMFS_1 BNL_CVMFS_1-condor OSG * atlasprodpilot pilotcode,pilotcode-rc
# --- ATLAS T3 sites ---
ANALY_DUKE ANALY_DUKE OSG * atlasprodpilot pilotcode,pilotcode-rc
ANALY_DUKE3 ANALY_DUKE3 OSG * atlasprodpilot pilotcode,pilotcode-rc
BNL_T3 BNL_T3-condor OSG * atlasprodpilot pilotcode,pilotcode-rc
# --- ITB Robot sites ---
UC_ITB UC_ITB-pbs OSG * trivial trivialPilot
LBNL_DSD_ITB LBNL_DSD_ITB-condor OSG * trivial trivialPilot
BNL_ITB_Test1 BNL_ITB_Test1-condor OSG * trivial trivialPilot
OUHEP_ITB OUHEP_ITB-condor OSG * trivial trivialPilot
TTU_TESTWULF TTU_TESTWULF_ITB OSG * trivial trivialPilot
Firefly_SBGRID Firefly_SBGRID-pbs OSG * trivial trivialPilot
Harvard-East_SBGRID Harvard-East_SBGRID-condor OSG * trivial trivialPilot
where:
- SITE is the PanDA site
- QUEUE is the PanDA queue
- GRID is the grid flavor, which determines whether a setup file
has to be sourced.
Why not just let the wrapper find out whether there is a setup file
on the system? Because then we could not distinguish between these two
scenarios:
- no grid to be used
- some grid flavor expected to be used, but the setup file is not there
- PROJECT is a VO subtype workflow: test, production, analysis, ...
- PLUGIN is the plugin wrapper needed for the VO
- PILOTTYPE is the final tarball to be downloaded and executed by the plugin
* means that any value is accepted.
Based on the combination of site/queue/grid/project, the values of plug-in and
pilottype are chosen.
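The matching step can be sketched as follows; the rows are taken from the example table above, while the lookup function itself is an assumed illustration, not the actual implementation:

```python
# Sketch of the lookup-table match: find the first row whose
# SITE/QUEUE/GRID/PROJECT fields all match, treating '*' as a
# wildcard, and return the PLUGIN and PILOTTYPE columns.
TABLE = [
    # (site, queue, grid, project, plugin, pilottype)
    ("BNL_CVMFS_1", "BNL_CVMFS_1-condor", "OSG", "*",
     "atlasprodpilot", "pilotcode,pilotcode-rc"),
    ("UC_ITB", "UC_ITB-pbs", "OSG", "*",
     "trivial", "trivialPilot"),
]

def lookup(site, queue, grid, project):
    """Return (plugin, pilottype) for the first matching row, else None."""
    for row in TABLE:
        keys, values = row[:4], (site, queue, grid, project)
        if all(key == "*" or key == value for key, value in zip(keys, values)):
            return row[4], row[5]
    return None

print(lookup("UC_ITB", "UC_ITB-pbs", "OSG", "production"))
# -> ('trivial', 'trivialPilot')
```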
This mechanism may be replaced by a different one in the future.
Talks and publications
Major updates:
--
JoseCaballero - 25-Feb-2011
Responsible: JoseCaballero
Never reviewed