PanDAnewController

newController Basics and Use

Basic Principles

The newController code stores all configuration in a set of configuration files. The files are all stored in their own SVN repository at CERN. Each config file represents one queue or JDL specification. Changes in the SVN version of a config file will be reflected in the panda_meta database (for site and queue configuration) upon the next run of the update code.

Code Modification and Checkout

The newController code is presently stored in a separate repository for initial development. The 1.0 version will be merged into the standard Panda repository (right now at BNL, but moving to CERN in the near future).

newController Dev repo: svn co svn+ssh://svn.cern.ch/reps/newcontroller

BNL repo (outgoing): svn co https://svn.usatlas.bnl.gov/svn/panda/autopilot/trunk autopilot

CERN repo (upcoming): svn co svn+ssh://svn.cern.ch/reps/panda/autopilot/trunk autopilot

It is not OK to add specific hardwired queue or site configuration changes to this codebase. If you face a challenging configuration problem, please find an algorithmic way of getting around it. Temporary overrides are ALWAYS an option for emergency changes that can't wait. See below for how to do so.

Please use tabs rather than spaces for indentation in the Python code. This is the present standard. If anyone knows how to get VI to follow this convention, please modify this instruction to include that technique.
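A quick way to check a file against the tab convention before checking it in is sketched below. This is only an illustration, not part of the newController codebase: the function name space_indented_lines is made up for this example, and the check is deliberately crude (it also flags space-indented continuation lines inside multi-line strings).

```python
import re

# Matches a line whose leading whitespace contains at least one space,
# either on its own or mixed in after some tabs.
_SPACE_INDENT = re.compile(r"^\t* +")


def space_indented_lines(source):
    """Return the 1-based numbers of non-blank lines indented with spaces.

    Hypothetical helper for illustration only.
    """
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line.strip() and _SPACE_INDENT.match(line)
    ]
```

An empty list means every indented line in the text starts with tabs only.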

If you want rights to the CERN repositories, please contact Torre or Alden.

Configuration Modification

Introduction and Checkout

schedconfig contains information about PanDA queue configuration for the ATLAS PanDA grid production system. Examples of queue configurations can be found here, and an example queue configuration can be seen here. Please refer to this queue definition to understand the fields mentioned, like jdltext and cloud.

The config files are stored in an SVN repository called pandaconf. You can check out the repo like so:

svn co svn+ssh://svn.cern.ch/reps/pandaconf

If you need help setting up the SVN checkout, please see the CERN SVN HowTo.

Under this repository we find three directories:

Backup, JDLConfigs, and SchedConfigs.

Backup contains raw backup pickle files that can be restored by hand using the newController.py code, as shown below. This option is for admins and experts only. You probably won't need to ever look at it.

JDLConfigs contains configuration files that define the jdltext field for each queue. These JDL specs can be specific to a single queue, or span a number of similar queues. To see which JDL is being used in any case, please see the jdl field in the queue specification.

SchedConfigs contains (for the moment) 14 subdirectories, corresponding to the existing clouds. Each of these clouds contains a series of sites, and each site contains a number of Python files. Each file contains the specifications for one queue, and is named with the queue's name (with possible control character substitutions). Changes made to a file will be reflected in the database after they have been checked in and harvested by the update code.
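To make the cloud / site / queue layout concrete, here is a minimal sketch that walks a checked-out SchedConfigs directory and lists the queue files it finds. It assumes the layout is exactly SchedConfigs/<cloud>/<site>/<queue>.py, as described above; the function name list_queue_files is hypothetical, and the sketch does not try to undo any control character substitutions in the file names. It skips the per-site All.py file (described further down), since that file holds shared settings rather than a queue.

```python
from pathlib import Path


def list_queue_files(schedconfigs_dir):
    """Yield (cloud, site, queue_file_stem) for each queue file found.

    Hypothetical helper: assumes SchedConfigs/<cloud>/<site>/<queue>.py.
    """
    root = Path(schedconfigs_dir)
    for path in sorted(root.glob("*/*/*.py")):
        # All.py carries site-wide settings, not a queue definition.
        if path.name == "All.py":
            continue
        site_dir = path.parent
        yield site_dir.parent.name, site_dir.name, path.stem
```

For example, list(list_queue_files("pandaconf/SchedConfigs")) would give one tuple per queue file in a local checkout, assuming the checkout lives in a directory called pandaconf.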

"Deactivated" Cloud

There is also a cloud called Deactivated that contains queues for which cloud is not set (and which are therefore inactive). I will be going through these at some future point to set their configuration files inactive, at which point they will not appear in the monitor until (and unless) reactivated.

There is also a site (when necessary) called "?". This contains queues defined in the BDII for which a site has not been specified, to deactivate them.

All.py Files

For all other site directories that contain more than one queue, there exists a file called All.py. This file contains any commonalities that exist between queue definitions -- things like sitename, siteid, cmtconfig, cloud, etc. Changes made to the All.py file will be reflected across the whole site. Overrides in the All.py file will override any overrides in the individual queue definition files.

Configuration Files

The rest of the configuration files are fairly self-explanatory. They behave like Python dictionaries, with comments above each of the relevant sections. When the queues are updated, each file is imported and its three values read out: Enabled, Parameters, and Override.
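As an illustration of "imported and its three values read out", here is a minimal sketch. It is not the actual update code: it uses the standard-library runpy.run_path as a stand-in for whatever import mechanism newController uses, the function name load_queue_config is made up, and the sample file content (field names and values alike) is a placeholder rather than a real queue definition.

```python
import runpy

# Hypothetical queue file content; all values are placeholders.
SAMPLE_CONFIG = """\
# Set to False to remove the queue from schedconfig without losing its config
Enabled = True

# Regular parameter values
Parameters = {
    'cloud': 'US',
    'nickname': 'EXAMPLE_QUEUE',
    'memory': 2000,
    'cmtconfig': None,
}

# Temporary overrides, copied from Parameters when needed
Override = {
}
"""


def load_queue_config(path):
    """Execute one config file and return its three top-level values."""
    namespace = runpy.run_path(path)
    return {
        "Enabled": namespace["Enabled"],
        "Parameters": namespace["Parameters"],
        "Override": namespace["Override"],
    }
```

Because the file is executed as ordinary Python, True, None and bare numbers come through as real Python values.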

Enabled allows one to remove or restore the queue in the schedconfig table, while keeping its configuration intact. It has two allowed values: True and False. Please don't use quotation marks around the value.

Parameters is where the parameter changes are stored, and where most of your changes will take place. A parameter may be commented with a source tag -- for example, if you see the tag

# Defined in All.py: UTArlington site

you know that the value is being set by the All.py file, and that any changes you make will be overridden and lost -- you need to make your changes at a higher level (by removing that value from All.py, or adding an override for temporary changes).

If you are emptying a field, please use the entry None rather than empty quotes. If you are putting in an int or float value, please do so without quotation marks.

Override: If you want a field to be temporarily overridden without losing the previous value, please copy the field (exactly) from the Parameters area and paste it within the braces of the Override area. This will override the value for the queue, and will supersede any parameter set in the All.py file. Overrides also replace any values received from the BDII or ToA. Overrides in the All.py file override anything else.
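The precedence rules above can be sketched as a simple layered merge. This reflects my reading of the text, not the actual newController logic: queue Parameters are the weakest, then All.py Parameters, then the queue's Override, then the All.py Override. Values from the BDII or ToA are left out, because the text only says that overrides replace them. The function name effective_parameters and the field values are made up for the example.

```python
def effective_parameters(queue_params, queue_override, all_params, all_override):
    """Merge the four dictionaries, weakest first, strongest last.

    Hypothetical helper illustrating the precedence described above.
    """
    merged = {}
    for layer in (queue_params, all_params, queue_override, all_override):
        # Later layers overwrite keys set by earlier ones.
        merged.update(layer)
    return merged


example = effective_parameters(
    queue_params={"cloud": "US", "memory": 1000, "jdl": "a"},
    queue_override={"memory": 2000, "jdl": "b"},
    all_params={"cloud": "DE"},
    all_override={"jdl": "c"},
)
```

Here cloud comes from the All.py Parameters, memory from the queue's Override, and jdl from the All.py Override.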

Once you have made modifications, save them and check them back in to the repository. If you need repository access, please contact Alden or Torre. If you want to just add a file (or a few files), send the text to schedconfig@gmailNOSPAMPLEASE.com, and it will be included ASAP. Generally such changes are very quick.

New Queue Insertion

New queue insertion is simple. Take the config file of another queue within your site to act as a template, and rename it as <queuename>.py. Make the necessary modifications and adaptations (including the queue name and nickname). When done, you can add it to the SVN (svn add *) and check it in, or send the file to schedconfig@gmailNOSPAMPLEASE.com for inclusion. Please be sure to specify the site and cloud in which you want the queue.

Expert Procedures

(Experts) Running newController on the INTR database

On voatlas19 (as atlpan), do the following:

cd ~/newDev
source setupAlden.sh
cd newController
svn up
python newController.py

If you set it up to use the prod DB, you may cause some hassles.

(Experts) DB Backfilling

You can temporarily change the file controllerSettings.py to allow changes made in the DB to be forced into the config files. Modify the line:

dbOverride = False

to read

dbOverride = True

and run pilotController.py. Please do not check this change into the repository. If I have time, I'll add a precommit hook that explicitly excludes that possibility.

(Experts) Reverting to pilotController

For the moment, pilotController.py is running in parallel to newController.py. I have disabled the schedconfig updating by commenting out the replaceDB() call that updates schedconfig and jdllist. These can be rapidly uncommented, and the newController.py cron commented out to disable it for full emergency reversion to pilotController.

(Experts) Emergency DB Recovery

The Backup directory contains a number of pickle files. For example, 2010_4_20_11_47_31_schedConfigBackup.pickle is the schedconfig backup file from April 20th, 2010 at 11:47:31 AM GMT. To restore this file, log in to voatlas19 and go to the newController (or autopilot, after the migration) codebase. Run as follows:

python

from backupHandling import *

backupRestore('2010_4_20_11_47_31_schedConfigBackup.pickle')

If at all possible, please contact Alden for this operation until it is well-tested. This is a last resort.
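When picking a backup to restore, note that the timestamp in the file name is not zero-padded, so sorting the names alphabetically will not sort them by date. Below is a minimal sketch for reading the timestamp back out of a name. It assumes every schedconfig backup follows the same pattern as the single example above; the helper names backup_timestamp and latest_backup are made up and are not part of backupHandling.

```python
from datetime import datetime

# Assumed suffix, taken from the one example file name above.
SUFFIX = "_schedConfigBackup.pickle"


def backup_timestamp(filename):
    """Parse 'Y_M_D_h_m_s_schedConfigBackup.pickle' into a datetime."""
    if not filename.endswith(SUFFIX):
        raise ValueError("not a schedconfig backup name: %r" % filename)
    stamp = filename[: -len(SUFFIX)]
    year, month, day, hour, minute, second = (int(p) for p in stamp.split("_"))
    return datetime(year, month, day, hour, minute, second)


def latest_backup(filenames):
    """Return the name with the most recent timestamp."""
    return max(filenames, key=backup_timestamp)
```

The name returned by latest_backup could then be passed to backupRestore as shown above.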

Timeline and Steps to Complete Takeover

Once there is agreement on newController.py, the schedconfig, installedsw and jdllist tables will be changed over fully to its care. I hope to have that agreement by the end of April. I will provide contact numbers for emergency support during the transition, so that revisions can be made rapidly.

Once we are convinced that the initial transition is stable, I will take over all the other tables that pilotController touches one by one -- these are less critical, small, and the transition should be transparent.

-- AldenStradling - 21-Apr-2010

Topic revision: r1 - 2010-04-21 - AldenStradling
 