SchedConfigNewController

newController Basics and Use

Basic Principles

The newController code populates the schedconfig table with site configuration data from the AGIS database at CERN. Changes to a configuration in AGIS are reflected in the atlas_pandameta database (for site and queue configuration) at the next run of the update code, which anacron starts automatically every 20 minutes. To check on the changes, please use anacron from the atlpan account.

If you want to get an update in sooner (if, for example, you have made an urgent change to AGIS), you can use the following command:

ssh -l lxplus_username lxplus.cern.ch touch /afs/cern.ch/user/a/atlpan/Trigger/trigger.txt

You can also add this as an alias on any machine -- if you put the following line in your .bashrc file, you can simply type update and get your changes in.

alias update='ssh -l lxplus_username lxplus.cern.ch touch /afs/cern.ch/user/a/atlpan/Trigger/trigger.txt'

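To trigger an update of the installedsw table, touch swtrigger.txt instead of trigger.txt. Note that the alias below reuses the name update, so give one of the two aliases a different name if you want both in your .bashrc: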

alias update='ssh -l lxplus_username lxplus.cern.ch touch /afs/cern.ch/user/a/atlpan/Trigger/swtrigger.txt'
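Both triggers above amount to touching a file over ssh. A minimal Python sketch of the same commands follows. The helper names build_trigger_command and trigger_update are hypothetical and not part of newController, and the username argument stands in for your own lxplus account.

```python
import subprocess

# Directory and file names are taken from the ssh commands above.
TRIGGER_DIR = "/afs/cern.ch/user/a/atlpan/Trigger"
TRIGGER_FILES = {
    "schedconfig": "trigger.txt",    # requests a schedconfig update
    "installedsw": "swtrigger.txt",  # requests an installedsw update
}


def build_trigger_command(username, table="schedconfig"):
    """Return the ssh argument list that touches the trigger file for `table`."""
    path = "%s/%s" % (TRIGGER_DIR, TRIGGER_FILES[table])
    return ["ssh", "-l", username, "lxplus.cern.ch", "touch", path]


def trigger_update(username, table="schedconfig"):
    """Run the ssh command; raises CalledProcessError if ssh exits non-zero."""
    subprocess.check_call(build_trigger_command(username, table))
```

Calling trigger_update("lxplus_username", "installedsw") would run the swtrigger.txt variant. Keeping the command construction separate from its execution makes it possible to inspect the command without opening a connection.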

Code Modification and Checkout

newController Dev repo: svn co svn+ssh://svn.cern.ch/reps/newcontroller

CERN repo: svn co svn+ssh://svn.cern.ch/reps/panda/autopilot/trunk autopilot

It is not OK to add specific hardwired queue or site configuration changes to this codebase. If you face a challenging configuration problem, talk to the AGIS team. Temporary overrides are ALWAYS an option for emergency changes that can't wait. See below for how to do so.

Please use tabs instead of spaces for whitespace in the Python code. This is the present standard. In vi or vim, :set noexpandtab should do the trick.

If you want rights to the CERN repositories, please contact Alden.

Configuration Modification

Introduction

All config modification now goes through AGIS. Documentation can be found here and here. Site status and other volatile information is still updated via curl, as seen in the documentation here.

schedconfig contains information about PanDA queue configuration for the ATLAS PanDA grid production system. Examples of queue configurations can be found here, and a single example queue configuration can be seen here. Please refer to this queue definition to understand the fields mentioned, like jdltext and cloud.

The source of these configurations is AGIS.

Configuration Files

The configuration files produced by the update are fairly self-explanatory, and almost useless (a vestige of previous behavior). They can be used to see what the updater thinks it is supposed to put into place from what it read in AGIS. They behave like python dictionaries. When the queues are updated, each file is recreated based on AGIS info.
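Because the files behave like Python dictionaries, two versions of a queue's configuration can be compared field by field. The sketch below assumes, without confirmation from the source, that a file's content can be read as a single Python dict literal. The helper names and the sample values are hypothetical; only the field names cloud and jdltext come from this page.

```python
import ast


def load_config(text):
    """Parse text holding a single Python dict literal into a dict."""
    config = ast.literal_eval(text)
    if not isinstance(config, dict):
        raise ValueError("expected a dict literal")
    return config


def changed_fields(old, new):
    """Return the sorted field names whose values differ between two configs.

    A field missing from one side is treated as None on that side.
    """
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k) != new.get(k))


# Illustrative values only; not taken from a real queue.
before = load_config("{'cloud': 'US', 'jdltext': ''}")
after = load_config("{'cloud': 'US', 'jdltext': 'example'}")
print(changed_fields(before, after))
```

ast.literal_eval is used rather than eval so that only literal values are accepted from the file.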

For a discussion of parameter definitions, please see SchedConfig Parameter Definitions

New Queue Insertion

Please see the AGIS documentation.

Expert Procedures

(Experts, ADC) Emergency DB Recovery

These files contain all queue config information and all associated volatile information! The restoration process uses the series of SQL commands contained in a backup file (for example, 2011_10_18_11_47_31_schedConfigBackup.sql.gz) to restore the state of the DB quickly.

This is an expert-driven procedure only. ADC members should be able to follow these steps without difficulty. If you don't know what you're doing, please contact Alden.

To run a quick restore:

1. Log in to aipanda045, aipanda046, aipanda047 or aipanda048 (as atlpan), and do the following:

2. cd prod

3. source setupProd.sh

4. cd newController

5. python2.5 backupRestore.py ~/scratch0/schedconfig/prod/Backup/YYYY_MM_DD_HH_MM_SS_schedConfigBackup.sql.gz

The state of the queues will be restored, and will remain until the next automatic schedconfig update. When that update arrives, all the queues will be set as specified in AGIS!

If you wish to restore only one queue (for example), it is easy to create a new restore file:

zgrep "ANALY_BNL_ATLAS_1" YYYY_MM_DD_HH_MM_SS_schedConfigBackup.sql.gz > myrestoreFile.sql; gzip myrestoreFile.sql

and then follow the above directions, with your restore file name in place of YYYY_MM_DD_HH_MM_SS_schedConfigBackup.sql.gz.
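The same single-queue extraction can be sketched in Python. This is an illustration only, assuming Python 3 (the production commands above use python2.5), and the function name extract_queue is hypothetical. Like zgrep with a plain name, it keeps every line that contains the name as a substring, so "ANALY_BNL_ATLAS_1" would also match a queue called "ANALY_BNL_ATLAS_10".

```python
import gzip


def extract_queue(backup_path, queue_name, out_path):
    """Copy lines mentioning queue_name from a gzipped backup into a new gzipped file.

    Returns the number of lines written.
    """
    count = 0
    with gzip.open(backup_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if queue_name in line:
                dst.write(line)
                count += 1
    return count
```

A return value of 0 means the queue name did not appear in the chosen backup, which is worth checking before handing the new file to the restore script.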

(Experts, ADC) Emergency DB Recovery for Volatiles in the DB (like status and nqueue)

These files contain no queue config information. To restore all this information, use the series of SQL commands contained in, for example, 2011_10_18_11_47_31_schedConfigStatus.sql.gz to restore the state of the DB quickly.

This is an expert-driven procedure only. ADC members should be able to follow these steps without difficulty. If you don't know what you're doing, please contact Alden.

To run a quick restore:

1. Log in to aipanda045, aipanda046, aipanda047 or aipanda048 (as atlpan), and do the following:

2. cd prod

3. source setupProd.sh

4. cd newController

5. python2.5 volatileRestore.py ~/scratch0/schedconfig/prod/Backup/YYYY_MM_DD_HH_MM_SS_schedConfigStatus.sql.gz

The volatiles will be restored.

If you wish to restore only a set of queues, or one cloud, or one site, it is easy to create a new restore file:

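zgrep "Site is BNL" YYYY_MM_DD_HH_MM_SS_schedConfigStatus.sql.gz > myrestoreFile.sql; gzip myrestoreFile.sql

zgrep "Cloud is CERN" YYYY_MM_DD_HH_MM_SS_schedConfigStatus.sql.gz > myrestoreFile.sql; gzip myrestoreFile.sql

zgrep "ANALY_BNL_ATLAS_1" YYYY_MM_DD_HH_MM_SS_schedConfigStatus.sql.gz > myrestoreFile.sql; gzip myrestoreFile.sql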

and then follow the above directions, with your restore file name in place of YYYY_MM_DD_HH_MM_SS_schedConfigStatus.sql.gz.

To figure out which file to use, in the event of a loss, one effective method would be to do the following:

cd ~/schedconfigProd/pandaconfBackup

zgrep QUEUENAME *Status*|sort|more

and look for when the parameters you are interested in got zeroed.
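The search across status backups can also be sketched in Python. This is a rough equivalent of the zgrep and sort pipeline above, assuming Python 3; the function name find_queue_lines is hypothetical. Because the backup file names begin with a timestamp, sorting the "filename:line" strings puts the hits for files in the same directory in chronological order.

```python
import glob
import gzip


def find_queue_lines(pattern, queue_name):
    """Return sorted 'filename:line' strings for every line mentioning queue_name."""
    hits = []
    for path in glob.glob(pattern):
        with gzip.open(path, "rt") as handle:
            for line in handle:
                if queue_name in line:
                    hits.append("%s:%s" % (path, line.rstrip("\n")))
    return sorted(hits)
```

For example, find_queue_lines("*Status*", "QUEUENAME") run from the backup directory would list the matching lines from every status backup, oldest file first.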

If you find that the code is failing for any reason, the backup is formatted so that it can be pasted directly into SQL Developer or a PL/SQL command line and restored that way. Be sure to commit the changes!


Major updates:
-- AldenStradling - 21-Jul-2010



Responsible: AldenStradling

Never reviewed

Topic attachments
Attachment: Tier3_Template.py.txt (r1, 2.2 K, 2010-11-30 11:18, AldenStradling). Template for T3 queue configuration
Topic revision: r16 - 2015-01-14 - AldenStradling
 