LCG Production Services -
LCG Grid Deployment
How to add/remove root or interactive access to machines managed by Quattor
From lxadm
- Add root or interactive access:
LEAFAddAccess --objecttype=host --objectname=<hostname> --root=<login1, ...,loginN>
LEAFAddAccess --objecttype=host --objectname=<hostname> --interactive=<login1, ...,loginN>
- Remove root or interactive access:
LEAFAddAccess --objecttype=host --objectname=<hostname> --rm_root=<login1, ...,loginN>
LEAFAddAccess --objecttype=host --objectname=<hostname> --rm_interactive=<login1, ...,loginN>
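When the same logins must be granted access on several hosts, the commands above can be generated in a loop. The following is only a minimal sketch: build_access_cmd is a hypothetical helper, the host names are placeholders, and it assumes that logins are passed as a comma-separated list without spaces. It prints the commands instead of running them, so the output can be reviewed first.

```shell
# build_access_cmd is a hypothetical helper, not part of the LEAF tools.
# It only prints a LEAFAddAccess command built from the options shown above.
build_access_cmd() {
    # $1 = access option: root, interactive, rm_root or rm_interactive
    # $2 = host name
    # $3 = logins (assumed format: comma-separated, no spaces)
    printf 'LEAFAddAccess --objecttype=host --objectname=%s --%s=%s\n' "$2" "$1" "$3"
}

# Placeholder host names; replace them with real ones.
for host in rb101 rb102; do
    build_access_cmd root "$host" "login1,login2"
done
```

Once the printed commands look correct, they can be executed on lxadm, for instance by piping the output to sh.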
By editing the CDB template
- Cluster gridrb (template pro_type_gridrb_slc3.tpl)
You should add/update the following lines in CDB:
"/software/components/access_control/roles/ce30_root" = list("login1","login2","login3");
# root access for AFS login login1, login2 and login3
"/software/components/access_control/privileges/acl_root/role/ce30_root/0/targets" = list("+cluster::gridrb");
On the target node, you must execute the following commands:
ccm-fetch
ncm-ncd --configure access_control
Note that in this example, the role ce30_root is defined.
- Cluster lcgrb (template pro_type_lcgrb_slc3.tpl)
"/software/components/access_control/roles/ycalas/0"="ycalas";
# root access for AFS login ycalas
"/software/components/access_control/privileges/acl_root/role/ycalas/0/targets"=list("+cluster::lcgrb");
# interactive access for AFS login ycalas
"/software/components/access_control/privileges/acl_interactive/role/ycalas/0/targets"=list("+cluster::lcgrb");
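The two commands to run on the target node after a CDB change (ccm-fetch, then ncm-ncd for the component) can be wrapped so that the component is only reconfigured if the profile fetch succeeded. This is a sketch, not an official tool: apply_component and the DRY_RUN variable are hypothetical names, and it assumes that ccm-fetch returns a non-zero exit status on failure.

```shell
# apply_component is a hypothetical wrapper around the two commands above.
apply_component() {
    # $1 = NCM component name, e.g. access_control or yaim
    # When DRY_RUN is set and non-empty, the commands are printed, not executed.
    run=${DRY_RUN:+echo}
    $run ccm-fetch && $run ncm-ncd --configure "$1"
}

# Print what would be executed for the access_control component.
DRY_RUN=1 apply_component access_control
```

Without DRY_RUN, the same call runs the real commands, so it must be executed on the target node itself.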
How to manipulate CDB profiles
- Log in to the CDB server from lxplus: cdbop.
- Get a CDB template: get (e.g. get profile_rb125).
- Edit the template by using vi for example: !vi template_name.
- Update and commit your changes (if no error is detected): update ; commit.
How to modify default parameters set in /etc/lcg-quattor-site-info.def
- Get the CDB template corresponding to your node, e.g. profile_rb125.
- After the line include pro_type_gridrb_slc3, put the following lines (for example):
# To support only VOs dteam, ops and CMS
"/software/components/yaim/VOs" = list("dteam","ops","cms");
# To modify the value of the WMS host variable in yaim
"/software/components/yaim/conf/WMS_HOST" = "rb109.cern.ch";
# Assign a function to a node
"/system/function" = "gLite WMS for CMS";
# For accounting purpose
"/system/accounting/name" = "cms";
- Update and commit the template.
- On the node itself (here rb109):
ccm-fetch
ncm-ncd --configure yaim
- Check that the file /etc/lcg-quattor-site-info.def has been updated.
- Re-run the yaim components if needed.
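The check of /etc/lcg-quattor-site-info.def can be scripted. The sketch below assumes that the file uses the usual yaim layout of one NAME=value assignment per line, optionally with double quotes around the value. site_info_value is a hypothetical helper, and the example runs against a temporary file so that it does not depend on a real node.

```shell
# site_info_value is a hypothetical helper, not part of yaim or Quattor.
site_info_value() {
    # $1 = variable name, $2 = path to the site-info file
    # Prints the value of the last assignment to that variable,
    # with surrounding double quotes removed.
    sed -n "s/^$1=//p" "$2" | tail -n 1 | sed -e 's/^"//' -e 's/"$//'
}

# Example against a throwaway file with sample content.
tmp=$(mktemp)
printf '# sample content\nWMS_HOST="rb109.cern.ch"\n' > "$tmp"
site_info_value WMS_HOST "$tmp"
rm -f "$tmp"
```

On a real node, the call would be site_info_value WMS_HOST /etc/lcg-quattor-site-info.def, and the printed value should match the one set in the CDB template.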
How to query the NCM subsystem
You can always check the NCM subsystem by executing command ncm-query:
# query from the root
ncm-query /
# query the yaim component subtree
ncm-query /software/components/yaim
# query the access_control subtree
ncm-query /software/components/access_control
# query the monitoring system
ncm-query /system/monitoring
# query the cron jobs.
ncm-query /software/components/cron
There is also a wiki page describing the structure of the NCM components tree.
How to put a machine in production or in maintenance
You must execute the sms script on lxadm (see man sms):
# Put a machine in maintenance
sms set maintenance 'kernel upgrade' 'Need to reboot the machine thereafter' hostname
# Put a machine in production
sms set production none 'Back in production after middleware upgrade' hostname
Note that before putting a machine in production, you must check that there is no alarm on the machine itself by executing lemon-host-check.
How to install a machine using Quattor
On lxadm, you must use the PrepareInstall script. For example:
PrepareInstall -v --aimsgroup lcg hostname
Note that the aimsgroup option should point to the directory used by GD (i.e. lcg). If a kickstart file for a GD machine already exists (for example in the fio-is subdirectory), Veronique.Lefebure@cernNOSPAMPLEASE.ch must be contacted to remove this kickstart file.
There are some useful options:
- --rep: the directory where the kickstart file will be stored (for GD it is /afs/cern.ch/project/linux/redhat/kickstart/cfg/lcg).
- --mail: the mail address to which the installation log file should be sent in case of failure.
The host certificate must be generated before running the PrepareInstall script; otherwise the script fails. Details related to the generation of a host certificate can be found here.
How to update CDB Templates for gLite upgrades
gLite WMSLB upgrade
To generate the template listing the RPMs of update 14 of gLite 3.0 for the WMSLB, follow these steps:
- Upload the new packages related to this middleware update to the SWRep repository:
swrep-soap-client put i386_slc3 /cern/cc /tmp/ycalas/middleware/*
The list of packages can be found on the gLite 3.0 update web page. Note that SWRep is updated every 30 minutes, so you have to wait a little before committing the new CDB template (see below).
- Generate the template:
/afs/cern.ch/group/c3/bin/CDB_create-glite-templates --type glite-WMSLB --version 2.4.9-0
A file pro_software_glite_2_4_9-0_glite-wmslb.tpl is generated in the current directory.
- Add the newly generated template to CDB and commit it:
<cdbop: ~/CDB> add pro_software_glite_2_4_9-0_glite-wmslb.tpl
[INFO] 'pro_software_glite_2_4_9-0_glite-wmslb': added
<cdbop: ~/CDB> commit
- Update the template related to the WMSLB nodes via cdb (pro_software_packages_cern_slc3_glite3_0_wmslb.tpl) and put the following lines (comment out the former ones):
# Update 14 of gLite WMSLB middleware (2007-02-20) -- Yvan
include pro_software_glite_2_4_9-0_glite-wmslb;
Update and commit your change.
- On the node itself, do a spma_wrapper.sh --noaction, and if no error occurs, execute spma_wrapper.sh (i.e. without the --noaction option).
- The node should now be upgraded.
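The last step on the node (a dry run first, then the real run only if the dry run reported no error) can be sketched as a small function. safe_upgrade is a hypothetical name, and the sketch assumes that spma_wrapper.sh returns a non-zero exit status when the --noaction run detects a problem; check this assumption on your node before relying on it.

```shell
# safe_upgrade is a hypothetical helper; spma_wrapper.sh and --noaction
# are the command and option described in the steps above.
safe_upgrade() {
    # $1 = command to run (defaults to spma_wrapper.sh)
    cmd=${1:-spma_wrapper.sh}
    if "$cmd" --noaction; then
        # Dry run succeeded: perform the real upgrade.
        "$cmd"
    else
        echo "dry run failed, not upgrading" >&2
        return 1
    fi
}
```

On the node, calling safe_upgrade without arguments performs the dry run and then the upgrade. The optional argument only exists so that the logic can be tried with a harmless stand-in command.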
gLite LB upgrade
- Generate gLite LB template (in this example it is pro_software_glite_2_2_8-0_glite-lb.tpl):
/afs/cern.ch/group/c3/bin/CDB_create-glite-templates --type glite-LB --version 2.2.8-0
- Add this template in CDB.
- Modify the template pro_software_gridrb_lb_slc3.tpl accordingly.
Note that there is also a wiki page which can be found here.
How to move a machine from a cluster to another cluster
- Use the LEAFChangeProfile command:
LEAFChangeProfile --logfile lxb2173.log --newtype gridrb_lb_slc3 -list=lxb2173 --skipsindes
How to obtain basic information related to a given host
CDBDump rb201
Where can I find documentation related to Quattor or CDB
How to add additional trusted hosts to MyProxy
How to change the load threshold on the LCG RBs or gLite WMS
Two cases:
- If this is for a single node, edit the CDB profile for this host (for example profile_rb113.tpl) and put the following lines:
# Override the default threshold related to the high_load alarm - YC (2007-02-28)
"/system/monitoring/exception/_30008/correlation" = "20002:1>30" ;
- If this is for the whole cluster, edit the pro_system_lcgrb.tpl template file (for LCG RB) or the pro_system_gridrb template file (for gLite WMS) and put the following lines:
# Override the default threshold related to the high_load alarm - YC (2007-02-28)
"/system/monitoring/exception/_30008/correlation" = "20002:1>30" ;
How to upgrade a package with no version in the CDB template (example with eela_vomscerts package)
This example is taken from the upgrade of package eela_vomscerts...
- Upload the new RPM to SWREP.
- Edit the file pro_software_packages_defaults_cc.tpl and change its version number.
- Update and commit.
How to move a machine to another cluster
If you want to move the node lxb2173 to cluster gridrb_lb_slc3, execute the following command on lxadm:
LEAFChangeProfile --logfile lxb2173.log --newtype gridrb_lb_slc3 -list=lxb2173 --skipsindes
How to configure sudo access in CDB
- You must add the following lines before the "include pro_type_griddpm_slc3;" line in the CDB host profile (eg. profile_lxdpm101.tpl):
"/software/components/access_control/sudo_restrictions" = "on";
"/software/components/access_control/roles/glite_sudo"=list("slemaitr","labadie","baud");
"/software/components/access_control/privileges/acl_sudo/role/glite_sudo/0/targets"=list("+node::lxdpm101");
"/software/components/access_control/privileges/acl_sudo/role/glite_sudo/0/commands"=list("ALL=(glite) NOPASSWD: ALL");
- Execute ncm-ncd for the component access_control.
Where can I find the j2sdk package
In the SWRep repository at the following URL:
http://swrep.cern.ch/swrep/
How to enable/disable an alarm
To disable an alarm (NOSPMA in this example), execute the following command on the node itself:
# 30069 is the ID of the NOSPMA alarm
lemon-host-check --disable=30069
To enable it:
# 30069 is the ID of the NOSPMA alarm
lemon-host-check --enable=30069
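Since the two commands differ only in the option name, a small guard against typos in the action or the alarm ID can be useful. The sketch below is hypothetical (alarm_cmd is not a Lemon tool): it validates its arguments and prints the lemon-host-check command shown above rather than executing it.

```shell
# alarm_cmd is a hypothetical helper that prints the command to run.
alarm_cmd() {
    # $1 = enable or disable, $2 = numeric alarm ID (e.g. 30069 for NOSPMA)
    case $1 in
        enable|disable) ;;
        *) echo "unknown action: $1" >&2; return 2 ;;
    esac
    case $2 in
        ''|*[!0-9]*) echo "alarm ID must be numeric: $2" >&2; return 2 ;;
    esac
    printf 'lemon-host-check --%s=%s\n' "$1" "$2"
}

alarm_cmd disable 30069
```

The printed command can then be run on the node itself once it has been checked.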
Structure of the NCM components tree
See the following wiki page.
Other questions
Please contact Yvan.Calas@cernNOSPAMPLEASE.ch or Steve.Traylen@cernNOSPAMPLEASE.ch