
Puppet profile for the CRAB project:

Go to SWGuideCrab

All questions about the TaskWorker and deploying it on a private machine should go to hn-cms-crabDevelopment@cern.ch

For production and pre-production: the local service account used to deploy, run and operate the service is crab3.

Introduction

This twiki explains how to use the Puppet profile for the CRAB project and how to deploy the different types of machines (CRAB Server, TaskWorker, Schedd) via Puppet.

The basic steps that are usually followed are:

  1. Configure your project environment in aiadmin.
  2. Create a virtual machine of the proper type with the ai-* toolset.
  3. Configure your virtual machine and services via puppet.


Repository

Configure your project environment in aiadmin.

  • Login to the aiadmin cluster:
ssh aiadmin.cern.ch

  • Check which OpenStack projects you are associated with:
openstack project list

The output should look something like this:

+--------------------------------------+------------------+
| ID                                   | Name             |
+--------------------------------------+------------------+
| 180ca677-8b4b-4bbd-9b4c-d69652c97638 | Personal tivanov |
| 580b613d-85da-453c-9b59-012549ebfd9e | CMS CRAB         |
+--------------------------------------+------------------+

You should always see your own personal project. In case the 'CMS CRAB' project is missing, contact the VOC.

  • Create the configuration script ~/.openrc, which sets the environment either for your private OpenStack project or for the CRAB OpenStack project:

#!/usr/bin/env bash

projname="crab"
[[ -n $1 ]] && projname=$1


case $projname in
    private)
   echo Setting the ENV for the PRIVATE project
   os_tenant_name='Personal tivanov'
   os_project_id='<-- fill your project Id here -->'
   os_project_name=$os_tenant_name
   ;;
    crab)
   echo Setting the ENV for the CRAB project
   os_tenant_name='CMS CRAB'
   os_project_id='580b613d-85da-453c-9b59-012549ebfd9e'
   os_project_name=$os_tenant_name
   ;;
    *)
   # echo No project specified
   echo Bad project name $projname
   ;;
esac



export OS_AUTH_URL=https://keystone.cern.ch/krb/v3
export OS_AUTH_TYPE=v3kerberos
export OS_USERNAME=`id -un`
export OS_TENANT_NAME=$os_tenant_name
export OS_IDENTITY_API_VERSION=3
export OS_PROJECT_DOMAIN_ID=default

# With the addition of Keystone we have standardized on the term **project**
# as the entity that owns the resources.
export OS_PROJECT_ID=$os_project_id
export OS_PROJECT_NAME=$os_project_name

unset OS_USER_DOMAIN_NAME

Fill in the missing fields and source it with the desired project as the first parameter (crab|private):

source ~/.openrc private

Warning, important NOTE: Once you source it, ALWAYS check which project your current environment has really been set to - do not trust only the 'echo' messages from the configuration script:

openstack server list

You should see only your private machines now:

openstack server list
+--------------------------------------+------------------+---------+--------------------------------------------------------+-------+-----------+
| ID                                   | Name             | Status  | Networks                                               | Image | Flavor    |
+--------------------------------------+------------------+---------+--------------------------------------------------------+-------+-----------+
| 15168816-fdba-43c1-8d6d-614ba3e99ee7 | crab-priv-tw03   | SHUTOFF | CERN_NETWORK=188.185.118.157, 2001:1458:d00:b::100:297 |       | m2.medium |
| 74b17e17-a362-4da7-8d3a-d1c164ffe76c | crab-priv-tw02   | SHUTOFF | CERN_NETWORK=137.138.152.94, 2001:1458:d00:13::7d      |       | m2.medium |
| 572b6285-05b9-48a9-8f92-d030c0d6b976 | crab-priv-rest01 | SHUTOFF | CERN_NETWORK=137.138.148.83, 2001:1458:d00:12::15c     |       | m2.medium |
+--------------------------------------+------------------+---------+--------------------------------------------------------+-------+-----------+
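
In addition to listing the servers, it may also help to inspect the OS_* variables directly (just a quick sanity check on top of the server listing, not an official step):

env | grep -E '^OS_(PROJECT|TENANT)'
# expected for the private project (among others):
# OS_TENANT_NAME=Personal tivanov
# OS_PROJECT_NAME=Personal tivanov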

In case you want to switch to the CRAB OpenStack project:

source ~/.openrc crab

Create a virtual machine of the proper type with the ai-* toolset

For every type of machine in our project there are different modules/classes in the Puppet profile (not all of them have evolved into final separate modules; some are still just classes and some are scattered across the whole profile). A few examples of how to create instances of the different machine types follow.

Deployment of CRAB TaskWorker via Puppet

  • Set a few variables in advance:
hostname='crab-priv-tw01'
foremanenv='crabdev'
openstackflav='m2.medium'
osrelease='cc7'

Preparing the puppet configurations:

Before creating the machine you need to prepare the .yaml configuration file in the repository of the Puppet hostgroup:

git clone https://:@gitlab.cern.ch:8443/ai/it-puppet-hostgroup-vocmsglidein.git
cd it-puppet-hostgroup-vocmsglidein
git checkout crabdev
git pull origin crabdev
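
Before editing anything it is worth confirming that you are really on the crabdev branch (plain git, nothing CRAB-specific):

git branch
# the current branch is marked with '*', e.g.:
# * crabdev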

One thing you need to bear in mind: your machine may live in whichever Puppet environment you wish (it is defined at machine creation time), but there is a strict mapping between the GitLab branches and the Puppet environments as follows:

+--------------------+------------------------+
| gitlab branch:     | Puppet environment:    |
+--------------------+------------------------+
| crabdev            | crabdev                |
| qa                 | qa                     |
| master             | production             |
+--------------------+------------------------+

On the other hand, the mapping between the service instances and the Puppet environments is not strict, but we tend to stick to the following ('loose') separation:

+--------------------+------------------------+
| Puppet environment:| Service instance:      |
+--------------------+------------------------+
| crabdev            | dev | private          |
| qa                 | preprod | testbed | itb |
| production         | prod | globalpool      |
+--------------------+------------------------+

Going back to the topic - create the .yaml file for the machine:

emacs -nw data/fqdns/${hostname}.cern.ch.yaml

Put the following content there, while filling the relevant fields marked with '<-- -->' tags (examples in the comment lines):

gwms_type: crabtaskworker

sudo:
   users:
     - crab3
   egroups:
     - cms-service-crab3htcondor-admins

sssd::interactiveallowusers:
   - cmsprd

sssd::interactiveallowgroups:
   - cms-service-crab
   - cms-service-crab2
   - cms-service-glideinwms
   - cms-service-crab3htcondor
   - cms-voc

sssd::filter_users:
   - mlindner
   - tbato

sendmail::masquerade_enable: false
sendmail::root_email: cms-service-crab3htcondor-monitor@cern.ch



tw_release: <-- the TW tag/version -->
tw_scram_arch: <-- the TW architecture -->
tw_repo: <-- the cmswe repository for the rpms -->
tw_name: <-- the TW name to be used for identification in the DB -->
tw_mode: <-- the TW mode -->
tw_nslaves: <-- the number of forked processes (workers) -->
tw_resturl: <-- the URL of the cmsweb frontend (valid only in private mode) -->
tw_recurring_actions: <-- the list of recurring actions -->

certmgr_san: <-- the list of alternative names to be put in the SAN extension of the service certificate -->

# tw_release: "3.3.1810.rc2"
# tw_scram_arch: "slc7_amd64_gcc630"
# tw_repo: "comp"
# tw_name: "crab-priv-tw01"
# tw_mode: "private"
# tw_nslaves: 2
# tw_resturl: "cmsweb-testbed.cern.ch"
# tw_recurring_actions: "['RemovetmpDir', 'BanDestinationSites']"

# certmgr_san: "crab-priv-tw01.cern.ch, tw/crab-priv-tw01.cern.ch"
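
Before committing, it can save a Puppet run to check that the file at least parses as YAML; a minimal sketch, assuming a python with the yaml module is available on the aiadmin node:

python -c "import yaml, sys; yaml.safe_load(open(sys.argv[1]))" data/fqdns/${hostname}.cern.ch.yaml && echo "YAML OK"

Remember that the file only reaches the Puppet environment once it has been committed and pushed to the crabdev branch:

git add data/fqdns/${hostname}.cern.ch.yaml
git commit -m "Add ${hostname} configuration"
git push origin crabdev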

Creating the virtual machine:

  • Check if the machine already exists in the current OpenStack project; if so, delete it and wait for the DNS record of this hostname to be freed (it may take between 20 min and 1 h if you are behind the CERN firewall and using the CERN DNS servers):

ai-kill ${hostname}

while (host $hostname > /dev/null 2>&1); do clear; echo Waiting for the DNS record of $hostname to be cleared.; sleep 10;done; echo Ready to go 

  • Create the machine via the ai tools (not openstack)
ai-bs -g vocmsglidein/crab --$osrelease --foreman-environment $foremanenv  --landb-mainuser CMS-SERVICE-CRAB3HTCONDOR --landb-responsible CMS-SERVICE-CRAB3HTCONDOR-ADMINS --nova-flavor $openstackflav --nova-sshkey lxplus --landb-ipv6ready -v $hostname

  • You should see a long output from OpenStack and the ai tool, which should end with:

...

Issuing put on https://teigi75.cern.ch:8201/roger/v1/state/crab-priv-tw01.cern.ch/
With headers: {'Content-Type': 'application/json', 'Accept-Encoding': 'deflate', 'Accept': 'application/json'}
With data: {"hostname": "crab-priv-tw01.cern.ch", "appstate": "build"}
Starting new HTTPS connection (1): teigi75.cern.ch
https://teigi75.cern.ch:8201 "PUT /roger/v1/state/crab-priv-tw01.cern.ch/ HTTP/1.1" 401 381
Resetting dropped connection: teigi75.cern.ch
https://teigi75.cern.ch:8201 "PUT /roger/v1/state/crab-priv-tw01.cern.ch/ HTTP/1.1" 204 0
Returned (204) 
----------------------------------------------------------------------
* Your machine is booting and the network is being configured right now,
  Puppet will run immediately after a successful boot process.
* It typically takes around 30 minutes between this command is
  executed and the first Puppet report arrives to Foreman:
  https://judy.cern.ch/hosts/crab-priv-tw01.cern.ch/config_reports
  (although this depends a lot on the complexity of your configuration)
* After the initial configuration, if you've set rootegroups or
  rootusers in Foreman or Hiera you should be able to log in as
  root using your Kerberos credentials. The LANDB responsible
  has also root access by default.
* You can check the status of the node creation request by running:
  'openstack server show crab-priv-tw01'
* A custom LANDB responsible/user has been set. It will be visible
  in Foreman a few minutes after the node is booted. In the
  meantime, the Foreman owner will be the issuer of this command.
  (tivanov@CERN.CH)
* In case of problems, if you provided a SSH key when creating the node
  use it to log into the box and take a look at /var/log/cloud-init*.log.
  Console log can be retrieved by using 'openstack console log show'.
----------------------------------------------------------------------

  • Wait until the machine has really been built:

until (host $hostname > /dev/null 2>&1); do clear; echo Waiting for the build process to finish.; sleep 10;done; until (ping -c 1 $hostname > /dev/null 2>&1); do echo Waiting for $hostname to show up. ; sleep 10;done; echo Ready to go
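
Once the machine answers, you can follow the initial configuration from the node itself, for example (initially on the standard port 22, assuming root access via the lxplus SSH key passed to ai-bs above):

ssh -o StrictHostKeyChecking=no root@${hostname}.cern.ch 'puppet agent -tv'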

Configure your virtual machine and services via puppet

Migration to CC7

Warning, important NOTE: Before you start, put the machine you are about to work on in drain in https://gitlab.cern.ch/crab3/CRAB3ServerConfig/

What has been described above is the correct sequence of actions for the migration to CC7, but if you want to automate things a little, it is better to create the following script (and fill in the blank fields accordingly):

[tivanov@aiadm32 ~]$ cat > slc6-slc7.sh <<'EOF'
#!/bin/bash

# Set the correct OpenStack project to run in (crab|private)
. ~/.openrc crab

hostname='........' # example: 'vocms0122'
foremanenv='........' # example: 'production'
openstackflav='........' # example: 'r2.2xlarge'
avzone='........' # example : cern-geneva-a

ipv6ready='false' # true or false
osrelease='cc7'

logfile=${hostname}_upgrade.log
volumes="${hostname}_standard ${hostname}_high"

touch $logfile

# redirect everything to logfile
exec &>>$logfile

echo
echo =======================================================
echo Start: `basename $0` at: `date -Im`


wrapfun (){
 ###
 ### returns error if either the executed command fails or the execution itself
 ### is skipped due to a previous successful run of the current step
 ###
 step=$1
 execute=$2

 echo
 echo -------------------------------------------------------
 echo -e "wrapfun call for: \n step: $step \n exec: $execute\n" >&2
 grep "Step $step done" $logfile >/dev/null 2>&1
 toexecute=$?
 if [[ $toexecute -eq 0 ]]
 then
     ## found the string -> we return false and continue
     retval=255
 else
     ## have not found the string -> we execute and return the err value
     eval $execute 2>&1
     err=$?
     [[ $err -eq 0 ]] && echo "Step $step done"
     retval=$err
 fi
 echo -------------------------------------------------------
 return $retval
}

iswaiting () {  [[ $1 -eq 0 ]] && return 0  ; echo Not; return 1  ;}

echo
echo =======================================================
echo "Detaching the volumes"
step=1
for volume in $volumes
do
    wrapfun $step "openstack server remove volume ${hostname} $volume"
    let step+=1
done

echo
echo =======================================================
echo "Kill the VM"
step=3
wrapfun $step "ai-kill ${hostname}"
lasterr=$?

echo
echo =======================================================
echo "$(iswaiting $lasterr) Waiting for the DNS record of $hostname to be cleared"
[[ $lasterr -eq 0 ]] && while (host $hostname > /dev/null 2>&1); do sleep 60; done

echo
echo =======================================================
echo "Rebuild the VM"
step=4
wrapfun $step "ai-bs -g vocmsglidein/crab --$osrelease --foreman-environment $foremanenv  --landb-mainuser CMS-SERVICE-CRAB3HTCONDOR --landb-responsible CMS-SERVICE-CRAB3HTCONDOR-ADMINS --nova-flavor $openstackflav --nova-availabilityzone $avzone --nova-sshkey lxplus --nova-parameter landb-ipv6ready=$ipv6ready -v $hostname"
lasterr=$?

echo
echo =======================================================
echo  "$(iswaiting $lasterr) Waiting for the build process to finish."
[[ $lasterr -eq 0 ]] && until (host $hostname > /dev/null 2>&1); do sleep 10;done


echo
echo =======================================================
echo "$(iswaiting $lasterr) Waiting for $hostname to show up."
[[ $lasterr -eq 0 ]] && until (ping -c 1 $hostname > /dev/null 2>&1); do sleep 10;done

echo
echo =======================================================
echo "$(iswaiting $lasterr) Waiting for $hostname to boot."
[[ $lasterr -eq 0 ]] && until (ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@$hostname 'true' 2>/dev/null); do sleep 10; done

echo
echo =======================================================
echo "Re attach the volumes"

step=5
mnt_status=0
for volume in $volumes
do
    wrapfun $step "openstack server add volume ${hostname} $volume"
    lasterr=$?
    let mnt_status+=$lasterr
    let step+=1
done

if [[ $mnt_status -eq 0 ]]
then
    echo
    echo =======================================================
    echo "All volumes mounted correctly"
else
    echo
    echo =======================================================
    echo "ERR while mounting some volumes"
fi

echo
echo =======================================================
echo "DONE"
echo =======================================================


EOF
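
The heredoc creates the file without the execute bit, so make it executable once:

[tivanov@aiadm32 ~]$ chmod +x slc6-slc7.sh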

  • The first step is to make a copy of the 99_local_tweaks file from that machine:

[tivanov@aiadm32 ~]$ hostname='........' # example: 'vocms0122'
[tivanov@aiadm32 ~]$ scp -P 2222 root@${hostname}.cern.ch:/etc/condor/config.d/99_local_tweaks.config ~/99_local_tweaks.config.${hostname}  

  • Edit data/fqdns/vocms*.yaml for that machine in the Puppet repository. Change the condor version to the one with the .el7 suffix, and also disable the WMArchiveUploader service:
...
condor_version: 8.6.13-1.el7
...
enable_wmarchiveuploader: false

  • Run the above script and wait until it finishes (it should take around an hour - mostly the time spent waiting for the DNS record to be refreshed):
[tivanov@aiadm32 ~]$ (./slc6-slc7.sh &) ; sleep 5 ; tail -f ${hostname}_upgrade.log 
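
Since wrapfun logs a "Step N done" line after every successful step, the same log file also tells you how far the script has progressed:

[tivanov@aiadm32 ~]$ grep 'Step .* done' ${hostname}_upgrade.log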

  • Log in to the machine: ssh -p 2222 root@${hostname}.cern.ch. At first, if the Puppet run has not progressed to the end, you may need to log in on the standard SSH port 22 instead.
  • Check whether the CEPH volumes are correctly mounted on the machine. If they do not show up in the output of the following command, check the /etc/fstab file, where you should see them listed by UUID. If they are not listed, run Puppet repeatedly until they appear in fstab and then restart the machine.
[root@vocms0122 ~]# mount |grep -E '/dev/vd(b|c)'
/dev/vdb on /home/grid type ext4 (rw,relatime,data=ordered)
/dev/vdc on /data type ext4 (rw,relatime,data=ordered)

[root@vocms0122 ~]# cat /etc/fstab 
# HEADER: This file was autogenerated at 2019-05-30 09:26:17 +0200
# HEADER: by puppet.  While it can still be managed manually, it
# HEADER: is definitely not recommended.

#
# /etc/fstab
# Created by anaconda on Mon Dec  3 13:47:35 2018
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=632f04d6-0d40-44f5-ad88-8adede6434d4   /   xfs   defaults   0   0
UUID="e60c5645-f269-42f2-b911-ed916945cd87"   /data   ext4   defaults   0   0
UUID="6172787e-2558-4c38-a719-8345d9a1a2cd"   /home/grid   ext4   defaults   0   0

  • Run Puppet by hand several times - you will see plenty of errors, and on each run a different set of resources from the catalog will be applied. Repeat until everything that can be applied from the catalog has been applied; at that point the set of errors will stay the same on every subsequent run, or eventually there will be none left.
  • One of the errors that you may see during those puppet runs is:

Error: Could not create user _condor: Execution of '/usr/sbin/useradd -c Condor Pool -g 100000 -d /home/condor -s /bin/bash -u 100003 -m _condor' returned 4: useradd: UID 100003 is not unique
Error: /Stage[main]/Hg_vocmsglidein::Profiles::Crabschedd/User[_condor]/ensure: change from 'absent' to 'present' failed: Could not create user _condor: Execution of '/usr/sbin/useradd -c Condor Pool -g 100000 -d /home/condor -s /bin/bash -u 100003 -m _condor' returned 4: useradd: UID 100003 is not unique
Error: Could not create user _gfactory: Execution of '/usr/sbin/useradd -c GlideIn Factory -g 100000 -d /home/gfactory -s /bin/bash -u 100001 -m _gfactory' returned 4: useradd: UID 100001 is not unique
Error: /Stage[main]/Hg_vocmsglidein::Profiles::Crabschedd/User[_gfactory]/ensure: change from 'absent' to 'present' failed: Could not create user _gfactory: Execution of '/usr/sbin/useradd -c GlideIn Factory -g 100000 -d /home/gfactory -s /bin/bash -u 100001 -m _gfactory' returned 4: useradd: UID 100001 is not unique

The reason for that is an overlap between some of the local and LDAP user ids. To find out the names of those users, use the following commands:

[tivanov@aiadm32 ~]$ ldapsearch -x -h xldap.cern.ch -b 'OU=Users,OU=Organic Units,DC=cern,DC=ch' '(&(objectClass=user) (uidNumber=100003))' sAMAccountName uidNumber gidNumber
[tivanov@aiadm32 ~]$ ldapsearch -x -h xldap.cern.ch -b 'OU=Users,OU=Organic Units,DC=cern,DC=ch' '(&(objectClass=user) (uidNumber=100001))' sAMAccountName uidNumber gidNumber

Once you know the 'sAMAccountName' of those users, exclude them in the machine's .yaml file by adding the following lines:

sssd::filter_users:
   - mlindner
   - tbato
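
After adding the filter_users entries, run Puppet again so that sssd is reconfigured and the useradd commands no longer clash with the LDAP UIDs:

[root@vocms0122 ~]# puppet agent -tv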

  • In case you are trying to 'downgrade' and install a version of condor other than the latest one, you may face the following error:

Error: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: /Stage[main]/Hg_vocmsglidein::Modules::Condor::Install/Package[condor]/ensure: change from 'purged' to '8.6.13-1.el6' failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-procd-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: /Stage[main]/Hg_vocmsglidein::Modules::Condor::Install/Package[condor-procd]/ensure: change from 'purged' to '8.6.13-1.el6' failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-procd-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-classads-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: /Stage[main]/Hg_vocmsglidein::Modules::Condor::Install/Package[condor-classads]/ensure: change from 'purged' to '8.6.13-1.el6' failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-classads-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-externals-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: /Stage[main]/Hg_vocmsglidein::Modules::Condor::Install/Package[condor-externals]/ensure: change from 'purged' to '8.6.13-1.el6' failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-externals-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-external-libs-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: /Stage[main]/Hg_vocmsglidein::Modules::Condor::Install/Package[condor-external-libs]/ensure: change from 'purged' to '8.6.13-1.el6' failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-external-libs-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-python-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: /Stage[main]/Hg_vocmsglidein::Modules::Condor::Install/Package[condor-python]/ensure: change from 'purged' to '8.6.13-1.el6' failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-python-8.6.13-1.el6' returned 1: Error: Nothing to do
Error: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-static-shadow-8.6.13-1.el6.i686' returned 1: Error: Nothing to do
Error: /Stage[main]/Hg_vocmsglidein::Modules::Condor::Install/Package[condor-static-shadow-8.6.13-1.el6.i686]/ensure: change from 'purged' to 'present' failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install condor-static-shadow-8.6.13-1.el6.i686' returned 1: Error: Nothing to do

The solution to that is:

    • login to that machine as root
    • execute:

1. [root@vocms0122 ~]# for i in `rpm -qa |grep condor`; do yum remove -y $i;done 
2. [root@vocms0122 ~]# yum clean all; yum install condor-8.6.13-1.el7 condor-static-shadow-8.6.13-1.el7 condor-python-8.6.13-1.el7 --disablerepo=htcondor-stable-8.8-rhel7
3. [root@vocms0122 ~]# puppet agent -tv
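
After the manual reinstallation it is worth double-checking which condor version actually ended up on the machine:

[root@vocms0122 ~]# rpm -qa | grep condor
[root@vocms0122 ~]# condor_version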

  • The user id of the condor user from the previous installation was most probably different from the current one, so fix the ownership of the condor_local directory. The following command may take a while.


[root@vocms0122 ~]# chown -R condor:condor /data/srv/glidecondor/condor_local/

  • Copy the old 99_local_tweaks file back in place:
[tivanov@aiadm32 ~]$ scp -P 2222 ~/99_local_tweaks.config.${hostname}  root@${hostname}.cern.ch:/etc/condor/config.d/99_local_tweaks.config

  • Copy hostkey.pem as condorkey.pem and change the ownership of the file to user condor and restart the service:

[root@vocms0122 ~]# cp /etc/grid-security/hostkey.pem  /etc/grid-security/condorkey.pem
[root@vocms0122 ~]# chown condor:condor  /etc/grid-security/condorkey.pem
[root@vocms0122 ~]# systemctl restart condor
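
A quick check that the schedd came back after the restart:

[root@vocms0122 ~]# systemctl status condor
[root@vocms0122 ~]# condor_q
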
  • Enable ASOv2 in that schedd:

[root@vocms0122 ~]# touch /etc/enable_aso_v2

  • In case you need to change the ipv6ready flag of that machine, you may use the following commands from an aiadmin node:
[tivanov@aiadm32 ~]$ . ~/.openrc crab
[tivanov@aiadm32 ~]$ openstack server set --property landb-ipv6ready='false' ${hostname}
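
The change can be verified with openstack server show; the flag shows up in the 'properties' field:

[tivanov@aiadm32 ~]$ openstack server show ${hostname} | grep -i ipv6ready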

Hold reason: 'AdjustSites.py failed to update the webdir.'

Monitoring Scripts distribution.

Warning, important NOTE: This may not be the best place to put the following information, but since this is the only Puppet-dedicated page we have, it is documented here; the page may be reorganized later.

The script that is used to send data to Elastic search is the following: https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/blob/master/code/files/profiles/crabtaskworker/GenerateMONIT.py

The operator maintaining this needs to make sure that only one instance of the script is running at a time: we do not want redundant data fed to ES, and if the script is placed on a machine which cannot run it properly, the cron job starts flooding everybody in the e-group ...... with emails. By design the host supposed to run the script is the TaskWorker, but it could also be any other machine (even a dedicated one). So far, in order to ensure a single instance of the script, we use the following preventive measures:

  1. The name of the machine supposed to run it is hardcoded in the script, which is not the best method, but it is used just as an extra precaution.
  2. There is one parameter in the TaskWorker profile which is read via the hiera function and is used to manage the distribution of the script to the machines supposed to run it. In order to enable the script on a machine, put the following enable/disable flag in the .yaml file used to configure it (data/fqdns/crab-*-tw0[0-9]*.cern.ch.yaml):

enable_generatemonit: true

Example: https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/blob/master/data/fqdns/crab-prod-tw01.cern.ch.yaml#L30

The 'hiera' parameter is read here: https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/blob/master/code/manifests/profiles/crabtaskworker.pp#L13

and is used here: https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/blob/master/code/manifests/profiles/crabtaskworker.pp#L249-262
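
A quick way to make sure the flag is enabled on exactly one machine is to grep the data directory in a checkout of the hostgroup repository:

grep -l 'enable_generatemonit: true' data/fqdns/*.yaml
# should list exactly one .yaml file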
