Scientific Cluster Deployment & Recovery
Using puppet to simplify cluster management
V. Hendrix (1), D. Benjamin (2), Y. Yao (1)
(1) Lawrence Berkeley National Laboratory, Berkeley, CA, USA
(2) Duke University, Durham, NC, USA
New Tier3g Site Setup
Before Start
Connect Network Cables
- Connect your network as shown in the diagram below (a quick link check is sketched after this list):
- For Head, Interactive, NFS
- connect eth0 to your outside network
- connect eth1 to the internal network just for tier3
- For Workers:
- connect eth0 to your internal network just for tier3
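Once everything is cabled, you can confirm that each interface actually has link before moving on; a minimal check from the console (assuming ethtool is installed and the eth0/eth1 naming above):
ethtool eth0 | grep "Link detected" # expect "Link detected: yes"
ethtool eth1 | grep "Link detected" # only on HEAD, Interactive and NFS, which have a second cable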
Create USB Key
Prepare Bootable USB Key
Creating a bootable USB Key
- Get the Disc 1 ISO image of the Scientific Linux installation CD or DVD.
- Create a bootable USB key (one way to do this is sketched below).
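How the key is made bootable depends on your Scientific Linux release; the following is only a sketch for SL5-era media, where the Disc 1 ISO ships an images/diskboot.img (the ISO file name and /dev/sdx device are placeholders):
mount -o loop /path/to/SL-disc1.iso /mnt/iso
dd if=/mnt/iso/images/diskboot.img of=/dev/sdx # WARNING: overwrites the whole USB key
sync; umount /mnt/iso
The resulting VFAT filesystem on the key stays writable, which is what allows the kickstart files to be copied onto it in the next step.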
Generate Kickstart Files in the USB Key
* Checkout the kickstart package:
svn export http://svnweb.cern.ch/guest/atustier3/ks/tag/ks-0.1 ks
Create the configuration files for your cluster
cd ks
./generateScripts
- Basic Configuration
Customize parameters.py, mentioned in the output of the previous step. This is where you put your hostname, IP address, etc. The parameters.py file should be self-explanatory, but let me know if it isn't. Please note that this process assumes that the INTERACTIVE and WORKER nodes have a starting IP address in the private subnet and that the address increments for each successive node. If this is an issue, you can make the changes:
vi ./mytier3/src/parameters.py
- Generate all kickstart files and other necessary files
./generateScripts
ls ./mytier3 # you should see all the generated files
cd ./mytier3
- Copy kickstart and other necessary files to your USB key
cp -R /path/to/ks /path/to/usb/mount/ks
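Before removing the key it is worth verifying the copy and unmounting cleanly, for example:
ls /path/to/usb/mount/ks/mytier3 # the kickstart-*.cfg files should be listed
umount /path/to/usb/mount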
Install Physical HEAD Node
The head node is the gateway and contains virtual machines for the PUPPET, PROXY and LDAP nodes. The HEAD node also runs an HTTP server used for network installations of the other nodes in the cluster.
Kickstart Install
Boot into the USB stick and type:
linux ks=hd:sdx/ks/mytier3/kickstart-head.cfg
Note: replace head with the real file name for your head node, normally the hostname of the head node.
Replace sdx with the device name of your USB disk. Normally it is the one after all of your SATA hard disks; e.g. if you have 4 hard disks, your USB key will be sde.
Click "Ignore Drive" when prompted so that the installer does not format the USB drive.
Configure HEAD
- Copy configuration files from USB key
mkdir /mnt/usb
mount /dev/sdx /mnt/usb # where 'x' is the letter of your usb key
export AT3_CONFIG_DIR=/root/atustier3 # or /root/working if you prefer
mkdir -p $AT3_CONFIG_DIR
cp -r /mnt/usb/ks $AT3_CONFIG_DIR
cd $AT3_CONFIG_DIR
- Configure HEAD and install PUPPET
Install VM LDAP
- Open a terminal on HEAD as root. It is important to use -X, which enables X11 forwarding, so that the "vm_ldap Virt Viewer" can open in your local environment.
ssh -X head ## where 'head' is the name of your head node.
cd $AT3_CONFIG_DIR/mytier3
./crvm-vmldap.sh
Once the installation of the LDAP VM finishes, close the X window session or press Ctrl-C on the head node.
virsh autostart vm_vmldap # puts symlink to the xml file for the VM so that
# the VM can be restarted if the head node is rebooted
- If you would like to reattach to the LDAP VM after the viewing session has been closed:
ssh -X head
virt-viewer --connect qemu:///system vm_vmldap
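To confirm that the VM is running and flagged for autostart, you can query libvirt on the head node (a quick check, not part of the original procedure):
virsh list --all # vm_vmldap should be listed as "running"
virsh dominfo vm_vmldap # "Autostart: enable" confirms the autostart flag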
Install VM PROXY
- Open a terminal on HEAD as root. It is important to use -X, which enables X11 forwarding, so that the "vm_proxy Virt Viewer" can open in your local environment.
ssh -X head ## where 'head' is the name of your head node.
cd $AT3_CONFIG_DIR/mytier3
./crvm-vmproxy.sh
Once the installation of the PROXY VM finishes, close the X window session or press Ctrl-C on the head node.
virsh autostart vm_vmproxy # puts symlink to the xml file for the VM so that
# the VM can be restarted if the head node is rebooted
- If you would like to reattach to the PROXY VM after the viewing session has been closed:
ssh -X head
virt-viewer --connect qemu:///system vm_proxy
Install and configure the rest of the cluster
- NFS node
The other nodes are clients of the NFS service so this node should be up and configured with puppet before continuing with the other nodes.
- Configure the HEAD node with puppet by following the instructions in "Create certificates for puppet clients"
- Install INTERACTIVE nodes
- Install WORKER nodes
Existing Tier3g Site Setup
Follow these instructions to configure a Tier 3 puppet server and run the puppet clients against the nodes.
On the HEAD Node
ssh -X root@head
yum -y --enablerepo=epel-testing install puppet
yum install python-setuptools
easy_install simplejson
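A quick sanity check of the installation before proceeding (assuming a puppet 2.6+ package from EPEL, where the puppet command is available):
puppet --version
python -c "import simplejson; print simplejson.__version__"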
Generate kickstart files for nodes
The script creates kickstart files, which are of course unnecessary when retrofitting an existing Tier 3 site with puppet, but the other files it generates are still needed.
Checkout the kickstart package in a working directory.
export AT3_CONFIG_DIR=/root/atustier3 # or /root/working if you prefer
mkdir -p $AT3_CONFIG_DIR
cd $AT3_CONFIG_DIR
svn export http://svnweb.cern.ch/guest/atustier3/ks/trunk ks
Create the configuration files for your cluster
cd ks
./generateScripts
- Basic Configuration
Customize parameters.py, mentioned in the output of the previous step. This is where you put your hostname, IP address, etc. The parameters.py file should be self-explanatory, but let me know if it isn't. Please note that this process assumes that the INTERACTIVE and WORKER nodes have a starting IP address in the private subnet and that the address increments for each successive node. If this is an issue, you can make the changes:
vi ./mytier3/src/parameters.py
- Generate all kickstart files and other necessary files
./generateScripts
ls ./mytier3 # you should see all the generated files
cd ./mytier3
- Configure HEAD and install PUPPET
On Any WORKER Node
Install puppet
yum -y --enablerepo=epel-testing install puppet
Now you can run puppet on the WORKER nodes by following the instructions in "Create certificates for puppet clients"
Supplementary Installation Notes
Configure HEAD and install PUPPET
- Minimally configure the HEAD node before installing the puppet server
./apply-puppet.sh head-init.pp # where head is the name of your head node
- Kickstart installation of puppet VM on HEAD node
Instructions to come. The following script makes the following assumptions
- On the PUPPET Node
Create certificates for puppet clients
- On the PUPPET client
First run puppet on the puppet client. This will create a certificate request with the puppet CA and wait 30 seconds before trying again:
puppetd --no-daemonize --test --debug --waitforcert 30
- On the PUPPET server
Now sign the request
puppetca --list # this will tell you of the waiting requests
puppetca --sign puppetclient # where 'puppetclient' is the hostname of the client requesting a certificate
- On the PUPPET client
You should see the puppet agent startup after 30 seconds and run successfully. After you have confirmed that the puppet client runs successfully, do the following:
chkconfig puppet on
service puppet start
- On the PUPPET server
You should turn on the puppetmaster service
chkconfig puppetmaster on
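As a quick check that the puppet master is up and reachable by the clients (8140 is puppet's default port; the netstat invocation is just one way to look):
service puppetmaster status
netstat -tlnp | grep 8140 # the puppet master should be listening on port 8140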
Puppet Server SETUP during kickstart installation
The following shows how the puppet server is set up during the kickstart installation of the puppet VM.
#########################
# Puppet Configuration
cd /etc/puppet
mkdir modules
# Checkout puppet definitions for the whole cluster
svn export http://svnweb.cern.ch/guest/atustier3/puppet/at3moduledef/trunk at3moduledef
svn export http://svnweb.cern.ch/guest/atustier3/puppet/puppetrepo/trunk puppetrepo
# Checkout all modules for use in the WORKER nodes only
# AUTOMATIC CHECKOUT WITH puppetrepo.py
python puppetrepo/puppetrepo.py --action export --moduledef=/etc/puppet/at3moduledef/modules.def --moduledir=/etc/puppet/modules/ --modulesppfile=/etc/puppet/manifests/modules.pp --loglevel=info
cd /etc/puppet
cp at3moduledef/auth.conf at3moduledef/fileserver.conf ./
cp at3moduledef/site.pp manifests/
## Copy config files over to puppet server from HEAD node
wget -O /etc/puppet/manifests/nodes.pp http://192.168.100.1:8080/nodes.pp
wget -O /etc/puppet/modules/at3_pxe/templates/default.erb http://192.168.100.1:8080/pxelinux.cfg.default
chown -R puppet:puppet /etc/puppet
chmod -R g+rw /etc/puppet
Updating Puppet Modules
You perform these commands to update the puppet modules from the SVN repository:
cd /etc/puppet
svn export --force http://svnweb.cern.ch/guest/atustier3/at3moduledef/trunk at3moduledef
python puppetrepo/puppetrepo.py --action export --moduledef=/etc/puppet/at3moduledef/modules.def --moduledir=/etc/puppet/modules/ --modulesppfile=/etc/puppet/manifests/modules.pp --loglevel=info --svnopts=--force
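After the modules have been updated on the server, you can trigger an immediate run on any client instead of waiting for its next scheduled run, using the same agent command shown earlier:
puppetd --no-daemonize --test # one-shot run against the updated modules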
Checking your configuration files into BNL usatlas-cfg
- Get access to the SVN Repository
- Send the Distinguished Name (DN) of your Grid Certificate to Doug Benjamin <benjamin@phy.duke.edu>
- Create a .p12 file for subversion credentialed access
- These steps ensure that a password is never stored in the clear.
emacs -nw ~/.subversion/servers
Add the following sections:
[global]
store-passwords = yes
store-plaintext-passwords = no
store-ssl-client-cert-pp-plaintext = no
[groups]
usatlas = svn.usatlas.bnl.gov
[usatlas]
ssl-client-cert-file = /path/to/your/user-cert.p12
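If you still need to create the .p12 file from your grid certificate, one common way is with openssl; this is only a sketch and assumes the usual ~/.globus locations for your certificate and key:
openssl pkcs12 -export -in ~/.globus/usercert.pem -inkey ~/.globus/userkey.pem -out /path/to/your/user-cert.p12
chmod 600 /path/to/your/user-cert.p12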
Setup 1st LDAP User "atlasadmin"
#LDAP Configuration atlas tier3
#Login to any node (with a public IP)
cat > tt <<EOF
dn: dc=mytier3,dc=com
objectClass: dcObject
objectClass: organization
dc: mytier3
o: mytier3

dn: ou=People,dc=mytier3,dc=com
ou: People
objectClass: top
objectClass: organizationalUnit

dn: ou=Group,dc=mytier3,dc=com
ou: Group
objectClass: top
objectClass: organizationalUnit

dn: cn=ldapusers,ou=Group,dc=mytier3,dc=com
objectClass: posixGroup
objectClass: top
cn: ldapusers
userPassword: {crypt}x
gidNumber: 9000

dn: cn=atlasadmin,ou=People,dc=mytier3,dc=com
cn: atlasadmin
objectClass: posixAccount
objectClass: shadowAccount
objectClass: inetOrgPerson
sn: User
uid: atlasadmin
uidNumber: 1025
gidNumber: 9000
homeDirectory: /export/home/atlasadmin
userPassword: {SSHA}MQstDGq3bTK1Fle+iAa+p4jYgeyl1RIG
EOF
ldapadd -x -D "cn=root,dc=mytier3,dc=com" -c -w abcdefg -f tt -H ldap://ldap/ # replace abcdefg with your LDAP root password
ldapsearch -x -b 'dc=mytier3,dc=com' '(objectclass=*)' -H ldap://ldap/
mkdir /export/home/atlasadmin; chown atlasadmin:ldapusers /export/home/atlasadmin
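On any client node that puppet has already configured against LDAP, you can verify that the new account resolves (a quick check; it assumes nss_ldap is pointed at the LDAP server):
getent passwd atlasadmin # should return the posixAccount entry
id atlasadmin # should show gid 9000 (ldapusers)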
Test that condor works:
Log in as atlasadmin on int1:
cat > simple.c <<EOF
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <unistd.h>   /* sleep */

int main(int argc, char **argv)
{
    int sleep_time;
    int input;
    int failure;

    if (argc != 3) {
        printf("Usage: simple <sleep-time> <integer>\n");
        failure = 1;
    } else {
        sleep_time = atoi(argv[1]);
        input = atoi(argv[2]);
        printf("Thinking really hard for %d seconds...\n", sleep_time);
        sleep(sleep_time);
        printf("We calculated: %d\n", input * 2);
        failure = 0;
    }
    return failure;
}
EOF
gcc -o simple simple.c
cat > submit <<EOF
Universe = vanilla
Executable = simple
Arguments = 4 10
Log = simple.log
Output = simple.out
Error = simple.error
Queue
EOF
condor_submit submit
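After submitting, you can watch the job and check its output once it completes, for example:
condor_q # the job should appear here, then leave the queue when finished
cat simple.out # expect "We calculated: 20" for the arguments above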
Kickstart Install a Node
There are several ways to perform a kickstart installation once you have the HEAD node up and minimally configured. Choose the one that fits your machines.
- USB Key
You may duplicate the USB key to install the rest of the nodes in parallel. Using the previously made USB key, boot the machine you are installing and use the command below, replacing xxx with the short hostname of the node:
linux ks=hd:sdx/ks/mytier3/kickstart-xxx.cfg
After the reboot, follow "Create certificates for puppet clients"
- PXE Install NFS, WORKER and INTERACTIVE Nodes
If your machines are PXE capable, you may enable PXE boot in the BIOS. Make sure that the ethernet cable used for the PXE boot is connected to the private network; otherwise you will not be able to reach the HEAD node, which is the PXE server.
- Enable and Boot via PXE
- Choose from the menu which node it is; it will then automatically kickstart-install the node.
- After the reboot, follow "Create certificates for puppet clients"
- Check /var/log/messages for error messages
Major updates:
--
ValHendrix - 16-Sep-2011