VGrid Web Portal
The VGrid portal consists of two parts - a server part and a node part. The server part has to be installed only on one machine and is used to provide the portal web pages to the user and to keep track of the deployed virtual machines. The node part has to be installed on every node of the cluster where virtual machines should be deployable. It is responsible for deploying and terminating virtual machines on its node and for providing some information for the user interface. Note: It is also possible to install the node part on the machine which runs the server part.
Requirements
On all machines:
- A web server (Apache recommended)
- Cheetah templating library
- Python 2.3 (or later)
On the nodes in addition:
- XEN
- libvirt (if you are using Python 2.3 you will need a patched version from the repository)
- An LVM volume group on which to create the partitions for the VMs
Note: Although it is possible to use Python 2.3, it is highly recommended to use Python 2.4 or later for the server, since the popen command in Python 2.3 tends to leave behind hanging child processes. This does not happen with the subprocess package in newer Python versions.
Installation
There are three steps for vGrid deployment, as listed below. First of all, copy the yum repo file to /etc/yum.repos.d/ on each machine.
The Xen installation procedure is based on the
JaroslawPolok guide.
- Node -> yum install vgrid-node
This will install vdeploy-node and the related dependency packages.
- Server -> yum install vgrid-server
This will install vdeploy-server and the related dependency packages.
Configure the machine to boot with XEN
You have to add the Xen boot entry to your machine's
/etc/grub.conf file. Verify the root filesystem label according to your machine's configuration in
/etc/fstab:
...
title Scientific Linux CERN SLC (2.6.18-8.1.3.slc4xen)
root (hd0,0)
kernel /xen.gz-2.6.18-8.1.3.slc4 dom0_mem=512M com1=9600,8n1 console=vga,com1
module /vmlinuz-2.6.18-8.1.3.slc4xen ro root=LABEL=/ xencons=ttyS console=tty0 console=ttyS0
module /initrd-2.6.18-8.1.3.slc4xen.img
...
Once you have updated your bootloader, reboot the machine.
Create an LVM volume group to provide space for the VM partitions
vGrid uses LVM to create and remove the partitions for the virtual machines. It is necessary to create an LVM volume group which provides the space for the vGrid-managed virtual machine partitions. Please have a look into the LVM manual or the manual for your distribution for instructions on how to do that (e.g. the instructions for SLC4).
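For example, a minimal sketch assuming a spare partition /dev/sdb1 and the volume group name vg1 (the name used in the template example further below):
pvcreate /dev/sdb1      # initialise the partition as an LVM physical volume
vgcreate vg1 /dev/sdb1  # create the volume group vGrid will carve the VM partitions from
vgs                     # verify that the new volume group is visible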
The name of the volume group has to be set in the volumeGroupName entry of the General section in node.cfg so that vGrid uses this volume group.
Add sudo entries
On the nodes it is required to add certain commands to the sudo configuration for the user the web server is running as.
Entries to add for a SLC4 node running vGrid on Apache:
apache ALL = NOPASSWD: /usr/sbin/xm, /bin/tar, /usr/sbin/lvcreate, \
/sbin/mkswap, /sbin/mkfs.ext3, \
/usr/sbin/lvremove, /bin/mount, /bin/umount, \
/bin/mv, /bin/rm, /usr/sbin/vgs, /usr/sbin/lvs
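These entries can be added by editing the sudo configuration with visudo (adjust the user name if your web server runs under a different account):
visudo    # opens /etc/sudoers for editing; append the entries above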
Add the cron job
For the expiry time feature, the automatic mail notification if VMs don't respond, and the automatic termination of VMs, it is necessary to add a cron job on the server. This can be done by typing:
crontab -e -u apache
and adding the following line to the cron job file:
0 * * * * cd /var/vgrid/mod && ./vmcleaner.py
Use a different Python version for the server scripts
It is highly recommended to use Python >= 2.4 for the server scripts, since the popen command in Python 2.3 tends to leave behind hanging child processes. This does not happen with the subprocess package in newer Python versions. If you don't have Python >= 2.4 installed as the default Python version on your system (you can verify this by typing python on the command line), then you should change a line in two scripts to point to a newer version.
The files are:
- user/server.py
- mod/vmcleaner.py
Change the first line from:
#!/usr/bin/python
to the binary of the newer Python version, for example for a Python 2.5 installed from the tar-ball with the default installation directory:
#!/usr/local/bin/python2.5
Please make sure that you also install Cheetah for the newer Python version, for example using the following command to install it from the tar-ball for a Python 2.5 installed from a tar-ball:
/usr/local/bin/python2.5 setup.py install
Configuration
Web server configuration
vGrid has to be provided in the following directories on the web server:
- /cgi-bin/user: User accessible parts of vGrid.
- /cgi-bin/internal: Scripts for internal communication between the server and the nodes.
- /resources: Files required for the mark-up of the portal.
To add those directories to Apache's directory structure, add the following entries to Apache's configuration file (httpd.conf):
Alias /resources "/usr/local/vgrid/htmltmpl/resources"
<Directory "/usr/local/vgrid/htmltmpl/resources">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
ScriptAlias /cgi-bin/ "/usr/local/vgrid/cgi/"
<Directory "/usr/local/vgrid/cgi/">
AllowOverride AuthConfig Options
Options None
AddHandler cgi-script .py
Order allow,deny
Allow from all
</Directory>
User authentication
The user authentication is managed by the web server (usually Apache) used to host the vGrid server. Every user that is allowed to access the vGrid server pages is allowed to deploy virtual machines. The authentication can be done using different mechanisms:
- Basic Authentication: Login using username and password.
- Shibboleth: Login using Shibboleth Single Sign-On (for example for using the CERN NICE login).
- Certificates: Login using a certificate in the user's browser.
- Others: vGrid can be used with any type of authentication mechanism that exports the user name into an environment variable.
The user authentication type can be set using Apache's .htaccess file, which has to be placed in the cgi/user directory of the server.
Basic authentication
Apache's basic authentication allows logins using a username and a password, which are specified in a password file on the server.
.htaccess file for basic authentication:
AuthType Basic
AuthName "vGrid Portal"
AuthUserFile /path/to/the/user/passwords
Require user username1 username2
For information on Apache's authentication settings and how to create the password file, please have a look at the authentication documentation on the Apache site.
For this kind of authentication the
EnvUsernameVariable setting in the
General section of
server.cfg has to be set to:
REMOTE_USER
Shibboleth authentication using NICE
For the installation of Shibboleth with NICE please refer to the
NICE SSO in Apache page
.
.htaccess file for Shibboleth authentication using NICE:
SSLRequireSSL
AuthType shibboleth
ShibRequireSession On
ShibRequireAll On
ShibExportAssertion Off
Require adfs-user "username1" "username2"
For this kind of authentication the
EnvUsernameVariable setting in the
General section of
server.cfg has to be set to:
HTTP_ADFS_LOGIN
Certificate authentication
.htaccess file for certificate authentication:
SSLRequireSSL
SSLOptions +StdEnvVars +FakeBasicAuth
SSLVerifyClient require
SSLVerifyDepth 1
SSLRequire %{SSL_CLIENT_S_DN_O} eq "CERN" and \
%{SSL_CLIENT_S_DN_OU} in {"USERS", "CA"}
SSLCACertificatePath /etc/httpd/conf/ssl.crt/
SSLLog /etc/httpd/logs/ssl_engine_log
SSLLogLevel warn error
AuthName "Certificate Authentication"
AuthType Basic
AuthUserFile /path/to/fake/httpd.passwd
require valid-user
With this .htaccess file it is required that the users' certificates are valid (the CA certificate has to be available and trusted at the server) and that the user is listed in the password file. It is also possible to list the allowed users/organisations using the SSLRequire statement. For more information on both methods please have a look at the official mod_ssl HowTo or here.
For this kind of authentication the
EnvUsernameVariable setting in the
General section of
server.cfg has to be set to:
SSL_CLIENT_S_DN_CN
Alternatively, you can set SSLUserName SSL_CLIENT_S_DN_CN in the .htaccess file and continue to use REMOTE_USER in the server.cfg file.
Node authentication
A basic level of node authentication is enforced: all submitting nodes have to be listed in common.cfg on the server side, and the server hostname has to match the one configured in node.cfg so that the nodes accept its VM deployment and termination requests.
vGrid configuration
There are three different configuration files:
- server.cfg: Contains server specific settings.
- node.cfg: Contains node specific settings.
- common.cfg: Contains settings which are used by the server and the node part. This file has to be identical on all machines.
common.cfg
Virtual Machine Image Tags
All images listed in the
VMImages section are displayed as options to the user. The virtual machine images the user can select from are identified by tags within vGrid. Each entry has to be a
key: value pair, where the key is a tag and the value is the name displayed to the user in the web portal.
Physical Host Pool
The list of physical hosts is in the
PhysicalHostsPool section. Each entry has to be a
key: value pair, with the hostname as key and the maximum number of virtual machines on this host as value.
Virtual Host Name Pool
The list of virtual hosts is in the VirtualHostsPool section. Each entry has to be a key: value pair, with the hostname as key; the value can be set to kernel or network parameters (for example a virtual MAC address) and is made accessible to the Xen templates on the nodes via the $vmExtra variable.
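A minimal common.cfg sketch combining the three sections above (the image tags, hostnames, and values are hypothetical, and the bracketed section headers are an assumption about the file format):
[VMImages]
slc4: SLC4 standard image
sl4: SL4 standard image
[PhysicalHostsPool]
node01.example.cern.ch: 4
node02.example.cern.ch: 2
[VirtualHostsPool]
vm01.example.cern.ch: vif = [ 'mac=00:16:3e:00:00:01' ]
vm02.example.cern.ch: vif = [ 'mac=00:16:3e:00:00:02' ]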
server.cfg
General Settings
vGrid relies on the web server to do the authentication of the users. It is required that the web server stores the username in an environment variable. The name of the environment variable is set in the
EnvUsernameVariable configuration entry. See
User authentication for more information.
Administrators
Administrators are users who are allowed to terminate virtual machines of others. The list of administrators is in the
Administrators section. Each entry has to be a
key: value pair, with the username as value (the key is ignored). Note: vGrid replaces spaces in usernames by underscores.
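A minimal server.cfg sketch for the two sections above (the usernames are hypothetical, and the bracketed section headers are an assumption about the file format):
[General]
EnvUsernameVariable: REMOTE_USER
[Administrators]
admin1: John_Doe
admin2: Jane_Smith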
node.cfg
General Settings
General settings for the nodes are stored in the
General section. There are the following entries:
- serverHostname: Specifies the hostname of the vGrid server for the node. The address of the server is required since the nodes send back information about the deployment status to the server. The server address is also used to check if the deployment and termination requests are coming from the right address.
- imageDirectory: Directory where the virtual machine image files are stored. This directory is used for the images specified in the VMImageFiles section and also for the images saved by the users.
Virtual Machine Images
There are several sections for specifying the settings for the image tags defined in the
VMImages section in
common.cfg. All of them are
key: value pairs, where the keys have to be the image tag defined in
VMImages or "default" for a default setting in case a key is not in the section.
The sections are:
- VMConfigTemplates: Specifies the names of the configuration file templates that should be used to start the virtual machines.
- VMImageFiles: Specifies the name of the virtual machine images.
- VMSwapSize: Specifies the size of the swap partition (in MB).
- VMTypes: Specifies the type of virtual machine to be used (currently only "xen" is supported).
- XenDeployerFSCreateCmd: Specifies the command which should be used to create the filesystem on the root partition.
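A minimal node.cfg sketch for the sections described above (the paths, file names, and bracketed section headers are hypothetical; the slc4 tag is assumed to be defined in VMImages in common.cfg):
[General]
serverHostname: vgrid-server.example.cern.ch
imageDirectory: /var/vgrid/images
volumeGroupName: vg1
[VMConfigTemplates]
default: xen-slc4.tmpl
[VMImageFiles]
slc4: slc4-base.tar.gz
[VMSwapSize]
default: 512
[VMTypes]
default: xen
[XenDeployerFSCreateCmd]
default: /sbin/mkfs.ext3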
VM Configuration File Templates
Two types of Xen configuration templates are provided as part of the vGrid installation: one for SLC3/SL3, which requires an sda-type block device to be specified in the disk and root sections, and one using xvda for all other distributions. The templates are stored in /usr/local/vgrid/vmtmpl/ and are pre-configured for SLC3/SL3/SLC4/SL4.
The following parameters are available in the Xen templates:
- ${machinename}: machine name to be visible under Xen
- ${vmIp}: IP address of the virtual machine
- ${vmHostname}: virtual host name as available from the network
- ${vmMemory}: memory for the virtual machine
- ${username}: user name to be set in the extra section for ssh keys
Admins can configure their templates on a per-node basis as per their needs, but here is an example of a standard template for SLC4/SL4.
name = "${machinename}"
ip = "${vmIp}"
hostname = "${vmHostname}"
memory = "${vmMemory}"
disk = ['phy:/dev/vg1/xen-root-${machinename},xvda1,w','phy:/dev/vg1/xen-swap-${machinename},xvda2,w' ]
root= "/dev/xvda1 ro"
vif = [ '' ]
gateway= "128.142.1.1"
netmask = "255.255.0.0"
on_reboot = 'restart'
on_crash = 'restart'
on_poweroff = 'restart'
bootloader = '/usr/bin/pygrub'
extra = "vgrid_username=${username}"
Passing arguments to the virtual machine
It is possible to pass arguments to the virtual machine using the extra entry in the Xen config file. The default Xen config file templates pass the vGrid username to the virtual machine using the entry:
extra = "vgrid_username=${username}"
The contents of the extra entry are appended to the kernel command line parameters of the DomU kernel and can be accessed using the /proc/cmdline file of the VM.
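For example, a sketch of how a script inside the VM could read the value back from the kernel command line (the extraction command itself is not part of vGrid):
# inside the running VM: pull vgrid_username out of the kernel command line
username=$(sed -n 's/.*vgrid_username=\([^ ]*\).*/\1/p' /proc/cmdline)
echo "VM owner: $username"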
Setting the ssh public key of the VM owner for the VM root user
To make sure that only the user who started the virtual machine can access the VM (as root user), it is necessary to set the authentication accordingly at startup. The script copyuserkey copies the ssh public key of the VM owner into the .ssh directory of the VM's root user on startup. The user keys can be placed inside the VM images themselves or in a central key repository on a web server.
The following steps are necessary to prepare the VM image (the steps have to be done in the VM image that will be provided to the users). To install the script, go to the directory where the image has been decompressed:
cp copyuserkey etc/init.d/
cd /etc/init.d
chown root copyuserkey
chmod 755 copyuserkey
Add it in the system startup:
cd /etc/rc3.d/
ln -s ../init.d/copyuserkey S55copyuserkey
Make sure that there are no other keys already existing for the root user; delete them if there are:
rm /root/.ssh/authorized_keys
Now it is necessary to put the user ssh public keys into a repository: either inside the images themselves, on a web server, or both. The script first checks inside the image and, if no key was found for the user, it checks on the web server.
Store the user public keys on a web server
Storing the user ssh public keys on a web server is more convenient, since it is not necessary to recreate the images if a user is added to the system. For a key repository on a web server it is highly recommended to use an https site; in this case the script will only use the public key from the server if the server has a valid certificate. It is also possible to use an http server, but this might cause security problems.
To use this feature, change the WEB_SOURCE_PREFIX variable at the beginning of the script to point to your key repository. The username will be appended to the contents of the variable to get the filename that is fetched from the web server.
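For example (a sketch with a hypothetical repository URL and file naming; only the variable name is taken from the script):
# at the beginning of copyuserkey
WEB_SOURCE_PREFIX="https://vgrid-server.example.cern.ch/keys/authorized_keys."
# for the user "jdoe" the script would then fetch
# https://vgrid-server.example.cern.ch/keys/authorized_keys.jdoe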
Store the user public keys inside the image
If you do not want to use the web repository for the user public keys you can also store them inside the virtual machine image. If you want to disable the key retrieval from a web server, remove the WEB_SOURCE_PREFIX from the beginning of the script. The keys have to be located in the /opt/ssh_public_keys directory:
Create a user key repository:
mkdir /opt/ssh_public_keys
chown root /opt/ssh_public_keys
chmod 700 /opt/ssh_public_keys
Copy the public keys of the users (id_dsa.pub in the user's .ssh directory) into the key repository:
cp /path/to/key/of/user/<username>/id_dsa.pub /opt/ssh_public_keys/authorized_keys.<username>
chown root /opt/ssh_public_keys/authorized_keys.<username>
chmod 700 /opt/ssh_public_keys/authorized_keys.<username>
Troubleshooting in case "None" is passed instead of the "extra" variables
Some older pygrub versions cause "None" to be passed instead of the "extra" parameter from the Xen config file. To check if this is the case, execute the following in your VM:
cat /proc/cmdline
If the output looks like this, everything is fine:
ip=128.142.200.241:1.2.3.4:128.142.1.1:255.255.0.0:ctb-generic-19.cern.ch:eth0:off root=/dev/xvda1 ro vgrid_username=username
If it looks like this, your pygrub has to be updated:
ip=128.142.200.241:1.2.3.4:128.142.1.1:255.255.0.0:ctb-generic-19.cern.ch:eth0:off root=/dev/xvda1 ro None
You can update pygrub either by installing a newer version or by fixing the problem in pygrub by hand. To do that, replace the following lines in /usr/bin/pygrub:
else:
    initrd = None
sxp += "(args '%s')" %(img.args,)
sys.stdout.flush()
os.write(fd, sxp)
with the following lines:
else:
    initrd = None
if img.args:
    sxp += "(args '%s')" %(img.args,)
sys.stdout.flush()
os.write(fd, sxp)
Deploy and terminate virtual machines from scripts
The vGrid deployment and termination actions can also be triggered by normal HTTP GET requests. The scripts deploy.sh and terminate.sh are examples of how to deploy and terminate virtual machines using wget. For those example scripts, basic authentication has to be used. It is also possible to use certificate authentication with wget via the --certificate option; please have a look at the wget manual page for more details.
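A rough sketch of the two wget variants (the server URL is a placeholder; the real request, including any query parameters, is defined by the deploy.sh and terminate.sh examples and your installation):
# basic authentication
wget --http-user=username1 --http-password=secret \
     "https://vgrid-server.example.cern.ch/cgi-bin/user/server.py"
# certificate authentication
wget --certificate=/path/to/usercert.pem --private-key=/path/to/userkey.pem \
     "https://vgrid-server.example.cern.ch/cgi-bin/user/server.py"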
Virtual Machine Image Generation
These are standard steps to create virtual machine images in a tar ball.
Downloading Image Generation Scripts
First of all, check out the image generation scripts from the xenvirt CVS repository. Log in to your machine, go to the directory where you want to check out the scripts, and issue the following command:
cvs -d :pserver:anonymous@isscvs.cern.ch:/local/reps/xenvirt/ co libfsimage
Once checked out, go to the libfsimage directory and make the genfs.py script executable:
chmod 755 genfs.py
Generate an image
The next step is to create an image. Create the appropriate directories where the image will be generated (temporarily) and stored (permanently). The root password is set at execution time. Use the help output of genfs.py to find out all possible image types, such as SLC-4-i386, SLC-5-i386, SL-4-x86_64, etc.
./genfs.py -t ${image_type} -d /${dir_path}/${distro_name} -o /${dir_path}/${image_name}.tar.gz -g Base Core -p which wget openssh-clients cvs apt bzip2 cpp tcsh man vixie-cron curl-devel emacs python-devel openssl openafs-client openldap-clients rpm-build gcc -w ${password}
Note: This process usually takes around 5 minutes per image.
Image Customization
Once the image has been created in the desired place, the next step is to customize it for the gLite certification team. First of all, decompress the image in a directory:
cd /${dir_path}
tar -xzvf ${image_name}.tar.gz
First copy all the necessary files into the decompressed filesystem's /tmp, such as j2sdk or j2re (for SLC3/SL3 images only), and the host certificates for all images.
For SLC3/SL3 images:
wget -P tmp/ ${j2sdk_or_j2re_web_path}
For host certificates:
mkdir opt/certificates
cp -R ${host_certificates} opt/certificates
For SSH keys, copy the key manager script:
cp copyuserkey etc/init.d/
Now it's time to install Java and to configure other scripts in the image:
chroot .
For ssh keys configuration:
cd /etc/init.d
chown root copyuserkey
chmod 755 copyuserkey
cd /etc/rc3.d/
ln -s ../init.d/copyuserkey S55copyuserkey
For SLC4/SL4/SL5 images, follow this
guide to create the jpackage yum repo file in the image.
mount -t proc proc/ proc/
yum install jpackage-utils
yum --enablerepo slc4-cernonly install java-1.5.0-sun java-1.5.0-sun-devel
yum -y clean all
umount proc/
For SLC3/SL3 images, follow these steps for Java installation:
cd /tmp
rpm -ivh j2sdk-*-*.rpm
rm -f /etc/alternatives/java /usr/bin/java
ln -s /usr/java/j2sdk-$ver/bin/java /etc/alternatives/java
ln -s /etc/alternatives/java /usr/bin/java
Also make sure that the kudzu service is set to OFF, while afs and yum-autoupdate or apt-autoupdate are set to ON.
chkconfig --del kudzu
chkconfig --add afs
chkconfig --level 345 afs on
chkconfig --add yum-autoupdate             # or apt-autoupdate
chkconfig --level 345 yum-autoupdate on    # or apt-autoupdate
Once you are done, it's time to exit the chroot environment:
exit
Now it's time to prepare the image tar ball. You should be at the root of the directory structure where the image has been decompressed; then issue the following command:
tar -czvf ${image_path}/${image_name}.tar.gz *
Unresolved Problems with Xen 3.0.3
Currently, when deploying a virtual machine with an SL(C)4 image, OpenLDAP 2.2 and bdb are not supported. This means that it's not possible to configure any information providers using these platforms. The problem happens because Xen doesn't support a specific assembly-language mutex, nor does it support the nptl glibc (and bdb). This only affects Xen 3.0; it is a known issue at Red Hat, but it is marked as "won't fix".
Fix: Renaming the disabled /lib/tls.disabled back to /lib/tls solves the problem.
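For example, inside the affected VM image (a sketch, assuming the directory was renamed rather than removed):
mv /lib/tls.disabled /lib/tls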