ROCKS on an ATLAS Tier3 Cluster

Starting off, having ROCKS install and maintain the cluster is a real improvement over doing it by hand. I tried the alternative and found it to be lacking.

We're going to be using ROCKS 5.3 (Rolled Tacos) for this, with SL5 as the OS (not SLC in this case, based on the experience of Pat McGuigan of UTA, who is the source of almost all of this).

Go to http://www.rocksclusters.org/wordpress/ for more info on ROCKS and for the downloads.

Starting Up a Clean Cluster

Media Preparation

Get the Jumbo Roll DVD ISO for installation, and burn it to a DVD. Also get the SL5 DVD ISOs and burn them to DVDs as well. Finally, back at rocksclusters.org, get the torque roll so that the Torque+Maui batch system can be installed automatically, and burn that to a CD or DVD as well.

Initial Installation

You're now ready to start the ROCKS installation. You need to have chosen a master node for this (in my case, it's the login node as well). Some comments on architecture here (to the best of my understanding):

  • All of the machines you are installing need to be on the same private network, and only the master is going to be looking at the public network
  • PXE booting needs to be enabled on the NICs of all the machines you're installing, and the network switch must not block it
  • The public network is plugged into eth1 on the headnode, and the private network shared with the compute and storage nodes is plugged into eth0 (a quick check for this is sketched just below)
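
Once the frontend is installed, a quick way to confirm the cabling matches this layout (a minimal check, assuming the standard SL5 network tools):

/sbin/ifconfig eth0   # should carry the 10.x.x.x private address
/sbin/ifconfig eth1   # should carry your public address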

Boot the machine with the ROCKS DVD in the drive, and you'll eventually be presented with the ROCKS boot screen. Type "build" at the prompt, and wait for the installation main window to appear.

Click the button (CD/DVD-based Roll) that allows you to select the rolls you will need. The screen that follows will allow you to select a set of rolls from a list -- please choose the following:

  • base
  • ganglia
  • hpc
  • kernel
  • web-server

Please DON'T select OS -- that will be provided by the SL5 DVDs you created before. If you choose OS, the ROCKS standard CentOS will be installed.

I wanted to test SGE, but we realized that the commands for PBS/Torque and SGE could conflict -- so it's been left aside for now.

Hit the submit button, and you'll be taken back to the first screen. Select the same button (CD/DVD-based Roll) again, and it will ask you to insert a DVD (the name it asks for won't be familiar, but it means the OS DVD). Put in the first SL5 DVD, click the confirmation, and wait. You'll then be asked for another OS DVD; offer it the second disk. This is followed by reinserting the ROCKS DVD, and you'll end with the Torque CD.

You'll go on with the installation process -- add your cluster information. The info collected will be added to certificates that are auto-generated in the install process. The fully-qualified host name is important! Clicking "Next" will take you to network configuration -- use 10.1.1.1 for the private address and 255.0.0.0 for the netmask. This will give you maximum flexibility in your future cluster setup.

It will be difficult to back out some of the choices you make in this process, so consider carefully when you choose settings. Opt for flexibility.

You then set your external network parameters as specified by your network admin, matching the fully qualified domain name (FQDN) you entered two screens ago.

Next, set a time server, the root password (this will apply across the whole cluster) and disk partitioning. I recommend that you choose "Manual Partitioning". The reason is as follows:

ROCKS will create a directory in each compute node called /state/partition1, unless otherwise instructed. This is space that will be preserved across all reinstallations -- something valuable for an app server, for example, but useless for compute nodes. Manual partitioning allows you to fiddle with the space given to this partition, or put it on a second disk (on, for example, an app server), or remove it entirely.

Click "Next", and you go to the ROCKS installer, where you're greeted by a disk partitioning dialog.

Since this is the frontend node, you need to have a minimum of 16 GB set aside as / (more is fine) and a substantial /export partition (the installed size here was 6.5 GB, so make it larger than that by a fair margin). Don't use LVM -- it's not supported by ROCKS.

Create at least two partitions on the first disk (/dev/sda or /dev/hda) -- / and swap. Format / as ext3 and swap as swap. The swap should be 4 GB or so. Set the / partition to be formatted.
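
A layout along these lines covers the minimums mentioned above (just a starting point -- sizes and device names depend on your hardware):

/dev/sda1   /         ext3   16 GB or more
/dev/sda2   swap      swap   about 4 GB
/dev/sda3   /export   ext3   the rest of the disk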

After clicking "Next", ROCKS will ask you to cycle through all the DVDs a second time -- just follow the directions. The RPMs are being copied off into /export, where they will sit even if not installed.

Wait through the installer process, and your head node is now installed. Congratulations. That was the hard part.

Installation of Compute Nodes -- Basic

Let's go on to install some simple compute nodes. While these nodes may not have everything you want (yet), reinstalling and extending them is so easy that we'll just start with the basics. Once a compute node is installed by hand, it's not necessary to be physically present for any of its reinstalls or upgrades. This is the only time-consuming part.

Turn off all the compute nodes.

On the headnode, log in and launch a terminal as root (the only account you have defined yet, right?).

Run the command:

insert-ethers --rack=1 --rank=25

You'll probably have only one rack for your T3. If you have more, decide on the numbering of racks. The rack number allows you to easily locate a compute node in its rack. The rank number can correspond to the physical slot where the node is installed (which helps a lot with repairs and troubleshooting). Start with the lowest number for both rack and rank, so that ROCKS can automatically increment the numbering for you. For this example I have chosen rack 1 and rank 25, with compute nodes going out to slot 42 or so.

When I run the command insert-ethers, with rack and rank defined, it will give me a list of choices for types of "Appliance" to install -- I select Compute, for now. I get to a blue screen with a gray box in the middle. Until one of the compute nodes has tried to network boot in PXE mode, this box will remain empty.

Power up the compute node in slot 25. As it boots, press the key that makes it attempt to PXE boot (F12 on Dell machines). Since it is on the same switch and local network as the headnode (which is also the DHCP server for the network), the headnode will notice the PXE boot, and insert-ethers will grab the machine and start installing the compute node software. You'll know this was successful when the insert-ethers window pops up a "Discovered New Appliance" dialog with a MAC address in it. The installation will continue until it becomes independent of insert-ethers -- you'll know it when an asterisk appears next to the MAC address listed for the node you're installing.

As soon as the node has been recognized by insert-ethers, you can go on to the next one (in this case, 26). Power it up, and repeat the exact process. ROCKS will assign it the name of compute-1-26, just as the first one was compute-1-25. This will go on until you quit insert-ethers (via the F8 key). Please don't quit until all listed installing nodes have asterisks beside them.
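
Once you've quit insert-ethers, a quick way to confirm what ROCKS has recorded is

rocks list host

which lists every node it knows about, along with its rack and rank.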

Installation Monitoring

You can even monitor the installation with the command

rocks-console compute-1-25

or even log into the machine before it's fully built via

ssh compute-1-25 -p 2200

Removing Machines and Correcting Mistakes

If you make a mistake and need to redo a machine, just do:

insert-ethers --remove compute-1-25
insert-ethers --update # (This is just for safety, but can resolve nasty issues)

and repeat the preceding instructions.

ROCKS Administration Basics

After installing all the compute nodes, we need to spend some time on other kinds of ROCKS appliances -- and that means we need to turn to ROCKS on the headnode.

ROCKS is based out of the /export/rocks directory on the headnode. There are four principal mechanisms it uses for cluster administration (one of which you have already used):

  • rocks
  • insert-ethers
  • 411
  • tentakel

ROCKS

ROCKS is the primary admin system. It maintains a MySQL database of cluster and node information, and modifies this information as necessary as nodes are updated or upgraded or removed. It knows MAC addresses, hostnames, and software packages installed, as well as routing and other arcana. More on its command structure later, as we discuss how to add and modify Appliance types.
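
A few read-only commands give a feel for what the database holds (a quick sketch -- the exact output columns vary a bit between ROCKS versions, and compute-1-25 is just the example node from above):

rocks list host interface compute-1-25   # MAC and IP addresses recorded for that node
rocks list network                       # the private and public networks ROCKS manages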

Insert-ethers

insert-ethers handles the interface between the headnode itself and ROCKS. When run to insert or remove a node, it updates ROCKS with the node's MAC address, and assigns a new IP address to the node. It modifies the /etc/hosts and /etc/dhcpd.conf files to assign those addresses permanently. In cases where there are inconsistencies between ROCKS and the headnode IP configuration, the --update option can sort them out.

411

411 propagates important information (login/password, sudoers, NFS exports, mount commands, etc.) between the headnode and the rest of the cluster. It lives in /var/411 and is driven by a series of Makefiles. You can change the list of files it handles in /var/411/Files.mk. Whenever you make such a change, be sure to run:

cd /var/411
make

When changes are made to these important files on the headnode, the 411 system propagates them on a regular schedule. If you need the changes pushed out immediately, run

make -C /var/411 force
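
As a concrete example of the Files.mk edit mentioned above (a sketch -- check the variable names in your own Files.mk, and /etc/exports is only a placeholder), you would add a line like

FILES += /etc/exports

and then run the make commands shown above to encrypt and distribute the new file.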

Tentakel

tentakel makes it easy to run commands across the whole cluster, on subsets of it, or on individual nodes. To see what groups of nodes are defined, run

tentakel -l

Say you wanted to check the uptime on all compute nodes:

tentakel -g compute /usr/bin/uptime

Creating Appliances (Not Just Compute Nodes)

Appliance modification and creation is not difficult, once you know where to go and what you are doing. Appliances are specific kinds of ROCKS nodes, with special modifications for your individual needs. I will include a number of example files from the Tier3 in my subversion repository at

https://svnweb.cern.ch/trac/stradlin/browser/ROCKS

which you can get to if you have ATLAS credentials. These will serve as my examples as well.

Inheritance

Appliances in ROCKS are allowed to inherit from each other. For example -- there exists an appliance called a NAS Appliance (Network Attached Storage). This might be useful, but I might want a new kind of appliance that derives from it and extends it -- and that automatically has a different name in the cluster.

As mentioned before, the ROCKS installation lives in /export/rocks. The most interesting part for normal work is /export/rocks/install, however. That will be the base directory from which your modifications will be executed.

Describing Relationships

Go to /export/rocks/install/site-profiles. You'll see a directory corresponding to the ROCKS version you're using -- in this case, it's 5.3. When I go into /export/rocks/install/site-profiles/5.3, I see two directories -- graphs and nodes. The graphs/default directory contains XML files describing relationships and inheritance between existing Appliances and their children.

For example -- I want to take a NAS Appliance and convert it into a more specialized network storage -- I want an application server for my Tier3. I create a new file in /export/rocks/install/site-profiles/5.3/graphs/default called app.xml, with the following contents:

<?xml version="1.0" standalone="no"?>
<graph>
  <description>
  </description>
  <changelog>
  </changelog>
  <edge from="app">
    <to>nas</to>
  </edge>
  <order gen="kgen" head="TAIL">
   <tail>app</tail>
  </order>
</graph>

You can see the structure clearly -- I'm creating an "edge" (or line) from the node app, which I just defined, to the node nas it inherits from. I can create edges to other Appliances from which I want to inherit -- just add another <to>...</to> line below <to>nas</to>, for example <to>compute</to>.

To see what appliance types are already available, run

rocks list appliance

Once you have the inheritance in place, you can go to the other directory -- /export/rocks/install/site-profiles/5.3/nodes. You'll see another set of XML files that describe either Appliances or extensions to Appliances.

Describing Appliances

We want to create a new appliance, called "Application Server". Create the file app.xml in the nodes directory, and fill it from the app.xml example in the SVN repository linked above (the SVN copy is probably the most up to date).

This file defines the node for this appliance. It has pre-install scripts, specifications for additional packages to be installed by default, and post-install information. I have added a lot of standard packages that Athena requires -- you can remove things as you wish. Since I am likely to be testing installed software from the app server, I decided that all packages I install on compute nodes will also show up in the app server.
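
To give a rough idea of the shape of such a node file, here is a minimal sketch (only a sketch -- the real app.xml in SVN is much longer, and the package name and post-install command below are just placeholders):

<?xml version="1.0" standalone="no"?>
<kickstart>
  <description>
  Application server node file (minimal sketch)
  </description>
  <changelog>
  </changelog>

  <!-- extra RPMs installed by default; "zsh" is only a placeholder -->
  <package>zsh</package>

  <!-- anything in the post section runs on the node at the end of its kickstart -->
  <post>
touch /etc/appserver-postinstall-done
  </post>
</kickstart>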

In addition, you might note that I asked ROCKS to completely reinstall cvmfs and squid. The whole machine setup can be automated through the post-install mechanism, even for things that don't come from RPMs, or that change a lot, or that need special configurations. There are examples of how to create specialized crontabs for different users, configuration files, and so on.

Once these files are created, you should run xmllint --noout on them, to make sure there are no XML errors. Things like ampersands and greater-than signs are tricky in XML, so you'll need to escape them. Look to the app.xml file for examples.
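
For example, from the site-profiles area:

cd /export/rocks/install/site-profiles/5.3
xmllint --noout graphs/default/app.xml   # prints nothing if the file is well-formed
xmllint --noout nodes/app.xml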

Adding Appliances to ROCKS

The definitions are created, but now ROCKS needs to know about them. First, we add the appliance definition to ROCKS:

rocks add appliance app membership='Application Server' node='app'

This creates a definition in the database. The name app will be the prefix on your node's hostname. "Application Server" will show up in the list of appliances you can install when you run insert-ethers.

Finally, we need to update ROCKS' distribution lists. It is ESSENTIAL that you complete this step in the /export/rocks/install directory!

cd /export/rocks/install

rocks create distro

After the distro is recreated, you can use insert-ethers as described above to install your app server definition wherever you want.
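
For example (the rack and rank here are just placeholders -- pick the slot the machine actually occupies):

insert-ethers --rack=1 --rank=1

then select "Application Server" from the appliance menu and PXE-boot the machine, exactly as for the compute nodes.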

Extending an Appliance

Extending an Appliance is similar, but simpler. If, for example, I wanted to extend the compute node definition, I'd create the file extend-compute.xml in the nodes directory listed above -- and, following the same structure of pre- and post-install scripts and additional packages, I'd add to the compute node's definition.

After finishing the extension (an example of which is found in the SVN), I would once again run

cd /export/rocks/install

rocks create distro

to update the installation files.

Updating existing installations is easy when you have these extension files in place. Say I needed to add emacs to all compute nodes. I'd add:

<package>emacs-common</package> 
<package>emacs-nox</package> 

to the packages section of my XML file, then run a rocks create distro. To get my compute nodes to pick up this definition, I can ask them to all reboot and reinstall themselves in one shot.

On one node, the reinstall command is:

/boot/kickstart/cluster-kickstart-pxe

So I can run this on the "compute" group with tentakel as follows:

tentakel -g compute /boot/kickstart/cluster-kickstart-pxe

Don't worry about overloading the headnode -- after a certain point, these transfers are done via BitTorrent, to scale the network load.

Live Updates of Nodes

If you don't want to reboot nodes that are in use, just add the definition to the XML file and rebuild the ROCKS distro as above -- and then, in the meantime, use yum to install the package on the running nodes:

tentakel -g compute yum install emacs-common emacs-nox
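
In practice you'll probably want yum's -y flag as well, so the install doesn't sit waiting for a confirmation prompt on every node (a small variation on the command above):

tentakel -g compute "yum -y install emacs-common emacs-nox"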

Similar commands will allow you to make modifications across the cluster, reboot machines in groups, and other fun tricks.


Contact Email Address: Alden.Stradling@cern.ch

%RESPONSIBLE% AldenStradling
%REVIEW% Never reviewed

-- AldenStradling - 24-Sep-2010
