How to install a boot-server for a VMEbus SBC (VP110)

Introduction

This HowTo documents the boot-server installation for the VP110 in the TTC page1 system. The general idea is to install the server following the conventions of the ATLAS sysadmins, so that they can be asked for help with the server when the PH/ESE expert is not available.
For the simple task of booting one VP110, much of the complexity of an ATLAS server is not required, and the installation procedure has been simplified accordingly. However, enough features have been retained that this recipe can also be used to install slightly more complex servers.

The page1 server is a bit special because the SBC is on a private network and the server does not offer a DNS service. Therefore IP addresses have to be used instead of IP names in many places.

Preconditions

The IP name of the server is: pcphese08
Install SLC5 on the server

Install additional packages (in case they are missing)

Use the "links" utility (or something else) to download the "mknbi" RPM from (e.g.):
http://sourceforge.net/project/downloading.php?group_id=4233&filename=mknbi-1.4.4-1.noarch.rpm&a=15314451
> rpm -ihv mknbi-1.4.4-1.noarch.rpm

Install the TFTP server
> yum install tftp-server
> cd /etc/xinetd.d
Open the file "tftp" and set "disable" to "no", then restart xinetd:
> cd /etc/init.d
> ./xinetd restart
> /usr/sbin/exportfs -ar
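
The manual edit of /etc/xinetd.d/tftp described above can also be scripted. The following is a minimal sketch, assuming the file contains a line of the form "disable = yes"; the function name enable_xinetd_service is hypothetical and not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: switch "disable = yes" to "disable = no" in an
# xinetd service file. Assumes the usual "key = value" layout.
enable_xinetd_service() {
    conf="$1"
    tmp="${conf}.tmp.$$"
    # Write the modified text to a temporary file, then copy it back so
    # that owner and mode of the original file are preserved.
    sed 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes/\1no/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf" &&
        rm -f "$tmp"
}

# Example (as root): enable_xinetd_service /etc/xinetd.d/tftp
```

xinetd still has to be restarted afterwards, as shown above.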

Installation procedure

> source /etc/profile
> cd /usr/local/bin/
> ln -s /usr/bin/python python2.4
> cd /mj_addon/

Create a base directory

> mkdir bootserver
> cd bootserver

Get the sysadmin CVS package

Connect to the sysadmin CVS server and check out the sysadmin CVS package. Note: "cvs co ." checks out the whole sysadmin repository, not only the boot-server part.
> export CVSROOT=:ext:joos@isscvs.cern.ch:/local/reps/atlas-tdaq-adm
> export CVS_RSH=ssh
> cvs co .

Build a boot image for the VP110

> cd /mj_addon/bootserver/bwm
> chmod a+x set_permissions.sh
> ./set_permissions.sh
> cd projects
> cp lab32-slc5_32.xml page1-slc5_32.xml

> nedit page1-slc5_32.xml
Update (if required):

  kernel_version= (e.g.) 2.6.18-128.1.10.el5
  root_dir=... (this is the directory of the SLC5 base installation on the server. 
                      Files of this installation will be copied into the file system for the diskless client)
  Default: root_dir="/"
In addition you can remove undesired services such as:
<plugin>snmp</plugin>
<plugin>irqbalance</plugin>
<plugin>hwclock</plugin>
<plugin>ntp</plugin>
<plugin>autofs</plugin>
<plugin>sensors</plugin>
<plugin>postfix</plugin>
<plugin>xinetd</plugin>
<plugin>nrpe</plugin>
<plugin>slc5-sudo</plugin>
<plugin>slc5-irqbalance</plugin>
<plugin>slc5-cpuspeed</plugin>
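
Before building, it may be worth checking that the value chosen for kernel_version corresponds to a kernel that is actually installed on the server (the kernel image is later copied from /boot). A minimal sketch; both function names are hypothetical helpers, not part of the sysadmin package.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel image for the given version
# exists in the boot directory (default: /boot).
kernel_image_exists() {
    version="$1"
    bootdir="${2:-/boot}"
    [ -f "${bootdir}/vmlinuz-${version}" ]
}

# Hypothetical helper: list the kernel versions found in the boot directory.
list_kernel_versions() {
    bootdir="${1:-/boot}"
    for f in "${bootdir}"/vmlinuz-*; do
        # Skip the unexpanded pattern if no image is present.
        [ -f "$f" ] && echo "${f##*/vmlinuz-}"
    done
    return 0
}

# Example: kernel_image_exists 2.6.18-128.1.10.el5 || list_kernel_versions
```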

> nedit /mj_addon/bootserver/bwm/templates/slc5x_32/sbin.xml
After

  <file>tune2fs</file>
Add:
      
  <file>udevsettle</file>

After

  </dir>
Add:
      
  <dir name="clients">
  </dir>

> cd ../scripts
> nedit Makefile
Add (after the "===Point 1====")

  # ======== PH/ESE ======================
  # ======== PH/ESE  >>>>>>>> GENERIC RULE !!!!
  page1-%:
          $(MAKE) bootimage PROJECT=$@

Select a root password:

  ROOT_PASSWORD="topsecret"

> nedit create_initrd.sh

Edit two lines to make them look like this:

#${sysroot}/sbin/mke2fs -b 1024 -m 2 -v $devname
/sbin/mke2fs -b 4096 -m 2 -v $devname

Edit the "fstab" file

> cd /mj_addon/bootserver/bwm/projects/atlas/etc
> nedit fstab
The last 3 lines have to look like this:
192.168.1.1:/client_installation_slc5_32/usr    /usr       nfs     ro,intr,soft,nolock     0 0
192.168.1.1:/sw                  /sw       nfs     ro,intr,soft,nolock     0 0
192.168.1.1:/clients                                  /clients     nfs     ro,intr,soft,nolock     0 0
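
To double-check the result, the NFS entries of the fstab file can be listed and compared with the three lines above. A minimal sketch; list_nfs_mounts is a hypothetical helper name.

```shell
#!/bin/sh
# Print "source mountpoint" for every NFS entry of an fstab-style file.
# Comment lines and entries of other file system types are skipped.
list_nfs_mounts() {
    awk '$1 !~ /^#/ && $3 == "nfs" { print $1, $2 }' "$1"
}

# Example: list_nfs_mounts /mj_addon/bootserver/bwm/projects/atlas/etc/fstab
```

Every source printed should start with the IP address of the server, because the client cannot resolve IP names.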

> export PATH=/sbin:/usr/sbin:$PATH
> make page1-slc5_32

This should result in the file: /mj_addon/bootserver/bwm/scripts/bwm-page1-slc5_32.img
In case of problems have a look at the log file: less ../logs/bwm-page1-slc5_32.img.log
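
The check for the image and the look at the log file can be combined. The following is a minimal sketch; check_image is a hypothetical helper, and showing the last 20 lines of the log is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: report whether the image was produced. On failure
# show the end of the build log (if there is one) and return non-zero.
check_image() {
    img="$1"
    log="$2"
    if [ -s "$img" ]; then
        echo "OK: $img"
        return 0
    fi
    echo "Missing or empty image: $img" >&2
    [ -f "$log" ] && tail -n 20 "$log" >&2
    return 1
}

# Example, run from /mj_addon/bootserver/bwm/scripts:
# check_image bwm-page1-slc5_32.img ../logs/bwm-page1-slc5_32.img.log
```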

Check the content of the image

> mkdir /mnt/d1
> /mj_addon/bootserver/bwm/scripts/test_img.sh bwm-page1-slc5_32.img
> ls -l /mnt/d1

If you want to re-build an image:

> /mj_addon/bootserver/bwm/scripts/test_img.sh clean
> make page1-slc5_32

Convert the image to an .elf file (required for machines that use etherboot (bootp)):

> nedit makeelfslc5.sh
Change:
  bootimage=bwm-page1-slc5_32.img
Remove:
  #no PBA for oldest SBC with tiny memory
and all following lines

Fetch the Linux kernel:
> cp /boot/vmlinuz-2.6.18-128.1.10.el5 .
> mv vmlinuz-2.6.18-128.1.10.el5 kernel-slc5_32

Make the "elf" file
> ./makeelfslc5.sh

Create an NFS file system that will be exported to the client

> mkdir /client_installation_slc5_32
> mkdir /client_installation_slc5_32/usr
> cd /client_installation_slc5_32/usr
> cp -a /usr/* .
> cd lib
> mkdir modules
> cd modules
> cp -a /lib/modules .

> nedit /etc/exports
Add:

  /client_installation_slc5_32   192.168.1.99(ro,no_root_squash,sync)
  /clients/daq_area                  192.168.1.99(rw,no_root_squash,sync)
  /sw                                        192.168.1.99(rw,no_root_squash,sync)
Note: 192.168.1.99 may have to be changed. This IP range is for the private network of the page1 system
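
Since the client address may differ between installations, the export lines can be generated from the IP address. A minimal sketch; make_exports is a hypothetical helper, and the paths and options are simply the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the three export lines for one client IP.
# Only the IP address varies; paths and options are taken from the text.
make_exports() {
    client_ip="$1"
    printf '%s   %s(ro,no_root_squash,sync)\n' /client_installation_slc5_32 "$client_ip"
    printf '%s   %s(rw,no_root_squash,sync)\n' /clients/daq_area "$client_ip"
    printf '%s   %s(rw,no_root_squash,sync)\n' /sw "$client_ip"
}

# Example (as root):
# make_exports 192.168.1.99 >> /etc/exports && /usr/sbin/exportfs -ar
```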

Start the NFS server
> chkconfig nfs on
> service nfs start

Configure the DHCP server

Copy this (with appropriate modifications) to /etc/dhcpd.conf:
ddns-update-style interim;
ignore client-updates;
not authoritative;
use-host-decl-names on;
ddns-update-style ad-hoc;
deny unknown-clients;

option broadcast-address     192.168.1.255;
option domain-name              "cern.ch";
option domain-name-servers 192.168.1.1;
option time-servers                192.168.1.1;

###################################################################
#   LISTEN ONLY ON THE INTERFACE SPECIFIED IN  /etc/sysconfig/dhcpd
###################################################################
server-name "pcphese08.cern.ch";
subnet 192.168.1.0 netmask 255.255.255.0 {

option routers                             192.168.1.1;
option subnet-mask                    255.255.255.0;
option domain-name-servers      137.138.16.5, 137.138.17.5;
option ntp-servers                       137.138.16.69, 137.138.17.69;
default-lease-time 21600;
max-lease-time 43200;
}
host pcepess26 {
        hardware ethernet 00:40:9e:00:44:50;
        fixed-address 192.168.1.99;
        filename "bwm-page1-slc5_32.elf";
        option domain-name-servers      127.0.0.1;
        option ntp-servers              127.0.0.1;
        }
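
If more than one SBC has to be booted, the host declarations can be generated instead of being typed by hand. A minimal sketch; dhcp_host_entry is a hypothetical helper that prints only the mandatory lines of the declaration shown above (the per-host DNS and NTP options are left out).

```shell
#!/bin/sh
# Hypothetical helper: print a dhcpd.conf host declaration.
# Arguments: host name, MAC address, fixed IP address, boot file name.
dhcp_host_entry() {
    cat <<EOF
host $1 {
        hardware ethernet $2;
        fixed-address $3;
        filename "$4";
}
EOF
}

# Example:
# dhcp_host_entry pcepess26 00:40:9e:00:44:50 192.168.1.99 bwm-page1-slc5_32.elf
```

After editing, the syntax of the configuration can be checked with "dhcpd -t -cf /etc/dhcpd.conf" before the service is restarted.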

Create the file "/etc/sysconfig/dhcpd" and fill it with:

# Command line options here
DHCPDARGS=eth0

Then:
> mkdir /tftpboot
> cd /tftpboot
> cp /mj_addon/bootserver/bwm/scripts/bwm-page1-slc5_32.elf .
Restart DHCP:
> cd /etc/init.d
> ./dhcpd stop
> ./dhcpd start

Set-up the post-boot area

> mkdir /clients
> mkdir /clients/daq_area/
> mkdir /clients/daq_area/bwm-post-boot
> cd /clients/daq_area/bwm-post-boot
> mkdir _pcepess
> cd _pcepess
Create the file pcepess26 (use the proper IP name, IP names of machines should have the form . E.g. pcepess26) and fill it with:
#!/bin/sh
cd /sw/add_on/script
./go_page1 start
echo drivers loaded and page1 program started
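
The creation of the post-boot file can be scripted as follows. This is a minimal sketch; install_post_boot is a hypothetical helper, and making the file executable is an assumption about what the boot scripts expect.

```shell
#!/bin/sh
# Hypothetical helper: create the post-boot script for one client.
# $1 = post-boot directory, $2 = subdirectory (e.g. _pcepess),
# $3 = client name (e.g. pcepess26).
install_post_boot() {
    dir="$1/$2"
    mkdir -p "$dir" || return 1
    cat > "$dir/$3" <<'EOF'
#!/bin/sh
cd /sw/add_on/script
./go_page1 start
echo drivers loaded and page1 program started
EOF
    # Assumption: the file has to be executable.
    chmod a+x "$dir/$3"
}

# Example (as root):
# install_post_boot /clients/daq_area/bwm-post-boot _pcepess pcepess26
```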

Prepare the S/W area

> mkdir /sw
> mkdir /sw/add_on
> mkdir /sw/add_on/script

For the page1 server it is now sufficient to create the file /sw/add_on/script/go_page1. It has to be derived from "drivers_tdaq".

If you are installing a generic server the next step is to populate the sw area. Use, if possible, the same path definitions as for a TDAQ release.
To start with, mount the image (see above) and check how the logical links are set up. E.g.:
> ls /mnt/d1/bin -la
Missing files for the /sw... file system can be copied across from any ATLAS netbooted system.
In principle (most of) the files should also be in the CVS area (e.g. /mj_addon/bootserver/servertools/daq-drivers/scripts/atlas_tdaq_drivers) but they are not necessarily up to date.

For the drivers one has to create these directories (with appropriate content):
  /clients/daq_area/daq-drivers/clients-configs
  /sw/tdaq/drivers/drivers-2.0.1

Installation of the ATLAS TDAQ S/W

For the page1 server not much is required (just the vme_rcc related files). For simplicity I have put everything into /sw/add_on

Installation of pcphese12

Raid

Before installing Linux enter the RAID BIOS (Alt+3) and check if the configuration of the disks is OK. The two HDs have to be in a single mirrored unit. After Linux has been installed it seems not to be necessary to install a RAID driver (SLC5 seems to have it included) but one has to install the 3ware Webserver:
  1. cd to /mj_addon/3ware
  2. Execute "./setupLinux_x86.bin -console"
  3. Run "system-config-securitylevel"
    1. Enable "Mail" and "secure WWW"
    2. Click on "Other ports" and add port 888
  4. Connect to the 3ware Web server
  5. Click on "3DM 2 Settings"
  6. Set "notify on" to "INFO"
  7. Enter "cernmx.cern.ch" as Mail Server. Login and password remain empty

NI (GPIB PMC)

  1. cd to /mj_addon/ni
  2. get a SLC5 compatible "NI-488" (e.g. NI-488.2-beta-2.5.1b1) package from NI or EN-ICE
  3. cd NI-488
  4. ./INSTALL
  5. edit /usr/local/natinst/ni4882/etc/gpib.ini
   
[GPIB0]
AUTOPOLL=Yes
BoardName=PMC-GPIB
BoardType=0x62

Add "pci=nobios" to the kernel command line as the GPIB-PMC causes a kernel panic.

  1. cd /client_homes/pcepess39/
  2. mkdir natinst
  3. cd natinst
  4. cp -a /usr/local/natinst .
  5. cd /client_installation_slc5_32/usr/local
  6. ln -s /clienthome/natinst natinst
  7. cd /client_homes/pcepess39/etc/init.d
  8. cp /usr/local/natinst/nipal/etc/init.d/gpibenumsvc .
  9. cp /usr/local/natinst/nipal/etc/init.d/nipal .
  10. cd /usr/local/natinst
  11. cp ./nipal/sbin/nipalsm /client_installation_slc5_32/usr/local/sbin
  12. mkdir /client_homes/pcepess39/etc/natinst/
  13. cd /client_homes/pcepess39/etc/natinst/
  14. cp -a /etc/natinst/* .
  15. cd /client_installation_slc5_32/usr/_lib/modules/2.6.18-164.11.1.el5/kernel
  16. cp -a /lib/modules/2.6.18-164.11.1.el5/kernel/natinst .
  17. rebuild the netboot image
  18. cd /client_homes/pcepess39
  19. nedit start_up
/etc/init.d
./nipal start
./gpibenumsvc start
  20. cd /client_installation_slc5_32/usr/local/lib
  21. cp -a /usr/local/lib/* .
  22. cd /client_installation_slc5_32/usr/local/include/
  23. cp -a /usr/local/include/* .

Old stuff:

   1. cd /client_homes/pcepess39
   2. mkdir ni
   3. cd ni
   4. mkdir etc
   5. cd etc/
   6. cp -a /etc/natinst/* .
   7. cd /client_installation_slc5_32/usr/local
   8. cp -a /usr/local/natinst .
   9. cd /client_installation_slc5_32/usr/local/natinst/nipal/etc/init.d
   10. cp nipal nipal_orig
   11. nedit nipal&
< nipalEtc=/etc/natinst/nipal
> nipalEtc=/usr/local/natinst/nipal/etc
< nikalDir=`cat /etc/natinst/nikal/nikal.dir`
> nikalDir=`cat /usr/local/natinst/nikal/etc/nikal.dir`
   12. cd /client_homes/pcepess39/etc/init.d
   13. cp /usr/local/natinst/nipal/etc/init.d/gpibenumsvc .
   14. cd /usr/local/natinst
   15. cp ./nipal/sbin/nipalsm /client_installation_slc5_32/usr/local/sbin
   16. mkdir /client_homes/lnxpool10/etc/natinst/
   17. mkdir /client_homes/lnxpool10/etc/natinst/nipal/
   18. mkdir /client_homes/lnxpool10/etc/natinst/nipal/services
   19. cp /etc/natinst/nipal/services/libgpibenumsvc.so.2.5.1 /client_homes/pcepess39/etc/natinst/nipal/services
   20. cd /client_installation_slc5_32/usr/_lib/modules/2.6.18-128.1.14.el5/kernel
   21. cp -a /lib/modules/2.6.18-128.1.14.el5/kernel/natinst .
   22. rebuild the netboot image
   23. cd /client_homes/pcepess39
   24. nedit start_up
/usr/local/natinst/nipal/etc/init.d
./nipal start
./gpibenumsvc start
   25. cd /client_installation_slc5_32/usr/local/lib
   26. cp -a /usr/local/lib/libgp* .
   27. cp -a /usr/local/lib/libnip* .
   28. cp /usr/local/include/ni488.h /client_installation_slc5_32/usr/local/include/

If you install a new kernel:

  1. look at: /mj_addon/ni/NI-488/NI-488.2-beta-2.5.1b1/README.txt

Sysadmin S/W

In contrast to the recipe described for the page1 server:
  • the sysadmin S/W is in /mj_addon/bootserver
  • the equivalent of "page1" is "phese"

Disable LDAP

> cd /mj_addon/bootserver/bwm/plugins
> cp -a ldap ldapdisabled
> cp -a ldap.xml ldapdisabled.xml
  • Edit "ldapdisabled.xml" so that it points to the "ldapdisabled" directory
  • Edit file "ldapdisabled/etc/nsswitch.conf": Eliminate all instances of "ldap" word
  • Edit file "ldapdisabled/etc/pam.d/system-auth": Remove all LINES which contain ldap (those with pam-ldap library)
  • Edit your project file (../projects/phese-slc5_32.xml): Replace plugin "ldap" with "ldapdisabled"
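
The two file edits in the list above can be sketched with standard text tools. This is a minimal sketch under the assumption that the files have the usual layout; strip_ldap_word and strip_ldap_lines are hypothetical helpers that write the result to standard output, so the original file stays untouched until the output has been reviewed.

```shell
#!/bin/sh
# Drop the word "ldap" from every line of an nsswitch.conf-style file.
# Note: awk re-joins the fields, so runs of blanks collapse to one space.
strip_ldap_word() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i != "ldap")
                out = out (out == "" ? "" : " ") $i
        print out
    }' "$1"
}

# Drop every line that mentions ldap (for etc/pam.d/system-auth).
strip_ldap_lines() {
    # grep returns 1 when no line is left; that is not an error here.
    grep -v 'ldap' "$1" || [ $? -eq 1 ]
}

# Example:
# strip_ldap_word ldapdisabled/etc/nsswitch.conf > /tmp/nsswitch.conf.new
# strip_ldap_lines ldapdisabled/etc/pam.d/system-auth > /tmp/system-auth.new
```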

DHCP

IMPORTANT: A new DHCP server has to be registered with netops

For some reason after the initial installation of SLC5 the DHCP service was missing. Therefore:
> yum install dhcp

We may need an (empty) leases file:
> cd /var
> mkdir /var/state/
> mkdir /var/state/dhcp
> touch /var/state/dhcp/dhcpd.leases

The (simplified) dhcpd.conf looks like this:

ddns-update-style interim;
ignore client-updates;
not authoritative;
use-host-decl-names on;
ddns-update-style ad-hoc;
deny unknown-clients;

#### PXE SPECIFIC OPTIONS ######
option space pxelinux;
option pxelinux.magic      code 208 = string;
option pxelinux.configfile code 209 = text;
option pxelinux.pathprefix code 210 = text;
option pxelinux.reboottime code 211 = unsigned integer 32;

site-option-space "pxelinux";
option pxelinux.magic f1:00:74:7e;
if exists dhcp-parameter-request-list
{
  # Always send the PXELINUX options (specified in hexadecimal)
  option dhcp-parameter-request-list = concat(option dhcp-parameter-request-list,d0,d1,d2,d3);
}

# These lines should be customized to your setup
option pxelinux.pathprefix "/tftpboot/";
option pxelinux.configfile = concat("../client_homes/",host-decl-name,"/",host-decl-name);
option pxelinux.reboottime 30;
#### END PXE SPECIFIC OPTIONS ######

#subnet 137.138.0.0 netmask 255.255.0.0

shared-network "Shared-137-138-0"
{
  subnet 10.10.0.0 netmask 255.255.0.0 { }

  subnet 137.138.0.0 netmask 255.255.0.0
  {
    option routers 137.138.1.1;
    option domain-name-servers 137.138.16.5, 137.138.17.5;
    option domain-name          "cern.ch";
    server-name                 "pcphese12.cern.ch";
    option subnet-mask          255.255.0.0;
    option ntp-servers          137.138.16.69, 137.138.17.69;
    default-lease-time 21600;
    max-lease-time 43200;

    host pcepess43
    {
      hardware ethernet 00:40:9e:00:75:bd;
      fixed-address     137.138.253.209;
      filename          "bwm-phese-slc5_32.elf";
    }

    host pcatd118
    {
      hardware ethernet 00:40:9E:00:85:ED;
      server-name       "pcphese12";
      next-server       137.138.190.92;
      fixed-address     137.138.111.20;
      filename          "/tftpboot/bwmpxelinux.0";
    }
  }
}

Start dhcp at boot time:
> cd /etc/rc5.d
> ln -s ../init.d/dhcpd S80dhcp

tftp

  1. Run "system-config-securitylevel"
  2. Click on "Other ports" and add port 69 for tcp and udp
  3. The file /etc/xinetd.d/tftp has to look like this:
service tftp
{
        socket_type        = dgram
        protocol             = udp
        wait                    = yes
        user                    = root
        server                  = /usr/sbin/in.tftpd
        server_args             = -s ../
        disable                 = no
        per_source              = 11
        cps                     = 100 2
        flags                   = IPv4
}

NFS

The utility "system-config-securitylevel" allows opening up the firewall for the NFS4 service. The sysadmins, however, are still using NFS3 (for performance and other reasons). NFS3 has the disadvantage of using two static ports (111 and 2049) and one dynamic port. Even though it seems to be possible to convert the dynamic port into a static one, dropping the firewall altogether seems to be the better (read: simpler) solution.

AFS

Originally the server has the directory /usr/vice.
  1. Create: /vice
  2. Copy files from /usr/vice to /vice
  3. Create a link from /usr/vice to /vice
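
The relocation of /usr/vice to /vice (create, copy, link) can be sketched as follows. relocate_dir is a hypothetical helper. Keeping the original directory under the name "<source>.orig" is an assumption made here so that nothing is deleted; it can be removed once the link has been verified.

```shell
#!/bin/sh
# Hypothetical helper: copy the content of $1 to $2 and leave a symbolic
# link at $1. The original directory is kept as "$1.orig".
relocate_dir() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" &&
        cp -a "$src/." "$dst/" &&
        mv "$src" "$src.orig" &&
        ln -s "$dst" "$src"
}

# Example (as root): relocate_dir /usr/vice /vice
```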

support for PXE clients

Get the file "bwmpxelinux.0" from Costin and copy it to /tftpboot. Then:
> cp -a ./servertools/tftpboot/pxelinux.cfg /tftpboot/
> cp /boot/vmlinuz-2.6.18-128.1.14.el5 /tftpboot/
> cp /mj_addon/bootserver/bwm/scripts/bwm-phese-slc5_32.img  /tftpboot/

In /etc/dhcpd.conf use:
    filename "/tftpboot/bwmpxelinux.0";
See /etc/dhcpd.conf for additional instructions

Install the ATLAS S/W

Put the ATLAS drivers into the directory /sw/tdaq/drivers/drivers-2.0.2. The "vmetab" file will be kept in the home directory of the client (e.g. /client_homes/pcatd118 on the server). The other ATLAS files (libraries & applications) will be taken from AFS.

Loading the drivers

> mkdir /clients/daq_area/daq-drivers/clients-configs
Populate the directory with driver configuration files.

Client specific home directories

Each client has to have a directory with its short IP name in /client_homes

Files in the client home directory

The home directory of a client has to contain at least two files:
/client_homes//vmetab - This is obviously the vmetab
/client_homes//start_up - This is a script that will be run at boot time
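
The presence of the two required files can be verified per client. A minimal sketch; check_client_home is a hypothetical helper that takes the client home directory as its argument.

```shell
#!/bin/sh
# Hypothetical helper: verify that a client home directory contains the
# two required files. Prints the missing ones and returns non-zero.
check_client_home() {
    home="$1"
    status=0
    for f in vmetab start_up; do
        if [ ! -f "$home/$f" ]; then
            echo "missing: $home/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: check_client_home /client_homes/pcatd118
```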

Modifications in the BWM area:

File: /mj_addon/bootserver/bwm/projects/atlas/etc/rc.d/rc.local
Modification: remove these lines:
atlasloopdir="/var/atlas"
atlasloopowner=":1307"
atlasloopperm="1775"
echo "Set ownership for ${atlasloopdir} ..."
chown -v ${atlasloopowner} ${atlasloopdir}
echo "Set permissions on ${atlasloopdir} ..."
chmod -v ${atlasloopperm} ${atlasloopdir}
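
All of the lines to be removed contain the string "atlasloop", so the modification can be sketched with grep. This assumes that no other line of rc.local contains that string; strip_atlasloop is a hypothetical helper that prints the result instead of changing the file.

```shell
#!/bin/sh
# Print rc.local without the lines that mention "atlasloop".
# Assumption: no other line of the file contains that string.
strip_atlasloop() {
    # grep returns 1 when no line is left; that is not an error here.
    grep -v 'atlasloop' "$1" || [ $? -eq 1 ]
}

# Example:
# cd /mj_addon/bootserver/bwm/projects/atlas/etc/rc.d
# strip_atlasloop rc.local > rc.local.new    # review, then replace rc.local
```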

Adding drivers to the boot image

Edit the file: bwm/templates/slc5x_32/lib_modules.xml

Tricks for debugging

tftpd log file

> cd /etc/xinetd.d
> edit tftp
server_args = -v -v -v -s /tftpboot
> service xinetd restart
> ps auxww | grep tft
Kill the tftpd server process
> grep tft /var/log/messages

problems with /etc/bwm/dhcp_info

Boot SBC in single user mode (put e.g. a "S" into the files of the SBC in /tftpboot/pxelinux.conf for a VBP315; for a VP110 you have to edit /mj_addon/bootserver/bwm/scripts/makeelfslc5.sh and rebuild the system). Then check (lsmod) if the network driver is loaded and verify (ifconfig -a) if "ctrl0" is available.

TO DO

Once the system is usable check with "cvs status | grep Sta | grep -v Up" which files in the sysadmin CVS package have been modified and document the changes.


Major updates:
-- MarkusJoos - 22 Apr 2009
Topic revision: r35 - 2010-06-14 - MarkusJoos
 