Vac Configuration for LHCb

This page explains how to run LHCb virtual machines on Vac factory machines. Please see the Vac website for Vac's Admin Guide and man pages, which explain how to install and configure Vac itself and get a working Vac factory. These instructions require Vac 01.00.00 or above, which provides the CernVM 3 and user_data template support they rely on.

Requirements

Before configuring Vac for LHCb, you need to follow these steps:

  • When you configure Vac, you need to choose a Vac space name. This will be used as the Computing Element (CE) name in LHCb DIRAC.
  • One or more CEs are grouped together to form a site, whose name takes the form VAC.Example.cc, where Example is derived from your institutional name and cc is the country code, e.g. VAC.CERN.ch or VAC.Manchester.uk. Site names are allocated and registered in the DIRAC configuration service by the LHCb ops team. If your site is already using gLite/EMI middleware like the CREAM CE for LHCb, it will probably have a site name like LCG.Example.cc, and you would normally be allocated VAC.Example.cc for use with Vac.
  • Obtain a host certificate which the VMs can use as a client certificate to fetch work from the central LHCb task queue. One certificate can be used for all LHCb VMs at a site. You should normally use a name which is specific to LHCb but is part of your site's DNS space. It doesn't need to correspond to a real host or exist as an entry on your DNS servers; you only need to be entitled to register it. So if your site's domain name is example.cc, then a certificate for lhcb-vm.example.cc with a DN like /C=CC/O=XYZ/CN=lhcb-vm.example.cc would be a good choice.
  • Place the hostcert.pem and hostkey.pem of the certificate in the lhcb (or similar) subdirectory of /var/lib/vac/machinetypes.
  • Contact someone in the ops team (currently andrew.mcnab AT cern.ch) to agree a site name and to register your CE, site, and certificate DN in the central LHCb DIRAC configuration.
  • Create a volume group vac_volume_group which is big enough to hold one 40GB logical volume for each VM the factory machine will run at the same time.
  • Identify a squid HTTP caching proxy to use with cvmfs. If you already have a proxy set up for cvmfs on gLite/EMI worker nodes at your site then you can use that too. You may be able to run without a proxy, but SetupProject failures during LHCb job execution will be more likely.
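The volume group sizing above is simple arithmetic: one 40GB logical volume per VM running at the same time. The following is a minimal sketch of that calculation. The function name and the example VM counts are hypothetical and are not part of Vac; a real volume group will normally be created slightly larger to leave room for LVM metadata.

```python
# Illustrative helper only: not part of Vac.
# The 40 GB per-VM figure comes from the requirements list above.
VM_LOGICAL_VOLUME_GB = 40


def required_volume_group_gb(max_concurrent_vms: int) -> int:
    """Return the minimum size in GB of vac_volume_group for the given
    number of VMs the factory machine runs at the same time."""
    if max_concurrent_vms < 1:
        raise ValueError("max_concurrent_vms must be at least 1")
    return max_concurrent_vms * VM_LOGICAL_VOLUME_GB


if __name__ == "__main__":
    # Example VM counts are assumptions chosen for illustration.
    for vms in (4, 8, 16):
        print(f"{vms} VMs need at least {required_volume_group_gb(vms)} GB")
```

For example, a factory machine running 8 VMs at once would need a volume group of at least 320GB.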

Adding lhcb to vac.conf

The details of the vac.conf options are given in the vac.conf(5) man page. The lhcb (or similar) section should look like this:

[machinetype lhcb]
user_data_option_cvmfs_proxy = http://squid-cache.example.cc:3128
user_data_file_hostcert = hostcert.pem
user_data_file_hostkey = hostkey.pem 
user_data = https://lhcb-portal-dirac.cern.ch/pilot/user_data
machine_model = cernvm3
root_image = https://lhcbproject.web.cern.ch/lhcbproject/Operations/VM/cernvm3.iso
rootpublickey = /root/.ssh/id_rsa.pub
backoff_seconds = 600 
fizzle_seconds = 600
max_wallclock_seconds = 172800
heartbeat_file = heartbeat
heartbeat_seconds = 600
accounting_fqan = /lhcb/Role=NULL/Capability=NULL

Vac will destroy the VM if it runs for more than max_wallclock_seconds, and you may want to experiment with values shorter than the 172800 seconds (48 hours) shown above. Most modern machines should be able to run jobs comfortably within 24 hours (86400 seconds).
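When experimenting with shorter limits, it can help to see the configured value in hours. The sketch below reads an excerpt of the section above with Python's standard configparser module and converts max_wallclock_seconds. It assumes the file is plain INI syntax that configparser accepts; Vac's own parsing may differ, and the helper name wallclock_hours is hypothetical.

```python
import configparser

# Excerpt of the section shown above, embedded so the sketch is self-contained.
SAMPLE_VAC_CONF = """
[machinetype lhcb]
backoff_seconds = 600
fizzle_seconds = 600
max_wallclock_seconds = 172800
accounting_fqan=/lhcb/Role=NULL/Capability=NULL
"""


def wallclock_hours(conf_text: str, section: str = "machinetype lhcb") -> float:
    """Return max_wallclock_seconds from the given section, in hours."""
    # interpolation=None stops configparser treating '%' in values specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getint(section, "max_wallclock_seconds") / 3600


if __name__ == "__main__":
    print(wallclock_hours(SAMPLE_VAC_CONF))  # 172800 / 3600 = 48.0
```

To check a real file, the same function could be given the contents of your vac.conf, read with open(...).read().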

If no work is available from the central LHCb task queue and a VM stops with 'Nothing to do', backoff_seconds determines how long Vac waits before trying to run an LHCb VM again. This waiting is co-ordinated between all factory machines in a space using Vac's UDP protocol.
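The effect of backoff_seconds can be pictured with a small sketch. This is not Vac's implementation: the function names are hypothetical, and the list of timestamps stands in for whatever the factory machines in a space actually share over the UDP protocol. It only illustrates the rule that no new LHCb VM is started until backoff_seconds have passed since the most recent 'Nothing to do' outcome.

```python
from typing import Iterable, Optional


def latest_nothing_to_do(timestamps: Iterable[float]) -> Optional[float]:
    """Most recent 'Nothing to do' time reported across the space, if any."""
    times = list(timestamps)
    return max(times) if times else None


def may_start_vm(now: float, timestamps: Iterable[float],
                 backoff_seconds: int) -> bool:
    """True if the backoff period since the last 'Nothing to do' has elapsed."""
    last = latest_nothing_to_do(timestamps)
    if last is None:
        # No VM has stopped for lack of work, so nothing holds us back.
        return True
    return now >= last + backoff_seconds


if __name__ == "__main__":
    # Times are seconds on an arbitrary clock, chosen for illustration.
    reported = [100.0, 400.0]
    print(may_start_vm(900.0, reported, 600))   # 400 + 600 = 1000 > 900
    print(may_start_vm(1000.0, reported, 600))  # backoff has elapsed
```

With backoff_seconds = 600 as in the section above, a VM that stops with 'Nothing to do' delays further LHCb VM starts across the space for ten minutes.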

You can omit the rootpublickey option, but it is extremely useful for debugging. See the Vac Admin Guide for more about how to set it up.

Vac re-reads its configuration files at every cycle (once a minute or so), so changes to vac.conf will take effect almost immediately. You should see Vac creating lhcb VMs in /var/log/vacd-factory, and the VMs themselves attempting to contact the LHCb matcher to fetch work in the log files stored in subdirectories of /var/lib/vac/machines. These matches will fail until either there are some test jobs directed to your site (e.g. LHCb Test jobs) or your site is put in the production mask and can receive Monte Carlo jobs.

Topic revision: r20 - 2016-11-19 - AndrewMcNab
 