LxCloud testing for CMS jobs

This page collects the steps of the work done to test CMS private Monte Carlo and analysis jobs on LxCloud.

Contextualization

Contextualization is the process that automates the configuration of newly created virtual machines (VMs). Since multiple machines will be created, it is crucial to make this process as automatic as possible.

Prerequisites: All tests were carried out in the LxCloud testing environment. It is advised to use the euca-* toolset (to create and terminate instances, display running instances and their state, and list available images); the pyBoto library can be used as well. If you want to use the euca-* toolset you should export the variables EC2_URL, EC2_ACCESS_KEY and EC2_SECRET_KEY. You can find here an example of a contextualization file.
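For example, a minimal environment setup for the euca-* tools could look like the snippet below; the endpoint URL and credential values are placeholders, not the actual LxCloud settings:

export EC2_URL=<LxCloud EC2 endpoint URL>
export EC2_ACCESS_KEY=<your access key>
export EC2_SECRET_KEY=<your secret key>

# quick check that the tools can reach the cloud
euca-describe-instances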


During the tests the following settings and software were used:

  • LxCloud environment - testing
  • CernVM image - cernvm-batch-node-2.5.3-6.1-slc5_x86_64_kvm
  • EC2_ALLOW_USER_DATA_SCRIPTS cloud variable set to 1 (allows executing scripts during the contextualization phase)
Contextualization via CernVM parameters or executing custom scripts
According to the CernVM documentation, contextualization can be configured via specially crafted parameters or by passing a shell script that will be executed with superuser (root) permissions. Ideally the whole contextualization process would be carried out with CernVM parameters only, but that is not possible right now. In order to execute a custom script on a new VM, just put the script content at the beginning of the contextualization file (see the attached contextualization example).
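For orientation, a minimal sketch of how such a contextualization file might be laid out, combining a custom script at the beginning with CernVM parameters used later on this page (the attached example is more complete and its exact layout may differ):

#!/bin/bash
# custom commands, executed as root during contextualization
echo "export CMS_SITECONFIG=EC2" >> /etc/profile.d/myenv.sh

[cernvm]
organisations = cms
repositories = cms,grid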

Various contextualization options were tested, including:

Cvmfs auto mount

Using distributed filesystems is very useful, and so is mounting them automatically. In the CMS experiment the most useful mount points are probably /cvmfs/cms.cern.ch and /cvmfs/grid.cern.ch.
In order to have them auto-mounted when a new VM is created, the following options must be passed in the CernVM contextualization parameters:

[cernvm]
organisations = cms 
repositories = cms,grid

Using an arbitrary squid server for caching data from cvmfs

All that needs to be done is setting a variable in /etc/cvmfs/default.local and restarting cvmfs:

echo 'CVMFS_HTTP_PROXY=<squid endpoint>' >> /etc/cvmfs/default.local
/etc/init.d/cvmfs restartclean

Keep in mind that cvmfs fetches data on access, so listing ls /cvmfs/ right after the boot phase won't give any results; however, typing cd /cvmfs/cms.cern.ch will cause the cvmfs daemon to fetch the data.
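For example, a simple way to trigger the mount and check that the repository is reachable (e.g. at the end of a contextualization script) could be:

# accessing the repository path triggers the autofs mount
cd /cvmfs/cms.cern.ch && echo "cms.cern.ch reachable" || echo "cvmfs mount failed"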

System wide environment variables

According to the documentation, system-wide environment variables should be set with the following parameter:

[cernvm]
environment = VAR1=VAL, VAR2=VAL

However, this discussion reveals that the variables are not exported. Right now they are only stored in the /etc/cernvm/site.conf file and still have to be exported somehow.
The best workaround so far is to set all necessary system-wide environment variables in the custom shell script, writing them to a file in the /etc/profile.d/ directory so that they are sourced every time a new shell is started.
An example of setting custom environment variables in a shell script:

#!/bin/bash
echo "export CMS_SITECONFIG=EC2" >> /etc/profile.d/myenv.sh
echo "export CMS_ROOT=/opt/cms" >> /etc/profile.d/myenv.sh


System monitoring (ganglia)

It is desirable for worker nodes to automatically install and configure the ganglia gmond daemon.
Unfortunately the ganglia options in the CernVM contextualization don't work at the moment (a ticket was submitted here, but no response so far). More detailed information about the ganglia installation (both worker and monitoring nodes) in a similar cloud environment can be found here. According to the ganglia documentation there are two ways of communication between the gmond and gmetad processes (usually distributed among multiple VMs):

  • multicast
  • unicast
Since some cloud infrastructures don't support multicast (Amazon EC2, for example), it is advised to set up unicast communication. To do so, all gmond processes must be configured to send their data to one specified (hence, hardcoded) worker node with gmond running on it.
The best way to set up gmond on a worker node is to use a custom shell script that installs ganglia and writes the /etc/gmond.conf file:
#!/bin/bash
# install ganglia and write the gmond configuration
conary install ganglia
cat <<EOF > /etc/gmond.conf
<here you pass the whole config file>
EOF

# start gmond and reload its configuration (see the troubleshooting note below)
/etc/init.d/gmond start
/bin/kill -HUP `pidof gmond`



You need to know the IP address of the headnode (the node that gathers information from the other worker nodes) and hardcode it in the new gmond.conf configuration file:

udp_send_channel {
host = <headnode's IP address>
port = 8649
ttl = 1 # or whatever value you need
}
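On the headnode side, the gmond instance that collects the data needs matching receive channels; a minimal sketch, assuming the default port 8649 used above:

udp_recv_channel {
port = 8649
}

tcp_accept_channel {
port = 8649
}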


Troubleshooting: /bin/kill -HUP `pidof gmond` is used to reload the gmond configuration. For some reason just starting the gmond daemon doesn't work correctly and the reported node information is incorrect (uptime, for example). HUPing the process should help. If not, try putting sleep 1 between starting gmond and sending the HUP signal to the gmond process.
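With that hint applied, the relevant part of the start-up script reads:

/etc/init.d/gmond start
sleep 1
/bin/kill -HUP `pidof gmond`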

Adding system users

If, for some reason, new shell users are desired, they can be added via the CernVM parameters:

[cernvm]
users = USER:GROUP:CLEARTEXTPASSWORD, USER1:GROUP1:CLEARTEXTPASSWORD1

Issues and problems

The contextualization process fails on the first error (it won't continue), and by default no status of the contextualization is returned. Because of that, if an error occurs nobody will be aware of the problem. It could be beneficial to reuse or create a small system for running commands on the VMs and checking whether the contextualization was successful.
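One lightweight possibility, sketched here as an assumption rather than an existing tool, is to have the contextualization script write a marker file as its very last step and then check for it from outside over ssh (the host list file and root ssh access are hypothetical):

#!/bin/bash
# each contextualization script ends with:
#   touch /var/run/contextualization.done

# run from a management host against a list of VM hostnames or IPs
for host in $(cat vm_hosts.txt); do
    if ssh root@$host test -f /var/run/contextualization.done; then
        echo "$host: contextualization OK"
    else
        echo "$host: contextualization FAILED or still running"
    fi
done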

Running Virtual Machines with contextualization

In order to run a new instance with contextualization options you need to type:

$ euca-run-instances -t <instance type> -f <contextualization file> <image id>

Example:

$ euca-run-instances -t m1.xlarge -f /srv/lxCloud/contextualization.sh ami-00000016

You can terminate an instance with:

$ euca-terminate-instances <instance id>

Example:

$ euca-terminate-instances i-1024

You can list your running instances by typing:

$ euca-describe-instances

MC

cmsRun execution by hand in VM

In order to run cmsRun you should:

  • have /cvmfs/cms.cern.ch already mounted
  • have extended your PATH with /cvmfs/cms.cern.ch/bin
It is advised to have the /afs/ mount point mounted; however, if your VM doesn't have it you can use the scripts with the 'noafs' option (remember, this is only a workaround, don't rely on it for long-term operations).
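A minimal sketch of this manual preparation, using the path from the prerequisites above:

# accessing the path triggers the autofs mount of the repository
cd /cvmfs/cms.cern.ch

# extend PATH as listed in the prerequisites
export PATH=$PATH:/cvmfs/cms.cern.ch/bin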

In order to run an example MC job, just download the attached tar archive and, depending on whether you have /afs/ mounted, run:

$ ./run.sh (afs|noafs)

Please remember this is just an example, so the filenames are already hardcoded.

Running cmsRun jobs with custom squid proxy configuration

In order to make the cmsRun command call remote endpoints via a custom proxy server you need to:

  • create the configuration file at <CMS_PATH>/SITECONF/local/JobConfig/site-local-config.xml (it can be copied from /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml)
  • export the environment variable CMS_PATH, pointing to the directory that contains your SITECONF directory
  • prepare your site-local-config.xml file; it is very important to keep the directory structure under CMS_PATH unchanged
  • set your own proxy directive in the site-local-config.xml file.
    Hint: you can put multiple proxy values there, for example:

<site-local-config>
<site name="T1_CH_CERN">
   <event-data>
     <catalog url="trivialcatalog_file:/afs/cern.ch/cms/SITECONF/local/PhEDEx/storage.xml?protocol=rfio"/>
   </event-data>
   <local-stage-out>
     <command value="rfcp-CERN"/>
     <option value=""/>
     <catalog url="trivialcatalog_file:/afs/cern.ch/cms/SITECONF/local/PhEDEx/storage.xml?protocol=stageout"/>
     <se-name value="srm-cms.cern.ch"/>
   </local-stage-out>
   <calib-data>
     <frontier-connect>
       <proxy url="http://YOURPROXY:3128"/>
       <server url="http://frontier.server.probably.at.cern.ch:8000/FrontierInt"/>
     </frontier-connect>
     <catalog url=""/>
   </calib-data>
</site>
</site-local-config>
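Putting the pieces together, a sketch of how the job could then be started; the /srv/siteconf directory and the configuration file name are placeholders, not values from this setup:

# CMS_PATH must point to the directory that contains your SITECONF tree,
# i.e. $CMS_PATH/SITECONF/local/JobConfig/site-local-config.xml
export CMS_PATH=/srv/siteconf
cmsRun my_mc_config.py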

Analysis

Interfacing Glidein-WMS

HammerCloud based testing

Future Ideas

In case of an error the contextualization phase will fail silently and the VM will not be fully functional for its tasks. It might be a good idea to write a set of scripts for checking whether the contextualization was successful. This may consist of checking whether all desired processes are running, checking the log files, and so on.
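A sketch of what such a self-check might look like on the VM itself; the process names and paths below are assumptions, not part of the current setup:

#!/bin/bash
# report missing pieces instead of failing silently
status=0

# hypothetical checks: adapt the process names and paths to the actual setup
pidof gmond > /dev/null || { echo "gmond not running"; status=1; }
cd /cvmfs/cms.cern.ch 2>/dev/null || { echo "cvmfs not mounted"; status=1; }

exit $status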

-- MattiaCinquilli - 19-Sep-2012

Topic attachments
  • CernVM-context-on-lxCloud.png (PNG, 125.9 K, 2012-10-17, MattiaCinquilli) - temporary table summarizing contextualization of a CernVM image on lxCloud@CERN
  • contextualizationwiki.clean.sh (shell script, 6.2 K, 2012-09-26, MarekDenis)
  • jobs.mod.tar.gz (gzip archive, 3.7 K, 2012-09-26, MarekDenis)