vNode troubleshooting
vNode general
Where to find 'dead' image logs
To find out whether a machine has failed to start, check the VM status in the web interface. If the machine did not deploy correctly, it appears as 'failed'. A VM can also stay stuck in 'deploying' for a long time, which most likely indicates a problem with the configuration of the dom0 or domU. In both cases, log in to the dom0/domU and debug the problem by running xen commands or by checking that the dom0 was configured correctly (libvirtd is off, permissions, etc.).
See:
https://twiki.cern.ch/twiki/bin/view/EGEE/VNodeXen
Possible Xen/vNode problems
Newly deployed virtual machine does not ping/Virtual machine does not ping
- Check that the init scripts are correct.
- If they are, run xm console <domain of virtual machine> and check whether the network interface started correctly.
- Try rebooting with xm reboot <domain of virtual machine>.
- If the machine was working before, check whether any newly deployed virtual machines broke the older ones. To do this, reboot or recreate the most recent virtual machines with xm destroy <old domain> followed by xm create -f /etc/xen/auto/vnode_<user>. Also try to 'balloon down' the dom0 memory reservation with xm mem-set <domain> <memory>, because some older Xen versions have issues with this.
- If nothing works, check the system logs of the domU/dom0 and debug. Something more serious may be wrong with Xen.
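The recreate and balloon steps above can be collected into a small helper. This is only a sketch: the function name recovery_commands and its parameters are hypothetical, "Domain-0" is assumed to be the dom0 name shown by xm list, and the code only builds the xm command lines quoted above without running them, so they can be reviewed before anything is executed on the dom0.

```python
def recovery_commands(domain, user, dom0_memory_mb=None):
    """Build the xm command lines from the steps above (nothing is executed)."""
    commands = [
        ["xm", "destroy", domain],
        ["xm", "create", "-f", "/etc/xen/auto/vnode_%s" % user],
    ]
    if dom0_memory_mb is not None:
        # Assumption: the dom0 is listed as "Domain-0" by `xm list`.
        commands.append(["xm", "mem-set", "Domain-0", str(dom0_memory_mb)])
    return commands


# Example values ("vm01", "alice", 3072) are placeholders.
for command in recovery_commands("vm01", "alice", dom0_memory_mb=3072):
    print(" ".join(command))
```

Printing the commands first keeps the destructive xm destroy step under manual control.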
vNode does not deploy the machine correctly (VM status: Failed/Indefinitely in deploying)
- Check that the init scripts are correct.
- Verify that the cgi scripts are pointing to the correct python path (>= 2.4).
- Verify that the http server is configured correctly (check httpd logs etc.).
- If the machine is stuck in the deploying state, you have to remove it manually by editing vnodevmstate.cfg and setting it back to not deployed (i.e. state=notDeployed, and also remove all the other parameters).
- Debug (see below for some useful tools).
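To check which interpreter a cgi script points to, it is enough to read its first line. The sketch below is an illustration only: shebang_interpreter is a hypothetical helper name, and the temporary file merely stands in for a real cgi script such as main.py.

```python
import os
import tempfile


def shebang_interpreter(script_path):
    """Return the interpreter command from the script's first line, or None."""
    with open(script_path, "r", errors="replace") as handle:
        first_line = handle.readline().strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


# Demonstration on a temporary file standing in for a cgi script.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as demo:
    demo.write("#!/usr/bin/python2.4\nprint 'hello'\n")
print(shebang_interpreter(demo.name))
os.unlink(demo.name)
```

Running the reported interpreter with -V on the node then shows whether it satisfies the >= 2.4 requirement.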
SELinux
Note: there are issues when SELinux is enabled on the node. If the destination machine does not deploy the virtual node (the virtual node does not change its state to "deploying") and the error log of the Apache web server contains alerts like:
[Tue Oct 2 05:06:07 2009] [error] [client 1.2.3.4] (13)Permission denied: access to /cgi/node/cgi-bin/main.py denied
then switching SELinux to permissive mode or generating a policy might be required.
Output of audit2allow (can be used to generate the policy):
# audit2allow < /var/log/audit/audit.log
#============= dnsmasq_t ==============
allow dnsmasq_t virt_var_run_t:dir write;
#============= httpd_t ==============
allow httpd_t default_t:dir search;
allow httpd_t default_t:file { read getattr ioctl };
allow httpd_t device_t:blk_file { relabelfrom create setattr };
allow httpd_t device_t:dir { write remove_name create add_name };
allow httpd_t device_t:lnk_file { relabelfrom relabelto create unlink };
allow httpd_t etc_runtime_t:dir { write remove_name search add_name };
allow httpd_t etc_runtime_t:file { rename write link setattr create unlink append };
allow httpd_t etc_t:dir { write remove_name add_name };
allow httpd_t etc_t:file { rename write link setattr create unlink };
allow httpd_t file_t:blk_file { create setattr };
allow httpd_t file_t:chr_file { create setattr };
allow httpd_t file_t:dir { search setattr read create write getattr remove_name add_name };
allow httpd_t file_t:fifo_file { create setattr };
allow httpd_t file_t:file { write getattr link setattr create unlink };
allow httpd_t file_t:lnk_file { create setattr };
allow httpd_t fixed_disk_device_t:blk_file { write ioctl read relabelto unlink getattr };
allow httpd_t fs_t:filesystem { mount unmount };
allow httpd_t fsadm_exec_t:file { read getattr execute execute_no_trans };
allow httpd_t httpd_tmp_t:dir mounton;
allow httpd_t lvm_control_t:chr_file { read write getattr ioctl };
allow httpd_t lvm_etc_t:dir { getattr search };
allow httpd_t lvm_etc_t:file { read getattr };
allow httpd_t lvm_exec_t:file { read getattr execute execute_no_trans };
allow httpd_t lvm_lock_t:dir { write search read remove_name getattr add_name };
allow httpd_t lvm_lock_t:file { lock create unlink getattr };
allow httpd_t lvm_metadata_t:dir { write search read remove_name getattr add_name };
allow httpd_t lvm_metadata_t:file { rename read lock create write getattr link unlink append };
allow httpd_t mount_exec_t:file { read getattr execute execute_no_trans };
allow httpd_t proc_xen_t:dir { search getattr };
allow httpd_t proc_xen_t:file { read write ioctl };
allow httpd_t self:capability { sys_resource fsetid ipc_lock fowner mknod audit_write };
allow httpd_t self:netlink_audit_socket { write nlmsg_relay create read };
allow httpd_t self:process setrlimit;
allow httpd_t soundd_port_t:tcp_socket name_connect;
allow httpd_t usbfs_t:dir getattr;
allow httpd_t user_home_t:dir { search getattr };
allow httpd_t user_home_t:file { execute read lock getattr execute_no_trans write ioctl append };
allow httpd_t virt_var_run_t:dir search;
allow httpd_t virt_var_run_t:sock_file write;
allow httpd_t xend_t:unix_stream_socket connectto;
allow httpd_t xend_var_lib_t:dir { write remove_name search add_name };
allow httpd_t xend_var_lib_t:fifo_file { read write create unlink };
allow httpd_t xend_var_lib_t:file { write create };
allow httpd_t xend_var_lib_t:sock_file write;
allow httpd_t xend_var_run_t:dir search;
allow httpd_t xend_var_run_t:sock_file write;
allow httpd_t xenstored_t:unix_stream_socket connectto;
allow httpd_t xenstored_var_run_t:dir search;
allow httpd_t xenstored_var_run_t:sock_file { write getattr };
allow httpd_t xm_exec_t:file { read getattr execute ioctl execute_no_trans };
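Before generating a policy, it can help to confirm that the Apache error log really contains the permission-denied alert quoted above. The following sketch scans log lines for that pattern; the helper name denied_paths is hypothetical, and the regular expression is derived only from the single sample line shown in this section.

```python
import re

# Pattern based on the sample alert quoted in this section.
DENIED = re.compile(r"\(13\)Permission denied: access to (\S+) denied")


def denied_paths(log_lines):
    """Return the paths Apache refused with error 13, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = DENIED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "[Tue Oct 2 05:06:07 2009] [error] [client 1.2.3.4] (13)Permission denied: "
    "access to /cgi/node/cgi-bin/main.py denied",
]
print(denied_paths(sample))
```

For a real check, pass an open file handle for the error log in the httpd log directory instead of the sample list.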
vNode server
Log files
- [vnode installation dir]/vnode/logs/vnode-server.log (usually /usr/local/vnode/logs/vnode-server.log)
- httpd logs (/var/log/httpd/ or /var/log/apache/)
vNode node
Log files
- [vnode installation dir]/vnode/logs/vnode-node.log (usually /usr/local/vnode/logs/vnode-node.log)
- httpd logs (/var/log/httpd/ or /var/log/apache/)
Networking problems, lots of "kernel: xen_net: Memory squeeze in netback driver." in logs/dmesg (on dom0).
If, after starting a large number of virtual machines (this is relative; 'large' can mean more than one), you start having networking problems on some of them, check whether the output of
$ dmesg
contains the errors mentioned above. If so, this may be the problem.
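The dmesg check can be automated along these lines. This is a minimal sketch: count_squeeze and read_dmesg are hypothetical helper names, and the match string is taken from the message quoted in the heading of this section, without the "kernel:" prefix that syslog adds.

```python
import subprocess

SQUEEZE = "xen_net: Memory squeeze in netback driver"


def count_squeeze(dmesg_text):
    """Count the lines of dmesg output that contain the netback warning."""
    return sum(1 for line in dmesg_text.splitlines() if SQUEEZE in line)


def read_dmesg():
    """Run dmesg on the dom0 and return its output as text."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


# Sample text standing in for real dmesg output.
sample = "xen_net: Memory squeeze in netback driver.\neth0: link up\n"
print(count_squeeze(sample))
```

On the dom0 itself, count_squeeze(read_dmesg()) gives the number of warnings currently in the kernel ring buffer.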
The memory squeeze error appears to be a relatively well-known Xen issue (
http://support.neosurge.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=43
). What worked for us was reconfiguring the Xen memory:
- adding "dom0_mem=3072M" as a kernel parameter in grub (/boot/grub/grub.conf or grub/menu.lst), e.g.
kernel /xen.gz-2.6.18-164.11.1.el5 dom0_mem=3072M
- setting the memory parameters in /etc/xen/xend-config.sxp (the dom0 minimum memory must be the same(!) as in the kernel parameter):
(dom0-min-mem 3072)
Note that setting (dom0-min-mem 0) does not work for more than one domU (despite suggestions on the web).
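Because the two values have to agree, a quick consistency check can catch a mismatch between the grub kernel line and xend-config.sxp. The sketch below assumes the kernel parameter uses the M suffix as in the example above; the function names are hypothetical, and the sample strings stand in for the contents of the real files.

```python
import re


def grub_dom0_mem_mb(kernel_line):
    """Extract dom0_mem from a grub kernel line, in MB (only the M suffix is handled)."""
    match = re.search(r"\bdom0_mem=(\d+)M\b", kernel_line)
    return int(match.group(1)) if match else None


def xend_dom0_min_mem_mb(sxp_text):
    """Extract the value of the first uncommented (dom0-min-mem N) entry."""
    for line in sxp_text.splitlines():
        match = re.match(r"\s*\(dom0-min-mem\s+(\d+)\)", line)
        if match:
            return int(match.group(1))
    return None


kernel_line = "kernel /xen.gz-2.6.18-164.11.1.el5 dom0_mem=3072M"
sxp_text = "# dom0 memory settings\n(dom0-min-mem 3072)\n"
print(grub_dom0_mem_mb(kernel_line) == xend_dom0_min_mem_mb(sxp_text))
```

Commented-out entries in xend-config.sxp are skipped, since only lines that start with the opening parenthesis are matched.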
vNode CLI
Connection problems
If vNodeCLI answers only with errors like
Reason: Unable to make request due to unexpected error socket.sslerror
or
Reason: Unable to make request due to unexpected error <class 'ssl.SSLError'>
then the problem is most likely in your configuration. Please check your configuration file (config.cfg).
Check that you have edited the file and changed the defaults to values valid for your use case. It should contain:
- correct information about your vNode server (hostname and port)
- valid paths to your certificate / key files. Note that you cannot specify them in shell style, e.g. $HOME/mycert/mycert.pem, because the config file is NOT a shell script. Each path has to be given as a relative or (better) absolute path without any shell variables.
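The path rules above can be checked mechanically before running vNodeCLI. This is a sketch under assumptions: path_problems is a hypothetical helper, it receives a path string you have already copied out of config.cfg, and a leading "~" is treated like a shell variable on the assumption that vNodeCLI does not expand it either.

```python
import os


def path_problems(path):
    """Return a list of reasons why a configured path is unlikely to work."""
    problems = []
    if "$" in path or path.startswith("~"):
        # The config file is not a shell script, so nothing expands these.
        problems.append("contains shell syntax that will not be expanded")
    elif not os.path.isfile(path):
        problems.append("file does not exist")
    if not os.path.isabs(path):
        problems.append("is relative; an absolute path is safer")
    return problems


print(path_problems("$HOME/mycert/mycert.pem"))
```

An empty list means the path is absolute, free of shell syntax, and points to an existing file.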
vNodeCLI asks for password
If vNodeCLI asks you for a password each time you execute it (or even several times per run), the request actually comes from OpenSSL when it uses your certificate.
For now, the only solution is to use a key/certificate without a passphrase.
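To see whether a key file is passphrase protected, you can look at its PEM headers. The sketch below relies on the two markers OpenSSL writes for encrypted private keys; key_is_encrypted is a hypothetical helper name, and the sample strings contain placeholder data rather than real keys.

```python
def key_is_encrypted(pem_text):
    """Return True if a PEM private key appears to be passphrase protected."""
    return (
        "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text  # PKCS#8 format
        or "Proc-Type: 4,ENCRYPTED" in pem_text  # traditional OpenSSL format
    )


protected = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "Proc-Type: 4,ENCRYPTED\n"
    "DEK-Info: DES-EDE3-CBC,0123456789ABCDEF\n"
    "(key data omitted)\n"
)
plain = "-----BEGIN RSA PRIVATE KEY-----\n(key data omitted)\n"
print(key_is_encrypted(protected), key_is_encrypted(plain))
```

For an RSA key, a command such as openssl rsa -in userkey.pem -out userkey-nopass.pem (file names are placeholders) is one common way to write a copy without the passphrase.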
Helpful links:
Debugging tools
--
TomaszWolak - 07-Jun-2010