Integrating OpenStack with the NetApp iSCSI Direct Driver
Configuration
By following the directions set out at
http://docs.openstack.org/grizzly/openstack-block-storage/admin/content/netapp-iscsi-driver-direct-7mode.html
we insert the directives below into /etc/cinder/cinder.conf:
volume_driver=cinder.volume.drivers.netapp.iscsi.NetAppDirect7modeISCSIDriver
netapp_server_hostname=lxfssmXXXX.cern.ch
netapp_server_port=8088
netapp_login=username
netapp_password=password
where 'username' and 'password' are those used to access the NetApp server. Many other directives are available; however, they are not mandatory for using the CERN NetApp service.
To avoid errors and warnings appearing when creating a NetApp volume, ensure that you have the following installed:
yum -y install python-suds sysfsutils
- Without python-suds, the error "no module named suds" appears in /var/log/cinder/volumes.log.
- Without sysfsutils, the warning "systools not installed" appears in /var/log/nova/compute.log; sysfsutils is required to successfully attach a volume to an instance.
Restart OpenStack (or at least the cinder-volume service) to pick up the changes above.
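On a RHEL-derived host this can be done as follows (service names assumed for the Grizzly-era packaging; adjust to your distribution):

```shell
# restart only the Cinder volume service to pick up cinder.conf changes
service openstack-cinder-volume restart

# or restart all Cinder services
for s in api scheduler volume; do
    service openstack-cinder-$s restart
done
```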
Troubleshooting
The configuration changes made above should typically allow OpenStack to use NetApp; however, in our experience this was not the case due to permission and iSCSI problems. The problems we encountered, together with their solutions, are described below.
User xxxxx does not have the capability to invoke API...
We encountered various API permission errors when trying to create a NetApp volume. For example, in the NetApp logs (FilerView > Filer > Syslog messages) and the Cinder logs, we would respectively see:
"User xxxxx does not have capability to invoke API lun-list-info"
and
"WARNING [cinder.volume.drivers.netapp.iscsi] Error unmapping lun. Code :9016, Message:LUN is not mapped to this group"
This kind of permission error occurred for many API calls; the calls we had to obtain access to were:
login-*,api-clone-start,api-clone-list-status,api-nfs-exportfs-storage-path,api-lun-*,api-volume-options-list-*,api-volume-list-*,api-igroup-*,api-iscsi-*,api-storage-shelf-list-info,cli-version
Ask the NetApp administrator to grant you access to these API calls.
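On a 7-mode Filer these capabilities are typically granted through the useradmin CLI. A sketch, assuming a dedicated role and group for the OpenStack user (the names openstack_role and openstack_group are made up, and the exact useradmin syntax may vary between Data ONTAP releases):

```
# on the NetApp Filer console (Data ONTAP 7-mode)
useradmin role add openstack_role -a login-*,api-clone-start,api-clone-list-status,api-nfs-exportfs-storage-path,api-lun-*,api-volume-options-list-*,api-volume-list-*,api-igroup-*,api-iscsi-*,api-storage-shelf-list-info,cli-version
useradmin group add openstack_group -r openstack_role
useradmin user modify username -g openstack_group
```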
ISCSI: Authentication failed for initiator...
This error appears within the NetApp logs (FilerView > Filer > Syslog messages) and relates to permissions on configuration files (e.g. nova.conf). If a configuration file has been modified or restored, ensure that it has the correct permissions afterwards:
chown root:nova /etc/nova/nova.conf
This error usually coincides with the following entries in the log of the service whose configuration file has incorrect permissions:
abrt: detected unhandled Python exception in '/usr/bin/nova-compute'
abrtd: New client connected
abrtd: Directory 'pyhook-2013-07-25-13:39:27-7361' creation detected
abrt-server[7367]: Saved Python crash dump of pid 7361 to /var/spool/abrt/pyhook-2013-07-25-13:39:27-7361
abrtd: Package 'openstack-nova-compute' isn't signed with proper key
abrtd: 'post-create' on '/var/spool/abrt/pyhook-2013-07-25-13:39:27-7361' exited with 1
abrtd: Corrupted or bad directory '/var/spool/abrt/pyhook-2013-07-25-13:39:27-7361', deleting
iscsiadm: No session found
When OpenStack tries to discover iSCSI sessions on the remote NetApp server, it may not find one when executing a command such as:
sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.135064716 -p 137.138.144.194:3260 --rescan
This problem can be solved in a number of ways:
- Firstly, check if iSCSI is enabled on the NetApp server! This can be verified by checking FilerView > LUNs > Enable/Disable on the NetApp server.
- This error can also appear when iSCSI is enabled; it usually relates to the error below, so try its solution.
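The iSCSI service can also be checked and enabled from the Filer console rather than FilerView (command names assumed from Data ONTAP 7-mode; check your release's manual):

```
# on the NetApp Filer console
iscsi status    # reports whether the iSCSI service is running
iscsi start     # start it if it is not
```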
ISCSI: Authentication failed for initiator iqn.1994-05.com.redhat:c4a522cm2a
This error appears in the NetApp server's logs (FilerView > Filer > Syslog messages) and usually coincides with the error above.
The official NetApp documentation describes this as an authentication error when using CHAP:
This is because CHAP is not configured correctly for the specified initiator:
- Check CHAP settings
- Inbound credentials on the storage system must match outbound credentials on the initiator
- Outbound credentials on the storage system must match inbound credentials on the initiator
- You cannot use the same user name and password for inbound and outbound settings on the storage system
However, in cases when CHAP is not used, this solution obviously does not work. In our experience, regardless of whether CHAP is enabled or not, this problem can be solved by simply removing the scanned and discovered targets within /var/lib/iscsi/nodes, allowing OpenStack to create a new iSCSI node record and connect.
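The cleanup can be scripted; a minimal sketch, assuming open-iscsi's node database lives in /var/lib/iscsi/nodes (the NODE_DB variable is ours, introduced only so the path can be overridden):

```shell
#!/bin/sh
# Remove a stale iscsiadm node record so OpenStack can rediscover the target.
# NODE_DB defaults to open-iscsi's database path; override it for testing.
NODE_DB=${NODE_DB:-/var/lib/iscsi/nodes}

cleanup_iscsi_node() {
    iqn="$1"
    # ask iscsiadm to delete the record first (ignore failure if the
    # record or the tool is absent), then remove any leftover directory
    iscsiadm -m node -T "$iqn" -o delete 2>/dev/null || true
    rm -rf "${NODE_DB:?}/$iqn"
}
```

After running this for the affected target (e.g. iqn.1992-08.com.netapp:sn.135064716), the next OpenStack attach or iscsiadm discovery recreates a fresh node record.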
Api clone-start requires license for flex_clone
This error appears when one takes a snapshot of a NetApp instance. As the error states, we have to obtain a license for flex_clone and enter its license code under FilerView > Filer > Manage Licenses.
We do not currently have the license for flex_clone, hence the snapshot functionality does not work.
LUN already mapped to this group
This error appears when one tries to mount a previously created volume to an instance. Simply create another volume and mount again.
iSCSI device not found at /dev/disk/by-path/...
This error occurs when one tries to migrate a virtual machine offline while a NetApp volume is attached. When the virtual machine shuts down, the path to the iSCSI device is removed, and upon virtual machine restart the path to the device cannot be found. A workaround could easily be implemented to solve this problem (I think!).
Exceeded maximum rsh connections/number of administrative HTTP connections
During the course of using NetApp, we started receiving errors in the Cinder scheduler logs displaying:
NaApiError: NetApp api failed. Reason - 507:The maximum number of Administrative HTTP connections specified by httpd.admin.max_connections has been exceeded WARNING [cinder.scheduler.host_manager] service is down or disabled
At this point the volumes were still accessible; however, volume creation and mounting did not work. The NetApp administration interface was also not accessible and displayed:
java.lang.OutOfMemoryError
Although the exact cause of these errors is still unknown, the problems started to appear when a disk failed and nightly NetApp snapshot operations could not be taken:
Snapshot operation failed: No space left on device
which eventually caused the following error:
Exceeded the maximum allowed rsh sessions: 24
The error above is typically seen when the NetApp Filer has more than 24 RSH connections concurrently in use. As an educated guess, it is likely that the nightly snapshot commands hung and eventually exceeded this limit, in turn crashing the Filer, which explains why it was unavailable while the volumes remained accessible.
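If you want to inspect the Filer before rebooting, the 7-mode console exposes both the HTTP limit and the RSH sessions (a sketch; the option and command names are assumed from Data ONTAP 7-mode documentation):

```
# on the NetApp Filer console
options httpd.admin.max_connections   # show the admin HTTP connection limit
rshstat -a                            # list the RSH sessions currently held
```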
To solve this problem, the simplest thing to do is to reboot the NetApp Filer.
Note that NetApp may perform much slower than normal after the reboot; however, performance will return to normal levels after a few hours.