Integrating OpenStack with the NetApp iSCSI Direct Driver

Configuration

By following the directions set out at:

http://docs.openstack.org/grizzly/openstack-block-storage/admin/content/netapp-iscsi-driver-direct-7mode.html

we insert the directives below into /etc/cinder/cinder.conf:

volume_driver=cinder.volume.drivers.netapp.iscsi.NetAppDirect7modeISCSIDriver
netapp_server_hostname=lxfssmXXXX.cern.ch
netapp_server_port=8088
netapp_login=username
netapp_password=password

where 'username' and 'password' are those used to access the NetApp server. Many other directives are available; however, they are not mandatory for using the CERN NetApp service.

To avoid any errors and warnings appearing when creating a NetApp volume, ensure that you have the following installed:

yum -y install python-suds sysfsutils

  • Without python-suds, the error "no module named suds" appears in /var/log/cinder/volumes.log.
  • Without sysfsutils, the warning "systools not installed" appears in /var/log/nova/compute.log; this package is required to successfully attach a volume to an instance.
Restart OpenStack (or at least the cinder-volume service) to pick up the changes above.
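
For example, on a typical RPM-based installation (the service name may differ depending on your distribution):

service openstack-cinder-volume restart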

Troubleshooting

The configuration changes made above should typically be enough for OpenStack to use NetApp; however, in our experience this was not the case due to permission and iSCSI problems. The problems we encountered, along with their solutions, are described below.

User xxxxx does not have the capability to invoke API...

We encountered various API permission errors when trying to create a NetApp volume. For example, in the NetApp logs (FilerView > Filer > Syslog messages) and Cinder logs, we would respectively see:

"User xxxxx does not have capability to invoke API lun-list-info"

and

"WARNING [cinder.volume.drivers.netapp.iscsi] Error unmapping lun. Code :9016, Message:LUN is not mapped to this group"

This kind of permission error occurred for many API calls; the calls we had to obtain access to were:

login-*,api-clone-start,api-clone-list-status,api-nfs-exportfs-storage-path,api-lun-*,api-volume-options-list-*,api-volume-list-*,api-igroup-*,api-iscsi-*,api-storage-shelf-list-info,cli-version

Ask the NetApp administrator to grant you access to these API calls.
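
For reference, a rough sketch of how a Data ONTAP 7-mode administrator might grant these capabilities using the useradmin CLI (the role and group names here are purely illustrative; check the exact syntax against the Data ONTAP documentation for your release):

useradmin role add openstack_role -a login-*,api-clone-start,api-clone-list-status,api-nfs-exportfs-storage-path,api-lun-*,api-volume-options-list-*,api-volume-list-*,api-igroup-*,api-iscsi-*,api-storage-shelf-list-info,cli-version
useradmin group add openstack_group -r openstack_role
useradmin user modify username -g openstack_group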

ISCSI: Authentication failed for initiator...

This error appears within the NetApp logs (FilerView > Filer > Syslog messages) and relates to permissions on configuration files (e.g. nova.conf). If a configuration file has been modified or restored, ensure that the file has the correct permissions afterwards:

chown root:nova /etc/nova/nova.conf
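
The resulting ownership should look roughly as follows (the mode bits, size and date shown here are illustrative and may differ on your installation):

ls -l /etc/nova/nova.conf
-rw-r----- 1 root nova 12345 Jul 25 13:39 /etc/nova/nova.conf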

This error usually coincides with the following entries in the log of the service whose configuration file has incorrect permissions:

abrt: detected unhandled Python exception in '/usr/bin/nova-compute'
abrtd: New client connected
abrtd: Directory 'pyhook-2013-07-25-13:39:27-7361' creation detected
abrt-server[7367]: Saved Python crash dump of pid 7361 to /var/spool/abrt/pyhook-2013-07-25-13:39:27-7361
abrtd: Package 'openstack-nova-compute' isn't signed with proper key
abrtd: 'post-create' on '/var/spool/abrt/pyhook-2013-07-25-13:39:27-7361' exited with 1
abrtd: Corrupted or bad directory '/var/spool/abrt/pyhook-2013-07-25-13:39:27-7361', deleting

iscsiadm: No session found

When OpenStack tries to discover iSCSI sessions on the remote NetApp server, it may not find one when executing a command such as:

sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.135064716 -p 137.138.144.194:3260 --rescan

This problem can be solved in a number of ways:

  1. Firstly, check that iSCSI is enabled on the NetApp server! This can be verified under FilerView > LUNs > Enable/Disable on the NetApp server, or from the filer CLI as sketched after this list.
  2. This error can also appear when iSCSI is enabled; it usually relates to the error below, so try its solution.
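
Assuming CLI access to the filer, the state of the iSCSI service can be checked, and the service started if needed, with commands along the lines of:

iscsi status
iscsi start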

ISCSI: Authentication failed for initiator iqn.1994-05.com.redhat:c4a522cm2a

This error appears in the NetApp server's logs (FilerView > Filer > Syslog messages) and usually coincides with the error above.

The official NetApp documentation describes this as an authentication error when using CHAP:

This is because CHAP is not configured correctly for the specified initiator. Check the CHAP settings:

  • Inbound credentials on the storage system must match outbound credentials on the initiator.
  • Outbound credentials on the storage system must match inbound credentials on the initiator.
  • You cannot use the same user name and password for inbound and outbound settings on the storage system.

However, in cases where CHAP is not used, this solution obviously does not apply. In our experience, regardless of whether CHAP is enabled or not, this problem can be solved by simply removing the scanned and discovered targets within /var/lib/iscsi/nodes, allowing OpenStack to create a new iSCSI node record and connect.
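
For example, reusing the target and portal from the rescan command shown earlier (adjust these to your own target; this removes the corresponding node record under /var/lib/iscsi/nodes):

sudo iscsiadm -m node -T iqn.1992-08.com.netapp:sn.135064716 -p 137.138.144.194:3260 -o delete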

Api clone-start requires license for flex_clone

This error appears when one takes a snapshot of a NetApp instance. As the error states, a flex_clone license has to be obtained and its license code entered under FilerView > Filer > Manage Licenses.
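
Assuming CLI access to the filer, the license code can presumably also be added with a command of the form below, where XXXXXXX is a placeholder for the real code:

license add XXXXXXX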

We do not currently have the license for flex_clone, hence the snapshot functionality does not work.

LUN already mapped to this group

This error appears when one tries to mount a previously created volume to an instance. Simply create another volume and mount it instead.
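
To check which LUNs are already mapped to which initiator groups on the filer (assuming CLI access), a command such as the following can help:

lun show -m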

iSCSI device not found at /dev/disk/by-path/...

This error occurs when one tries to migrate a virtual machine offline while a NetApp volume is attached. When the virtual machine shuts down, the path to the iSCSI device is removed, and upon virtual machine restart the path to the device cannot be found. A workaround could easily be implemented to solve this problem (I think!).
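
One possible manual workaround, untested here, would be to log the session back in and rescan on the destination compute node so that the by-path entry reappears, along the lines of:

sudo iscsiadm -m node -T iqn.1992-08.com.netapp:sn.135064716 -p 137.138.144.194:3260 --login
sudo iscsiadm -m node -T iqn.1992-08.com.netapp:sn.135064716 -p 137.138.144.194:3260 --rescan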

Exceeded maximum rsh connections/number of administrative HTTP connections

During the course of using NetApp, we started receiving errors in the Cinder scheduler logs displaying:

NaApiError: NetApp api failed. Reason - 507:The maximum number of Administrative HTTP connections specified by httpd.admin.max_connections has been exceeded
WARNING [cinder.scheduler.host_manager] service is down or disabled

At this point, the volumes were still accessible; however, volume creation and mounting functionality did not work. In addition, the NetApp administration interface was not accessible and displayed:

java.lang.OutOfMemoryError

Although the exact cause of these errors is still unknown, the problems started to appear when a disk failed and nightly NetApp snapshot operations could not be taken:

Snapshot operation failed: No space left on device

which eventually caused the following error:

Exceeded the maximum allowed rsh sessions: 24

The error above is typically seen when the NetApp Filer has more than 24 rsh connections concurrently in use. As an educated guess, it is likely that the nightly snapshot commands hung and eventually exceeded this limit, in turn crashing the Filer; this would explain why the administration interface was unavailable while the volumes remained accessible.

To solve this problem, the simplest thing to do is to reboot the NetApp Filer.
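
Before (or after) rebooting, it may also be worth inspecting the connection limit and rsh sessions from the filer CLI, assuming access; something like:

options httpd.admin.max_connections
rshstat -a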

Note that NetApp may perform much slower than normal after the reboot; however, performance will return to normal levels after a few hours.
