Information System Troubleshooting Guide
Which BDII version?
This page is about BDII v5 used on EMI.
Troubleshooting Steps
Before attempting to troubleshoot problems with the information system, it is important to have a general overview of the information system, in particular a working knowledge of the BDII. Information flows from the resource level BDII to the top level BDII via the site level BDII. For this reason a top-down approach to troubleshooting is followed.
Identify the service where the problem is seen.
- Is the information in the top level BDII correct? If yes then there is usually no problem.
- Is the information correct in the site level BDII but not correct in the top level BDII? If yes then the problem is probably with the top level BDII.
- Is the information correct in the resource level BDII but not correct in the site level BDII? If yes then the problem is probably with the site level BDII.
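The comparison between levels can be done with ldapsearch against port 2170 on each BDII. The following is a minimal sketch, assuming the conventional search bases (o=grid for a top level BDII, mds-vo-name=SITE,o=grid for a site level BDII and mds-vo-name=resource,o=grid for a resource level BDII). The host names, the site name and the GLUE filter in the usage comments are placeholders, not values taken from this page.

```shell
#!/bin/sh
# Print the LDAP search base for a BDII level (top, site or resource).
# Assumption: the conventional bases are in use; the site name is the one
# published in the information system.
bdii_base() {
    level=$1
    site=$2
    case "$level" in
        top)      echo "o=grid" ;;
        site)     echo "mds-vo-name=${site},o=grid" ;;
        resource) echo "mds-vo-name=resource,o=grid" ;;
        *)        echo "unknown level: $level" >&2; return 1 ;;
    esac
}

# Query one BDII anonymously and print the requested attributes as LDIF.
bdii_query() {
    host=$1; level=$2; site=$3; filter=$4
    shift 4
    ldapsearch -x -LLL -H "ldap://${host}:2170" \
        -b "$(bdii_base "$level" "$site")" "$filter" "$@"
}

# Usage (hypothetical hosts and site name):
#   bdii_query top-bdii.example.org  top      ""      '(objectClass=GlueCE)' GlueCEStateStatus
#   bdii_query site-bdii.example.org site     MY-SITE '(objectClass=GlueCE)' GlueCEStateStatus
#   bdii_query ce.example.org        resource ""      '(objectClass=GlueCE)' GlueCEStateStatus
```

The first level, going from top to resource, at which the attribute values become correct indicates where the problem is introduced.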
Identify the component with the problem
- Is the information correct when the bdii-update script is executed as the same user that the BDII uses? If yes then the problem is with the BDII.
- If not, the problem must be in one of the data sources. Investigate each source in the ldif, provider and plugin directories to identify which one has the problem.
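Investigating each data source can be partly automated. The sketch below runs every executable in one directory and reports those that exit with a non-zero status or write to stderr. The function name is illustrative, and the directory in the usage comment is an assumption: the actual locations are given by the provider and plugin directory settings in bdii.conf. It should be run as the same user as the BDII uses so that permission problems show up.

```shell
#!/bin/sh
# Run each executable in a provider or plugin directory and flag the ones
# that fail or print to stderr. Static LDIF files are skipped because they
# are not executable.
check_sources() {
    dir=$1
    errfile=$(mktemp)
    for src in "$dir"/*; do
        [ -f "$src" ] && [ -x "$src" ] || continue
        if "$src" >/dev/null 2>"$errfile" && [ ! -s "$errfile" ]; then
            echo "OK   $src"
        else
            echo "FAIL $src"
            sed 's/^/     /' "$errfile"   # indent the captured error output
        fi
    done
    rm -f "$errfile"
}

# Usage (assumed directory, check bdii.conf for the real one):
#   check_sources /var/lib/bdii/gip/provider
```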
Investigating the BDII
- Check the BDII log file for error messages.
- Change the BDII_LOG_LEVEL to DEBUG and check the log file.
- Check the files in the BDII_VAR_DIR directory to help locate the problem.
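These checks can be scripted. The following sketch assumes that bdii.conf consists of KEY=value lines and that, besides BDII_LOG_LEVEL and BDII_VAR_DIR mentioned above, it also defines BDII_LOG_FILE. That variable name, the helper function names and the service name in the usage comments are assumptions.

```shell
#!/bin/sh
# Set BDII_LOG_LEVEL in a bdii.conf-style file, adding the line if it is missing.
set_bdii_log_level() {
    conf=$1; level=$2
    if grep -q '^BDII_LOG_LEVEL=' "$conf"; then
        tmp=$(mktemp)
        sed "s/^BDII_LOG_LEVEL=.*/BDII_LOG_LEVEL=${level}/" "$conf" > "$tmp"
        cat "$tmp" > "$conf"          # rewrite in place, keeping file permissions
        rm -f "$tmp"
    else
        echo "BDII_LOG_LEVEL=${level}" >> "$conf"
    fi
}

# Print the value of one variable from the same file.
bdii_conf_get() {
    conf=$1; key=$2
    sed -n "s/^${key}=//p" "$conf" | tail -n 1
}

# Usage (run as root; the BDII has to be restarted to pick up the new level):
#   set_bdii_log_level /etc/bdii/bdii.conf DEBUG
#   service bdii restart
#   grep -i error "$(bdii_conf_get /etc/bdii/bdii.conf BDII_LOG_FILE)"
#   ls -l "$(bdii_conf_get /etc/bdii/bdii.conf BDII_VAR_DIR)"
```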
Common Problems
BDII fails to start
For gLite 3.2 on SL5-compatible installations this can happen due to SELinux settings.
One option is to switch SELinux off. Alternatively, the following workaround can be used:
chcon --changes --reference=/var/lib/ldap/ -R /var/bdii/
semanage port -a -t ldap_port_t -p tcp 2170
Further details here:
If the BDII fails to start, this could also be an underlying problem with the LDAP database. Try to start the slapd server with the default slapd.conf file.
/usr/sbin/slapd -f /etc/openldap/slapd.conf -d 255
If this fails, there is a problem with the LDAP installation. Note that this has been experienced when using virtual machines. Online forums for LDAP and for the OS distribution can be useful in solving this problem.
If the LDAP installation has been verified, the slapd.conf file used by the BDII should be tested.
/usr/sbin/slapd -f /etc/bdii/bdii(-top)-slapd.conf -d 255
If this fails, there could be a problem with the BDII slapd.conf file.
Unable to initialize mutex Error
There were reports about the following error:
...
bdb_db_init: Initializing BDB database
bdb(o=grid): unable to initialize mutex: Function not implemented
bdb(o=grid): /opt/bdii/var/2171/__db.001: unable to initialize environment lock: Function not implemented
...
This issue may be fixed using the FAQ provided by Oracle:
http://www.oracle.com/technology/products/berkeley-db/faq/db_faq.html#12
Entries missing in the BDII
If invalid LDIF is produced, the entry will be rejected when it is inserted into the LDAP database. Rejected entries are recorded in the BDII log file when logging is set to WARNING or higher.
Default values shown instead of dynamic values
The dynamic plugin might have a problem, or there is a mismatch between the DNs. Check that the DNs produced by the dynamic plugin are the same as those in the static ldif file. The dynamic plugin should be executed as the same user that the BDII uses, in order to spot permission problems. Run the following command to spot any errors:
su -s /bin/sh ldap -c "/usr/sbin/bdii-update -c /etc/bdii/bdii.conf > /dev/null"
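The DN check can be sketched as follows: extract the DNs from the plugin output and from the static ldif file, normalise them, and list those that the plugin produces but the static file does not contain. The function names and the file paths in the usage comments are hypothetical. Folded DN lines and base64-encoded "dn::" values are not handled by this sketch.

```shell
#!/bin/sh
# Read LDIF on stdin and print its DNs, lower-cased, with spaces after
# commas removed, sorted and de-duplicated.
ldif_dns() {
    grep -i '^dn:[^:]' | sed 's/^[Dd][Nn]: *//' | tr 'A-Z' 'a-z' \
        | sed 's/, */,/g' | sort -u
}

# Print the DNs found in the first LDIF file but not in the second.
dns_only_in_first() {
    a=$(mktemp); b=$(mktemp)
    ldif_dns < "$1" > "$a"
    ldif_dns < "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (hypothetical paths):
#   /path/to/dynamic-plugin > /tmp/plugin.ldif
#   dns_only_in_first /tmp/plugin.ldif /path/to/static.ldif
```

Any DN printed is produced by the plugin without a matching entry in the static file, which would be consistent with default values being shown instead of dynamic ones.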
BDII started but no response from port 2170
Run
netstat -l
to see whether slapd is listening on port 2170. The output lists the ports that the LDAP servers are listening on, for example:
tcp 0 0 localhost.localdomain:2170 *:* LISTEN
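The check can also be scripted. The sketch below reads numeric netstat output (netstat -lnt) on stdin and succeeds only if a TCP socket is listening on the given port. The function name is illustrative.

```shell
#!/bin/sh
# Succeed if the `netstat -lnt` output on stdin shows a TCP socket listening
# on the given port; print the matching local addresses.
listening_on() {
    port=$1
    awk -v port="$port" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            # The local address is column 4; the port follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) { print $4; found = 1 }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage:
#   netstat -lnt | listening_on 2170 || echo "nothing is listening on port 2170"
```

The printed local address is worth a look as well: a socket bound only to 127.0.0.1 accepts local connections but not queries from other hosts.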
The BDII is overloaded with queries
Due to the critical nature of the information system with respect to the operation of the grid, the BDII should be installed as a stand-alone service to ensure that problems with other services do not affect the BDII. Under no circumstances should the BDII be co-hosted with a service which has the potential to generate a high load. If there are too many queries to a BDII and the load is too high, multiple instances of the BDII can be deployed as a DNS load-balanced BDII service behind a "round robin" DNS alias.
Detailed logging for slapd is available by configuring the slapd syslog.
Change the loglevel in the slapd.conf to 256.
Add the following line to /etc/syslog.conf:
local4.* /var/log/slapd.log
Restart the syslog daemon:
service syslog restart
Restart the BDII.
The log file can be parsed by this script, which will generate a summary.
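As a rough illustration of the kind of summary that can be obtained from this log, the following sketch counts client connections per source IP address. It is not the script referred to above. It assumes that slapd at loglevel 256 writes connection lines containing "ACCEPT from IP=address:port", and it ignores IPv6 clients.

```shell
#!/bin/sh
# Count accepted connections per client IPv4 address in a slapd log on stdin,
# most active clients first.
# Assumed line format (loglevel 256):
#   ... conn=1001 fd=12 ACCEPT from IP=192.0.2.10:51234 (IP=0.0.0.0:2170)
connections_per_client() {
    sed -n 's/.* ACCEPT from IP=\([^: ]*\):[0-9]* .*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage:
#   connections_per_client < /var/log/slapd.log | head
```

A small number of clients accounting for most of the connections suggests contacting those users before deploying additional BDII instances.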
BDB backend dies on memory allocation error
This issue has been seen on a virtual machine with limited memory.
slapd -f /opt/bdii/var/2171/bdii-slapd.conf -d 25
bdb_db_open: dbenv_open(/opt/bdii/var/2171)
bdb_db_open: dbenv_open(/opt/bdii/var/2171/infosys)
bdb(o=infosys): mmap: Cannot allocate memory
bdb(o=infosys): PANIC: Cannot allocate memory
bdb_db_open: dbenv_open failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30978)
backend_startup: bi_db_open(1) failed! (-30978)
slapd shutdown: initiated
====> bdb_cache_release_all
====> bdb_cache_release_all
bdb(o=infosys): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
slapd shutdown: freeing system resources.
bdb(o=infosys): txn_checkpoint interface requires an environment configured for the transaction subsystem
bdb_db_destroy: txn_checkpoint failed: Invalid argument (22)
The solution is to reduce the cache memory allocation specified in /opt/bdii/etc/DB_CONFIG:
set_cachesize N_GBytes N_Bytes N_segments
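The three arguments are the cache size in whole gigabytes, the additional bytes on top of that, and the number of cache segments. The sketch below builds the line from a size given in megabytes. The function name is illustrative, and the 64 MB in the usage comment is an arbitrary figure, not a recommended setting.

```shell
#!/bin/sh
# Print a DB_CONFIG set_cachesize line for a cache size given in megabytes,
# using a single cache segment.
cachesize_line() {
    mb=$1
    gbytes=$(( mb / 1024 ))                 # whole gigabytes
    bytes=$(( (mb % 1024) * 1048576 ))      # remainder, converted to bytes
    echo "set_cachesize ${gbytes} ${bytes} 1"
}

# Usage:
#   cachesize_line 64    # prints: set_cachesize 0 67108864 1
```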