Hints on tuning a DPM cluster

Most of the time, common Linux distributions ship with default settings that are not suitable for high-performance data serving. On top of that, DPM itself has many parameters that the sysadmin must adjust in order to improve the performance and scalability of the setup.

Please note that these settings may have to be applied in different ways, depending on the configuration tools used at the site. Examples are YAIM, Quattor, puppet, ...

To efficiently run servers with a high rate of I/O transactions, we suggest using only physical hardware, especially for the head node.

Our experience with machines as small as 4 cores/2-3 GHz/16 GB has always been positive. In the absence of other infrastructure-related bottlenecks (e.g. a slow DNS server or router), such a machine should fulfil the most recent requirements of LHC experiments like CMS.

Please remember to restart all the daemons after the changes have been put into place. These daemons may include dpm, dpnsdaemon, mysql, memcached, httpd, xrootd.

Hint: Cache the DNS queries

The DPM system must manage URLs containing hostnames, hence it produces a high volume of DNS queries. These queries can be cached very effectively just by installing nscd and making sure it is active at all times.

NSCD must be installed on the head node and on ALL the disk servers.

> yum install nscd
> service nscd start
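
To make sure the cache also survives reboots, enable nscd at boot time; the command depends on the init system:

> chkconfig nscd on        # SL6 (SysV init)
> systemctl enable nscd    # CentOS 7 (systemd)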

Hint: File descriptors and processes

The OS must be configured to allow DPM to keep many files and sockets open.

The OS must be configured to allow DPM to start many processes and threads.

Getting these settings wrong can have a dramatic impact on performance. We generally advise sysadmins to globally set the file descriptor and process limits to 65000. This is done by making sure that the file /etc/security/limits.conf contains appropriate settings, as in the following example:

[root@lxfsra04a04 log]# tail -n 5 /etc/security/limits.conf

* hard nofile 65000
* soft nproc 65000
* soft nofile 65000
* hard nproc 65000

These settings must be applied to the head node, all the disk servers and the MySQL/MariaDB node.
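
A quick way to verify that a running daemon has actually picked up the new limits is to inspect its /proc entry. A minimal sketch, assuming the DPM daemon process is named dpm:

> grep -E 'Max open files|Max processes' /proc/$(pidof dpm)/limits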

Hint: Use the DMLite memcache plugin (OBSOLETE with dmlite >= 1.10 and DOME setup)

The DMLite memcache plugin reduces the number of accesses to MySQL considerably, hence improving performance and reducing the latency of each transaction. Its usage is highly recommended.

The package to be installed on the head node is called dmlite-plugins-memcache.

If you plan to run memcached on the head node, the memcached daemon must be installed there as well.
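
For example, installing both packages on the head node:

> yum install dmlite-plugins-memcache memcached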

Whatever configuration method is chosen (manual or puppet), one has to make sure that the parameters give sufficient resources for the memcache to be effective.

The memcached daemon (OBSOLETE with dmlite >= 1.10 and DOME setup)

The memcached daemon must have sufficient memory assigned to it; a few gigabytes are generally sufficient.

Please make sure that the memcached daemon is up and running, and if necessary change its default autostart behaviour.

Please note that the default settings for memcached are ridiculously low. They must be changed, as in the following example:

[root@lxfsra04a04 log]# cat /etc/sysconfig/memcached 
PORT="11211"
USER="memcached"
MAXCONN="8192"
CACHESIZE="2048"
OPTIONS="-l 127.0.0.1 -U 11211 -t 4"
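
One can then verify that memcached is alive and has picked up the new limits with the memcached-tool utility shipped in the memcached package:

> memcached-tool 127.0.0.1:11211 stats | grep -E 'limit_maxbytes|max_connections'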

When a memcache server is deployed on a different host than the DPM head node, it should be configured to listen on an external interface (which may be 0.0.0.0), and then MUST be appropriately firewalled - e.g. using iptables - to only allow connections FROM the DPM head node towards port 11211/tcp.

All other connections to 11211/tcp must be rejected or discarded. For example (in a minimalistic way), if your DPM head node is at 10.0.0.1, configure on the host running the memcached server:

-A RH-Firewall-1-INPUT -p tcp --dport 11211 -s 10.0.0.1/32 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp --dport 11211 -j REJECT
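
On CentOS 7 hosts running firewalld instead of raw iptables, the equivalent (again with the hypothetical head node at 10.0.0.1) would be a rich rule; connections from other sources are then rejected by the default zone policy:

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.1/32" port port="11211" protocol="tcp" accept'
firewall-cmd --reload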

The DMLite memcached connection pool (OBSOLETE with dmlite >= 1.10 and DOME setup)

This setting normally lives in the file /etc/dmlite.conf.d/zmemcache.conf. The value that we suggest depends on the version of DMLite that is in use:

DMLite >= 0.7 needs a value of 250

MemcachedPoolSize 250

DMLite < 0.7 needs a value of 500

MemcachedPoolSize 500
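
For reference, a minimal sketch of what zmemcache.conf typically looks like; the LoadPlugin path and the MemcachedServer directive here are assumptions based on typical dmlite installations and should be cross-checked against the stock file shipped with dmlite-plugins-memcache:

# /etc/dmlite.conf.d/zmemcache.conf (sketch; verify directives against the shipped file)
LoadPlugin plugin_memcache /usr/lib64/dmlite/plugin_memcache.so
MemcachedServer 127.0.0.1:11211
MemcachedPoolSize 250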

Hint: Adjust the MySQL/MariaDB parameters

The MySQL daemon must be able to accept many connections.

We recommend this setting in /etc/my.cnf:

max_connections = 1000

Increase the size of the MySQL caches, and relax the ACID compliance a bit; the default values do not allow a high transaction rate. Setting the flush method to O_DIRECT also avoids overheads when flushing pages.

innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_buffer_pool_size=<half the RAM of the machine, in gigabytes, followed by G, e.g. 12G>
innodb_doublewrite=0
innodb_support_xa=0
innodb_thread_concurrency=8
innodb_log_buffer_size = 8M

query_cache_limit=1M
query_cache_size=256M

# Suggested by Andrei Kyrianov to avoid fast clients being blacklisted
max_connect_errors = 4294967295
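
After restarting the database, the effective values can be double-checked from the client:

> mysql -e "SHOW VARIABLES LIKE 'max_connections'"
> mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'"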

On CentOS 7, MariaDB is quite demanding in terms of resources; the file descriptor limit has to be raised in the mariadb unit file as well:

systemctl edit mariadb

[Service]
LimitNOFILE=10240

and then restart the service

systemctl daemon-reload
systemctl restart mariadb
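
The new limit can be verified with:

> systemctl show mariadb -p LimitNOFILE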

The same configuration can be performed via puppet; details are available at link.

Hint: Adjust the DMLite MySQL parameters

The DMLite MySQL plugin needs an internal pool of MySQL connections that is large enough to feed the large number of threads that the modern frontends use.

We recommend this setting in /etc/dmlite.conf.d/mysql.conf:

DMLite >= 0.7 needs a value of 128

NsPoolSize 128

DMLite < 0.7 needs a value of 256

NsPoolSize 256

Hint: Adjust the legacy DPM daemon threads

The sum of the "fast threads" and "slow threads" must always equal the maximum allowed, which is 80. The config file is /etc/sysconfig/dpm. The "fast threads" should be favoured, as in the following example:

# - Number of DPM fast threads :
NB_FTHREADS=60

# - Number of DPM slow threads :
NB_STHREADS=20

Hint: Adjust the legacy DPNS daemon threads

It is possible to raise the number of DPNS threads beyond the default. This prevents dpnsdaemon from becoming a bottleneck for libraries and applications that still need it. The config file is /etc/sysconfig/dpnsdaemon

NB_THREADS=80

Hint: Adjust the legacy SRM2.2 daemon threads

We recommend setting the number of SRM2.2 threads to the maximum allowed. This prevents SRM from becoming a bottleneck for libraries and applications that still need it. The config file is /etc/sysconfig/srmv2.2

NB_THREADS=99
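
After editing any of the three sysconfig files above, restart the corresponding daemons; the usual service names are shown below, but they may differ slightly between releases:

> service dpm restart
> service dpnsdaemon restart
> service srmv2.2 restart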

The log level of the various daemons, in particular xrootd, must be kept low for production usage.

Hint: Doublecheck the DMLite log level

Setups that don't use our puppet templates may be running with the default logging level, which can hurt performance by logging too much. The DMLite log level should be set to 1 (or even 0) in /etc/dmlite.conf:

# Global log level configuration; higher means more verbose
LogLevel 1

Hint: Doublecheck the Xrootd log level

This must be checked for:

  • redirector in the head node (filename: /etc/xrootd/xrootd-dpmredir.cfg)
  • any federation redirector in the head node (filename: /etc/xrootd/xrootd-dpmfedredir-***.cfg)
  • all the disk nodes (filename: /etc/xrootd/xrootd-dpmdisk.cfg)

Make sure that the '*.trace' configuration directives are commented out, like in the following example:

 #ofs.trace all
 #xrd.trace all
 #cms.trace all
 #oss.trace all
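
A quick way to verify that no trace directive is still active in any of these files:

> grep -E '^[^#]*\.trace' /etc/xrootd/*.cfg

No output means that all the trace directives are commented out.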

Hint: Apache httpd threading model

The default threading model configured for Apache in the various Linux distros is a serious bottleneck for DPM HTTP/WebDAV scalability and performance. Its weakness is particularly harmful for metadata access (e.g. file deletion campaigns), but it also heavily affects the data analysis use case.

To make it able to sustain production-level load, the Apache server has to be configured to use the 'event' MPM.

vim /etc/sysconfig/httpd
# set HTTPD=/usr/sbin/httpd.event
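
Note that on CentOS 7 (httpd 2.4) there is no separate httpd.event binary; the MPM is selected via a LoadModule line instead, normally in /etc/httpd/conf.modules.d/00-mpm.conf:

# /etc/httpd/conf.modules.d/00-mpm.conf: enable the event MPM
# (make sure the prefork/worker lines are commented out)
LoadModule mpm_event_module modules/mod_mpm_event.so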

The file /etc/httpd/conf.d/mpm_event.conf (for puppet configuration, update modules/dmlite/templates/dav/mpm_event.conf) must contain the following parameters for configuring the event MPM. The idea behind these numbers is:

  • a few processes are necessary, to overcome the limitations of the Apache threading implementation
  • more than just a few processes will consume too much memory
  • the parameter MaxRequestsPerChild is set low enough to periodically restart the worker processes, working around limitations of the session-cache implementation in libgridsite, and high enough not to cause extra load
 # event MPM configuration /etc/httpd/conf.d/mpm_event.conf
 # StartServers: initial number of server processes to start
 # MinSpareThreads: minimum number of worker threads which are kept spare
 # MaxSpareThreads: maximum number of worker threads which are kept spare
 # ThreadsPerChild: constant number of worker threads in each server process
 # MaxClients: maximum number of simultaneous client connections
 # MaxRequestsPerChild: maximum number of requests a server process serves
 <IfModule mpm_event_module>
     StartServers          4
     ServerLimit          16
     MinSpareThreads       1
     MaxSpareThreads    1200
     ThreadLimit         300
     ThreadsPerChild     300
     MaxClients         1200
 <IfVersion >= 2.4>
     MaxRequestWorkers  4800
 </IfVersion>
 <IfVersion < 2.4>
     MaxClients         4800
 </IfVersion>
     MaxRequestsPerChild   100000
 </IfModule>

This will be the default configuration once DPM 1.13 becomes available.

Hint: Apache httpd modules clashing

We have seen cases where the default WebDAV modules of httpd cause malfunctions when used with our module lcgdm-dav. The general hint is to comment them out in /etc/httpd/conf/httpd.conf (on httpd 2.4 the LoadModule lines may instead live under /etc/httpd/conf.modules.d/):


# Comment these out
# LoadModule dav_module modules/mod_dav.so
# LoadModule dav_fs_module modules/mod_dav_fs.so
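
After restarting httpd, one can confirm that the stock DAV modules are no longer loaded (only the lcgdm-dav module should remain):

> httpd -M 2>/dev/null | grep -i dav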

Hint: Configuring https on disk servers

The DPM team has historically discouraged the use of https on disk servers, in an attempt to minimise load and to avoid the disruptive restarts required to keep CRLs up to date in Apache. The situation has changed (2019) for three reasons:

  • We believe we have an apache config which renders restarts non-disruptive (thanks to Petr)
  • Modern CPUs can handle the crypto just fine
  • Future token auth will require an encrypted channel to the disk server in order to transmit credentials for third party copy

To use https on disk servers you need to configure apache appropriately (see the installation guide).

The apache config which allows graceful restarts to work is the following (typically in mpm_event.conf):

<IfModule mpm_event_module>
    StartServers          4
    ServerLimit          16
    MinSpareThreads       1
    MaxSpareThreads    1200
    ThreadLimit         300
    ThreadsPerChild     300
    MaxClients         1200
<IfVersion >= 2.4>
    MaxRequestWorkers  4800
</IfVersion>
<IfVersion < 2.4>
    MaxClients         4800
</IfVersion>
    MaxRequestsPerChild   100000
</IfModule>

Puppet will provide this config.

Configuring redirection to an https URL on the disk server is done with the NSSecureRedirect On directive in zlcgdm-dav.conf (or dmlite::dav::params::ns_secure_redirect: 'On' in puppet).

Note - even if https redirection is not configured, access is still subject to the correct authorisation, as the head node embeds a token in the disk server URL which allows the client to be identified and authorised properly.

Configuring the secure redirect is not needed for davix clients, which will always request an https URL on the disk server when performing a third party copy, thus protecting any tokens which may be passed.
