Planning your Xrootd Hardware Deploy
As Xrootd is meant to be a perturbation of the total site storage usage, the hardware requirements are small compared to deploying a storage element.
- Host Count: A beginning site can start with a single node. However, we recommend the "final set" of hardware involve three nodes - two for load-balancing, and one for failover.
- Network connectivity: Each host needs public Internet connectivity (a DNS entry, a host certificate, and open ports; the ports are specified in the installation documents). It also needs to be able to read from your storage element.
- Hardware requirements: Xrootd needs no significant amount of CPU, memory, disk, or network. Typically, sites will reuse old worker nodes. A node with 4 cores, 8GB RAM, 20GB disk, and 1Gbps connectivity will suffice.
NOTE: If your site hosts more than 2 XRootD servers, consider deploying an XRootD site redirector. The site redirector subscribes upstream, and all site servers subscribe to the site redirector instead. This reduces the number of managers (cmsd processes) subscribed to the one manager upstream in the hierarchy. Please refer to the subscription details here.
TIP: For large clusters with 65 or more data servers, a supervisor mode is available to configure; see the document here.
Recommended tweaks
There are a couple of places where you can tweak the configuration if you have evidence that an XRootD instance (the xrootd or cmsd process) has trouble serving client requests or otherwise limits functional behavior.
System (OS) level
Data servers with high demand might require some changes in system-wide settings to behave normally. If you encounter problems like "thread limit reached" (in cmsd process messages) or "Config maximum number of connections restricted to 65536" (in xrootd process messages), here is a set of things to check and adjust accordingly:
$ cat /proc/`pidof xrootd`/limits
$ cat /proc/`pidof cmsd`/limits
Max open files is usually set to 65536; check that this is the case and change it if not. Also, do not limit Max core file size to 0, so that a core file is written when XRootD crashes. Make sure core file size is set to unlimited, either per process or system-wide:
$ ulimit -c unlimited
For a running process you can generate a core file as follows:
$ gcore $(pidof cmsd)
$ gcore $(pidof xrootd)
Usually, when XRootD crashes, a core file is created under /var/spool/xrootd/. If you get one, report it to hn-cms-wanaccess@cernNOSPAMPLEASE.ch.
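The limit checks described above can be scripted. The following is a minimal sketch, assuming the usual /proc/<pid>/limits layout (limit name, soft limit, hard limit, units); the function names soft_limit and check_limits are hypothetical helpers, not part of XRootD.

```shell
#!/bin/sh
# Print the soft limit for a named row of a /proc/<pid>/limits file.
# $1 = limit name as it appears at the start of the row, $2 = limits file.
soft_limit() {
    # Rows look like: "Max open files   65536   65536   files".
    # The soft limit is the first field after the limit name.
    awk -v name="$1" 'index($0, name) == 1 {
        rest = substr($0, length(name) + 1)
        split(rest, f, " ")
        print f[1]
    }' "$2"
}

# Report limits that differ from the values suggested in the text.
# $1 = limits file of the process to inspect.
check_limits() {
    nofile=$(soft_limit "Max open files" "$1")
    core=$(soft_limit "Max core file size" "$1")
    case "$nofile" in
        unlimited) ;;
        ''|*[!0-9]*) echo "open files: could not parse '$nofile'" ;;
        *) [ "$nofile" -ge 65536 ] || echo "open files: soft limit $nofile is below 65536" ;;
    esac
    [ "$core" = "0" ] && echo "core file size: soft limit is 0, no core file will be written"
    return 0
}
```

On a data server this could be called as check_limits "/proc/$(pidof -s xrootd)/limits" and again for cmsd; no output means both limits match the recommendations.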
If you run a stateful firewall, it usually does connection tracking, and with the conntrack module loaded a host running XRootD uses up twice as many sockets. Please check the following settings and set them to the recommended values (choose reasonable values depending on the hardware resources of the machine):
$ cat /proc/sys/net/netfilter/nf_conntrack_max
65536
$ cat /proc/sys/net/netfilter/nf_conntrack_buckets
16384
$ cat /proc/sys/net/ipv4/ip_local_port_range
1024 65535
$ cat /proc/sys/kernel/pid_max
131072
Out of curiosity, check the nf_conntrack_count and somaxconn values to see the current connection tracking usage and the listen backlog limit:
$ cat /proc/sys/net/netfilter/nf_conntrack_count
$ cat /proc/sys/net/core/somaxconn
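To make the conntrack numbers easier to read, the current count can be expressed as a percentage of the maximum. This is a small sketch only; conntrack_usage is a hypothetical helper name, and it works on the two values read from the /proc files shown above.

```shell
#!/bin/sh
# Print connection-tracking table usage as an integer percentage.
# $1 = value of nf_conntrack_count, $2 = value of nf_conntrack_max.
conntrack_usage() {
    count=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "invalid nf_conntrack_max: $max" >&2
        return 1
    fi
    # Integer arithmetic, rounded down.
    echo $(( count * 100 / max ))
}
```

For example, conntrack_usage "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" "$(cat /proc/sys/net/netfilter/nf_conntrack_max)" prints the usage on a host where the conntrack module is loaded. A value that stays close to 100 suggests nf_conntrack_max is too low for the load.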

TIP: If you do not already use one, running the Ganglia Monitoring System or another piece of monitoring software is recommended. In case of XRootD troubles, it makes it easier to correlate system resource usage with XRootD crashes etc.
Enable overcommit_memory:
$ sysctl vm.overcommit_memory
vm.overcommit_memory = 1

If overcommit is disabled and the machine is short of virtual memory (or has other memory issues), the system limits thread creation for the process, which often leads to a crash of cmsd. When it is enabled, thread creation only fails once physical memory is fully occupied.
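Settings changed at runtime are lost on reboot, so it may help to keep them in a sysctl drop-in file. The sketch below only prints such a fragment, using the values quoted in this section; the function name and the target file name 90-xrootd.conf are assumptions. nf_conntrack_buckets is left out because on some kernels it cannot be set through sysctl, and the nf_conntrack key only exists once the conntrack module is loaded.

```shell
#!/bin/sh
# Print a sysctl fragment with the values quoted in the text.
# Hypothetical helper; review the values against your hardware before use.
xrootd_sysctl_fragment() {
    cat <<'EOF'
# Values quoted in the XRootD tuning notes; adjust to the machine.
vm.overcommit_memory = 1
kernel.pid_max = 131072
net.ipv4.ip_local_port_range = 1024 65535
net.netfilter.nf_conntrack_max = 65536
EOF
}
```

One possible use, as root, is xrootd_sysctl_fragment > /etc/sysctl.d/90-xrootd.conf followed by sysctl --system to load it.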
XRootD (configuration) level
XRootD consuming lot of memory
Check the system logs for messages like:
xrootd: page allocation failure: order:0, mode:0x20

This usually happens on EL7 systems, and you might consider a custom memory allocator instead of the standard glibc one, e.g. installing jemalloc (jemalloc rpm) or tcmalloc (gperftools-libs rpm). Then add the following to your /etc/sysconfig/xrootd:
LD_PRELOAD=/usr/lib64/libjemalloc.so.1 # you may need to adjust the library path depending on the linux flavor
MALLOC_ARENA_MAX=4
On EL7 systems you might also want to enable it in the systemd unit, via /etc/systemd/system/xrootd@clustered.service.d/override.conf:
# systemd override file for xrootd@clustered
[Service]
EnvironmentFile=-/etc/sysconfig/xrootd
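After the override is in place, it is worth confirming that the allocator is really mapped into the running process. The following is a minimal sketch, assuming the library file name contains libjemalloc or libtcmalloc; allocator_loaded is a hypothetical helper that inspects a /proc/<pid>/maps file.

```shell
#!/bin/sh
# Succeed if the given /proc/<pid>/maps file shows jemalloc or tcmalloc mapped.
# $1 = path to the maps file of the process to inspect.
allocator_loaded() {
    grep -Eq 'lib(jemalloc|tcmalloc)' "$1"
}
```

The override only takes effect after systemctl daemon-reload and a restart of xrootd@clustered. Afterwards, allocator_loaded "/proc/$(pidof -s xrootd)/maps" && echo "custom allocator in use" gives a quick confirmation.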
Intermittent SAM xrootd-access tests failures
Sometimes the xrootd-access test marks a site red (critical status) even though the site admins believe the SAM test file is present on their system and is reachable with a manual xrdcp copy. The typical symptom of such a failed SAM xrootd-access test is this line in the log file:
error '[ERROR] Server responded with an error: [3011] No servers are available to read the file.
Depending on the cms.dfs configuration of your XRootD setup, you may consider tweaking it, especially if you are a big site with more than 10 servers, where it is harder to guarantee 100% data server availability. What people usually set:
cms.dfs lookup distrib mdhold 20m redirect immed

This means the file lookup is broadcast to all servers and redirection is done on the first response received. In redirect immed mode, the site redirector picks a node at random without checking whether the file exists. Hence, in this mode, it is important that 100% of the data servers are working. Otherwise a "file not found" from a broken server might get cached, and new requests return file missing even though the file is present on another server at the site.
If you are experiencing the situation described above, the recommended change is:
cms.dfs lookup central mdhold 20m redirect verify
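To see which redirect mode a redirector is currently configured with, the cms.dfs line can be read back from the configuration file. This is a sketch only; dfs_redirect_mode is a hypothetical helper, and the configuration file path is site-specific and has to be supplied by the caller.

```shell
#!/bin/sh
# Print the word following "redirect" on the last active cms.dfs line.
# $1 = path to the XRootD configuration file.
dfs_redirect_mode() {
    # Commented-out lines are skipped because their first field is not cms.dfs.
    awk '$1 == "cms.dfs" {
        for (i = 2; i < NF; i++)
            if ($i == "redirect") mode = $(i + 1)
    }
    END { if (mode != "") print mode }' "$1"
}
```

If this prints immed on a site that sees the intermittent failures described above, switching to the verify setting shown in the text is the change to try.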