Installation of EOS MGM for federated cloud
Test log
Tested on
CentOS 6.7
--
EygeneRyabinkin
Installation process
Base system and repositories
Standard software repositories:
EOS repository (users of Scientific Linux and its derivatives should use alternative files, see below):
cat << 'EOF' > /etc/yum.repos.d/eos.repo
[eos-aquamarine]
name=EOS aquamarine, modern location
baseurl=https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/aquamarine/tag/el-$releasever/$basearch/
gpgcheck=0
enabled=1
priority=45
[eos-aquamarine-depends]
name=EOS aquamarine, dependencies
baseurl=https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/aquamarine-depend/el-$releasever-$basearch/
gpgcheck=0
enabled=1
priority=45
EOF
Scientific Linux and its derivatives set $releasever to major.minor (for example 6.7), so for these OS variants the major version must be hardcoded in the repo files:
cat << 'EOF' > /etc/yum.repos.d/eos.repo
[eos-aquamarine]
name=EOS aquamarine, modern location
baseurl=https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/aquamarine/tag/el-6/$basearch/
gpgcheck=0
enabled=1
priority=45
[eos-aquamarine-depends]
name=EOS aquamarine, dependencies
baseurl=https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/aquamarine-depend/el-6-$basearch/
gpgcheck=0
enabled=1
priority=45
EOF
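The two variants of the repo file differ only in the `el-6` path component. As a minimal sketch, the major version can be derived from a major.minor release string with shell parameter expansion; the function name `el_major` is hypothetical and not part of yum or EOS:

```shell
# Strip everything from the first dot onwards: "6.7" -> "6", "6" -> "6".
# el_major is a hypothetical helper name used only for illustration.
el_major() {
    printf '%s\n' "${1%%.*}"
}

el_major 6.7
```

The result can then be substituted for `$releasever` when writing the baseurl lines by hand.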
Use the yum-priorities plugin and give the EOS repositories a higher priority than the EPEL one (the default priority is 99; a lower number means a higher priority).
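To see which priority each repository section ends up with, a small awk helper can list them. This is only a sketch: `repo_priorities` is a hypothetical name, and it assumes `priority=N` is written without spaces around the equals sign, as in the files above. On EL6 the plugin itself is normally packaged as `yum-plugin-priorities`.

```shell
# Print "section priority" for every section of a yum .repo file.
# Sections without a priority line are reported with 99, the plugin default.
repo_priorities() {
    awk -F= '
        /^\[.*\]$/ {
            if (sec != "") print sec, prio
            sec = substr($0, 2, length($0) - 2)
            prio = 99
            next
        }
        $1 == "priority" { prio = $2 }
        END { if (sec != "") print sec, prio }
    ' "$1"
}

# Example:
#   repo_priorities /etc/yum.repos.d/eos.repo
#   repo_priorities /etc/yum.repos.d/epel.repo
```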
Packages
Install packages:
yum install -y eos-server eos-client eos-nginx eos-fuse eos-test eos-apmon eos-cleanup jemalloc nscd
Authentication between MGM and FST
Install the EOS keytab (it is provided by the central team), then fix its ownership and mode:
chmod 400 /etc/eos.keytab
chown daemon:daemon /etc/eos.keytab
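The result can be checked with a short helper. This is a sketch that assumes GNU `stat` (as shipped with CentOS); `has_mode` is a hypothetical name:

```shell
# Succeeds only if the file given as $1 has exactly the octal mode given as $2.
# Uses the GNU coreutils "stat -c" syntax.
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Example (run as root on the MGM):
#   has_mode /etc/eos.keytab 400 && echo "keytab mode OK"
#   stat -c '%U:%G' /etc/eos.keytab    # should print daemon:daemon
```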
Firewall
Firewall configuration:
- MGM allows incoming connections to the port 1094 from the world: it is the main client port for metadata and redirections
- MGM allows incoming connections to the ports 1096 and 1097 from all other MGMs and FSTs
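The policy above could be expressed as iptables rules along the following lines. The sketch only prints the commands so that they can be reviewed first; `mgm_firewall_rules` is a hypothetical helper, the peer addresses are placeholders from the documentation range, and appending to the INPUT chain is an assumption about the local rule layout.

```shell
# Print (do not execute) iptables commands for the MGM ports.
# Arguments: addresses of the other MGMs and FSTs.
mgm_firewall_rules() {
    # 1094: main client port, open to the world
    echo "iptables -A INPUT -p tcp --dport 1094 -j ACCEPT"
    for peer in "$@"; do
        # 1096 and 1097: only from other MGMs and FSTs
        echo "iptables -A INPUT -p tcp -s $peer -m multiport --dports 1096,1097 -j ACCEPT"
    done
}

# Placeholder peer addresses; replace with the real MGM/FST hosts.
mgm_firewall_rules 192.0.2.10 192.0.2.11
```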
EOS MGM configuration
The MGM needs an X.509 certificate since it does GSI authentication.
It also runs ALICE token authentication.
So, we must install the needed packages (RDIG CA can be substituted with the whole lcg-CA package: it will install all IGTF trust roots):
yum install -y xrootd-alicetokenacc ca_RDIG
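and put the X.509 key and certificate in place (hostcert.pem and hostkey.pem must be copied into the new directory before their mode and ownership are changed):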
mkdir -p /etc/grid-security/daemon
chmod 600 /etc/grid-security/daemon/hostcert.pem
chown daemon:root /etc/grid-security/daemon/hostcert.pem
chmod 600 /etc/grid-security/daemon/hostkey.pem
chown daemon:root /etc/grid-security/daemon/hostkey.pem
Edit the standard SysV init script configuration (here muon.grid.kiae.ru is the name of the MGM machine and we are running a single-head configuration):
cat << 'EOF' > /etc/sysconfig/eos
XRD_ROLES="mq sync mgm"
export EOS_MGM_ALIAS="muon.grid.kiae.ru"
export EOS_MGM_MASTER1="${EOS_MGM_ALIAS}"
export EOS_MGM_MASTER2="${EOS_MGM_ALIAS}"
export EOS_BROKER_URL="root://localhost:1097//eos"
export EOS_INSTANCE_NAME=eosalice
EOF
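When a file containing `${...}` references is written from a here-document, the quoting of the delimiter decides whether the shell expands them while writing. A minimal sketch of the difference, with `EOS_MGM_ALIAS` deliberately empty in the writing shell:

```shell
# In the shell that writes the file the variable is empty (or unset).
EOS_MGM_ALIAS=""

# Unquoted delimiter: ${EOS_MGM_ALIAS} is expanded while the text is written.
expanded=$(cat << EOF
export EOS_MGM_MASTER1="${EOS_MGM_ALIAS}"
EOF
)

# Quoted delimiter: the text is written literally and is expanded later,
# when the init script sources the file.
literal=$(cat << 'EOF'
export EOS_MGM_MASTER1="${EOS_MGM_ALIAS}"
EOF
)

printf '%s\n%s\n' "$expanded" "$literal"
```

The same applies to yum variables such as `$releasever` and `$basearch` in repo files, which yum itself must see unexpanded.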
Create/edit MGM configuration file:
cat << EOF > /etc/xrd.cf.mgm
###########################################################
xrootd.fslib libXrdEosMgm.so
xrootd.seclib libXrdSec.so
xrootd.async off nosf
xrootd.chksum adler32
###########################################################
xrd.sched mint 8 maxt 256 idle 64
###########################################################
all.export /
all.role manager
###########################################################
oss.fdlimit 16384 32768
###########################################################
# UNIX authentication
sec.protocol unix
# SSS authentication
sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
# GSI authentication
sec.protocol gsi -crl:0 -cert:/etc/grid-security/daemon/hostcert.pem -key:/etc/grid-security/daemon/hostkey.pem -gridmap:/etc/grid-security/grid-mapfile -d:0 -gmapopt:2 -vomsat:1 -moninfo:1
###########################################################
sec.protbind localhost.localdomain unix sss
sec.protbind localhost unix sss
sec.protbind * only gsi sss unix
###########################################################
mgmofs.fs /
mgmofs.targetport 1095
mgmofs.authlib /usr/lib64/libXrdAliceTokenAcc.so
mgmofs.authorize 1
alicetokenacc.noauthzhost localhost
alicetokenacc.noauthzhost localhost.localdomain
alicetokenacc.truncateprefix /eos/alice/grid
###########################################################
#mgmofs.trace all debug
# this URL will be overwritten by EOS_BROKER_URL defined in /etc/sysconfig/eos
mgmofs.broker root://localhost:1097//eos/
# this name will be overwritten by EOS_INSTANCE_NAME defined in /etc/sysconfig/eos
mgmofs.instance eosdev
# configuration, namespace, transfer and authentication export directory
mgmofs.configdir /var/eos/config
mgmofs.metalog /var/eos/md
mgmofs.txdir /var/eos/tx
mgmofs.authdir /var/eos/auth
mgmofs.archivedir /var/eos/archive
# report store path
mgmofs.reportstorepath /var/eos/report
# this defines the default config to load
mgmofs.autoloadconfig default
# this enables that every change gets immediately stored to the active
# configuration - can be overwritten by EOS_AUTOSAVE_CONFIG defined in
# /etc/sysconfig/eos
mgmofs.autosaveconfig true
# this has to be defined if we have a failover configuration via alias -
# can be overwritten by EOS_MGM_ALIAS in /etc/sysconfig/eos
#mgmofs.alias eosdev.cern.ch
###########################################################
# Set the FST gateway host and port
mgmofs.fstgw someproxy.cern.ch:3001
###########################################################
EOF
Create mapfile for X.509 certificates of clients:
cat << EOF > /etc/grid-security/grid-mapfile
"/C=RU/O=RDIG/OU=users/OU=spbu.ru/CN=Andrey Zarochentsev" eosuser
"/C=RU/O=RDIG/OU=users/OU=pnpi.nw.ru/CN=Andrey Kiryanov" eosuser
"/C=RU/O=RDIG/OU=users/OU=grid.kiae.ru/CN=Eygene A. Ryabinkin" eosuser
"/C=RU/O=RDIG/OU=users/OU=grid.kiae.ru/CN=Igor Tkachenko" eosuser
EOF
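Each grid-mapfile entry is a double-quoted DN followed by a local user name, and a missing quote is easy to overlook. A minimal sketch of a format check; `check_gridmap` is a hypothetical helper, and it only accepts the simple `"DN" user` form used here:

```shell
# Print the line numbers of entries that are not of the form: "/DN..." user
# Comments and blank lines are skipped. Exit status is non-zero if any
# malformed entry is found.
check_gridmap() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        !/^"\/[^"]+"[ \t]+[^ \t]+[ \t]*$/ { print NR; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Example:
#   check_gridmap /etc/grid-security/grid-mapfile || echo "malformed entries found"
```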
and create the local Unix user(s) onto which external certificates are mapped:
groupadd -g 2016 eosuser
useradd -u 2016 -g 2016 eosuser
(Re)start EOS:
service eos restart
Turn on sss and gsi authentication:
eos -b vid enable sss
eos -b vid enable gsi
Create pool groups:
for i in $(seq 1 4); do eos -b group set default.$i on; done
The total number of groups must be greater than the number of filesystems holding EOS data on any single FST. Groups are used to avoid putting replicas of a file onto different filesystems of the same server; otherwise a single server outage would make more than one replica unusable.
Files are replicated within a single group, so replicas end up on different servers as long as the EOS data filesystems of each FST are placed in different groups.
New groups can be added later, so the initial number of groups can be chosen from the characteristics of the existing (or soon to be added) FST machines.
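The sizing rule can be turned into a small calculation. The sketch below takes the number of EOS data filesystems on each FST and, following the "greater than" wording above, adds one to the largest count; `groups_needed` is a hypothetical helper and the counts in the example are made up.

```shell
# Print the number of groups to create: one more than the largest
# per-FST filesystem count passed as arguments.
groups_needed() {
    max=0
    for n in "$@"; do
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    echo $((max + 1))
}

# Dry run for three FSTs with 4, 2 and 3 data filesystems (made-up numbers):
# print the commands instead of executing them.
for i in $(seq 1 "$(groups_needed 4 2 3)"); do
    echo "eos -b group set default.$i on"
done
```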
Create default space:
eos -b space define default
eos -b space set default on
Create the directory for the federated cloud and set its ownership:
eos -b mkdir /eosfedcloud
eos -b chown eosuser:eosuser /eosfedcloud
Typical problems
Can't write anything, but have attached FSTs
If "space ls" shows a non-zero "sum(capacity)" but "capacity(rw)" is zero,
EOS Console [root://localhost] |/> space ls
#------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# type # name # groupsize # groupmod #N(fs) #N(fs-rw) #sum(usedbytes) #sum(capacity) #capacity(rw) #nom.capacity #quota #balancing # threshold # converter # ntx # active #intergroup
#------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
spaceview default 0 24 6 0 209.24 M 44.54 T 0 0 off off 20 off 2 0 off
we usually just need to enable the default space (or whichever space has this problem) again:
eos -b space set default on
Starting with EOS 4.1, geotags are mandatory: EOS will refuse to write to an FST with an empty geotag.
Remarks
Debug setting
eos debug notice
Replica setting
Global setting:
eos space config default space.geobalancer=on
eos space config default space.geobalancer.ntx=10
eos space config default space.geobalancer.threshold=5
And for single-replica files (so that files are stored according to their geotag):
eos space config default space.geo.access.policy.write.exact=on
Show options:
~ ] eos space status default
# ------------------------------------------------------------------------------------
# Space Variables
# ....................................................................................
balancer := off
balancer.node.ntx := 2
balancer.node.rate := 25
balancer.threshold := 20
converter := off
converter.ntx := 2
drainer.node.ntx := 2
drainer.node.rate := 25
drainperiod := 86400
geo.access.policy.write.exact := on
geobalancer := on
geobalancer.ntx := 10
geobalancer.threshold := 5
geotagbalancer := off
geotagbalancer.ntx := 10
geotagbalancer.threshold := 5
graceperiod := 86400
groupbalancer := off
groupbalancer.ntx := 10
groupbalancer.threshold := 5
groupmod := 24
groupsize := 0
quota := off
scaninterval := 604800
IP list setting:
eos vid set geotag 85.143 MEPHI
Check IP list:
~] eos vid ls
geotag:"85.143" => "MEPHI"
gsi:"<pwd>":gid => root
gsi:"<pwd>":uid => root
sss:"<pwd>":gid => root
sss:"<pwd>":uid => root
sudoer => uids()
Directory settings
Set the standard replica settings:
eos attr -r set default=replica eos/fedcloud/zar/rep2
Check the replica settings of the directory:
~] eos attr ls eos/fedcloud/zar/rep2
sys.forced.blockchecksum="crc32c"
sys.forced.blocksize="4k"
sys.forced.checksum="adler"
sys.forced.layout="replica"
sys.forced.nstripes="2"
sys.forced.space="default"
Set the replica settings for a single copy (to store data by geotag without replication):
eos attr ls eos/fedcloud/zar/rep1
eos attr set sys.forced.nstripes="1" eos/fedcloud/zar/rep1
--
EygeneRyabinkin - 2016-03-15