EOL Warning: Support for this software ends in summer 2024. Consider migrating to dCache (EGI provides help with migration until summer 2023).
DPM DOME Upgrade Guide
Instructions for upgrading your DPM and activating DOME.
For this you will need to upgrade to the latest version of DPM (1.10.0 as a minimum, but the latest stable version is always preferred).
Overview
A DPM upgrade proceeds through the following phases, each of which represents a fully working system that can be maintained indefinitely without proceeding to the next phase.
- 1 "Legacy flavour" DPM - upgrade the rpms. Behaves like a minor upgrade from your current version.
- 2 "Dome flavour" DPM - configuration and activation of Dome, creating Quota Tokens and enabling space reporting
- 3 "Legacy-free Dome flavour" DPM. Remove the legacy software stack (SRM, rfio, dpm, dpns etc).
1 Upgrade your DPM "Legacy Flavour"
If you have any old metapackages, we recommend that you remove them before proceeding.
If you are upgrading from 1.8.11, you should have no metapackages (they were removed for this release).
Install the new metapackage, thus upgrading the rpms:
yum install dmlite-dpmhead
(on the head node) or
yum install dmlite-dpmdisk
(on a disk node).
We recommend the latest version of the metapackages available from EPEL.
This behaves like a minor upgrade from your current version. Dome will be installed but will remain dormant.
Note - there are two other metapackages,
dmlite-dpmhead-domeonly
and
dmlite-dpmdisk-domeonly
which do not install the legacy components. They should not be used for an upgrade.
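As an illustrative check (not part of the official procedure), you can list the installed dmlite and dpm packages before and after the upgrade to confirm which metapackage and version you ended up with:
# list installed dmlite/dpm packages (illustrative check)
rpm -qa | grep -E '^(dmlite|dpm)' | sort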
2 Upgrade to DPM "Dome Flavour"
No new packages need to be installed (see the section above); this is a configuration action.
Note that DOME does not support the filesystem weights which were available in the dpm daemon.
Steps 1 and 2 below are preparatory - the system will still work as before. The main upgrade is Step 3. Step 4 can be carried out at any time.
Step 1 - Configure the dormant Dome
This pre-configures Dome but does not activate it. It will not affect the behaviour of the system until Dome is activated.
Puppet
Upgrade your Puppet modules.
From this version of DPM you can install the modules directly from EPEL, via a new package:
dmlite-puppet-dpm
This package installs the needed modules under /usr/share/dmlite/puppet/modules/
In order to use those modules in a masterless setup, run:
puppet apply --modulepath /usr/share/dmlite/puppet/modules <your manifest>.pp
In the case of a Puppet infrastructure, the installation needs to be performed on the Puppet Master.
You can still also use Puppet Forge to download the modules (see the sketch below).
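A minimal sketch of the Puppet Forge route; the module name lcgdm-dpm is an assumption, so check the Forge listing for the exact name and a version matching your dmlite release:
# module name assumed to be lcgdm-dpm - verify on Puppet Forge before use
puppet module install lcgdm-dpm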
From a Puppet perspective, the following changes need to be made to a Puppet manifest to configure Dome
Using puppet-dpm
Head + disk node
class{"dpm::head_disknode":
...
configure_dome => true,
host_dn => 'your headnode host cert DN',
...
}
Head node
class{"dpm::headnode":
...
configure_dome => true,
host_dn => 'your headnode host cert DN',
...
}
Disk node
class{"dpm::disknode":
...
configure_dome => true,
host_dn => 'your disknode host cert DN',
...
}
Please note that the token_password parameter MUST now be a string with more than 32 characters.
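One illustrative way to generate a random token_password longer than 32 characters (any method that yields such a string is fine):
# produces a 64-character base64 string, comfortably over the 32-character minimum
openssl rand -base64 48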
Manual steps
The puppet configuration above corresponds to the manual actions described below.
Please see the note in the installation guide about the shared secret and check the consistency of your setup -
https://twiki.cern.ch/twiki/bin/view/DPM/DpmSetupManualInstallation#The_shared_secret
Obligatory Manual Step - DB upgrade
On the head node:
/bin/sh /usr/share/dmlite/dbscripts/upgrade/DPM_upgrade_mysql <dbhost> <dbuser> <dbpass>
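For example, with hypothetical database settings (substitute your real DB host, user and password):
# hypothetical values - replace with your own DB settings
/bin/sh /usr/share/dmlite/dbscripts/upgrade/DPM_upgrade_mysql localhost dpmdbuser 'my-db-password'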
Enabling and configuring Dome on head and disk servers
Head Node
For the configuration, the file
/etc/domehead.conf
has to be added and configured as follows
glb.debug: 1
glb.role: head
glb.auth.urlprefix: /domehead/
glb.task.maxrunningtime: 3600
glb.task.purgetime: 3600
head.dirspacereportdepth: 6
head.put.minfreespace_mb: 1
glb.restclient.cli_certificate: /etc/grid-security/dpmmgr/dpmcert.pem
glb.restclient.cli_private_key: /etc/grid-security/dpmmgr/dpmkey.pem
glb.restclient.xrdhttpkey: <token password parameter>
head.filepulls.maxtotal: 1000
head.filepulls.maxpernode: 40
head.checksum.maxtotal: 1000
head.checksum.maxpernode: 40
head.filepuller.stathook: /usr/share/dmlite/filepull/externalstat_example.sh
head.db.host: <DBHOST>
head.db.user: <DBUSER>
head.db.password: <DBPASS>
head.db.port: <DBPORT>
head.db.poolsz: 128
head.db.cnsdbname: cns_db
head.db.dpmdbname: dpm_db
Please modify the Xrootd redirector conf file
/etc/xrootd/xrootd-dpmredir.cfg
to add this section within the `if exec xrootd` part:
xrd.protocol XrdHttp /usr/lib64/libXrdHttp-4.so
http.exthandler dome /usr/lib64/libdome.so /etc/domehead.conf
http.selfhttps2http yes
http.cert /etc/grid-security/dpmmgr/dpmcert.pem
http.key /etc/grid-security/dpmmgr/dpmkey.pem
http.cadir /etc/grid-security/certificates
http.secretkey <DOME key corresponding to glb.restclient.xrdhttpkey from /etc/domehead.conf>
The xrootd service has to be restarted afterwards:
# SL6: service xrootd restart
# C7: systemctl restart xrootd@dpmredir
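A quick sanity check after the restart (a sketch for C7/systemd; the unit name and port follow the configuration above):
# confirm the redirector came back up
systemctl status xrootd@dpmredir
# confirm it is listening on port 1094, where Dome is reachable under /domehead
ss -tlnp | grep 1094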
If the head node is also a disk node, please follow the Disk Node instructions as well.
Disk Node
Then the file
/etc/domedisk.conf
has to be added and configured as follows:
glb.debug: 1
glb.role: disk
glb.auth.urlprefix: /domedisk/
glb.task.maxrunningtime: 3600
glb.task.purgetime: 3600
glb.restclient.cli_certificate: /etc/grid-security/dpmmgr/dpmcert.pem
glb.restclient.cli_private_key: /etc/grid-security/dpmmgr/dpmkey.pem
glb.restclient.xrdhttpkey: <your token password>
disk.headnode.domeurl: http://<your headnode FQDN>:1094/domehead
disk.filepuller.pullhook: /usr/share/dmlite/filepull/externalpull_example.sh
Please modify the Xrootd disk conf file
/etc/xrootd/xrootd-dpmdisk.cfg
adding the following inside the `if exec xrootd` section:
xrd.protocol XrdHttp /usr/lib64/libXrdHttp-4.so
http.exthandler dome /usr/lib64/libdome.so /etc/domedisk.conf
http.selfhttps2http yes
http.cert /etc/grid-security/dpmmgr/dpmcert.pem
http.key /etc/grid-security/dpmmgr/dpmkey.pem
http.cadir /etc/grid-security/certificates
http.secretkey <DOME key corresponding to glb.restclient.xrdhttpkey from /etc/domedisk.conf>
The xrootd service has to be restarted afterwards:
# SL6: service xrootd restart
# C7: systemctl restart xrootd@dpmdisk
Step 2 - Manual configuration of Quota Tokens
Quota Tokens should be configured before activating Dome. This can be done with Dome dormant. The QTs will have no effect until Dome is activated.
Convert your existing Space Tokens (as listed by dpm-listspaces) into Quota Tokens:
dmlite-shell -e 'quotatokenmod <Token ID> path /dpm/cern.ch/home/atlas/ATLASDATADISK groups root,<comma_separated_list_of_existing_groups>'
If you have no existing space tokens, you can create new quota tokens in the following way.
dmlite-shell -e 'quotatokenset /dpm/cern.ch/home/atlas pool <value> desc <value> size <value> groups <value>'
NOTE THE FOLLOWING when configuring quota tokens for the first time (a combined example follows this list):
- By default DOME does not allow writing to quota tokens with less than 4GB of free space, therefore when creating a new Quota Token make sure to give it a size of at least 4GB.
- Unlike the dpm daemon, Dome requires an explicit allocation of space wherever writing should be supported. This is done by allocating a QT at or above the path in question. You may need to create new QTs at top-level user directories or at the top of the namespace.
- The DPM daemon has been adapted so that tokenless writes using SRM will actually be assigned to the correct QT. If you support a VO which currently writes using SRM but without a Space Token (e.g. CMS), you should create a QT of the appropriate size (larger than the currently occupied space in the pool) and allocate this to the experiment area.
- A QT has basic group-level access control. Make sure the relevant group can write to each QT (check with quotatokenget) and update if necessary (quotatokenmod <id> groups <VO>).
- Dome does not support many-to-one mappings between directories and QTs. This means that in situations where a VO writes into multiple directories with the same SRM space token, you have a couple of alternative solutions:
- Create separate QTs, one for each directory
- Allocate a single QT higher up the hierarchy, to apply to all directories below which do not have their own QT
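A minimal sketch of the whole sequence, with hypothetical pool, path and size values (the commands are the dmlite-shell ones referenced above; check the dmlite-shell built-in help for the exact argument forms):
# create a quota token on a hypothetical pool/path (values are examples only)
dmlite-shell -e 'quotatokenset /dpm/cern.ch/home/dteam pool pool01 desc DTEAMDISK size 5TB groups dteam'
# inspect the token and its group-level access control
dmlite-shell -e 'quotatokenget /dpm/cern.ch/home/dteam'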
Step 3 - Enable Dome by configuring the Dome adapter
This operation should be done in a scheduled downtime: for the system to function properly, all nodes must have Dome enabled.
Obligatory Manual Step - Priming counters
This should be done on the head node, immediately before activating Dome, but after you have defined all quota tokens. Depending on your database size and underlying hardware this can take significant time (even hours for petabyte-scale storage). Be aware that the script used to update / synchronize the DB counters must be
executed while DPM is not running and only the DPM database is online.
- stop DPM services on headnode with init scripts (e.g. SL6)
service httpd stop; service rfiod stop; service srmv2.2 stop; service dpnsdaemon stop; service dpm stop; service dpm-gsiftp stop; service xrootd stop
- stop DPM services on headnode with systemd (e.g. CentOS7)
systemctl stop httpd rfiod srmv2.2 dpnsdaemon dpm dpm-gsiftp xrootd@dpmredir
- fix spacetoken data and directory size counters up to the level defined in /etc/domehead.conf (default: 6 levels):
dmlite-mysql-dirspaces.py --log-level=INFO --log-file=/var/log/dpm-dirspaces.log --updatedb
Starting with dmlite version 1.13.2 it is possible to apply these DPM DB updates with the same command while
DPM is online - stopping the DPM services is no longer mandatory:
dmlite-mysql-dirspaces.py --log-level=INFO --log-file=/var/log/dpm-dirspaces.log --updatedb
Unfortunately, before you fully migrate to Dome, ongoing transfers do not always update the space occupancy counters correctly (e.g. when no spacetoken is defined for a given transfer request), so with the online migration it will be necessary to execute the same command once more after Dome has been enabled. To avoid a second run of
dmlite-mysql-dirspaces.py
after the Dome migration, you can still follow the procedure from the first paragraph with
DPM offline, which avoids problematic updates from ongoing transfers.
To check directory size consistency and to estimate the time necessary for the real updates, it is possible to run
dmlite-mysql-dirspaces.py
in dry-run mode (no updates):
dmlite-mysql-dirspaces.py --log-level=INFO --log-file=/var/log/dpm-dirspaces.log
Puppet
Upgrade your puppet templates.
From a Puppet perspective, the following changes need to be made to a Puppet manifest to activate Dome
puppet-dpm
Head + disk node
class{"dpm::head_disknode":
...
configure_domeadapter => true,
...
}
Head node
class{"dpm::headnode":
...
configure_domeadapter => true,
...
}
Disk node
class{"dpm::disknode":
...
configure_domeadapter => true,
...
}
Manual steps
The puppet configuration above corresponds to the manual actions described below.
Connecting the whole system to Dome via dmlite (using the domeadapter).
Head node
Configure
/etc/dmlite.conf.d/domeadapter.conf
LoadPlugin plugin_domeadapter_io /usr/lib64/dmlite/plugin_domeadapter.so
LoadPlugin plugin_domeadapter_pools /usr/lib64/dmlite/plugin_domeadapter.so
LoadPlugin plugin_domeadapter_headcatalog /usr/lib64/dmlite/plugin_domeadapter.so
DavixCAPath /etc/grid-security/certificates
DavixCertPath /etc/grid-security/dpmmgr/dpmcert.pem
DavixPrivateKeyPath /etc/grid-security/dpmmgr/dpmkey.pem
DomeHead http://<your headnode FQDN>:1094/domehead
# Token generation
TokenPassword <your token password>
TokenId ip
TokenLife 1000
ThisDomeAdapterDN <your headnode DN>
Zero the files /etc/dmlite.conf.d/adapter.conf, /etc/dmlite.conf.d/mysql.conf and /etc/dmlite.conf.d/zmemcache.conf.
You should zero them rather than removing them to prevent them from being reinstated by a subsequent yum update.
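A minimal sketch of zeroing (truncating) those files on the head node:
# truncate the legacy configuration fragments instead of deleting them
truncate -s 0 /etc/dmlite.conf.d/adapter.conf
truncate -s 0 /etc/dmlite.conf.d/mysql.conf
truncate -s 0 /etc/dmlite.conf.d/zmemcache.conf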
Disk node
Configure
/etc/dmlite.conf.d/domeadapter.conf
and
/etc/dmlite-disk.conf.d/domeadapter.conf
as follows:
LoadPlugin plugin_domeadapter_diskcatalog /usr/lib64/dmlite/plugin_domeadapter.so
LoadPlugin plugin_domeadapter_io /usr/lib64/dmlite/plugin_domeadapter.so
LoadPlugin plugin_domeadapter_pools /usr/lib64/dmlite/plugin_domeadapter.so
DavixCAPath /etc/grid-security/certificates
DavixCertPath /etc/grid-security/dpmmgr/dpmcert.pem
DavixPrivateKeyPath /etc/grid-security/dpmmgr/dpmkey.pem
DomeDisk http://<your disknode FQDN>:1095/domedisk
DomeHead http://<your headnode FQDN>:1094/domehead
# Token generation
TokenPassword <your token password>
TokenId ip
TokenLife 1000
ThisDomeAdapterDN <your disk node DN>
Zero the files /etc/dmlite.conf.d/adapter.conf and /etc/dmlite-disk.conf.d/adapter.conf.
You should zero them rather than removing them to prevent them from being reinstated by a subsequent yum update.
Restart the frontends (xrootd, httpd, gsiftp).
Enabling dome checksums
Configure the startup options in /etc/sysconfig/dpm-gsiftp:
OPTIONS="-S -p 2811 -auth-level 0 -dsi dmlite:dome_checksum -disable-usage-stats"
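After editing the options, restart dpm-gsiftp so the dmlite dome_checksum DSI is picked up; a sketch for both init systems:
# SL6
service dpm-gsiftp restart
# C7
systemctl restart dpm-gsiftp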
Step 4 - Turn on gridftp redirection
For efficient SRM-free operation, gridftp redirection must be enabled.
Please note that if your installation consists of a single server, with the head node and disk node installed together, you do not have to enable gridftp redirection.
While this step can in principle be performed before enabling Dome, in practice we have seen problems at sites running a legacy DPM with gridftp redirection. We therefore recommend that this step be performed only after enabling Dome.
Note the following:
- With Dome active and gridftp redirection enabled, clients can move from srm to gridftp transfers and space will be accounted correctly
- This does not involve turning off SRM, just encouraging clients to use gsiftp://... URLs instead of srm:// URLs.
- Gridftp redirection does not permit the head node to also run as a disk server if your instance has other disk nodes installed as well. If this is the case, please contact support before proceeding.
- For technical reasons the VOMS files installed / configured on all disk nodes must be the same (for metadata operations like `ls` the gridftp client is redirected to a random disk node).
Detecting and dealing with older gridftp clients
Older gridftp clients such as uberftp and lcg-cp do not support redirection (and globus-url-copy must be called with the "-dp" argument to use it), and thus they are assigned a random disk server for reading. As this is likely not to be where the data is, this will result in a tunnelled read. If you experience high load on your system, this is a likely cause.
From version 1.13.0, once you have enabled redirection, you can check
/var/log/dpm-gsiftp/gridftp.log
to trace such clients, which will produce a line such as
requesting node. DN: '/DC=...' lfn: NULL
where the critical pointer is lfn: NULL.
You should be able to use this DN information to contact the user or community in question. This guide has some information on moving to newer clients -
https://twiki.cern.ch/twiki/bin/view/DPM/DpmSrmClients
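A quick way to spot such clients, assuming the log location above:
# list requests dispatched without an LFN, i.e. from non-redirecting clients
grep 'lfn: NULL' /var/log/dpm-gsiftp/gridftp.log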
Puppet
Using puppet-dpm
Gridftp redirection must be enabled on both the head node and the disk nodes at the same time.
Head node
class{"dpm::headnode":
...
gridftp_redirect => true,
...
}
Disk node
class{"dpm::disknode":
...
gridftp_redirect => true,
...
}
Manual
Head Node
Edit
/etc/gridftp.conf
to include
remote_nodes disk01.domain.org:2811,disk02.domain.org:2811,...
epsv_ip 1
and restart dpm-gsiftp
Edit
/etc/shift.conf
to include
DPM FTPHEAD head.domain.org
and restart both srmv2.2 and the dpm daemon.
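For reference, a sketch of the corresponding restarts on a C7 head node (unit names as used earlier in this guide; use the equivalent service commands on SL6):
# apply the gridftp redirection settings on the head node
systemctl restart dpm-gsiftp
systemctl restart srmv2.2 dpm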
Disk Nodes
Edit
/etc/gridftp.conf
to include
data_node 1
and restart dpm-gsiftp.
Things to check
Space reporting.
Check that space reporting via WebDAV is working.
$ curl -L --capath /etc/grid-security/certificates/ --cert /tmp/x509up_u<uid> --cacert /tmp/x509up_u<uid> --request PROPFIND -H "Depth: 0" -d @- https://domehead-trunk.cern.ch/dpm/cern.ch/home/atlas/ATLASTOKEN01 <<EOF
<?xml version="1.0" ?>
<D:propfind xmlns:D="DAV:">
<D:prop>
<D:quota-used-bytes/>
<D:quota-available-bytes/>
</D:prop>
</D:propfind>
EOF
Space reporting and Quota Token consistency
The space reporting via DAV above may give slightly different results to the equivalent via SRM. For example, the SRM equivalent to the above curl command is
gfal-xattr srm://headnode.domain.org:8446/srm/managerv2?SFN=/dpm/cern.ch/home/atlas spacetoken.description?ATLASTOKEN01
The numbers will align to the extent that the VO was respecting its policy of always writing a particular Space Token to a certain namespace directory. A discrepancy between the two numbers, if not large, probably represents some historical writes outside the associated directory. More important is to check whether the numbers diverge over time - if they do, get in touch with DPM support for deeper investigation.
dpm-tester
This utility can be used to validate DPM DOME functionality for all supported protocols. This utility is available in the package dmlite-dpm-tester and the basic tests are executed via
dpm-tester.py --host "yourheadnode"
Run these tests from a remote node with a valid
dteam
X509 certificate proxy to ensure results consistent with normal clients. It is possible to use this tool also with an X509 proxy for any other VO supported by your storage, but you have to pass additional command line options to
dpm-tester.py
. Also be aware that space accounting is by default done only for the first 6 directory levels, which is why you must use the top VO directory if you manually specify the test base path:
dpm-tester.py --host dpmhead-trunk.cern.ch --path /dpm/cern.ch/home/atlas
Be aware that gridftp redirection is not enabled by default (at least with DPM DOME 1.13.2) and this leads to suboptimal behaviour of pure
GridFTP transfers (or even test failures). You can select only the tests for protocols that you plan to use in production, e.g.
dpm-tester.py --host dpmhead-trunk.cern.ch --path /dpm/cern.ch/home/atlas --tests davs root
3 Upgrade to DPM "Legacy-free Dome Flavour"
Once SRM load has reduced to zero, the full legacy stack can be removed from the system.
Puppet
Head
The legacy stack packages have to be removed manually (a yum remove sketch follows this list):
- dpm-devel
- dpm
- dpm-python
- dpm-rfio-server
- dpm-server-mysql
- dpm-srm-server-mysql
- dpm-perl
- dpm-name-server-mysql
- dmlite-plugins-adapter
- dmlite-plugins-mysql
- dpm-copy-server-mysql
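A minimal sketch of the removal on the head node (the disk node list further below is shorter):
# remove the legacy DPM stack from the head node
yum remove dpm-devel dpm dpm-python dpm-rfio-server dpm-server-mysql \
    dpm-srm-server-mysql dpm-perl dpm-name-server-mysql \
    dmlite-plugins-adapter dmlite-plugins-mysql dpm-copy-server-mysql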
Afterwards the puppet headnode manifest can be modified this way:
class{"dpm::headnode":
...
configure_legacy => false,
}
Disk
The following packages have to be removed manually:
- dpm-devel
- dpm
- dpm-python
- dpm-rfio-server
- dpm-perl
- dmlite-plugins-adapter
Afterwards the puppet disknode manifest can be modified this way:
class{"dpm::disknode":
...
configure_legacy => false,
}
Manual
TO DO
--
AndreaManzi - 2018-03-08