DPM DOME Upgrade Guide

Instructions for upgrading your DPM and activating DOME.

For this you will need to upgrade to the latest version of DPM (1.10.0 as a minimum, but the latest stable version is always preferred).

Overview

A DPM upgrade proceeds in the following phases, each of which results in a fully working system that can be maintained indefinitely without proceeding to the next phase.

  • 1 "Legacy flavour" DPM - upgrade the rpms. Behaves like a minor upgrade from your current version.
  • 2 "Dome flavour" DPM - configuration and activation of Dome, creating Quota Tokens and enabling space reporting
  • 3 "Legacy-free Dome flavour" DPM. Remove the legacy software stack (SRM, rfio, dpm, dpns etc).

1 Upgrade your DPM "Legacy Flavour"

If you have any old metapackages, we recommend that you remove them before proceeding. If you are upgrading from 1.8.11, you should have no metapackages (they were removed for this release).

Install the new metapackages to upgrade the rpms: yum install dmlite-dpmhead (head node) or yum install dmlite-dpmdisk (disk node).

We recommend the latest version of the metapackages available from EPEL.

This behaves like a minor upgrade from your current version. Dome will be installed but will remain dormant.

Note - there are two other metapackages, dmlite-dpmhead-domeonly and dmlite-dpmdisk-domeonly which do not install the legacy components. They should not be used for an upgrade.
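
For reference, a minimal upgrade sequence could look like this (a sketch; it assumes the EPEL repository is already enabled on the node):

# head node
yum clean all && yum install dmlite-dpmhead

# each disk node
yum clean all && yum install dmlite-dpmdisk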

2 Upgrade to DPM "Dome Flavour"

No new packages need to be installed (see the section above); this is purely a configuration action.

Note that DOME does not support the filesystem weights which were available in the dpm daemon.

Steps 1 and 2 below are preparatory - the system will still work as before. The main upgrade is Step 3. Step 4 can be carried out at any time.

Step 1 - Configure the dormant Dome

This pre-configures Dome but does not activate it. It will not affect the behaviour of the system until Dome is activated.

Puppet

Upgrade your puppet modules.

From this version of DPM you can install the modules directly from EPEL, with a new package:

dmlite-puppet-dpm

This package installs the needed modules under /usr/share/dmlite/puppet/modules/.

In order to use those modules in a masterless setup, run:

puppet apply --modulepath /usr/share/dmlite/puppet/modules <your manifest>.pp

In a Puppet master/agent infrastructure, the package needs to be installed on the Puppet Master instead.

You can still also use Puppet Forge to download your modules, in particular:

  • lcgdm-dpm >= 0.6.0

From a Puppet perspective, the following changes need to be made to your manifest to configure Dome:

Using puppet-dpm

Head + disk node


  class{"dpm::head_disknode":
    ...
    configure_dome    => true,
    host_dn                 => 'your headnode host cert DN',
    ..
  }

Head node


  class{"dpm::headnode":
    ...
    configure_dome    => true,
    host_dn                 => 'your headnode host cert DN',
    ...
  }


Disk node

  class{"dpm::disknode":
   ...
    configure_dome    => true,
    host_dn                 => 'your disknode host cert DN',
    ..
  }

Please note that the token_password parameter MUST now be a string of more than 32 characters.
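
The value can be any sufficiently long random string; for example (a sketch, any method producing a string longer than 32 characters is fine):

openssl rand -base64 48   # produces a 64-character random string suitable for token_password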

Manual steps

The puppet configuration above corresponds to the manual actions described below.

Please see the note in the installation guide about the shared secret and check the consistency of your setup - https://twiki.cern.ch/twiki/bin/view/DPM/DpmSetupManualInstallation#The_shared_secret

Obligatory Manual Step - DB upgrade

On the head node:

/bin/sh /usr/share/dmlite/dbscripts/upgrade/DPM_upgrade_mysql <dbhost> <dbuser> <dbpass>
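
Before running the upgrade script you may want to back up the two DPM databases; a minimal sketch (the cns_db and dpm_db names come from the head node configuration shown later in this guide, host and credentials are placeholders):

# optional: dump both DPM databases before the schema upgrade
mysqldump -h <dbhost> -u <dbuser> -p cns_db > /root/cns_db_backup.sql
mysqldump -h <dbhost> -u <dbuser> -p dpm_db > /root/dpm_db_backup.sql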

Enabling and configuring Dome on head and disk servers

Head Node

For the configuration, the file /etc/domehead.conf has to be created and configured as follows:

glb.debug: 1
glb.role: head

glb.auth.urlprefix: /domehead/

glb.task.maxrunningtime: 3600
glb.task.purgetime: 3600
head.dirspacereportdepth: 6
head.put.minfreespace_mb: 1

glb.restclient.cli_certificate: /etc/grid-security/dpmmgr/dpmcert.pem
glb.restclient.cli_private_key: /etc/grid-security/dpmmgr/dpmkey.pem
glb.restclient.xrdhttpkey: <token password parameter>

head.filepulls.maxtotal: 1000
head.filepulls.maxpernode: 40
head.checksum.maxtotal: 1000
head.checksum.maxpernode: 40
head.filepuller.stathook: /usr/share/dmlite/filepull/externalstat_example.sh

head.db.host: <DBHOST>
head.db.user: <DBUSER>
head.db.password: <DBPASS>
head.db.port: <DBPORT>
head.db.poolsz: 128
head.db.cnsdbname: cns_db
head.db.dpmdbname: dpm_db

Please modify the xrootd redirector configuration file /etc/xrootd/xrootd-dpmredir.cfg to add this section within the `if exec xrootd` part:

xrd.protocol XrdHttp /usr/lib64/libXrdHttp-4.so
http.exthandler dome /usr/lib64/libdome.so /etc/domehead.conf
http.selfhttps2http yes
http.cert /etc/grid-security/dpmmgr/dpmcert.pem
http.key /etc/grid-security/dpmmgr/dpmkey.pem
http.cadir /etc/grid-security/certificates
http.secretkey <DOME key corresponding to glb.restclient.xrdhttpkey from /etc/domehead.conf>

The xrootd service has to be restarted afterwards:

# SL6: service xrootd restart
# C7:  systemctl restart xrootd@dpmredir

If the head node is also a disk node, please follow the Disk Node instructions below as well.
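
A rough way to verify that the Dome handler has been loaded by the redirector (a sketch for CentOS 7; the log location may differ on your installation, and the analogous check on a disk node uses xrootd@dpmdisk and port 1095):

systemctl status xrootd@dpmredir                 # should be active (running)
ss -ltn | grep 1094                              # xrootd/XrdHttp should be listening on port 1094
tail -n 50 /var/log/xrootd/dpmredir/xrootd.log   # look for errors mentioning libdome.so or domehead.conf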


Disk Node

Then the file /etc/domedisk.conf has to be created and configured as follows:

glb.debug: 1
glb.role: disk

glb.auth.urlprefix: /domedisk/
glb.task.maxrunningtime: 3600
glb.task.purgetime: 3600

glb.restclient.cli_certificate: /etc/grid-security/dpmmgr/dpmcert.pem
glb.restclient.cli_private_key: /etc/grid-security/dpmmgr/dpmkey.pem
glb.restclient.xrdhttpkey: <your token password>

disk.headnode.domeurl: http://<your headnode FQDN>:1094/domehead
disk.filepuller.pullhook: /usr/share/dmlite/filepull/externalpull_example.sh

Please modify the xrootd disk configuration file /etc/xrootd/xrootd-dpmdisk.cfg, adding the following inside the `if exec xrootd` section:

xrd.protocol XrdHttp /usr/lib64/libXrdHttp-4.so
http.exthandler dome /usr/lib64/libdome.so /etc/domedisk.conf
http.selfhttps2http yes
http.cert /etc/grid-security/dpmmgr/dpmcert.pem
http.key /etc/grid-security/dpmmgr/dpmkey.pem
http.cadir /etc/grid-security/certificates
http.secretkey <DOME key corresponding to glb.restclient.xrdhttpkey from /etc/domedisk.conf>

The xrootd service has to be restarted afterwards:

# SL6: service xrootd restart
# C7:  systemctl restart xrootd@dpmdisk

Step 2 - Manual configuration of Quota Tokens

Quota Tokens should be configured before activating Dome. This can be done with Dome dormant. The QTs will have no effect until Dome is activated.

Convert your existing Space Tokens (as listed by dpm-listspaces) into Quota Tokens:

dmlite-shell -e 'quotatokenmod <Token ID> path /dpm/cern.ch/home/atlas/ATLASDATADISK groups root,<comma_separated_list_of_existing_groups>'

If you have no existing space tokens, you can create new quota tokens in the following way.

dmlite-shell -e 'quotatokenset /dpm/cern.ch/home/atlas pool <value> desc <value> size <value> groups <value>'
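
As a purely illustrative example, creating a token for a hypothetical atlas area and checking the result (the pool name, size and groups are placeholders; check `help quotatokenset` in dmlite-shell for the exact syntax accepted by your version):

dmlite-shell -e 'quotatokenset /dpm/cern.ch/home/atlas pool pool01 desc ATLASDATADISK size 100TB groups atlas'
dmlite-shell -e 'quotatokenget /dpm/cern.ch/home/atlas'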

NOTE THE FOLLOWING when configuring quota tokens for the first time:

  • By default DOME does not allow writing to quota tokens with less than 4GB of free space; therefore, when creating a new quota token, make sure to give it at least 4GB of space.
  • Unlike the dpm daemon, Dome requires an explicit allocation of space wherever writing should be supported. This is done by allocating a QT at or above the path in question. You may need to create new QTs at top level user directories or at the top of the namespace.
  • The DPM daemon has been adapted so that tokenless writes using SRM will actually be assigned to the correct QT. If you support a VO which currently writes using SRM but without a Space Token (e.g. CMS), you should create a QT of the appropriate size (larger than the currently occupied space in the pool) and allocate this to the experiment area.
  • A QT has basic group-level access control. Make sure the relevant group can write to each QT (check with quotatokenget) and update if necessary (quotatokenmod <id> groups <VO>).
  • Dome does not support many-to-one mappings between directories and QTs. This means that in situations where a VO writes into multiple directories with the same SRM space token, you have a couple of alternative solutions:
    • Create separate QTs, one for each directory
    • Allocate a single QT higher up the hierarchy, which then applies to all directories below it that do not have their own QT

Step 3 - Enable Dome by configuring the Dome adapter

This operation should be done during a scheduled downtime: for the system to function properly, all nodes must have Dome enabled.

Obligatory Manual Step - Priming counters

This should be done on the head node, immediately before activating Dome, but after you have defined all quota tokens. Depending on your database size and underlying hardware this can take significant time (even hours for petabyte-scale storage). Be aware that the script used to update/synchronize the DB counters must be executed while DPM is not running; only the DPM database must be online. The individual steps are listed below and combined into a single sketch after the list.

  • stop DPM services on headnode with init scripts (e.g. SL6)
    service httpd stop; service rfiod stop; service srmv2.2 stop; service dpnsdaemon stop; service dpm stop; service dpm-gsiftp stop; service xrootd stop
  • stop DPM services on headnode with systemd (e.g. CentOS7)
    systemctl stop httpd rfiod srmv2.2 dpnsdaemon dpm dpm-gsiftp xrootd@dpmredir
  • fix spacetoken data and directory size counters up to the level defined in /etc/domehead.conf (default: 6 levels)
    dmlite-mysql-dirspaces.py --log-level=INFO --log-file=/var/log/dpm-dirspaces.log --updatedb
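
Combined into a single sketch for CentOS 7 (adapt the service list, and use the SL6 init-script equivalents where appropriate):

# stop all DPM frontends on the head node; only the DPM database must stay online
systemctl stop httpd rfiod srmv2.2 dpnsdaemon dpm dpm-gsiftp xrootd@dpmredir

# prime the spacetoken and directory size counters (may take hours on large instances)
dmlite-mysql-dirspaces.py --log-level=INFO --log-file=/var/log/dpm-dirspaces.log --updatedb

# start the frontends again
systemctl start httpd rfiod srmv2.2 dpnsdaemon dpm dpm-gsiftp xrootd@dpmredir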

Starting with dmlite version 1.13.2 it is possible to apply these DPM DB updates with the same command while DPM is online; stopping the DPM services is no longer mandatory:

dmlite-mysql-dirspaces.py --log-level=INFO --log-file=/var/log/dpm-dirspaces.log --updatedb

Unfortunately, before you fully migrate to Dome, ongoing transfers do not always update the space occupancy counters correctly (e.g. when no space token is defined for a given transfer request). To fix these inconsistencies in an online migration, it is necessary to execute the same command once more after Dome has been enabled. To avoid this second run of dmlite-mysql-dirspaces.py, you can instead follow the offline procedure in the first paragraph, which excludes problematic updates from ongoing transfers.

To check directory size consistency and to estimate the time necessary for the real updates, you can run dmlite-mysql-dirspaces.py in dry-run mode (no updates):

dmlite-mysql-dirspaces.py --log-level=INFO --log-file=/var/log/dpm-dirspaces.log

Puppet

Upgrade your puppet templates.

From a Puppet perspective, the following changes need to be made to your manifest to activate Dome:

puppet-dpm

Head + disk node


  class{"dpm::head_disknode":
    ...
    configure_domeadapter => true,
   ...
  }

Head node

  class{"dpm::headnode":
    ...
    configure_domeadapter => true,
    ...
  }


Disk node


  class{"dpm::disknode":
   ...
    configure_domeadapter => true,
   ...
  }

Manual steps

The puppet configuration above corresponds to the manual actions described below.

Connecting the whole system to Dome via dmlite (using the domeadapter).

Head node

Configure /etc/dmlite.conf.d/domeadapter.conf

LoadPlugin plugin_domeadapter_io /usr/lib64/dmlite/plugin_domeadapter.so

LoadPlugin plugin_domeadapter_pools /usr/lib64/dmlite/plugin_domeadapter.so

LoadPlugin plugin_domeadapter_headcatalog /usr/lib64/dmlite/plugin_domeadapter.so


DavixCAPath  /etc/grid-security/certificates
DavixCertPath /etc/grid-security/dpmmgr/dpmcert.pem
DavixPrivateKeyPath /etc/grid-security/dpmmgr/dpmkey.pem

DomeHead http://<your headnode FQDN>:1094/domehead

# Token generation
TokenPassword <your token password>
TokenId ip
TokenLife 1000

ThisDomeAdapterDN <your headnode DN>

Zero the files /etc/dmlite.conf.d/adapter.conf, /etc/dmlite.conf.d/mysql.conf and /etc/dmlite.conf.d/zmemcache.conf. You should zero them rather than removing them, to prevent them from being reinstated by a subsequent yum update.
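
For example, by truncating the files in place (a sketch; the same approach applies to the disk node files listed in the next section):

# head node: empty the legacy dmlite configuration fragments without deleting them
: > /etc/dmlite.conf.d/adapter.conf
: > /etc/dmlite.conf.d/mysql.conf
: > /etc/dmlite.conf.d/zmemcache.conf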


Disk node

Configure /etc/dmlite.conf.d/domeadapter.conf and /etc/dmlite-disk.conf.d/domeadapter.conf.

LoadPlugin plugin_domeadapter_diskcatalog /usr/lib64/dmlite/plugin_domeadapter.so

LoadPlugin plugin_domeadapter_io /usr/lib64/dmlite/plugin_domeadapter.so

LoadPlugin plugin_domeadapter_pools /usr/lib64/dmlite/plugin_domeadapter.so

DavixCAPath  /etc/grid-security/certificates
DavixCertPath /etc/grid-security/dpmmgr/dpmcert.pem
DavixPrivateKeyPath /etc/grid-security/dpmmgr/dpmkey.pem

DomeDisk http://<your disknode FQDN>:1095/domedisk
DomeHead http://<your headnode FQDN>:1094/domehead

# Token generation
TokenPassword <your token pass>
TokenId ip
TokenLife 1000

ThisDomeAdapterDN <your disknode DN>

Zero the files /etc/dmlite.conf.d/adapter.conf and /etc/dmlite-disk.conf.d/adapter.conf. You should zero them rather than removing them, to prevent them from being reinstated by a subsequent yum update.

Restart the frontends (xrootd, httpd, gsiftp).
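
For example, on CentOS 7 (a sketch; use the corresponding init scripts on SL6):

# head node
systemctl restart httpd dpm-gsiftp xrootd@dpmredir

# disk nodes
systemctl restart httpd dpm-gsiftp xrootd@dpmdisk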

Enabling dome checksums

Configure startup options in /etc/sysconfig/dpm-gsiftp.

OPTIONS="-S -p 2811 -auth-level 0 -dsi dmlite:dome_checksum -disable-usage-stats"

Step 4 - Turn on gridftp redirection

For efficient SRM-free operation, gridftp redirection must be enabled.

Please note that if your installation consists of a single server, with the head node and disk node installed together, you do not have to enable gridftp redirection.

While this step can in principle be performed before enabling Dome, in practice we have seen problems with sites running a legacy DPM with gridftp redirection. We therefore recommend that this step be performed only after enabling Dome.

Note the following:

  • With Dome active and gridftp redirection enabled, clients can move from srm to gridftp transfers and space will be accounted correctly
    • this does not involve turning off SRM, just encouraging clients to use gsiftp://... URLs instead of srm://... URLs.
  • Gridftp redirection does not permit the head node to also run as a disk server if your instance has other disk nodes installed. If this is your case, please contact support before proceeding.
  • For technical reasons the VOMS files installed/configured on all disk nodes must be the same (for metadata operations like `ls` the gridftp client is redirected to a random disk node).

Detecting and dealing with older gridftp clients

Older gridftp clients such as uberftp and lcg-cp do not support redirection (globus-url-copy must be called with the "-dp" argument), and thus they are assigned a random disk server for reading. As this is likely not to be where the data is, this will result in a tunnelled read. If you experience high load on your system, this is a likely cause.

From version 1.13.0, once you have enabled redirection, you can check /var/log/dpm-gsiftp/gridftp.log for a trace of such clients, which will produce a line such as

requesting node. DN: '/DC=...' lfn: NULL

where the critical pointer is lfn: NULL.

You should be able to use this DN information to contact the user or community in question. This guide has some information on moving to newer clients - https://twiki.cern.ch/twiki/bin/view/DPM/DpmSrmClients

Puppet

Using puppet-dpm

Gridftp redirection must be enabled on the head node and on all disk nodes at the same time.

Head node

  class{"dpm::headnode":
   ...
    gridftp_redirect => true,
   ...
  }

Disk node

  class{"dpm::disknode":
   ...
    gridftp_redirect => true,
   ...
  }

Manual

Head Node

Edit /etc/gridftp.conf to include

remote_nodes disk01.domain.org:2811,disk02.domain.org:2811,...
epsv_ip 1

and restart dpm-gsiftp

Edit /etc/shift.conf to include

DPM   FTPHEAD head.domain.org

and restart both srmv2.2 and the dpm daemon.


Disk Nodes

Edit /etc/gridftp.conf to include

data_node 1

and restart dpm-gsiftp.

Things to check

Space reporting.

Check that space reporting via WebDAV is working.

$ curl -L --capath /etc/grid-security/certificates/ --cert /tmp/x509up_u<uid> --cacert /tmp/x509up_u<uid> --request PROPFIND  -H "Depth: 0" -d @- https://domehead-trunk.cern.ch/dpm/cern.ch/home/atlas/ATLASTOKEN01 <<EOF
<?xml version="1.0" ?>
<D:propfind xmlns:D="DAV:">
<D:prop>
<D:quota-used-bytes/>
<D:quota-available-bytes/>
</D:prop>
</D:propfind>
EOF

Space reporting and Quota Token consistency

The space reporting via DAV above may give slightly different results to the equivalent via SRM. For example, the SRM equivalent to the above curl command is

gfal-xattr srm://headnode.domain.org:8446/srm/managerv2?SFN=/dpm/cern.ch/home/atlas spacetoken.description?ATLASTOKEN01

The numbers will align to the extent that the VO was respecting its policy of always writing with a particular Space Token into a certain namespace directory. A discrepancy between the two numbers, if not large, probably represents some historical writes outside the associated directory. More important is to check whether the numbers diverge over time - if they do, get in touch with DPM support for deeper investigation.

dpm-tester

This utility can be used to validate DPM DOME functionality for all supported protocols. It is available in the package dmlite-dpm-tester and the basic tests are executed via

dpm-tester.py --host "yourheadnode"
Run these tests from a remote node with valid dteam X509 certificate proxy to ensure results consistent with normal clients. It is possible to use this tool also with X509 proxy for any other VO supported by your storage, but you have to pass additional command line options to the dpm-tester.py. Also be aware that space accounting is by default done only for first 6 directory levels and that's why you must use top VO directory in case you manually specify test base path:
dpm-tester.py --host dpmhead-trunk.cern.ch --path /dpm/cern.ch/home/atlas

Be aware that gridftp redirection is not enabled by default (at least with DPM DOME 1.13.2), which leads to suboptimal behavior of pure GridFTP transfers (or even test failures). You can select only the tests for the protocols that you plan to use in production, e.g.

dpm-tester.py --host dpmhead-trunk.cern.ch --path /dpm/cern.ch/home/atlas --tests davs root
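
For example, a typical test session from a remote UI node might look like this (a sketch; the host name and VO directory are placeholders to adapt to your site):

# obtain a VOMS proxy for the dteam VO (or any other VO supported by your instance)
voms-proxy-init --voms dteam

# run only the tests for the protocols used in production
dpm-tester.py --host dpmhead.example.org --path /dpm/example.org/home/dteam --tests davs root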

3 Upgrade to DPM "Legacy-free Dome Flavour"

Once SRM load has reduced to zero, the full legacy stack can be removed from the system.

Puppet

Head

The legacy stack packages have to be removed manually (e.g. with the yum command sketched after this list):

  • dpm-devel
  • dpm
  • dpm-python
  • dpm-rfio-server
  • dpm-server-mysql
  • dpm-srm-server-mysql
  • dpm-perl
  • dpm-name-server-mysql
  • dmlite-plugins-adapter
  • dmlite-plugins-mysql
  • dpm-copy-server-mysql
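
A minimal sketch of the removal (review what yum proposes to remove before confirming):

yum remove dpm-devel dpm dpm-python dpm-rfio-server dpm-server-mysql \
           dpm-srm-server-mysql dpm-perl dpm-name-server-mysql \
           dmlite-plugins-adapter dmlite-plugins-mysql dpm-copy-server-mysql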

Afterwards the puppet headnode manifest can be modified this way:


  class{"dpm::headnode":
    ...
    configure_legacy    => false,
  }

Disk

The following packages have to be manually removed:

  • dpm-devel
  • dpm
  • dpm-python
  • dpm-rfio-server
  • dpm-perl
  • dmlite-plugins-adapter

Afterwards the puppet disknode manifest can be modified this way:


  class{"dpm::disknode":
    ...
    configure_legacy    => false,
  }

Manual

TO DO

-- AndreaManzi - 2018-03-08
