DPM to dCache Migration Guide

Overview

Both DPM and dCache can use a posix filesystem to store data files. Because the storage backend is the same (although the directory layout differs), it is technically possible to migrate from DPM to dCache without transferring data. DPM and dCache use a database (MySQL vs. PostgreSQL) to store the file namespace (LFN catalogue), which means the migration can be done just by importing the DPM data into the dCache catalogue. This process is roughly similar to the legacy (SRM) DPM to DOME DPM migration, where we also completely replaced the underlying software implementation and had to apply (minor) database updates during migration. Migration from DPM to dCache is just slightly more complex, because the database structure of DPM and dCache is completely different.

Since release 1.15.2 dmlite comes with dCache migration tools that provide support to translate the DPM namespace and configuration. The migration tools are integrated with dmlite-shell, but the migrate.py script can also be executed directly, and e.g. running the namespace export/import can be faster if you run this script on the machine that hosts the namespace database.

The whole migration can be done in three steps:

  • make sure DPM is in a consistent state and fix potential issues
  • dump the DPM namespace and configuration
  • import the namespace data into the dCache database and distribute the generated dCache configuration files
The first step can be done with DPM online (no downtime required), but the following steps must be done while DPM is offline so that no changes are introduced in the stored data. To make the migration process smooth and to estimate the necessary time you can try almost all steps (except the final disabling of DPM and enabling of dCache) in advance and e.g. prepare the user and group mapping, which will most probably require some manual modifications of the exported configuration.

In case you still use legacy DPM without quotatokens there might be one additional step for a successful migration to dCache, please let us know if you plan to migrate away from legacy DPM.

If you find something that's not clear or you would like to be sure what's the best way to migrate your DPM, then please don't hesitate to ask questions either using the general purpose dpm-users-forum mailing list or contact me directly using the petr.vokac at cern.ch email.

Meetings & presentations

Migration status

Summary of migrated sites (GOCDB name, status, date, size, number of objects, time spent on dbck / export / import and downtime):

prague_cesnet_lcg2: DONE, February 2022, 0.6PB, 2M objects, downtime 30 hours

The first production site took quite a long time (time-consuming steps had to be repeated and the documentation was improved as a result):

  • writetag & pushtag for WriteToken repeated
  • missing checksums after migration
    • downloads with file checksum verification failed
    • fixed manually and this step took ~ 3 days
    • improved dpm-dbck code
  • spacetoken file accounting issues
    • file transfers (upload / download) not affected
    • fixed manually after migration
    • improved migration code
  • all these remaining issues now fixed in dmlite-1.15.2-1
praguelcg2: DONE, May 14 2022, 4PB, 47M objects, dbck 15 days, export 2.5 hours, import 11.5 hours, downtime < 24h

Preparatory steps that can be done with DPM online

  • lost and dark data detection (3 hours) / cleanup (1 hour)
    • 10410 dark files, 378 lost files
  • dbck namespace consistency cleanup (40 minutes)
  • move data to right pool / spacetoken (45 hours)
    • 1.3M files with total size 62TB transferred
  • missing ADLER32 checksum calculation (13 days)
    • 9.5M files with total size 533TB
  • these steps finished in April with dmlite-1.15.2-1
    • now waiting for mid of May site maintenance / downtime

Export DPM namespace

  • directly on db host (165 minutes) vs. from headnode (275 minutes)
  • MariaDB size 49GB, exported filesize 20GB

Generated dCache configuration files

  • issue in dcache.conf template fixed in dmlite-1.15.2-7

Import dCache namespace

  • PostgreSQL 14 with default configuration
  • import speed depends on single core performance
  • db host (Intel Xeon E5-2630 v4 @ 2.20GHz, SSD in RAID10)
    • import executed directly on db host (23.5 hours, 940IOPs)
      • PostgreSQL export to file (4 minutes)
      • PostgreSQL import from file (25 minutes)
    • import executed directly on db host with data on SHM (23 hours)
    • import executed from different headnode (44 hours, 650IOPs)
  • test machine (Intel i7-8700K CPU @ 3.70GHz)
    • import with local db on SHM (11.0 hours)
    • import with local db on NVMe (11.5 hours, 1480IOPs)
    • import with local db on HDD (23.5 hours, 180IOPs)
  • database size 49GB (database dump / backup 17GB)

Tuning

  • the maximum of 100 movers per pool seems to be too low for the ATLAS direct I/O use case where files are kept open for a long time (e.g. for the duration of job execution) and it is necessary to increase this limit; for details see the internal dCache numbers displayed e.g. at http://se1.farm.particle.cz:2288/queueInfo

TW-NTU-HEP (BelleII): DONE, start of May 2022

TW-NTU-HEP (CMS): end of May 2022

  • No quotatoken was defined for the VO data (the site still relied exclusively on SRM for uploads). migrate.py from dmlite 1.15.2-4 was not able to import file metadata into dCache unless we applied a workaround by explicitly overwriting the empty spacetoken using config.csv (e.g. by adding a line path,,,,,existing_spacetoken_name). The next dmlite release should be more robust, but it is still necessary to have at least one quotatoken with root / path defined in the DPM before migration
RO-07-NIPNE: DONE, May 30-31 2022, 3PB, 17M objects, dbck 5 days, export 11 hours, import 14.5 hours
  • dmlite 1.15.2-4 bugfix release necessary for lost&dark consistency checks (problems with reading stderr from remote machine using paramiko/ssh)
    • took us ~ 2 days to understand / fix / finish this particular step
  • dmlite 1.15.2-5 used for DPM dump (avoid lost db connection to dpm_db database after long cns_db dump)
  • slow namespace export with 2x Intel E5-2660 v3 & SATA HDD even with recommended MySQL tuning
  • fortunately PostgreSQL import not so much affected by SATA HDD and import speed was almost same as in praguelcg2 case
  • dCache services started and everything seems to work since June 1, 10:30am
  • problems with one big 1PB pool and the default /etc/dcache/dcache.conf Java heap memory limit dcache.java.memory.heap set to 4096m

Migration steps quick overview

This is just a summary of the steps to be done during the DPM to dCache in-situ migration. Please read the following chapters to understand the details.

dpmheadnode$ dmlite-shell --log-level=INFO --log-file=/tmp/dpm-lost-and-dark.log -e 'dbck lost-and-dark-show script' > /tmp/dpm-lost-and-dark.sh
dpmheadnode$ sh /tmp/dpm-lost-and-dark.sh
dpmheadnode$ dmlite-shell --log-level=DEBUG --log-size=104857600 --log-file=/tmp/dpm-dbck.log -e 'dbck dpm-dbck update'
dpmheadnode$ dmlite-shell --log-level=DEBUG --log-size=104857600 --log-file=/tmp/dpm-dbck.pool-file.log -e 'dbck pool-file update nthreads=8'
dpmheadnode$ dmlite-shell --log-level=INFO --log-size=104857600 --log-file=/tmp/dpm-dbck.fill-checksum.log -e 'dbck fill-checksum update nthreads=25'
# declare downtime & stop DPM services
dpmheadnode$ systemctl stop httpd rfiod srmv2.2 dpnsdaemon dpm dpm-gsiftp xrootd@dpmredir
dpmheadnode$ systemctl disable httpd rfiod srmv2.2 dpnsdaemon dpm dpm-gsiftp xrootd@dpmredir
dpmdisknodes$ systemctl stop httpd rfiod dpm-gsiftp xrootd@dpmdisk
dpmdisknodes$ systemctl disable httpd rfiod dpm-gsiftp xrootd@dpmdisk
dpmdbnode$ python3 migrate.py --log-level=DEBUG --log-file=dpm-dump.log --dpm-export --dpm-dbhost=dpmdb.fqdn --dpm-dbuser=dpmdb_user --dpm-dbpasswd=dpmdb_secret
# generate dCache configuration files from config.csv created in previous step
dcachedbnode$ python3 migrate.py --log-level=DEBUG --log-file=migrate-dcache-config.log --dcache-config
# install dCache and its database & distribute dCache configs to /etc/dcache on all storage nodes
dcacheheadnode$ yum install -y https://www.dcache.org/downloads/1.9/repo/7.2/dcache-7.2.16-1.noarch.rpm
dcacheheadnode$ alternatives --set java $(alternatives --display java | grep 'family java-11-openjdk' | cut -d' ' -f1)
dcacheheadnode$ chown dcache /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem
dcacheheadnode$ mkdir /etc/systemd/system/dcache@.service.d
dcacheheadnode$ cat > /etc/systemd/system/dcache@.service.d/capabilities.conf <<EOF
[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE
EOF
dcacheheadnode$ ssh-keygen -C admin@localhost -t rsa -N '' -f /root/.ssh/id_rsa
dcacheheadnode$ cat /root/.ssh/id_rsa.pub > /etc/dcache/admin/authorized_keys2
dcacheheadnode$ scp dcache.conf gplazma.conf ban.conf multi-mapfile.group multi-mapfile.user multi-mapfile.vo multi-mapfile.unmapped vo-group.json vo-user.json omnisession.conf LinkGroupAuthorization.conf dcacheheadnode:/etc/dcache
dcacheheadnode$ scp layout-HEADNODE_FQDN.conf dcacheheadnode:/etc/dcache/layout
dcachedisknodes$ yum install -y https://www.dcache.org/downloads/1.9/repo/7.2/dcache-7.2.16-1.noarch.rpm
dcachedisknodes$ alternatives --set java $(alternatives --display java | grep 'family java-11-openjdk' | cut -d' ' -f1)
dcachedisknodes$ chown dcache /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem
dcachedisknodes$ scp dcache.conf dcachedisknodes:/etc/dcache
dcachedisknodes$ scp layout-DISKNODE_FQDN.conf dcachedisknodes:/etc/dcache/layout
dcachedbnode$ yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
dcachedbnode$ yum install -y postgresql14-server
dcachedbnode$ /usr/pgsql-14/bin/postgresql-14-setup initdb
dcachedbnode$ cat > /var/lib/pgsql/14/data/pg_hba.conf <<EOF
# database on headnode
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
# database on dedicated dbnode
#host   chimera         dcache          192.0.2.123/32          md5
#host   spacemanager    dcache          192.0.2.123/32          md5
#host   pinmanager      dcache          192.0.2.123/32          md5
#host   srm             dcache          192.0.2.123/32          md5
EOF
dcachedbnode$ # enable connections to postgresql database from remote machines in case you use dedicated dbnode
dcachedbnode$ #perl -p -i -e "s/^#.*listen_addresses *= *'localhost'/listen_addresses = '*'/" /var/lib/pgsql/14/data/postgresql.conf
dcachedbnode$ systemctl enable postgresql-14
dcachedbnode$ systemctl start postgresql-14
dcachedbnode$ createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt --no-password dcache
dcachedbnode$ createdb -U dcache chimera
dcachedbnode$ createdb -U dcache spacemanager
dcachedbnode$ createdb -U dcache pinmanager
dcachedbnode$ createdb -U dcache srm
# start dCache on headnode + configure linkgroups and space reservations + stop dCache on headnode
dcacheheadnode$ systemctl daemon-reload
dcacheheadnode$ systemctl start dcache.target
dcacheheadnode$ sleep 300 # wait for dCache start
dcacheheadnode$ cat admin-cli.psu | grep -v ^# | ssh -p 22224 -l admin localhost
dcacheheadnode$ sleep 300 # wait till PSU configuration gets propagated between dCache services
dcacheheadnode$ cat admin-cli.reserve | grep -v ^# | ssh -p 22224 -l admin localhost
dcacheheadnode$ systemctl stop dcache.target
dcachedbnode$ python3 migrate.py --log-level=DEBUG --log-file=migrate-dcache-import.log --dcache-import
dcachedisknodes$ python migrate.py --log-level=INFO --log-file=migrate-dcache-link.log --link --link-file=data-dpmdisk1.example.com.csv
# start dCache services on all storage nodes
# cleanup physical files from DPM locations and keep just new "dcache" subdirectory

Ensuring DPM data consistency

Before starting with the migration process it is necessary to check / fix inconsistencies found by dpm-dbck. This tool was integrated in the dmlite-shell 1.15.x release and, compared to the old perl script from the dpm-contrib-admintools package, the new python implementation allows the user to run each test individually and provides improved logging. It should also be safe to run all dbck commands while DPM is still running (only *-offline commands must be executed with stopped DPM services, but we don't need them for the migration consistency updates).

These dbck consistency updates can be done in four steps:

  • lost-and-dark-show
  • dpm-dbck
  • pool-file
  • fill-checksum
It is recommended to create a database backup before applying the following updates.

Fixing lost and dark data

Inconsistencies between the file records stored in the database and the list of existing files stored on each disknode can be found with the dbck command lost-and-dark-show. Before executing this command it is necessary to configure password-less ssh login from the DPM headnode to all disknodes (internally this script uses the paramiko ssh client for listing existing files and directories). Use either rsa or ecdsa keys, because they work with paramiko and the ssh distributed by the supported OSes, e.g. on your DPM headnode run

ssh-keygen -t rsa
ssh-copy-id root@disknode1.your.domain
ssh-copy-id root@disknode2.your.domain
...

By calling dmlite-shell with the following parameters you'll get the /tmp/dpm-lost-and-dark.sh script that can be used to remove inconsistent items:

dmlite-shell --log-level=INFO --log-file=/tmp/dpm-lost-and-dark.log -e 'dbck lost-and-dark-show script' > /tmp/dpm-lost-and-dark.sh

The generated script contains comment lines describing the discovered issues and related details (comment line format: "DARK|LOST|LOSTNODIR, diskserver, filename, fileid, ..."):

  • DARK - lines with the diskserver name and physical file location that don't have a corresponding record in the DPM database. Such a file can be removed, because it is unreachable by DPM.
  • LOST_FILE - lines with the diskserver name, expected physical file location and fileid. The physical file doesn't exist and it'll be necessary to remove these file replicas from the DPM catalogue.
  • LOST_NODIR - same as LOST_FILE but in addition the parent directory on the disknode doesn't exist (more suspicious, e.g. is the filesystem mounted?)
  • LOST_DELETED - the file (replica) status is marked as deleted which means it can be safely removed from DPM (this may happen when a user removes a file from a RDONLY disknode filesystem).
  • LOST_UPLOADING - a file transfer in progress for more than 24 hours, most probably a serious issue occurred during a file upload that'll never finish and it is safe to clean up this file from DPM.
  • LOST_RECENT - unfortunately the detection of LOST files is not an atomic operation and some recently created files may be marked with this status; by default the cleanup command is commented out (don't remove these files from DPM)
You have to fix the reported issues manually by executing the dpm-lost-and-dark.sh script, because it may be dangerous to do an automatic cleanup and it is up to you to validate the output script content (e.g. it would be tricky to automatically detect an accidentally unmounted filesystem on a disknode and prevent cleanup of temporarily "missing" files from the DPM namespace).

Once you execute the cleanup script

sh /tmp/dpm-lost-and-dark.sh

your DPM replica catalog should contain only records about files that really exist on your disknodes and also each file stored in the configured disknode filesystems will be represented by a record in the DPM namespace (file and replica catalog). In case the dpm-lost-and-dark.sh script contains any LOST_FILE/LOST_NODIR/LOST_DELETED/LOST_UPLOADING entries it'll be necessary to finish the dpm-dbck updates (described in the next section) before your DPM database becomes consistent.

If you repeat the steps described in this paragraph you may find a few new DARK files, because DPM allows file deletion from the database namespace even though the disknode is in READONLY state (the disknode filesystem is automatically marked READONLY while you run lost-and-dark-show). It is fine to have DARK files, because they'll be removed later after we finish all migration steps.

Fixing internal database inconsistencies

Internal database inconsistencies can be checked by running dbck. There is a dpm-dbck sub-command that executes all necessary consistency checks and updates in the right order. All dbck actions run by default in dry-run mode which just logs info about discovered issues; to fix them it is necessary to add the update option

dmlite-shell --log-level=DEBUG --log-size=104857600 --log-file=/tmp/dpm-dbck.log -e 'dbck dpm-dbck update'

Be aware that a DPM with a huge number of inconsistencies can create a quite big debug log file. In dry-run mode dpm-dbck may not always report all issues correctly, because some consistency validations done later in the sequence of individual actions depend on issues fixed in previous steps. If you find a huge number of issues and you would like to better understand how they are related you can run each step individually:

dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.namespace-continuity.dry-run.log -e 'dbck namespace-continuity'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.namespace-continuity.log -e 'dbck namespace-continuity update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.wrong-replica.dry-run.log -e 'dbck wrong-replica'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.wrong-replica.log -e 'dbck wrong-replica update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.stuck-replica.dry-run.log -e 'dbck stuck-replica'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.stuck-replica.log -e 'dbck stuck-replica update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.no-replica.dry-run.log -e 'dbck no-replica'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.no-replica.log -e 'dbck no-replica update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.replica-type.dry-run.log -e 'dbck replica-type'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.replica-type.log -e 'dbck replica-type update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.symlink.dry-run.log -e 'dbck symlink'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.symlink.log -e 'dbck symlink update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.nlink.dry-run.log -e 'dbck nlink'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.nlink.log -e 'dbck nlink update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.zero-dir.dry-run.log -e 'dbck zero-dir'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.zero-dir.log -e 'dbck zero-dir update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.spacetoken.dry-run.log -e 'dbck spacetoken'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.spacetoken.log -e 'dbck spacetoken update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.dir-size.dry-run.log -e 'dbck dir-size'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.dir-size.log -e 'dbck dir-size update'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.spacetoken-size.dry-run.log -e 'dbck spacetoken-size'
dmlite-shell --log-level=DEBUG --log-file=/tmp/dpm-dbck.spacetoken-size.log -e 'dbck spacetoken-size update'

Basic description of individual DPM consistency checks can be found in the help for dbck command

dmlite-shell -e 'help dbck'

Normally it should not be necessary to run individual dbck steps, because dbck dpm-dbck executes exactly the same list of checks/fixes. It may just be faster to run some tests manually in case you are sure all preceding steps already succeeded in the past. Be aware that changing the order of these commands may not lead to a fully consistent DPM database.

Correct pool for spacetoken

DPM dmlite 1.12.x and older came with an issue that affected disknode draining with dmlite-shell and this bug led to files stored in the wrong pool with respect to their spacetoken. To fix this issue it's necessary to move these replicas to the right filesystem using

dmlite-shell --log-level=DEBUG --log-size=104857600 --log-file=/tmp/dpm-dbck.pool-file.log -e 'dbck pool-file update nthreads=8'

This command triggers the normal draining process but only for files stored in the wrong pool. You can tune the number of parallel transfers from each processed disknode filesystem (setting this number too high can overload the given filesystem with IOPs).

Calculate missing checksums

dCache expects that all (adler32) checksums are stored in the database after a successful file transfer. DPM doesn't always store the checksum during the file transfer and in many cases the checksum is calculated and stored in the database only after an explicit checksum request (e.g. using gfal-sum). We can fill the missing ADLER32 checksums using dmlite-shell

dmlite-shell --log-level=INFO --log-size=104857600 --log-file=/tmp/dpm-dbck.fill-checksum.log -e 'dbck fill-checksum update nthreads=25'

This command might take days / weeks depending on the number of files stored without a checksum. Experiments that rely e.g. on the Rucio framework for data management should already have most of their files stored with a checksum, because adler32 is usually automatically checked during file uploads/downloads and third-party-copy (our storage still had 20% of files without a checksum - most probably old files stored a few years ago with legacy DPM / SRM). A VO that doesn't normally verify checksums can't by design trigger the DPM checksum calculation and its files don't have a checksum stored in the DPM file / replica metadata catalog.

You should run this command several times to see if new files without a checksum appeared since it was executed last time. The nthreads parameter tells how many filesystems should be checked in parallel. You can set nthreads as high as the number of your filesystems, but with a large DPM you should be careful not to overload the headnode while running this script (the script can be terminated and executed again at any time); also the number of threads should not exceed the number of available database connections.

A rough estimate of the number of files without an ADLER32 checksum and their sizes can be obtained by a direct DPM database query

mysql cns_db -h dpmdb.example.com -u dpm -p -e "SELECT r.host, r.fs, COUNT(*) AS count, SUM(m.filesize) AS size FROM Cns_file_metadata m INNER JOIN Cns_file_replica r USING(fileid) WHERE m.filemode & 32768 = 32768 AND r.status = '-' AND m.status = '-' AND m.csumtype != 'AD' GROUP BY r.host, r.fs ORDER BY r.host, r.fs"

(be aware this direct database query doesn't return completely accurate numbers, because it relies on the "default" DPM checksum which can be different from AD / ADLER32).

Generating DPM migration dumps

The migration tools were written and tested only with python3, but the CentOS7 DPM scripts rely on python2. On CentOS7 it is necessary to install the following dmlite-shell python3 dependencies

# CentOS7 python3 dependencies for migration tools (DPM export)
yum install -y python3 python36-dateutil python36-pycurl python36-m2crypto python36-mysql python36-paramiko python36-ldap3 python36-rpm python36-lxml
# CentOS7 python3 dependencies for migration tools (dCache import)
yum install -y python36-psycopg2

DPM migration dumps can be generated on any machine that can access the DPM database (e.g. the DPM headnode), but to reduce latencies caused by communication with the database (and significantly reduce the script runtime) it may be useful to execute migrate.py directly on the database machine. This script has no dependency on the dmlite-shell package so you can just copy migrate.py from the headnode to the database node without installing dmlite dependencies on the database machine.

# CentOS7 with installed dmlite-shell package
cp /usr/lib/python2.7/site-packages/dmliteshell/migrate.py .
# CentOS8 with installed dmlite-shell package
cp /usr/lib/python3.6/site-packages/dmliteshell/migrate.py .

Be aware that the output files namespace.csv, config.csv and dpm-migrate.log can be quite big, with a size roughly the same as all DPM database files (see df -h /var/lib/mysql). It is necessary to stop and disable DPM on the headnode to avoid any further updates before starting the migration from DPM to dCache

# stop and disable DPM services first to avoid any updates
systemctl stop httpd rfiod srmv2.2 dpnsdaemon dpm dpm-gsiftp xrootd@dpmredir
systemctl disable httpd rfiod srmv2.2 dpnsdaemon dpm dpm-gsiftp xrootd@dpmredir
# export DPM namespace and configuration
python3 migrate.py --log-level=DEBUG --log-file=dpm-migrate.log --dpm-export --dpm-dbhost=dpmdb.fqdn --dpm-dbuser=dpmdb_user --dpm-dbpasswd=dpmdb_secret

If you would like to just test this step you can execute this command without stopping/disabling DPM. This can be useful to estimate the time or to see the content of config.csv, because this file can be used to prepare a customized user/group mapping or force modifications of path owner/group/ACLs according to VO data access requirements.

This dump may take hours and a very rough estimate is 1 hour per 20M records (see mysql cns_db -h dpmdb.example.com -u dpm -p -e 'SELECT COUNT(*) FROM Cns_file_metadata').

You should also stop and disable DPM services on all disknodes, because they'll be replaced by dCache

# stop DPM services on each disknode
systemctl stop httpd rfiod dpm-gsiftp xrootd@dpmdisk
systemctl disable httpd rfiod dpm-gsiftp xrootd@dpmdisk

Importing data in dCache

Installing dCache

Install dCache 7.2 (most recent version) on the DPM headnode and all disknodes

# install most recent dCache 7.2 release
yum install -y https://www.dcache.org/downloads/1.9/repo/7.2/dcache-7.2.16-1.noarch.rpm

DPM configured by puppet automatically installs Java 1.8.0, but this release is not compatible with dCache 7.2.x and with the old Java it will not be possible to start dCache services. The installed dCache RPM package brings Java 11 as a dependency, but it is not configured as the default Java interpreter. Using alternatives it is possible to change the preferred java to point to the newly installed OpenJDK 11.

# configure preferred java to OpenJDK 11
alternatives --set java $(alternatives --display java | grep 'family java-11-openjdk' | cut -d' ' -f1)

It is safe to change the preferred java to OpenJDK 11 at any time, because DPM doesn't rely on java, which comes only as an unnecessary package dependency.

dCache provides a powerful admin shell that'll be used later during the migration process. Now we can configure direct dCache shell access using headnode ssh keys

# configure dCache admin shell
ssh-keygen -C admin@localhost -t rsa -N '' -f /root/.ssh/id_rsa
cat /root/.ssh/id_rsa.pub > /etc/dcache/admin/authorized_keys2

Once you start the dCache headnode services it'll be possible to access the dCache administration shell using ssh -p 22224 -l admin localhost. Be aware that the dCache SSH server implementation takes into account also the comment / last column in the authorized_keys2 file and at least the name must match the username used while logging into the dCache admin shell (in our case admin).
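
For illustration, a line in /etc/dcache/admin/authorized_keys2 could then look like the following (key material shortened), with the comment name matching the admin username:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQAB...shortened... admin@localhost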

The PostgreSQL database backend for dCache can be installed directly on the existing DPM headnode or on a dedicated database machine, e.g. for CentOS7 and PostgreSQL 14

# install PostgreSQL 14
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql14-server
# enable PostgreSQL for dCache local password-less access
/usr/pgsql-14/bin/postgresql-14-setup initdb
cat > /var/lib/pgsql/14/data/pg_hba.conf <<EOF
# database on headnode
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
# database on dedicated dbnode
#host   chimera         dcache          192.0.2.123/32          md5
#host   spacemanager    dcache          192.0.2.123/32          md5
#host   pinmanager      dcache          192.0.2.123/32          md5
#host   srm             dcache          192.0.2.123/32          md5
EOF
# enable connections to postgresql database from remote machines in case you use dedicated dbnode
#perl -p -i -e "s/^#.*listen_addresses *= *'localhost'/listen_addresses = '*'/" /var/lib/pgsql/14/data/postgresql.conf
systemctl enable postgresql-14
systemctl start postgresql-14
# create PostgreSQL databases used by dCache
createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt --no-password dcache
createdb -U dcache chimera
createdb -U dcache spacemanager
createdb -U dcache pinmanager
createdb -U dcache srm

CentOS8 and CentOS9 come with a sufficiently recent PostgreSQL database and you can avoid using the external postgresql RPM repository. These distributions come with AppStream modules that allow you to select between different PostgreSQL versions and install a more recent version, e.g.

# install PostgreSQL 13 on CentOS8
dnf module list postgresql
dnf module enable postgresql:13
dnf install postgresql-server
# enable PostgreSQL for dCache local password-less access
/usr/bin/postgresql-setup --initdb
cat > /var/lib/pgsql/data/pg_hba.conf <<EOF
# database on headnode
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
# database on dedicated dbnode
#host   chimera         dcache          192.0.2.123/32          md5
#host   spacemanager    dcache          192.0.2.123/32          md5
#host   pinmanager      dcache          192.0.2.123/32          md5
#host   srm             dcache          192.0.2.123/32          md5
EOF
# enable connections to postgresql database from remote machines in case you use dedicated dbnode
#perl -p -i -e "s/^#.*listen_addresses *= *'localhost'/listen_addresses = '*'/" /var/lib/pgsql/data/postgresql.conf
systemctl enable postgresql
systemctl start postgresql
# create PostgreSQL databases used by dCache
createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt --no-password dcache
createdb -U dcache chimera
createdb -U dcache spacemanager
createdb -U dcache pinmanager
createdb -U dcache srm

Installing dCache on the head/disk nodes and PostgreSQL on the database machine can be done even before you start Generating DPM migration dumps.

Generate dCache configuration

All dCache configuration files for head/disk nodes can be generated from config.csv file using

python3 migrate.py --log-level=DEBUG --log-file=migrate-dcache-config.log --dcache-config

If you did the DPM export directly on a (dedicated) database machine then it'll be necessary to update the first config.csv line starting with "headnode" and replace the parameter after the comma with your full headnode hostname. If you made any manual modifications to config.csv then it'll be necessary to run the command mentioned above again to propagate your changes into the generated dCache configuration files.
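
For example (with a hypothetical hostname), the headnode line in config.csv should look like:

headnode,dpmhead.example.com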

Although the config.csv produced during the DPM dump contains all necessary information to create a compatible dCache configuration, it might be useful to apply a few modifications to get more generic dCache configuration files; these updates are described in the following subsections.

User / group account mapping

Internally dCache uses uid and gid numbers and the dCache configuration files provide means to map the storage client identity to these numbers. It is up to the storage administrator to come up with a mapping that is suitable for their users / VOs. It is usually not necessary to map each individual client identity (e.g. X509 subject) to a different dCache user, because for a VO it is often sufficient to provide less granular access permissions (e.g. CMS).

The migration configuration file config.csv should be customized before importing data into dCache to get a more general and simple dCache identity mapping. Lines starting with user, group and group2uid can be used to modify the identity mapping and provide a customized uid, gid and username used during the dCache namespace import and also for the generated dCache configuration files. The command described in Generate dCache configuration produces a new config.csv.default which is enriched with the information used for the dCache uid, gid and username. This enriched config file can be useful as a basis for your own modifications of config.csv. The identity mapping configuration lines have the following structure

  • user,DPM_username,dCache_username,dCache_uid - it is possible to choose your own username and uid used internally by dCache for a client that comes with an X.509 proxy certificate (certificate subject distinguished name). You can use the default values from config.csv.default, but if you plan to use dCache as a posix filesystem mounted with NFS then these mapped identifiers should match your unix user account data. It is possible to map several DPM_usernames to the same username and uid.
  • group,DPM_groupname,simplified_groupname,dCache_gid - it is necessary to map X.509 VOMS FQAN (Fully Qualified Attribute Names / VO groups and roles) to the dCache gid. The simplified_groupname is not used by dCache, but the migration tools use this identifier in the group2uid and path configuration.
  • group2uid,simplified_groupname,dCache_username,dCache_uid - our dCache configuration by default uses the primary group (called here simplified_groupname) as username and dCache_gid as gid for an unmapped client user identity (no explicit mapping of the client certificate subject). This configuration line can be used to overwrite the default behavior and map a primary group to a customized username and uid. This configuration option makes the identity mapping configuration more flexible, but in most cases you should be fine without these special configuration lines.

DPM automatically creates a new internal user account with the X.509 certificate subject for each authorized client (see dmlite-shell -e 'userinfo') and also one group for each VOMS FQAN (see dmlite-shell -e 'groupinfo'). The configuration file config.csv generated by the migration tools contains only users and groups with existing objects in the DPM namespace (directory, file, link). If your storage doesn't provide space for local users it may be sufficient to map all VO users to a single / few dCache uid and gid numbers. The dCache configuration generated by the DPM migration tools uses the primary group to derive the username and uid for all clients without an explicit X.509 subject mapping, which means it is not necessary to create an explicit mapping for each individual user.

Example: for ATLAS, Belle and CMS you could just remove from config.csv all user lines with certificate subjects that belong to these VOs (the DPM dump log file dpm-dump.log contains information about which subjects are used together with each VO group and role) and keep just the group mapping, e.g.

# ...
group,/atlas,atlas,2000
group,/atlas/Role=pilot,atlas_pilot,2001
group,/atlas/Role=production,atlas_production,2002
group,/atlas/Role=production,atlas_lcgadmin,2003
group,/atlas/cz,atlas_cz,2010
# ...
group,/belle,belle,3000
group,/belle/Role=production,belle_production,3002
group,/belle/Role=lcgadmin,belle_lcgadmin,3003
# ...
group,/cms,cmsusr001,4000
group,/cms/Role=priorityuser,cmsana001,4001
group,/cms/Role=production,cmsprd001,4002
group,/cms/Role=hiproduction,cmsprd001,4002
group,/cms/dcms/Role=cmsphedex,cmsprd001,4002
# ...

During the dCache namespace import the migration script automatically uses the primary group as the username in case the X.509 subject was removed from the config.csv file. If you plan to provide NFS access, your dCache internal uid / gid numbers should match the real unix uid / gid. In this case you should provide the right user / group name and uid / gid in config.csv as the third and fourth parameter on each line that starts with user or group.
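
For illustration (hypothetical certificate subject, names and numeric ids), such user and group lines could look like:

user,/DC=org/DC=example/CN=Jane Doe,jdoe,10234
group,/vo.example.org,examplevo,5000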

Overwriting user / group / permission for subdirectories

During the dCache namespace import the config.csv lines starting with path will be used to overwrite directory / file / link permissions. This may be useful to change or fix the owner, permissions, ACLs or spacetoken and ensure that all subdirectories managed by VO data management tools have the same permissions. A configuration line starts with the path prefix followed by directory,dCache_username,simplified_groupname,mode,ace_list,spacetoken. All parameters except the first path are optional or may be empty, in which case the original value will be used for all subdirectories that don't provide their own path line.

Example: for ATLAS and Belle we could add the following lines to config.csv

# ...
path,/,root,root,0755,[]
path,/dpm/example.com/home/
path,/dpm/example.com/home/atlas,atlas_production,atlas,0750,[]
path,/dpm/example.com/home/atlas/,atlas,atlas,0755,[]
path,/dpm/example.com/home/atlas/atlasdatadisk,atlas_production,atlas,0755,'[["default_group", "rwx", "atlas_production"], ["default_group", "r-x", "atlas"]]'
path,/dpm/example.com/home/atlas/atlasscratchdisk,atlas,atlas,0755,'[["default_group", "rwx", "atlas"]]'
path,/dpm/example.com/home/atlas/atlaslocalgroupdisk,atlas_cz,atlas,0755,'[["default_group", "rwx", "atlas_production"], ["default_group", "rwx", "atlas_cz"], ["default_group", "r-x", "atlas"]]'
path,/dpm/example.com/home/atlas/atlasgroupdisk,atlas_production,atlas,0755,'[["default_group", "rwx", "atlas_production"], ["default_group", "r-x", "atlas"]]'
# ...
path,/dpm/example.com/home/belle,belle_production,belle,0750,[]
path,/dpm/example.com/home/belle/,belle,belle,0775,[]
path,/dpm/example.com/home/belle/DATA,belle_production,belle,0755,'[["default_group", "rwx", "belle_production"], ["default_group", "rwx", "belle_lcgadmin"], ["default_group", "r-x", "belle"]]'
path,/dpm/example.com/home/belle/TMP,belle,belle,0755,'[["default_group", "rwx", "belle_production"], ["default_group", "rwx", "belle_lcgadmin"], ["default_group", "rwx", "belle"]]'
# ...

Distribute dCache configuration files

dCache configuration generated from config.csv must be transferred to the dCache headnode and disknodes.

The majority of the generated configuration files are used only on the dCache headnode. At the beginning of each file there is a short description that includes the final configuration file location. The following files must be stored on the dCache headnode:

  • /etc/dcache/dcache.conf
  • /etc/dcache/layouts/layout-HEADNODE_FQDN.conf
  • /etc/dcache/gplazma.conf
  • /etc/dcache/ban.conf
  • /etc/dcache/multi-mapfile.group
  • /etc/dcache/multi-mapfile.user
  • /etc/dcache/multi-mapfile.vo
  • /etc/dcache/multi-mapfile.unmapped
  • /etc/dcache/vo-group.json
  • /etc/dcache/vo-user.json
  • /etc/dcache/omnisession.conf
  • /etc/dcache/LinkGroupAuthorization.conf
  • /etc/dcache/multi-mapfile.vorole (not used, just example)
  • /etc/grid-security/grid-vorolemap (not used, just example)

Disknode configuration is relatively simple, because it consists of only two files

  • /etc/dcache/dcache.conf
  • /etc/dcache/layouts/layout-DISKNODE_FQDN.conf

If you use a dedicated database server it'll be necessary to update the (headnode) dcache.conf file with appropriate parameters, e.g.

dcache.db.host = db.example.com
dcache.db.user = dcache
dcache.db.password = secret

WebDAV permissions to use default HTTPS port

Currently the dCache startup scripts don't have privileges to listen on privileged ports < 1024. Because we need the WebDAV service to listen on the default HTTPS port 443, it is necessary to give the dCache service a capability that allows an unprivileged process to listen on privileged ports. Update the dCache systemd configuration with

mkdir /etc/systemd/system/dcache@.service.d
cat > /etc/systemd/system/dcache@.service.d/capabilities.conf <<EOF
[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE
EOF

Host TLS certificates

dCache must be able to read the host certificates and they must be owned by the dcache user

chown dcache /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem

Change the host certificate owner on all dCache nodes.

Initialize dCache database structures

To be able to import data into the dCache namespace it is necessary to have all required database structures in place. This can be done by starting the dCache services on the headnode.

# remove any existing / previous / cached dCache configuration files
# (this is not necessary if you just made first clean dCache installation on headnode)
rm -rf /var/lib/dcache/zookeeper/version-2/* /var/lib/dcache/config/*

# start dCache services
systemctl daemon-reload
systemctl start dcache.target

It can take several minutes before everything gets initialized. You can either monitor the CPU load caused by the java processes, watch the increasing and later stable size of a dCache database dump file

# dump database content
cd /tmp; runuser -u postgres pg_dumpall > dcache.sql

or list all services using dCache admin shell

ssh -p 22224 -l admin localhost '\l'

It can take some time after the dCache start before the admin shell becomes available and reachable.

Space configuration and reservation

dCache provides a very flexible mechanism for using the available filesystems and it is necessary to configure space allocation using dCache pool groups, links and link groups. This configuration can become quite complex, but during the namespace import the migration script generates the admin-cli.psu and admin-cli.reserve files. These files contain dCache admin shell commands that can be used to get a dCache configuration compatible with DPM.

To import the dCache linkgroup configuration from admin-cli.psu execute

cat admin-cli.psu | grep -v ^# | ssh -p 22224 -l admin localhost

For a detailed overview of the dCache PSU configuration you can execute ssh -p 22224 -l admin localhost '\c PoolManager; psu dump setup'.

Once you configure the dCache linkgroups you should be able to see the space available for reservation with ssh -p 22224 -l admin localhost '\c SrmSpaceManager; ls link groups'. Without running dCache services on the disknodes there'll be zero free space, but that's sufficient to create empty spacetoken reservations. These reservations can be done by executing the admin-cli.reserve script, but it is necessary to wait a minute after executing admin-cli.psu, because it takes some time before the linkgroups become available for space reservation

cat admin-cli.reserve | grep -v ^# | ssh -p 22224 -l admin localhost

Configured space reservations can be listed with ssh -p 22224 -l admin localhost '\c SrmSpaceManager; ls spaces -e'. Be sure that you see all spacetokens that were originally defined as quotatokens in the DPM.

Namespace import

Before importing data into the dCache namespace it is necessary to have an initialized database with the defined space reservations and to stop the dCache services on the headnode

systemctl stop dcache.target

The import should be executed directly on the dCache database machine to reduce the communication latency between the migrate.py script and the PostgreSQL database. All you need is config.csv (optionally customized), namespace.csv and the migrate.py script plus its python3 dependencies that can be installed with

yum install -y python3 python3-psycopg2

Use the following command to populate the dCache namespace with data from namespace.csv and config.csv

python3 migrate.py --log-level=DEBUG --log-file=migrate-dcache-import.log --dcache-import

Importing data into the dCache PostgreSQL namespace is relatively slow and a very rough estimate is an hour per 5M records (see mysql cns_db -h dpmdb.example.com -u dpm -p -e 'SELECT COUNT(*) FROM Cns_file_metadata'). You can monitor progress in the log file.

There should be no log entries with WARNING log level or higher.
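
A quick sanity check of the import log (assuming the log file name used in the command above):

grep -E 'WARNING|ERROR|CRITICAL' migrate-dcache-import.log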

Move data files from DPM to dCache directories

dCache relies on a different data directory structure. It is necessary to rename and move all data files into the dcache/spacetoken_abc/data subdirectory on each disknode filesystem defined in your original DPM configuration. A successful namespace import provides data-*.example.com.csv files with information about the DPM source directory and dCache destination directory for all data files registered in the dCache namespace. It is necessary to copy the data-disknode1.example.com.csv file to the corresponding disknode1.example.com and execute the migration script with an option that creates hard links from the dCache data directory to the original DPM file location, e.g.

# execute on each disknode with right data-*.example.com.csv data file
# this operation works with both python2 or python3
python migrate.py --log-level=INFO --log-file=migrate-dcache-link.log --link --link-file=data-dpmdisk1.example.com.csv

Because this command creates hard links it'll later be necessary to remove the original DPM files / directories. You should check the output log files; there should not be any lines with WARNING log level.

Firewall configuration update

The external firewall configuration doesn't change, but for internal communication between dCache nodes dCache uses TCP ports 33115-33145 (default values for the dcache.net.lan.port.min and dcache.net.lan.port.max configuration options).

Some sites share the same port range for lan and wan ports, but it is better to use different port ranges for internal vs. external communication.
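
As an example with firewalld (assuming the default LAN port range above and a hypothetical internal network 192.0.2.0/24 used by your dCache nodes):

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.0.2.0/24" port port="33115-33145" protocol="tcp" accept'
firewall-cmd --reload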

Enable and start dCache services

To enable automatic dCache services startup on headnode and disknodes it is necessary to configure systemd to start dcache.target

systemctl enable dcache.target

When you change a layout configuration file it is necessary to reload systemd before you can (re)start dCache

systemctl daemon-reload
systemctl start dcache.target

You can check the list of running dCache services using the admin shell, e.g. ssh -p 22224 -l admin localhost '\l'. Be aware that the dCache start may take several minutes and you'll be able to reach the admin shell only once the corresponding service is up and running. You should be able to see not just the headnode services, but also the disknode pools and protocol doors.

CA certificates and CRL updates (fetch-crl)

DPM managed with puppet automatically comes with fetch-crl which is also essential for correct dCache functionality. You can just keep existing fetch-crl installation and configuration, but be aware that new or reinstalled (head/disk)nodes must also come with enabled fetch-crl and installed (IGTF) CA certificates.
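
A minimal sketch for a freshly (re)installed node, assuming the EGI trust anchor YUM repository is already configured:

yum install -y ca-policy-egi-core fetch-crl
fetch-crl   # initial CRL download, regular updates are handled by the fetch-crl cron job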

WriteToken updates

(NOTE: this is no longer necessary, please skip this step)

To write data into the right spacetoken it is necessary to somehow pass this information during the file upload. This can be done with the SRM protocol, but there is no way to specify the spacetoken with the WebDAV, xroot or gsiftp protocol. dCache directories can be tagged with a WriteToken that is used in case the client did not define a spacetoken for the uploaded file. This tag contains the token id (not the token description) and for recursively updating the WriteToken it is possible to use pushtag from the dCache chimera shell, e.g.

# list existing spacetokens with token id in the first column
$ ssh -p 22224 -l admin localhost '\c SrmSpaceManager; ls spaces -e'
TOKEN LINKGROUP  RETENTION LATENCY ALLO       USED        FREE         SIZE             EXPIRES DESCRIPTION
...
5     link-group REPLICA   ONLINE    0 +   4194304 +  95805696 =  100000000                     SPACETOKEN_DESC
...
# recursively add the WriteToken for SPACETOKEN_DESC to the /dpm/example.com/home/vo directory
$ chimera writetag /dpm/example.com/home/vo WriteToken 5
$ chimera pushtag /dpm/example.com/home/vo WriteToken

Repeat for each spacetoken line in config.csv; the following command provides the list of directories with a different spacetoken:

grep ^spacetoken config.csv | awk -F ',' '{ print $4 }' | sort | while read P; do grep -E "^spacetoken,.*,$P," config.csv; done | awk -F ',' '{ print "chimera writetag "$4" WriteToken replace_with_tokenid_for_spacetoken_"$3"\nchimera pushtag "$4" WriteToken" }'

writetag adds the WriteToken label to the top level directory and pushtag recursively assigns the same WriteToken to all subdirectories. In case your DPM had a quotatoken e.g. on / and then one for each individual VO (e.g. /dpm/example.com/home/vo1), you don't have to apply the WriteToken recursively for the top level directory and it would be sufficient to run the following commands

$ chimera writetag / WriteToken tokenid_for_top_level_spacetoken
$ chimera writetag /dpm WriteToken tokenid_for_top_level_spacetoken
$ chimera writetag /dpm/example.com WriteToken tokenid_for_top_level_spacetoken
$ chimera writetag /dpm/example.com/home WriteToken tokenid_for_top_level_spacetoken
$ chimera writetag /dpm/example.com/home/vo1 WriteToken tokenid_for_VO1_spacetoken
$ chimera pushtag /dpm/example.com/home/vo1 WriteToken
$ chimera writetag /dpm/example.com/home/vo2 WriteToken tokenid_for_VO2_spacetoken
$ chimera pushtag /dpm/example.com/home/vo2 WriteToken
...

Applying WriteToken recursively can take quite a lot of time in case there are a lot of subdirectories.

Optional

Argus

Update the headnode gplazma.argus.endpoint configuration option in the layout file and uncomment the ban module in gplazma.conf.
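
For example (hypothetical Argus host, assuming the standard Argus PEP daemon endpoint):

gplazma.argus.endpoint = https://argus.example.com:8154/authz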

EGI StAR accounting

Configure the mapping in dcache.conf and follow the EGI instructions on how to publish the generated data using the AMS sender.

Info provider

The info provider uses site information data from the dcache.conf variables starting with info-provider.*. These configuration options must be updated to describe your site and storage. The info provider dCache service is used for publishing BDII and WLCG SRR.

BDII

BDII in dCache is populated with a similar mechanism to the one used by DPM; it relies on the bdii package which publishes LDAP data generated by the gip provider using an OpenLDAP server. You have to clean up the original scripts provided by DPM and create a link to the dCache BDII provider script

rm /var/lib/bdii/gip/provider/*
ln -s /usr/sbin/dcache-info-provider /var/lib/bdii/gip/provider

WLCG SRR

Follow the official dCache WLCG SRR instructions and test access with

curl -s -L http://dcache.example.com:3880/api/v1/srr

With a different WLCG SRR URL it'll be necessary to update / ask for an update of the configuration used by consumers of this information. LHC experiments may rely on the WLCG SRR endpoint stored in their own CRIC instance (e.g. for ATLAS it'll be necessary to ask the cloud support to change this configuration option), while the WLCG CRIC uses this information for accounting purposes and this instance can be updated directly by site administrators.

If you don't want to expose the dCache frontend directly to the world you can use e.g. reverse proxy functionality or download and store the SRR data directly in the dCache namespace (this can be slightly more fragile).

To publish the WLCG SRR data directly in the dCache namespace you can periodically download the JSON from the REST interface (even with frontend access restricted to localhost by setting frontend.srr.public=false) and store these data in dCache mounted locally via NFS. You have to uncomment the following lines in the dCache headnode layout configuration

[doorsDomain/nfs]
nfs.version = 4.1

restart dCache services on the headnode

systemctl daemon-reload
systemctl restart dcache.target

export and mount dCache NFS filesystem

echo "/ 127.0.0.1(rw)" > /etc/exports
mount 127.0.0.1:/ /mnt
mkdir /mnt/static

In case you are not able to mount the dCache filesystem try to verify the exported filesystems and/or reload /etc/exports

showmount -e 127.0.0.1
ssh -p 22224 -l admin localhost '\l' | grep NFS
ssh -p 22224 -l admin localhost '\c NFS-headnode; exports reload'

The top level dCache directory / may not be associated with any WriteToken and in such a situation it is not possible to write files using the NFS protocol. You should associate the directory for the WLCG SRR with a new spacetoken to prevent a situation where a user can fill the whole available space, because that would prevent publishing a new WLCG SRR.

# create new 1GB spacetoken reservation from existing link group YOUR_POOL_NAME
$ ssh -p 22224 -l admin localhost '\c SrmSpaceManager; reserve space -desc=SRR -al=online -rp=replica -lg=spacemanager_linkGroup_YOUR_POOL_NAME 1073741824'
# list existing spacetokens with token id in the first column
$ ssh -p 22224 -l admin localhost '\c SrmSpaceManager; ls spaces -e' | grep SRR
# use new token id number from first column (TOKEN_ID_NUMBER) as a WriteToken label for directory used to store WLCG SRR
$ chimera writetag /static WriteToken TOKEN_ID_NUMBER

Create a cron task with the following command

rm -f /mnt/static/srr.new && curl -s -L http://localhost:3880/api/v1/srr > /mnt/static/srr.new && mv /mnt/static/srr.new /mnt/static/srr

This is a cheap operation and the WLCG SRR file should be updated at least once an hour.
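
For example, a hypothetical /etc/cron.d/dcache-srr entry refreshing the file every 30 minutes:

*/30 * * * * root rm -f /mnt/static/srr.new && curl -s -L http://localhost:3880/api/v1/srr > /mnt/static/srr.new && mv /mnt/static/srr.new /mnt/static/srr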

Changing namespace layout (symlinks)

dCache supports symlinks, which can be very useful if you would like to reorganize the directory structure, e.g. you can get rid of /dpm/example.com/home and use directly /vo. To avoid transfer failures (e.g. transfers already scheduled in FTS or transfers from running jobs) without declaring a downtime it can be useful to first create just a symlink to the original /dpm/example.com/home/vo directory. You have to enable the dCache NFS service, export the filesystem and mount it to be able to create a new symlink in the dCache namespace.

Be aware that changes in directory structure should be discussed with supported VOs / users and most probably this update would require changes in VO storage management configuration (e.g. CRIC / Rucio). Symlinks can be useful not just for smooth transition, but you can also keep them for legacy use-cases.

Cleanup DPM

Remove DPM services

Remove DPM packages on your headnode and disknodes.

yum remove "*dmlite*" "*dpm*"

Remove old DPM data directories

Be careful while removing unnecessary DPM directory structures and files from your disknode filesystems.

  • be sure that all data files were correctly hard linked to the dCache directory
  • ALERT! never remove the "dcache" subdirectory that now keeps all dCache disknode data

For example, if your DPM stored files in /mnt/fs1 then the migration script created /mnt/fs1/dcache for the dCache data. All other /mnt/fs1/vo1, /mnt/fs1/vo2, ... subdirectories were used by DPM and they should be removed once your dCache works. Filesystems used by DPM can be listed with

$ cat config.csv | grep ^filesystem | sed 's/.*,\([^,]*\),\([^,]*\),[[:digit:]]\+,[[:digit:]]\+/\1 \2/' > dpm-hostfs.dat
$ cat dpm-hostfs.dat | while read H P; do echo "ssh $H ls $P | grep -v '^ *$' | grep -v dcache | sed 's#^#ssh $H rm -rf $P/#'"; done > dpm-hostfs-discover.sh
$ sh dpm-hostfs-discover.sh > dpm-hostfs-cleanup.sh
# verify content of dpm-hostfs-cleanup.sh before executing this script that remove old DPM files
# (be very careful if you discover directory names with special characters)
# sh dpm-hostfs-cleanup.sh

This step is necessary, because hard linked files would still occupy space on the filesystem even after dCache unlinks the file from its own data directory. We could also just move all files instead of creating hard links (it is possible to call the migrate.py script with the --move argument), but by following this documentation you'll be able to verify that dCache works fine before the data files disappear from their original locations.

Remove DPM database

After the migration the MySQL databases cns_db and dpm_db can be removed.

Replacement for dmlite-shell storage management

Mapping of common dmlite-shell actions to their dCache equivalents:

  • list directory content
    • dmlite-shell: ls /dpm/example.com/home/dteam
    • dCache: chimera ls /dpm/example.com/home/dteam
  • get physical file location
    • dmlite-shell: info /dpm/example.com/home/dteam/1M
    • dCache: ssh -p 22224 -l admin localhost '\c PnfsManager; cacheinfoof /dpm/example.com/home/dteam/filename' to get POOL_NAME_NUMBER, then ssh -p 22224 -l admin localhost '\c POOL_NAME_NUMBER; xgetcellinfo'
  • logical file name for sfn
    • dmlite-shell: getlfn dpmdisk1:/mnt/fs/vo/data/pfn
    • dCache: use the physical filename as pnfsid and run ssh -p 22224 -l admin localhost '\c PnfsManager; pathfinder pnfsid'
  • pnfsid for logical file name
    • dmlite-shell: info /dpm/example.com/home/dteam/1M (shows the pnfsid)
    • dCache: ssh -p 22224 -l admin localhost '\c PnfsManager; metadataof -v /dpm/example.com/home/dteam/filename'
  • read-only filesystem
    • dmlite-shell: fsmodify /base/fs/path dpmdisk.fqdn dpmpool RDONLY
    • dCache: ssh -p 22224 -l admin localhost '\c PoolManager; psu set pool POOL_NAME rdonly'
  • draining filesystems (vacating a pool)
    • dmlite-shell: drainfs ...
    • dCache: ssh -p 22224 -l admin localhost '\c PoolManager; psu set pool POOL_NAME rdonly' followed by ssh -p 22224 -l admin localhost '\c POOL_NAME; migration move -exclude=POOL_NAME' and finally ssh -p 22224 -l admin localhost '\c PoolManager; psu remove pool POOL_NAME'

Troubleshooting

Re-trying dCache database import

If the import script fails due to an external problem (e.g. machine restart) or you decide to terminate the import yourself, then it'll first be necessary to drop all existing data from the PostgreSQL database

# destroy dCache database
dropdb -U dcache chimera
dropdb -U dcache spacemanager
dropdb -U dcache pinmanager
dropdb -U dcache srm

re-create the empty databases (the createdb commands from the PostgreSQL installation section) and start again with "Initialize dCache database structures".

dCache pool service periodically restarted (OutOfMemoryError)

The dCache pool service needs a huge Java heap memory size to cache file metadata and ~ 1GB is the minimum for each 1M files. If your dCache pool is really big / contains e.g. 6M files then you should set the configuration option dcache.java.memory.heap in /etc/dcache/dcache.conf to at least 8192m.
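
For example, in /etc/dcache/dcache.conf on the affected pool node:

# increase Java heap size for big pools (roughly 1GB per 1M files stored on the node)
dcache.java.memory.heap = 8192m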

Low default number of per-pool clients (max mover limit)

By default the dCache configuration comes with a limit of 100 transfers per pool (filesystem). This might be too low if jobs on a cluster with thousands of cores open files directly from your storage (e.g. "ATLAS direct I/O jobs"). You can watch the number of allocated movers on the internal dCache monitoring page http://dcacheheadnode.example.com:2288/queueInfo and in case your pools are close to the limit you can change it using the dCache admin shell and executing e.g.

ssh -p 22224 -l admin localhost '\c YOUR_CellName_CLOSE_TO_MOVER_LIMIT; mover set max active 500; save'

Add missing checksum to dCache file

Find dCache files without adler32 checksum:

cd /tmp; runuser -u postgres -- psql -A -t -d chimera -c "SELECT i.inumber||' '||ipnfsid FROM t_inodes i LEFT JOIN t_inodes_checksum c ON i.inumber = c.inumber AND c.itype = 1 WHERE i.itype & 32768 = 32768 AND isum IS NULL" > /tmp/nochksum; cd -

Calculate the missing checksums on the disknodes in the /mnt/*/dcache directories with the following script (it expects the nochksum dump file in its working directory)

#!/usr/bin/python
# calculate missing adler32 checksums for dCache pool files stored in
# /mnt/*/dcache/*/data and print commands that register them in dCache
import os, sys
import subprocess

def get_xrdadler32(filename):
    # calculate adler32 checksum with the xrdadler32 tool (xrootd client package)
    p = subprocess.Popen(["xrdadler32", filename], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdoutdata, stderrdata = p.communicate()
    if p.returncode == 0:
        return stdoutdata.decode('utf-8').split(' ')[0]
    return None

# map pnfsid to inumber using the "nochksum" dump created on the database node
data = {}
with open('nochksum') as f:
    while True:
        line = f.readline()
        if line == '': break
        if line.strip() == '': continue
        inumber, ipnfsid = line.strip().split(' ')
        data[ipnfsid] = inumber

for mnt_dir in os.listdir('/mnt'):
    dcache_path = '/mnt/{0}/dcache'.format(mnt_dir)
    if not os.path.isdir(dcache_path): continue
    for pool_dir in os.listdir(dcache_path):
        data_path = '/mnt/{0}/dcache/{1}/data'.format(mnt_dir, pool_dir)
        if not os.path.isdir(data_path): continue
        for data_dir in os.listdir(data_path):
            # pool data files are named by their pnfsid
            if data_dir not in data: continue
            inumber = data[data_dir]
            filename = '/mnt/{0}/dcache/{1}/data/{2}'.format(mnt_dir, pool_dir, data_dir)
            chksum = get_xrdadler32(filename)
            if chksum is None:
                sys.stderr.write("unable to calculate checksum for {0}\n".format(filename))
                continue
            print("# {0} {1} {2}".format(filename, inumber, chksum))
            print("ssh -p 22224 -l admin localhost '\\c PnfsManager; add file checksum {0} adler32 {1}'".format(data_dir, chksum))

This script writes to stdout the commands that can be used to add the adler32 checksums in dCache.
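
A possible way to use it (hypothetical file names; the generated ssh commands must be executed on the headnode where the dCache admin ssh key is configured):

# on the disknode, with the nochksum dump copied into the working directory
python dcache-add-checksum.py > add-checksum.sh
# review add-checksum.sh and then execute it on the dCache headnode
sh add-checksum.sh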

Fixing allocated space counters

The spacetoken allocated space is automatically calculated by adding/removing srmspacefile table records. Running writetag/pushtag with a WriteToken doesn't change the spacetoken of existing files; only files uploaded in the future will be associated with a spacetoken according to the directory WriteToken tag. You can also use the migrate.py script to associate existing files with the right spacetoken according to the directory WriteToken tag

python3 migrate.py --log-level=DEBUG --log-size=104857600 --log-file=dcache-fix-spacetokens.log --dcache-fix-spacetokens

Removing DPM path prefix /dpm/fqdn/home

It is possible to get rid of the DPM default and enforced namespace prefix /dpm/fqdn/home, because dCache supports symbolic links. They can be created with a normal ln -s src dst if you mount dCache using the NFS protocol.

# mount dCache namespace on headnode
mkdir /mnt/dcache
mount 127.0.0.1:/ /mnt/dcache
cd /mnt/dcache

# variant I - just create symlink
ln -s dpm/example.com/home/dteam dteam

# variant II - move directory and create symlink in original location
chimera mv /dpm/example.com/home/dteam /dteam
ln -s /dteam x
chimera mv /x /dpm/example.com/home/dteam
Maybe there is a simpler solution for "variant II", but with our local configuration it was necessary to combine posix tools with chimera to avoid "permission denied" errors.

-- PetrVokac - 2021-12-31
