DDMOperationsGroup
Introduction
The Distributed Data Management (DDM) operations team provides support for all data-management-related questions. It can be reached via email at atlas-adc-ddm-support@cern.ch. For bug reporting, there is a JIRA tracker. The current Data Management system of ATLAS is called Rucio; it replaced the previous system, DQ2, in December 2014. Rucio is a complete Data Management system that includes:
- Management of transfers between sites, using tools such as replication rules and subscriptions.
- Smart deletion of unneeded data.
- Self-discovery of inconsistencies (so-called "dark data" and lost files) and automatic recovery of bad files.
- A fine-grained permission and quota system for different users, services, and activities.
- And many other features.
A quick overview of Rucio concepts and jargon
Monitoring tools
Monitoring
Documentation on external components
FTS
AGIS
Data Replication
The replication policy is documented in ReplicationPolicy. Different tools, described below, are used to implement it.
Centrally managed replications
Subscriptions are used to automatically replicate newly produced datasets. They can be monitored on the Subscription Monitor. Different subscriptions are in place:
- For functional tests: functional tests are generated by the Automatix daemon on CERN-PROD_RUCIOTEST_DATADISK and exported to the T1s and T2s.
- For data export of RAW and AOD from Tier 0
- For DAOD export
- For EVNT export
- Export of valid datasets to specific T1s
- For other specific workflows (e.g. archiving of site datasets on TAPE)
RPG
RPG configuration
Manual Replication
Minbias distribution for pileup
| campaign | r-tag | dataset or container | Nfiles | size | file size |
| mc21a | | mc21_13p6TeV.800831.Py8EG_minbias_inelastic_highjetphotonlepton.merge.HITS.e8341_s3775_s3787 | 5100 | 21.59 TB | |
| mc21a | | mc21_13p6TeV.900311.Epos_minbias_inelastic_lowjetphoton.merge.HITS.e8341_s3775_s3787 | 6000 | 8.94 TB | |
| mc16a/mc16c mc16d/mc16e | r9364/r9781 r10201/r10724 | mc16_13TeV.361239.Pythia8EvtGen_A3NNPDF23LO_minbias_inelastic_high.simul.HITS.e4981_s3087_s3111_tid10701335_00 | 4999 | 10.16 TB | |
| mc16a/mc16c mc16d/mc16e | r9364/r9781 r10201/r10724 | mc16_13TeV.361238.Pythia8EvtGen_A3NNPDF23LO_minbias_inelastic_low.simul.HITS.e4981_s3087_s3111_tid10701323_00 | 1001 | 3.2 TB | |
| mc15a/mc15b/mc15c | r7326/r7772 etc | mc15_13TeV.361035.Pythia8EvtGen_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e3581_s2578_s2195 | | | |
| mc15a/mc15b/mc15c | r7326/r7772 etc | mc15_13TeV.361034.Pythia8EvtGen_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e3581_s2578_s2195 | | | |
| mc15a(50ns) | r6630 etc | mc15_13TeV.361035.Pythia8EvtGen_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e3581_s2578_s2169 | | | |
| mc15a(50ns) | r6630 etc | mc15_13TeV.361034.Pythia8EvtGen_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e3581_s2578_s2169 | | | |
| mc12c | r4829 etc | mc12_8TeV.119996.Pythia8_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e1119_s1746_s1747 | | | |
| mc12c | r4829 etc | mc12_8TeV.119995.Pythia8_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e1119_s1746_s1747 | | | |
| mc12b | r4485 etc | mc12_8TeV.119996.Pythia8_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e1119_s1669_s1671 | | | |
| mc12b | r4485 etc | mc12_8TeV.119995.Pythia8_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e1119_s1669_s1671 | | | |
| mc12a | r3945 etc | mc12_8TeV.119996.Pythia8_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e1119_s1469_s1471 | | | |
| mc12a | r3945 etc | mc12_8TeV.119995.Pythia8_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e1119_s1469_s1471 | | | |
| mc11b | r2923 etc | mc11_7TeV.108119.Pythia8_minbias_Inelastic_high.merge.HITS.e848_s1354_s1360 | | | |
| mc11b | r2923 etc | mc11_7TeV.108118.Pythia8_minbias_Inelastic_low.merge.HITS.e816_s1354_s1360 | | | |
Data Replication Monitoring for DDM FT
The Data Replication Monitoring package provides users with overview information about subscribed dataset transfers and distribution. This system is designed to monitor datasets replicated to sites during Functional Tests (FT), Cosmic Runs (CR) and, lately, data taking. It complements the existing DDM monitoring Dashboard tools. See the FT Data Replication Monitoring TWiki page.
Data Replication for HammerCloud
The patterns (listed here, with a limited list of datasets) have been put into two technical containers: hc_test.pft and hc_test.aft. Both containers have to be replicated to all DATADISK endpoints associated with Analysis (AFT) or Production (PFT) queues. The Rucio rules created for this purpose by DDM operations use the Express activity and are commented as 'Input for HC tests'.
Using Rucio rules on containers (not on datasets) allows us to simply change the content of the hc_test containers if the HC tests need a different input. We cannot specify one rule for all the replicas, because the number of RSEs changes too often. The containers can be modified using the rucio attach/detach commands, as sketched below.
The files in hc_test should be kept synchronised with the HC list on this page.
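For illustration, a minimal sketch of how the container content could be changed with the Rucio Python client (the dataset names below are placeholders, not actual HC inputs):
from rucio.client import Client

client = Client(account='ddmadmin')

# Placeholder DIDs for the dataset to remove and its replacement
old_input = [{'scope': 'mc16_13TeV', 'name': 'mc16_13TeV.placeholder.old.input.DAOD'}]
new_input = [{'scope': 'mc16_13TeV', 'name': 'mc16_13TeV.placeholder.new.input.DAOD'}]

# Swap the content of the technical container; the existing container
# rules then automatically apply to the new content
client.detach_dids(scope='hc_test', name='hc_test.aft', dids=old_input)
client.attach_dids(scope='hc_test', name='hc_test.aft', dids=new_input)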
Rucio commands to replicate the HC containers
rucio --account ddmadmin add-rule --activity Express --comment 'Input for HC tests' hc_test:hc_test.aft 1 __RSE__
rucio --account ddmadmin add-rule --activity Express --comment 'Input for HC tests' hc_test:hc_test.pft 1 __RSE__
__RSE__ should be replaced as appropriate. For AFT, if there is no DATADISK endpoint, the container should be replicated to LOCALGROUPDISK instead.
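Since one rule per endpoint is needed, the rule creation can be scripted. A hedged sketch with the Rucio Python client (the 'type=DATADISK' RSE expression and the error handling are assumptions, not the exact DDM ops procedure):
from rucio.client import Client
from rucio.common.exception import DuplicateRule

client = Client(account='ddmadmin')

# Create one rule per DATADISK endpoint; 'type=DATADISK' is an assumed
# RSE expression selecting all DATADISK RSEs
for rse in client.list_rses(rse_expression='type=DATADISK'):
    try:
        client.add_replication_rule(
            dids=[{'scope': 'hc_test', 'name': 'hc_test.pft'}],
            copies=1,
            rse_expression=rse['rse'],
            activity='Express',
            comment='Input for HC tests')
    except DuplicateRule:
        pass  # a rule already exists on this endpoint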
AFT and PFT
The lists of PFT and AFT tasks are available here.
Optimal usage of storage
Lifetime model
In ATLAS, all datasets (except RAW) have a lifetime, i.e. if they are not used they eventually disappear. More details in DDMLifetimeModel.
Unused Data Understanding
Some of the DDM plots for the C-RSG reports are now generated automatically (zeroaccess and horn plot):
- Scrutiny group horn plot: http://atlstats.web.cern.ch/atlstats/scrutiny/
- No access (unused data): http://atlstats.web.cern.ch/atlstats/zeroaccess/
The following snippet allows you to get the list of unused DAODs by creation date:
import sys
import time

if __name__ == "__main__":
    # Usage: <script> <month> <year>
    month, year = sys.argv[1:]
    month = int(month)
    year = int(year)
    # The dump is a tab-separated list: scope, name, size,
    # creation timestamp (unix epoch), number of replicas
    with open('list-2017-01-23', 'r') as f:
        for line in f:
            line = line.rstrip('\n')
            if 'DAOD' in line:
                scope, name, size, created, nbreplicas = line.split('\t')
                created_at = time.gmtime(float(created))
                if created_at.tm_year == year and created_at.tm_mon == month:
                    print(scope, name, nbreplicas, size)
- Unused data: https://monit-zeppelin.cern.ch/#/notebook/2C7RHB1RM
- Detailed analysis of the dump of unused data (created or touched), organised in bins of a few months and split by project and datatype: https://docs.google.com/spreadsheets/d/1UHC21dso3PrUN8SrtK54Y71aDHk5tSeGqaEHowAvW7g/edit
Management of problematic files (Lost or dark data)
The site admins' responsibilities and the recommended actions are listed in this section.
Discovery through consistency checks
The sites declared in AGIS are expected to report every file that is found to be corrupted or lost. They are also asked to provide a monthly storage dump of all their endpoints to allow automatic consistency checks, the principle of which is sketched below. All details can be found in this section.
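For illustration, a minimal sketch of such a consistency check, comparing a site storage dump with a Rucio replica dump (the file names and the one-path-per-line format are assumptions; the production checks are more involved):
# Compare a site storage dump with a Rucio replica dump (one path per line).
# Files present only on storage are dark data; files present only in the
# Rucio catalogue are lost files.
def load_paths(filename):
    with open(filename) as f:
        return {line.strip() for line in f if line.strip()}

storage = load_paths('site_storage_dump.txt')      # assumed dump file name
catalogue = load_paths('rucio_replica_dump.txt')   # assumed dump file name

dark_data = storage - catalogue
lost_files = catalogue - storage

print('%d dark files, %d lost files' % (len(dark_data), len(lost_files)))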
Declare files permanently lost
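Lost files that cannot be recovered by the site have to be declared to Rucio, which then restores them from other replicas or marks them as lost. A minimal sketch using the Rucio Python client (the account and PFN below are placeholders):
from rucio.client import Client

client = Client(account='ddmadmin')

# Placeholder PFN of a file confirmed lost by the site
pfns = ['davs://my-example-site.com:443/atlas/atlasdatadisk/rucio/mc16_13TeV/aa/bb/lost.file']

# Declaring the replicas bad lets Rucio recover them from other copies,
# or mark the files as lost if no other replica exists
client.declare_bad_file_replicas(pfns, reason='Lost after disk server failure')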
Declare files temporary unavailable
- Motivation: prevents access to a file while a disk server is temporarily down.
- Prevents HammerCloud from trying to access the problematic file and then blacklisting the site (Aug 2019: HC seems to always use the same input file).
- Documentation
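A hedged sketch of how this could be done with the Rucio Python client (the add_bad_pfns call with a TEMPORARY_UNAVAILABLE state is available in recent Rucio versions; the PFN and duration are placeholders):
from datetime import datetime, timedelta
from rucio.client import Client

client = Client(account='ddmadmin')

# Placeholder PFN on the disk server that is temporarily down
pfns = ['davs://my-example-site.com:443/atlas/atlasdatadisk/rucio/data18_13TeV/cc/dd/some.file']

# Mark the replicas TEMPORARY_UNAVAILABLE for 3 days so that jobs and
# HC tests stop trying to access them; they become available again
# automatically once the expiration date is reached
client.add_bad_pfns(pfns=pfns,
                    reason='Disk server temporarily down',
                    state='TEMPORARY_UNAVAILABLE',
                    expires_at=datetime.utcnow() + timedelta(days=3))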
Priority to recover files from unstable storage
Adding/removing/moving a site
All the sites known by DDM are registered in AGIS.
Adding a new site
Standard RSE

Only sites with more than 300 TB of disk space can qualify as a standard RSE.
- The site needs to provide a storage with SRM.
- First, two space tokens, ATLASDATADISK and ATLASSCRATCHDISK (for T1s more are needed), associated with /blah/blah/atlasdatadisk and /blah/blah/atlasscratchdisk, need to be set up.
- Create the site in AGIS.
- The AGIS collector probe will automatically create the corresponding entries in Rucio.
SRM-less sites
Cache sites
Decommission or migrate RSE
Documentation
Reducing storage space at a site
This is necessary when disk servers need to be decommissioned and the site is full.
If the site hosts enough secondary replicas according to this monitoring:
- The site admin can reduce the space smoothly, and the secondaries will be deleted automatically by Rucio.
- The Rucio team can also force the cleaning of all secondary files (procedure?).
If the site does not have enough secondaries to release enough space, the replication of primary replicas has to be performed by the Rucio team.
SRM-less space reporting
If a site uses XRootD and WebDAV doors provided by native software (e.g. XRootD) without running a full suite of Grid middleware (such as dCache), space reporting has to be provided externally. This is done with a JSON file which the site has to update at least every two hours (e.g. via a cron job).
Example of the JSON format:
{
"ATLASDATADISK": {
"status": "online",
"status_message": "",
"list_of_paths": ["/xrootd/atlas/atlasdatadisk"],
"total_space": 1950000000000000,
"used_space": 1964155346110464,
"num_files": -1,
"time_stamp": 1485345907},
"ATLASUSERDISK": {
"status": "online",
"status_message": "",
"list_of_paths": ["/xrootd/atlas/atlasuserdisk"],
"total_space": 180000000000000,
"used_space": 61978398534656,
"num_files": -1,
"time_stamp": 1485345907},
"ATLASGROUPDISK": {
"status": "online",
"status_message": "",
"list_of_paths": ["/xrootd/atlas/atlasgroupdisk"],
"total_space": 650000000000000,
"used_space": 297667584194560,
"num_files": -1,
"time_stamp": 1485345907},
"ATLASLOCALGROUPDISK": {
"status": "online",
"status_message": "",
"list_of_paths": ["/xrootd/atlas/atlaslocalgroupdisk"],
"total_space": 280000000000000,
"used_space": 153225728644096,
"num_files": -1,
"time_stamp": 1485345907},
"ATLASSCRATCHDISK": {
"status": "online",
"status_message": "",
"list_of_paths": ["/xrootd/atlas/atlasscratchdisk"],
"total_space": 200000000000000,
"used_space": 71081117891584,
"num_files": -1,
"time_stamp": 1485345907}
}
Details on the format, how to create it, and scripts for validation are provided in the Rucio GitHub repository.
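For illustration, a minimal sketch of a publisher script that generates such a report (all numbers and paths below are placeholders; the used space would come from querying the local storage or an internal accounting database):
import json
import time

# Placeholder numbers: total_space comes from the pledged capacity,
# used_space from querying the local storage accounting
report = {
    'ATLASDATADISK': {
        'status': 'online',
        'status_message': '',
        'list_of_paths': ['/xrootd/atlas/atlasdatadisk'],
        'total_space': 1950000000000000,
        'used_space': 1234000000000000,
        'num_files': -1,  # -1 if the number of files is not known
        'time_stamp': int(time.time()),
    },
}

# Write the report where the WebDAV/XRootD door exposes it
with open('/xrootd/atlas/atlasdatadisk/space-usage.json', 'w') as f:
    json.dump(report, f)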
Creation of the JSON at dCache
Remarks from Shawn (AGLT2):
crontab -l -u rsv
0 0-23/8 * * * mk-job rsv-voms-proxy-init voms-proxy-init -valid 96:00 -voms atlas:/atlas/usatlas/Role=production -out /tmp/x509up_srmcp -pwstdin < ...[path to pswd]
27,57 * * * * mk-job ruby-space-usage ruby space_usage.rb
The 'mk-job' is just a wrapper for Check_MK (used at our site); you can remove it if you are not running Check_MK. The first cron job keeps the credentials updated to allow the Ruby script to write the output file into our dCache. The second cron job does the actual work of creating the space_usage.json file (see the attached example space_usage-example.rb).
We have added another "check" cron job to verify that space_usage.json is getting updated:
cat /etc/cron.d/space_usage_json_check # this file written by CFEngine
12,42 * * * * root mk-job space_usage_update /bin/bash /root/tools/space-usage-json-check.sh
This script verifies that space_usage.json is not older than 30 minutes, otherwise it emails us; it is attached as well.
space-usage-json-check.sh
space_usage-example.rb
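For illustration, a minimal Python sketch of such a freshness check, assuming the JSON location and the 30-minute threshold described above (the attached shell script is the actual implementation used at AGLT2):
import os
import time

JSON_PATH = '/xrootd/atlas/atlasdatadisk/space-usage.json'  # assumed location
MAX_AGE = 30 * 60  # 30 minutes, as in the check described above

age = time.time() - os.path.getmtime(JSON_PATH)
if age > MAX_AGE:
    # In production this would send an email instead of just printing
    print('space-usage.json is stale: %d seconds old' % age)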
Creation of the JSON at DPM
DPM has built-in support for WLCG storage size reporting since version 1.10.x, but this feature is only available with the DPM DOME configuration. Starting with DPM DOME 1.13.2, WLCG SRR is automatically enabled after installation (or Puppet re-configuration) and the corresponding JSON is available at https://headnode.your.domain/static/srr; more details in the DPM documentation.
Providing the space-usage report
The space-usage report has to be updated at least every two hours and can be provided via WebDAVs. The following restrictions can be applied:
- Readable by VO ATLAS with Role=production.
- Only accessible from host rucio-nagios-prod-02.
In case WebDAVs is used, you can test with the following commands on a site with ATLAS Local Root Base available (for example via CVMFS, or use lxplus), provided you have the appropriate permissions with your VOMS proxy:
setupATLAS
lsetup rucio
voms-proxy-init -voms atlas
lsetup davix
davix-get -P Grid https://my-example-size.com:8443/atlas/atlaslocalgroupdisk/space-usage.json
Registering JSON space-usage report in CRIC
Once the space reporting is set up, it has to be registered in CRIC. For this, the DDM Endpoint has to be configured so that the Space Usage setting contains the URL to the JSON file. Preferably, the protocol, host and port should be the same as the top-priority read_wan protocol defined in the storage service. An example URL could be:
https://my-example-site.com:8443/atlas/atlaslocalgroupdisk/space-usage.json
Definition of closeness for PandaJedi
For the production job brokering done by PandaJedi (TWiki), the file http://atlas-adc-netmetrics-lb.cern.ch/metrics/latest.json (Network Resource Service) is regularly updated by a cron job (frequency?).
Inside this file, two important values are filled for each PandaSite(PQ):PandaSite(RSE) pair:
- Closeness: based on the maximum transfer rate over one hour during the last calendar month, restricted to activities with FTS transfers. The information is extracted from the DDM dashboard.
- Dynamic information: based on the mean transfer rate over the last hour/day/week for transfers through FTS. The information is extracted from the DDM dashboard.
Another piece of information is the semi-static closeness. It is stored in AGIS and should never change.
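A small sketch showing how this file can be inspected (the key layout inside latest.json is an assumption based on the description above):
import json
from urllib.request import urlopen

# Fetch the network metrics used by PandaJedi for brokering
with urlopen('http://atlas-adc-netmetrics-lb.cern.ch/metrics/latest.json') as resp:
    metrics = json.load(resp)

# Print the first few entries; the exact structure of each entry
# (e.g. the 'closeness' field) is an assumption based on the text above
for pair, values in list(metrics.items())[:5]:
    print(pair, values)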
Major updates:
-- CedricSerfon - 2016-02-03
Responsible:
CedricSerfon
Last reviewed by:
Never reviewed