DDMOperationsGroup

Introduction

The Distributed Data Management (DDM) operations team provides support for all Data Management related questions. It can be reached via email at atlas-adc-ddm-support@cern.ch. For bug reporting, there is a JIRA tracker. The current Data Management system of ATLAS is called Rucio; it replaced the old system, DQ2, in December 2014. Rucio is a complete Data Management system that includes:

  • Management of transfers between sites using tools such as replication rules and subscriptions.
  • Smart deletion of unneeded data.
  • Self-discovery of inconsistencies (so-called "Dark Data" and lost files) and automatic recovery of bad files.
  • Fine-grained permission and quota system for different users/services/activities.
  • and many other things.
A quick overview of Rucio concepts and jargon

Monitoring tools

Monitoring

Documentation on external components

FTS

AGIS

Data Replication

The replication policy is documented in ReplicationPolicy. Different tools are used to implement it.

Centrally managed replications

Subscriptions are used to automatically replicate newly produced datasets. They can be monitored on the Subscription Monitor. Different subscriptions are set (a way to list them programmatically is sketched after this list):

  • For functional tests: functional test datasets are generated by the Automatix daemon on CERN-PROD_RUCIOTEST_DATADISK and exported to the T1s and T2s.
  • For data export of RAW and AOD from Tier 0
  • For DAOD export
  • For EVNT export
  • Export of valid datasets to specific T1s
  • For other specific workflows (e.g. archiving of site datasets on TAPE)
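Besides the Subscription Monitor, the existing subscriptions can also be inspected programmatically. A minimal sketch with the Rucio Python client (the account name is only illustrative, and the exact fields returned may vary between Rucio versions):

from rucio.client import Client

# Rucio client using the account configured in the environment
client = Client()

# List the subscriptions owned by a given account (account name is illustrative)
for sub in client.list_subscriptions(account='ddmadmin'):
    # Each subscription is returned as a dictionary
    print(sub.get('name'), sub.get('state'))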

RPG

RPG configuration

Manual Replication

Minbias distribution for pileup

| campaign | r-tag | dataset | Nfiles | size | file size |
| mc16a/mc16c, mc16d/mc16e | r9364/r9781, r10201/r10724 | mc16_13TeV.361239.Pythia8EvtGen_A3NNPDF23LO_minbias_inelastic_high.simul.HITS.e4981_s3087_s3111_tid10701335_00 | 4999 | 10.16 TB | |
| mc16a/mc16c, mc16d/mc16e | r9364/r9781, r10201/r10724 | mc16_13TeV.361238.Pythia8EvtGen_A3NNPDF23LO_minbias_inelastic_low.simul.HITS.e4981_s3087_s3111_tid10701323_00 | 1001 | 3.2 TB | |
| mc15a/mc15b/mc15c | r7326/r7772 etc | mc15_13TeV.361035.Pythia8EvtGen_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e3581_s2578_s2195 | | | |
| mc15a/mc15b/mc15c | r7326/r7772 etc | mc15_13TeV.361034.Pythia8EvtGen_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e3581_s2578_s2195 | | | |
| mc15a(50ns) | r6630 etc | mc15_13TeV.361035.Pythia8EvtGen_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e3581_s2578_s2169 | | | |
| mc15a(50ns) | r6630 etc | mc15_13TeV.361034.Pythia8EvtGen_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e3581_s2578_s2169 | | | |
| mc12c | r4829 etc | mc12_8TeV.119996.Pythia8_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e1119_s1746_s1747 | | | |
| mc12c | r4829 etc | mc12_8TeV.119995.Pythia8_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e1119_s1746_s1747 | | | |
| mc12b | r4485 etc | mc12_8TeV.119996.Pythia8_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e1119_s1669_s1671 | | | |
| mc12b | r4485 etc | mc12_8TeV.119995.Pythia8_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e1119_s1669_s1671 | | | |
| mc12a | r3945 etc | mc12_8TeV.119996.Pythia8_A2MSTW2008LO_minbias_inelastic_high.merge.HITS.e1119_s1469_s1471 | | | |
| mc12a | r3945 etc | mc12_8TeV.119995.Pythia8_A2MSTW2008LO_minbias_inelastic_low.merge.HITS.e1119_s1469_s1471 | | | |
| mc11b | r2923 etc | mc11_7TeV.108119.Pythia8_minbias_Inelastic_high.merge.HITS.e848_s1354_s1360 | | | |
| mc11b | r2923 etc | mc11_7TeV.108118.Pythia8_minbias_Inelastic_low.merge.HITS.e816_s1354_s1360 | | | |

Data Replication Monitoring for DDM FT

The Data Replication Monitoring package provides users with access to overview information about subscribed dataset transfers and distribution.
It is designed for monitoring datasets replicated to sites during Functional Tests (FT), Cosmic Runs (CR) and, lately, data taking.
It complements the existing DDM monitoring Dashboard tools.

FT Data Replication Monitoring TWiki page.

Data Replication for HammerCloud

The patterns (listed here), with a limited list of datasets, have been put into two technical containers: hc_test.pft and hc_test.aft. Both containers have to be replicated to all DATADISK endpoints associated with Analysis (AFT) or Production (PFT) queues. The Rucio rules created for this purpose by DDM operations use the Express activity and carry the comment 'Input for HC tests'.

Using Rucio rules on the containers (not on the datasets) allows us to simply change the content of the hc_test containers if the HC tests need a different input. We cannot specify one rule for all the replicas because the number of RSEs changes too often.

The files in hc_test should be kept synchronised with the list of HC tests on this page.

Rucio commands to replicate the HC containers

rucio --account ddmadmin add-rule --activity Express --comment 'Input for HC tests' hc_test:hc_test.aft 1 __RSE__
rucio --account ddmadmin add-rule --activity Express --comment 'Input for HC tests' hc_test:hc_test.pft 1 __RSE__

__RSE__ should be replaced as appropriate. For AFT, if there is no DATADISK endpoint, the container should be replicated to the LOCALGROUPDISK instead.
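For many endpoints the same rules can also be created programmatically. A minimal sketch with the Rucio Python client (the RSE names below are placeholders, and it is assumed that the account used has the privileges to create these rules):

from rucio.client import Client

client = Client(account='ddmadmin')

# The two HC input containers
dids = [{'scope': 'hc_test', 'name': 'hc_test.aft'},
        {'scope': 'hc_test', 'name': 'hc_test.pft'}]

# Placeholder list of target endpoints
rses = ['EXAMPLE-SITE-A_DATADISK', 'EXAMPLE-SITE-B_DATADISK']

for rse in rses:
    for did in dids:
        # One rule per container and endpoint, equivalent to the CLI commands above
        client.add_replication_rule([did], copies=1, rse_expression=rse,
                                    activity='Express',
                                    comment='Input for HC tests')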

AFT and PFT

The PFT and AFT tasks are listed here.

Optimal usage of storage

Lifetime model

In ATLAS, all datasets (except RAW) have a lifetime, i.e. if they are not used they eventually disappear. More details in DDMLifetimeModel.

Unused Data Understanding

Some of the DDM plots for the C-RSG reports are now generated automatically (zero-access and horn plots):

  • scrutiny group horn plot: http://atlstats.web.cern.ch/atlstats/scrutiny/
  • no access (unused data): http://atlstats.web.cern.ch/atlstats/zeroaccess/

The following snippet lists the unused DAODs by creation date:

import sys
import time

if __name__ == "__main__":

    # Usage: python3 unused_daods.py <month> <year>, e.g. "python3 unused_daods.py 1 2017"
    # (the script name is just an example; the input is the tab-separated dump 'list-2017-01-23'
    # with columns: scope, name, size, creation time, number of replicas)
    month, year = sys.argv[1:]
    month = int(month)
    year = int(year)
    with open('list-2017-01-23', 'r') as f:
        for line in f:
            line = line.rstrip('\n')
            if 'DAOD' in line:
                scope, name, size, created, nbreplicas = line.split('\t')
                created_at = time.gmtime(float(created))
                # keep only the datasets created in the requested month and year
                if created_at.tm_year == year and created_at.tm_mon == month:
                    print(scope, name, nbreplicas, size)


  • unused data: https://monit-zeppelin.cern.ch/#/notebook/2C7RHB1RM
  • detailed analysis of the dump of unused data (created or touched), organised in bins of a few months and split by project and datatype: https://docs.google.com/spreadsheets/d/1UHC21dso3PrUN8SrtK54Y71aDHk5tSeGqaEHowAvW7g/edit

Management of problematic files (Lost or dark data)

The site admin responsibilities and the recommended actions are listed in this section.

Discovery through consistency checks

The sites declared in AGIS are supposed to report every file that is found to be corrupted or lost. They are also asked to provide monthly storage dumps of all their endpoints to allow automatic consistency checks. All details can be found in this section.

Declare files permanently lost

Declare files temporary unavailable

  • Motivation: prevents access to a file while a disk server is temporarily down.
    • Prevents HC from trying to access the problematic file and then blacklisting the site (Aug 2019: why does it always seem to use the same input file?).
  • Documentation
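For reference, a minimal sketch of how such declarations can be made with the Rucio Python client. The PFNs, reasons and account are placeholders; declare_bad_file_replicas is used for permanently lost files, while add_bad_pfns with the TEMPORARY_UNAVAILABLE state is assumed to be available in recent client versions (check the Documentation link above for the officially supported procedure):

from datetime import datetime, timedelta

from rucio.client import Client

client = Client(account='ddmadmin')  # an account with the required privileges

# Placeholder PFNs of the affected replicas
lost_pfns = ['davs://storage.example.org:443/atlasdatadisk/rucio/mc16_13TeV/aa/bb/lost.file.root']
offline_pfns = ['davs://storage.example.org:443/atlasdatadisk/rucio/mc16_13TeV/cc/dd/offline.file.root']

# Declare replicas permanently lost
client.declare_bad_file_replicas(lost_pfns, reason='Disk server permanently lost')

# Declare replicas temporarily unavailable for 48 hours (assumed API and argument types)
client.add_bad_pfns(offline_pfns, reason='Disk server down for intervention',
                    state='TEMPORARY_UNAVAILABLE',
                    expires_at=datetime.utcnow() + timedelta(hours=48))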

Priority to recover files from unstable storage

Adding/removing/moving a site

All the sites known to DDM are registered in AGIS.

Adding a new site

Standard RSE

Note: only sites with more than 300 TB of disk space qualify as a Standard RSE.

  • The site needs to provide storage with SRM.
  • First, two space tokens, ATLASDATADISK and ATLASSCRATCHDISK (T1s need more), associated with /blah/blah/atlasdatadisk and /blah/blah/atlasscratchdisk, need to be set up.
  • Create the site in AGIS.
  • The AGIS collector probe will then automatically create the corresponding entries in Rucio.

SRMless sites

Cache sites

Decommission or migrate RSE

Documentation

Reducing storage space at a site

This is necessary when disk servers need to be decommissioned and the site is full.

If the site hosts enough secondary replicas according to this monitoring:

  • The site admin can reduce the space smoothly and the secondaries will be cleaned automatically by Rucio.
  • The Rucio team can also force the cleaning of all secondary files (procedure?).

If the site does not have enough secondaries to release the required space, the replication of the primary replicas has to be handled by the Rucio team.

SRM-less space reporting

If a site uses XRootD and WebDAV doors provided by native software (e.g. XRootD) without running a full suite of Grid middleware (such as dCache), space reporting has to be provided externally.

This is facilitated with a JSON file which the site has to update at least every 2 hours (e.g. via a cron job).

Example of the JSON format:

{
"ATLASDATADISK": {
    "status": "online",
    "status_message": "",
    "list_of_paths": ["/xrootd/atlas/atlasdatadisk"],
    "total_space": 1950000000000000,
    "used_space": 1964155346110464,
    "num_files": -1,
    "time_stamp": 1485345907},
"ATLASUSERDISK": {
    "status": "online",
    "status_message": "",
    "list_of_paths": ["/xrootd/atlas/atlasuserdisk"],
    "total_space": 180000000000000,
    "used_space": 61978398534656,
    "num_files": -1,
    "time_stamp": 1485345907},
"ATLASGROUPDISK": {
    "status": "online",
    "status_message": "",
    "list_of_paths": ["/xrootd/atlas/atlasgroupdisk"],
    "total_space": 650000000000000,
    "used_space": 297667584194560,
    "num_files": -1,
    "time_stamp": 1485345907},
"ATLASLOCALGROUPDISK": {
    "status": "online",
    "status_message": "",
    "list_of_paths": ["/xrootd/atlas/atlaslocalgroupdisk"],
    "total_space": 280000000000000,
    "used_space": 153225728644096,
    "num_files": -1,
    "time_stamp": 1485345907},
"ATLASSCRATCHDISK": {
    "status": "online",
    "status_message": "",
    "list_of_paths": ["/xrootd/atlas/atlasscratchdisk"],
    "total_space": 200000000000000,
    "used_space": 71081117891584,
    "num_files": -1,
    "time_stamp": 1485345907}
}

Details on the format, how to create it and scripts for validation are provided in the Rucio GitHub repository.
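As an illustration of how such a file can be produced (independently of the official scripts linked above), here is a minimal Python sketch that writes the JSON in the expected layout; get_space_token_usage and all paths are placeholders that a site would replace with queries to its actual storage system:

import json
import time

def get_space_token_usage(path):
    # Placeholder: a real site would query its storage system (dCache, XRootD, ...)
    # and return the total and used space in bytes for the area behind 'path'.
    return 1950000000000000, 1234567890123456

# Space tokens and their storage paths (illustrative values)
space_tokens = {
    'ATLASDATADISK': ['/xrootd/atlas/atlasdatadisk'],
    'ATLASSCRATCHDISK': ['/xrootd/atlas/atlasscratchdisk'],
}

report = {}
now = int(time.time())
for token, paths in space_tokens.items():
    total, used = get_space_token_usage(paths[0])
    report[token] = {
        'status': 'online',
        'status_message': '',
        'list_of_paths': paths,
        'total_space': total,
        'used_space': used,
        'num_files': -1,       # -1 if the number of files is not known
        'time_stamp': now,
    }

# Write the report where it can be served to ATLAS (path is illustrative);
# run this, e.g. from a cron job, at least every 2 hours
with open('/xrootd/atlas/atlasdatadisk/space-usage.json', 'w') as f:
    json.dump(report, f, indent=4)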

Creation of the JSON at dCache:

remarks from Shawn from AGLT2:

crontab -l -u rsv 

0 0-23/8 * * * mk-job rsv-voms-proxy-init voms-proxy-init -valid 96:00 -voms atlas:/atlas/usatlas/Role=production -out /tmp/x509up_srmcp -pwstdin < ...[path to pswd]
27,57 * * * * mk-job ruby-space-usage ruby space_usage.rb

The 'mk-job' is just a wrapper for Check_MK (used at our site). You can remove that if you are not running check_mk. The first cron just keeps the credentials updated to allow the Ruby script to write the output file into our dCache. The second cron does the actual work of creating the space_usage.json file (see the attached example space_usage-example.rb).

We have added another "check" cron to verify the space_usage.json is getting updated:

cat /etc/cron.d/space_usage_json_check # this file written by CFEngine
12,42 * * * * root mk-job space_usage_update /bin/bash /root/tools/space-usage-json-check.sh

This script verifies that the space_usage.json is not older than 30 minutes, or it emails us. I am attaching this script as well.

space-usage-json-check.sh

space_usage-example.rb
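A minimal Python equivalent of such a freshness check (the path and the reaction to a stale file are placeholders; the attached shell script remains the reference):

import os
import sys
import time

JSON_PATH = '/xrootd/atlas/space_usage.json'  # placeholder path
MAX_AGE = 30 * 60                             # 30 minutes, as in the check described above

age = time.time() - os.path.getmtime(JSON_PATH)
if age > MAX_AGE:
    # A real check would send an email here
    print('space_usage.json is %d seconds old, last update is overdue' % age)
    sys.exit(1)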

Creation of the JSON at DPM

DPM has built-in support for WLCG storage size reporting since version 1.10.x, but this feature is only available with the DPM DOME configuration. Starting with DPM DOME 1.13.2, the WLCG SRR is automatically enabled after installation (or Puppet re-configuration) and the corresponding JSON is available at https://headnode.your.domain/static/srr; more details are in the DPM documentation.

Providing the space-usage report

The space usage report has to be updated at least every two hours and can be provided via WebDAVs. The following restrictions can be applied:

  • Readable by VO ATLAS with Role=production.
  • Only accessible from host rucio-nagios-prod-02.

If WebDAVs is used, you can test it with the following commands on a site where the ATLAS Local Root Base is available (for example via CVMFS, or on lxplus), provided you have the appropriate permissions with your VOMS proxy:

setupATLAS
lsetup rucio
voms-proxy-init -voms atlas
lsetup davix
davix-get -P Grid https://my-example-size.com:8443/atlas/atlaslocalgroupdisk/space-usage.json

Registering JSON space-usage report in CRIC

Once the space reporting is set up, it has to be registered in CRIC. For this, the DDM Endpoint has to be configured so that the Space Usage setting contains the URL of the JSON file. Preferably, the protocol, host and port should be the same as those of the top-priority read_wan protocol defined in the storage service.

An example URL could be: https://my-example-site.com:8443/atlas/atlaslocalgroupdisk/space-usage.json

Definition of closeness for PandaJedi

For the production job brokering done by PandaJedi (Twiki), the Network Resource Service file http://atlas-adc-netmetrics-lb.cern.ch/metrics/latest.json is regularly updated by a cron job (frequency?).

Inside this file, two important values are filled for each pair of Pandasite(PQ):PandaSite(RSE):

  • Closeness: based on the maximum transfer rate over one hour during the last calendar month, restricted to activities with FTS transfers. The information is extracted from the DDM dashboard.
  • Dynamic information: based on the mean transfer rate over the last hour/day/week for transfers through FTS. The information is extracted from the DDM dashboard.

Another piece of information is the semi-static closeness, which is stored in AGIS and should never change.
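A minimal sketch for having a quick look at this file (nothing is assumed about the per-pair layout beyond it being a JSON dictionary; the entry printed below is simply the first one found):

import json
import urllib.request

URL = 'http://atlas-adc-netmetrics-lb.cern.ch/metrics/latest.json'

with urllib.request.urlopen(URL) as response:
    metrics = json.load(response)

# Print one site pair and the metrics associated with it (closeness, dynamic rates, ...)
pair, values = next(iter(metrics.items()))
print(pair)
print(json.dumps(values, indent=2))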


Major updates:
-- CedricSerfon - 2016-02-03

Responsible: CedricSerfon
Last reviewed by: Never reviewed

Topic attachments
| Attachment | Size | Date | Who | Comment |
| space-usage-json-check.sh | 1.5 K | 2018-01-12 | TomasJavurek | shell script |
| space_usage-example.rb | 2.6 K | 2018-01-12 | TomasJavurek | ruby script |