DDMDarkDataAndLostFiles

Introduction

This TWiki documents the actions to take for temporarily inaccessible files and for permanently lost or dark files.

Temporarily inaccessible files are files that are not accessible but are still known to the head node (tested through gfal-ls). It is up to the site admin to check whether the file is

  • on a temporarily unavailable disk server
  • on a broken disk server
  • not physically present on a disk server
In the last two cases, the file has to be declared lost in Rucio.

Lost or dark files can be detected in bulk by looking for inconsistencies between the Rucio database and the Storage Elements:

  • Dark data: files present on the Storage Element that do not exist in the Rucio database.
  • Lost files: files registered in Rucio that do not exist on the Storage Element.

These inconsistencies can be detected in different ways:

  • applications (the Rucio mover in pilots, FTS) cannot access files (lost files only). These problems are reported to Rucio monitoring, as are successful accesses.
  • comparing dumps of the SE content (sites are expected to publish them on a weekly basis) with the Rucio catalog. The problematic files are reported to Rucio monitoring.

The objective of this TWiki is to document the different tools and actions taken automatically or manually by ATLAS-Rucio.

Site admin responsibility

To keep data management operations running smoothly, the site admin's roles are:

  • Smooth operation
    • Declare downtimes for the RSE in GOCDB
    • Provide a dump of the RSE on a monthly/weekly basis. The Rucio tools will process it automatically to
      • detect files unknown to the Rucio catalog (also called dark files) or missing according to it. Lost files will automatically be cleaned up by the Rucio tools
  • Failure at the site
    • Provide a dump of the files that are temporarily unavailable. The files should be registered with the declare-bad command by the squad (the Rucio team acting only as backup).
    • Provide a dump of the files that are permanently unavailable. The files should be registered with the declare-bad command by the squad (the Rucio team acting only as backup). This will trigger:
      • Sorting out which files are unique and which have second replicas
      • Replication from another replica when possible
      • Declaration of the file as lost in Rucio and, if necessary, removal of the file from the dataset content. The list of centrally produced files will be sent to the ADC production team, who will decide whether or not to reproduce them

If the site admin is trying to transfer files from unstable hardware to another disk, the priorities are the following:

  • Files which should never be transferred, identified by their pattern
    • Files in the rucio/tests directory should not be replicated (they should have disappeared anyway)
    • Files whose names end with the 'upload' pattern are remnants of failed replications to the RSE
  • Files which should be transferred with low priority, identified by their pattern
    • The .log. files
  • Files that do not strictly need to be transferred, as Rucio can do it
    • Files with a second replica on the Grid: the squad should declare them as lost and Rucio will trigger the replication. Please check that the replication is done before giving up
  • Files to be recovered as much as possible
    • All files not listed before
    • All files in LOCALGROUPDISK can be assumed to be single copies

ATLAS search for lost/dark data

Inconsistency in size between Rucio and storage

As stated above, the amount of dark data for a given RSE endpoint can be estimated as the difference between:

  • occupied space reported by SE
  • sum of files for given DDM endpoint as accounted by Rucio

The occupied storage space according to Rucio is available through:

$ rucio list-rse-usage PRAGUELCG2_LOCALGROUPDISK
Usage:
==
  used: 287203280562763
  ...
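
As a rough illustration, once both numbers are at hand the estimate is a simple subtraction; the SE-reported value below is a placeholder and should be taken from the storage system's own accounting:

# rucio_used taken from `rucio list-rse-usage` above; se_used is a placeholder value
# reported by the SE (e.g. its own space accounting)
rucio_used=287203280562763
se_used=290000000000000
echo "approximate dark data: $(( se_used - rucio_used )) bytes"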

Inconsistencies between list of files provided by sites and Rucio catalog

Whenever a dump of the storage is provided by the site admin, a Rucio tool compares it with Rucio dumps produced 3 days before and 3 days after:

  • Files not present in the RSE dump but known to Rucio are automatically deleted from the Rucio catalog
  • Files present in the RSE dump but not in the Rucio dumps have to be deleted by the site admin (list to be found)

This is done through the comparison of the list of files from the storage with the Rucio dumps.

Produce Rucio dump (for Rucio admin)

To use this procedure to check for dark and lost files, the SE dump should be at least one day old (the checking algorithm needs to download a Rucio replica dump newer than the SE dump, and the Rucio replica dumps are generated on a daily basis).

As for the Rucio replica dumps, the exact difference in days is not too important, as long as one of the dumps is older and the other newer than the SE dump. However, it is recommended to take Rucio replica dumps with at least 3 days of difference from the SE dump (this can be set with the --delta argument of the rucio_dump script), with the newest Rucio replica dump as recent as possible to avoid false positives from transient files.

Procedure to provide Storage Element dumps (Site responsibility)

The procedure to generate the storage dumps depends on the implementation of the Storage Element used. The output needed is a text file with the following properties:

  • List of files (with relative paths) from the directory tree belonging to a particular DDM endpoint
    • No slash at the beginning of the file path (as this is a relative path)
    • Generally the mapping between DDM endpoint and storage directory is SITE_TOKEN -> /pathtoatlasdiskspace/atlasTOKEN/
      • For PRAGUELCG2_DATADISK it means everything under /dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/ but listed without this prefix (see CRIC for details)
      • An exception is group space, where many different physics group tokens can be located under atlasgroupdisk/<groupname>/rucio
    • There should be no other directories under atlasTOKEN except SAM and rucio. If there are others, they can be deleted.
  • For DPM and dCache, specify an age of at most 1 day (arguments "-a" and "-D" described below)*.

Example of the expected format for the storage dumps:

SAM/testfile-DEL-ATLASDATADISK-1445909862-745df21152ec.txt
SAM/testfile-GET-ATLASDATADISK.txt
SAM/testfile-prep-GET-ATLASDATADISK.txt
step09/a7/bd/step09.20201328010443.physics_B.recon.AOD.closed._lb0004._0001_1373530360
step09/65/df/step09.20201334011395.physics_B.recon.AOD.closed._lb0002._0001_1376957544

The scripts provided below already produce this format. However, if a custom script is used, the following piece of code may help as a hint on how to parse the dump into the expected format:

lastcomponent=atlasdatadisk  # Change this variable for each endpoint
cat dump | sed -r "s@^.+/$lastcomponent/@@;s@^rucio/@@" > stripped_dump

# lastcomponent should be equal to the last component in the path before the rucio directory
# for example for PHYS-TOP groupdisk some sites will need to set:
# lastcomponent=phys-top

Note: In the past we used the syncat format for the dumps of Storage Elements. Our experience showed that this format is difficult to use and that comparing dumps of huge Storage Elements takes too long. This is why we switched to the simple text format.

Note: * For the detection of lost files it is important that the age of the storage dump is smaller than the --delta argument used to check for consistency.

DPM

The DPM dump can be obtained with the script https://gitlab.cern.ch/lcgdm/dpm-contrib-admintools/blob/develop/src/dpm-dump.

All dumps for the praguelcg2 endpoints can be obtained with a single command:
$ /usr/bin/dpm-dump \
    --log-level=INFO --log-file=/var/log/dpm-dump.log --log-size=$((1024*1024)) --log-backup=4 \
    --txt-path=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/,davs://golias100.farm.particle.cz:443/dpm/farm.particle.cz/home/atlas/atlasdatadisk/dumps/dump_$(date "+%Y%m%d") \
    --txt-path=/dpm/farm.particle.cz/home/atlas/atlasscratchdisk/rucio/,davs://golias100.farm.particle.cz:443/dpm/farm.particle.cz/home/atlas/atlasscratchdisk/dumps/dump_$(date "+%Y%m%d") \
    --txt-path=/dpm/farm.particle.cz/home/atlas/atlaslocalgroupdisk/rucio/,davs://golias100.farm.particle.cz:443/dpm/farm.particle.cz/home/atlas/atlaslocalgroupdisk/dumps/dump_$(date "+%Y%m%d") \
    --txt-path=/dpm/farm.particle.cz/home/atlas/atlasgroupdisk/phys-hi/rucio/,davs://golias100.farm.particle.cz:443/dpm/farm.particle.cz/home/atlas/atlasgroupdisk/phys-hi/dumps/dump_$(date "+%Y%m%d")

In case of failure with the WebDAV upload you can try to use the root protocol instead of davs, or save the output data locally with file://path/filename.txt. More details on how to use dpm-dump are provided by the built-in help:

$ /usr/bin/dpm-dump --help

Since dmlite 1.15.0 this functionality is part of the dmlite shell, and the dpm-dump command above can be replaced with:

dmlite-shell --log-level=INFO --log-file=/var/log/dpm-dump.log --log-size=$((1024*1024)) --log-backup=4 -e "dump txt-path=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/rucio/,davs://golias100.farm.particle.cz:443/dpm/farm.particle.cz/home/atlas/atlasdatadisk/dumps/dump_$(date +%Y%m%d) txt-path=/dpm/farm.particle.cz/home/atlas/atlasscratchdisk/rucio/,davs://golias100.farm.particle.cz:443/dpm/farm.particle.cz/home/atlas/atlasscratchdisk/dumps/dump_$(date +%Y%m%d) txt-path=/dpm/farm.particle.cz/home/atlas/atlaslocalgroupdisk/rucio/,davs://golias100.farm.particle.cz:443/dpm/farm.particle.cz/home/atlas/atlaslocalgroupdisk/dumps/dump_$(date +%Y%m%d) txt-path=/dpm/farm.particle.cz/home/atlas/atlasgroupdisk/phys-hi/rucio/,davs://golias100.farm.particle.cz:443/dpm/farm.particle.cz/home/atlas/atlasgroupdisk/phys-hi/dumps/dump_$(date +%Y%m%d)"

Note that the dpm-dump script is also part of the DPM admin tools, but to be able to use the command-line options mentioned above you will need at least package version 0.2.5.

dCache

The dCache (chimera DB) dump script by Gerd Behrmann can be obtained here: chimera_find.sh

$ # Assuming the current date is 2015-07-30
$ sh chimera_find.sh -D "1 day ago" dump_20150729 /pnfs/ndgf.org/data/atlas/disk/atlasdatadisk/rucio /

POSIX Storage Elements (NFS, GPFS, Lustre, etc.)

The POSIX dump script, based on the checks of Horst Severini, can be obtained here: posix_dump.sh.

$ sh posix_dump.sh --prefix /storage/data --spacetokens atlasdatadisk,atlaslocalgroupdisk > dump_20150727

Other implementations of SE

If the site runs another Storage Element implementation, the site administrator should find a suitable way to dump its content into a file with the structure described above.

Compare the dumps

The script to download and compare the dumps is called "rucio_dump" and can be downloaded here: rucio_dump (a bundled version of the script with no dependencies can be downloaded here: rucio_dump (binary x86_64)).

Example:

$ ./rucio_dump.py consistency --delta 3 FZK-LCG2_SCRATCHDISK dump_20150721
DARK,atlas/atlasscratchdisk/user10.KaiLeffhalm/test.2gb.01
DARK,SAM/testfile--GET-ATLASSCRATCHDISK.txt
DARK,SAM/testfile-DEL-ATLASSCRATCHDISK-1417766663-22ab37a0dcbd.txt
DARK,SAM/testfile-DEL-ATLASSCRATCHDISK-1417853202-5c6b32af5f7f-sgl.txt
...
LOST,data10_7TeV/00/41/data10_7TeV.00152221.physics_MinBias.recon.ESD.f572._lb0145._0003.1
LOST,data10_7TeV/00/42/data10_7TeV.00152221.physics_MinBias.recon.ESD.f572._lb0101._0001.1
LOST,data10_7TeV/00/4e/data10_7TeV.00152221.physics_MinBias.recon.ESD.f572._lb0063._0010.1
....

This script does the following:

  • Downloads Rucio replica dumps 3 days older and 3 days newer than the SE dump (taking the date of the SE dump from its name).
  • Parses the SE dump lines, removing the prefix of the path (if there is one) based on the prefix reported by CRIC. This is done in order to match the paths in the SE dump with the ones in the Rucio dumps.
  • Sorts the Rucio replica dumps and the SE dump by path.
  • Traverses the 3 sorted dumps, printing:
    • Files present in both Rucio dumps but not in the SE dump (lost files)
    • Files present in the SE dump but not in any of the Rucio dumps (dark files)

Important note: sorted and downloaded files are saved under a "cache" subdirectory. If you are working with different SE dumps for the same spacetoken and with the same date, delete the "_sorted" file corresponding to the SE dump in the "cache" subdirectory before each run.

It is possible to run the check with custom dates or with manually downloaded Rucio dumps; check the README.md file for further details.

The results of this script can be checked using Rucio to verify dark files and GFAL to verify lost files.

cat output | grep '^LOST,' | ./verify-gfal.py

The output of verify-gfal.py has the format:

STATUS<CODE>,URL

where <CODE> is the status code returned by gfal-ls: a status code of 0 means the file exists in the RSE, a status code of 2 means the file is really lost, and other codes may reflect different kinds of errors and should be checked in the GFAL documentation.

The next example outputs only the URLs of lost files verified with GFAL:

cat output | grep ^LOST, | ./verify-gfal.py | grep ^STATUS2, | cut -d, -f2

Automated checks: Site Responsibility

Dear ATLAS Site Responsible, to minimize the operational effort on your side concerning storage consistency checks, we have developed a tool that takes care of these checks centrally.
The (new) rucio-auditor daemon can perform consistency checks for all ATLAS sites. For it to work, sites need to publish one dump per disk endpoint each month using the following schema:

<se>/remove_suffix("rucio/",<resource.endpoint><endpoint>)/dumps/dump_<YYYYMMDD>

Where <se>, <resource.endpoint> and <endpoint> are the fields with the same names as reported by CRIC in https://atlas-cric.cern.ch/api/atlas/ddmendpoint/query/?json. N.B.: the ATLASDATADISK and ATLASSCRATCHDISK spacetokens (storage areas) for almost all sites coincide with the path of the ATLAS DDM endpoint, but for the ATLAS DDM GroupDisk endpoints the site admin really needs to check CRIC to see what to publish.

For example, a dump generated on 28/10/2015 for FZK-LCG2_DATADISK (using the argument "-D '1 day ago'", equivalent to "-a 1" for DPM) is expected to be accessible at:

root://atlasxrootd-kit.gridka.de:1094//pnfs/gridka.de/atlas/disk-only/atlasdatadisk/dumps/dump_20151027
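
As an illustration only, the expected path can be assembled from the CRIC fields with a few lines of shell. The se and endpoint values below are guesses derived from the FZK-LCG2_DATADISK example above; always take the real values from CRIC:

# Hypothetical values, guessed from the example above; take the real ones from CRIC
se='root://atlasxrootd-kit.gridka.de:1094/'
endpoint='/pnfs/gridka.de/atlas/disk-only/atlasdatadisk/rucio/'
path="${endpoint%rucio/}"                       # remove_suffix("rucio/", ...)
echo "${se}${path}dumps/dump_$(date +%Y%m%d)"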

The date in the filename should match the date specified in the argument "-a" if using dpm-dump, or the argument "-D" if using chimera_find.sh; it is recommended to use an age of 1 day. If the dump is generated using posix_dump.sh, the date should match the date when the script was executed.

The dumps will be checked monthly. The current implementation downloads the dumps from the sites starting in the first days of the month; therefore, to be checked without delay, the dumps should be generated towards the end of the month (e.g. on the 25th day of the month).

The following script can be used (with the appropriate variables changed) on DPM to generate the dumps automatically with the proper paths: dpm_generate_and_upload.sh
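
For instance, the dump generation could be scheduled with cron so that it is ready before the monthly check. The schedule and script location below are assumptions to be adapted by the site:

# Hypothetical /etc/cron.d/dpm-dump entry: run on the 25th of each month at 03:00
0 3 25 * * root /usr/local/sbin/dpm_generate_and_upload.sh >> /var/log/dpm_generate_and_upload.log 2>&1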

Dark Data

Dark Data cleanup

The dumps provided by the sites are compared to the Rucio dumps, and dark data is identified and automatically cleaned up. No action from the sites/clouds is needed at this point.

Manual recovery of lost or corrupted files

  • This method is known to report false positives for lost files: first make sure the file is not present on the SE but is expected to be located at the RSE (for example with gfal-ls and rucio list-file-replicas).
  • If a file is lost, or confirmed to be corrupted at a site, it should be declared bad.
  • If the total file loss is large (>50k files), please report it in the DDM ops Jira so the recovery can be coordinated with central ADC operations.
  • The bad files can be declared using the rucio-admin CLI. Cloud admins can declare bad files in their own cloud. Anyone inside the egroup atlas-adc-cloud-xx is given cloud admin privileges for cloud xx.
    • To check if you have cloud admin privileges, run rucio-admin account list-attributes youraccount and look for "cloud-xx: admin" in the output.
rucio-admin replicas declare-bad -h
usage: rucio-admin replicas declare-bad [-h] --reason REASON
                                        [--inputfile [INPUTFILE]]
                                        [listbadfiles [listbadfiles ...]]

positional arguments:
  listbadfiles          The list of bad files

optional arguments:
  -h, --help            show this help message and exit
  --reason REASON       Reason
  --inputfile [INPUTFILE]

You can either pass a list of bad files directly or give a file (using the --inputfile option) containing the list of bad files. You also have to specify a reason: it can be a URL pointing to a GGUS/JIRA ticket or any text describing the reason for the loss or corruption:

rucio-admin replicas declare-bad --reason "https://its.cern.ch/jira/browse/ATLDDMOPS-xyz" --inputfile mytestfile.txt
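
Alternatively, for a handful of files the PFNs can be passed directly on the command line; the PFN below is purely illustrative:

# Hypothetical PFN, shown only to illustrate the positional listbadfiles arguments
rucio-admin replicas declare-bad --reason "broken disk server" \
    srm://se.example.org/dpm/example.org/home/atlas/atlasdatadisk/rucio/tests/38/50/blahblah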

If some files are not known to Rucio, they will be printed out like this:

UKI-LT2-QMUL_DATADISK : PFN srm://se03.esc.qmul.ac.uk/atlas/atlasdatadisk/rucio/tests/38/50/blahblah cannot be declared.
UKI-NORTHGRID-LIV-HEP_DATADISK : PFN srm://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/atlas/atlasdatadisk/rucio/tests/38/50/blahblah cannot be declared.

These files must be removed manually by site admins using whatever method is easiest.

For the other files, Rucio will try to recover them from another place if possible, or otherwise delete the files from the site. The progress of the system dealing with bad files can be seen in 2 places:

  • In the DDM dashboard, by selecting the recovery activity: DDM dashboard with activity Recovery.
  • In the Rucio UI, where you have access to the whole history: Rucio UI Recovery. You can use different filters to check the status of the recovery at a given site or over a given period.

Site admin reporting lost/inaccessible files

Lost files on tape

Files on tape are treated differently from those on disk. On disk, the same filename is always reused since the path is deterministic; the convention is different for tape, where transfer retries and recovery always use a different filename (a timestamp is appended) for safety reasons. In addition, deletion of data on tape is only run manually and occasionally, during large cleanup campaigns. As a result, bad files on tape are never removed from the storage namespace, whether they are recovered or not. Therefore, before declaring bad files on tape, the site admins should themselves remove the bad files from the storage namespace.
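
How the namespace entry is removed is site specific. As one possible sketch, assuming the site's tape namespace can be modified through gfal2 tools and using a hypothetical SURL:

# Hypothetical SURL; remove the bad replica's entry from the storage namespace
# (use whatever namespace tool your tape system actually provides)
gfal-rm srm://se.example.org/pnfs/example.org/data/atlas/atlasdatatape/rucio/data15_13TeV/ab/cd/badfile.root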

Manual checks

If, instead of using the rucio_dump script, you want to make the checks manually, you can find the procedure below.

Description of the detection of dark data and lost files using dumps

The following is a visualization of how to verify the state of the files, given a dump of the DDM endpoint and two Rucio replica dumps (one older than the endpoint dump and one newer); see the attached figure consistency_states.png. This is an idealized model based on optimal conditions (i.e. the dumps are complete and the dates of the dumps are accurate):

The subsets marked with an asterisk are supposed to be "Deleted Files" or "New Files" respectively, but this cannot be confirmed with these 3 dumps alone. The rest of the subsets are more likely to be correct, but it is in any case a good idea to verify the results by other means before taking action.

In conclusion:

  • The files in the storage dump but not in the Rucio dumps are very likely dark data.
  • The files in both Rucio dumps but not in the storage dump are very likely lost files.
  • Adjusting D (the delta in days) changes the results: larger values of D produce more reliable information about lost files, but at the same time create more false positives for dark data (files created between T-D and T and then deleted between T and T+D).

Downloading Rucio dumps manually

The dump of files registered in Rucio for a given DDM endpoint (the Rucio replica dump) can be obtained from https://rucio-ui.cern.ch/dumps; these are tab-separated files compressed with bzip2.

$ wget -q --no-check-certificate 'https://rucio-hadoop.cern.ch/replica_dumps?rse=PRAGUELCG2_LOCALGROUPDISK&date=18-08-2014' -O PRAGUELCG2_LOCALGROUPDISK.ruciodump.18-08-2014.bz2
$ bunzip2 PRAGUELCG2_LOCALGROUPDISK.ruciodump.18-08-2014.bz2

Note: it is necessary to be careful when parsing the dumps with shell scripts, as fields may contain spaces (e.g. "bzcat PRAGUELCG2_LOCALGROUPDISK.ruciodump.18-08-2014.bz2 | awk '{print $7}'" may not print the 7th column). One way to overcome this is: "bzcat PRAGUELCG2_LOCALGROUPDISK.ruciodump.18-08-2014.bz2 | tr '\t' ',' | cut -d, -f7".

Hints:

  • In the past, the command "comm" was used to compare the dumps
  • The command "sort" can sort big files in an efficient way
  • Setting the environment variable LC_ALL=C makes many GNU text-processing commands work faster ("sort" in particular)

General algorithm

  • Download two Rucio replica dumps (one older and one newer than the SE dump)
  • Parse the files and generate, for each replica dump, a file with only the paths of the files
  • Parse the SE dump and remove the path prefixes; the result should have paths consistent with the Rucio replica dumps (relative paths without the "rucio" component)
  • In the steps below, every mention of a "dump" refers to the parsed and sorted version generated here (see the sketch after this list)
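
A minimal shell sketch of these preparation steps, assuming the replica path is in the 7th tab-separated column of the Rucio dumps (as in the parsing hint above) and that the SE dump has already been stripped of its site prefix as described earlier; the file names are placeholders:

export LC_ALL=C                                   # faster GNU sort (see Hints above)
for d in rucio_old rucio_new; do                  # the two downloaded replica dumps
    bzcat ${d}.bz2 | cut -f7 | sort -u > ${d}.paths
done
sort -u stripped_dump > se_dump.paths             # SE dump with prefixes already removed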

Dark files

  • Generate a sorted file with the combined content of both Rucio replica dumps.
  • Compare the file generated above with the SE dump. Every path present only in the SE dump is dark data (see the sketch below).
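
With the sorted path files from the sketch above, this can be done with "comm" (union of the Rucio dumps first, then the paths that appear only in the SE dump):

sort -m -u rucio_old.paths rucio_new.paths > rucio_union.paths   # union of both Rucio dumps
comm -23 se_dump.paths rucio_union.paths > dark_files.txt        # only in the SE dump => dark data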

Lost files

  • Generate a sorted file with the intersection of both Rucio replica dumps.
  • Compare the file generated above with the SE dump. Every path present only in the intersection is a lost file (see the sketch below).
  • The current status of the files can be obtained from the SE with "gfal-ls" in order to verify the results.
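
Similarly for lost files, using the same sorted path files: the intersection of the two Rucio dumps is compared against the SE dump:

comm -12 rucio_old.paths rucio_new.paths > rucio_intersection.paths   # present in both Rucio dumps
comm -13 se_dump.paths rucio_intersection.paths > lost_files.txt      # missing from the SE dump => lost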


Major updates:
-- FernandoLopez1 - 2015-07-15
