Procedure for recovering lost data at T2s


This procedure is intended to identify and recover lost data at a CMS T2. It assumes that a list of lost files (lfn, pfn, pnfsid), called lost-dcache.lst, is available; the example uses dCache.

There are various types of data that can be lost:

  • Official Datasets
  • Production files to be uploaded to T1s
  • User files
  • Not Identified

For each of the identified types, a different procedure has to be followed:

  • Official datasets: retransfer them
  • Production files: ask for invalidation
  • User/Other files: I hope you have a backup :(

Some useful scripts can be found here:

Category Identification

Retrieving lost files from PhEDEx

Needed input is a filelist. The files registered to PhEDEx can be retrieved using the BlockConsistencyCheck utility, from the PhEDEx node:

PHEDEX/Utilities/BlockConsistencyCheck -verbose -db ~/config/DBParam.CSCS:Prod/CSCS \
-buffer T2_CH_CSCS -block '%' -tfc config/SITECONF/T2_CH_CSCS/PhEDEx/storage.xml > consistency_check_`date +'%Y%m%d'`.txt

-verbose lists all the missing files and not only the blocks, while

`date +'%Y%m%d'`
automatically inserts the current date in the YYYYMMDD format.

The output file contains lots of data, but we are interested only in missing files (not in size mismatches, in this case), so:

cat consistency_check_20091029.txt | grep LFN | grep "SizeMismatch" | \
cut -d= -f2 | awk '{print $1}' | grep store > missingFiles_20091029_mismatch.txt

cat consistency_check_20091029.txt | grep LFN | grep "Missing" | \
cut -d= -f2 | awk '{print $1}' | grep store > missingFiles_20091029_missing.txt
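
The two pipelines above differ only in the status keyword and hard-code the date. A minimal sketch of a wrapper is given here, assuming the same output layout that the commands above rely on; the function name extract_lfns and its arguments are hypothetical and not part of PhEDEx:

```shell
# Hypothetical helper: extract the LFNs carrying a given status keyword
# (e.g. "Missing" or "SizeMismatch") from a BlockConsistencyCheck output file.
# It applies exactly the same filters as the pipelines above.
extract_lfns() {
    infile=$1   # e.g. consistency_check_20091029.txt
    status=$2   # e.g. Missing
    grep LFN "$infile" | grep "$status" | \
        cut -d= -f2 | awk '{print $1}' | grep store
}

# Usage (file names follow the convention used above):
# today=$(date +'%Y%m%d')
# extract_lfns "consistency_check_${today}.txt" Missing > "missingFiles_${today}_missing.txt"
```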

missingFiles_20091029_missing.txt is the file you're interested in, and it contains an lfn list. Suppose you have a file containing all the lost files in pfn format: you need to convert it to lfn format to be able to compare the two lists.
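
The conversion and the comparison could be sketched as follows. This is only an illustration: it assumes that every pfn embeds its lfn starting at the first "/store/" (the real mapping is defined by the site's storage.xml and may differ), and the function names and the lost-pfn.lst file name are hypothetical.

```shell
# Sketch: turn a pfn list into an lfn list by keeping everything from the
# first "/store/" onwards. ASSUMPTION: every pfn embeds its lfn this way.
pfn_to_lfn() {
    awk '{ i = index($0, "/store/"); if (i > 0) print substr($0, i) }' "$1"
}

# Sketch: print the lfns present in both lists (comm needs sorted input).
common_lfns() {
    a=$(mktemp) && b=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    LC_ALL=C comm -12 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage:
# pfn_to_lfn lost-pfn.lst > lost-lfn.lst
# common_lfns lost-lfn.lst missingFiles_20091029_missing.txt
```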

Finding categories

First of all, production files need to be found. It is assumed that a production file has no other replicas. The script for this step takes an lfn list as input and checks for files with no other replicas (please modify the MY_SITE variable in the script accordingly). To use it, dbssql is needed:

$ wget --no-check-certificate
$ chmod +x dbssql
$ ./ lost-lfn.lst 

In this case, lost-lfn_withReplica.lst will contain the files with replicas, while lost-lfn_noReplica.lst will contain the production files. Now we need to remove the production files from the full list:

$ sort lost-lfn.lst > lost-lfn_sorted.lst && sort lost-lfn_withReplica.lst > lost-lfn_withReplica_sorted.lst
$ comm -23 lost-lfn_sorted.lst  lost-lfn_withReplica_sorted.lst > lost-lfn-2.lst
$ rm lost-lfn_sorted.lst lost-lfn_withReplica_sorted.lst
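
The sort, comm and rm steps above can also be wrapped in a small function that cleans up its own temporary files. This is a minimal sketch; the function name subtract_list is hypothetical, and LC_ALL=C is set only so that sort and comm agree on the ordering:

```shell
# Sketch: print the lines of the first list that do not appear in the second.
subtract_list() {
    a=$(mktemp) && b=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"   # -23: keep only lines unique to the first file
    rm -f "$a" "$b"
}

# Usage, mirroring the commands above:
# subtract_list lost-lfn.lst lost-lfn_withReplica.lst > lost-lfn-2.lst
```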

Given lost-lfn-2.lst, we now need to divide it into categories. The categorisation script takes an lfn list as input and outputs one file for each of the following categories:

  • LT07 (LoadTest)
  • Unmerged
  • User
  • Other

If a second lfn list is given, it is subtracted from the first; the resulting list is *_common.lst. So:

# ./ lost-lfn-2.lst missingFiles_20091029_missing.txt 
# ls *.lst
lost-lfn-2_common.lst  lost-lfn-2_LT07.lst   lost-lfn-2_Unmerged.lst  
lost-lfn-2.lst     lost-lfn-2_Other.lst  lost-lfn-2_User.lst  
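
As a rough illustration of how such a split can work, the sketch below sorts lfns into the four category files by path. It is not the script used above: the path patterns ("LoadTest07", /store/unmerged/, /store/user/) are assumptions about the CMS namespace layout, the function name split_by_category is hypothetical, and the optional second list (*_common.lst) is not handled.

```shell
# Sketch: write each lfn of LIST.lst to LIST_<Category>.lst.
# ASSUMPTION: categories can be recognised from the lfn path alone.
split_by_category() {
    list=$1
    base=${list%.lst}          # lost-lfn-2.lst -> lost-lfn-2
    awk -v base="$base" '
        /LoadTest07/           { print > (base "_LT07.lst");     next }
        /^\/store\/unmerged\// { print > (base "_Unmerged.lst"); next }
        /^\/store\/user\//     { print > (base "_User.lst");     next }
                               { print > (base "_Other.lst") }
    ' "$list"
}

# Usage:
# split_by_category lost-lfn-2.lst
```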

Getting files lost by users

Run this command:

./ lost-dcache.lst

This will give you the number of lost files per user, as well as a text file per user containing his/her files.
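
A per-user count of this kind can be approximated directly from an lfn list. The sketch below assumes that user files live under /store/user/<username>/ and that the input holds one lfn per line (not the lost-dcache.lst layout); the function name count_lost_per_user is hypothetical.

```shell
# Sketch: print "<count> <username>" for every user with lost files,
# most affected users first.
# ASSUMPTION: user lfns look like /store/user/<username>/...
count_lost_per_user() {
    awk -F/ '$2 == "store" && $3 == "user" && $4 != "" { n[$4]++ }
             END { for (u in n) print n[u], u }' "$1" |
        LC_ALL=C sort -k1,1nr -k2,2
}

# Usage:
# count_lost_per_user lost-lfn.lst
```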


"PhEDEx" files

The lost blocks are inside the consistency_check_20091029.txt file. A helper script can be used to retrieve them:

#./ consistency_check_20091116.txt

"Production" files

They are the files contained in lost-lfn_noReplica.lst. Open a Savannah ticket to DataOps, attaching the file list.

"LoadTest" files

They are part of the LoadTest used for debugging links. Open a Savannah ticket as above or, if there are many missing files, retransfer them from another site.

"User" files

Send a sad mail to each affected user, attaching the list of his/her lost files.

"Unmerged" files

They should not be a problem; check with the Prod team.

"Other" files

You actually need to parse the list and see what they are...

-- LeonardoSala - 30-Oct-2009

Topic revision: r3 - 2012-02-02 - unknown