Categories of files/samples on _Disk:

  • files/samples without custodial location (we produced them but don’t want them to go to tape)
  • files/samples with custodial location that we decided to keep on _Disk permanently
    • RAW of 2012 data for re-reconstruction passes
    • AOD of 2012 data for analysis
    • => the rules we used for populating the _Disk endpoint
  • input to workflows which are not in the other categories
    • can be cleaned up once we have checked that the workflow is finished
    • start talking to the workflow team: why can’t they delete the input samples after they have announced the output?
    • there is also a web interface where you can see if a workflow is still running or not
  • output of workflows (AOD, AODSIM)
    • in general we would like output of recent workflows to stay on disk
    • old output can be considered for cleanup
* important that we check that there is a custodial location
* we split in the above 3 categories
* make lists for all samples split into era/tier
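
The era/tier lists mentioned above could be produced with a small helper like the one below. This is only a sketch: it assumes dataset names of the form /Primary/Processed/TIER and that the era is the part of the processed name before the first "-". The function name and the sample dataset names are made up for illustration.

```python
from collections import defaultdict


def split_by_era_and_tier(datasets):
    """Group dataset names by (era, tier).

    Assumes names look like /Primary/Era-Something-vN/TIER; anything that
    does not have exactly three path components is skipped.
    """
    groups = defaultdict(list)
    for name in datasets:
        # "/Primary/Processed/TIER" -> ["", "Primary", "Processed", "TIER"]
        parts = name.strip().split("/")
        if len(parts) != 4 or parts[0] != "":
            continue  # not a well-formed dataset name
        processed, tier = parts[2], parts[3]
        era = processed.split("-", 1)[0]  # assumption: era precedes the first "-"
        groups[(era, tier)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative names only, not real samples.
    sample = [
        "/PrimaryA/Run2012A-v1/RAW",
        "/PrimaryB/Run2012A-v2/RAW",
        "/PrimaryA/Summer12-v1/GEN-SIM",
    ]
    for (era, tier), names in sorted(split_by_era_and_tier(sample).items()):
        print(era, tier, len(names))
```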

Goal:

to have the T1_*_Disk endpoints filled below 80% of their capacity

Procedure:

  • check every week, and if a _Disk endpoint is more than 80% full, make a suggestion for what to clean up
  • check for samples that are on several _Disk endpoints
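
Both weekly checks can be sketched in a few lines of Python. The input shapes here are assumptions, not something the page defines: a mapping from node name to (used, capacity) and a list of (dataset, node) pairs. How those numbers are obtained is left open, and the figures in the usage example are invented.

```python
from collections import defaultdict

THRESHOLD = 0.80  # the 80% goal stated on this page


def endpoints_over_threshold(usage, threshold=THRESHOLD):
    """Return the nodes whose used/capacity ratio exceeds the threshold.

    usage maps node name -> (used, capacity) in the same unit (assumed shape).
    """
    return sorted(
        node
        for node, (used, capacity) in usage.items()
        if capacity > 0 and used / capacity > threshold
    )


def datasets_on_several_endpoints(replicas):
    """Return {dataset: [nodes]} for datasets present on more than one node.

    replicas is an iterable of (dataset, node) pairs (assumed shape).
    """
    nodes_by_dataset = defaultdict(set)
    for dataset, node in replicas:
        nodes_by_dataset[dataset].add(node)
    return {d: sorted(n) for d, n in nodes_by_dataset.items() if len(n) > 1}


if __name__ == "__main__":
    # Invented numbers for illustration.
    usage = {"T1_UK_RAL_Disk": (85, 100), "T1_US_FNAL_Disk": (70, 100)}
    print(endpoints_over_threshold(usage))
```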

First to clean up:

* GEN-SIM which has already been processed and is not expected to be processed again

1- Get all GEN-SIM datasets at T1_UK_RAL_Disk

https://cmsweb.cern.ch/phedex/datasvc/xml/prod/blockreplicasummary?node=T1_UK_RAL_Disk&create_since=0&dataset=/*/*/GEN-SIM

or

python datasvc.py --service blockreplicasummary --options "node=T1_UK_RAL_Disk&create_since=0" --path /phedex/block/name | tee cleaning_blocks
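
For reference, the same query can be assembled and its answer parsed in plain Python. This is a sketch under two assumptions: that the data service also offers a json format next to the xml one used above, and that the response is laid out as phedex -> block -> name, mirroring the --path /phedex/block/name argument. The sample payload is invented, and fetching the URL (which may need grid credentials) is left out.

```python
import json
from urllib.parse import urlencode

# Assumption: same service as the URL above, with "json" instead of "xml".
BASE = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockreplicasummary"


def build_url(node, dataset_pattern):
    """Build the blockreplicasummary query for one node and dataset pattern."""
    query = urlencode(
        {"node": node, "create_since": 0, "dataset": dataset_pattern},
        safe="/*",  # keep the wildcard pattern readable in the URL
    )
    return BASE + "?" + query


def block_names(payload):
    """Extract block names from a decoded response (assumed layout)."""
    return [block["name"] for block in payload.get("phedex", {}).get("block", [])]


if __name__ == "__main__":
    print(build_url("T1_UK_RAL_Disk", "/*/*/GEN-SIM"))
    # Invented payload showing the assumed structure.
    sample = json.loads('{"phedex": {"block": [{"name": "/A/B/GEN-SIM#0001"}]}}')
    print(block_names(sample))
```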

2- Exclude datasets without custodial location

cat cleaning_blocks | cut -d '#' -f 1 | sort | uniq > cleaning_datasets
awk '{system("python checkReplica.py --option custodial:y --dataset "$1)}' cleaning_datasets
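
The first command of step 2 reduces block names to unique dataset names by cutting at the "#". A Python equivalent, for use inside a larger script, might look like the sketch below. The function name is hypothetical, and Python sorts by code point, which can differ from a locale-aware sort.

```python
def datasets_from_blocks(block_lines):
    """Return the sorted, unique dataset names for a list of block names.

    Each block name is expected to look like "<dataset>#<block id>";
    blank lines are ignored.
    """
    return sorted(
        {line.split("#", 1)[0].strip() for line in block_lines if line.strip()}
    )


if __name__ == "__main__":
    # File name taken from step 1 above.
    with open("cleaning_blocks") as handle:
        for dataset in datasets_from_blocks(handle):
            print(dataset)
```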

3- Exclude datasets if they are being used by a running workflow as an input ("assigned|acquired|running|running-open|running-closed|assignment-approved")
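
Step 3 amounts to a set difference once the workflow information is in hand. The sketch below assumes each workflow is available as a dict with the hypothetical keys "status" and "input_dataset"; where those records come from is not specified on this page.

```python
# Statuses listed in step 3: a workflow in any of these still needs its input.
ACTIVE_STATUSES = frozenset({
    "assigned",
    "acquired",
    "running",
    "running-open",
    "running-closed",
    "assignment-approved",
})


def exclude_active_inputs(datasets, workflows):
    """Drop datasets that are the input of a workflow in an active status.

    workflows is an iterable of dicts with the (hypothetical) keys
    "status" and "input_dataset".
    """
    in_use = {
        wf.get("input_dataset")
        for wf in workflows
        if wf.get("status") in ACTIVE_STATUSES and wf.get("input_dataset")
    }
    return [d for d in datasets if d not in in_use]
```

A dataset whose only workflows are in other statuses stays on the cleanup list.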

4- Exclude dataset if it is an output dataset of a workflow which finished within last ~6 months
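
The "~6 months" rule of step 4 can be written as a date comparison. In this sketch, 183 days stands in for six months, and the completion dates are assumed to be available as a mapping from output dataset to the date its workflow finished; both choices are illustrative.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=183)  # assumption: ~6 months expressed in days


def exclude_recent_outputs(datasets, finished_on, today):
    """Drop datasets produced by a workflow that finished within RETENTION.

    finished_on maps output dataset -> completion date (assumed shape).
    Datasets without an entry are kept on the cleanup list.
    """
    cutoff = today - RETENTION
    return [
        d for d in datasets
        if not (d in finished_on and finished_on[d] >= cutoff)
    ]


if __name__ == "__main__":
    # Invented dates for illustration.
    finished = {"/A/B/AODSIM": date(2014, 5, 1), "/C/D/AODSIM": date(2013, 1, 15)}
    print(exclude_recent_outputs(list(finished), finished, date(2014, 6, 13)))
```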

5- Cross-check if remaining datasets have a custodial location

6- Send the list to Andrew (andrew.lahiff@stfcNOSPAMPLEASE.ac.uk) and Dave (dmason@fnalNOSPAMPLEASE.gov)

7- If they approve the list, create a deletion request.

Second to clean up:

RAW data: first 2011, then 2012

Topic revision: r4 - 2014-06-13 - MericTaze