A convenient way to manage a number of related CMS datasets as one group is to use a Rucio Container.

In a typical CMS analysis workflow, a user needs a number of closely related input datasets to be available at the site where the analysis jobs will run, before those jobs are submitted via CRAB or locally. Later, in the course of expanding or improving the analysis, the user frequently needs to add or remove some input datasets. Finally, when the work on this particular analysis is done, the user wants to clean up the storage space used by these datasets at the site and reclaim the corresponding user quota.

All this can be handled with a single user-defined Rucio Container.

CMS datasets are Rucio containers already, consisting of multiple blocks (Rucio 'datasets'), each holding multiple files. But a Rucio Container can contain other containers (recursively), so a user can create a new (meta)container in their own user-scope (user.joe:) and add (or 'attach') the already existing CMS datasets (e.g. from the cms: scope) to that container. Then, by creating just a single Rucio replication rule, the user can have their container (with all selected CMS datasets in it) copied/transferred to the desired site (RSE).

The beauty of this approach is that when the user decides to add datasets to their container or remove datasets from it, no new replication rules are needed - Rucio handles all the data transfers automatically as part of the already existing rule.

Let's demonstrate this with a specific example:

Create a User Container

$ rucio add-container user.piperov:/Analyses/Hmumu2020/USER
Added user.piperov:/Analyses/Hmumu2020/USER

NB: The container name must end with the datatier "/USER". Creating containers with any other datatier (e.g. /X/Y/MINIAOD) is prohibited.
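
The naming rule can be checked locally before calling rucio add-container. The following is only a minimal sketch: is_valid_user_container is a hypothetical helper, not a Rucio command, and it tests only the three-part /A/B/USER shape described here, not any further restrictions the server may apply.

```shell
# Hypothetical helper (not part of Rucio): succeed only if the name has
# exactly three non-empty parts and the last one is USER.
# Accepts either "/A/B/USER" or a full DID such as "user.joe:/A/B/USER".
is_valid_user_container() {
    name=${1#*:}                          # drop an optional "scope:" prefix
    rest=${name#/}                        # drop the leading slash
    [ "$rest" != "$name" ] || return 1    # name must start with "/"
    first=${rest%%/*}                     # first part
    rest2=${rest#*/}
    [ "$rest2" != "$rest" ] || return 1   # a second part must exist
    second=${rest2%%/*}                   # second part
    tier=${rest2#*/}
    [ "$tier" != "$rest2" ] || return 1   # a third part must exist
    [ -n "$first" ] && [ -n "$second" ] && [ "$tier" = "USER" ]
}
```

For example, is_valid_user_container user.piperov:/Analyses/Hmumu2020/USER succeeds, while is_valid_user_container /X/Y/MINIAOD fails.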

Add some initial datasets to the Container

$ rucio attach user.piperov:/Analyses/Hmumu2020/USER cms:/SingleMuon/Run2018A-02Apr2020-v1/NANOAOD cms:/SingleMuon/Run2018B-02Apr2020-v1/NANOAOD
DIDs successfully attached to user.piperov:/Analyses/Hmumu2020/USER

Subscribe/Transfer the container to a site

$ rucio add-rule user.piperov:/Analyses/Hmumu2020/USER 1 T2_US_Purdue

The new rule now shows up in the list of rules for the account:

$ rucio list-rules --account piperov
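
The three steps so far (create the container, attach datasets, add a rule) can be wrapped in a small script. This is only a sketch: run, setup_container and the DRY_RUN variable are hypothetical names introduced here, and the rucio commands are the ones shown above with no extra options.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_container CONTAINER RSE DATASET...
# Creates the container, attaches the given datasets, and requests
# one replica of the whole container at the given RSE.
setup_container() {
    container=$1
    rse=$2
    shift 2
    run rucio add-container "$container" || return 1
    run rucio attach "$container" "$@" || return 1
    run rucio add-rule "$container" 1 "$rse"
}
```

Setting DRY_RUN=1 before calling setup_container prints the three rucio commands instead of running them, which is a cheap way to review them first.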

Add/Remove datasets to the Container

$ rucio attach user.piperov:/Analyses/Hmumu2020/USER cms:/SingleMuon/Run2017B-02Apr2020-v1/NANOAOD cms:/SingleMuon/Run2017C-02Apr2020-v1/NANOAOD
DIDs successfully attached to user.piperov:/Analyses/Hmumu2020/USER

$ rucio detach user.piperov:/Analyses/Hmumu2020/USER cms:/SingleMuon/Run2018A-02Apr2020-v1/NANOAOD
DIDs successfully detached from user.piperov:/Analyses/Hmumu2020/USER

Check the current contents of the container

$ rucio list-content user.piperov:/Analyses/Hmumu2020/USER
| SCOPE:NAME                                    | [DID TYPE]   |
| cms:/SingleMuon/Run2016B-02Apr2020-v1/NANOAOD | CONTAINER    |

List all your user-containers

When you don't remember the name of a container that you created long ago, you can list all your containers like this (substituting 'piperov' with your username):

$ rucio list-dids --filter type=CONTAINER user.piperov:*
| SCOPE:NAME                                          | [DID TYPE]   |
| user.piperov:/Analyses/Tests/USER                   | CONTAINER    |

Delete the container from a site

$ rucio delete-rule 2c9dbcc2c72549e890fa53de7f46a75d

The argument is the rule ID printed by the add-rule command above; rucio list-rules also shows it.
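
Since delete-rule expects a rule ID rather than a container name, a small guard can catch a mistyped argument. The sketch below assumes that every rule ID has the same form as the one above (32 lowercase hexadecimal characters); is_rule_id is a hypothetical helper, not a Rucio command.

```shell
# Hypothetical helper: succeed only if the argument looks like a rule ID,
# assumed here to be exactly 32 lowercase hexadecimal characters.
is_rule_id() {
    id=$1
    [ "${#id}" -eq 32 ] || return 1
    case "$id" in
        *[!0-9a-f]*) return 1 ;;   # any other character disqualifies it
    esac
    return 0
}
```

It could be used as: is_rule_id "$id" && rucio delete-rule "$id"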

Delete the container permanently

Note that one does not need to delete the container after finishing work on a given analysis. As long as there are no active replication rules for the container at any site, it is not 'wasting' any storage resources, and can be kept for bookkeeping or for future reference and re-runs of this same analysis.

If, however, the user is certain that a given container will never be needed again, it can be deleted permanently:
$ rucio erase user.piperov:/Analyses/Hmumu2020/USER

Since the deletion cannot be reversed once it has taken effect, Rucio shows a warning and gives a 24-hour window in which the user can change their mind, along with a sample command to use as an undo:
$ rucio erase --undo user.piperov:/Analyses/Hmumu2020/USER


  • Container names have to follow the standard naming convention for CMS Datasets, with some further restrictions for users. As a result, a valid user-container name must have 'USER' as its 'tier' (third and last) part, as in the current example: user.piperov:/Analyses/Hmumu2020/USER
    That's why we chose to encode the useful part of the name in the first two fields.
  • Container names are unique and non-reusable. After deleting a container one cannot create a new one with the same name. Ever! So it is advisable to name the containers in a way that will prevent future conflicts, for example by including the year/month or some other temporal characteristic in the name of the container, as we did in this example.
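
One way to follow the advice above is to generate the name instead of typing it. The sketch below is only an illustration: make_container_name and its arguments are hypothetical, and appending a year/month stamp to the second field is just one possible convention.

```shell
# Hypothetical helper: build a user-container DID of the form
#   user.<username>:/<group>/<label><stamp>/USER
# The stamp defaults to the current year and month (e.g. 202006).
make_container_name() {
    user=$1
    group=$2
    label=$3
    stamp=${4:-$(date +%Y%m)}
    printf 'user.%s:/%s/%s%s/USER\n' "$user" "$group" "$label" "$stamp"
}
```

For example, make_container_name piperov Analyses Hmumu 202006 prints user.piperov:/Analyses/Hmumu202006/USER, which can then be passed to rucio add-container.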

-- StefanPiperov - 2020-06-16

Topic revision: r15 - 2023-03-02 - BenediktMaier