
-- DO NOT FOLLOW THIS INFORMATION ! --

It is archived for informational purposes only!

Instead, go here for the successor tools: AtlasComputing/DQ2Clients

How to access CSC data using DQ2 end-user tools

Introduction

This page describes how to access CSC data managed by DQ2 using dq2_get and the other DQ2 end-user tools.

Setup

Important: You need to set up the LCG User Interface before using the DQ2 end-user tools.

1. LCG User Interface

First, make sure you have a grid certificate. See Starting on the Grid.

Second, make sure you are a member of the ATLAS Virtual Organisation (VO).

Third, make sure you use a recent Grid client installation, especially when it comes to Grid CA certificates. Ask your Grid system administrator when in doubt.

  • Important: whenever you encounter a "credentials" or "authentication" error, contact your local Grid administrator first!

Then set up the Grid environment and create a proxy:

For CERN lxplus users

$ source /afs/cern.ch/project/gd/LCG-share/current/external/etc/profile.d/grid-env.[c]sh
$ voms-proxy-init -voms atlas

For BNL ACF users (when using acas nodes, see this)

on acas nodes
$ source /afs/usatlas.bnl.gov/lcg/current/etc/profile.d/grid_env.[c]sh
on gridui01
$ source /etc/glite/glite.[c]sh

$ grid-proxy-init

You should see something like:

$ voms-proxy-init -voms atlas
Your identity: /DC=org/DC=doegrids/OU=People/CN=Tadashi Maeno
Enter GRID pass phrase:
Creating temporary proxy ....................................... Done
Contacting  voms.cern.ch:15001 [/C=CH/O=CERN/OU=GRID/CN=host/voms.cern.ch] "atlas" Done
Creating proxy ........................................................................ Done
Your proxy is valid until Thu Apr 20 05:15:49 2006

Important: If it reports "User unknown to this VO", as in

$ voms-proxy-init -voms atlas
Your identity: /DC=org/DC=doegrids/OU=People/CN=Tadashi Maeno
Enter GRID pass phrase:
Creating temporary proxy ....................................... Done
Contacting  lcg-voms.cern.ch:15002 [/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch] "atlas"
Warning: atlas: User unknown to this VO. Error: VERR_SERVERCODE Failed.
Failed to contact servers for atlas.

This means that you have not been registered with the ATLAS VO. You need to join the ATLAS VO first (see How to join the ATLAS VO).
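For the lxplus flow above, you can avoid re-running voms-proxy-init (and re-typing the pass phrase) when a usable proxy already exists. The function below is a hypothetical sketch, not part of the DQ2 tools; it assumes your voms-proxy-info supports the -exists and -valid H:M options (check voms-proxy-info -help on your UI).

```shell
# Hypothetical helper (not part of DQ2): create an ATLAS VOMS proxy only if
# there is no proxy valid for at least the requested time (default 1 hour).
# Assumes voms-proxy-info accepts "-exists -valid H:M"; verify with -help.
ensure_atlas_proxy() {
    min_valid=${1:-1:00}
    if voms-proxy-info -exists -valid "$min_valid" >/dev/null 2>&1; then
        echo "existing proxy is valid for at least $min_valid"
        return 0
    fi
    # Prompts for the GRID pass phrase, as in the example above.
    voms-proxy-init -voms atlas
}
```

For example, run `ensure_atlas_proxy 12:00` before a long dq2_get session. BNL users following the grid-proxy-init route would need to adapt it.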


2. DQ2 end-user tools

For CERN/lxplus users

$ source /afs/cern.ch/atlas/offline/external/GRID/ddm/endusers/setup.[c,z]sh.CERN

For BNL/ACF users

$ source /afs/usatlas.bnl.gov/Grid/Don-Quijote/dq2_user_client/setup.[c,z]sh.BNL

Important: use setup.sh for bash, setup.csh for csh/tcsh, and setup.zsh for zsh.

For UC tier2 users, see UsingDQ2#UcTier2.

For users at other sites, see Details on Setup below, or the general setup.sh.
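Because the setup script name depends on both the site and your shell, a small helper can pick the right file. This is a sketch: the function name dq2_setup_script is hypothetical, the paths are the CERN and BNL ones listed above, and the csh variant is inferred from the setup.[c,z]sh notation.

```shell
# Hypothetical helper: print the DQ2 end-user setup script for a site and shell.
# Usage: dq2_setup_script SITE [SHELL]   (SHELL defaults to the basename of $SHELL)
dq2_setup_script() {
    site=$1
    shell_name=${2:-$(basename "${SHELL:-/bin/sh}")}
    case $site in
        CERN) dir=/afs/cern.ch/atlas/offline/external/GRID/ddm/endusers ;;
        BNL)  dir=/afs/usatlas.bnl.gov/Grid/Don-Quijote/dq2_user_client ;;
        *)    echo "dq2_setup_script: unknown site '$site'" >&2; return 1 ;;
    esac
    case $shell_name in
        zsh)      ext=zsh ;;
        csh|tcsh) ext=csh ;;
        *)        ext=sh ;;   # bash, sh, ksh
    esac
    echo "$dir/setup.$ext.$site"
}
```

From bash you would then run `. "$(dq2_setup_script CERN bash)"`; csh users still type `source` with the printed path themselves.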


Examples

Copy files from local storage

dq2_get copies data from local storage or over the grid.

$ dq2_get -v mc11.007200.singlepart_mu2.recon.CBNT.v11000303

where mc11.007200.singlepart_mu2.recon.CBNT.v11000303 is the dataset name. The -v (verbose) option is optional. DDM works with datasets as its basic unit, so all files in the dataset are copied.

Copy files to a destination directory (note that the destination directory may need to be group-writable; see http://it-dep-fio-ds.web.cern.ch/it-dep-fio-ds/Documentation/gridftp-faq.asp):

$ dq2_get -d /castor/cern.ch/user/t/tmaeno/unko \
     mc11.007200.singlepart_mu2.recon.CBNT.v11000303

Copy two specific files from a dataset:

$ dq2_get mc11.007200.singlepart_mu2.recon.CBNT.v11000303 \
     mc11.007200.singlepart_mu2.recon.CBNT.v11000303._00320.root.1 \
     mc11.007200.singlepart_mu2.recon.CBNT.v11000303._00086.root.2

Copy files over the grid

If files are not found in local storage, dq2_get stops by default. If you authorize remote retrieval with the '-r' option, the files are copied over the grid.

$ dq2_get -rv csc11.005001.pythia_minbias.evgen.EVNT.v11000401

Some files are missing in the local storage
They are in the following sites;
[1] : http://dms02.usatlas.bnl.gov:8000/dq2/
[2] : http://doe-dhcp195.bu.edu:8000/dq2/
Which site to retrieve them from ? [1-2] : 

Choose any of the listed sites:

Which site to retrieve them from ? [1-2] : 1
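To fetch several datasets unattended, you can loop over a list of names and let dq2_get pick the source site itself with -c instead of answering the prompt above. A minimal sketch; the list file format (one dataset name per line, # for comments) and the function name fetch_datasets are assumptions.

```shell
# Hypothetical helper: run "dq2_get -r -c" once per dataset listed in a file.
# Blank lines and lines starting with '#' are skipped.
fetch_datasets() {
    list=$1
    failed=0
    while IFS= read -r ds || [ -n "$ds" ]; do
        case $ds in
            ''|'#'*) continue ;;
        esac
        echo "=== $ds"
        # -r: allow copying over the grid; -c: choose the source site automatically.
        # </dev/null keeps dq2_get from reading the rest of the list file.
        dq2_get -r -c "$ds" </dev/null || { echo "FAILED: $ds" >&2; failed=1; }
    done < "$list"
    return "$failed"
}
```

Redirecting dq2_get's standard input matters here: without it, any prompt inside dq2_get would silently consume the remaining dataset names.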

List datasets matching a given pattern

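You can search for datasets with dq2_ls, which lists datasets matching a given pattern. The pattern may contain wildcards representing any string. The wildcard symbol is the asterisk *. Quote the pattern if it could also match file names in your current directory, so that the shell does not expand it.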

$ dq2_ls mc11.*.singlepart*.recon.*

mc11.007200.singlepart_mu2.recon.CBNT.v11000303
mc11.007207.singlepart_mu6.recon.AOD.v11000303
mc11.007430.singlepart_singlepi_pt2.recon.AOD.v11000309
... 

File information is available with the '-g' or '-f' option: '-g' queries the central DQ2 catalog and '-f' the local replica catalog (LRC).

$ dq2_ls -g mc11.*.singlepart*.recon.*

mc11.007211.singlepart_mu10.recon.AOD.v11000303
  Total: 2198
    mc11.007211.singlepart_mu10.recon.AOD.v11000303._00001.pool.root.2
    mc11.007211.singlepart_mu10.recon.AOD.v11000303._00002.pool.root.1
    ... 

The output shows the dataset name, the total number of files in the dataset, and a list of the available files. Use the '-p' option if you need physical file names (PFNs).


When -f or -g is used, you can apply selection criteria on the number of files in the dataset:

$ dq2_ls -f pattern criteria
$ dq2_ls -g pattern criteria

There are two keywords:

  Total   the total number of files
  Local   the number of local files

Criteria can contain logical operators (and, or, ...). For example:

$ dq2_ls -ft csc11*recon*.AOD* "Total>10 and Total<100 and Local>0"

csc11.005101.JimmyWmunu.recon.AOD.v11004103   Total: 49  - Local: 49
csc11.005056.PythiaPhotonJet2.recon.AOD.v11004201   Total: 50  - Local: 49
csc11.007502.singlepart_K3.recon.AOD.v11004103   Total: 50  - Local: 42
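The one-line-per-dataset output of -t is also easy to post-process with standard tools. A sketch, assuming the output layout shown above (name, then "Total: N  - Local: M"); the function name complete_locally is hypothetical, and it keeps only datasets whose files are all available locally.

```shell
# Print the names of datasets whose local file count equals the total count.
# Input: lines in the "name   Total: N  - Local: M" format printed by dq2_ls -ft.
# Lines in any other format are ignored.
complete_locally() {
    awk '$2 == "Total:" && $5 == "Local:" && $3 == $6 { print $1 }'
}
```

Usage: `dq2_ls -ft 'csc11*recon*.AOD*' "Total>10 and Local>0" | complete_locally`. On the three lines above it would print only csc11.005101.JimmyWmunu.recon.AOD.v11004103.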

Browse datasets on the web

You can browse DQ2 datasets in the DQ2 dataset browser. Datasets can be browsed on the basis of dataset type, associated metadata, and sites at which the datasets are present.

Create user-defined datasets

dq2_put lets users create their own datasets. DQ2 can see only Tier1/Tier2 Storage Elements (CASTOR, dCache, DPM), so files must first be copied to an SE. The procedure is as follows:

1) Copy the files to local storage that DQ2 can see:

$ rfcp mcatnlo31.005251.Wminmunu._00001.tar.gz /castor/cern.ch/grid/atlas/users/hage
  ...

Important: At CERN, the location must be under /castor/cern.ch/grid/atlas/, not /castor/cern.ch/user/. All ATLAS members can write to the former.

2) Create a dataset composed of the copied files:

$ dq2_put -d /castor/cern.ch/grid/atlas/users/hage mydataset.v001

All files in /castor/cern.ch/grid/atlas/users/hage are added to the dataset. Note that you need to give dq2_put a PoolFileCatalog if the files have their own GUIDs, as POOL files do, e.g.,

$ pool_insertFileToCatalog \
   castor:/castor/cern.ch/grid/atlas/users/hage/mcatnlo31.005251.Wminmunu.digit.RDO._00001.pool.root
  ...
$ dq2_put -p PoolFileCatalog.xml mydataset.v002
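The two steps for non-POOL files can be wrapped in one function. A sketch only: the function name stage_and_put is hypothetical, it uses rfcp as in step 1 and the -d form of dq2_put from step 2, and since dq2_put -d adds every file in the directory, the target directory should hold only this dataset's files. POOL files should go through the PoolFileCatalog route instead.

```shell
# Hypothetical wrapper: copy local files into a CASTOR directory with rfcp,
# then register everything in that directory as one dataset with dq2_put -d.
# Usage: stage_and_put CASTOR_DIR DATASET FILE...
stage_and_put() {
    if [ "$#" -lt 3 ]; then
        echo "usage: stage_and_put CASTOR_DIR DATASET FILE..." >&2
        return 2
    fi
    dest=$1
    dataset=$2
    shift 2
    for f in "$@"; do
        rfcp "$f" "$dest" || { echo "rfcp failed for $f" >&2; return 1; }
    done
    # Note: dq2_put -d adds ALL files found in $dest, not only the ones just copied.
    dq2_put -d "$dest" "$dataset"
}
```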

Run Athena

See How to run Athena on CSC samples using DQ2.



Troubleshooting

Increase the timeout for LFC lookups

The following helped when dq2_get -r failed with a timeout message:
export LFC_CONNTIMEOUT=60
export LFC_CONRETRY=2
export LFC_CONRETRYINT=6

How to migrate to DQ2 0.3

Get the latest dq2_* tools from here, and change DQ2_URL_SERVER and DQ2_URL_SERVER_SSL in your setup.*sh.* script as follows:
export DQ2_URL_SERVER=http://atlddmcat.cern.ch/dq2/
export DQ2_URL_SERVER_SSL=https://atlddmcat.cern.ch:443/dq2/
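Instead of editing by hand, sed can rewrite the two lines. A sketch, assuming your setup script sets both variables on lines that begin with `export VAR=` (as in the sh-family scripts; csh scripts use setenv and need a different pattern); the function name migrate_dq2_setup is hypothetical, and a .bak copy of the original is kept.

```shell
# Hypothetical helper: point an sh-style DQ2 setup script at the DQ2 0.3 servers.
# Keeps a backup as FILE.bak. Only lines starting with "export DQ2_URL_SERVER=" or
# "export DQ2_URL_SERVER_SSL=" are changed; all other lines are copied unchanged.
migrate_dq2_setup() {
    file=$1
    cp "$file" "$file.bak" || return 1
    sed -e 's|^export DQ2_URL_SERVER=.*|export DQ2_URL_SERVER=http://atlddmcat.cern.ch/dq2/|' \
        -e 's|^export DQ2_URL_SERVER_SSL=.*|export DQ2_URL_SERVER_SSL=https://atlddmcat.cern.ch:443/dq2/|' \
        "$file.bak" > "$file"
}
```

The first pattern does not touch the SSL line, because "DQ2_URL_SERVER" is followed there by "_SSL" rather than "=".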

~/.srmconfig/config.xml: No such file or directory

The following error message from dq2_get

.../.srmconfig/config.xml: No such file or directory
mkdir: too few arguments
Try `mkdir --help' for more information.
configuration file not found, configuring srmcp

is harmless. If it bothers you, run

$ mkdir -p ~/.srmconfig

and config.xml will then be created automatically.

/etc/grid-security/certificates (No such file or directory)

Change the following entry in ~/.srmconfig/config.xml

from

<x509_user_trusted_certificates>
/etc/grid-security/certificates
</x509_user_trusted_certificates>

to, e.g.,

<x509_user_trusted_certificates>
/afs/cern.ch/project/gd/LCG-share/certificates
</x509_user_trusted_certificates>

Ask your local admin which directory to use for x509_user_trusted_certificates.
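The edit can also be scripted. A sketch that first checks that the new directory exists and then swaps the path; the function name set_srm_ca_dir is hypothetical, and it assumes the path sits on its own line inside x509_user_trusted_certificates, as in the snippet above.

```shell
# Hypothetical helper: replace /etc/grid-security/certificates in the srmcp config
# with another CA directory (ask your local admin which one to use).
# Usage: set_srm_ca_dir NEW_DIR [CONFIG]   (CONFIG defaults to ~/.srmconfig/config.xml)
# NEW_DIR must not contain '|' or '&', which are special in the sed replacement.
set_srm_ca_dir() {
    new_dir=$1
    config=${2:-$HOME/.srmconfig/config.xml}
    if [ ! -d "$new_dir" ]; then
        echo "set_srm_ca_dir: $new_dir is not a directory" >&2
        return 1
    fi
    cp "$config" "$config.bak" || return 1
    # Only a line consisting solely of the old path (optionally indented) is rewritten.
    sed "s|^\([[:space:]]*\)/etc/grid-security/certificates[[:space:]]*\$|\1$new_dir|" \
        "$config.bak" > "$config"
}
```

Usage on lxplus: `set_srm_ca_dir /afs/cern.ch/project/gd/LCG-share/certificates`.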

NameError: global name 'lfc' is not defined

The lfc module is required to access LCG datasets. It is available in the LCG UI or via RPM (see Installation). Check whether you can import lfc:

$ python -W ignore
>>> import lfc

If the import fails, lfc.py is missing from your PYTHONPATH.
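In scripts, the same check can be done non-interactively through the exit status of `python -c`. A sketch; the function name have_lfc is hypothetical, and it uses whatever `python` is first on your PATH (set PYTHON to override), which must be the interpreter the LCG lfc bindings were built for.

```shell
# Hypothetical helper: succeed if the Python lfc bindings can be imported.
# PYTHON selects the interpreter (default: python).
have_lfc() {
    if "${PYTHON:-python}" -W ignore -c 'import lfc' >/dev/null 2>&1; then
        return 0
    fi
    echo "lfc module not importable; check that lfc.py is on PYTHONPATH" >&2
    echo "PYTHONPATH=${PYTHONPATH:-<unset>}" >&2
    return 1
}
```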

425 Can't open data connection or 426 Connection timed out

dq2_get uses srmcp by default. srmcp requires incoming connections to copy files from a remote site, so it does not work when your PC is behind a firewall. You can make dq2_get use a copy tool that works behind a firewall. For LCG, e.g.,

$ export DQ2_COPY_COMMAND='lcg-cp -v --vo atlas'
$ dq2_get ...
Alternatively, use --srmstreams=1:
$ dq2_get --srmstreams=1 ...
See HN.
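If you are only sometimes behind a firewall, you can set DQ2_COPY_COMMAND for a single call rather than exporting it for the whole session. A sketch; the wrapper name dq2_get_behind_fw is hypothetical, and the copy command is the lcg-cp example above.

```shell
# Hypothetical wrapper: run dq2_get with lcg-cp instead of srmcp (which needs
# incoming connections), for this one invocation only. All arguments are
# passed through to dq2_get unchanged.
dq2_get_behind_fw() {
    DQ2_COPY_COMMAND='lcg-cp -v --vo atlas' dq2_get "$@"
}
```

Because the variable is given as a prefix assignment, it applies only to that dq2_get process and does not remain set in your shell afterwards.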

dq2_get cannot copy datasets from LCG to BNL/acas000X

The firewall on acas000X refuses the incoming connections that srmcp requires. There are two possible solutions. One is to use a GridUI machine such as gridui01.usatlas.bnl.gov. In this case, ~/.srmconfig/config.xml needs to contain
  <globus_tcp_port_range>20000,21000</globus_tcp_port_range>
Another solution is to set DQ2_COPY_COMMAND or to use --srmstreams=1 (see above).

Contact

Submit all problems/questions to DQ2/DDM Savannah.


DQ2 end-user tools

dq2_get

NAME
        dq2_get - provide access to DQ2 datasets

SYNOPSIS

        dq2_get [ -h | --help]
                [ -v | --verbose ]
                [ -n | --nfiles ]
                [ -r | --remote ]
                [ -a | --all ]
                [ -c | --choose ]
                [ -p | --parallel n ]
                [ -t | --timeout n ]
                [ -d | --destination destination ]
                [ -s | --source sourceSite ]
                datasetname
                [lfn1 [lfn2 [...]]]
DESCRIPTION

        For datasets already present on a local storage element (SE), data is
        copied to the local directory or to another directory in the SE.

OPTIONS

        -h | --help             Print this message

        -v | --verbose          Verbosity

        -n | --nfiles           Copy N files

        -r | --remote           Copy files over the grid if files are not found in the local SE

        -a | --all              Scan all remote sites to find replicas

        -c | --choose           Automatically choose an appropriate site when copying files over the grid

        -p | --parallel         Number of copy threads (default: 3)

        -t | --timeout          Timeout limit in seconds for each file transfer (default: 1800)

        -d | --destination      Directory in the storage element where files will be put.
                                If omitted, files are copied to the local directory.

        -s | --source           Source site from which files are copied

dq2_ls

NAME
        dq2_ls - return a list of datasets matching a given pattern

SYNOPSIS

        dq2_ls [ -h | --help]
               [ -v | --verbose ]
               [ -f | --file ]
               [ -p | --pfn ]
               [ -g | --global ]
               [ -l | --long ]
               [ -t | --total ]
               [ -a | --all ]
               [ -r | --replica ]
               [ -c | --count ]
               [ -s | --site site ]
               pattern

DESCRIPTION

        List information about datasets matching a given pattern. The pattern
        may contain wildcards which represent any strings. The wildcard symbol
        is the asterisk *.

OPTIONS

        -h | --help             Print this message

        -v | --verbose          Verbosity

        -f | --files            List files in LRC

        -p | --pfn              List Physical File Names instead of Logical File Names

        -g | --global           List files in global file catalog

        -l | --long             Use long list format

        -t | --total            Print only total information

        -a | --all              Do not hide datasets used by production system internally

        -r | --replica          List replica sites

        -s | --site             List datasets at a site

        -c | --count            Count the number of files. This option disables -f, for faster access

dq2_put

NAME
        dq2_put - register datasets to DQ2

SYNOPSIS

        dq2_put [ -h | --help]
                [ -v | --verbose ]
                [ -o | --official ]
                [ -d | --directory directory ]
                [ -p | --pool poolfilecatalog ]
                datasetname

DESCRIPTION

        Registers files in the LRC, creates a dataset composed of those files, and
        then registers the dataset with DQ2. If a PoolFileCatalog is given, the list of
        files is extracted from the PoolFileCatalog. Otherwise, GUIDs are generated
        with uuidgen for the files under the given directory.

OPTIONS

        -h | --help             Print this message

        -v | --verbose          Verbosity

        -o | --official         Allow an official dataset name. Normal users should
                                follow the naming convention for user datasets, i.e.,
                                user.<FirstnameLastName>.<user-controlled string...

        -p | --pool             PoolFileCatalog containing a list of constituent files

        -d | --directory        Directory where the constituent files exist. The directory
                                must be in local storage that DQ2 can see.
                                This option is ignored when -p is set.

        datasetname             Name of dataset
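Without -o, a dataset name should follow the user naming convention above. A rough pre-check is sketched below; the function name is_user_dataset_name is hypothetical, and it only tests the general shape user.<FirstnameLastName>.<something>, since the full rules for the user-controlled part are not given here.

```shell
# Hypothetical pre-check before calling dq2_put without -o: accept only names of
# the form user.<FirstnameLastName>.<rest>, with a non-empty middle and rest.
is_user_dataset_name() {
    name=$1
    case $name in
        user.*.*) ;;
        *) return 1 ;;
    esac
    rest=${name#user.}        # <FirstnameLastName>.<rest>
    who=${rest%%.*}           # <FirstnameLastName>
    tail=${rest#*.}           # <rest>
    [ -n "$who" ] && [ -n "$tail" ]
}
```

Usage: `is_user_dataset_name "$ds" && dq2_put -d /castor/cern.ch/grid/atlas/users/hage "$ds"`.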

dq2_poolFCjobO

NAME
        dq2_poolFCjobO - create PoolFileCatalog and Athena job-option
        for DQ2 datasets

SYNOPSIS

        dq2_poolFCjobO
               [ -h | --help]
               [ -v | --verbose ]
               [ -s | --storage ]
               [ -p | --pool ]
               [ -j | --jobo ]
               [ -d | --directory directory]
               [ -l | --lcg ]
               datasetname

DESCRIPTION

        dq2_poolFCjobO resolves GUIDs for constituent files of a DQ2 dataset,
        and creates PoolFileCatalog.xml and Athena job-option. The constituent
        files need to be copied to a local area first. If PoolFileCatalog.xml
        already exists, file records are appended. If no directory is given,
        the DQ2 local replica catalog is scanned to find the corresponding files.

OPTIONS

        -h | --help             Print this message

        -v | --verbose          Verbosity

        -p | --pool             Output PoolFileCatalog.xml

        -j | --jobo             Output job-option file for Athena

        -d | --directory        Directory where files exist. If omitted, local replica
                                catalog is scanned

        -l | --lcg              Access LCG datasets. This option will be removed once
                                LCG prodsys is integrated with DQ2

dq2_register

This tool requires a valid grid certificate, and you must source the LCG Grid environment and the DQ2 v0.3 setup script:

source /afs/cern.ch/atlas/offline/external/GRID/ddm/pro03/dq2.sh

It is located at

/afs/cern.ch/atlas/offline/external/GRID/ddm/pro03/dq2_register

It also requires the java executable in your PATH (it is used by the underlying file transfer tools); for example, on lxplus:

export PATH=$PATH:/usr/java/j2re-1.4.2_13/bin

At some point in the future this tool and dq2_put will be merged.

NAME
        dq2_register - upload and register external generator input files to DQ2

SYNOPSIS

        dq2_register
               [ -q ]
               site
               file(s)

DESCRIPTION

        dq2_register is a tool specifically designed for handling external
        generator input files used by the production system.
        
        It will upload local files to the Grid storage at the 
        specified site and register them in the local Grid catalog and in
        DQ2 central catalogs. file(s) is the path(s) to local files and site
        must be a valid TiersOfATLAS destination site. This tool has very
        strict rules on the format of the input file names; they must be of the form

        mcatnlo31.005204.ttbar_fulhad._00001.tar.gz

        The dataset the files are registered to is taken from the filename.
        Everything up to the "._" in the filename is the dataset name, in
        this case mcatnlo31.005204.ttbar_fulhad. If the dataset does not 
        exist you will be prompted as to whether to create it. The DQ2 LFN 
        is simply the filename, i.e., in this case
        mcatnlo31.005204.ttbar_fulhad._00001.tar.gz and a GUID is generated
        for the file automatically. The tool checks for duplicate LFNs in 
        the dataset and will exit with an error if it finds any.

        If the upload is successful the file is known to DQ2 and can be 
        copied around by subscribing the dataset to another site or using
        DQ2 end-user tools such as dq2_get. 

OPTIONS
        -q                      Suppress logging output
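Because dq2_register derives the dataset name from the file name, you can preview which dataset each file would go to, and spot duplicate file names within a batch, before uploading. A sketch with a hypothetical function name; it assumes the first "._" in the name is the split point, only mirrors the naming rule described above, and does not contact DQ2 (so it cannot see LFNs already in the dataset).

```shell
# Hypothetical preview: print "<dataset> <lfn>" for each file, where the dataset is
# everything before the first "._" in the file name (the naming rule described above).
# Returns 1 if a name lacks "._" or if the same LFN appears twice in this batch.
# File names containing spaces are not supported.
preview_dq2_register() {
    status=0
    seen=' '
    for path in "$@"; do
        lfn=$(basename "$path")
        case $lfn in
            *._*) ;;
            *) echo "no '._' in $lfn" >&2; status=1; continue ;;
        esac
        case $seen in
            *" $lfn "*) echo "duplicate LFN $lfn" >&2; status=1; continue ;;
        esac
        seen="$seen$lfn "
        echo "${lfn%%._*} $lfn"
    done
    return "$status"
}
```

For mcatnlo31.005204.ttbar_fulhad._00001.tar.gz it prints the dataset mcatnlo31.005204.ttbar_fulhad followed by the LFN.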

dq2_cleanup

This tool requires a valid grid certificate, the LCG environment (for deleting datasets on LCG sites), and the DQ2 v0.3 setup. It runs on SLC4 only.

On SLC4 (lxplus):

source /afs/cern.ch/project/gd/LCG-share/sl4/etc/profile.d/grid_env.[c]sh
source /afs/cern.ch/atlas/offline/external/GRID/ddm/pro03/dq2.sh
voms-proxy-init -voms atlas

The tool is located at

/afs/cern.ch/atlas/offline/external/GRID/ddm/pro03/dq2_cleanup

NAME
        dq2_cleanup - delete a dataset from a site's catalog and storage.

SYNOPSIS

        dq2_cleanup
               [-l]
               [-t timeout]
               dataset
               site

DESCRIPTION

        dq2_cleanup finds files in the given dataset that are present at
        the given site and deletes them from the local catalog and physically
        from the local storage. It also deletes any subscription and location
        of the dataset at this site from the DQ2 central catalogs.

        On LCG sites, dq2_cleanup uses the lcg-del command.

        On OSG sites, dq2_cleanup uses the MySQL LRC and Globus GridFTP commands.

OPTIONS
        -l                      Do not delete but list the files (SURLs)
                                that would be deleted.
        -t timeout              Gives a timeout in seconds per file for use
                                in lcg-del (default is 300)
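Since dq2_cleanup deletes files physically, it is prudent to look at the -l listing first. A sketch of a hypothetical wrapper that lists the SURLs, asks for confirmation on standard input, and only then runs the real deletion:

```shell
# Hypothetical wrapper: show what dq2_cleanup would delete (-l), ask for
# confirmation, then run the real deletion only on an explicit "y"/"yes".
# Usage: careful_cleanup DATASET SITE
careful_cleanup() {
    if [ "$#" -ne 2 ]; then
        echo "usage: careful_cleanup DATASET SITE" >&2
        return 2
    fi
    dq2_cleanup -l "$1" "$2" || return 1
    printf 'Delete the files listed above from %s? [y/N] ' "$2"
    read -r answer || answer=
    case $answer in
        y|Y|yes|YES) dq2_cleanup "$1" "$2" ;;
        *) echo "aborted"; return 1 ;;
    esac
}
```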

dq2_sample

This tool requires a valid grid certificate but does not require sourcing any environment. It is located at

/afs/cern.ch/atlas/offline/external/GRID/ddm/test/dq2_sample

NAME
        dq2_sample - copies a portion of an existing dataset and registers it to DQ2


SYNOPSIS

        dq2_sample
               dataset name
               number of files

DESCRIPTION

        dq2_sample is a tool specifically designed to create a small sample dataset
        from an existing dataset.
        
        It reads the original dataset's file and replica information.
        It then creates a new sample dataset containing only the user-defined number of files
        and marks the sample dataset's replicas in the DQ2 central catalogs.
        
        The sample dataset name will be generated automatically using the following pattern:
        
        user.<user_DN>.sample.<original_dataset_name>
        
        A user can have only one sample dataset per dataset.
        Running this script automatically erases any previous sample dataset for that dataset.



Details on Setup

1. DQ2 setup

DQ2_URL_SERVER
DQ2 server
$ export DQ2_URL_SERVER=http://atlddmcat.cern.ch/dq2/

DQ2_URL_SERVER_SSL
DQ2 server for secure connection
$ export DQ2_URL_SERVER_SSL=https://atlddmcat.cern.ch:443/dq2/

DQ2_LOCAL_ID
local site ID
For CERN/lxplus users
$ export DQ2_LOCAL_ID=CERN

For BNL/ACF users

$ export DQ2_LOCAL_ID=BNL

For other site users, ask your DDM admin.

If your site doesn't deploy a DQ2 site service,

$ export DQ2_LOCAL_ID=

DQ2_LOCAL_PROTOCOL
Protocol to access the local storage (rfio, castor, dcap, unix, dpm)
For CERN/lxplus users,
$ export DQ2_LOCAL_PROTOCOL=castor

For BNL/ACF users

$ export DQ2_LOCAL_PROTOCOL=dcap

If you use normal disk storage,

$ export DQ2_LOCAL_PROTOCOL=unix

DQ2_STORAGE_ROOT
root directory of local storage
$ export DQ2_STORAGE_ROOT=/pfns

DQ2_SRM_HOST
local SRM server
$ export DQ2_SRM_HOST=srm://castorgrid.cern.ch:8443

DQ2_GSIFTP_HOST
local GridFTP server
$ export DQ2_GSIFTP_HOST=gsiftp://castorgrid.cern.ch:2811

DQ2_USE_SRM
use SRM for all data transfer (default: False)
$ export DQ2_USE_SRM=True

LCG_CATALOG_TYPE
LCG catalog type
$ export LCG_CATALOG_TYPE=lfc

DQ2_LFC_HOME
LFC_HOME of the local replica catalog
$ export DQ2_LFC_HOME=/grid/atlas

DQ2_COPY_COMMAND
command used by dq2_get to copy files; set this when srmcp doesn't work in your environment
$ export DQ2_COPY_COMMAND='lcg-cp -v --vo atlas'
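Putting the variables together, a personal setup file for a bash user at CERN could look roughly like the sketch below. It is assembled from the example values in this list, which mostly refer to CERN; DQ2_STORAGE_ROOT is only the placeholder given above, and users at other sites should take every value from their DDM admin.

```shell
# Sketch of a personal DQ2 end-user setup for CERN/lxplus (bash/sh),
# using the example values listed above.
export DQ2_URL_SERVER=http://atlddmcat.cern.ch/dq2/
export DQ2_URL_SERVER_SSL=https://atlddmcat.cern.ch:443/dq2/
export DQ2_LOCAL_ID=CERN
export DQ2_LOCAL_PROTOCOL=castor
export DQ2_STORAGE_ROOT=/pfns        # placeholder from the list above; ask your DDM admin
export DQ2_SRM_HOST=srm://castorgrid.cern.ch:8443
export DQ2_GSIFTP_HOST=gsiftp://castorgrid.cern.ch:2811
export LCG_CATALOG_TYPE=lfc
export DQ2_LFC_HOME=/grid/atlas
# Uncomment if srmcp cannot work from your machine (e.g. behind a firewall):
# export DQ2_COPY_COMMAND='lcg-cp -v --vo atlas'
```

Source such a file after the LCG UI setup, in place of setup.sh.CERN.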

2. Setup a program to access local storage

You need to set up a program to access your local storage, such as CASTOR or dCache. Currently rfio and dcap are supported. If you use an ordinary disk for the local storage (i.e., DQ2_LOCAL_PROTOCOL=unix), you can skip this section.



More site-specific setup

UC Tier3 (and UC Tier2 prototype)

The instructions on how to use the DQ2 end-user tools on the University of Chicago facilities have been moved to the MWT2 TWiki.

-- MarcoMambelli - 02 Jul 2007


Installation

The DQ2 end-user tools do not require a full DQ2 installation. All you need are the dq2_* scripts and setup.*, which are available here. The installation procedure is, e.g.,

$ wget http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/*checkout*/offline/DataManagement/\
DQ2_0_2/endusers/dq2_get?rev=HEAD\&content-type=text/plain
$ mv dq2_get* dq2_get
$ chmod +x dq2_get
$ wget http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/*checkout*/offline/DataManagement/\
DQ2_0_2/endusers/dq2_ls?rev=HEAD\&content-type=text/plain
...
$ wget http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/*checkout*/offline/DataManagement/\
DQ2_0_2/endusers/setup.sh.any?rev=HEAD\&content-type=text/plain
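The downloads above can be scripted in one loop. In this sketch the URL pattern is the ViewCVS one above; the list of files (the dq2_* tools described on this page plus setup.sh.any) and the function names are assumptions. Quoting the URL stops the shell from treating ? and * as glob characters, and wget -O writes directly to the final file name, so no mv step is needed.

```shell
# Base of the ViewCVS checkout URL used above (DQ2 0.2 end-user tools).
DQ2_VIEWCVS=http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/*checkout*/offline/DataManagement/DQ2_0_2/endusers

# Print the plain-text checkout URL for one file.
dq2_tool_url() {
    printf '%s/%s?rev=HEAD&content-type=text/plain\n' "$DQ2_VIEWCVS" "$1"
}

# Hypothetical installer: fetch each file into the current directory under its
# own name and make the dq2_* tools executable.
install_dq2_tools() {
    for t in dq2_get dq2_ls dq2_put dq2_poolFCjobO setup.sh.any; do
        wget -q -O "$t" "$(dq2_tool_url "$t")" || { echo "download failed: $t" >&2; return 1; }
        case $t in
            dq2_*) chmod +x "$t" ;;
        esac
    done
}
```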

Then configure setup.sh.any (see details). The setup.sh.CERN and setup.sh.BNL in CVS may help.

When you copy data from a remote site to your local PC, your PC needs to be reachable from the remote site (see Access to remote SRM/GridFTP server). Alternatively, set DQ2_COPY_COMMAND.

Important: dq2_* requires the Python lfc bindings to access LCG datasets. There are three options:

1) use the LCG UI, which contains the Python lfc bindings

https://twiki.cern.ch/twiki/bin/view/LCG/TarUIInstall

2) install only the LFC-interfaces package, which is available at

http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30/RPMS.Release3.0/

3) access OSG/NG datasets only

If you choose 3), you don't need to install python lfc bindings.



Download

The latest version of the end-user tools is available here



Major updates:
-- TadashiMaeno - 15 Dec 2005 -- PedroSalgado - 10 Oct 2006

Responsible: TadashiMaeno

Topic revision: r106 - 2015-03-18 - RyanTaylor
 