
Publish to DBS(DAS) Yourself

Contacts

Introduction

This twiki explains how to publish a dataset to DBS (DAS) without using crab. The procedure encompasses the following steps:

  • bookkeep file properties in Framework Job Report (fjr) files,
  • create your own JSON file,
  • install WMAgent on a virtual SLC5-x86_64 machine,
  • publish your dataset to the analysis DBS,
  • optional: elevation to global DBS
The procedure has been tested for only a few use cases, so in case of problems, please contact the authors.

It is assumed that your datasets are stored on some storage element that is accessible from the grid. Your files should be located in a subdirectory of the CMS store directory of the particular storage element. This store directory is usually of the form

/<SOMETHING>/cms/<SOMETHING ELSE>/store/
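To illustrate the structure of such a path, the following sketch splits a full storage path into the site-specific prefix and the part starting at /store/ (the prefix and username are made-up examples, not actual site configuration):

```python
# Hypothetical example: splitting a full storage path into the
# site-specific prefix and the part starting at "/store/".
full_path = "/pnfs/desy.de/cms/tier2/store/user/someuser/mydataset/file_1.root"

# Everything from "/store/" onwards is the part used in the bookkeeping.
idx = full_path.index("/store/")
site_prefix, store_path = full_path[:idx], full_path[idx:]

print(site_prefix)  # /pnfs/desy.de/cms/tier2
print(store_path)   # /store/user/someuser/mydataset/file_1.root
```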

Notice:

  • When publishing files to the analysis DBS, dataset files are not altered and not copied. In other words, publishing to analysis DBS is nothing more than a bookkeeping procedure.
  • Elevating a dataset from the analysis DBS to global DBS involves making an official copy of your dataset (this is done for you). In global DBS, it is the copies that are published, not the original files. For technical reasons, events that were in one single original file may get scattered over several global DBS copies, or events from several original files might end up in one single DBS copy.

Disclaimer:
We, the authors of this page, are not DBS experts (far from it). We are just sharing our experiences. Let us know if our prescription does not work for you.

Retrieving / Producing fjr files

Framework Job Report (fjr) files summarize the properties of a CMSSW file. In case you have managed your production with crab, fjr files are available in the crab directory. After retrieving the output with 'crab -getoutput -c ', you may list the fjr files as follows:

ls <YOUR_CRAB_DIR>/results/crab_fjr_*.xml

In case you produced your datasets outside crab, you can produce the fjr files yourself with the script mkfjr.py as follows:

cd CMSSW_A_B_C/src
cmsenv
python mkfjr.py <YOUR_CMSSW_FILE>.root <OUTPUT_FJR_FILE>.xml

e.g. for DESY

python mkfjr.py dcap://dcache-cms-dcap.desy.de:22125//pnfs/desy.de/cms/tier2/store/user/lveldere/pMSSMInterpretation/test_simulation/pMSSM12_MCMC1_10_260401_output.root pMSSM12_MCMC1_10_260401_fjr.xml

This script automatically retrieves the lumisections in the CMSSW file, which are required for publication. For large files this may take a while, so you might consider setting the lumisections by hand using the optional third argument as follows:

python mkfjr.py <YOUR_CMSSW_FILE>.root <OUTPUT_FJR_FILE>.xml '{"runId1":[lumiId1,lumiId2,...],"runId2":[...],...}'

Mind the double quotes (") and single quotes ('): the parser of this option is very sensitive to the syntax.
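To see why the quoting matters: the part inside the single quotes must be valid JSON, with run numbers as double-quoted string keys and lumi IDs as plain integers. A quick way to check your string before passing it to the script (this check is our own suggestion, not part of mkfjr.py; the run numbers are made up):

```python
import json

# The whole argument is wrapped in single quotes for the shell; inside,
# run numbers must be double-quoted strings and lumi IDs plain integers.
arg = '{"190456":[1,2,3],"190457":[10,11]}'

lumis = json.loads(arg)  # raises ValueError if the syntax is wrong
print(lumis["190456"])   # [1, 2, 3]
```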

Making a JSON File of Your Dataset

The information from the fjr files is summarized in a JSON file with the script mkjson.py. First, put all fjr files in a single directory, then run the script as follows:

cd CMSSW_A_B_C/src
cmsenv
python mkjson.py <fjr files directory> \
                 <dataset path> \
                 <version number> \
                 <global tag> \
                 <application family> \
                 <CMSSW version> \
                 <storage location> \
                 <acquisition era> \
                 <output json filename>

With

  • dataset path the desired path for the dataset,
    e.g. /SMS-T1tttt_Mgluino-350to2000_mLSP-0to1650_8TeV-Pythia6Z/Spring12-PU_START52_V9_FastSim-v1/USER
  • version number the version number for the dataset,
    e.g. 1
  • global tag the global tag used for the production of the data set
    e.g. START42_V11::All
  • application family
    e.g. FastSim
  • CMSSW version
    e.g. CMSSW_5_2_4_patch1
  • storage location
    e.g. cmssrm.fnal.gov (for fermilab), dcache-se-cms.desy.de (for DESY)
  • acquisition era
    e.g. Spring12
  • output json filename
    e.g. T2tt.json

Usual format for datasetpath

 
/<short description of the considered process>/<acquisition era>-<pile up scenario>_<globaltag w/o ::All>_<application family>-v<version number>/USER
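Following that convention, the dataset path can be assembled from the individual pieces. A minimal sketch (all values are illustrative, taken from the examples above; note that ::All is stripped from the global tag):

```python
# Assemble a dataset path following the convention above.
# All values are illustrative examples; "::All" is stripped from the tag.
process = "SMS-T1tttt_Mgluino-350to2000_mLSP-0to1650_8TeV-Pythia6Z"
era = "Spring12"
pileup = "PU"
globaltag = "START52_V9::All".replace("::All", "")
family = "FastSim"
version = 1

datasetpath = "/%s/%s-%s_%s_%s-v%d/USER" % (
    process, era, pileup, globaltag, family, version)
print(datasetpath)
# /SMS-T1tttt_Mgluino-350to2000_mLSP-0to1650_8TeV-Pythia6Z/Spring12-PU_START52_V9_FastSim-v1/USER
```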

IMPORTANT:
If you plan to elevate your dataset to global DBS, the dataset path should be of the form

/<A NAME>/<YOUR HYPERNEWS NAME>-<A NAME>/<A NAME>
Elevation can only work when YOUR HYPERNEWS NAME is known to HyperNews.

NOTE:
The script has not been tested on the fjr files produced by crab. It probably will not work on these files, but minor changes should make it do its job. Please let us know if you have tested the script on fjr files produced by crab.

Obtaining a Virtual Machine from CERN

The installation of WMAgent requires 'super user' access to an SLC5-x86_64 machine. If you do not have this, request a Virtual Machine (VM) on lxplus as follows:

  • go to this site
  • Click on 'Request a Virtual Machine' and fill out the request form with:
Owner: yourUserName
Main User: yourUserName 
Computer Name: yourVMName 
Description: What you will use it for (uploaded to DAS is probably it) 
Host Group: Central Service 
Physical Host: Best rated 
Operating System: Scientific Linux CERN/SLC5-x86_64 
Expiration Date: Anytime up to 6 months
CPUs: 1
Memory: 2GB

Submit the request. Processing should not take long, from a couple of minutes to a day. You will receive a confirmation email from the virtualization service. Once confirmed, log in as follows:

ssh username@lxplus.cern.ch
ssh yourVMName

Setting up the WMAgent Environment

The general instructions for deploying the WM system can be found at WMDeployment. Here, however, we just go through what we need, assuming that you already have a 'fresh' VM with SVN installed. Remember that you must have 'super user' rights on this machine. First, create the directories you need and cd into them:

sudo mkdir /data/ 
sudo chown <YOUR_USER_NAME> /data 
cd /data/ 
mkdir install

Then check out the deployment materials:

svn co svn+ssh://<USERNAME>@svn.cern.ch/reps/CMSDMWM/Infrastructure/trunk/Deployment
cd Deployment

You might have to change the SCRAM_ARCH variable as follows:

setenv SCRAM_ARCH slc5_amd64_461 # for csh
export SCRAM_ARCH=slc5_amd64_461 # for bash

Now you should be ready to install:

./Deploy -R wmagent@0.8.37 -s prep -A slc5_amd64_gcc461 -t v01 /data/install/ wmagent 
./Deploy -R wmagent@0.8.37 -s sw -A slc5_amd64_gcc461 -t v01 /data/install/ wmagent 
./Deploy -R wmagent@0.8.37 -s post -A slc5_amd64_gcc461 -t v01 /data/install/ wmagent
 

In case of problems, the logfile /data/install/v01/sw/bootstrap-slc5_amd64_gcc461.log might help. The option -A must match whatever your SCRAM_ARCH variable is set to, and the wmagent version number (in this case 0.8.37) may change over time (hopefully this version will work for a while). The deployment may take around 5 minutes and should say "Installation successful" upon completion.

After the installation, and at every logon, set the environment as follows:

bash # if your default shell is not bash 
source /data/install/current/sw/slc5_amd64_gcc461/cms/wmagent/0.8.37/etc/profile.d/init.sh 
export PYTHONPATH=$PYTHONPATH:/data/install/v01/sw/slc5_amd64_gcc461/cms/dbs-client/DBS_2_1_6-comp3/lib/

Publish to DBS

Once you have done all this, you are finally ready to publish to DAS with the script publishToDBS.py. Run it as follows:

python publishToDBS.py <file.tgz> 100

where 100 is the block size (this value should be appropriate). The tgz archive should contain one or more json files for publication; the script will run over all files contained in the archive. Create the tgz file e.g. as follows:

tar -czf file.tgz *.json
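The same archive can also be created from Python with the standard tarfile module, equivalent to the tar command above (the example json file written here is only to keep the sketch self-contained; in practice the archive holds the files produced by mkjson.py):

```python
import glob
import json
import tarfile

# Write a small example json file so the archive is not empty
# (in practice these are the files produced by mkjson.py).
with open("example.json", "w") as f:
    json.dump({"dataset": "example"}, f)

# Create file.tgz containing every .json file in the current directory,
# equivalent to: tar -czf file.tgz *.json
with tarfile.open("file.tgz", "w:gz") as tar:
    for name in sorted(glob.glob("*.json")):
        tar.add(name)

with tarfile.open("file.tgz", "r:gz") as tar:
    print(tar.getnames())  # the archived json files
```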

Note: The script checks which of the files listed in the json files are already published; files that are already published are skipped. So it is easy to publish the files in several goes.
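The skip-already-published behaviour described above amounts to a simple set-difference filter. A sketch of that logic (the function and file names are hypothetical; the real check in publishToDBS.py queries DBS itself):

```python
# Sketch of the skip-already-published logic (hypothetical names;
# the real check is done by publishToDBS.py against DBS itself).
def files_to_publish(json_listed_files, already_published):
    """Return only the files that are not yet known to DBS."""
    published = set(already_published)
    return [f for f in json_listed_files if f not in published]

local_files = ["file_1.root", "file_2.root", "file_3.root"]
published = ["file_2.root"]

print(files_to_publish(local_files, published))
# ['file_1.root', 'file_3.root']
```

Because already-known files are simply filtered out, re-running the publication with the same archive is harmless, which is what makes publishing "in several goes" easy.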

Elevation to global DBS

After publishing to the analysis DBS, a dataset can be elevated to global DBS. This requires approval from the PAG or POG convenors. After approval, follow the procedure outlined in WorkBookGroupActivities#The_StoreResults_Service.

Important: Once a dataset is published in global DBS, it is impossible to add further files.

-- ChristopherSilkworth - 19-Apr-2012

Topic attachments
  Attachment           Size   Date              Who
  mkfjr.py.txt         3.3 K  2012-08-07 17:42  LukasVanelderen
  mkjson.py.txt        2.2 K  2012-08-07 17:42  LukasVanelderen
  publish.py.txt       7.8 K  2012-09-13 18:13  LukasVanelderen
  publishToDBS.py.txt  4.8 K  2012-08-07 17:49  LukasVanelderen
Topic revision: r9 - 2012-09-13 - LukasVanelderen