StoreResults Operations

-- LuisContreras - 28 Feb 2014 -- JulianBadillo - 27 Aug 2014

Introduction

StoreResults workflows are essentially merge workloads. The purpose of this service is to elevate a user dataset to global DBS and PhEDEx, which makes the dataset available for transfer to any site on the CMS grid. The service is currently based on Savannah, but it will be migrated to ReqMgr 2 soon.

A StoreResults workflow consists of the following steps:

  1. Get a ticket from Savannah
  2. Migrate the dataset from a local DBS to global DBS
  3. Create and assign a StoreResults workflow in ReqMgr
  4. Wait for the workflow to complete
  5. Announce the workflow
Figure: StoreResults workflow steps (storeresults_flow.png)

StoreResults workflows read data from a local DBS instance (dbsUrl). This creates a problem when the output is uploaded to global DBS: the parent dataset cannot be found there. To avoid parentage problems, the input dataset must be migrated to global DBS before the workflow can run; this is done with the DBS migration service. Input datasets are commonly located at T3s, T2s and EOS. To elevate datasets at FNAL, it might be necessary to move the data from cmssrm.fnal.gov to cmssrmdisk.fnal.gov so that the agent can read the files (if the user dataset was produced before the disk/tape separation).

Savannah Requests

StoreResults users send requests through Savannah: https://savannah.cern.ch/task/?group=cms-storeresults. You can see a more detailed list by clicking the 'Display criteria' box on the list page.

Figure: StoreResults ticket list in Savannah (store_results1.png)

Each ticket contains the information needed to create a workflow:

  • user dataset
  • local dbs url
  • CMSSW release
Figure: example StoreResults ticket (store_results2.png)
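Before building a workflow it can help to sanity-check the three ticket fields listed above. The sketch below is illustrative only: the `StoreResultsTicket` class, the `problems()` method and the specific checks are hypothetical, and the dataset pattern simply assumes the /Primary/Processed/USER form used by the datasets shown on this page.

```python
import re
from dataclasses import dataclass

# Assumption: user datasets have the form /Primary/Processed/USER, as in the
# examples on this page; these checks are illustrative, not official rules.
USER_DATASET_RE = re.compile(r"^/[^/]+/[^/]+/USER$")


@dataclass
class StoreResultsTicket:
    """Hypothetical container for the fields a StoreResults ticket provides."""
    ticket: int
    user_dataset: str
    local_dbs_url: str
    cmssw_release: str

    def problems(self):
        """Return a list of human-readable problems; empty if the ticket looks usable."""
        issues = []
        if not USER_DATASET_RE.match(self.user_dataset):
            issues.append("dataset is not of the form /Primary/Processed/USER")
        if not self.local_dbs_url.startswith("https://"):
            issues.append("local DBS url is not an https URL")
        if not self.cmssw_release.startswith("CMSSW_"):
            issues.append("release does not look like CMSSW_X_Y_Z")
        return issues
```

An empty list means the ticket passed these basic checks; anything else is worth mentioning in the reply to the user.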

You can manually create a request with reqmgr.py.

Note: We only process tickets that are Open and have status Done, which means that the request has been approved by the physics group. A few exceptional tickets need to be processed as soon as they are created; move those to Done manually before processing them. The user-facing documentation for this service is at The StoreResults Service.
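The selection rule in the note above (Open and Done) can be expressed as a small filter. This is a minimal sketch: the ticket dictionaries and their keys "id", "open" and "status" are hypothetical, since RequestQuery.py does its own Savannah querying.

```python
def split_tickets(tickets):
    """Split Savannah tickets into (ready, waiting) lists of ticket ids.

    tickets: iterable of dicts with hypothetical keys "id", "open" (bool)
    and "status" (str), e.g. as you might transcribe them from Savannah.
    """
    ready, waiting = [], []
    for ticket in tickets:
        if not ticket["open"]:
            continue  # closed tickets are never processed
        if ticket["status"] == "Done":
            ready.append(ticket["id"])    # approved by the physics group
        else:
            waiting.append(ticket["id"])  # still needs approval (or a manual move to Done)
    return ready, waiting
```

Tickets in the "waiting" list are the ones that either need physics group approval or, in exceptional cases, a manual move to Done.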

Before you Start

  • You must have admin privileges for the storeResults team in Savannah.
  • If you are using cmssrv95 (StoreResults temporary agent), you may skip the rest of this section. The environment/scripts are already set up at:
      cd ~/storeResults/
      
  • Get the scripts: RequestQuery.py and MigrationToGlobal.py
       curl https://raw.githubusercontent.com/dmwm/WMCore/master/test/data/ReqMgr/RequestQuery.py > RequestQuery.py 
       curl https://raw.githubusercontent.com/dmwm/WMCore/master/test/data/ReqMgr/MigrationToGlobal.py > MigrationToGlobal.py 
       
  • Install the mechanize libraries:
    • Unless you have root or sudo access to the agent machine, install these libraries in your local user directory.
    • Download the mechanize source code from http://wwwsearch.sourceforge.net/mechanize/download.html
    • Untar the library and install it with the --user option, which installs it in your local user folder:
       tar -xf mechanize-0.2.5.tar.gz
       cd mechanize-0.2.5
       python setup.py install --user
  • Install BeautifulSoup following the same steps.
  • You will also need the WMAgent Python environment and a valid VOMS proxy: Creating a Proxy
  • Add the installed libraries (in your user folder) to the Python path:
     export PYTHONPATH=$PYTHONPATH:~/.local/lib/python2.6/site-packages/
  • Optional: you may create a separate storeResults folder to keep the scripts and ticket JSON files:
     
       mkdir /data/srv/wmagent/storeResults
       mkdir /data/srv/wmagent/storeResults/Tickets
       
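Before running the scripts, a quick way to confirm that the libraries installed above are visible to Python is to look them up without importing them. This sketch is written for Python 3 (`importlib.util.find_spec`); on the Python 2.6 agent environment the older `imp.find_module` plays the same role. The module names "mechanize", "BeautifulSoup" (version 3) and "bs4" (version 4) are assumptions about what you installed.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names of top-level modules that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: mechanize installs as "mechanize"; BeautifulSoup 3 installs as
    # "BeautifulSoup" and BeautifulSoup 4 as "bs4". Adjust to the version you installed.
    missing = missing_modules(["mechanize"])
    if not any(importlib.util.find_spec(n) for n in ("BeautifulSoup", "bs4")):
        missing.append("BeautifulSoup/bs4")
    if missing:
        print("Not importable: " + ", ".join(missing))
        print("Check PYTHONPATH; sys.path is:\n  " + "\n  ".join(sys.path))
        sys.exit(1)
    print("mechanize and BeautifulSoup are importable")
```

If something is reported missing, the usual culprit is a PYTHONPATH that does not include the user site-packages folder.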

Handle Savannah Requests

Note: The following directions assume that you are logged in to cmssrv95, or that you have set up your environment as described in the previous section.

  • Set up the environment to run the scripts:
       source /data/admin/wmagent/env.sh
       source /data/srv/wmagent/current/apps/wmagent/etc/profile.d/init.sh 
       cd ~/storeResults/
       
  • Open an interactive Python console:
     python
     Python 2.6.8 (unknown, Nov 20 2013, 13:07:46)
     [GCC 4.6.1] on linux2
     Type "help", "copyright", "credits" or "license" for more information.
     >>>
  • Type the following instructions in the Python console, replacing the %USER and %PASSWORD strings with your Savannah credentials (keep the quotes):
       from RequestQuery import RequestQuery 
       rq = RequestQuery({'SavannahUser':'%USER','SavannahPasswd':'%PASSWORD','ComponentDir':'/home/cmsdataops/storeResults/Tickets'}) 
       report = rq.getRequests(resolution_id='1',task_status='0')
       
  • Keep the Python interpreter open: 'report' is used in the next section.
  • The output looks like:
       >>> report=rq.getRequests()
       Processing ticket: 49730
       I tried to Close ticket 49730, dataset is not at DBS url
       Processing ticket: 50360
       ...
       Processing ticket: 50286
         Savannah Ticket     Status  json                         Assigned to  local DBS                                              Sites                                           se_names
       -------------------- ---------- ----- ----------------------------------- ---------- -------------------------------------------------- --------------------------------------------------
                      50360       Done     y      cms-storeresults-jets_met_hcal     phys02                                          T3_US_UMD                                   hepcms-0.umd.edu
                      50286       Done     y                cms-storeresults-ewk     phys02                                         T1_US_FNAL                                    cmssrm.fnal.gov
       
  • The report contains the following information:
    • Savannah Ticket: the ticket's identification number in Savannah
    • Status: ticket status (Done means that it was approved by the physics group)
    • json: 'y' if the JSON file for the ticket was created, 'n' otherwise
    • Assigned to: the physics group the StoreResults ticket is assigned to
    • Local DBS: the DBS instance the dataset originally comes from
    • Sites: the site where the input dataset was created (matched from se_names). This is useful if you assign the workflow manually in ReqMgr.
    • se_names: the storage element holding the given dataset
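If you want to post-process the report table (for example, to pick out tickets whose se_names is cmssrm.fnal.gov), its rows can be parsed from the printed text. This is a sketch that assumes each data row has exactly the seven whitespace-separated columns seen in the sample output above; a row listing several sites would need a smarter parser. `parse_report` is a hypothetical helper, not part of RequestQuery.py.

```python
REPORT_COLUMNS = ("ticket", "status", "json", "assigned_to", "local_dbs", "sites", "se_names")


def parse_report(text):
    """Parse the table printed by rq.getRequests() into a list of dicts.

    Assumes each data row starts with a numeric ticket id and has exactly
    seven whitespace-separated columns, as in the sample output.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(REPORT_COLUMNS) or not fields[0].isdigit():
            continue  # skip headers, separator lines and progress messages
        row = dict(zip(REPORT_COLUMNS, fields))
        row["ticket"] = int(row["ticket"])
        row["json"] = row["json"] == "y"  # 'y' means the JSON file was written
        rows.append(row)
    return rows
```

Header, separator and "Processing ticket" lines are skipped because they either have a different number of fields or do not start with a ticket number.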
Notes
  • The JSON file is generated in the Tickets directory as Ticket_TICKETNUM.json.
  • approval_status option = "1" means that we only care about tickets with Done status (approved by the physics group)
  • task_status option = "1" means we only look through Open tickets
  • team option = "0" means that we process tickets for all the physics groups
  • 'ComponentDir' is the directory where the JSON files are saved; the Tickets folder from the "Before you Start" section is a reasonable default.
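Since every ticket produces a Ticket_TICKETNUM.json file in ComponentDir, the tickets that are ready for ReqMgr can be listed from the directory contents. A minimal sketch; `ticket_files` and `load_ticket` are hypothetical helpers, and the JSON content is whatever RequestQuery.py wrote.

```python
import json
import os
import re

TICKET_FILE_RE = re.compile(r"^Ticket_(\d+)\.json$")


def ticket_files(component_dir, names=None):
    """Map ticket number -> path for every Ticket_TICKETNUM.json in component_dir.

    names lets you pass a directory listing explicitly (handy for testing);
    by default the directory is listed with os.listdir.
    """
    if names is None:
        names = os.listdir(component_dir)
    found = {}
    for name in names:
        match = TICKET_FILE_RE.match(name)
        if match:
            found[int(match.group(1))] = os.path.join(component_dir, name)
    return found


def load_ticket(component_dir, ticket):
    """Load the JSON generated for one ticket (its content is whatever RequestQuery wrote)."""
    with open(os.path.join(component_dir, "Ticket_%d.json" % ticket)) as handle:
        return json.load(handle)
```

For example, `ticket_files('/home/cmsdataops/storeResults/Tickets')` returns a dictionary keyed by ticket number.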

Migrate Datasets to Global DBS

  • You need the same 'report' from the previous section. If you closed the Python console, run the previous steps again.
  • In the same python console:
       from MigrationToGlobal import MigrationToGlobal 
       m = MigrationToGlobal() 
       m.Migrates(report)
       
  • Wait until the migrations are done. The output looks like:
       >>> m.Migrates(report)
       Migrate: from url https://cmsweb.cern.ch/dbs/prod/phys02/DBSReader dataset: /QCD_Pt_100_2019_14TeV_612_SLHC6_patch1/mmarionn-QCD_Pt_100_2019_14TeV_612_SLHC6_patch1-937aad1f45adb1601c9c68355e9f2630/USER
       Migration submitted: Request 12
       Migrate: from url https://cmsweb.cern.ch/dbs/prod/phys02/DBSReader dataset: /SingleMu/ajkumar-SQWaT_PAT_53X_2012C-PromptReco-v-v2_pt3-3e4086321697e2c39c90dad08848274b/USER
       Migration submitted: Request 19
       Timer started, timeout = 600 seconds
       ...
        All migration requests are done
            Savannah Ticket    Migration id          Migration Status                                                                                                                                                Dataset
       -------------------- --------------- ------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------
                      50360              12                successful                          /QCD_Pt_100_2019_14TeV_612_SLHC6_patch1/mmarionn-QCD_Pt_100_2019_14TeV_612_SLHC6_patch1-937aad1f45adb1601c9c68355e9f2630/USER
                      50286              19                successful                                                        /SingleMu/ajkumar-SQWaT_PAT_53X_2012C-PromptReco-v-v2_pt3-3e4086321697e2c39c90dad08848274b/USER
       
  • Once the migrations are successful, you can submit the requests to ReqMgr.
  • Note: The script waits a fixed time for all migrations to complete (the sample output above shows a 600-second timeout). This is usually enough to finish several migrations; if some of them take longer, run the script again to check their status (it submits each migration request only once).
  • You can increase the script's waiting time if needed.
  • All failed migration requests are deleted from the migration service.
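The waiting behaviour described in these notes is a poll-until-done loop with a deadline. The sketch below shows that pattern; `get_status` is a hypothetical callable standing in for however you query DBS migration status, and the final states "successful" and "failed" are assumptions ("successful" is the value seen in the sample output). This is not the code of MigrationToGlobal.py.

```python
import time


def wait_for_migrations(get_status, request_ids, timeout=600, interval=30,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll migration requests until all reach a final state or the timeout expires.

    get_status: hypothetical callable(request_id) -> status string.
    Returns a dict request_id -> last status seen; requests still pending at
    the deadline keep their last non-final status so you can re-run later.
    """
    final_states = ("successful", "failed")  # assumed terminal states
    deadline = clock() + timeout
    statuses = {}
    while True:
        for rid in request_ids:
            if statuses.get(rid) not in final_states:
                statuses[rid] = get_status(rid)  # only re-query unfinished requests
        if all(state in final_states for state in statuses.values()):
            return statuses
        if clock() >= deadline:
            return statuses
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable; raising `timeout` corresponds to increasing the script's waiting time.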

Submit requests to ReqMgr

  • Assign the request only if the migration was successful.
  • Use reqmgr.py and the JSON file created in the previous steps.
  • Create the request:
     reqmgr.py -u https://cmsweb.cern.ch -f ./storeResults/Tickets/Ticket_51000.json -j '{"createRequest" : {"RequestString" : "StoreResults_51000_v1","Requestor":"YOUR_USER", "ProcessingVersion":1}}' -i
  • Assign the request using ReqMgr to:
    • team: step0
    • sites: the sites where the user dataset is located (see the T3_US_FNALLPC note below).
    • Always check "Trust site list" to allow xrootd.
    • If (and only if) the dataset is at a T2 or T3, you can choose "Non-custodial subscription" to the given site.
Figure: example ReqMgr assignment (AssignExample.jpg)
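When several tickets are handled at once, the create command shown above can be generated instead of typed. A minimal sketch, assuming that a retry of a ticket bumps both the _vN suffix of RequestString and ProcessingVersion together (the page only says to increase the version number); `create_request_json` and `reqmgr_command` are hypothetical helpers.

```python
import json


def create_request_json(ticket, requestor, version=1):
    """Build the -j override passed to reqmgr.py for one StoreResults ticket."""
    body = {"createRequest": {
        "RequestString": "StoreResults_%d_v%d" % (ticket, version),
        "Requestor": requestor,
        "ProcessingVersion": version,  # assumption: kept in step with the _vN suffix
    }}
    return json.dumps(body)


def reqmgr_command(ticket, requestor, version=1, url="https://cmsweb.cern.ch",
                   ticket_dir="./storeResults/Tickets"):
    """Return the reqmgr.py argument list; pass it to subprocess without a shell."""
    return ["reqmgr.py", "-u", url,
            "-f", "%s/Ticket_%d.json" % (ticket_dir, ticket),
            "-j", create_request_json(ticket, requestor, version),
            "-i"]
```

Passing the list to `subprocess.check_call` avoids shell quoting problems with the embedded JSON; on cmssrv95 you would pass the local ReqMgr URL as `url`.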

Notes:

  • If you use cmssrv95, point to the local ReqMgr URL http://131.225.206.91:8687 when creating the request.
  • Also, if you use cmssrv95, assign to the cmsdataops team.
  • Increase the version number when assigning retries for the same ticket.
  • If you have not injected a JSON file into ReqMgr before, read section 3 at: https://github.com/dmwm/WMCore/wiki/All-in-one-test
  • If the input data is at T3_US_FNALLPC (cmseos.fnal.gov), assign to T1_US_FNAL and enable the xrootd option. The jobs will run at the Tier-1, not at the Tier-3 (cmseos.fnal.gov cannot run jobs).
  • PhEDEx subscriptions: for T1 sites, data should only be subscribed to Disk, never to Tape, because users have no Tape quota. Therefore, always pick "Non-Custodial Sites". The other parameters can be left at their defaults (Subscription Priority = Low, Custodial Subscription Type = Move). This also creates a problem when subscribing to T1_XX_Site_Disk: the agent cannot automatically subscribe to Disk only, so for T1 sites the subscription request must be made manually (this problem will be solved soon).
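The assignment rules in the list and notes above can be summarised as a small decision function. This is a sketch of the rules as written on this page, not ReqMgr's logic: the returned keys are hypothetical labels rather than real form field names, and applying the non-custodial rule to the site that holds the input data is an interpretation of "subscription to the given site".

```python
def assignment_settings(input_site):
    """Sketch of the assignment choices from the notes above.

    input_site: CMS site name holding the user dataset, e.g. "T3_US_UMD".
    The returned keys are hypothetical labels, not ReqMgr form field names.
    """
    run_site = input_site
    if input_site == "T3_US_FNALLPC":
        # cmseos.fnal.gov cannot run jobs: run at the Tier-1 and read via xrootd
        run_site = "T1_US_FNAL"
    tier = input_site.split("_", 1)[0]
    settings = {
        "site_whitelist": [run_site],
        "trust_site_list": True,           # always checked, so jobs can use xrootd
        "non_custodial_site": None,        # chosen at assignment time (T2/T3 only)
        "manual_disk_subscription": None,  # requested by hand for T1 sites
    }
    if tier in ("T2", "T3"):
        settings["non_custodial_site"] = input_site
    elif tier == "T1":
        # Never subscribe to Tape; the agent cannot target the _Disk node itself
        settings["manual_disk_subscription"] = input_site + "_Disk"
    return settings
```

For example, a dataset at T1_US_FNAL yields a manual subscription to T1_US_FNAL_Disk, matching the T1_XX_Site_Disk pattern mentioned above.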

Local ReqMgr and WMStats links (cmssrv95.fnal.gov)

Closeout script

  • If you ran the workflow in the central ReqMgr, you should use the regular procedure to close out workflows.
  • If you are running workflows on cmssrv95, please run the close out script available at:

   cd ~/storeResults/scripts/
   python closeOutStoreResults.py
   

  • This script is modified to look into the local ReqMgr and WMStats.

Announcing

The workflow team is responsible for announcing StoreResults workflows, so once the workflows are closed out:

  • Set the output dataset status to VALID:
       python DBS3SetDatasetStatus.py -d $DATASET -s VALID -r False
       
  • Set the workflow to "announced", you can do that manually with request manager or with announceWorkflows.py
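When several workflows are announced together, the DBS3SetDatasetStatus.py call above has to be repeated for each output dataset. The sketch below only builds and prints those commands (a dry run); it assumes the script is in the current directory and takes exactly the flags shown above. `set_valid_commands` is a hypothetical helper.

```python
import shlex
import sys


def set_valid_commands(datasets):
    """Build one DBS3SetDatasetStatus.py argument list per output dataset."""
    return [["python", "DBS3SetDatasetStatus.py", "-d", ds, "-s", "VALID", "-r", "False"]
            for ds in datasets]


if __name__ == "__main__":
    # Dry run: print the commands for the datasets given on the command line.
    # Replace print with subprocess.check_call(cmd) once the list looks right.
    for cmd in set_valid_commands(sys.argv[1:]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing first makes it easy to double-check the dataset names before any status is changed in DBS.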

Closing tickets

  • If everything with the workflow is OK, once it is completed reply to the ticket stating the elevated dataset and the site where the dataset is subscribed (PhEDEx). Example: https://savannah.cern.ch/task/?50539
  • If there is a problem, reply to the ticket with a brief explanation of the problem and possible solutions. Example: https://savannah.cern.ch/task/?51219

Transfer files from cmssrm.fnal.gov to cmssrmdisk.fnal.gov

NOTE: Only Luis (at FNAL) can do this; ask him if you need to transfer files from Tape to Disk (for old datasets).

/store/user data at cmssrm.fnal.gov is not available for the agent to read. Data has to be copied to cmssrmdisk.fnal.gov before the workflow can run. First, you have to create a list of files from the dataset.

  • Create a directory to save the lists in, and go there, e.g.:

 
mkdir /tmp/fileListsFNAL
cd /tmp/fileListsFNAL

  • Create the lists by doing:
source ~/WmAgentScripts/setenvscript.sh
python ~/WmAgentScripts/StoreResults/transferFiles_FNAL.py [dbslocal] [ticket] [dataset]

You can find dbslocal, ticket and dataset in the reports printed by RequestQuery and MigrationToGlobal.

Now copy the lists to a FNAL LPC machine (e.g. cmslpc42) and log in to that machine:

 ssh cmslpc42.fnal.gov 

Then do:

curl https://raw.githubusercontent.com/CMSCompOps/WmAgentScripts/master/StoreResults/ftsuser-transfer-submit-list > ftsuser-transfer-submit-list
chmod +x ftsuser-transfer-submit-list
source /uscmst1/prod/grid/gLite_SL5.csh
voms-proxy-init -voms cms:/cms/Role=production -valid 192:00
ftsuser-delegation
./ftsuser-transfer-submit-list FROMDCACHE ~/lists/<Number of the list>.txt

The output looks like:

[luis89@cmslpc42 ~]$ ./ftsuser-transfer-submit-list-luis FROMDCACHE ~luis89/lists/50543.txt 
Using proxy at /tmp/x509up_u48313


  "cred_id": "4ecc44f421bfa9b5", 
  "job_id": "7a6c044a-e03c-11e3-af2d-782bcb2fb7b4", 
  "job_state": "STAGING", 
  "user_dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lcontrer/CN=752434/CN=Luis Carlos Contreras Pasuy", 
  "vo_name": "cms", 
  "voms_cred": "/cms/Role=production/Capability=NULL /cms/Role=NULL/Capability=NULL /cms/uscms/Role=NULL/Capability=NULL"
  "cred_id": "4ecc44f421bfa9b5", 
  "job_id": "7bf02936-e03c-11e3-a8ea-782bcb2fb7b4", 
  "job_state": "STAGING", 
  "user_dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lcontrer/CN=752434/CN=Luis Carlos Contreras Pasuy", 
  "vo_name": "cms", 
  "voms_cred": "/cms/Role=production/Capability=NULL /cms/Role=NULL/Capability=NULL /cms/uscms/Role=NULL/Capability=NULL"


Ongoing transfers can be monitored through the web interface:
    https://cmsfts3-users.fnal.gov:8449/fts3/ftsmon/

To track the transfers, use the "job_id" values in the monitor web interface (https://cmsfts3-users.fnal.gov:8449/fts3/ftsmon/#/).
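Because the submit script can print many job blocks, it may be convenient to extract just the job ids from its output for use in the monitor. A minimal sketch based on the `"job_id": "..."` lines in the sample output above; `fts_job_ids` is a hypothetical helper and the regular expression assumes the ids look like the UUIDs shown.

```python
import re
import sys

# Matches lines such as:  "job_id": "7a6c044a-e03c-11e3-af2d-782bcb2fb7b4",
JOB_ID_RE = re.compile(r'"job_id"\s*:\s*"([0-9A-Fa-f-]+)"')


def fts_job_ids(output):
    """Return the FTS job ids, in submission order, found in the submit script's output."""
    return JOB_ID_RE.findall(output)


if __name__ == "__main__":
    # Usage sketch: ./ftsuser-transfer-submit-list FROMDCACHE list.txt | python fts_job_ids.py
    for job_id in fts_job_ids(sys.stdin.read()):
        print(job_id)
```

Keeping the ids in a file next to the list makes it easier to check the transfers later and to match them with the files to be removed afterwards.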

Note: Documentation about these transfers can be found at tinyurl.com/ftsuser

  • When the workflow is done, the data on cmssrmdisk.fnal.gov has to be removed. These transfers are only temporary, so send an email to FNAL site support with the list of files to be removed (the same list you used in the previous step).

StoreResults Validation

The standard procedure to validate StoreResults is:

  • Download the sample JSON file to be injected into ReqMgr from this link.
  • Inject the JSON file into ReqMgr (please read section 3 first: https://github.com/dmwm/WMCore/wiki/All-in-one-test), changing the ReqMgr URL to the validation one, e.g. https://cmsweb-testbed.cern.ch/reqmgr
  • Assign the workflow:
    • you may set the whitelist to T1_US_FNAL, but it is not compulsory, the agent will find where the data is located
    • PLEASE change the ProcessingString from Summer12_DR53X_PU_S10_START53_V7A_v1_TLBSM_53x_v3_99bd99199697666ff01397dad5652e9e to ValidationTest_Summer12_DR53X_PU_S10_START53_V7A_v1_TLBSM_53x_v3
    • You may subscribe the data wherever you want, but be careful: StoreResults output should never be subscribed to Tape, because users do not have Tape quota.
  • Depending on the merge parameters you choose, a different number of jobs will be created (~22 with the defaults). The workload should consist essentially of merge jobs plus the corresponding LogCollect and Cleanup jobs.
  • Jobs run fast (if FNAL has available slots, they should complete in a couple of hours at most).
  • All jobs should succeed, and the non-custodial subscription should be checked. Also check that the output dataset is uploaded to global DBS (DBS3). This is very important: StoreResults workflows are supposed to read from the local DBS url and upload the output to global DBS. If all of this is fine, the StoreResults workflow is OK.
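The validation criteria above can be written down as a checklist function, which makes it harder to forget one of them. This is a hypothetical sketch: the `(task_type, state)` job summary, the type names "Merge", "LogCollect" and "Cleanup", and the state "success" are assumptions about how you would transcribe the results from WMStats; the DBS and subscription checks are booleans you fill in after looking them up.

```python
from collections import Counter

EXPECTED_TASK_TYPES = {"Merge", "LogCollect", "Cleanup"}


def validation_problems(jobs, output_in_global_dbs, subscription_ok):
    """Return a list of failed checks for a StoreResults validation run.

    jobs: iterable of (task_type, state) pairs, a hypothetical summary you
    would compile from WMStats; type and state names are assumptions.
    """
    problems = []
    counts = Counter()
    for task_type, state in jobs:
        counts[task_type] += 1
        if task_type not in EXPECTED_TASK_TYPES:
            problems.append("unexpected job type: %s" % task_type)
        if state != "success":
            problems.append("%s job in state %s" % (task_type, state))
    if counts["Merge"] == 0:
        problems.append("no merge jobs ran")
    if not subscription_ok:
        problems.append("non-custodial subscription missing")
    if not output_in_global_dbs:
        problems.append("output dataset not in global DBS")
    return problems
```

An empty result corresponds to "the StoreResults workflow is OK" in the list above.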
Topic attachments
Attachment              Rev  Size      Date               Who
AssignExample.jpg       r1   547.9 K   2014-06-04 22:17   LuisContreras
store_results1.png      r1   225.4 K   2014-08-27 14:41   JulianBadillo
store_results2.png      r1   89.5 K    2014-08-27 14:42   JulianBadillo
storeresults_flow.png   r1   55.0 K    2014-08-27 15:29   JulianBadillo
Topic revision: r18 - 2014-08-29 - JulianBadillo
 

This site is powered by the TWiki collaboration platform. Copyright © 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.