StoreResults Operations

-- LuisContreras - 28 Feb 2014 -- JulianBadillo - 27 Aug 2014

Introduction

StoreResults workflows are essentially merge workloads. The purpose of this service is to elevate a user dataset to global DBS and PhEDEx, which makes the dataset available for transfer to any site on the CMS grid. Requests were originally handled through Savannah (now retired) and currently arrive as GGUS tickets; the service will be migrated to ReqMgr 2 soon.

A StoreResults workflow consists of the following steps:

  1. Get a ticket on GGUS.
  2. Migrate the dataset from a local DBS to global DBS.
  3. Create and assign a StoreResults workflow in ReqMgr.
  4. Wait for the workflow to complete.
  5. Announce the workflow.
(Figure: storeresults_flow, the StoreResults workflow steps)

StoreResults workflows read data from a local DBS instance (dbsUrl). This causes a problem when the output is uploaded to global DBS: the parent dataset cannot be found there. To avoid parentage problems, the input dataset must be migrated to global DBS before the workflow runs; this is done with the DBS Migration service. Input datasets are commonly located at T3s, T2s and EOS. To elevate datasets at FNAL, it might be necessary to move the data from cmssrm.fnal.gov to cmssrmdisk.fnal.gov so that the agent can read the files (if the user dataset was produced before the disk/tape separation).

GGUS ticket

StoreResults users send requests through GGUS (https://ggus.eu/?mode=ticket_cms). The information the ticket should contain is described in Creating a Store Results Request.

Note: To filter GGUS tickets, use the "search" option in GGUS and look for tickets assigned to the "CMS Workflows" Support Unit.

Each ticket contains the information needed to create a workflow:

  • user dataset
  • local dbs url
  • CMSSW release
  • Physics group

See example ticket 110773.
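The four fields above are what the later steps consume. As a minimal sketch (the dict keys and the helper name are hypothetical, not taken from any existing script), the following Python checks that a ticket provides all of them and that the dataset name has the usual /PRIMARY/PROCESSED/TIER shape with a USER tier, as in the example ticket:

```python
import re

# Hypothetical keys for the four values copied from the GGUS ticket.
REQUIRED_FIELDS = ("dataset", "dbs_url", "cmssw_release", "group")

# A CMS dataset name has three non-empty parts: /PRIMARY/PROCESSED/TIER.
DATASET_RE = re.compile(r"^/[^/]+/[^/]+/[^/]+$")


def check_ticket_fields(fields):
    """Return a list of problems in the ticket fields; an empty list means OK."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not fields.get(name):
            problems.append("missing field: " + name)
    dataset = fields.get("dataset", "")
    if dataset and not DATASET_RE.match(dataset):
        problems.append("dataset is not /PRIMARY/PROCESSED/TIER")
    elif dataset and not dataset.endswith("/USER"):
        problems.append("dataset tier is not USER")
    release = fields.get("cmssw_release", "")
    if release and not release.startswith("CMSSW_"):
        problems.append("release does not start with CMSSW_")
    return problems


# Values from example ticket 110773.
ticket_110773 = {
    "dataset": "/TT_scaleup_CT10_TuneZ2star_8TeV-powheg-tauola/"
               "jpilot-Summer12-START53_V7C_FSIM-v1_TLBSM_53x_v3-3eec1c547e1536755bef831bcbf18d7a/USER",
    "dbs_url": "phys03",
    "cmssw_release": "CMSSW_5_3_8_patch3",
    "group": "B2G",
}
print(check_ticket_fields(ticket_110773))  # → []
```

An empty list only means the fields are present and well formed; whether the dataset actually exists in the given local DBS is checked later by the migration.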

You can manually create a request with reqmgr.py.

Before you Start

  • Install the mechanize library:
    • Unless you have root or sudo access to an agent machine, install these libraries in your local user directory.
    • Download the mechanize source code from http://wwwsearch.sourceforge.net/mechanize/download.html
    • Untar the library and install it with the --user option, which installs it in your local user folder:
       tar -xf mechanize-0.2.5.tar.gz
       cd mechanize-0.2.5
       python setup.py install --user
  • Install BeautifulSoup following the same steps.
  • You will also need the WMAgent Python environment and a valid VOMS proxy (see Creating a Proxy).
  • Add the installed libraries (from your user folder) to the Python path:
     export PYTHONPATH=$PYTHONPATH:~/.local/lib/python2.6/site-packages/
  • Optional: you may create a separate storeResults folder to keep scripts and ticket JSON files:
     
       mkdir /data/srv/wmagent/storeResults
       mkdir /data/srv/wmagent/storeResults/Tickets
       
  • Get the following scripts from https://github.com/CMSCompOps/WmAgentScripts/tree/master/StoreResults:
    • MigrationToGlobal.py: Migrates the dataset in DBS
    • RequestQuery.py: Creates the json file with the workflow information.
    • createStoreResults.py: Wraps both scripts and runs them.
  • Note: MigrationToGlobal.py and RequestQuery.py are shared centrally in the WMCore repository (WMCore GitHub), but we keep a local copy in WmAgentScripts.
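Before running the scripts, it can help to confirm that the libraries installed with --user are importable. This is a minimal sketch for a current Python 3 using only the standard library; it assumes the import names are mechanize and BeautifulSoup (the module name of BeautifulSoup 3; BeautifulSoup 4 is imported as bs4), which depends on the versions you installed.

```python
import importlib
import site
import sys


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    # Ensure the per-user site-packages (where --user installs go) is searched.
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        sys.path.append(user_site)
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("user site-packages:", site.getusersitepackages())
    print("missing:", missing_modules(["mechanize", "BeautifulSoup"]))
```

If a module is reported missing, check that the printed user site-packages directory is the one you added to PYTHONPATH.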

Handle Savannah Requests - DEPRECATED

  • NOTE: These instructions no longer apply, since Savannah has been retired.
  • Set up the environment to run the scripts:
       source /data/admin/wmagent/env.sh
       source /data/srv/wmagent/current/apps/wmagent/etc/profile.d/init.sh 
       cd ~/storeResults/
       
  • Open a python interactive console
     python
     Python 2.6.8 (unknown, Nov 20 2013, 13:07:46)
     [GCC 4.6.1] on linux2
     Type "help", "copyright", "credits" or "license" for more information.
     >>>
  • Type the following instructions in the Python console, replacing the example values with those from your ticket:
       from RequestQuery import RequestQuery
       ticket = 110773
       input_dataset = '/TT_scaleup_CT10_TuneZ2star_8TeV-powheg-tauola/jpilot-Summer12-START53_V7C_FSIM-v1_TLBSM_53x_v3-3eec1c547e1536755bef831bcbf18d7a/USER'
       dbs_url = 'phys03'
       cmssw_release = 'CMSSW_5_3_8_patch3'
       group_name = 'B2G'
       rq = RequestQuery({'ComponentDir':'/home/cmsdataops/storeResults/Tickets'}) 
       report = rq.createRequestJSON(ticket, input_dataset, dbs_url, cmssw_release, group_name)
       #You can also print the report
       rq.printReport(report)
       
  • 'report' is going to be used in the next section so don't quit the python interpreter.
  • The output of this looks like:
       >>> report = rq.createRequestJSON(ticket, input_dataset, dbs_url, cmssw_release, group_name)
    Processing ticket: 110773
                  Ticket  json  local DBS                                              Sites                                           se_names
    -------------------- ----- ---------- -------------------------------------------------- --------------------------------------------------
                  110773     y     phys03                                                                                         T3_US_FNALLPC
    
       
  • This contains the following information:
    • Ticket: the identification number of the ticket from GGUS
    • json: If the json file for the given ticket was created or not
    • Local DBS: the origin DBS location of the dataset
    • Sites: site where the input dataset was created (matched from se_names). This is useful if you assign the workflow manually in ReqMgr.
    • se_names: the storage element for the given dataset

Notes

  • The JSON file is generated in the Tickets directory as Ticket_TICKETNUM.json.
  • 'ComponentDir' is the location where the JSON files are saved. If you do not have one yet, you may use the Tickets folder from "Before you Start" as the default.
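To reuse the generated file (for example with reqmgr.py -f), its path follows directly from the naming rule above. A minimal sketch, assuming only the Ticket_TICKETNUM.json naming; the helper names are hypothetical and the JSON contents are returned as written by RequestQuery, without interpretation:

```python
import json
import os


def ticket_json_path(component_dir, ticket):
    """Path of the JSON file RequestQuery writes for a ticket: Ticket_<num>.json."""
    return os.path.join(component_dir, "Ticket_%s.json" % ticket)


def load_ticket_json(component_dir, ticket):
    """Load the ticket JSON, failing clearly if it was never generated."""
    path = ticket_json_path(component_dir, ticket)
    if not os.path.isfile(path):
        # The report's "json" column should read "y" when the file was created.
        raise IOError("no JSON for ticket %s at %s" % (ticket, path))
    with open(path) as handle:
        return json.load(handle)
```

For example, ticket_json_path('/home/cmsdataops/storeResults/Tickets', 110773) gives the file used in the reqmgr.py step.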

Migrate Datasets to Global DBS

  • Run createStoreResults.py with the following arguments, which should be given in the ticket:
    python createStoreResults.py TICKET DATASET DBS_URL CMSSW_RELEASE GROUP_NAME
    TICKET: The ticket number in GGUS. It can be any number; it is used only for tracking.
    DATASET: The input dataset. It has to be located at the same Tier-2 that will finally hold the group data in /store/results/.
    DBS_URL: The local DBS instance, for example "phys01" for https://cmsweb.cern.ch/dbs/prod/phys01/DBSReader.
    CMSSW_RELEASE: The release to use for the merge step. In general, always use the release used to produce the dataset; if that release is outdated, check which is the closest available release.
    GROUP_NAME: The physics group (HIN, HIG, SUS, etc.) requesting the migration. It determines the subdirectory below /store/results/ and sets the appropriate PhEDEx accounting group tag.
       
  • Wait until the migrations are done. The output of this looks like:
    Migrate: from url https://cmsweb.cern.ch/dbs/prod/phys03/DBSReader dataset: /TT_scaleup_CT10_TuneZ2star_8TeV-powheg-tauola/jpilot-Summer12-START53_V7C_FSIM-v1_TLBSM_53x_v3-3eec1c547e1536755bef831bcbf18d7a/USER
    Migration submitted: Request 105003
    Timer started, timeout = 600 seconds
    Querying migrations status...
    Migration to global succeed: /TT_scaleup_CT10_TuneZ2star_8TeV-powheg-tauola/jpilot-Summer12-START53_V7C_FSIM-v1_TLBSM_53x_v3-3eec1c547e1536755bef831bcbf18d7a/USER
    All migration requests are done
         Savannah Ticket    Migration id          Migration Status                                                                                                                                                Dataset
    -------------------- --------------- ------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------
                  110773          105003                successful                  /TT_scaleup_CT10_TuneZ2star_8TeV-powheg-tauola/jpilot-Summer12-START53_V7C_FSIM-v1_TLBSM_53x_v3-3eec1c547e1536755bef831bcbf18d7a/USER
       
  • Once the migrations are successful, the script will submit requests to ReqMgr.
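The short DBS_URL name maps onto the reader URL pattern given above (phys01 for https://cmsweb.cern.ch/dbs/prod/phys01/DBSReader), which is also the URL printed in the "Migrate: from url" line. A minimal sketch with hypothetical helper names: it expands the short name and assembles the createStoreResults.py command in the argument order listed above, quoting each argument for the shell. It only builds strings and does not run the script.

```python
import shlex

DBS_READER_TEMPLATE = "https://cmsweb.cern.ch/dbs/prod/%s/DBSReader"


def dbs_reader_url(dbs_name):
    """Expand a short local DBS name (e.g. 'phys03') into its DBSReader URL."""
    if dbs_name.startswith("https://"):
        return dbs_name  # already a full URL
    return DBS_READER_TEMPLATE % dbs_name


def store_results_command(ticket, dataset, dbs_url, cmssw_release, group_name):
    """Build the createStoreResults.py command line as a shell-safe string."""
    args = ["python", "createStoreResults.py", str(ticket), dataset,
            dbs_url, cmssw_release, group_name]
    return " ".join(shlex.quote(arg) for arg in args)


print(dbs_reader_url("phys03"))  # → https://cmsweb.cern.ch/dbs/prod/phys03/DBSReader
```

The short name, not the full URL, is what goes on the command line, matching the DBS_URL description above.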

Some Notes and Remarks!

  • The script waits a fixed time for all migrations to complete (the sample output above shows a 600-second timeout). This is usually enough to finish several migrations; if some of them take longer, run the script again to check their status (it submits each migration request only once).
    • You can increase the waiting time of the script if needed.
    • All failed migration requests are deleted from the migration service.
  • If you get an error message like:
    Error on ticket 1111111 due to ScramArch mismatch
    you are using an outdated version of CMSSW; check which is the closest available version and use it.
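The wait-and-recheck behaviour described above is a polling loop with a timeout. A minimal sketch, not the script's actual code: get_status is a hypothetical callable standing in for whatever queries the DBS Migration service, "successful" is taken from the report above, and "failed" is an assumed terminal state. The clock and sleep functions are parameters so the loop can be tested without waiting.

```python
import time


def wait_for_migrations(request_ids, get_status, timeout=600, interval=30,
                        done_states=("successful", "failed"),
                        clock=time.monotonic, sleep=time.sleep):
    """Poll get_status(request_id) until every request is in a terminal state
    or the timeout expires; return {request_id: last status seen}."""
    deadline = clock() + timeout
    pending = list(request_ids)
    status = {}
    while True:
        for request_id in list(pending):
            status[request_id] = get_status(request_id)
            if status[request_id] in done_states:
                pending.remove(request_id)
        if not pending or clock() >= deadline:
            return status
        sleep(interval)
```

Requests still in a non-terminal state when the function returns are the ones to recheck by running the script again; raising timeout has the same effect as increasing the script's waiting time.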

Submit requests to ReqMgr

  • Assign the request only if the migration was successful.
  • Use reqmgr.py and the JSON file created in the previous steps.
  • Create the request (quote the -j argument so the shell passes the JSON unchanged):
    python WmAgentScripts/reqmgr.py -u https://cmsweb.cern.ch -f ./storeResults/Tickets/Ticket_110773.json -j '{"createRequest":{"RequestString":"StoreResults_110773_v1","Requestor":"jbadillo","ProcessingVersion":1}}' --createRequest
  • You should see an output like this:
    Processing command line arguments: '['WmAgentScripts/reqmgr.py', '-u', 'https://cmsweb.cern.ch', '-f', './storeResults/Tickets/Ticket_110773.json', '-j', '{"createRequest":{"RequestString":"StoreResults_110773_v1","Requestor":"jbadillo","ProcessingVersion":1}}', '--createRequest']' ...
    ....
    INFO:root:Loading file './storeResults/Tickets/Ticket_110773.json' ...
    ....
    INFO:root:Create request 'jbadillo_StoreResults_110773_v1_150121_121546_1955' succeeded.
    INFO:root:Approving request 'jbadillo_StoreResults_110773_v1_150121_121546_1955' ...
    INFO:root:Request: PUT /reqmgr/reqMgr/request ...
    INFO:root:Approve succeeded.
       
  • Take note of the name of the workflow created.
  • Assign the workflow using ReqMgr to:
    • team: step0
    • sites: the sites where the user dataset is located (for T3_US_FNALLPC, see the notes below).
    • Always check "Trust site list" to allow xrootd.
    • If (and only if) the dataset is at a T2 or T3, you can choose "Non-custodial subscription" to the given site.
(Figure: AssignExample.jpg, example of assigning the workflow in ReqMgr)
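The -j argument must reach reqmgr.py as a single JSON string, and the workflow name to note down appears in the "Create request '...' succeeded." log line. A minimal sketch with hypothetical helper names: it builds the override JSON with the keys from the example command (RequestString, Requestor, ProcessingVersion) and extracts the created name from reqmgr.py output. It does not call ReqMgr.

```python
import json
import re
import shlex


def create_request_arg(ticket, requestor, version=1):
    """JSON override for reqmgr.py -j, following the example command above."""
    payload = {"createRequest": {
        "RequestString": "StoreResults_%s_v%d" % (ticket, version),
        "Requestor": requestor,
        "ProcessingVersion": version,
    }}
    # Compact separators reproduce the exact string used in the example.
    return json.dumps(payload, separators=(",", ":"))


CREATED_RE = re.compile(r"Create request '([^']+)' succeeded")


def created_request_name(log_text):
    """Return the workflow name from reqmgr.py output, or None if absent."""
    match = CREATED_RE.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    arg = create_request_arg(110773, "jbadillo")
    print("python WmAgentScripts/reqmgr.py -u https://cmsweb.cern.ch "
          "-f ./storeResults/Tickets/Ticket_110773.json -j %s --createRequest"
          % shlex.quote(arg))
```

shlex.quote wraps the JSON in single quotes, which is what keeps its double quotes intact when the command is pasted into a shell.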

Notes:

  • Increase the version number when assigning retries for the same ticket.
  • If you have not injected a JSON file into ReqMgr before, read section 3 at: https://github.com/dmwm/WMCore/wiki/All-in-one-test
  • If the input data is at T3_US_FNALLPC (cmseos.fnal.gov), assign to T1_US_FNAL and enable the xrootd option. The jobs will run at the Tier-1, not at the Tier-3 (cmseos.fnal.gov cannot run jobs).
  • PhEDEx subscriptions: data should only be subscribed to Disk, never to Tape (for T1 sites), because users don't have Tape quota. Therefore, always pick "Non-Custodial Sites". The other parameters can keep their defaults (Subscription Priority = Low, Custodial Subscription Type = Move). This causes a problem when subscribing to T1_XX_Site_Disk: the agent can't subscribe automatically to Disk only, so subscription requests for T1 sites must be made manually (this problem will be solved soon).
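When a ticket needs a retry, the next version number can be derived from the names of the earlier workflows. A minimal sketch, assuming the StoreResults_TICKET_vN pattern seen in the example request name and that you collect the earlier names yourself (for example from ReqMgr); next_version is a hypothetical helper:

```python
import re

# Matches e.g. "jbadillo_StoreResults_110773_v1_150121_121546_1955".
VERSION_RE = re.compile(r"StoreResults_(\d+)_v(\d+)")


def next_version(ticket, previous_names):
    """Next ProcessingVersion for a ticket, given the names of earlier workflows."""
    versions = []
    for name in previous_names:
        match = VERSION_RE.search(name)
        if match and match.group(1) == str(ticket):
            versions.append(int(match.group(2)))
    return max(versions) + 1 if versions else 1
```

The result is the value to use both in the _vN suffix of RequestString and in ProcessingVersion.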

Announcing

The workflow team is responsible for announcing StoreResults workflows. Once the workflows are closed out:

  • Set the output dataset status to VALID:
       python DBS3SetDatasetStatus.py -d $DATASET -s VALID -r False
       
  • Set the workflow to "announced", either manually in ReqMgr or with announceWorkflows.py.
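When a closed-out workflow has several output datasets, the status call above has to be repeated for each one. A minimal sketch that builds the same DBS3SetDatasetStatus.py call per dataset, using only the -d, -s and -r options from the example and passing -r False unchanged (its meaning is not described on this page). By default it only returns the argument lists; set_datasets_valid is a hypothetical helper.

```python
import subprocess


def set_datasets_valid(datasets, dry_run=True, script="DBS3SetDatasetStatus.py"):
    """Build (and optionally run) one status call per output dataset."""
    commands = []
    for dataset in dict.fromkeys(datasets):  # drop duplicates, keep order
        cmd = ["python", script, "-d", dataset, "-s", "VALID", "-r", "False"]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing call
    return commands
```

Keeping dry_run as the default lets you review the commands before any dataset status is changed.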

Closing tickets

  • If everything with the workflow is OK, once it is completed reply to the ticket with the name of the elevated dataset (the output dataset of the workflow) and the site where the dataset is subscribed in PhEDEx, for example:
    Hi,
    The elevated dataset is:
    /TT_scaleup_CT10_TuneZ2star_8TeV-powheg-tauola/StoreResults-Summer12_START53_V7C_FSIM_v1_TLBSM_53x_v3_3eec1c547e1536755bef831bcbf18d7a-v1/USER
    A replica available to transfer within the CMS grid can be found in PhEDEx at: T1_US_FNAL_Disk
    Please check that everything is ok and let us know if there is a problem.
    Thanks,
    Workflow Team
       
  • Then change the ticket status to "solved".
  • If there is a problem, reply to the ticket with a brief explanation of the problem and possible solutions. Here is an example: https://savannah.cern.ch/task/?51219
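The reply text in the example above is a fixed template with two slots: the elevated dataset and the PhEDEx site. A minimal sketch that fills it in; the template is copied from the example and closing_reply is a hypothetical helper:

```python
# Reply text copied from the example above, with the two variable parts as slots.
REPLY_TEMPLATE = """Hi,
The elevated dataset is:
{dataset}
A replica available to transfer within the CMS grid can be found in PhEDEx at: {site}
Please check that everything is ok and let us know if there is a problem.
Thanks,
Workflow Team"""


def closing_reply(dataset, site):
    """Fill the ticket reply with the output dataset and its PhEDEx site."""
    return REPLY_TEMPLATE.format(dataset=dataset, site=site)
```

For the example ticket, the dataset is the workflow's /USER output dataset and the site is T1_US_FNAL_Disk.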

Transfer files from cmssrm.fnal.gov to cmssrmdisk.fnal.gov

NOTE: Only Luis (FNAL) can do this; ask him if you need to transfer files from Tape to Disk (for old datasets).

/store/user data at cmssrm.fnal.gov is not readable by the agent. The data has to be copied to cmssrmdisk.fnal.gov before the workflow can run. First, create a list of files from the dataset.

  • Create a directory to save the lists in and change into it, e.g.:

mkdir /tmp/fileListsFNAL
cd /tmp/fileListsFNAL

  • Create the lists:
source ~/WmAgentScripts/setenvscript.sh
python ~/WmAgentScripts/StoreResults/transferFiles_FNAL.py [dbslocal] [ticket] [dataset]

You can find dbslocal, ticket and dataset in the reports printed by RequestQuery and MigrationToGlobal.

Now copy the lists to an FNAL LPC machine (e.g. cmslpc42) and log in to that machine:

 ssh cmslpc42.fnal.gov 

Then do:

curl https://raw.githubusercontent.com/CMSCompOps/WmAgentScripts/master/StoreResults/ftsuser-transfer-submit-list > ftsuser-transfer-submit-list
chmod +x ftsuser-transfer-submit-list
source /uscmst1/prod/grid/gLite_SL5.csh
voms-proxy-init -voms cms:/cms/Role=production -valid 192:00
ftsuser-delegation
./ftsuser-transfer-submit-list FROMDCACHE ~/lists/<Number of the list>.txt

This will show an output like:

[luis89@cmslpc42 ~]$ ./ftsuser-transfer-submit-list-luis FROMDCACHE ~luis89/lists/50543.txt 
Using proxy at /tmp/x509up_u48313


  "cred_id": "4ecc44f421bfa9b5", 
  "job_id": "7a6c044a-e03c-11e3-af2d-782bcb2fb7b4", 
  "job_state": "STAGING", 
  "user_dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lcontrer/CN=752434/CN=Luis Carlos Contreras Pasuy", 
  "vo_name": "cms", 
  "voms_cred": "/cms/Role=production/Capability=NULL /cms/Role=NULL/Capability=NULL /cms/uscms/Role=NULL/Capability=NULL"

  "cred_id": "4ecc44f421bfa9b5", 
  "job_id": "7bf02936-e03c-11e3-a8ea-782bcb2fb7b4", 
  "job_state": "STAGING", 
  "user_dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lcontrer/CN=752434/CN=Luis Carlos Contreras Pasuy", 
  "vo_name": "cms", 
  "voms_cred": "/cms/Role=production/Capability=NULL /cms/Role=NULL/Capability=NULL /cms/uscms/Role=NULL/Capability=NULL"


Ongoing transfers can be monitored through the web interface:
    https://cmsfts3-users.fnal.gov:8449/fts3/ftsmon/

To track the transfers, use the "job_id" in the monitoring web interface (https://cmsfts3-users.fnal.gov:8449/fts3/ftsmon/#/).
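With many lists submitted, the job_id values are easier to collect from the saved submission output than by hand. A minimal sketch that extracts them with a regular expression matching the "job_id": "..." lines in the sample output above; fts_job_ids is a hypothetical helper:

```python
import re

# Matches lines such as:  "job_id": "7a6c044a-e03c-11e3-af2d-782bcb2fb7b4",
JOB_ID_RE = re.compile(r'"job_id":\s*"([0-9a-fA-F-]+)"')


def fts_job_ids(output):
    """Collect the job_id values printed by ftsuser-transfer-submit-list."""
    return JOB_ID_RE.findall(output)
```

For example, save the submission output to a file and pass its contents to fts_job_ids; each returned ID can then be searched in the monitoring interface.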

Note: Documentation about these transfers can be found at tinyurl.com/ftsuser

  • When the workflow is done, the data on cmssrmdisk.fnal.gov has to be removed, since these transfers are only temporary. Send an email to FNAL site support with the list of files to be removed (the same list you used in the previous step).

StoreResults Validation

The standard procedure to validate StoreResults is:

  • Download the sample JSON file to be injected into ReqMgr from this link.
  • Inject the JSON file into ReqMgr (read section 3 of https://github.com/dmwm/WMCore/wiki/All-in-one-test first), using the validation ReqMgr URL, e.g. https://cmsweb-testbed.cern.ch/reqmgr
  • Assign the workflow:
    • You may set the whitelist to T1_US_FNAL, but it is not required; the agent will find where the data is located.
    • PLEASE change the ProcessingString from Summer12_DR53X_PU_S10_START53_V7A_v1_TLBSM_53x_v3_99bd99199697666ff01397dad5652e9e to ValidationTest_Summer12_DR53X_PU_S10_START53_V7A_v1_TLBSM_53x_v3
    • You may subscribe the data wherever you want, but be careful: StoreResults output should never be subscribed to Tape (users don't have Tape quota).
  • Depending on the merge parameters you choose, the workflow will create a different number of jobs (~22 with the defaults). The workload should consist essentially of merge jobs plus the corresponding LogCollect and Cleanup jobs.
  • Jobs run fast (if FNAL has available slots, they should complete within a couple of hours).
  • All jobs should succeed, and the non-custodial subscription should be checked. Also check that the output dataset is uploaded to DBS3 (this is very important: StoreResults workflows are supposed to read from the local DBS URL and upload the output to global DBS). If all of this is fine, the StoreResults workflow is OK.
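The ProcessingString change requested in the assignment step drops the trailing 32-character hexadecimal hash and adds a ValidationTest prefix. A minimal sketch of that renaming, assuming the hash is always a lowercase 32-hex-digit suffix as in the sample; validation_processing_string is a hypothetical helper:

```python
import re

# Trailing "_<32 hex chars>" hash, as in the sample ProcessingString.
HASH_SUFFIX_RE = re.compile(r"_[0-9a-f]{32}$")


def validation_processing_string(processing_string, prefix="ValidationTest"):
    """Drop a trailing 32-character hex hash and prepend the validation prefix."""
    base = HASH_SUFFIX_RE.sub("", processing_string)
    return "%s_%s" % (prefix, base)
```

Applied to Summer12_DR53X_PU_S10_START53_V7A_v1_TLBSM_53x_v3_99bd99199697666ff01397dad5652e9e it returns ValidationTest_Summer12_DR53X_PU_S10_START53_V7A_v1_TLBSM_53x_v3; strings without a hash only get the prefix.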
Topic attachments

  • AssignExample.jpg (547.9 K, 2014-06-04, LuisContreras)
  • store_results1.png (225.4 K, 2014-08-27, JulianBadillo)
  • store_results2.png (89.5 K, 2014-08-27, JulianBadillo)
  • storeresults_flow.png (77.0 K, 2015-01-21, JulianBadillo)
Topic revision: r20 - 2015-10-09 - JulianBadillo