5.7 Data Analysis with CMS Connect


Introduction

CMS Connect is a service designed to provide a Tier3-like environment for condor analysis jobs and enables users to submit to all resources available in the CMS Global Pool. It is a complementary service to CRAB.

  • Use CRAB for e.g. large-scale dataset processing via cmsRun
  • Use CMS Connect for user-defined scripts submitted via condor for late-analysis, non-cmsRun jobs, e.g. making histograms, producing plots, analyzing trees, etc.

For users, interacting with CMS Connect is similar to interacting with a private cluster or, for example, the LPC CAF via condor. The main difference is that jobs do not run on a local pool but are sent to the Tier resources available in the CMS Global Pool; hence it does not provide a shared-filesystem infrastructure or include your CERN/FNAL AFS home area, for example.

Service Website:

CMS Connect website: https://connect.uscms.org/

Tutorials

CMS Connect Tutorial - 17 June 2016 at Fermilab, U.S.A.

Running gridpacks with CMS Connect

Documentation

Prerequisites

Just like CRAB, CMS Connect submits to the Grid (LCG), so you need to apply for a certificate and register it with the CMS VOMS server. You can follow Section 5.1 to obtain a certificate from the CERN CA and register with the CMS VO.

QuickStart

This is a quick start page which should take only a few minutes to complete. It will show how to:

  • Sign-up to the service
  • Check and setup your proxy certificates
  • Submit your jobs

Sign-up to CMS Connect

First, you will need to register to CMS Connect in order to get an account on the submission machine. Go to the registration site and follow the instructions there. Once registered, you will be authorized to use login.uscms.org (the Condor submit host), authenticating with your User ID (netid).

After approval, you will need to create and upload your public ssh-key to your account.

Create and Upload SSH Keys

Step 1: Generating SSH Keys

We will discuss how to generate an SSH key pair on both Unix-based systems and Windows.

Please note: The key pair consists of a private key and a public key. Keep the private key on machines that you have direct access to, i.e. your local machine (your laptop or desktop).

Unix-based operating systems (Linux/Mac)

On your local machine: Generate ssh-keys

mkdir ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t rsa

The last command will produce a prompt similar to

Generating public/private rsa key pair.
Enter file in which to save the key (/home/<local_user_name>/.ssh/id_rsa):

Unless you want to change the location of the key, continue by pressing enter. Now you will be asked for a passphrase. Enter a passphrase that you will be able to remember and which is secure:

Enter passphrase (empty for no passphrase):
Enter same passphrase again:

When everything has successfully completed, the output should resemble the following:

Your identification has been saved in /home/<local_user_name>/.ssh/id_rsa.
Your public key has been saved in /home/<local_user_name>/.ssh/id_rsa.pub.
The key fingerprint is:
ae:89:72:0b:85:da:5a:f4:7c:1f:c2:43:fd:c6:44:38 myname@mymac.local
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|         .       |
|        E .      |
|   .   . o       |
|  o . . S .      |
| + + o . +       |
|. + o = o +      |
| o...o * o       |
|.  oo.o .        |
+-----------------+

Windows (PuTTY)

  1. Open the PuTTYgen program.
  2. For "Type of key to generate", select SSH-2 RSA.
  3. Click the "Generate" button.
  4. Move your mouse in the area below the progress bar. When the progress bar is full, PuTTYgen generates your key pair.
  5. Type a passphrase in the "Key passphrase" field, and type the same passphrase in the "Confirm passphrase" field. You can use a key without a passphrase, but this is not recommended.
  6. Click the "Save private key" button to save the private key. Warning: you must save the private key; you will need it to connect to your machine.
  7. Right-click in the text field labeled "Public key for pasting into OpenSSH authorized_keys file" and choose "Select All". Right-click again in the same text field and choose "Copy".

Follow the instructions here to generate keys:

https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/#platform-windows

Step 2: Add the public SSH key to the login node
CMS Website

To add your public key to the Globus Online interface:

Go to http://connect.uscms.org

Go to "Profile"

Click on "Edit Profile" and add your key in the following box:

The key is now added to your profile in Globus Online. This will automatically be added to the login nodes within a couple hours.

Troubleshooting
Permission denied (publickey)

If SSH returns the error

Permission denied (publickey).

This most likely means that the permissions on your remote home directory are too open. Please execute the following on login.uscms.org:

chmod go-w ~/
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

To verify access:

$ ssh netid@login.uscms.org 
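Optionally, to make logging in more convenient, you can add an entry for the CMS Connect login node to your local ~/.ssh/config. The alias name and key path below are just examples:

Host cmsconnect
    HostName login.uscms.org
    User netid
    IdentityFile ~/.ssh/id_rsa

After that, you can simply connect with:

$ ssh cmsconnect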

Setting up your proxy certificates

You will need a valid VO CMS grid proxy certificate in order to be able to submit your jobs.

  • If you don't have a valid grid proxy certificate, you need to apply for one. You can find more information on how to set up your proxy certificates here.
  • If you already have valid grid certificate files installed on another remote machine like lxplus.cern.ch, you can simply copy them to the CMS Connect login machine. Just execute the following command from login.uscms.org and enter the username and password for that remote machine.
$ copy_certificates

=============================================================================

This script checks if you have globus certificates or lets you copy them from another machine otherwise (default: lxplus.cern.ch)

NOTE: New certificates need to be requested first. Follow this Twiki for that:

https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookStartingGrid#ObtainingCert

=============================================================================

Check for certificates in /home/yourusername/.globus

...

Couldn't find any certificates. Copying certificates from another machine
Note: This requires certificates to be under the standard $HOME/.globus location
...

Enter hostname of machine to login: lxplus.cern.ch
Enter username for lxplus.cern.ch: yourusername

Warning: Permanently added the RSA host key for IP address '188.184.70.205' to the list of known hosts.

Password:

usercert.pem                               100% 3526     3.4KB/s   00:00
userkey.pem                                100% 2009     2.0KB/s   00:01

All Done... You can execute the following to initialize your proxy:

voms-proxy-init -voms cms -valid 192:00 
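To confirm that the proxy was created correctly and check its remaining lifetime, you can run, for example:

$ voms-proxy-info -all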

Submitting your Jobs

  • You can now start submitting your condor jobs. Your jobs will be sent to all available CMS Tier Sites by default, but you can specify the CMS Sites you want to use by following this section. Jobs will be reported to the CMS Dashboard.
  • If you don't have experience submitting condor jobs, you can read the following Quick Tutorial.
  • To learn how to pack a CMSSW release and use it in your workflow, a CMSSW analysis example can be found here: CMSSW Analysis Example.

For example, you can do the following to submit an example job to all US and non-US Tier Sites in the Global Pool:


$ tutorial quickstart

Installing quickstart (master)...

Tutorial files installed in ./tutorial-quickstart.

Running setup in ./tutorial-quickstart...

$ cd tutorial-quickstart

$ condor_submit tutorial01.submit 
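For reference, a minimal vanilla-universe submit file along the lines of the one used in the quickstart tutorial might look like the sketch below (the executable name is hypothetical; the actual tutorial01.submit may differ):

universe = vanilla
# Hypothetical user script; the tutorial ships its own example executable
executable = short.sh
output = job.output
error  = job.err
log    = job.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue 1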

Additional features

Requesting different Operating Systems

The CMS Global Pool is integrated with Singularity, a container solution that allows requesting different operating systems via containers if necessary. For example, you can add:

+REQUIRED_OS = "rhel7"

to your submit file to run jobs under RedHat 7 on Sites supporting Singularity.
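As an illustration, a complete submit file requesting an EL7 environment could look like the following sketch (the executable and file names are placeholders):

universe = vanilla
executable = myscript.sh          # placeholder for your own script
output = job_$(Cluster)_$(Process).out
error  = job_$(Cluster)_$(Process).err
log    = job_$(Cluster)_$(Process).log
# Run under RedHat 7 via a Singularity container (attribute described above)
+REQUIRED_OS = "rhel7"
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue 1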

For more information on how to use Singularity on CMS Connect, click here.

Submitting GPU jobs

To request a GPU slot for a job, you can simply add the attribute request_gpus (as well as request_cpus and request_memory) to the job.

Currently, a job can only use 1 GPU at a time.

request_gpus = 1
+RequiresGPU=1
request_cpus = 1
request_memory = 2 GB

Note that the number of GPU resources in CMS is still limited at present, so matching can take longer than for regular (CPU) jobs.

It is currently not possible to specify exactly what type of GPU you want, but you can match on, for example, the CUDA compute capability. To do so, use the following requirements expression in your job:

requirements = CUDACapability >= 3
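Putting the attributes above together, a GPU job submit file could look like this sketch (the executable name is a placeholder):

universe = vanilla
executable = gpu_job.sh           # placeholder wrapper that runs your GPU code
output = gpu_job.out
error  = gpu_job.err
log    = gpu_job.log
# GPU-related attributes described in this section
request_gpus   = 1
+RequiresGPU   = 1
request_cpus   = 1
request_memory = 2 GB
# Optionally require a minimum CUDA compute capability
requirements = CUDACapability >= 3
queue 1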

For information about submitting GPU jobs to the Global Pool via CMS Connect, see this link.

Using TensorFlow (CPU and GPU)

OSG-built TensorFlow containers for CPU/GPUs are based on Ubuntu 16.04. You can follow the example below in order to use xrdcp and stashcp from inside these containers:

# Load the software
$ export LD_LIBRARY_PATH=/opt/xrootd/lib:$LD_LIBRARY_PATH
$ export PATH=/opt/xrootd/bin:/opt/StashCache/bin:$PATH

Now, to copy the file /stash/user/khurtado/work/test.txt with either one of these tools, you can do:

$ xrdcp root://stash.osgconnect.net:1094//user/khurtado/work/test.txt .
$ stashcp /user/khurtado/work/test.txt .
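For instance, a job wrapper script run inside one of these containers could set up the tools above and stage data in and out as in the sketch below (the input/output file names and the Python script are hypothetical):

#!/bin/bash
# Make xrdcp and stashcp available inside the container (paths from the example above)
export LD_LIBRARY_PATH=/opt/xrootd/lib:$LD_LIBRARY_PATH
export PATH=/opt/xrootd/bin:/opt/StashCache/bin:$PATH

# Stage in a (hypothetical) input file from /stash
stashcp /user/yourusername/work/input.npz .

# Run a (hypothetical) TensorFlow training script shipped with the job
python train.py input.npz

# Stage out the (hypothetical) result via xrootd
xrdcp -f model.h5 root://stash.osgconnect.net:1094//user/yourusername/work/model.h5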

Selecting Sites

In order to run jobs on specific CMS Sites, you need to provide a list of resources separated by a comma using the HTCondor DESIRED_Sites ClassAd.

For example, to run a job on T2_US_Purdue and T2_US_UCSD only, you would use the following line in your submit file:

+DESIRED_Sites="T2_US_Purdue,T2_US_UCSD"

Setting Sites on your shell session

If you don't set +DESIRED_Sites in your submission file, all US Tier-2 and Tier-3 Sites will be used by default. You can change the default behavior in your shell session by sourcing /etc/ciconnect/set_condor_sites.sh:

# Usage: source /etc/ciconnect/set_condor_sites.sh "<pattern>"
# Examples:
#           - All Sites:       set_condor_sites "T*"
#           - T2 Sites:        set_condor_sites "T2_*"
#           - Tier US Sites:   set_condor_sites "T?_US_*"
$ source /etc/ciconnect/set_condor_sites.sh "T*"
 
    All Done
    Note: To verify your list of sites, simply do:
 
    echo $CONDOR_DEFAULT_DESIRED_SITES
    NOTE: Remember that condor submission files with +DESIRED_Sites
    NOTE: will give priority to that over $CONDOR_DEFAULT_DESIRED_SITES
 
    $CONDOR_DEFAULT_DESIRED_SITES has been set to:
 
    T3_BY_NCPHEP,T2_IT_Bari,T3_US_Baylor,T2_CN_Beijing,T3_IT_Bologna,T2_UK_SGrid_Bristol,T2_UK_London_Brunel,T1_FR_CCIN2P3,T2_FR_CCIN2P3,T0_CH_CERN,T2_CH_CERN,T2_CH_CERN_AI,T2_CH_CERN_HLT,T2_ES_CIEMAT,T1_IT_CNAF,T3_CN_PKU,T2_CH_CSCS,T2_TH_CUNSTDA,T2_US_Caltech,T3_US_Colorado,T3_US_Cornell,T2_DE_DESY,T2_EE_Estonia,T3_RU_FIAN,T3_US_FIT,T1_US_FNAL,T3_US_FNALLPC,T3_US_Omaha,T2_US_Florida,T2_FR_GRIF_IRFU,T2_FR_GRIF_LLR,T2_BR_UERJ,T2_FI_HIP,T2_AT_Vienna,T2_HU_Budapest,T3_GR_IASA,T2_UK_London_IC,T2_ES_IFCA,T2_RU_IHEP,T2_BE_IIHE,T3_FR_IPNL,T3_IT_Napoli,T2_RU_INR,T2_FR_IPHC,T2_RU_ITEP,T2_GR_Ioannina,T3_US_JHU,T2_RU_JINR,T1_RU_JINR,T2_UA_KIPT,T1_DE_KIT,T2_KR_KNU,T3_KR_KNU,T3_US_Kansas,T2_IT_Legnaro,T2_BE_UCL,T2_TR_METU,T2_US_MIT,T2_PT_NCG_Lisbon,T2_PK_NCP,T3_TW_NCU,T3_TW_NTU_HEP,T3_US_NotreDame,T2_TW_NCHC,T2_US_Nebraska,T3_US_NU,T3_US_OSU,T3_ES_Oviedo,T3_UK_SGrid_Oxford,T1_ES_PIC,T2_RU_PNPI,T3_CH_PSI,T3_IN_PUHEP,T3_IT_Perugia,T2_IT_Pisa,T3_US_Princeton_ICSE,T2_US_Purdue,T3_UK_London_QMUL,T1_UK_RAL,T2_DE_RWTH,T3_US_Rice,T2_IT_Rome,T3_US_Rutgers,T2_UK_SGrid_RALPP,T2_RU_SINP,T2_BR_SPRACE,T2_PL_Swierk,T3_US_MIT,T3_US_NERSC,T3_US_SDSC,T3_CH_CERN_CAF,T3_HU_Debrecen,T3_US_FIU,T3_US_FSU,T3_US_OSG,T3_US_TAMU,T2_IN_TIFR,T3_US_TTU,T3_IT_Trieste,T3_US_UCR,T3_US_UCD,T3_US_UCSB,T2_US_UCSD,T3_UK_ScotGrid_GLA,T3_US_UMD,T3_US_UMiss,T3_CO_Uniandes,T3_KR_UOS,T2_MY_UPM_BIRUNI,T3_US_PuertoRico,T3_BG_UNI_SOFIA,T3_UK_London_UCL,T2_US_Vanderbilt,T2_PL_Warsaw,T2_US_Wisconsin,T3_MX_Cinvestav

Site Lists:

A list of all CMS Sites can be obtained via get_condor_sites.

$ get_condor_sites
 
    Usage: get_condor_sites <pattern>
    -----------------------------------
    Examples:
 
    - All Sites:       get_condor_sites T*
    - T2 Sites:        get_condor_sites T2_*
    - Tier US Sites:   get_condor_sites T?_US_*

Here is a list of the current Sites available, followed by some code block examples for DESIRED_Sites that you can use in your job submission files:

Tier-2 Resources: T2_US_MIT, T2_US_Florida, T2_US_Purdue, T2_US_UCSD, T2_US_Vanderbilt, T2_US_Wisconsin, T2_US_Caltech, T2_US_Nebraska

Tier-3 Resources: T3_US_Baylor, T3_US_Colorado, T3_US_Cornell, T3_US_FIT, T3_US_FIU, T3_US_NotreDame, T3_US_Rutgers, T3_US_TAMU, T3_US_TTU, T3_US_UCD, T3_US_UCR, T3_US_PuertoRico, T3_US_UCSB, T3_US_UMD, T3_US_UMiss, T3_US_OSU
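For example, using the lists above, the following lines could be used in a submit file:

# Run only on the US Tier-2 sites listed above
+DESIRED_Sites="T2_US_MIT,T2_US_Florida,T2_US_Purdue,T2_US_UCSD,T2_US_Vanderbilt,T2_US_Wisconsin,T2_US_Caltech,T2_US_Nebraska"

# Run only on a subset of the US Tier-3 sites listed above
+DESIRED_Sites="T3_US_Colorado,T3_US_NotreDame,T3_US_TAMU"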

Reporting to CMS Dashboard

Condor jobs submitted via CMS Connect will be automatically reported to the CMS Dashboard, in a similar way to CRAB. A basic report that doesn't require any particular action from the user is done by default, but users are encouraged to provide a few parameters in their submission workflows in order to handle e.g. stage-in, stage-out and full error-code management in the report.

The reporting procedure is done in 2 steps:

  1. Report from the Submission Machine:
    The whole task is registered and sent to Dashboard from the submission machine while using condor_submit.
  2. Report from the Worker Node
    Each job is reported once it is assigned to an available machine and executed from it.
    As opposed to regular CRAB workflows, users define their own submission scripts in CMS-Connect (as in any regular condor workflow). Due to this fact, tasks like stage-out, stage-in and error code management are implemented and handled by each user. For this reason, only a few parameters are reported by default, without the need for any further action from the user.

Basic Report (Default)

The basic report is handled by CMS-Connect wrappers and no user-side action is required for it. This report includes the following:

  • Start and End time of report
  • Executable CPU and WallClock time
  • Executable exit code
    Please notice that if the user submits a wrapper on top of the executable, the wrapper exit code and times will be reported, unless the user specifies such values (see the Full Report section below).
  • Hostname of machine where the job was executed
  • Computing Element Name
Please see the Full Report section below in order to report stage-in/stage-out times and exit codes, the number of events in the job, or to override some of the default parameters.

Full Report

The following parameters can be specified by the user in order to report more detailed information from the worker node to the CMS Dashboard. The only requirement is that the job print out such parameters in the format below:

Parameters report format

PARAMETER = VALUE

# Example: Print this out at the end of your job to report the number of events on it.

CMS_DASHBOARD_N_EVENTS = 5000

The following table provides a list of the parameters that can be reported from the user side and the default values for the basic report case.

Parameter                          | Description
CMS_DASHBOARD_N_EVENTS             | Number of events in the job. Default: 0.
CMS_DASHBOARD_EXE_WC_TIME          | Executable wall clock time. Default: Condor executable WC time.
CMS_DASHBOARD_EXE_CPU_TIME         | Executable CPU time. Default: Condor executable CPU time.
CMS_DASHBOARD_EXE_EXIT_CODE        | Executable exit code. Default: Condor executable exit code.
CMS_DASHBOARD_STAGEOUT_SE          | Storage Element name. Default: unknown.
CMS_DASHBOARD_STAGEOUT_EXIT_CODE   | Stage-out exit code.
CMS_DASHBOARD_STAGEOUT_TIME        | Stage-out time.
CMS_DASHBOARD_JOB_EXIT_CODE        | Job exit code. Default: Executable exit code. Users can report their own job exit code to handle the overall completion state of the job.
CMS_DASHBOARD_JOB_EXIT_REASON      | Job exit reason. Default: empty.

Note: The user might want to override the default values for EXE_WC_TIME, EXE_CPU_TIME and EXE_EXIT_CODE in cases where e.g. the Condor executable is just a user wrapper running the actual executable.

You can follow this Twiki link to find more information about job monitoring with CMS Dashboard.
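As an illustration, a user wrapper could time the payload and stage-out steps and then print the parameters above to stdout in the required format. In the sketch below the payload script, output file and stage-out destination are all hypothetical:

#!/bin/bash
# Run the (hypothetical) payload and record its wall-clock time and exit code
START=$(date +%s)
./my_analysis.sh input.root
EXE_EXIT_CODE=$?
END=$(date +%s)

# Stage out the (hypothetical) output file and record the stage-out time and exit code
SO_START=$(date +%s)
xrdcp -f histograms.root root://cmseos.fnal.gov//store/user/yourusername/histograms.root
STAGEOUT_EXIT_CODE=$?
SO_END=$(date +%s)

# Print the Dashboard parameters in the PARAMETER = VALUE format
echo "CMS_DASHBOARD_N_EVENTS = 5000"       # example value
echo "CMS_DASHBOARD_EXE_WC_TIME = $((END - START))"
echo "CMS_DASHBOARD_EXE_EXIT_CODE = ${EXE_EXIT_CODE}"
echo "CMS_DASHBOARD_STAGEOUT_SE = cmseos.fnal.gov"
echo "CMS_DASHBOARD_STAGEOUT_EXIT_CODE = ${STAGEOUT_EXIT_CODE}"
echo "CMS_DASHBOARD_STAGEOUT_TIME = $((SO_END - SO_START))"
echo "CMS_DASHBOARD_JOB_EXIT_CODE = ${EXE_EXIT_CODE}"

exit ${EXE_EXIT_CODE}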

Historical View Example

Example CMS-Connect jobs reported to Dashboard.

Using the Connect client

The connect client allows you to submit jobs from your laptop or local cluster to the Global Pool via CMS Connect. This is provided by OSG and the full documentation can be found here.

Note: You don't need to install the connect client on login.uscms.org to submit jobs. The client is only useful if you want to submit from somewhere else, e.g. cmslpc-sl6.fnal.gov, lxplus.cern.ch, your laptop, etc.

Option 1: Using the client from CVMFS

To use the client from cvmfs, download and source the client script for your shell (tcsh and bash/zsh supported), as shown below:

Important note: The client uses python modules that might be incompatible with your CMSSW release. If for any reason you need to cmsenv a CMSSW release before submitting jobs, please install the client locally instead (see Option 2 below).


git clone https://github.com/CMSConnect/cmsconnect-client

cd cmsconnect-client

source cmsconnect_client.sh

# or for tcsh

source cmsconnect_client.tcsh 

Option 2: Installing the client locally

A CMSSW environment is not needed for submitting jobs, unless you are planning to, e.g., inherit the environment from the submit node to the worker nodes. In order to use the CMS Connect client within the environment of a CMSSW release, the client needs to be installed under that release to guarantee python compatibility. You can follow the commands below for that:


# First, cmsenv a CMSSW release providing python2.7. For example:

$ source /cvmfs/cms.cern.ch/cmsset_default.csh

cmsrel CMSSW_7_1_28; cd CMSSW_7_1_28; cmsenv

# Now, install the client

$ wget -O - https://raw.githubusercontent.com/CMSConnect/cmsconnect-client/master/cmsconnect_client_install.sh | bash 

Please notice that if you change to a different CMSSW release (e.g from 8.x to 9.x), you might need to reinstall the client.

Using the client locally

Once installed, you just need to source the client script (examples for tcsh and bash):


# For tcsh

source ~/software/connect-client/cmsconnect_client.tcsh

# For bash:

source ~/software/connect-client/cmsconnect_client.sh 

Setting up the account the first time

To set up the connect client with CMS Connect, you can type:


$ connect setup username@login.uscms.org 

Initializing proxy certificates

You will need to initialize your proxy certificate on the CMS Connect node in order to submit jobs.

$ connect shell
$ export HOME=/home/$USER
$ voms-proxy-init -voms cms -valid 192:00
$ exit

Submitting jobs

To verify your proxy lifetime, you can then type:


$ connect shell voms-proxy-info -all 

Use the git repo to get a submit example from the tutorial:


$ git clone https://github.com/CMSConnect/tutorial-quickstart
$ cd tutorial-quickstart

You will need to add your project name to the submit file (tutorial01.submit).

+ProjectName="cms.org.yourinstitution"

To see your default project, you can type:


$ connect shell cat /home/\$USER/.ciconnect/defaultproject 

To submit your job, you can type:


$ connect submit tutorial01.submit

To see the queue:


$ connect q 

Once your job is done in the queue, you can pull the output via:


$ connect pull 

Note: In the case of the tutorial01 example, you will see job.output and job.err being transferred.

Additional Documentation

You can find additional documentation at:

https://twiki.cern.ch/twiki/bin/view/Main/SwGuideCMSConnect

Contacts

Review status


Reviewer/Editor and Date      | Comments
Kenyi Hurtado - 10 June 2016  | created documentation v1


Responsible: KenyiHurtado

Last reviewed by: Most recent reviewer
