5.12 cmssh tutorial
Goals of this page:
This page is intended to provide you with an overview of the cmssh shell, its installation and its usage in CMS.
Introduction
CMS software is quite complex, and it takes some knowledge to install and run it properly. It consists of GRID middleware for grid tasks such as job submission and file transfer, the CMSSW software stack to run CMS software, and various web services that help users find their data. The cmssh project was developed to ease the initial burden of installing CMS software on end-users. It targets the following goals:
- users should be able to easily find their favorite data
- users should be able to copy files transparently to/from SE/local disks without any knowledge of GRID middleware
- users should be able to easily install and run CMSSW releases without any knowledge of CMS packing and distribution tools
- users should be able to perform all of these tasks, including analysis ones, under a simple user-friendly shell
The idea was to bring users a shell, like the one you use on any UNIX platform, which allows them to perform the aforementioned tasks.
What is it?
cmssh is a programmable shell written in python (more precisely, in IPython). This means that you can program in your shell using the python language. You can do any python operation, e.g. assignments, functions, loops, conditions, etc. At the same time it works as a normal UNIX shell, e.g. bash, where commands like cp, mv, mkdir, rmdir just work. Moreover, those commands work with local files as well as with CMS objects, such as LFNs. In other words, the cp command works transparently whether you give it a local file or an LFN, and it can copy either one to your destination (a local disk or a remote storage element). Are you excited that finding your data and copying LFNs to any place in the world is at your fingertips? If so, please read on.
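To illustrate what "transparently" means here, below is a minimal sketch of how a command could tell a local file, an LFN and a site-qualified path apart. This is not the actual cmssh code: the function name classify_path and the regular expression are my own assumptions, based only on the path forms used in the examples on this page (local.file, /store/..., T3_US_Cornell:/store/...).

```python
import re

# Assumption: a site-qualified path looks like "T3_US_Cornell:/store/...",
# i.e. a CMS site name (T<tier>_<country>_<name>) followed by a colon.
SITE_PATH = re.compile(r"^(T\d_[A-Za-z0-9]+_[A-Za-z0-9_]+):(/.*)$")

def classify_path(path):
    """Return a (kind, site, path) tuple, where kind is 'site', 'lfn' or 'local'."""
    match = SITE_PATH.match(path)
    if match:
        return ("site", match.group(1), match.group(2))
    # Assumption: anything starting with /store/ is treated as an LFN.
    if path.startswith("/store/"):
        return ("lfn", None, path)
    return ("local", None, path)

for example in ["local.file", "/store/user/file.root", "T3_US_Cornell:/store/user/foo"]:
    print(classify_path(example))
```

A command like cp could then pick the appropriate transfer method based on the kind of its source and destination.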
Why should I use it?
Well, your daily tasks include (among others): finding CMS data, e.g. datasets, runs, LFNs (probably via DAS), getting LFNs via FileMover, and running CMSSW software. What if you could do all of these tasks in your shell, on your Mac laptop, without any limitations and using normal UNIX syntax, e.g.
find dataset=*Zee*
cp /store/data/CRUZET3/Cosmics/RAW/v1/000/050/832/186585EC-024D-DD11-B747-000423D94AA8.root .
cmsRun my.cfg
You probably need to do these tasks hundreds of times, right? If your answer is yes, you will probably be curious to know that cmssh can come to the rescue. Give it a shot and you won't be disappointed.
Installation
The installation of cmssh is quite simple: you download the installer from the web and run it in your environment. There are two routes: the stand-alone installation mode, e.g. on your laptop, and the multi-user and/or system-wide mode where CMSSW software is already available, e.g. on lxplus or your local cluster. We will discuss both of them in the following sections. Let's start with getting the installer. On your UNIX box (Linux, Mac OS X, etc.) you can get it via curl, via wget, or by simply downloading it from the web. Let's outline the curl and web approaches.
Download cmssh installer using curl tool
curl -k https://raw.github.com/dmwm/cmssh/master/cmssh_install.py > cmssh_install.py
Download installer script directly from the web
Just point your browser here and save the script under the name cmssh_install.py.
Now go to the place where you'd like to install this tool, e.g. $HOME/workspace/public on lxplus or $HOME/work on your laptop. Feel free to create any directory you want to hold the installer and the cmssh installation. We're now ready to go, but before that let's outline a few prerequisites.
Prerequisites
The cmssh installer requires python version 2.6 or above (but not 3.x yet). You can check your python version simply with
python -V
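If you prefer to check the requirement from python itself, here is a minimal sketch of the rule stated above (2.6 or above, but not 3.x). The helper name is_supported is hypothetical and is not part of the cmssh installer.

```python
import sys

def is_supported(version):
    """True for python 2.6 or above within the 2.x series (3.x is not supported yet)."""
    major, minor = version[0], version[1]
    return major == 2 and minor >= 6

if is_supported(sys.version_info):
    print("python version is fine for the cmssh installer")
else:
    print("unsupported python version: %d.%d" % tuple(sys.version_info[:2]))
```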
On lxplus the default version of python is very old (2.4), so use python26 or one of the python versions installed within CMSSW, e.g.
source /afs/cern.ch/cms/slc5_amd64_gcc462/external/python/2.6.4-cms/etc/profile.d/init.sh
On Mac OS X you'll need to install Xcode. Please obtain it from the Apple Store and install it on your system. It provides gcc and the other tools required to compile and handle your code.
You can install cmssh under Linux SLC5 or a compatible distribution. Other Linux distributions may or may not work (the limitation actually comes from the CMSSW software stack rather than from cmssh itself). Due to their broad variety I was unable to test it on all Linux distributions. Feel free to try it out, and if it fails please submit a bug report with the full output as well as the name and version of your Linux distribution. This will allow us to set up a virtual machine with your Linux distribution and debug your problem.
Stand-alone installation (laptop mode)
For this install scenario you run the cmssh installer simply as
python cmssh_install.py --install --dir=$PWD
If you ever need to debug its output you can add the -v 1 option. For more options just run
python cmssh_install.py --help
Once the installation step is done you'll get a message telling you where the new cmssh tool is located, e.g.
...
Create vomses area
Create cmssh
Clean-up soft area
Congratulations, cmssh is available at /afs/cern.ch/user/v/valya/workspace/public/soft/bin/cmssh
At this step we're ready to go.
Multi-user mode with existing CMSSW install area
First you need to decide which CMSSW architecture you'll use. For a list of available architectures please use the TagCollector service. Here I'll use slc5_amd64_gcc462 as an example. I'll also use /afs/cern.ch/cms as the top area where CMSSW software is located. You'll need to adjust those settings to the ones found on your system. Here is an example of how to install cmssh on lxplus:
python cmssh_install.py --install --dir=$PWD --arch=slc5_amd64_gcc462 --cmssw=/afs/cern.ch/cms --multi-user
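The two install modes differ only in the extra options passed to the installer. As a rough sketch, the helper below (install_command is a hypothetical name of mine, not part of cmssh) assembles the command line for either mode using only the options shown on this page. I assume here that --arch, --cmssw and --multi-user always go together, as in the lxplus example.

```python
def install_command(install_dir, arch=None, cmssw=None):
    """Build the installer command line for stand-alone or multi-user mode."""
    cmd = ["python", "cmssh_install.py", "--install", "--dir=%s" % install_dir]
    # Multi-user mode: reuse an existing CMSSW area for the given architecture.
    if arch and cmssw:
        cmd += ["--arch=%s" % arch, "--cmssw=%s" % cmssw, "--multi-user"]
    return cmd

# stand-alone (laptop) mode
print(" ".join(install_command("/path")))
# multi-user mode with the settings used in the lxplus example
print(" ".join(install_command("/path", "slc5_amd64_gcc462", "/afs/cern.ch/cms")))
```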
Usage
To use
cmssh simply invoke it from your shell, e.g.
my-computer# /path/soft/bin/cmssh
Here my-computer# is a UNIX shell prompt and /path is the location where you installed cmssh (in the example above it was shown in the line "Congratulations, cmssh is available at /afs/cern.ch/user/v/valya/workspace/public/soft/bin/cmssh", so /path was /afs/cern.ch/user/v/valya/workspace/public/).
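cmssh looks at your GRID certificate files under $HOME/.globus when it starts. If you want to check them yourself beforehand, here is a minimal sketch of such a check. It is not the code cmssh runs: the function name check_globus_files is hypothetical, and the rule that userkey.pem must not be accessible by group or others is my assumption about what "verify permissions" means.

```python
import os
import stat

def check_globus_files(globus_dir=None):
    """Return a list of problems found with the GRID certificate files."""
    if globus_dir is None:
        globus_dir = os.path.join(os.path.expanduser("~"), ".globus")
    problems = []
    for name in ("userkey.pem", "usercert.pem"):
        path = os.path.join(globus_dir, name)
        if not os.path.isfile(path):
            problems.append("%s not found" % path)
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        # Assumption: the private key must not be accessible by group/others.
        if name == "userkey.pem" and mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("%s is accessible by group/others (mode %o)" % (path, mode))
    return problems

for problem in check_globus_files():
    print(problem)
```

An empty list means both files exist and the key is private to you. Running chmod 400 on userkey.pem is the usual fix for the permission problem.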
Upon start-up cmssh looks for your GRID certificate under $HOME/.globus. If found, it verifies the permissions of your userkey.pem and usercert.pem and asks for your GRID password (if you have one). If everything goes smoothly it invokes the proper voms command to set up your proxy (at this point you'll see your normal proxy output with your DN etc.). Once it has started you'll get the following screen:
Available cmssh commands:
find search CMS meta-data (query DBS/Phedex/SiteDB)
dbs_instance show/set DBS instance, default is DBS global instance
mkdir/rmdir mkdir/rmdir command, e.g. mkdir /path/foo or rmdir T3_US_Cornell:/store/user/foo
ls list file/LFN, e.g. ls local.file or ls /store/user/file.root
rm remove file/LFN, e.g. rm local.file or rm T3_US_Cornell:/store/user/file.root
cp copy file/LFN, e.g. cp local.file or cp /store/user/file.root .
info provides detailed info about given CMS entity, e.g. info run=160915
das query DAS
das_json query DAS and return data in JSON format
dqueue status of download queue, list files which are in progress.
root invoke ROOT
du display disk usage for given site, e.g. du T3_US_Cornell
Available CMSSW commands (once you install any CMSSW release):
releases list available CMSSW releases, accepts <list|all> args
install install CMSSW release, e.g. install CMSSW_5_0_0
cmsrel switch to given CMSSW release and setup its environment
arch show or switch to given CMSSW architecture, accept <list|all> args
scram CMSSW scram command
cmsRun cmsRun command for release in question
Available GRID commands: <cmd> either grid or voms
vomsinit setup your proxy (aka voms-proxy-init)
vomsinfo show your proxy info (aka voms-proxy-info)
Query results are accessible via results() function:
find dataset=/*Zee*
for r in results(): print r, type(r)
Help is accessible via cmshelp <command>
To install python software use pip <search|(un)install> <package>
cms-sh|1>
I hope that output explains itself. You get a set of commands with their descriptions and examples, followed by the cmssh prompt. At the cmssh prompt you can start typing your normal commands, like ls, cp, mkdir, etc. In addition you can use all of the listed commands, e.g. find. At the end you get a cms-sh|1> prompt, which shows that cmssh is ready for its first command. As you enter commands the number is incremented to keep track of them, so that they can be used later, e.g. for re-play or reference. For example
cms-sh|1> ls
soft
stuff
tests
cms-sh|2> a=1
cms-sh|3> print a
1
cms-sh|4>
Here I ran a simple ls command to list the files in my local directory, then I made the assignment a=1 and printed it out. Remember, cmssh is a python shell, so all python commands will work, e.g.
cms-sh|4> import os
In the examples above you can see that the number in the cmssh prompt is incrementing. It shows which command you are executing, and it can be used later to re-play your history, etc. Without further ado, here is a simple set of commands which you can execute in the cmssh shell to get a feeling for what it can do:
# search for some data
find dataset=*CRUZET3*RAW
for r in results(): print r, type(r)
# info about file/dataset/run
ls /Cosmics/CRUZET3-v1/RAW
info /Cosmics/CRUZET3-v1/RAW
find file dataset=/Cosmics/CRUZET3-v1/RAW
find site dataset=/Cosmics/CRUZET3-v1/RAW
find run=160915
info run=160915
for r in results(): print r.initLumi, type(r.initLumi), r.DeliveredLumi, type(r.DeliveredLumi)
# list/copy LFN to local disk
ls /store/data/CRUZET3/Cosmics/RAW/v1/000/050/832/186585EC-024D-DD11-B747-000423D94AA8.root
cp /store/data/CRUZET3/Cosmics/RAW/v1/000/050/832/186585EC-024D-DD11-B747-000423D94AA8.root .
ls -l
# SE operations, e.g. list its content, create/delete directory, etc.
du T3_US_Cornell
ls T3_US_Cornell
ls T3_US_Cornell:/store/user/valya
mkdir T3_US_Cornell:/store/user/valya/foo
ls T3_US_Cornell:/store/user/valya
rmdir T3_US_Cornell:/store/user/valya/foo
ls T3_US_Cornell:/store/user/valya
# copy local file to SE
cp 186585EC-024D-DD11-B747-000423D94AA8.root T3_US_Cornell:/store/user/valya
ls T3_US_Cornell:/store/user/valya
ls -l
rm 186585EC-024D-DD11-B747-000423D94AA8.root
# copy LFN from SE to local disk
cp T3_US_Cornell:/store/user/valya/186585EC-024D-DD11-B747-000423D94AA8.root .
ls -l
# delete file on SE
rm T3_US_Cornell:/xrootdfs/cms/store/user/valya/186585EC-024D-DD11-B747-000423D94AA8.root
ls T3_US_Cornell:/store/user/valya
# copy LFN to SE area
cp /store/data/CRUZET3/Cosmics/RAW/v1/000/050/832/186585EC-024D-DD11-B747-000423D94AA8.root T3_US_Cornell:/store/user/valya
ls T3_US_Cornell:/store/user/valya
rm T3_US_Cornell:/xrootdfs/cms/store/user/valya/186585EC-024D-DD11-B747-000423D94AA8.root
ls T3_US_Cornell:/store/user/valya
# copy multiple files
cp /store/data/CRUZET3/Cosmics/RAW/v1/000/050/832/186585EC-024D-DD11-B747-000423D94AA8.root . &
cp /store/data/CRUZET3/Cosmics/RAW/v1/000/050/796/4E1D3610-E64C-DD11-8629-001D09F251FE.root . &
dqueue
# copy user file from T1 tier
cp T1_US_FNAL_Buffer:/store/user/neggert/TT_TuneZ2_7TeV-mcatnlo/MCTSusy_Skim_Mar2012/7b5af1bfe3424f60f0db5b5f14cf327a/MCTSusySkimMar2012_591_1_cSX.root .
# copy lfn from SE to SE
cp T1_US_FNAL_Buffer:/store/user/neggert/TT_TuneZ2_7TeV-mcatnlo/MCTSusy_Skim_Mar2012/7b5af1bfe3424f60f0db5b5f14cf327a/MCTSusySkimMar2012_591_1_cSX.root T3_US_Cornell:/store/user/valya
# look-up available releases
releases
# install CMSSW release
install CMSSW_5_0_1
# switch to installed release
cmsrel CMSSW_5_0_1
# run cmsRun job
cmsRun runevt_cfg.py
# usage of magic functions
# show how to access docstrings
edit test.py
ip = get_ipython()
ip.magic_find("dataset=*Zee_M20*")
for r in results(): print r, type(r)
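The find commands above use * as a wildcard. Assuming it behaves like a shell glob (I have not checked how the data services interpret it), the matching can be sketched in plain python with the standard fnmatch module. Only the first dataset name below comes from the examples above. The other two are made-up names for illustration.

```python
from fnmatch import fnmatchcase

datasets = [
    "/Cosmics/CRUZET3-v1/RAW",   # taken from the examples above
    "/Cosmics/CRUZET3-v1/RECO",  # hypothetical name
    "/Zee/Summer11-v1/GEN-SIM",  # hypothetical name
]

def match(pattern, names):
    """Return the names matching a glob-style pattern (case sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]

print(match("*CRUZET3*RAW", datasets))  # ['/Cosmics/CRUZET3-v1/RAW']
```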
Commands
cmssh is a programmable shell written in python. It means that you can program anything using the python language. For example, let's perform some simple tasks
cms-sh|5> import os
cms-sh|6> for k, v in os.environ.items(): print k, v
_ /Users/vk/CMS/test_cmssh/soft/install/bin/ipython
....
Here we imported the os python module and wrote a simple loop to look up environment variables. Pretty neat. But cmssh can help you more with python, e.g. if you type
cms-sh|7> os.walk?
and hit return, it will show the documentation of the os.walk function from the os python module. If you'd like to see its code you can do the following:
os.walk??
But what if you don't know which functions/methods are available in your python module? In this case just use tab completion. For example, type os. and hit tab, and you'll get a full list of the functions available in the os python module. Here is what I did
cms-sh|7> os.
Display all 202 possibilities? (y or n)
os.EX_CANTCREAT os.WNOHANG os.geteuid os.sep
os.EX_CONFIG os.WSTOPSIG os.getgid os.setegid
os.EX_DATAERR os.WTERMSIG os.getgroups os.seteuid
...
I hope you'll enjoy this feature. It works with system modules or your local ones once you import your python code.
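The ? look-up and tab completion are interactive conveniences. If you need the same information inside a script, plain python offers rough equivalents through dir() and the standard inspect module. The helper name public_names below is my own and is not a cmssh function.

```python
import inspect
import os

def public_names(module, prefix=""):
    """List public attribute names of a module, similar to what tab completion shows."""
    return sorted(name for name in dir(module)
                  if name.startswith(prefix) and not name.startswith("_"))

# roughly what typing "os.get" followed by tab would offer
print(public_names(os, "get"))

# roughly what "os.walk?" shows: the docstring (first line only here)
print(inspect.getdoc(os.walk).splitlines()[0])
```

To get the source code, as os.walk?? does, inspect.getsource(os.walk) can be used for functions written in python.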
Meanwhile, there are two types of help you can get from cmssh. The python help is available as
help(os)
where you provide the python module you want help with, in this case the os module. The second type of help is cmssh-specific, and you get it by using the cmshelp command, e.g.
cmshelp find
In this example, we invoked cmssh help for the find command. There is much more power in cmssh than you can imagine. The list of available commands is shown if you type lsmagic. All the commands that start with a percent sign can be used directly in your shell, e.g. find, grep, mkdir, etc. The cell magic commands are the ones that allow you to place code snippets underneath the command. Let's explore how to execute a series of commands under your usual UNIX shell
lsmagic
.... # here you'll see list of all magic commands
# for demonstration I'll use %%! command
cms-sh|4> %%!
...: hostname -f
...: pwd
...: ls | wc
...:
Out[4]:
['mr46.lns.cornell.edu',
'/Users/vk/CMS/test_cmssh',
' 6 7 56']
So what happened here? I invoked the cell magic command which allows me to run a series of commands under my UNIX shell. Then I typed three commands, hostname -f, pwd and ls | wc, and hit enter. The output is a python list object which contains the outputs of my UNIX shell commands.
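Since the result is an ordinary python list, you can keep working with it in python. Here is a small sketch that unpacks the list shown above and converts the ls | wc line into numbers. The list is copied by hand from the example output, so the snippet also runs outside cmssh.

```python
# Output list copied from the example above.
out = ['mr46.lns.cornell.edu', '/Users/vk/CMS/test_cmssh', ' 6 7 56']

hostname, cwd, wc_line = out
# wc prints three counters: lines, words and characters
lines, words, chars = [int(field) for field in wc_line.split()]
print("%s:%s has %d entries" % (hostname, cwd, lines))
# prints: mr46.lns.cornell.edu:/Users/vk/CMS/test_cmssh has 6 entries
```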
At this point it is up to you to explore all available commands. Go for it!
Advanced features
cmssh is a very powerful tool. It can do the following tasks:
- find CMS data
- copy any LFN from/to local disk/remote storage element
- you can program any python code and run it right away
- you can install any python package in your local install area
- you can use matplotlib/numpy/ROOT packages
- you can use the R statistical language if it is installed on your system
- you can run cmssh in notebook mode
- your imagination should never stop under cmssh, since it allows you to program and utilize python in its full power
Here I'll discuss only a few of the topics listed above: how to install python packages and the notebook feature.
Install 3rd party python packages
You have probably heard about PyPI, right? In short, it is the repository of python packages. Under cmssh you can do the following (please note that this will work only if you are the owner of your cmssh installation; all UNIX ownership rules still apply):
cms-sh|5> pip search simpleyaml
simpleyaml - YAML parser and emitter for Python
cms-sh|6> pip install simpleyaml
Downloading/unpacking simpleyaml
Running setup.py egg_info for package simpleyaml
Installing collected packages: simpleyaml
Running setup.py install for simpleyaml
Successfully installed simpleyaml
Cleaning up...
cms-sh|7> import simpleyaml
cms-sh|8> s="""
...: name: foo
...: type:
...: - int
...: - float
...: """
...:
cms-sh|9> simpleyaml.load(s)
Out[9]: {'name': 'foo', 'type': ['int', 'float']}
Here I did a few steps. I searched for a package called simpleyaml, installed it under my cmssh and imported it right away into cmssh. Then I created a string and loaded it via simpleyaml to get a python dict. A very simple and very powerful approach. You can search for and install any python package and start using it right away.
cmssh notebook
You can run cmssh in your browser. You may wonder why you would need that. Imagine that you're working on some project. You will probably run quite a lot of commands, create code, make plots, etc. What if you want to keep a record of all your steps, with annotations, comments and plots? And what if you want to re-play all your results, or better, send them over to someone else without explaining how you did all the steps? This is the use case for the cmssh notebook. You can invoke it simply as
cmssh notebook
For details I refer you to this video.
--
ValentinKuznetsov - 13-Jul-2012