ISU Tier 3 User's Guide
All of the information in this section is intended for users; if something needs to be clarified, please tell your admins (currently Dan Pluth and Michael Werner). If you are looking for information on our T3 management, it is located here.
Creating a New Account
Any admin can create a user account for you with any username you would like; however, for ease of use when interacting with ATLAS systems, it is recommended that your Tier 3 username be the same as your CERN username.
Logging In
To log in to the ISU Tier 3, simply ssh into one of the interactive nodes (hep-int1.physics.iastate.edu or hep-int2.physics.iastate.edu).
ssh -Y <username>@hep-int1.physics.iastate.edu
The -Y option in this command enables X forwarding, which allows you to open and use GUIs through the connection (primarily used for opening a ROOT TBrowser).
Changing Your Password
To change your password when logged into the T3, simply use the command :
passwd
Changing password for user mdwerner.
Enter login(LDAP) password:
You will be asked to enter your current password, followed by your new password twice. You must choose a password that is at least 8 characters in length and which isn't contained in a local dictionary.
Setting Up the ATLAS Environment
In order to setup the ATLAS Environment, you should create a "setup.sh" script with the following lines :
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS
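Then, at the start of each session, simply source the script (assuming you placed it in your home directory):
source ~/setup.sh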
Setting Up the ATLAS Style for plotting
- Step 1 : Download the atlasstyle.tar file at the bottom of the page and copy it to the machine you wish to work on.
- Step 2 : Extract it into the home directory:
cd ~/
tar xvf /path/to/atlasstyle.tar
- Step 3 : Test that it is now configured appropriately by running root. You should see:
Applying ATLAS style settings...
SSH Keys
Sometimes it is annoying to have to enter your password every time you login to a machine. To allow the T3 to "remember" your computer, do the following :
- Step 1 : Create the RSA Key Pair
- On the client machine, run the command
ssh-keygen -t rsa
- Step 2 : Store the keys and passphrase
- Step 3 : Copy the public key to the T3 (see below)
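Steps 2 and 3 can usually be done in one go with ssh-copy-id, which appends your public key to ~/.ssh/authorized_keys on the T3 (a sketch, assuming the default key location from Step 1):
ssh-copy-id <username>@hep-int1.physics.iastate.edu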
I'm Tired of remembering different usernames
Sometimes we are forced to have different usernames on our local desktops from our T3 or lxplus usernames. You can tell your local machine which username corresponds to which destination by editing your ~/.ssh/config file. If you don't already have one you can simply create an empty one, and then add the following lines to it :
Host lxplus.cern.ch
    # ForwardX11 enables GUIs like a TBrowser
    ForwardX11 yes
    User <lxplus_username>
Now rather than typing
ssh -Y <lxplus_username>@lxplus.cern.ch
you can simply type
ssh lxplus.cern.ch
and you will get the same behavior.
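A similar entry can be added for the T3 interactive nodes, for example (the T3 username below is a placeholder):
Host hep-int1.physics.iastate.edu hep-int2.physics.iastate.edu
    ForwardX11 yes
    User <t3_username>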
ISU VPN
Currently it is possible to connect to the T3 directly; however, in the near future we may be required to connect to the T3 via ISU's VPN when off campus. You can download the ISU VPN client for free after logging in here. Instructions for use on your operating system can be found here. Contact your site administrator for help.
Setting File/Directory Permissions
By default, your T3 home directory is private and your colleagues will not be able to read your files. If you want to allow them to copy something or simply view a file or directory, you will need to change the permissions on it.
It is important to note that in order for a file to turn up on a search, or to copy a file,
execute permissions are required on the containing directory, not just read permissions. To allow a colleague to copy a file from a directory please follow the recommendation below.
Shorthand :
User (u), Group (g), Other (o), All Users (a)
Read (r), Execute (x), Write (w)
To view the permissions on a file, use the command
ls -la myFile
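The output will look something like the following (hypothetical file, owner, and group); the first column lists the permissions for user, group, and other, in that order:
-rw-r--r-- 1 mdwerner hepusers 4096 Jan 01 12:00 myFile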
Forwarding system mail
Some programs (such as HTCondor) can be configured to send messages to the user who launches jobs. To forward those messages to your email account, simply create a file named .forward in your home directory containing your email address. Note that our T3 is only configured to send mail to iastate.edu addresses.
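For example (replace the address with your own):
echo "<username>@iastate.edu" > ~/.forward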
Recommendation
To simply allow all T3 users to read and copy from a directory, issue the following command on the directory (the -R is recursive, and only need be applied if there are subdirectories to which you also want to allow access).
chmod [-R] u=rwx,g=rx,o=r myDir
This command gives the user (you) read, write, and execute permissions; it gives all group members (users of our T3) read and execute permissions; and any other users have only read permissions.
Remove Permissions
If you find the need to remove permissions already set on a file, simply use
chmod [-R] g-rwx,o-rwx myDir
The (-) sign removes permissions; a (+) sign adds them.
Setting your Grid Certificate
Before you can submit jobs to the grid you will need a Grid Certificate (which must be renewed every year) - instructions are here. Following those instructions you will obtain a certificate file of the form 'mycert.p12'.
You need to convert your certificate into the correct form using:
> openssl pkcs12 -in mycert.p12 -clcerts -nokeys -out usercert.pem
> openssl pkcs12 -in mycert.p12 -nocerts -out userkey.pem
> chmod 600 userkey.pem
> chmod 400 usercert.pem
Then move these two files to the ~/.globus directory (if you don't have one, create it with mkdir ~/.globus). You will probably need to remember two passwords: one for the original certificate and one for the converted one. When you are producing userkey.pem, you must specify a PEM pass phrase or voms-proxy-init will fail! If all is well, try:
> voms-proxy-init -voms atlas
This will give output something like:
Contacting voms2.cern.ch:15001 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "atlas"...
Remote VOMS server contacted succesfully.
Created proxy in /tmp/...
Your proxy is valid until Wed Jul 06 02:41:38 CEST 2016
LaTeX
For writing supporting notes, publications, and eventually your thesis, you will need to use LaTeX.
LaTeX is a typesetting language which, when compiled, produces a .pdf of the paper. Its greatest benefit is that references to tables, figures, sections, etc. are automatically updated each time you compile, and replacing a plot is as simple as replacing the file in your "Plots" directory, making it very easy to reproduce the document with a full set of new plots with very little work.
Most ATLAS users use TeXstudio (http://www.texstudio.org/) to write their LaTeX documents (it is much easier to edit them within an integrated development environment which recognizes the syntax, and this IDE is platform independent so you can move from one machine to another with ease).
For ATLAS papers you will need to fork the AtlasLatex repository into your own: https://gitlab.cern.ch/atlas-phys/AtlasLatex . Note that this project has some dependencies, so you may need to install additional packages as described here (https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/PubComLaTeXFAQ).
Journals in which to publish each have their own requirements/templates. APS journals use RevTeX (which is an add-on to an existing TeXLive installation).
For your ISU thesis you will need to download the ISU thesis template here (https://www.grad-college.iastate.edu/thesis/thesis_template/).
I will post
LaTeX help links here if I find them useful enough :
Feynman Diagrams
At some point you'll surely need to create your own Feynman diagrams. There are many packages which draw them directly in LaTeX (which may be tempting); however, in my experience those packages require special compilation steps which you can't guarantee will be available when uploading your LaTeX code elsewhere. The most efficient choice is to simply make the diagram once and export it as a .pdf or .eps to be included in documents as an image.
One easy tool for creating diagrams is ShareLaTeX. The site is free (though it requires you to make an account). There you can create your Feynman diagram completely within a web browser and then simply save the output.
The template is a good place to start; the relevant change is at the very start of the document. If you want to make a standalone pdf rather than a full-page document, replace:
\documentclass{article}
with:
\RequirePackage{luatex85}
\documentclass{standalone}
Here is an example for a leptonically decaying WZ process:
\RequirePackage{luatex85}
\documentclass{standalone}
\usepackage{tikz}
\usepackage[compat=1.1.0]{tikz-feynman}
\begin{document}
\feynmandiagram [horizontal=a to b] {
i1 [particle=\(q\)] -- [fermion] a -- [fermion] i2 [particle=\(q\)],
a -- [boson, edge label=\(W^{+}\)] b,
if1 [particle=\(W^{+}\)] -- [boson] b -- [boson] if2 [particle=\(Z\)],
if1 -- [opacity=0] if2,
ff1 [particle=\(l^{+}\)] -- if1 -- ff2 [particle=\(\nu\)],
ff3 [particle=\(l^{+}\)] -- if2 -- ff4 [particle=\(l^{-}\)],
ff2 -- [opacity=0] ff4,
ff1 -- [opacity=0] ff2,
ff3 -- [opacity=0] ff4 };
\end{document}
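Note that tikz-feynman's automatic layout requires compiling with LuaLaTeX rather than pdflatex. If you compile the example yourself (the file name below is just an example), use:
lualatex wz_diagram.tex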
CDS
Eventually you will need to upload a copy of your draft to the CERN Document Server (CDS) :
https://cds.cern.ch/submit?ln=en
.
Once your draft is uploaded you can request approval through the CDS interface.
Using Git
What is Git
Git is a distributed version control system. It allows you to create a local repository for your code that is easily synced with remote servers or with other collaborators.
Why use Git
- CERN is transitioning to git for all of its repositories.
- It has a much better merging algorithm than SVN.
- Once you get the hang of it, it's very convenient for making quick changes and revisions.
How to get started
- Go to your project directory.
- First set up the latest version of git (this is important, as the older version used by default causes many issues):
lsetup git
- Then type:
git init
CERN has its own flavor of GitLab that is available for all users:
https://gitlab.cern.ch/
After you log in, there should be a plus sign in the top right corner. Click it and you'll be guided through creating a new repo.
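Once the GitLab repo exists, a minimal sketch for connecting your local repository to it and pushing a first commit (the URL is a placeholder following the gitlab.cern.ch ssh format):
git remote add origin ssh://git@gitlab.cern.ch:7999/<username>/<project>.git
git add .
git commit -m "Initial commit"
git push -u origin master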
Commands
Git maintains a local repository with your code and also an additional local repository that mirrors the state of the remote repository. By keeping a copy of the remote repository locally, git can figure out the changes needed even when the remote repository is not reachable.
- git pull : This command synchronizes your local copy with the GitLab repository (or wherever you specify to pull from). (Equivalent to svn update.)
- git fetch : This command synchronizes the local mirror of the remote repository, but does not change your code.
- You can tell git to never add certain files by adding them to the .gitignore file. (If this file does not exist in your project you can simply create it - each line is a rule which git checks against before adding a file.)
- git remote -v : You can check what the remote repo is set to using this command.
Sometimes you need to rely on another code base which is regularly updated, and you would like your package to be able to work with the updates. To do this you can add a git submodule.
- The following command adds a CxAODFramework_LLP package of mine as a submodule of the current project:
git submodule add ssh://git@gitlab.cern.ch:7999/miwerner/CxAODFramework_LLP.git
- All submodules can be updated simultaneously with the command
git submodule foreach git pull origin
- When you clone a git repository you must specify the option --recursive to checkout all submodules as well.
- If you need to checkout submodules for the first time, run the command
git submodule update --init --recursive
- To view the changes between various commits, you can view the commit log. This is organized with the most recent commit at the top, and older commits further down the list.
- To revert to a previous commit, simply check out that commit.
- To view all differences between your current state and a previous commit, use git diff against that commit.
- To create a tagged release, create a tag and push it to the remote.
- To view all available tags in a project, list the tags.
- To view more information on a specific tag, show that tag.
- To check out a specific tag of a project, check out the tag by name.
- To revert changes made to a file, check the file out from the index (note that there is a space after the --).
The corresponding commands are sketched below.
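These are the standard git commands for those operations (commit hashes, tag names, and file names are placeholders):
git log                              # view the commit history, most recent first
git checkout <commit-hash>           # revert your working copy to a previous commit
git diff <commit-hash>               # differences between your current state and a previous commit
git tag -a <tag-name> -m "message"   # create a tagged release
git push origin <tag-name>           # push the tag to the remote
git tag                              # list all available tags
git show <tag-name>                  # more information on a specific tag
git checkout <tag-name>              # check out a specific tag
git checkout -- <file>               # revert changes made to a file (note the space after --)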
SSD (Compilation Space)
Compiling large amounts of code on the NFS can be quite slow and can hog network resources. For this purpose we installed a 2 TB SSD on hep-int2, which can be accessed under /ssh/. Users are welcome to use this drive for compilation jobs and temporary storage; however, it should not be used for long-term data storage as it is not in a RAID configuration and any drive failure will result in complete data loss.
Analysis Software
ATLAS analysis requires the use of AnalysisBase software to process xAOD data files. A good tutorial can be found here.
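A typical way to set up an AnalysisBase release in the ATLAS environment (the release number below is only an example):
setupATLAS
asetup AnalysisBase,21.2.56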
Most of us here at ISU are using the CxAODFramework (or some personally modified version of it) to do our analyses. The framework makes it easier for us to ensure that we are using all of the same settings/calibrations/etc. as other analyzers - as well as ensuring that we don't have to deal with the difficulties of coding our own framework. If you need to modify the framework to suit your own purposes you can find information on each of the calibration/selection tools here.
The TWiki page for the CxAODFramework is here. As described on that TWiki page, you will clone the FrameworkSub directory, which contains an automated script that will check out the rest of the package for you.
Submitting Jobs to the Grid
Through PANDA we can submit jobs to the CERN computing grid from our environment. This is very convenient and avoids having to copy large amounts of code/files to lxplus. First, make sure that your grid certificates are copied into your .globus directory on our T3 - then you can simply run
lsetup panda and submit jobs to the grid as you would normally.
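As a minimal sketch (the dataset names and executable below are placeholders), a simple grid submission with prun looks like:
lsetup panda
voms-proxy-init -voms atlas
prun --exec "echo %IN" --inDS <input_dataset> --outDS user.<username>.<output_name>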
Panda Commands
The most convenient way to monitor your Panda jobs is through the web interface (http://bigpanda.cern.ch/). However, you can also open an environment to check on or change the status of your jobs with the pbook command.
pbook
This program will begin by grabbing the status of all your jobs and displaying them to the screen like so :
INFO : Synchronizing local repository ...
INFO : Got 0 jobs to be updated
INFO : Synchronization Completed
INFO : Done
Start pBook 0.5.70
>>>
This can take a few minutes if you have a lot of jobs or if it has been a long time since you last checked. To get a list of commands, type help() (the parentheses are important). To list all of your jobs, type show(). To retry a job, type retry(<JobID>). It is very common for a small fraction of the sub-jobs to fail (a job may show up as finished with only 90% complete, for instance); this is often corrected by simply retrying the job as shown below.
>>> retry(5413)
INFO : Getting status for TaskID=9419900 ...
INFO : Updated TaskID=9419900 ...
INFO : ID=5413 is composed of JobID=5413
INFO : command is registered. will be executed in a few minutes
Alternatively, if a large fraction of jobs are failing and retry() hasn't been successful, the problem may be the site at which the jobs are running. To transfer the jobs to another site, use the command
killAndRetry() instead.
Getting Help With Grid Issues
If you are having issues with your jobs on the grid (and they work perfectly fine locally) you should know the two main locations for getting help :
Rucio (Downloading Files from the Grid)
Before you can use rucio you must set it up with the command
lsetup rucio
and have a valid grid certificate. Run
voms-proxy-init -voms atlas
and enter your grid password.
Commands
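Some commonly used rucio commands (the scope and dataset names below are placeholders):
rucio list-dids "user.<username>:*"      # list datasets in your user scope
rucio list-files <scope>:<dataset>       # list the files in a dataset
rucio download <scope>:<dataset>         # download a dataset to the current directory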
XRootD
The majority of our T3 disk space is distributed on the workers. The XRootD system allows this distributed storage to be accessed through a single mount point. Any long-term large files should be stored on xrootd.
Our XRootD is mounted at /mnt/xrootd and you can interact with your files on this mount as you would normally (they will be located under /mnt/xrootd/).
Basic Operations
If you are copying a large amount of data to xrootd, you can speed up the copy by using the native command xrdcp as follows :
xrdcp [-C md5] -R <directory> root://head2//local/xrootd/a/<username>/<directory>
I highly recommend running this command with the -C md5 option. This performs a check that the file was copied correctly and will report an error if it was not. Additionally, if a file has already been copied to xrootd, this ensures that the file is indeed the same (and doesn't just have the same name); otherwise, if you attempt to copy a file with the same name, the copy will simply move on.
All other operations (ls, mkdir, etc.) are much simpler to perform through the mount (i.e. ls /mnt/xrootd/); however, it is also possible to use the xrootd command xrdfs as follows :
xrdfs root://head2 ls /local/xrootd/a/<username>/<directory>
OR
xrdfs root://head2 mkdir /local/xrootd/a/<username>/<directory>
Sometimes you will get an error stating that a directory already exists (indicating that the directory exists on some workers but not others). To force a command to run despite this simply add the -f flag. (If it happens often be sure to inform your site administrator).
In xAODs
To pass a file in xrootd to an xAOD algorithm, simply use the DiskListXRD class in place of the usual DiskListLocal. A short example is shown below.
#include "SampleHandler/DiskListXRD.h"
...
SH::SampleHandler sampleHandler;
SH::DiskListXRD list("head2", "/local/xrootd/a/<username>/<directory>/");
SH::ScanDir().scan(sampleHandler, list);
A preferable alternative to this approach (due to its ease of use) is DiskListLocal, along with our mount of the xrootd system as follows :
#include "SampleHandler/DiskListLocal.h"
...
SH::SampleHandler sampleHandler;
std::string xrdprefix="/local/xrootd/a";
std::string mnt_dir(dataset_dir); //dataset_dir is a string pointing to the file in the xrd system (/local/xrootd/a/<username>/<directory>)
mnt_dir.erase(0, xrdprefix.size());
SH::DiskListLocal list("/mnt/xrootd/"+mnt_dir, "root://head2/"+dataset_dir);
SH::ScanDir().scan(sampleHandler, list);
Data Duplication
The home directory has a RAID setup with redundancy; however, space is limited. The xrootd system has considerably more space; however, it is not a backed-up storage system - it is 'production' level only. There is no data duplication inherent to the xrootd system. This means that if an xrootd drive fails with your data on it, the data is gone.
Grid Duplication
To mitigate this type of loss, it is advisable to upload critical data to the grid. This can be done with rucio, however finding the appropriate grid destination can be challenging. In general try to upload to SLAC and BNL. SCRATCHDISK is for short-term storage, while LOCALGROUPDISK is for long-term storage. SCRATCHDISK space is in general easier to come by.
rucio upload --rse SLACXRD_SCRATCHDISK user.{username}:{dataset_name} {dir_to_upload}
This will upload all files in dir_to_upload to a single dataset. There are many more upload commands and other relevant details which can be found here:
http://rucio.cern.ch/client_howto.html
It may also be useful to see the full list of RSE locations. Only US sites will allow US users to upload; keep this in mind while attempting to find appropriate RSEs.
rucio list-rses
EOS Duplication
Alternatively, though it may be far less convenient, the CERN lxplus system has a large EOS filesystem which is useful for storing data. EOS is a type of xrootd system, which means it supports the same types of operations that xrootd does. You must first set up a Kerberos ticket using 'kinit'. To upload from the Tier 3 to lxplus's EOS, use the following xrdcp command:
xrdcp {file} root://eosatlas.cern.ch//eos/atlas/user/{usernamefirstletter}/{username}/{dir}
The recursive flag should also work for this kind of copy, the same as when used on the Tier 3 locally. A remote xrdcp seems to be susceptible to hanging; if a copy does not complete, retry with the checksum flag to re-copy partially copied files.
Interestingly, you can also run SampleHandler directly off of the lxplus EOS, however this is exceptionally slow. Use FAX instead for grid data.
AMI
AMI is an online tool used to browse available datasets on the grid.
To retrieve all of the metadata for your MC samples (cross sections, k-factors, generator filter efficiencies, etc.), simply place each of your datasets on a line in a text file and run the following commands in the ATLAS environment:
acmSetup AthAnalysis, 21.2.9
lsetup pyAMI
getMetadata.py --inDsTxt=datasets.txt
Condor
Condor is used for distributed processing. We have 10 workers on our T3 (each of which has 16 cores). Through condor these processors can be used in tandem to complete jobs much faster than running them serially. The condor system allows us to run up to 140 jobs in parallel (14 on each worker node). It is very common when many jobs are being submitted at once for the condor service to output a message such as
-- Failed to fetch ads from: <192.168.1.1:9230?addrs=192.168.1.1-9230> : hep-int1.physics.iastate.edu
SECMAN:2007:Failed to end classad message.
These messages will stop once all jobs have been added to the queue. Until that time any condor commands (such as condor_q) will just output this message over and over.
Basic Commands
To check the condor queue (to see what jobs are currently running), use the command
condor_q
For more detailed information, use
condor_q -analyze <jobId>
To remove jobs from the queue, use the command
condor_rm <jobId>
To remove all jobs you have running you can instead run
condor_rm <username>
Sometimes there is a 'hiccup' in the system and a job will get put into held status. These jobs can be retried with the command
condor_release <username/jobID>
Retrying Failed Jobs
To determine which jobs have failed, check the submitDir/fetch/ directory for any files named fail-*. You can check the logfiles for those jobs in submitDir/submit/log-*.out (or .err). If you are ready to resubmit the jobs that have failed, first remove the submitDir/fetch/fail-* files and then run:
condor_submit <submitDir>/submit/submit
This will put all jobs back in the queue and then quickly remove those already marked as done or fail in the submitDir/fetch/ directory.
In xAOD/AnalysisBase
To use condor to run your jobs you can simply use the CondorDriver instead of the typical DirectDriver. Example Usage :
#include "EventLoop/CondorDriver.h"
...
EL::Driver* driver = 0;
driver = new EL::CondorDriver;
((EL::CondorDriver*)driver)->shellInit = "export PATH=${PATH}:/usr/sbin:/sbin";
...
driver->submit(job, submitDir);
In some circumstances it may be preferable to instead use the command
driver->submitOnly(job, submitDir);
The submit function will cause a failure if any subjob fails, while the submitOnly function will allow other jobs to finish normally in the event that some fail.
Alert script
If you want a simple script that will email you when your condor jobs finish, you can use the following:
/export/home/dpluth/scripts/Condor_alert.sh
Just copy it to wherever you like and edit the email address. You invoke it with the argument of the submit directory.
I prefer to fork it from my job submission script. This way it automatically runs when I run a condor job, and I receive an email with any jobs that may have failed.
For example in bash:
exec "$ROOTCOREBIN"/../Seesaw_xAOD/Run/Condor_alert.sh /export/home/dpluth/work/run2_skimmed/"$submitdir" &
Or in ROOT:
std::string rootdir(getenv("ROOTCOREBIN"));
std::string const command = rootdir + std::string("/../Seesaw_xAOD/Run/Condor_alert.sh /export/home/dpluth/work/run2_skimmed/submitDir &");
system( command.c_str() );
Moving To CERN
Moving out to CERN (even for a couple of months) requires a bit of paperwork beforehand. I've tried to jot down the main points so that your trip can go smoother than mine.
- Things to do Before you leave
- Begin by looking for housing (the dates for everything else will be more flexible). A good resource for this is the secretariat link
- Book flights through Linda (she'll have you contact T&T with the details - just keep her CC'd on everything for the finances). Be sure to also keep all receipts from your travel days (hotels, taxis, food, etc) as you may get reimbursed later.
- Have Jim fill out the pre-registration online link. He'll need a lot of information from you - it'll be best to send him a copy of your passport (in pdf form) and your current address. You'll also need to know your CERN user ID, which you can find by logging in here.
- Getting to CERN from the airport
- When you fly to Geneva you get a free pass to ride on public transit - however you need to be sure to pick up your ticket before you leave the baggage claim area.
- The Y (pink) bus goes directly from the airport to CERN - and the stop is marked outside the airport (you will see many city buses stopping there). No need to hand your ticket to the driver - simply board the bus and present the ticket if you're ever asked.
- If you accidentally fail to get your free ticket, you can purchase one at a kiosk (every bus/tram stop in Geneva has one). You can get an all-day pass for 10 CHF (works on all modes of transit). The kiosks accept credit cards.
- Upon arriving at CERN
- Arrive at CERN reception (Building 33) - this is right next to the bus/shuttle stop and across the street from the globe so you won't miss it. Talk to the receptionist and get a temporary pass (just a stamped piece of paper). They will also give you a map which will help you find your way around the campus.
- Go to building 510 and enter the User Registration Office (Note that as of May 2017 the office hours are 8am-12pm & 2pm-5pm MTRF, but are only open 2pm-5pm on Wednesdays). There they will make a copy of your passport and make sure that Jim filled out all the paperwork needed.
- Next get your access card at building 55.
- Once you have your access card go to the ATLAS secretariat (on the fourth floor of building 40) - there you will fill out an application for a key to the ISU office which takes a day to be approved.
- Finally you can go to ISU's office in building 304/R-006.
- While at CERN
- There is a cafeteria not far from the office. Prices for meals there vary but they are generally around 10 CHF. They take Swiss francs only - you cannot use a credit/debit card to pay. There are ATMs in front of the cafeteria so you can withdraw money if you need it.
- The number 18 tram stops right at CERN and is the most convenient way to get to downtown Geneva for sightseeing.
-- MichaelDavidWerner - 2016-09-02