-- ClintRichardson - 21 Mar 2014

Boston University Tier 3 Information

Log-in Info

To get a login to the machines, email Saul Youssef asking for an account. Once you have an account you can log in to the following nodes:

  • cm0.bu.edu
  • ne1.bu.edu
  • ne2.bu.edu
  • ne3.bu.edu
  • ne4.bu.edu

Note that currently only cm0.bu.edu, ne1.bu.edu, and ne4.bu.edu are ours alone. You might notice some ATLAS grid jobs running on the other machines, but your jobs will take priority over theirs. This should change in the next few weeks, after which we should have sole access to all the machines above.

CMSSW

Currently, the versions of CMSSW which are installed locally are:

  • CMSSW_5_3_11
  • CMSSW_6_2_0
  • CMSSW_7_1_6
Users can gain access to these using the standard cmsrel command once the following lines have been executed (for convenience one might want to add them to a personal shell startup file, e.g. ~/.bashrc, since these are bash-syntax commands):

export VO_CMS_SW_DIR=/atlasgrid/cms/CMSSW_Releases/

export SCRAM_ARCH=slc6_amd64_gcc472 # works for CMSSW_5_3_11 and CMSSW_6_2_0; for CMSSW_7_1_6 use export SCRAM_ARCH=slc6_amd64_gcc481

source $VO_CMS_SW_DIR/cmsset_default.sh
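
Once these have been sourced, a release work area can be created in the usual way. A minimal sketch (the release and architecture below are just examples; pick the ones you need):

export SCRAM_ARCH=slc6_amd64_gcc472
cmsrel CMSSW_5_3_11        # shorthand for "scram project CMSSW CMSSW_5_3_11"
cd CMSSW_5_3_11/src
cmsenv                     # shorthand for eval `scram runtime -sh`
scram b                    # build anything you check out into src/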

For other versions, use CVMFS (below).

Service Software

CVMFS

CVMFS is installed on these machines, so you can use it to set up any CMSSW release, get access to CRAB, etc. Simply source it like so:

  • source /cvmfs/cms.cern.ch/cmsset_default.sh
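
As a quick check after sourcing that, you can list the releases CVMFS makes available for a given architecture; a minimal sketch, assuming a bash shell:

source /cvmfs/cms.cern.ch/cmsset_default.sh
export SCRAM_ARCH=slc6_amd64_gcc481
scram list CMSSW           # show the CMSSW releases available for this SCRAM_ARCH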

xrootd/AAA

xrootd is there, but grid permissions aren't set up yet. Specifically, I haven't put my certificate on these machines yet, and so have no way of knowing whether it will work until then.
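
Once a grid certificate and proxy are in place, a remote read via AAA could be tested roughly as sketched below, assuming the grid client tools (voms-proxy-init, xrdcp) are available on the node; the redirector is the standard US one, and the /store path is only a placeholder to be replaced with a real file:

source /cvmfs/cms.cern.ch/cmsset_default.sh
voms-proxy-init -voms cms        # create a CMS VO proxy from your certificate
voms-proxy-info --all            # check that the proxy is valid
xrdcp root://cmsxrootd.fnal.gov//store/path/to/file.root /tmp/aaa_test.root   # placeholder path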

CRAB

CRAB works and is installed. I think technically it is installed on the tier-2 through their installation of CVMFS. It can be set up using:

source /cvmfs/cms.cern.ch/crab/crab.(c)sh

I haven't set up a local place to store the output of grid jobs, so while one can submit grid jobs from here there is nowhere to store the output. You could just store it at FNAL for now, but it is probably better to wait until I set this up and test it.
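
For reference, the full setup on a bash shell amounts to something like the sketch below (use crab.csh instead under tcsh); as noted above, output handling is still untested:

source /cvmfs/cms.cern.ch/cmsset_default.sh
source /cvmfs/cms.cern.ch/crab/crab.sh
which crab                 # confirm the crab client is now on the PATH
crab -h                    # print the client usage to confirm it runs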

Sun Grid Engine (SGE)

SGE is our local batch system. It submits jobs to the ATLAS tier-2 worker nodes. The command to submit one's job is:

qsub -q tier3 <yourjob.sh>

NOTE: I haven't tried this yet, so NO PROMISES that it will work! I will update after a successful attempt.
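
For what it's worth, a minimal job script might look like the sketch below; the CMSSW path and config name are placeholders:

#!/bin/bash
#$ -cwd                              # run in the directory the job was submitted from
#$ -o myjob.out                      # stdout file
#$ -e myjob.err                      # stderr file
source /cvmfs/cms.cern.ch/cmsset_default.sh
cd /path/to/CMSSW_5_3_11/src         # placeholder: your release area
eval `scram runtime -sh`             # i.e. cmsenv
cmsRun my_analysis_cfg.py            # placeholder: your cmsRun configuration

which would then be submitted with qsub -q tier3 myjob.sh.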

Kerberos

Saul has set up Kerberos authentication with Fermilab, so any user should be able to get a Kerberos ticket and easily transfer data/files/scripts between Fermilab and the tier 3.
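
In practice that should look something like the sketch below (username and file name are placeholders; it assumes you have a FNAL Kerberos principal and that ssh is configured for GSSAPI/Kerberos authentication):

kinit username@FNAL.GOV                               # obtain a Fermilab Kerberos ticket
klist                                                 # verify the ticket
scp myfile.root username@cmslpc-sl5.fnal.gov:~/       # Kerberized copy to the LPC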

git

git appears to be installed. I haven't attempted to link it to my GitHub account yet. I also haven't tried using the CMS-specific git commands. I'm not sure how to install them, actually, but this is probably a little lower priority.
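
A basic check, plus the usual way the CMS-specific git commands become available (my understanding is that they ship with CMSSW, so running cmsenv in a release area should be enough); the name and login below are placeholders:

git --version
git config --global user.name "Your Name"
git config --global user.github your-github-login    # used by the git cms-* helpers
cd CMSSW_5_3_11/src && cmsenv
git cms-addpkg FWCore/Version                         # example CMS-specific command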

Hardware Setup

Component Description

There are five (5) interactive nodes. They are identical except for the cm0 node, which is a 'debugging' node (see below for details). We use the ATLAS tier-2 machines as worker nodes, and hence the batch system can be considered somewhat independent of our tier-3. However, you will be submitting jobs to it from the interactive nodes, and it is best to be aware of exactly what you are doing. Hopefully I'll be able to set up a relatively simple and transparent way of submitting jobs that eases the life of our users.

Worker Nodes

Sometimes these are referred to as batch nodes. These are (usually) machines that one cannot use interactively. Their use is controlled by a Local Resource Manager (LRM), for example Condor or SGE. Our worker nodes are actually the ATLAS tier-2 nodes and our LRM is SGE (see below). Note that our interactive nodes are only interactive: batch jobs do not get submitted to them.

Compute Element

A compute element (CE) is the grid front end for the batch system. I think it allows users of the grid to submit jobs to the local batch system. I'm not sure about setting this up for us, as we have a small tier 3 and it would easily be dominated by a few outside submissions from even just CMS members. Maybe set this up and put heavy permission/priority protections on it?

Interactive Nodes

These are the nodes that one logs into, just like logging in to cmslpc-sl5.fnal.gov. Each user has a network-mounted home directory (NFS? GPFS? I'm really not sure where /home is mounted, though it IS a network drive in the sense that it is visible from every node; there might also be a different NFS on which CMSSW is installed - questions for Saul) and also has the possibility of using multi-TB local storage in the /data directory. The ne1-ne4 nodes have 12-core Intel Westmere CPUs with 24 GB of RAM.

An AFS connection is also available and it looks like one can access their CERN AFS directory, though I get a permission-denied error. It's probably because I don't really know how AFS works - I'll try to figure this out and provide a solution.
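
My best guess at the fix is that AFS needs CERN Kerberos tokens before it will grant access; assuming the OpenAFS client tools (aklog, tokens) are installed, something like this should work (username is a placeholder):

kinit username@CERN.CH            # get a CERN Kerberos ticket
aklog cern.ch                     # convert it into an AFS token for the cern.ch cell
tokens                            # verify the AFS token
ls /afs/cern.ch/user/u/username   # u = first letter of the username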

The cm0 node has faster processors (by 10%) and 48 GB of RAM. It is wonderful for memory-intensive jobs! It also has two local 50 GB SSD drives mounted under /ssd1 and /ssd2. I'm not sure what jobs would make use of these. However, if one finds that their jobs are bottlenecked by data retrieval, one might try running on them.

Storage Element

The storage element (SE) is similar to the CE in that its primary job is interfacing with the grid. While the CE focuses on managing the jobs submitted from the grid to the local batch system, the SE provides a front end for grid data transfer. I think this works both ways; that is, it provides the means to transfer data both to and from the grid, either for job submission or retrieval. Probably in most cases it will work for us to retrieve the output of jobs (e.g. retrieving the output of patTuple creation jobs). I think that we have a GPFS system for this purpose, though I'm not sure that it actually interfaces with the grid.

Operating System Info

Most CMSSW_5_3_X releases will only compile on sl(c)5, though some later ones will also compile on sl(c)6. Newer versions of CMSSW (e.g. everything from the 6_X series onward) will only compile on sl(c)6. Further, CMSSW jobs compiled on sl(c)5 will run on sl(c)6. Hence, it should be fine to move the worker nodes to sl(c)6, as long as they are only available via the batch system and are not also interactive nodes.
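
To check which OS release a given node is running and which architecture CMSSW is currently targeting:

cat /etc/redhat-release          # Scientific Linux release of the node
uname -m                         # machine architecture (x86_64 expected)
echo $SCRAM_ARCH                 # the CMSSW architecture currently selected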

GPFS

There is a GPFS area available that can store ~2.5 PB (!) of data. I think it will be the best place to store the output of grid/batch jobs. In order to access it, make a directory under /gpfs3/bu that has the same name as your username. I will update this with instructions on how to store/retrieve data using GPFS as I figure it out (it's easy enough to do interactively, but I mean in terms of directing the output of grid jobs there / using it as input for batch submission).
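
For the interactive case, GPFS behaves like an ordinary POSIX filesystem from the login nodes, so something like this is all that is needed (the file name is a placeholder):

mkdir -p /gpfs3/bu/$USER           # one-time setup of your directory
cp myoutput.root /gpfs3/bu/$USER/  # copy a file in
df -h /gpfs3/bu                    # check the available space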

Tips From Andrea Bocci

In fact, I don't see any (L1 post-dead-time or HLT) rates in the database after run 233238, which dates back to MWGR10:

ssh bufu-c2e38-36-04.cms
USER=`cat /nfshome0/popcondev/conddb/readOnlyOMDS.xml | grep -A2 "\<CMS_RUNINFO\>" | grep "user" | cut -d'"' -f4`
PASS=`cat /nfshome0/popcondev/conddb/readOnlyOMDS.xml | grep -A2 "\<CMS_RUNINFO\>" | grep "password" | cut -d'"' -f4`
echo 'select * from (select RUNNUMBER from CMS_RUNINFO.HLT_SUPERVISOR_L1_SCALARS group by RUNNUMBER order by RUNNUMBER desc) where rownum <= 10 ;' | sqlplus -s "${USER}/${PASS}@CMS_OMDS_LB"



The .jsndata files are being written:

ssh bu-c2e18-13-01.cms

ls -l /store/lustre/scratch/run233886/run233886_*_streamL1Rates_*.jsndata

but they are not being injected into the database.

So, I guess there is an issue either with the transfer system not calling the injection script, or the script not working properly.

Ciao,
.Andrea


P.S.

Please write down these ways to check both the status of the rates in the DB and of the .jsndata on Lustre - and ask DB and DAQ if you can consider these an "official" way of making such checks.

P.S. Simpler and faster queries to check the latest run number for which rates have been injected into the database:


select max(RUNNUMBER) from CMS_RUNINFO.HLT_SUPERVISOR_L1_SCALARS;
select max(RUNNUMBER) from CMS_RUNINFO.HLT_SUPERVISOR_TRIGGERPATHS;
