Running TDAQ on the Grid

  • Introduction
  • Previous Steps To Reproduce Our Work
      • User
      • Site Administrator
  • TDAQ Grid Infrastructure
      • Overview of the problem
      • Architecture
      • The GridFarm package
  • Running Medium Scale Tests in Manchester

Introduction

Running the full TDAQ, or parts of it, on the Grid is trickier than it looks. Although there have been other approaches to date (well, I only know about one other), the approach adopted here is to run the online infrastructure as Grid jobs in a non-firewalled environment, and then use the TDAQ online tools to connect to this basic infrastructure and perform the usual tasks: start, configure, stop, monitor, and so on.

This is probably the most general solution, and it should satisfy most needs. However, the way we have run it still lacks a fully Grid-friendly approach: there are still some hacks, such as a network-shared directory to save the logs.

The next steps towards running the TDAQ on the Grid will probably involve fully integrating the scripts presented in this document with the online infrastructure (automatic large scale testing), being able to pack the full TDAQ SW as a Grid job (automatic nightly testing), and providing a way to bypass the firewall under some circumstances (remote farms).

Previous Steps to Reproduce Our Work

These are the mandatory steps that every user must follow:

1.1) Get a Grid CERTIFICATE

- they say this is like a passport

- http://??

1.2) Become a member of the atlas Virtual Organization (VO)

- they say this is like a visa in your passport

- http://??

1.3) Get gridfarm SW.

- this is the infrastructure that lets you cleanly submit TDAQ jobs to the Grid, connect to them, collect the logs, and shut down and clean up the infrastructure.

- it is a set of Python and bash scripts based on XML-RPC (see the sketch after this list for one way to pick them up from AFS)

- http://

or at

- /afs/cern.ch/atlas/project/tdaq/cmt/nightly/installed/share/bin/gr*

- /afs/cern.ch/atlas/project/tdaq/cmt/nightly/installed/share/lib/python/FarmTools

- /afs/cern.ch/atlas/project/tdaq/cmt/nightly/installed/share/data/FarmTools

1.4) Get the runner SW, the PartitionMaker, the DBGeneration SW, or all of them.
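For reference, here is a minimal sketch of one way to pick the gridfarm scripts up from the AFS locations listed above; it assumes you have AFS mounted and that copying them locally is acceptable (the directory names below are just examples):

# hypothetical local copy of the gridfarm tools from AFS
TDAQ_SHARE=/afs/cern.ch/atlas/project/tdaq/cmt/nightly/installed/share
mkdir -p ~/gridfarm/bin
cp $TDAQ_SHARE/bin/gr* ~/gridfarm/bin/
cp -r $TDAQ_SHARE/lib/python/FarmTools ~/gridfarm/
cp -r $TDAQ_SHARE/data/FarmTools ~/gridfarm/FarmTools-data
# make the gr* commands and the Python modules visible
export PATH=~/gridfarm/bin:$PATH
export PYTHONPATH=~/gridfarm:$PYTHONPATH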

These are the mandatory steps for a site administrator (though you can probably profit from the installation A. Forti did on a Manchester cluster):

0.1) Install TDAQ SW.

- http://

- libshift patch under repository root

- setup file

0.2) Configure a queue (to be completed by Alessandra)

note) There is also the possibility of packing all the code into Grid format (not JDL, because it exceeds the size limits), but this has not yet been tested. In fact, it has not yet been implemented.

TDAQ Grid Infrastructure

TBD by Hegoi

Running Medium Scale Tests in Manchester: MST@Manchester (part I)

1) Get an account on bohr3712.tier2.hep.manchester.ac.uk or get access to an existing one. For this, contact Alessandra <Alessandra.Forti@manchester.ac.uk>.

2) Either get a Grid certificate and join the atlas VO, or contact Hegoi <Hegoi.Garitaonandia@cern.ch> to start the pmg_agent_wrappers on the Manchester nodes.

3) Log in to bohr3712.tier2.hep.manchester.ac.uk (note that from CERN you first need to log in to lxplus064.cern.ch, which is the only machine on the Manchester firewall bypass list)

4) Copy the latest stable version of tdaq-grid-tools to your home directory and set it up:

cp -r /home/hegbi/tdaq-grid-tools-stable ~

cd ~/tdaq-grid-tools-stable

source setup.sh

5.1) If you got your Grid certificate and joined the atlas VO:

a) Set up your .globus/ directory

- http://

b) initialize your Grid proxy with an extended duration

grid-proxy-init -valid '72:0'
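If you want to confirm that the proxy was created and see how much lifetime is left, the standard Globus client command will tell you (this check is generic and not part of the gridfarm tools):

grid-proxy-info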

c) now reserve the nodes:

grserve -w 80 -f nodes1 -i ids1 -t 5

This means you want 80 nodes, you want their names stored in a file named nodes1, and you want the Grid IDs of the reservation jobs you submitted stored in a file named ids1. The file nodes1 will later be used to connect to the spawned servers and perform the different tasks. The only use of the file ids1 is for cleanup.

The '-t' option sets the timeout. Note that the units are minutes.

If you really want to understand what is going on here, read the section TDAQ Grid Infrastructure.
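Once grserve returns, a quick sanity check of its output files costs nothing (plain shell; the exact file contents depend on the gridfarm version):

wc -l nodes1      # how many worker nodes were actually reserved
cat nodes1        # the node names passed to the other gr* commands later
cat ids1          # Grid job IDs of the reservation jobs (only needed for cleanup)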

5.2) If you didn't, Hegoi will have done the previous step for you.

You just have to ask him for the file named 'nodes1' in order to connect to the proper pmg_agent_wrappers.

Running Medium Scale Tests in Manchester: MST@Manchester (part II)

6) OK, now you have to start the TDAQ infrastructure:

For TDAQ-01-04-01

a) verify that you have X enabled

e.g.
type xterm

you only need this if you are going to run the GUI (step g)
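If xterm does not open a window on your screen, you probably did not forward X when you logged in. One possible way to get it, following the login chain from step 3 (a hypothetical example; ssh -X works the same way from any desktop with an X server):

ssh -X lxplus064.cern.ch
ssh -X bohr3712.tier2.hep.manchester.ac.uk
echo $DISPLAY     # should now be set, e.g. to something like localhost:10.0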

b) set up the environment

source /nfs/gridpp/setup_env.sh

c) generate a partition

TBD

ln -s /bin/false ssh

export PATH=$PWD:$PATH
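# note (assumption): the two lines above put a fake ssh (a link to /bin/false)
# first in the PATH -- presumably so that nothing here tries to start remote
# processes over ssh and everything goes through the pmg_agents instead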

d) copy the partition to a place visible to all the machines

cp part_whatever.data.xml /nfs/gridpp

e) start the root controller

startlocalinfrastructure.sh

(this will start the general ipc_server and the pmg_agent in the UI)
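To convince yourself that they really came up, a simple local check of the process list on the UI is enough (plain shell, nothing specific to the gridfarm tools):

ps -ef | grep -e ipc_server -e pmg_agent | grep -v grep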

f) now you can start the rest of the pmg_agents, the ones in the grid

grstartpmgagents -f nodes1

optionally, check that the agents started correctly

g) start the gui

start_partition-gui.sh /wherever/part_whatever.data.xml

if the pmg_agent fails, try stopping and restarting the pmg_agents:

grkillall -f nodes1

grstartpmgs -f nodes1

h) see what is going on (check the logs)

copy_logs.sh nodes1

then you should look in /nfs/gridpp/logs/$TDAQ_PARTITION/$USER
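For example, a couple of ordinary shell commands on that shared directory already tell you a lot (the error grep is just an illustration):

ls -lR /nfs/gridpp/logs/$TDAQ_PARTITION/$USER
grep -ril error /nfs/gridpp/logs/$TDAQ_PARTITION/$USER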

i)
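Putting part II together, here is a condensed sketch of the whole sequence, collecting the same commands as above in one place (the partition name and its location under /nfs/gridpp are examples only):

source /nfs/gridpp/setup_env.sh                             # b) environment
cp part_whatever.data.xml /nfs/gridpp                       # d) partition on the shared area
startlocalinfrastructure.sh                                 # e) ipc_server + pmg_agent on the UI
grstartpmgagents -f nodes1                                  # f) pmg_agents on the grid nodes
start_partition-gui.sh /nfs/gridpp/part_whatever.data.xml   # g) the GUI
copy_logs.sh nodes1                                         # h) collect the logs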
