Production 2007

workdir: /afs/cern.ch/sw/arda/install/su3

Contest

Goal: whoever produces the largest number of iterations (SU3 steps) wins

Prize: to be specified

*Click here for current score - live!*

Please follow these rules:

  1. Use seeds from your own range.
  2. Do the seeds in sequence: s, s+1, s+2, ...
  3. Do not repeat the same seed.
  4. Remember that jobs sit on the worker node forever (until they get killed), so do not submit 1000 seeds at the same time!

person     seed range
Patricia   1001-2000
Andrew     2001-3000
Massimo    3001-4000
Kuba       4001-5000

Run instructions

Given a seed, this submits one master job split into 21 ("beta") subjobs.

$ cd /afs/cern.ch/sw/arda/install/su3
$ /afs/cern.ch/sw/ganga/install/4.3.6/bin/ganga
In [1]: execfile('su3helper.py')
In [2]: j = make_su3_job(seed)
In [3]: j.backend=LSF() # or LCG()
In [4]: j.submit()

make_su3_job takes an additional (optional) argument which specifies the desired processing interval (calibration). By default it is set to 1 hour (3600 s) and probably should not be changed. If you want to experiment, however, type help(make_su3_job).
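
For example, a 30-minute interval could be requested like this (a sketch only: check help(make_su3_job) for the actual name and position of the optional argument; here it is assumed to be passable as the second positional argument, in seconds):

In [5]: j = make_su3_job(seed, 1800) # assumed calling convention -- verify with help(make_su3_job)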

Meeting with Philippe 02.07.2007

The application takes a snapshot of the space (fort.0, a 1.2 MB file) and transforms it into another snapshot. The parameters of the transformation are:

  • beta - corresponds to a temperature
  • seed - see the "tape manu"/"tape auto" parameter below
  • ntraj - number of iterations to be performed

If the "tape auto" mode is set then the seed is read/stored automatically in the snapshot files so the runs may be repeated one after another and they form a single chain of transformations. In that case 1 run of the executable with ntraj=N is equivalent to N runs of the executable with ntraj=1 (so the granularity of a run may be adjusted accordingly).

If the "tape manu" mode is set then the seed is read from the input file (and not from the input snapshot file).

Starting from a single mother snapshot file, we can perform the transformation for each beta parameter independently. There are around 20 predefined beta parameters, so we may have 20 independent parallel computing "streams". Note: in the analysis step (done later by the researcher) the results of the first few iterations will be discarded (a stabilization process), but this is not important for the execution of the application.

When a job starts on the worker node, we will use the first few iterations as a benchmark and adjust the ntraj parameter accordingly (to balance the rate at which output files are generated). Each time an executable run finishes, we will send the output files back (together with the snapshot file). If the job is killed by the batch system, we only lose the last ntraj iterations.
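
A sketch of that worker-node logic (illustrative only: run_su3() and send_back() are hypothetical stand-ins for what the job wrapper does, the output file name is also illustrative, and INTERVAL corresponds to the calibration argument of make_su3_job):

import time

INTERVAL = 3600.0  # target seconds between uploads (the default calibration)

def run_su3(ntraj):
    """Stub: one invocation of the executable, advancing ntraj iterations."""
    pass

def send_back(files):
    """Stub: ship the output files (and the snapshot) to the output server."""
    pass

# benchmark: time a few iterations, then choose ntraj to fill the interval
t0 = time.time()
run_su3(ntraj=5)
per_iter = max((time.time() - t0) / 5.0, 1e-6)  # guard against a zero timing
ntraj = max(1, int(INTERVAL / per_iter))

while True:                              # runs until the batch system kills us;
    run_su3(ntraj)                       # if that happens mid-run, only the
    send_back(['output.dat', 'fort.0'])  # last ntraj iterations are lost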

We have to check and deal with multiple processor architectures (AMD, Intel, 32/64-bit). This is an empirical exercise.

Additionally, we may use more machines in parallel, allowing parallel execution for the same beta parameter with different initial seeds. In this case, in the first run of the executable we use the "tape manu" parameter to reset the seed, and then we use the "tape auto" parameter to produce the chain of iterations, as sketched below.
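
As a sketch, reusing the hypothetical run_su3() wrapper from the "tape auto"/"tape manu" illustration above (the seed value is arbitrary):

run_su3(mode='tape manu', seed=4321)  # first run: inject a fresh seed
for i in range(100):
    run_su3(mode='tape auto')         # later runs: continue the chain from the snapshot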

Utilities

The server is up and running on lcgui003. Output goes to: /afs/cern.ch/sw/arda/install/su3/outputserver

File cleanup and backup EXAMPLE:

cd /afs/cern.ch/sw/arda/install/su3/outputserver
# list files whose status changed more than 120 minutes ago into a manifest
echo `find su3 -cmin +120 -type f` > backup.ALL.20-07_2007_15:18.list
# archive the listed files, then delete them
../make_tar.py /tmp/backup.ALL.20-07_2007_15:18.tgz backup.ALL.20-07_2007_15:18.list
../clean_files.py backup.ALL.20-07_2007_15:18.list

Commands to run the server:

# make sure that the AFS token is valid and renewed every 24 hours -- depends on the tty?
limit stacksize 1024
cd /afs/cern.ch/sw/arda/install/su3
nohup /afs/cern.ch/sw/lcg/external/Python/2.5/slc3_ia32_gcc323/bin/python f90_server_mt2.py &

Logs are in: /afs/cern.ch/sw/arda/install/su3/outputserver/su3/outputserver/server.log

Simple XML-RPC file server

Login to lcgui003.cern.ch

  1. cd /afs/cern.ch/sw/arda/install/su3
  2. Run the server in window 1: ./f90_server.py (a minimal sketch of such a server is shown below)
  3. Run the example client in window 2: ./example_script.py
  4. Files are transferred to the outputserver directory (this location is hardcoded).
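
For reference, the core of such a file server can be very small. Here is a minimal sketch in Python 2.5 style (matching the interpreter used above); the method name put_file, its signature, and port 8888 are assumptions for illustration, not the actual f90_server.py interface:

import os
from SimpleXMLRPCServer import SimpleXMLRPCServer

OUTPUT_DIR = 'outputserver'  # hardcoded destination, as noted above

def put_file(name, data):
    # 'data' arrives wrapped in xmlrpclib.Binary; write its payload to disk
    if not os.path.isdir(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)
    path = os.path.join(OUTPUT_DIR, os.path.basename(name))
    f = open(path, 'wb')
    f.write(data.data)
    f.close()
    return True

server = SimpleXMLRPCServer(('', 8888))  # port 8888 is an assumption
server.register_function(put_file)
server.serve_forever()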

How to change the server host and port?

On the client:

setenv F90_SERVER_URL http://myhost.com:9999

On myhost.com run the server like this:

./f90_server.py 9999
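
On the client side, the variable can be picked up like this (again a sketch: the actual example_script.py may differ, and put_file is the assumed method name from the server sketch above):

import os
import xmlrpclib

url = os.environ.get('F90_SERVER_URL', 'http://localhost:8888')  # fallback URL is an assumption
server = xmlrpclib.ServerProxy(url)
data = xmlrpclib.Binary(open('fort.0', 'rb').read())  # fort.0: the snapshot file
server.put_file('fort.0', data)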

Reminder: the firewall will block all traffic to lxplus nodes, even for LSF jobs.

-- MassimoLamanna - 11 May 2008
