ArdaGrid Web > LatticeQCDTeraGrid2010 (revision 10)

Scalability with SAGA, Ganga, DIANE through WLCG, TeraGrid and FutureGrid (Cloud) resources

Milestone: talk and demonstration at EGEE User Forum, beginning of April 2010


Work plan to be completed by March 15th, 10am CST (= 5pm CET)

  1. access to TeraGrid and FutureGrid (Cloud) resources
    • KUBA: Ranger - still need to get access
    • KUBA: interactive access to Kraken to be processed via a notary
    • OLE, SHANTENU: Kraken - GRAM submission to be investigated and solved
  2. test of the OpenMP application on TeraGrid: how does it scale?
    • tests done on Abe and QB
    • OLE: test on Ranger
    • KUBA: testing connectivity (DIANE workers): need to verify the worker nodes on Abe, QB and Ranger
  3. GangaSAGA/GRAM submission
    • OLE: improve the GangaSAGA plugin (file handling + monitoring)
    • OLE: testing of GRAM access to all the machines in TG (submitting from lxplus using GangaSAGA backend)
  4. KUBA: replicate the mother snapshots


Define application level metrics and design and conduct the experiments.

Basic objectives:

  1. interoperability
  2. scale-out:
    • how many runs can we perform concurrently?
  3. use of DIANE for smart scheduling decisions
    • use the performance measurements as feedback to a) the task scheduler (optimal placement of tasks on a set of resources) and b) the agent factory (optimal influx of new resources)
    • we have one additional parameter to steer the system: the number of OMP threads (cores); this adds the extra complication of "squeezing" tasks into the CPU slots of nodes, which are typically of fixed size (8 cores)
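The "squeezing" problem above is essentially bin packing of variable-width tasks into fixed 8-core node slots. A minimal sketch in Python (the helper name and the greedy first-fit-decreasing strategy are illustrative assumptions, not part of DIANE):

```python
NODE_CORES = 8  # QB/Abe-style nodes expose fixed 8-core slots

def pack_tasks(thread_counts, node_cores=NODE_CORES):
    """Greedy first-fit decreasing: place each task (given by its OMP
    thread count) on the first node with enough free cores.
    Hypothetical helper, for illustration only."""
    nodes = []  # each entry: [free_cores, tasks_on_node]
    for t in sorted(thread_counts, reverse=True):  # widest tasks first
        if t > node_cores:
            raise ValueError("task wider than a node: %d threads" % t)
        for node in nodes:
            if node[0] >= t:
                node[0] -= t
                node[1].append(t)
                break
        else:  # no node had room: allocate a fresh one
            nodes.append([node_cores - t, [t]])
    return [tasks for free, tasks in nodes]
```

For example, tasks needing 4, 4, 2, 8 and 1 threads pack into three nodes: [8], [4, 4] and [2, 1].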


The LQCD 2010 application is a natural continuation of the 2008 simulations and is linked to them within the scope of the theoretical physics research.

It requires OpenMP and at CERN has been compiled with the Intel Fortran Compiler version 11.

Code: /afs/

Tarfile available for download (with a precompiled executable and Ganga helper scripts for local batch submission): wget

Compilation instructions on lxplus: README_lxplus

Timing tests on lxplus (conducted by the user, PhdF; shell time output: user time, system time, wall-clock time, %CPU):

46490.739u 914.793s 35:05.14 2251.8%    NCPU=32
25441.683u 368.346s 36:06.08 1191.5%    NCPU=16
20342.263u 147.546s 51:12.31 666.9%     NCPU=8
12567.237u 79.296s 1:00:01.40 351.1%    NCPU=4
7688.528u 26.305s 1:09:12.17 185.8%     NCPU=2
7840.101u 0.416s 2:10:41.77 99.9%       NCPU=1
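From the wall-clock times above one can derive the speedup and parallel efficiency relative to the 1-core run; a minimal sketch (the helper functions are illustrative, the times are copied from the table):

```python
def to_seconds(t):
    """Convert 'H:MM:SS.s' or 'MM:SS.s' wall-clock strings to seconds."""
    s = 0.0
    for part in t.split(":"):
        s = s * 60 + float(part)
    return s

# NCPU -> wall-clock time, from the timing table above
walltimes = {
    1: "2:10:41.77", 2: "1:09:12.17", 4: "1:00:01.40",
    8: "51:12.31", 16: "36:06.08", 32: "35:05.14",
}

t1 = to_seconds(walltimes[1])
for n in sorted(walltimes):
    t = to_seconds(walltimes[n])
    print("NCPU=%2d speedup=%5.2f efficiency=%3.0f%%"
          % (n, t1 / t, 100.0 * t1 / t / n))
```

The efficiency drops quickly beyond 2 cores (the 32-core run is still under 4x faster than 1 core), consistent with the plateau seen in the TeraGrid measurements.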

Timing tests on TeraGrid (wall-clock time vs. number of OMP threads):

OMP_NUM_THREADS   queenbee   abe
1                 90m15s     97m4s
2                 54m57s     61m38s
4                 49m33s     50m29s
8                 51m16s     50m11s
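With measurements like these, the extra steering parameter (number of OMP threads) can be picked automatically, e.g. by minimizing core-minutes per run; a hypothetical helper, not part of DIANE:

```python
import re

def minutes(t):
    """Parse wall times like '90m15s' into minutes."""
    m = re.match(r"(\d+)m(\d+)s", t)
    return int(m.group(1)) + int(m.group(2)) / 60.0

# OMP_NUM_THREADS -> wall time on queenbee, from the table above
qb = {1: "90m15s", 2: "54m57s", 4: "49m33s", 8: "51m16s"}

def best_threads(times):
    """Thread count with the lowest cost in core-minutes per run,
    i.e. the highest aggregate throughput on a fixed allocation."""
    return min(times, key=lambda n: n * minutes(times[n]))
```

On queenbee this picks 1 thread (cheapest in core-minutes), while 4 threads minimizes the wall time of a single run; which criterion DIANE should optimize depends on whether throughput or turnaround matters more.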

Full machinery: Ganga, SAGA, DIANE

The master is running on lxarda28 in /data/lqcd/apps/output

The worker agents may be submitted from lxplus to TG like this (change the tguser name!):

cd /afs/

diane-env -d ganga ./ --tguser jmoscick --diane-master=./MasterOID 

To submit from another host, the steps are the same as on lxplus; you additionally need to get the and MasterOID files from AFS to your local system.

Using GangaSAGA

The latest 1.4.1 release of SAGA is deployed in:


and will be sourced automatically by the script below.

> cd /afs/
> ./myproxy-logon #you may use a different user name here
> source
> ganga

More info on how to submit jobs here:

List of GRAM services in TeraGrid:

TeraGrid tests

Web resources:

Interactive access from lxplus (SLC4)

> ssh lxplus4
> cd /afs/
> ./myproxy-logon #you may use a different user name here
Enter MyProxy pass phrase:
A credential has been received for user jmoscick in /tmp/x509up_u979.


> gsissh

[jmoscick@qb1 ~]$ qsub -l nodes=1:ppn=8

Note: QueenBee supports allocation of entire nodes only (ppn=8 always)


System information:

I installed Ganga 5.5.0; PBS hello-world jobs run fine.

Compiled the app in ~/su3. It looks like there is no stack size limit on this system. Here is my application wrapper:

[jmoscick@honest3 ~/su3]$ cat
#!/usr/bin/env bash

#increase stack size if needed
#ulimit -s 50000
ulimit -a
./hmc_su3_omp < "$2"   # $2: input file

Here is a small utility to submit test jobs:

In [100]:execfile("")

In [101]:su3show(jobs(10))
Out[101]: job 10: elapsed time: 1:09:30.595613 for 2 OMP threads

In [102]:su3show(jobs(11))
Out[102]: job 11: elapsed time: 1:09:30.600562 for 4 OMP threads

In [103]:su3show(jobs(12))
Out[103]: job 12: elapsed time: 1:09:30.606223 for 8 OMP threads

-- JakubMoscicki - 08-Feb-2010

Topic revision: r10 - 2010-03-24 - JakubMoscicki