The original scripts for FLUKA simulation on the GRID were developed by Vadim Talanov and his colleague.

I adapted them for background simulation and simplified some things.

A good starting point for understanding how to run FLUKA on the GRID is a TWiki article by Talanov, where he gives hints on how to obtain a GRID certificate and check that it is valid.
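
For reference, with a standard VOMS-based setup the proxy is created and checked roughly like this (the exact commands are an assumption and depend on the local middleware installation; Talanov's article remains the authoritative reference):

<verbatim>
# Create a proxy certificate for the lhcb VO ...
voms-proxy-init -voms lhcb
# ... and check that it exists and how much time it has left:
voms-proxy-info -all
</verbatim>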

The scripts I modified can be obtained from the git repository on AFS (/afs/cern.ch/lhcb/software/GIT/curie.git, in the "grid_next/" directory).
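
For example, assuming AFS is mounted on your machine:

<verbatim>
git clone /afs/cern.ch/lhcb/software/GIT/curie.git
cd curie/grid_next
</verbatim>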

So what are the main differences between Talanov's scripts and mine?

* Mine are modified to use the magnetic field map files.

* There is no need to split the FLUKA input file into two pieces.

Talanov used two files so that the "RANDOMIZe" card with a different random seed could be inserted between them for each job. I instead use the in-place edit capability of the sed utility to set the random seed and the magnetic field map location (see the sketch after this list).

* Talanov used a monotonic sequence of numbers as the random seeds. I use a file with prime numbers instead; only if the file is absent do the scripts revert to the sequential scheme.

* Additionally, I put the libg2c.so Fortran runtime library into fluka.tar.gz, because some nodes do not have Fortran installed (!); a repacking sketch is also shown below.
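
To illustrate the points about the seeds and the in-place editing, here is a minimal sketch of the per-job input preparation; the file names (fluka.inp, primes.txt) and the @SEED@ / @FIELDMAP@ placeholders are illustrative, not the actual conventions used in the scripts:

<verbatim>
#!/bin/sh
JOBID=$1    # sequential job number given by the submission script

# Take the JOBID-th prime from the seed file; revert to the
# sequential scheme if the file is absent.
if [ -f primes.txt ] ; then
    SEED=`sed -n "${JOBID}p" primes.txt`
else
    SEED=$JOBID
fi

# In-place edit of the single FLUKA input file: fill in the seed on
# the RANDOMIZe card and the local path to the field map.  Keep in
# mind that standard FLUKA input is fixed-format, so in reality the
# substituted values must respect the column alignment.
sed -i "s/@SEED@/$SEED/" fluka.inp
sed -i "s|@FIELDMAP@|$PWD/fieldmap.dat|" fluka.inp
</verbatim>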
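
Bundling the Fortran runtime is then just a matter of repacking the tarball; the library path below is only an example (take libg2c.so from any node that has it), and the job script has to put the extraction directory on the library path:

<verbatim>
# Repack fluka.tar.gz with the g2c runtime included:
mkdir payload && tar -xzf fluka.tar.gz -C payload
cp /usr/lib/libg2c.so.0 payload/
tar -czf fluka.tar.gz -C payload .

# In the job script, before running FLUKA:
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
</verbatim>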

What I found while submitting test jobs is that one should not make jobs very long: there is a high probability that a long job becomes stalled.

I launched 20 jobs of 3 hours each: 1 job stalled.

With 80 jobs of 48 hours each on average: 1 job failed due to a floating point exception and 3 stalled; the manually rescheduled jobs completed without problems (except the one with the FPE).

With 40 jobs of 80 hours each: in 2 days half of the jobs stalled, so I rescheduled them and submitted 10 additional jobs; in a week only 17 had succeeded, the rest having failed for various reasons. Most of them were "Stalled, pilot not running", but there was also a small portion with "Failed to upload, bad credentials".

Once the jobs have completed, the data has to be retrieved and all the output files merged. This section in Talanov's article is empty; at least the way to get the files back is described there. If the output data is dumped in the raw Fortran binary format, one can use some variation of the USBREA FLUKA utility. But if the output is in standard ASCII files, the task becomes much more complicated: the merging is complicated by the fact that the layout can change depending on the FLUKA version.

This is why I wrote a very generic Ruby script which treats the input files as a sequence of numbers in scientific notation interleaved with non-numbers. The script takes a list of files which all have the same layout and prints their average on the output. If the numbers at some position are equal in all files, they are left as they are, to avoid breaking the layout (i.e. boundaries, timestamps, dates, bin counts). The script can be found in the GIT repository as well (/afs/cern.ch/lhcb/software/GIT/curie.git, in "grid_next/merging_results/").
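
For example, assuming the averaging script is called average.rb (a placeholder; check the actual name in merging_results/), merging the same-layout ASCII outputs of all jobs looks like:

<verbatim>
ruby average.rb job_*/output.dat > merged_output.dat
</verbatim>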

-- VasilyKudryavtsev - 10-Sep-2012
