UsersTaskForceStDenis

Introduction

This topic links to other topics covering work related to the Users Task Force.

Talks

Plan

Overall Goals:

  • hello world
  • athena hello world
  • following the workbook, but with ganga as well as the Grid instructions embedded in the workbook for MC generation

Test analysis:

Use the target analysis of VBF H with MH=160 for, say, 1000 events and see how to get the output stored and cataloged. (We need to see what the resource impact is.)

Also pursue this on the Glasgow computers in parallel to just get events made and in a place where data handling issues are not so severe.

Data handling is a big problem. We clearly have to submit jobs that generate small numbers of events and hence make small files. These then need concatenation, which is hard to do. We cannot store the intermediate files on tape, and it seems that this job handling is not well integrated with the data handling. For analysis it needs to be. We also need a tutorial on the data handling issues and the state of the art there.

This is the beginning of a process, and we should try to at least get some simulated files to a disk. This may mean having a large disk repository that our output goes to and that we don't catalog. Later this should be a remote repository where the output is cataloged and the full chain can be executed.
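The bookkeeping behind "many small jobs, then concatenation" can be sketched in plain Python. This is only an illustration under stated assumptions: the 50 events per job, the 5 files per merge, and the output file names are hypothetical, and nothing here talks to the Grid, a catalog, or Ganga.

```python
def plan_jobs(total_events, events_per_job):
    """Split a request into (first_event, n_events) chunks, one per job."""
    jobs = []
    first = 0
    while first < total_events:
        n = min(events_per_job, total_events - first)
        jobs.append((first, n))
        first += n
    return jobs


def plan_merges(output_files, files_per_merge):
    """Group small output files into batches that would be concatenated."""
    return [output_files[i:i + files_per_merge]
            for i in range(0, len(output_files), files_per_merge)]


# Hypothetical numbers: 1000 events, 50 per job, 5 small files per merged file.
jobs = plan_jobs(total_events=1000, events_per_job=50)
outputs = ["vbf_h160.%04d.pool.root" % i for i in range(len(jobs))]  # made-up names
merges = plan_merges(outputs, files_per_merge=5)
print(len(jobs), "jobs ->", len(merges), "merged files")
```

The point of keeping the plan as data is that the same list of batches could later drive either a local concatenation or a cataloged one, once the data handling side is understood.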

Execution of the Plan

Problems and Issues

This outlines the problems and issues that are pending and those that have been solved.

Pending

Ganga Issues:

  1. Ganga lost track of the jobs I had sitting around and had not yet cleaned up. Indeed, the jobs were gone when the initialization was repeated, but a gangadir directory was repeated. How can the state be recovered?
  2. "Cannot plan: BrokerHelper: no compatible resources" error after submission of Athena AOD analysis to LCG. The error is reported here. The error was reproduced again on Nov 15, 2005 and the sequence of studies of this are reported here.
  3. Submission of Athena Hello World has too many output lines and there is a compile error in stderr. Details here. A second test shows a different error in stderr, and still has too many lines in the stdout.
  4. Unable to kill a job. This is documented in this place.

Grid Issues:

  • Unable to get the digitisation phase to work on scotgrid.
  • Full generation with 10.0.1 and VBF physics crashes in the 5 event test.

Fixed but needs New Release for Me to test

  1. I get an error message from j.application.exe='/bin/echo'. As of Nov 7, 2005 Dietrich has made a fix and this needs to be tested when it is available.
  2. As documented in the instructions on how to run Hello World, I get different responses depending on how I pass args to a job:
    • Using the following for a Grid Hello World job gives only Hello as the response:
      • j.application.env={'MESSAGE':'Hello World'}
      • j.application.args=['$MESSAGE']
    • For this case, see screen dumps of:
    • Using the following gives the full "Hello World" back as a response.
      • j4.application.args=['Hello World']
    • For this case, see screen dumps of the job object and the output.
    • As of Nov 7, 2005 Dietrich has made a fix and this needs to be tested on a new release.
  3. Cannot find dataset when preparing to run Athena AOD analysis on LSF. The error is reported here. As of Nov 7, 2005 Dietrich has made a fix, but I need to get the release and test it.
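As a baseline for item 2 above, the sketch below shows in plain Python how an argument that is expanded from an environment variable and then re-split on whitespace differs from a literal argument. It does not use Ganga, and whether this is what happened inside Ganga before the fix is not established in these notes; the helper names are hypothetical.

```python
import shlex
from string import Template


def expand(args, env):
    """Substitute $NAME references in each argument from the env mapping."""
    return [Template(a).safe_substitute(env) for a in args]


def reparse_unquoted(args):
    """Mimic a wrapper that joins the arguments and splits them again on spaces."""
    return shlex.split(" ".join(args))


env = {"MESSAGE": "Hello World"}

via_env = expand(["$MESSAGE"], env)    # one argument containing a space
resplit = reparse_unquoted(via_env)    # two separate arguments after re-splitting
literal = ["Hello World"]              # one argument, passed through untouched

print(via_env)
print(resplit)
print(literal)
```

A program that only looks at its first argument would see just Hello in the re-split case, which is one way the two forms could give different answers.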

Solved

  1. Ganga 4.0.0-beta5 requires a .gangarc file -- I tried to soft link it to .ganga4 and, while I can get things going, there is a complaint at startup. See the screen shot of startup.
    • Solution: Do the ganga initialization again.
  2. The path changes -- I assume I have it right now.
    • Solution: Yes, the new path works. This is in my Ganga Guide.
  3. Cannot select the grid site as on page 7 of the Ganga Overview talk in the Ganga Tutorial. See this screen dump of the attempt. As of 15-Nov 2005, there are two choices of syntax. One of these fails on submission and the other works. We have decided to provide only one of these methods in the documentation -- it is in fact the more straightforward method.

Wish List

  • Time stamp when state notification like
Ganga.Lib.LCG                      : INFO     job 126 has changed status to Scheduled
occurs.
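What the wish amounts to can be shown with the standard Python logging module. This is a sketch only: the logger name is borrowed from the message above purely as a label, the format string is an assumption, and no claim is made about how Ganga configures its own logging.

```python
import io
import logging


def make_logger(name, stream):
    """Build a logger whose lines start with a date and time stamp."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)-34s : %(levelname)-8s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the example's output in one place
    return logger


buffer = io.StringIO()
log = make_logger("Ganga.Lib.LCG", buffer)
log.info("job 126 has changed status to Scheduled")
line = buffer.getvalue()
print(line, end="")
```

With a stamp on each status change, the time a job spends in each state can be read straight off the session log.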

Questions

Pending

Answered

  • How do I remove some jobs from my repository but keep the ones I want to use to copy and modify?
    Use j=jobs[126] and then delete j.
  • How do I clean up specific jobs and not delete all of them in a cleanup?
    Same answer as for the first question. The cleanup only removes the input and output areas, like these:
    inputdir = '/afs/cern.ch/user/s/stdenis/gangadir/workspace/Local/128/input/',
    outputdir = '/afs/cern.ch/user/s/stdenis/gangadir/workspace/Local/128/output/',
    It does not remove tar files, because other jobs may use them. You need to decide that yourself, or write a script that looks at the job objects, pulls the tar file name from each object, and checks whether the job you are deleting is the last one that uses a specific tar file.
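The script suggested above could start from something like the following sketch. It assumes the tar file names have already been pulled out of the job objects into a plain dictionary keyed by job number; that dictionary, the file names, and the function name are all hypothetical, and no Ganga call is used.

```python
def removable_tarfiles(jobs, job_id):
    """Return the tar files used by job_id and by no other job."""
    mine = set(jobs[job_id])
    used_elsewhere = set()
    for other_id, tarfiles in jobs.items():
        if other_id != job_id:
            used_elsewhere.update(tarfiles)
    return sorted(mine - used_elsewhere)


# Made-up bookkeeping: job number -> tar files that job refers to.
jobs = {
    126: ["analysis_v1.tar.gz"],
    127: ["analysis_v1.tar.gz"],
    128: ["analysis_v2.tar.gz"],
}

print(removable_tarfiles(jobs, 128))  # job 128 is the only user of its tar file
print(removable_tarfiles(jobs, 126))  # job 127 still needs the same tar file
```

Only the files returned for the job being deleted are safe to remove along with it.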

Warnings

  • If you say j=job1.copy then you get a funny error later, because j refers to the copy method itself rather than to a new job. You must use j=job1.copy() so it makes a copy of the object. Nonetheless, you can say j.submit although you should say j.submit(). This is because IPython is less strict. I am not sure this is a good thing. But then I don't like the way ROOT forgives pointers vs objects.
  • A "Job proxy expired" problem means that the job was around too long.

Two workarounds:

    • Find a site that is less busy
    • Use proxy renewal
You can add a keyword to the configuration: MyProxyServer = myproxy.cern.ch

If you upload a proxy to the myproxy server, the WMS will renew your proxy (and your job can wait much longer in the queue).

I have not tried it recently; the command to upload a proxy is called myproxy-init. It should be discussed in the LCG User Guide.
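The first warning above, about copy versus copy(), is ordinary Python behaviour and can be seen without Ganga. The Job class below is a hypothetical stand-in, not the Ganga job object.

```python
import copy


class Job:
    """Hypothetical stand-in for a job object with a copy() method."""

    def __init__(self, name):
        self.name = name

    def copy(self):
        return copy.copy(self)


job1 = Job("hello")

not_a_job = job1.copy    # no parentheses: this is the bound method itself
real_copy = job1.copy()  # parentheses: this is a new Job object

print(callable(not_a_job), isinstance(not_a_job, Job))
print(isinstance(real_copy, Job), real_copy is job1)
```

Any attempt to treat not_a_job as a job fails later, which is why the error shows up away from the line that caused it.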

Obtaining Grid credentials

This turned out to be harder than expected, and took up about 3 weeks of time (plus 2 weeks I was away on holiday). I was thwarted from getting my GridPP credentials when, after getting them, I was told by my browser to use the master password, something I had set when I got my laptop and had never used since. This would mean revoking the certificate, and since our CA had had this problem with the last 3 users he feared losing credibility. I therefore tried to use the Fermilab kerberos certificates. This does not work: the openssl versions conflicted in the identity of the USERID field, so when I registered with the VO I had a mangled version of the field, whereas when I used the credentials to get a proxy I had an unmangled version.

This was not resolved, and it remains the case that Fermilab kerberos credentials cannot be used to access LCG resources with the ATLAS VO. Solving this problem was not obviously among anyone's goals, so I set it aside and went for registration through GridPP. The original certificate was revoked and, many many passwords later, I do have a certificate that works.

The experience, problems and attempts are documented here.

It seems that at this early stage, it is unwise to frighten CA administrators to the extent that errors made by users cause them to put off users getting certificates.

Running the Full analysis chain without Grid

Running on lxplus

Unable to run the simulation as a batch job. This was traced (9-Nov-05) to a skipevent 10 card. A log of the work to get back to running is here. Progress:

  • Athena Hello World DONE
  • Pythia DONE
  • Simulation DONE
  • Digitization
  • Reconstruction
  • Produce AOD
  • AOD Analysis
  • Ntuple

Running at Glasgow

Done on cdfg.ph.gla.ac.uk -- details here. Working on scotgrid as well.

Running the Full analysis Chain with Grid

Work to do this both with the Workbook instructions and within the Ganga framework is considered here.

Running Grid from the Workbook instructions

The RStDWorkBookGridWork details the work here. Successful steps thus far:

  • 5 Event Test on the Grid (storage of each stage output to RAL and Scotgrid)
    • Getting an Account
    • Setting up your Account
    • Running HelloWorld
    • Running Athena HelloWorld
    • Generation (on Scotgrid specifically)
    • Digitization
    • Reconstruction
    • Produce AOD
    • AOD Analysis
    • Ntuple
  • 100 Event Test on the Grid
    • Generation
    • Simulation
  • 1000 Event Test on the Grid
    • Generation
    • Simulation

Running Ganga

The RStDGangaGridWork details the work here. RStDGangaGuide describes how to use Ganga. Successful steps thus far:

  • Getting an Account
  • Setting up your Account
  • Running HelloWorld:
    • Locally
    • On LSF
    • On Grid
  • Running Athena HelloWorld
    • Locally
    • On LSF
    • On Grid

The next steps would be:

  • To run Athena AOD analysis on LSF. This fails (see report above) because the dataset cannot be located in the preparation of the job. As of 15-Nov-05, Dietrich has made a fix. What needs to be done to make this work is a question for him.
  • To run Athena AOD analysis on LCG. This returns an error from the resource broker as reported above.
-- RichardStDenis - 20 Oct 2005 2:49 BST

Running the Physics Analysis

The AOD Tutorial will help.

The physics coding

Considering EventView

InstructionsForEventViewin100x

InstructionsForEventViewIn1001 to build an event view in 10.0.1 and look at missing et.


Major updates:
-- RichardStDenis - 18 Oct 2005


Topic attachments

  • RStDUTFNov2205v1.ppt (PowerPoint, r1, 52.5 K, 2005-11-19 17:56, RichardStDenis): Version 1 Talk for UTF Meeting 22-Nov05
Topic revision: r22 - 2005-11-30 - RichardStDenis
 