Atlas Sw Validation Minutes 070417

Atlas Sw Validation Minutes


Minutes of the Software Validation Meeting on Tuesday April 17, 2007

Manuel Gallas

Monday April 23, 2007

In Attendance

Karim Bernardet, Claire Bourdarios, Andrea Di Simone, Olga Igonkina, Lashkar Kashif, Wolfgang Liebig, Hong Ma, Manuel Gallas, Ricardo Goncalo, David Quarrie, Adele Rimoldi, Peter Sherwood, I. Ueda, Guillaume Unal, Alex Undrus, Sven Vahsen, Jain Vivek, Wouter Verkerke

Apologies: Paolo Calafiura, Davide Costanzo, Frederick Luehring, RD Schaffer, David Rousseau, Steve Goldfarb (replaced by Lashkar Kashif).

Meeting Agenda

The meeting agenda is at: http://indico.cern.ch/conferenceDisplay.py?confId=15091

The next phone meeting will be on Tuesday, May 15, 2007 at 16:10 CERN time (from 16:10 to 17:40).

Meeting Coordinates:

ATLAS software validation

(Manuel Gallas)

Dial-in numbers: +41227676000 (Main) Access codes: 0132753 (Participant) Participant site: https://audioconf1.cern.ch/call/0132753

The bi-weekly physics validation meeting takes place on the 24th of April, the 1st of May is a holiday at CERN (and in most countries), and the 8th of May is again the time slot for the physics validation meeting.

IMPORTANT: During this three-week period without a sw-validation phone meeting, please send the report for your domain/detector in written form, preferably before Monday at noon (so that the report to the SPMB contains the most up-to-date information).


Software validation coordination

This was the first software validation phone meeting. These meetings will always be held on a bi-weekly basis (in the weeks where there is no physics validation meeting). The reports from the different sub-domains (core, databases, simulation, sub-detectors, etc.) will be collected on a weekly basis in order to monitor the software status more frequently. For other aspects of the organization of the sw-validation activities, see the agenda presentation.

Actions:

  • Create a HyperNews forum for the physics and software validation activities.
  • Create a mailing list for the communication with the software validation coordinators.
  • Create a Twiki page to host the links of the software validation activities in the different software domains and sub-detector software.
  • Follow up the items described in the A.O.B section: RTT and ATN documentation and examples; the mailTo, doc, and RTT classification tags; counting the ERROR lines using the existing RTT machinery.


Report from the physics validation activities

Summary report from the physics validation phone meeting (10th April) by Wouter Verkerke. He presented the status of the validation with 12.0.6.1 and new issues in 12.0.6.3 (detailed information in the agenda).

Actions: In the future, bugs found in the production and validation of sample A will be listed in order to understand why they were not detected earlier.


Report from the software domain and sub-detector software

Core Services (Paolo Calafiura)

Status:

Testing is based on ATN, which is more suitable for AtlasCore. Tests are OK. There is a bug in Gaudi: the ServiceManager proclaims in a FATAL message that it can't find any plugin to load ... and then it happily moves on. It is harmless but embarrassing, and it will need to be masked within the job transforms.
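As an illustration of what masking such a message could look like, here is a minimal Python sketch that filters FATAL lines against a list of known-harmless patterns. It is not the job-transform implementation: the function name, the regular expression, and the sample log lines are all invented for the example, since the exact wording of the Gaudi message is not quoted in these minutes.

```python
import re

# Hypothetical pattern: the real wording of the ServiceManager message is
# not given in the minutes, so this regular expression is a placeholder.
KNOWN_HARMLESS = [
    re.compile(r"ServiceManager\s+FATAL.*plugin", re.IGNORECASE),
]


def unmasked_fatal_lines(log_lines, masks=KNOWN_HARMLESS):
    """Return the FATAL lines that match none of the known-harmless patterns."""
    return [
        line
        for line in log_lines
        if "FATAL" in line and not any(mask.search(line) for mask in masks)
    ]


# Invented sample log lines, for illustration only.
sample_log = [
    "ServiceManager    FATAL No plugin found to load",
    "SomeAlg           INFO  initialize() done",
    "SomeSvc           FATAL cannot open input file",
]

remaining = unmasked_fatal_lines(sample_log)
print(remaining)
```

Keeping the masks in an explicit list makes it easy to drop an entry again once the underlying bug is fixed.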

News:

LCG51 is coming, and it will be necessary to check that everything works with it.

Note:

LCG51 is already in the LCG nightlies and will appear in the 13.X.0 and 13.0.X nightlies by Monday (23rd April, rel_1). LCG51 contains: Frontier_client - 2.7.2, dcache_client - 1.7.0.31, ROOT - 5.14.00e, RELAX - 1.1.8, CORAL - 1.8.0, POOL - 2.5.2, COOL - 2.1.1

Database (David Malon)

Status:

No report.

Infrastructure (Fred Luehring)

Status:

Right now there are no open Software infrastructure issues for release 13 (Fred was running the SIT meeting at the same time)

Generators (Giorgos Stavropoulos)

Status:

No report.

Simulation (Adele Rimoldi)

People who will monitor the tests:

  • ATN-NICOS (M. Gallas, Andrea Di Simone)
  • RTT (I. Ueda, Andrea Di Simone, M. Gallas)

Status:

Since rel_5 last week we can again simulate events that were already generated (for about 15 days this was not possible). The general status seems good, and we are performing a more detailed analysis of the created log files in order to check that the full migration to configurables is working OK. A new tag for G4AtlasApps will be requested by the end of the week in order to remove some obsolete messages in the python layer and to add two more simulation geometry tags.

Digitization (Sven Vahsen)

Status:

Generally the status is fine. We have a new way of initializing random number seeds, and are transitioning to configurables/jobProperties. We try to have a version in place that works at all times. Most of the remaining work is at the jobOptions level. The only known showstopper is related to seeds with the new ranlux random number service, but the old ranecu is still the default and works. The problem seems to be in the ranlux service itself, and may require a bugfix in Core; we're still debugging it.

We are, however, often crippled by broken persistency (see next point). Muons still have persistency problems. During the last weeks, there were many weeks where persistency for some subdetector was broken in the nightlies (Tilecal for 1-2 weeks, muons still problematic).

There are no known problems with the pileup software per se, but there are several performance-related issues in pileup jobs and particularly in reconstructing pile-up events, as reported by Seth. An open question is whether the current time-shuffling of background events is good enough for Muon detectors, which have a flat response over their monstrous sensitivity window. It may very well be that we'll have to invent a different mechanism to treat, in particular, cavern background.

Plan:

RTT: For digitization I hope that we'll eventually have two types of nightly tests: "integration tests" and "detailed tests".

  • The integration tests are mostly in place in the RTT under "Digitization": they run the default digitization jobOptions for a number of different configurations: different combinations of subdetectors, with/without LVL1 trigger, with pileup, with Ranecu, with Ranlux, etc. These tests are only intended to catch obvious runtime problems and report success for each job that completes without ERRORS.
  • These tests are kept up to date and checked most days by Sven Vahsen.
  • The detailed tests will be specific to each subdetector, and produce a number of standard histograms used to monitor the RDOs produced by digitization. Some tests are in place, but a lot needs to be added. In the case where no additional manpower can be found for setting up RTT tests, I will have to keep relying on the current digitization contacts to add the RTT/ATN tests needed.
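The success criterion for the integration tests (a job passes if it completes without ERRORS) can be sketched in a few lines of Python. This is only an illustration, not the RTT machinery: the log layout ("source LEVEL message"), the function names, and the sample logs are assumptions, and treating FATAL lines as failures is an extra assumption not stated in the minutes.

```python
from collections import Counter

# Assumed log layout: "<source> <LEVEL> <message>", with the level as a
# whitespace-separated token. Real job logs may be laid out differently.
LEVELS = ("WARNING", "ERROR", "FATAL")


def count_levels(log_text):
    """Count the lines that carry each message level as a separate token."""
    counts = Counter()
    for line in log_text.splitlines():
        tokens = line.split()
        for level in LEVELS:
            if level in tokens:
                counts[level] += 1
    return counts


def job_succeeded(log_text):
    """A job passes if its log has no ERROR lines (FATAL added by assumption)."""
    counts = count_levels(log_text)
    return counts["ERROR"] == 0 and counts["FATAL"] == 0


# Invented sample logs, for illustration only.
good_log = "DigiAlg INFO done\nDigiAlg WARNING slow"
bad_log = "DigiAlg ERROR bad seed\nDigiAlg INFO done"

for name, text in (("good", good_log), ("bad", bad_log)):
    print(name, dict(count_levels(text)), job_succeeded(text))
```

Matching the level as a whole token avoids counting lines where "ERROR" only appears inside a longer word, such as a variable name in the message.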

Reconstruction (David Rousseau)

Status:

No report

Plan:

Link to the existing reconstruction testing infrastructure:

https://twiki.cern.ch/twiki/bin/view/Atlas/RecoIntegrationTests

EDM (Davide Costanzo, RD Schaffer)

Status:

At present we have reached a stable situation with 13.0.10 (it compiles and we don't have any outstanding problems). The issue is to introduce T/P separation for a few extra classes without clients noticing it, which means we will have to be extra careful not to break things, so that reconstruction can progress. For this we plan to use the -val nightly before we collect anything.

Plans:

One idea that we had to validate the T/P converters in AtlasEvent is to have a dump of the EDM content into a text file. A diff of these files would let us know if something changed in the converters and detect possible runtime problems as well as unexpected schema evolutions. This is possible for many classes today as they have dumper methods, but we never managed to put things together in a set of algorithms, a jobO and an RTT script. This would be a nice project for someone who is looking for something to do in the sw validation area. But I'm not sure such a person exists...
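The comparison step of this idea can be sketched with the Python standard library. The dump format below is invented for the example (the real dumper methods and their output are not described in these minutes), and the function name is hypothetical; only the diffing of a reference dump against a new one is shown.

```python
import difflib


def diff_dumps(reference_text, new_text):
    """Return unified-diff lines between two text dumps (empty if identical)."""
    return list(
        difflib.unified_diff(
            reference_text.splitlines(),
            new_text.splitlines(),
            fromfile="reference",
            tofile="new",
            lineterm="",
        )
    )


# Invented dump content, for illustration only.
reference = "Track pt=10.5 eta=0.31\nTrack pt=22.1 eta=-1.20\n"
changed = "Track pt=10.5 eta=0.31\nTrack pt=22.1 eta=0.00\n"

print(diff_dumps(reference, reference))
for line in diff_dumps(reference, changed):
    print(line)
```

An empty diff would mean the converters reproduced the reference content. In practice the dump would probably need a fixed numeric precision, otherwise harmless floating-point differences would show up as changes.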

PAT (Ketevi Assamagan, Tadashi Maeno)

Status:

No report

Inner Detector (Markus Elsing)

Status:

No report

LAr Calorimeter (Hong Ma, Guillaume Unal, Karim Bernardet (TBC))

People who will monitor the tests:

  • Karim Bernardet will look into ATN
  • Karim Bernardet and Helenka Przysiezniak will look into RTT tests

Status:

  • simu: OK
  • digit: DetFlags.LVL1_setOff() is needed, otherwise the job breaks
  • reco: it looks OK in rel_6 (without ESD or AOD -> needs checking)

Actions:

Karim will open a bug report concerning the problem found in reconstruction.

Tile Calorimeter (Sasha Solodkov)

Status:

No report.

Muon Spectrometer (Steve Goldfarb, Lashkar Kashif)

People who will monitor the tests:

Lashkar Kashif will look into the tests. He will also do an initial evaluation of the tags introduced in the bugfix/val (compilation OK and no major problems).

Status:

The following packages have consistently failed to build under recent val nightlies (as of April 17):

  • In AtlasSimulation: MuonDigitTest, MuonHitTest, MuonGeomTest
  • In AtlasReconstruction: CSC_DHoughSegmentMakerAlg, CSC_DHoughSegmentMakerTool, MuidParticleCreator, MuonIdentificationHelper, MuonCommAlgs, MuonTBAna
  • In AtlasAnalysis: MuonIDValidation, MuonInSituPerformance, MuonRecValidator, MuonValUtils

More information:

see the muon slides in the agenda

Actions:

Since the AtlasSimulation project is already in tag approval mode, Lashkar will look in more detail at what is going on with the muon packages that fail in AtlasSimulation.

Tracking (Wolfgang Liebig)

Present at the meeting, but the report will come later.

Trigger (Ricardo Goncalo, Olga Igonkina)

People who will monitor the tests:

Olga Igonkina, Simon George, John Baines will look into the tests

News:

We've got ATN tests (see below) that are now running. This is after a long period when this was not possible due to bad nightlies. We'll now put similar tests in the RTT.

More information:

http://indico.cern.ch/materialDisplay.py?contribId=7&sessionId=1&materialId=slides&confId=14624

Production Transforms (Manuel Gallas)

People who will monitor the tests:

Manuel Gallas, together with those responsible for the transforms, will look into the RTT tests. Seth Zenz will look into the Full Test Chain.

Status:

We are still focused on the 12.0.6.X caches (we plan to have 12.0.6.4 this week), although we will move to AtlasProduction 13.0.X by the end of this week. The testing machinery is based on simple internal tests that we run before and after we build the cache (also almost daily by hand) and on the Full Test Chain (Seth). We have established the 12.0.X nightlies for this purpose.

News:

A review of the current tests is planned, covering the internal cache validation tests and the RTT. As is now the case for Evgen, the transforms should have associated RTT tests using static input data (data per release, checking backwards compatibility).


A.O.B

From the discussions during the meeting it seems that:

  • Better links to the existing documentation for the ATN and RTT test machinery are needed. Examples of different configurations are welcome.
  • Activate the technical work needed to ensure that the mailTo, the doc and the test-classification xml specification tags in the RTT can be used. Once this is ready we will proceed project by project (starting from those that are now in tag approval) and we will request this information in all the RTT tests.
  • Usage of the RTT machinery in local mode
  • Print in the RTT pages the time used in each job.


-- Main.gallasm - 24 April 2007

Topic revision: r1 - 2007-05-06 - unknown
 