Transport Service Discussion

Timing measurements

by Ivan Belyaev

A simple instrumentation of TransportSvc with a trivial n-tuple:

/afs/cern.ch/user/i/ibelyaev/w0/cmtuser/Det/DetDescSvc/v1r4/src/TransportSvcIntersections.h

XXXXX::intersections {

  // start the timer and count this call
  m_chrono.start() ;
  ++m_stat ;

  { the code goes here }

  m_chrono.stop() ;

  // fill the n-tuple only after the first 20000 calls
  if ( 20000 < m_stat.flag() ) {
    ITupleTool::Tuple tuple = m_tuple->nTuple( 1001 , "My tuple" ) ;

    // the two end points of the transported segment
    Gaudi::XYZPoint p1 = point + vect * tickMin ;
    Gaudi::XYZPoint p2 = point + vect * tickMax ;

    tuple->column ( "p1x" , p1.X() ) ;
    tuple->column ( "p1y" , p1.Y() ) ;
    tuple->column ( "p1z" , p1.Z() ) ;

    tuple->column ( "p2x" , p2.X() ) ;
    tuple->column ( "p2y" , p2.Y() ) ;
    tuple->column ( "p2z" , p2.Z() ) ;

    // user, elapsed and kernel time of this call
    tuple->column ( "du" , (float) m_chrono.delta ( IChronoStatSvc::USER ) ) ;
    tuple->column ( "de" , (float) m_chrono.delta ( IChronoStatSvc::ELAPSED ) ) ;
    tuple->column ( "dk" , (float) m_chrono.delta ( IChronoStatSvc::KERNEL ) ) ;

    tuple->write() ;
  }
}

Running 50 Brunel events in this way, I got a few plots. Some of them are useful for interpretation, but some of them are absolutely puzzling (they might be fine for Velo or tracking experts).

A plot of the CPU consumption ("de") versus the z-distance between the 2 points. One clearly sees the linear term:

time_0.gif

The same plot, but "weighted" with CPU time (cumulated CPU consumption). The z-axis is in %. Surprisingly, "the line" does not clearly appear on this plot.

time_01.gif

From now on let's consider the VELO only. The same plot, but limited to the VELO (both points are required to have |z|<50cm). Up to now there are no surprises.

time_02.gif

Distribution of the CPU time spent for calls inside the VELO. The plotted value is the LOGARITHM! One clearly sees that the difference can be 3 orders of magnitude.

time2.gif

OK, let's start to see where we spend most of the time. For CPU-consuming calls (log10(de)>3.2), the 2D xy-distribution of the coordinates of the first (top) and the second (bottom) points. The magic 10mm structure is clearly seen.

time_03.gif

For comparison, the similar distribution for log10(de)<2.7. It looks much more reasonable...

time4.gif

min(R1,R2)%max(R1,R2) for all calls inside the VELO. One can clearly see that the NUMBER of calls in the configuration where one point is on the beam line (min(r1,r2)=0) and the second point is on the "magic" cylinder of 10mm radius is around 20% of all calls. Please look more carefully at the top plot. It is indeed VERY interesting!

time_04.gif

The same plot, but reweighted with CPU consumption in %. One clearly sees that such "strange" calls from r=0 to r=10 mm take 50% of all the time in the VELO.

time_05.gif

And here come my questions:


* For me it is VERY difficult to imagine that the TRACK FIT uses the transport service so frequently in this "crazy/strange" configuration (R: 0mm<--->10mm).
* Is it clear which component does it?
* Probably pattern recognition or the PV fit could be responsible for such calls (50% of the CPU in the VELO!!).
* MC: No. If you look at the Brunel timing per algorithm (which includes the time spent in the transport service), pattern recognition and the PV fit are fast. The time is really spent in the track fit.

by Thomas Ruf

Using a simple python script that transports a state in the Velo to the beam line, I measured on a 3 GHz machine that this takes 20ms. If the RF foil is removed from the geometry, this is reduced to 10ms. If I naively also remove the Velo, the timing then increases to 30ms. For these measurements, I took a given state and transported it 1000 times to a slightly different z position. If the same end state is always used, the timing decreases considerably, but this is not the real situation. For comparison, a linear extrapolation takes 0.1ms.
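The measurement procedure described above (many calls, each to a slightly different z position, then the mean time per call) can be sketched as follows. This is only an illustration in C++, not the original python script: the helper `meanTimePerCallMs` and the `transport` callback are hypothetical names, and the callback stands in for whatever extrapolation is being timed.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Calls transport(z) nCalls times, shifting the target z by zStep on every
// call so that no two calls use the same end state, and returns the mean
// wall-clock time per call in milliseconds.
// `transport` is a hypothetical stand-in for the real extrapolation.
double meanTimePerCallMs(const std::function<void(double)>& transport,
                         double zStart, double zStep, int nCalls) {
  assert(nCalls > 0);
  const auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < nCalls; ++i) {
    transport(zStart + i * zStep);
  }
  const auto t1 = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> elapsed = t1 - t0;
  return elapsed.count() / nCalls;
}
```

Setting `zStep` to zero reproduces the "always the same end state" case, which, as noted above, is expected to look faster than the realistic situation.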

How to improve the Transport Service: some recipes and advice


* IB: Proper hierarchy: As soon as 2 points are located inside some detector element, and there is no lower-level detector element which contains both of them, the timing is proportional to the number of physical volumes on the top level of the logical volume which is associated with the detector element (one needs to check SEQUENTIALLY that the line between the 2 points does not cross any other child physical volume). That's why I've ALWAYS stressed that, for a given complexity of the geometry, there exists only one way of improving the performance: one MUST introduce hierarchy and envelopes. It is the only way to switch from LINEAR to logarithmic dependence for a given number of REAL low-level volumes.
* IB: Special case, Velo RF foil. Some RF fragments represent an approximation of a long cylinder segment with a union. I guess it could be easy to check whether a cylinder would be better in this case. Also, the special shape of a toroidal segment could probably be used more efficiently instead of a union of 3 cone segments. It is not clear to me; it needs to be measured, but it is not excluded that one gains here. Also, the encapsulation of the "active VELO" from the non-active Velo, e.g. by using a cylinder of R=6-10 cm, could help. But here I guess one can get difficulties with the current description. E.g. the Velo halves will probably be almost impossible to describe.
* IB: Probably one can gain O(10%) from optimization of the transport service itself. As Matt once pointed out, "copy+push_back" could in many cases be substituted with the more efficient "insert"; also, many checks of input arguments could be eliminated or substituted with assertions… GeometryInfo::isInside for detector elements with daughter volumes (a good fraction of detector elements!) checks "isInside" twice for its daughters.
* IB: A "fake" geometry solution is possible even now. One needs to call the Transport Service using the optional argument "alternative geometry": e.g. one can create a "fake" Velo (a parallel simplified structure of a few volumes, e.g. "/dd/structure/FakeLHCb/FakeVelo") and use this fake geometry for SOME or ALL calls to the transport service. It exists now and could be used/tested.
* IB: One also needs to investigate the proper usage of the Transport Service. Let's consider the case of the track fit. I suspect that it is called many times (== number of measurements?) per fit of one track. I am not sure that this is the right technique. In this case we already know well (I guess pattern recognition provides us with this knowledge) the trajectory/path of the track. Assume that the track has 2 straight segments (before the magnet and after the magnet) and a couple (or a bit more, depending on the estimated momentum) of straight segments in the magnet. Having this knowledge, one can call the Transport Service only 2+2(3?) times instead of "Nmeas" times. Each call will take "on average" a bit more CPU, but in the end one gains.
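The LINEAR versus logarithmic argument from the hierarchy item above can be illustrated with a toy model. This is a sketch under strong assumptions: volumes are reduced to intervals along z, and `Node`, `buildTree` and `countTests` are hypothetical names that have nothing to do with the real DetDesc classes. It only counts how many overlap tests are needed to find the volumes crossed by a short segment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy volume: an interval along z. A node with children acts as an envelope.
struct Node {
  double zMin;
  double zMax;
  std::vector<Node> children;  // empty for a real low-level volume
};

// Builds unit-length leaf volumes [i, i+1] for i in [first, last) and groups
// them under envelopes with at most `fanout` children each.
Node buildTree(std::size_t first, std::size_t last, std::size_t fanout) {
  assert(first < last && fanout >= 2);
  Node node{static_cast<double>(first), static_cast<double>(last), {}};
  const std::size_t count = last - first;
  if (count == 1) return node;  // a single real volume, no envelope needed
  const std::size_t chunk = (count + fanout - 1) / fanout;
  for (std::size_t lo = first; lo < last; lo += chunk) {
    const std::size_t hi = (lo + chunk < last) ? lo + chunk : last;
    node.children.push_back(buildTree(lo, hi, fanout));
  }
  return node;
}

// Returns the number of overlap tests performed while collecting the leaves
// crossed by the segment [a, b]; `hits` is incremented once per crossed leaf.
std::size_t countTests(const Node& node, double a, double b, std::size_t& hits) {
  std::size_t tests = 1;  // the test against this node itself
  if (b < node.zMin || a > node.zMax) return tests;  // missed: skip all children
  if (node.children.empty()) {
    ++hits;
    return tests;
  }
  for (const Node& child : node.children) {
    tests += countTests(child, a, b, hits);
  }
  return tests;
}
```

With 1024 leaves and a short segment inside one leaf, `buildTree(0, 1024, 1024)` (all volumes on one level) needs 1025 tests, while `buildTree(0, 1024, 2)` (nested envelopes) needs 21. Real envelopes have their own cost per test, so this toy count is an upper bound on the possible gain, not a prediction.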


* MN: For TT, and a bit for IT, we tried to keep to the 3-volumes rule and introduced a few volumes to isolate various parts. What we found was that it was quite hard to do this in practice.


* TR: There exists an option in the geometry framework, coverTop, which is supposed to simplify underlying geometries. More information is needed here, please add!


* JP: It is not always easy to have an optimal geometry. This is particularly true of the Velo. We can have a more balanced hierarchy, but at the cost of very complex envelopes which will themselves bring in a processing time penalty. But I think at least some volumes can be simplified independently of the hierarchy.


* JP/TR: Some brainstorming on simplified geometry. Would it be possible to make a Phi-Eta-distance (and possibly primary vertex Z position?) cumulative material map and use this? The binning could be variable in all dimensions, and in itself would give a way of tuning the amount of detail in the material description vs. the time spent calculating it and also the impact on performance.
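One possible reading of the material-map idea above, as a minimal sketch: a table with variable bin edges in phi, eta and distance that stores the material accumulated from the origin out to each distance bin. Everything here is an assumption made for illustration (the class `MaterialMap`, the helper `findBin`, the use of radiation lengths as the stored quantity, and the clamping of out-of-range values to the edge bins); the primary vertex z dimension is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Index of the bin containing x for ascending edges; values outside the
// range are clamped to the first or last bin (an assumption of this sketch).
std::size_t findBin(const std::vector<double>& edges, double x) {
  assert(edges.size() >= 2);
  const auto it = std::upper_bound(edges.begin(), edges.end(), x);
  if (it == edges.begin()) return 0;
  const std::size_t i = static_cast<std::size_t>(it - edges.begin()) - 1;
  return std::min(i, edges.size() - 2);
}

// Hypothetical cumulative material map with variable binning.
class MaterialMap {
public:
  MaterialMap(std::vector<double> phiEdges, std::vector<double> etaEdges,
              std::vector<double> distEdges)
      : m_phi(std::move(phiEdges)), m_eta(std::move(etaEdges)),
        m_dist(std::move(distEdges)) {
    assert(m_phi.size() >= 2 && m_eta.size() >= 2 && m_dist.size() >= 2);
    m_cum.assign((m_phi.size() - 1) * (m_eta.size() - 1) * (m_dist.size() - 1), 0.0);
  }

  // Adds a thickness x0 located at `dist`; since the map is cumulative, the
  // value is added to that distance bin and to every bin farther out.
  void fill(double phi, double eta, double dist, double x0) {
    const std::size_t nDist = m_dist.size() - 1;
    const std::size_t base = cellBase(phi, eta);
    for (std::size_t d = findBin(m_dist, dist); d < nDist; ++d) {
      m_cum[base + d] += x0;
    }
  }

  // Material accumulated from the origin up to the bin containing `dist`.
  double cumulative(double phi, double eta, double dist) const {
    return m_cum[cellBase(phi, eta) + findBin(m_dist, dist)];
  }

private:
  // Offset of the first distance bin for the (phi, eta) cell.
  std::size_t cellBase(double phi, double eta) const {
    const std::size_t nEta = m_eta.size() - 1;
    const std::size_t nDist = m_dist.size() - 1;
    return (findBin(m_phi, phi) * nEta + findBin(m_eta, eta)) * nDist;
  }

  std::vector<double> m_phi, m_eta, m_dist;  // bin edges
  std::vector<double> m_cum;                 // cumulative material per cell
};
```

The material between two distances along the same direction is then the difference of two `cumulative` lookups, and the number of bin edges is the knob for trading detail against lookup cost and memory.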


* JT: A lot of time is spent in the RICH; this should be improved.

Questions about Geometry Framework, Transport Service which need clarification or corrections


* JT: The TransportSvc could greatly benefit when it starts to use Assembly volumes.
* JT: It would be a relatively simple task to turn on/off detector elements for the tracking. We could make an extra XML tag which can be used to switch off detector elements which are not used in the tracking (e.g. the magnet, all the supports, etc). The TransportSvc could then easily use this switch to determine if it can ignore some detector elements.
* JT: For some complicated shapes with many volumes inside, the TransportSvc could try to smear the material inside the volume. This can also be implemented as an XML switch. (I (TR) guess, this refers to the coverTop option of the geometry framework).
* JT: It is indeed not easy to improve things in the detector description philosophy, and I also have the impression that in LHCb the way things are now is partially due to a wrong understanding of this philosophy. Still, I give one "arbitrary" example:

<geometryinfo lvname ="/dd/Geometry/AfterMagnetRegion/T/IT/RMS/lvFoilRMS" support = "/dd/Structure/LHCb/AfterMagnetRegion/T/IT/StationRM/Ladder1RMS" npath ="pvFoilShielding1" />

I understand here that this Foil1Lad1 must be associated to something in the geometry. But why give 3 references to this? One reference to the full physical volume path is more than enough. The second thing I don't understand is why this needs to be a detector element at all! This is a non-sensitive volume (somewhere in a non-sensitive radiation monitor detector). We do not need to locate it in the reconstruction. And don't tell me this is because of Panoramix, because this The third thing about this comes up when I realize that the TransportSvc uses detector elements to navigate through LHCb. Maybe my understanding of the TransportSvc ends here, but I wonder what would happen when we have no detector elements at all. Would it still find volumes? Would it be slower or faster? In any case, my question is why is there any reference to detector elements? They should all be replaced by physical volumes!

* JT: Concerning the assembly volumes, with a complicated shape it is indeed impossible to figure out whether you are inside the complicated shape. BUT it is very easy to figure out that you are not inside the complicated shape when you are far away from it. In technical terms, if you're not in the coverTop volume, you are not in the complicated shape. Clearly this will be a potentially large gain in CPU time.
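The quick-rejection argument above can be sketched with a simple covering box. This is not how coverTop is implemented in the geometry framework; the `Box` type and the function `segmentMayHitBox` are hypothetical, and an axis-aligned box is assumed as the cover volume. The test is the standard slab method: if the segment misses the cover, the expensive check against the complicated shape inside can be skipped.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>

// Hypothetical axis-aligned cover volume enclosing a complicated shape.
struct Box {
  std::array<double, 3> lo;
  std::array<double, 3> hi;
};

// Returns false only if the segment from p to q certainly misses the box,
// in which case the detailed shape inside need not be examined at all.
bool segmentMayHitBox(const Box& box, const std::array<double, 3>& p,
                      const std::array<double, 3>& q) {
  double tMin = 0.0;  // segment parameter range still inside all slabs
  double tMax = 1.0;
  for (std::size_t i = 0; i < 3; ++i) {
    assert(box.lo[i] <= box.hi[i]);
    const double d = q[i] - p[i];
    if (d == 0.0) {
      // Segment is parallel to this slab: inside it or outside it for all t.
      if (p[i] < box.lo[i] || p[i] > box.hi[i]) return false;
      continue;
    }
    double t0 = (box.lo[i] - p[i]) / d;
    double t1 = (box.hi[i] - p[i]) / d;
    if (t0 > t1) std::swap(t0, t1);
    tMin = std::max(tMin, t0);
    tMax = std::min(tMax, t1);
    if (tMin > tMax) return false;  // slab intervals do not overlap
  }
  return true;
}
```

A `true` result only means the detailed check is still needed; the gain comes from the calls that return `false` early.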

Characters in this play:

* IB = Ivan Belyaev alias Vanya
* MN = Matthew Needham alias Matt
* JT = Jeroen Van Tilburg
* TR = Thomas Ruf
* JP = Juan Palacios

-- ThomasRuf - 30 Jun 2006

Topic attachments
| Attachment | History | Size | Date | Who |
| time2.gif | r1 | 9.8 K | 2006-06-30 - 10:58 | ThomasRuf |
| time4.gif | r1 | 10.1 K | 2006-06-30 - 10:59 | ThomasRuf |
| time_0.gif | r1 | 12.5 K | 2006-06-30 - 10:29 | ThomasRuf |
| time_01.gif | r1 | 15.2 K | 2006-06-30 - 10:58 | ThomasRuf |
| time_02.gif | r1 | 13.1 K | 2006-06-30 - 10:57 | ThomasRuf |
| time_03.gif | r1 | 9.9 K | 2006-06-30 - 10:34 | ThomasRuf |
| time_04.gif | r1 | 13.9 K | 2006-06-30 - 10:34 | ThomasRuf |
| time_05.gif | r1 | 13.6 K | 2006-06-30 - 10:34 | ThomasRuf |
Topic revision: r2 - 2006-07-04 - MarcoCattaneo