Title, authors

Observations -- none

References

The cursed tracker poster; HPU taskforce paper

NOTE: Reviewed up to panel 4!

1

Text

One of the biggest challenges in the CMS detector is the precise reconstruction of particle tracks. This is done by very complex algorithms, which translates into a CPU-intensive task. At the scale of the LHC, understanding how the algorithm behaves according to event complexity is one of the key factors in processing workflows in a more uniform and efficient way. This analysis makes it possible, based on previous observations, to estimate what the event reconstruction time will look like for incoming data.

Figures

  • DQM time per path

Maybe show this poster?

https://indico.cern.ch/getFile.py/access?contribId=8&sessionId=4&resId=0&materialId=poster&confId=189524

2

Text

The complexity of track reconstruction comes from the number of tracks and how much they overlap, which makes the algorithm iterate more before distinguishing the tracks. This has a direct relation with the instantaneous luminosity, or with the number of pile-up interactions per bunch crossing. The latter is not measured directly, but is a function of the accelerator running conditions and the instantaneous luminosity; for this reason we focus on instantaneous luminosity in this study, although pile-up is a more intuitive quantity.
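
Since the study quotes luminosity rather than pile-up, it may help to recall how the two are related. The sketch below uses the standard conversion (pile-up proportional to instantaneous luminosity times the inelastic cross section, divided by the number of colliding bunches and the revolution frequency); the cross section and bunch numbers are illustrative values, not ones taken from this draft.

    # Hedged sketch: average pile-up from instantaneous luminosity.
    # sigma_inel and the example numbers are illustrative, not from the poster.
    LHC_REV_FREQ_HZ = 11245.0      # LHC revolution frequency
    SIGMA_INEL_CM2  = 71.5e-27     # ~71.5 mb inelastic pp cross section (approximate)

    def average_pileup(inst_lumi_cm2_s, n_colliding_bunches):
        """Mean number of pile-up interactions per bunch crossing."""
        return inst_lumi_cm2_s * SIGMA_INEL_CM2 / (n_colliding_bunches * LHC_REV_FREQ_HZ)

    # Example: L = 6.5e33 cm^-2 s^-1 with 1380 colliding bunches -> ~30 interactions/crossing
    print(average_pileup(6.5e33, 1380))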

Below we can see, in the CMS event display, a high pile-up event compared to an event with less complexity.

Figures

event-7-z-view-all-and-signal.gif

cmsdiphoton.jpg

r150431-e541464-3dv4.png

3

Text

Less talk, more data: this is the effect of the luminosity/pile-up variation on real data taking.

The CMS Fill Report already provides interesting plots of instantaneous luminosity and pile-up over time for a given fill; here we compare those with the reconstruction time per event observed for the same data at the Tier-0.

Figures

  • Luminosity over time
  • PU over time
  • TpE over time
DISPLAYED VERTICALLY, SO ONE CAN VISUALIZE HOW THEY EVOLVE OVER TIME

4

Text

The following is a curve of CMSSW performance for a given release and Primary Dataset (type of event). It varies significantly with the type of event, but it is a very good reference for estimating, based on observed values, what the time per event will look like. There is an important systematic error in this measurement: the workflows run on non-uniform farms, and different CPU models result in different processing times for the same event. The advantage is that, as a general curve, it better covers the whole range of CPU models present in the farms we use, which in the end is what is most useful for central operations.
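
To make the estimation step concrete, here is a minimal fitting sketch. The functional form (a quadratic in luminosity) and the data points are placeholders chosen for illustration; they are not the values or model behind singleMu_TpEVsLumi.png.

    # Illustrative fit of time-per-event (TpE) vs. instantaneous luminosity.
    # Model and data points are placeholders, not the poster's measurements.
    import numpy as np
    from scipy.optimize import curve_fit

    def tpe_model(lumi, a, b, c):
        # quadratic ansatz: TpE grows faster than linearly with luminosity
        return a + b * lumi + c * lumi**2

    lumi = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])     # hypothetical, in 1e33 cm^-2 s^-1
    tpe  = np.array([4.1, 6.0, 8.5, 11.8, 15.6, 20.3])  # hypothetical, in s/event

    params, cov = curve_fit(tpe_model, lumi, tpe)
    residual = np.sum((tpe - tpe_model(lumi, *params))**2)
    print("fit parameters:", params, "sum of squared residuals:", residual)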

Figures

Usual curve in the known interval. Supposedly low chi2 -- report the chi2?

singleMu_TpEVsLumi.png

5

Text

Some measurements were done on PromptReco workflows to observe how close the estimate can get, or how far from the real value it can vary. Please consider the error introduced by the spread in CPU performance: in the Tier-0 farm there is a 37.75% HEP-SPEC06 performance difference between the fastest and slowest CPU. In figure 37 we can observe the distribution of CPU speeds in the farm, in figure 47 the error distribution of the TpE prediction for 35 different workflows, and in table 42 a set of specific cases.
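
As a quick reference for reading the error plots, the quantity of interest is simply the relative deviation of the measured time per event from the predicted one; the numbers below are hypothetical.

    # Relative deviation of measured TpE from the prediction (hypothetical numbers).
    def relative_error(predicted_tpe, measured_tpe):
        return (measured_tpe - predicted_tpe) / predicted_tpe

    # e.g. predicted 10 s/event, measured 11.5 s/event -> +15% deviation,
    # within the ~37.75% spread expected from CPU speed differences alone
    print(relative_error(10.0, 11.5))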

Figures

Small table =D See that post on HN, which is now dead.

6

Text

One of the uses of this measurement is to get an idea of what the reconstruction time will look like at higher luminosities, for example by extrapolating up to the Run 2 (2015) luminosity. Obviously some things will change that will improve the curve parameters, but we should look at it as a guideline for the kind of challenge that lies ahead, not as a precise report.
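
The extrapolation itself is just the fitted model evaluated outside the measured luminosity range, as sketched below; the coefficients and luminosity values are placeholders, not the fit results shown in the figures.

    # Illustrative extrapolation with placeholder fit coefficients.
    def tpe_model(lumi, a, b, c):
        return a + b * lumi + c * lumi**2

    a, b, c = 1.0, 0.8, 0.3            # hypothetical fit parameters
    run1_lumi, run2_lumi = 7.0, 14.0   # illustrative, in 1e33 cm^-2 s^-1

    print("TpE at a Run 1-like luminosity:", tpe_model(run1_lumi, a, b, c))
    print("TpE extrapolated to a Run 2-like luminosity:", tpe_model(run2_lumi, a, b, c))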

Figures

lumitpeSingleMu-fitted2.png lumitpeSingleMu-fitted.png

7

Text

Due to the wide range of luminosity, and its effect on the time per event, we can observe here some distributions from a multi-run reconstruction workflow. The consequence is the famous effect where 95% of the processing gets done in 50% of the workflow's total time, and a considerable amount of time is spent on high-luminosity jobs, which can take up to 48 h to finish if they do not retry.
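
A toy illustration of this long-tail effect, with made-up job lengths: when jobs run in parallel, the workflow lasts as long as its slowest job, and most jobs finish in a small fraction of that time.

    # Toy example (made-up job lengths, in hours) of the long-tail effect.
    job_hours = [2, 3, 3, 4, 4, 5, 5, 6, 30, 48]

    total = max(job_hours)   # with parallel jobs, the workflow ends with the slowest one
    done_by_half = sum(1 for h in job_hours if h <= total / 2) / len(job_hours)
    print(f"{done_by_half:.0%} of jobs finish within 50% of the workflow time")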

Figures

TpE distribution from DQM, job-length distribution from the ReReco

8

Text

In order to monitor this behavior automatically, automated ways to generate this plot were developed. At the end of a reconstruction workflow, the Workload Management Agent harvests the performance information and uploads it to a central database in the CMS DashBoard. This information is used in monitoring interfaces (figures 2 and 4) and can also be queried by automated systems and scripts through a DataService.
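
A minimal sketch of what such an automated query could look like is shown below; the endpoint URL, parameters, and JSON layout are placeholders, since the actual DashBoard DataService interface is not documented in this draft.

    # Hypothetical query against the performance DataService (placeholder URL/schema).
    import json
    import urllib.request

    def fetch_tpe_points(base_url, release, primary_dataset):
        query = f"{base_url}?release={release}&dataset={primary_dataset}"  # placeholder query string
        with urllib.request.urlopen(query) as resp:
            return json.load(resp)   # e.g. a list of (lumi, time-per-event) points

    # usage sketch (placeholder URL):
    # points = fetch_tpe_points("https://example.cern.ch/perf-dataservice", "CMSSW_5_3_X", "SingleMu")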

Figures

Dashboard tools

9

Text

A work in progress is to change the way we split jobs within a workflow in CMS. Today we use either a number of events or a number of lumi-sections, defined by operators based on how many are needed to bring the average job length to about 6 h. A new splitting algorithm is being written in which operators specify the expected job length; the system then queries DashBoard's performance database, estimates the time per event, and balances the job inputs (number of events per job) in order to obtain a more uniform running time, by taking into account the luminosity of the data being processed.
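
A minimal sketch of the splitting idea described above (the function name and defaults are illustrative, not the actual WMAgent implementation): given the operator's target job length and the estimated time per event for the data's luminosity, derive the number of events per job.

    # Illustrative event-splitting helper, not the real WMAgent algorithm.
    def events_per_job(target_job_hours, estimated_tpe_seconds, min_events=1):
        target_seconds = target_job_hours * 3600.0
        return max(min_events, int(target_seconds // estimated_tpe_seconds))

    # e.g. a 6 h target: 15 s/event (high-lumi data) -> 1440 events/job,
    # while 5 s/event (low-lumi data) -> 4320 events/job, equalizing running time
    print(events_per_job(6, 15.0), events_per_job(6, 5.0))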

Figures

?? Maybe not