Accounting data validation

To check the trustworthiness of the provided accounting data, it should be compared with the data in the experiment-specific accounting systems. However, we should keep in mind that 100% agreement is not possible, since we are comparing two different things:

  • The batch system, and correspondingly APEL, accounts resource usage by the real jobs running at the site, i.e. the pilot jobs
  • The experiment-specific monitoring and accounting systems account the payload jobs which are executed by the pilots

In principle, experiments do their best to decrease the pilot overhead and bring these two measurements as close together as possible, in order to use the provided resources efficiently. However, we do not have a clear understanding of whether this is always the case, and the situation may differ from one experiment to another.

On the other hand, this is the only way to check the correctness of the data in the accounting system, so we follow this approach under the assumption that the pilot overhead is not too high. Whether this assumption is valid could itself be a side outcome of the data validation procedure.

Some important definitions

There are various metrics exposed by the EGI Accounting Portal and used to evaluate the usage of WLCG CPU resources. Their definitions are summarized below. The definitions are given for wall clock time, but the CPU metrics are analogous, apart from the definition of raw CPU time (the amount of time for which a central processing unit was actually used to process instructions of a program or of the operating system).

Raw wall clock time multiplied by number of cores
  • Definition: (End_time_stamp_of_the_job - Start_time_stamp_of_the_job) * Ncores
  • Comment: Basic measurement reported by all LHC jobs to the experiment-specific systems, and accounted by them. However, we cannot assume that the wall clock currently exposed by the accounting portal always represents the raw wall clock; walltime is often scaled by the batch system (see the next definition)
  • Metric type: time
  • Accounted by LHC VO accounting systems: YES
  • Accounted by APEL: Not always; depends on the batch system implementation and the site cluster configuration

Scaled wall clock multiplied by number of cores
  • Definition: Raw wall clock scaled by the batch system and multiplied by number of cores
  • Comment: Scaling allows jobs to run smoothly on a heterogeneous resource, because the batch system automatically extends the CPU time and walltime allowed on slower nodes proportionally to their power. This is usually done not with "random" numbers, but by taking a reference node and calculating the scaling factors as the ratio between the respective HS06 values (though the factor itself is NOT the HS06 value). Both CPU time and walltime are scaled, because otherwise the efficiency would be off by the scaling factor
  • Metric type: time
  • Accounted by LHC VO accounting systems: NO
  • Accounted by APEL: YES. If the raw wall clock is not scaled by the batch system, the raw and scaled wall clock are the same

HS06-normalized wall clock multiplied by number of cores
  • Definition: Raw wall clock time normalized by the benchmarked HEPSPEC06 power of a given CPU resource and multiplied by number of cores
  • Comment: We need to understand how the transformation from time to work is performed at all levels
  • Metric type: work
  • Accounted by LHC VO accounting systems: YES, but not accurately
  • Accounted by APEL: YES, but we need to understand how accurately it is done

For more details about scaling, see this useful doc.
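The relation between the three metrics can be sketched as follows. This is a minimal illustration: the function names, the per-core HS06 figures and the job parameters are invented for the sketch and do not come from any real accounting record.

```python
# Sketch of how the three wall clock metrics relate for a single job.
# All names and HS06 numbers are illustrative assumptions.

def raw_wallclock(start_ts, end_ts, ncores):
    """Raw wall clock time multiplied by number of cores (core-seconds)."""
    return (end_ts - start_ts) * ncores

def scaled_wallclock(raw_wc, node_hs06, reference_hs06):
    """Raw wall clock scaled by the batch system: the factor is the ratio
    of the node's HS06 value to that of the reference node."""
    return raw_wc * (node_hs06 / reference_hs06)

def hs06_wallclock(raw_wc, node_hs06):
    """HS06-normalized wall clock (a 'work' metric): time * benchmarked power."""
    return raw_wc * node_hs06

# A 1-hour, 8-core job on a node benchmarked at 10 HS06/core,
# with an assumed reference node at 12.5 HS06/core:
raw = raw_wallclock(0, 3600, 8)             # 28800 core-seconds
scaled = scaled_wallclock(raw, 10.0, 12.5)  # 23040.0 reference core-seconds
work = hs06_wallclock(raw, 10.0)            # 288000.0 HS06-seconds
# Consistency check: scaled time * reference power equals the raw work value,
# which is why scaling "balances out" once times are converted to work.
assert scaled * 12.5 == work
```

The last assertion is the key point discussed later for RAL: as long as the scaled time is multiplied back by the reference power, the work value is unaffected by scaling.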

Action plan

  • Compare the raw wall clock in the experiment-specific systems with the raw/scaled wall clock in the accounting portal. Understand discrepancies. Draw first conclusions about data correctness
  • Compare the HS06-normalized wall clock and CPU. Investigate discrepancies
  • Enable the cross-checks in automatic mode, recording the results in the WLCG SSB instance
  • Investigate where/how the transformation from time metrics to work metrics happens. Document everything. Conclude whether the procedure is correct and, hopefully, derive some recommendations regarding normalization

Comparison of raw wall clock in the experiment-specific systems with raw/scaled wall clock in the accounting portal

Work has already started.

We started with ATLAS and CMS, since the ATLAS and CMS Job Dashboards expose APIs which allow the necessary data to be extracted easily. For ALICE and LHCb the situation is a bit more complicated, since there are no APIs in place. However, Andrew promised to provide the required data for the first half of this year. In the case of ALICE, data can be retrieved by 'cut and paste' from the ALICE monitoring portal. This is not ideal and cannot be used in an automatic procedure, but it allows us to do some initial comparison.

ATLAS comparison

  • See excel file with the results of ATLAS T1 wall clock and HS06 comparison January-April 2016.
  • See excel file with the results of ATLAS T2 wall clock and HS06 comparison January-April 2016.

Big discrepancies were investigated by ATLAS people; there is a nice summary from Alessandra:


The sites investigated were DESY-* and RAL.

DESY-ZN/DESY-HH

This is because the Tier2 view of the EGI accounting portal used in [1] contains federations. The federations are imported from REBUS, which reads a standard CSV file [2]. In most countries the federations are purely regional, so each site belongs to one and only one federation. In Germany, instead, the federations are mapped to the experiments, presumably with the aim of better exposing the resources of each experiment. However, this means that some sites appear in more than one federation. The accounting portal does not treat this correctly, because it maps the accounting data not to each experiment federation but to the site. As a result, the data of all experiments appears in every federation, leading to double counting. The EMI3 view [1a] does not have this problem because it does not expose federations.

While in the short term one can use the EMI3 view or the Country view in the new accounting portal [3], the incorrect mapping of the German federations needs to be solved, because the WLCG view in the new portal has the same topology and therefore the same problem.

RAL

The RAL case, instead, is due to walltime scaling in the batch system. This practice, derided yesterday by someone, is actually a healthy one, because it allows jobs to run smoothly on a heterogeneous cluster: the batch system automatically extends the CPU time and walltime allowed on slower nodes proportionally to their power. This is usually done not with "random" numbers, but by taking a reference node and calculating the scaling factors as the ratio between the respective HS06 values (though the factor itself is NOT the HS06 value). Both CPU time and walltime are scaled, because otherwise the efficiency would be off by the scaling factor. The batch system logs the scaled times but does not log the scaling factor, and in any case the factor is not sent with the accounting records to APEL. Some sites may publish it in the BDII, in the form of a reference kSI2k value in Glue1 and explicitly as a ratio in Glue2. You can find a brief explanation of how it is done here; the explanation is valid for other batch systems too.
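The scaling described above can be sketched as follows. The reference HS06 value, the node power and the job times are all assumptions made for illustration; real batch systems apply this internally according to their own configuration.

```python
# Sketch of batch system time scaling on a heterogeneous cluster.
# The HS06 values and job times below are assumptions for illustration only.

REFERENCE_HS06 = 12.0  # per-core HS06 of the assumed reference node

def scaling_factor(node_hs06):
    """Ratio between the respective HS06 values (the factor is NOT the HS06 value)."""
    return node_hs06 / REFERENCE_HS06

def scale_job_times(raw_cputime, raw_walltime, node_hs06):
    """Both cputime and walltime are scaled, so the CPU efficiency is unchanged."""
    f = scaling_factor(node_hs06)
    return raw_cputime * f, raw_walltime * f

# A job on a slower node (9.0 HS06/core -> factor 0.75):
cpu, wall = scale_job_times(7200.0, 9000.0, 9.0)  # (5400.0, 6750.0)
# Efficiency is preserved: 7200/9000 == 5400/6750 == 0.8
```

This also shows why the reported walltime can legitimately differ from the clock on the wall: on a node slower than the reference, the scaled (reported) time is smaller than the raw time.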

Why don't all sites do it, and why do the discrepancies between the reported and the scaled walltime differ?

The point is that, for a heterogeneous cluster, the larger the variance in node power, the more the reported walltime will differ from the clock on the wall.

Some sites do not have such a large variance, and some sites group similar nodes behind different batch systems so that they can always use a scaling factor of 1. Some sites may not do it at all, and their jobs may fail on the slower worker nodes. Other sites in the UK do this, but they do not have the variety of worker nodes that the T1 has.

Why hasn't it been a problem so far?

Because, as the RAL numbers show, when the scaled walltime is multiplied by the machine power to produce the work value (following Jeff's suggestion at the GDB), everything balances out. The RAL numbers for the work (aka HS06-normalized) metric, for example, are the following (EGI numbers are from Julia's spreadsheet, ATLAS numbers are from the ATLAS dashboard):

SITE        Jan 2016      Feb 2016      Mar 2016      Apr 2016      Total
EGI         70,864,533    58,923,514    45,892,671    44,749,843    220,430,560
ATLAS       64,951,944    56,268,120    44,372,160    43,489,440    209,081,664
EGI/ATLAS   109%          105%          103%          103%          105%
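The percentages in the table can be reproduced directly from the monthly figures quoted above:

```python
# Recompute the EGI/ATLAS ratios for the RAL work (HS06-normalized) numbers.
egi   = [70_864_533, 58_923_514, 45_892_671, 44_749_843]
atlas = [64_951_944, 56_268_120, 44_372_160, 43_489_440]

monthly = [round(100 * e / a) for e, a in zip(egi, atlas)]
total = round(100 * sum(egi) / sum(atlas))
# monthly -> [109, 105, 103, 103]; total -> 105
```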

Next steps for ATLAS comparison

  • John is going through the other cases and, by looking into the accounting DB, finding out whether the other discrepancies can also be explained by scaling of the raw wall clock by the batch system
  • Olga Kodolova is enabling publishing of this comparison in the WLCG SSB
  • Repeat the same exercise for the work metrics

Overall, the ATLAS agreement is pretty good: we see a 5-10% difference between the EGI accounting portal and the ATLAS Dashboard.

ALICE comparison

  • See excel file with the results of ALICE wall clock and HS06 comparison January-April 2016.

Overall, the ALICE agreement is very good, at the level of a 4% difference.

CMS comparison

  • See excel file with the results of CMS T1 wall clock and HS06 comparison January-April 2016.
  • See excel file with the results of CMS T2 wall clock and HS06 comparison January-April 2016.

CMS shows a higher discrepancy (~30%) compared to ATLAS and ALICE. According to the CMS experts, the main reason is that CMS runs single-core payloads with multicore pilots, and the situation should improve as soon as CMS runs most payloads as multicore. More details are provided in this talk from Antonio. Another reason was a bug in the Dashboard, where multicore payloads (currently 4% of all CMS payloads) were not properly accounted (not multiplied by the number of cores). This problem has been fixed and the CMS comparison has been recalculated. However, the Dashboard data for June, July and part of August is not completely reliable, which is why these months appear red in the SSB accounting validation view for CMS.

LHCb comparison

Many thanks to Concezio Bozzi, who performed the comparison of the LHCb data for January-July 2016. Overall, the agreement is pretty good. However, in terms of work, Dirac shows slightly higher numbers than the EGI portal. This could be explained by the different time-to-work conversions applied by Dirac and by the EGI portal. In terms of raw wall clock, apart from several sites with known problems (Pisa) or those which report scaled time instead of raw wall clock time (RAL), the results are almost identical.

  • See graph with the results of LHCb T2 wall clock time comparison for January-July 2016.
  • See graph with the results of LHCb T2 CPU time comparison for January-July 2016.
  • See graph with the results of LHCb T1 CPU time comparison for January-July 2016.
  • See graph with the results of LHCb T1 wall time comparison for January-July 2016.
  • See graph with the results of LHCb T2 CPU work (kHS06) comparison for January-July 2016.
  • See graph with the results of LHCb T1 CPU work (kHS06) comparison for January-July 2016.

Many thanks to Andrew for providing an API to publish Dirac data to the SSB. Work on automating the Dirac data validation is ongoing.

Latest excel files with ATLAS and CMS data

See excel files with the results of CMS and ATLAS comparison for January-September 2016.

-- JuliaAndreeva - 2016-06-10

Topic attachments

Attachment                           Size       Date              Who
20160623_CMSPilots_AccTF_APCY.pdf    1048.8 K   2016-06-29 17:09  JuliaAndreeva
PastedGraphic-1.pdf                  62.8 K     2016-09-06 15:17  JuliaAndreeva
PastedGraphic-2.pdf                  70.6 K     2016-09-06 15:16  JuliaAndreeva
PastedGraphic-22.pdf                 62.6 K     2016-09-06 15:47  JuliaAndreeva
PastedGraphic-3.pdf                  251.8 K    2016-09-06 15:16  JuliaAndreeva
PastedGraphic-32.pdf                 251.8 K    2016-09-06 15:30  JuliaAndreeva
PastedGraphic-33.pdf                 214.2 K    2016-09-06 15:40  JuliaAndreeva
PastedGraphic-4.pdf                  212.4 K    2016-09-06 15:17  JuliaAndreeva
T1_Graphs_atlas_general-1.xlsx       48.6 K     2016-06-29 16:55  JuliaAndreeva
T1_Graphs_cms_general-3.xlsx         46.0 K     2016-06-29 17:03  JuliaAndreeva
T2_Graphs_alice-3.xlsx               98.7 K     2016-06-29 16:59  JuliaAndreeva
T2_Graphs_atlas_general-2.xlsx       119.5 K    2016-06-29 16:49  JuliaAndreeva
T2_Graphs_cms_general-2.xlsx         108.9 K    2016-06-29 17:06  JuliaAndreeva
WLCG_Accounting_Data_2016.pdf        160.7 K    2016-10-27 14:42  JuliaAndreeva
Topic revision: r5 - 2016-10-27 - JuliaAndreeva