LCG Management Board
Tuesday 20 March 2007 - 16:00 – 17:00 – Phone Meeting
(Version 1 – 23.3.2007)
A.Aimar (notes), I.Bird, Ph.Charpentier, L.Dell’Agnello, D.Duellmann, M.Ernst, J.Gordon, C.Grandi, F.Hernandez, J.Knobloch, E.Laure, M.Lamanna, H.Marten, P.Mato, G.Merino, R.Pordes, L.Robertson (chair), Y.Schutz, J.Shiers, R.Tafirout, J.Templon
Mailing List Archive:
Tuesday 27 March 2007 - 16:00-17:00 – Phone Meeting
1. Minutes and Matters arising (minutes)
1.1 Minutes of Previous Meeting
Minutes of the previous meeting approved.
1.2 Matters Arising
A.Aimar distributed a few documents for information and/or approval by the MB:
Feedback/Approval from the MB: New QR Reports and New HL Milestones 2007 (document).
- The new QR Report format will be used from the 2007Q1 quarter onwards.
- The new High Level Milestones will be tracked from the 2007Q1 quarter onwards.
Site Reliability Reports for February 2007. (Reports Received (3PM))
Topics for future MB Meetings
2. Action List Review (list of actions)
Actions that are late are highlighted in RED.
Not done. H.Renshall presented the Experiments Plans at the GDB on the following day.
3. FTS V2 Status (Slides) – G.McCance
G.McCance presented status of the FTS Production service and the development of FTS 2.0.
3.1 FTS Production Status
Currently CERN and all Tier-1 sites are using FTS 1.5 (from gLite release 3.0) and the infrastructure is working well. More than 6 PB of data have been exported from the Tier-0 since the start of SC4. CERN and the Tier-1 sites largely understand the software; most problems were solved last year, and the remaining ones are understood together with the experiments, with a plan to address them.
There are still problems with the end-to-end service, but the FTS team is starting to systematically follow up the (SRM) problems detected by FTS.
3.2 FTS Service Split at CERN
Until now the Tier-0 export channels at CERN have been run on the same service as the Tier-2 to CERN channels. This is undesirable because the Tier-0 export must be a ‘stable’ production service, while the Tier-2 import to the CAF is less planned and more likely to have issues requiring interventions and debugging. There are also more sites involved (GPN vs. OPN).
The split was made at the beginning of March:
- “prod-fts” is the main Tier-0 export service
- Tier-1 imports will continue to run on the production service “prod-fts”
- A new, separate service runs the Tier-2 transfers.
have been requested to migrate their Tier-2 transfers to the new service by
3.3 FTS 2.0
The features introduced are based on
feedback from the Mumbai and
The main features are:
- Certificate delegation: this solves a major security issue. No more MyProxy “pass phrase” in the job itself.
- Improved monitoring capabilities: this is critical for the reliability of the ‘overall transfer service’.
- Added Alpha SRM 2.2 support.
- Better database model: improved performance and scalability.
- Better administration tools: make it easier to run the service.
- Added placeholders for the future functionality: to minimise the service impact of future upgrades.
The schedule that will be followed is:
- FTS 2.0 is currently deployed on pilot service@CERN
- In testing since December: running dteam tests to stress-test it
- This is still ‘uncertified’ code
The next step is to:
- Open the pilot installation to experiments to verify full backwards compatibility with experiment code
- The release schedule is subject to successful verification by the experiments.
The deployment at the CERN Tier-0 was scheduled for April 2007, but the goal of April 1 will be missed. Meanwhile RAL has offered to run pre-production in parallel to CERN.
The roll-out to Tier-1 sites will follow a month after a successful check. It is expected that at least one update to the April version will be needed before it is distributed to the Tier-1 sites.
The Integration with SRM 2.2 test instances is tracked here:
And they are ready to start integration and stress-test for:
- DPM at CERN
- dCache cluster at FNAL
J.Templon asked whether, in order not to overload a site or a MSS system by one VO only, the new FTS will allow limiting the usage of bandwidth by one VO.
G.McCance replied that the new release has the possibility for the administrator to limit the rate of specific VOs.
H.Marten asked whether FTS 2 will be compatible with the old SRM interfaces.
G.McCance replied that FTS 2 will be compatible with both SRM 1.1 and 2.2. And FTS 2 will also be compatible with the existing FTS 1.5 clients installed currently.
4. Experiments Dashboard (Slides) – P.Saiz
P.Saiz presented the work done in order to provide tools for monitoring reliability and performance of the experiments’ jobs on the grid.
The goal of the activity is to develop tools for investigating and solving all the possible Grid errors. The method chosen is to implement and use experiments’ dashboards for monitoring the user jobs.
The tools used are site efficiency tables that give a view of:
- Site performances as seen by selected applications.
- Tools to monitor the sites “day-by-day” and augment the available information for more efficient debugging.
Slide 3 shows the kind of efficiency tables that are generated.
Slide 4 explains how the investigation of errors works on the “job attempts”, not on a user job as a whole. If a job is resubmitted, each resubmission is studied individually. The retries of a job are hidden from the user, but need to be studied in order to improve the “effective efficiency”.
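The difference between counting job attempts and counting user jobs can be illustrated with a minimal sketch. The record layout (a job identifier plus a success flag per attempt) and the function names are assumptions made for illustration only; they do not describe the dashboard's actual schema or code.

```python
from collections import defaultdict


def attempt_efficiency(attempts):
    """Fraction of individual job attempts that succeeded."""
    if not attempts:
        return 0.0
    return sum(1 for _, ok in attempts if ok) / len(attempts)


def job_efficiency(attempts):
    """Fraction of user jobs for which at least one attempt succeeded."""
    outcome = defaultdict(bool)
    for job_id, ok in attempts:
        outcome[job_id] = outcome[job_id] or ok
    if not outcome:
        return 0.0
    return sum(outcome.values()) / len(outcome)


# Hypothetical data: job "A" succeeds only on its third attempt,
# job "B" succeeds on its first attempt.
attempts = [("A", False), ("A", False), ("A", True), ("B", True)]

print(attempt_efficiency(attempts))  # per-attempt view
print(job_efficiency(attempts))      # per-job view, retries hidden
```

In this made-up sample both user jobs eventually succeed, so the per-job view hides the two failed attempts that the per-attempt view exposes.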
Slide 5 shows the kind of reports that are generated daily for each site. One can then see each site (slide 6) and the errors that occurred on each CE.
In addition there is a view for each single site (slide 7) showing all VOs at the site. There is also a “site of the day” classification here: http://dboard-gr.cern.ch/dashboard/data/summaries.
A site can be selected and one can generate the efficiency of a site for each month (slide 8).
There is also the list of “common errors” for a VO (slide 9). One can select the errors on a site or by a VO. And there is a link to the explanation of the error.
On Slide 10 there is information on additional tools and URLs:
the ‘bad site of the day’:
evolution of a given site (ALICE, ATLAS & LHCb):
most common error messages (ALICE, ATLAS & LHCb):
The main activities that remain to be done are:
- Deploy Error list and site evolution for CMS
- Support its usage by sites and VOs
M.Lamanna added that it is useful to note that one can monitor the efficiency of a site daily, without waiting for the report at the end of the month. Feedback from sites and experiments is welcome in order to know whether they use it regularly to monitor the site reliability and what they would like to see changed.
L.Robertson asked whether the dashboards take into account only the jobs submitted via the RB, or whether they also monitor jobs submitted directly to the Condor G interface.
replied that successive attempts to submit the same job are not recognised as such if the job is submitted directly since the Job ID changes. Jobs submitted directly to Condor G by some experiment tools (such as
L.Robertson asked for information about the Job Wrapper status.
I.Bird replied that the Job Wrapper has been deployed and collects information in a database. But for now there is no application to extract the information from the database, analyse it and present it. Maybe this should be integrated into the dashboard tool.
5. Discussion on Benchmarking (Slides) – M.Alef
The MB has discussed in previous meetings the validity of the current CPU performance metrics used to assess the capacity delivered to experiments. M.Alef was asked to explain how GridKa measures the CPU performance. H.Meinhard, chairman of the HEPiX working group on benchmarking, was also present.
5.1 The Problem with Using the “spec.org” Benchmarks
The LCG MoU defined that the metric used for CPU requirements is kSPECint2000, which matched fairly closely the relative performance of physics codes on different processors. However, this is no longer the case, especially with the recent multi-core processors. The SPEC group published SPECint2000 values for new processors until February 2007 (see www.spec.org) but the values depend heavily on the environment used for the benchmark. For example, results for the Opteron 246 CPU vary from 1226 to 1438 SPECint2000, according to the OS (Windows, Linux distribution, etc.), 32/64 bit, compiler, optimizing compiler flags, optimized libraries, etc. In addition, the published results refer to only 1 copy of the benchmark run per (multi-processor) system. There is no information about how the performance scales with the number of CPUs / cores. Another difference is that the new results published at spec.org have been computed using the latest compiler versions, while in HEP we do not use the latest compiler.
5.2 Approach at GridKa
The approach chosen at GridKa is to run the SPEC benchmark in the current GridKa environment: Scientific Linux, GCC 3.4.x, a pre-defined set of flags, one benchmark run per core in parallel (simulation of batch mode). The “box performance” is then taken as the sum of individual results. Below are some examples of the resulting benchmark comparisons between GridKa and spec.org.
(*) OS, compiler and options used: SL3/SL4 i386, gcc 3.4.x -O3 -funroll-loops -march, 1 run per core in parallel, average spec marks per core.
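As a minimal sketch of the arithmetic described above (one benchmark copy per core run in parallel, with the box performance taken as the sum of the individual results), the calculation could look as follows. The function names and the sample scores are hypothetical and are not GridKa measurements.

```python
def box_performance(per_core_scores):
    """Box performance: sum of the scores of the benchmark copies
    that were run in parallel, one copy per core."""
    return sum(per_core_scores)


def average_per_core(per_core_scores):
    """Average score per core, as quoted in the footnote (*)."""
    return box_performance(per_core_scores) / len(per_core_scores)


# Hypothetical scores from a 4-core box, one parallel run per core.
scores = [1000.0, 990.0, 1010.0, 1000.0]

print(box_performance(scores))
print(average_per_core(scores))
```

Because the copies run simultaneously, any contention between cores is already reflected in the individual scores, which a single-copy result multiplied by the number of cores would not capture.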
The assumption is that the increasing gap between the results gathered from spec.org and the GridKa benchmark runs is caused by new compilers, benchmark improvements and scaling issues on multi-core systems:
The diagram (from slide 6) shows how the benchmark results have been diverging from those of spec.org since 2001, where the blue area shows the evolution according to the GridKa benchmarks, and the pink curve shows the spec.org evolution. GridKa decided to use the spec.org value for 2001 as the base – 25% above the GridKa measured value – and apply the same (25%) scaling to later processors. This is indicated by the yellow area in the diagram.
The result is shown in blue below.
These scaled-up values are used by GridKa for their accounting reports, and are still considerably lower than the spec.org values.
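The scaling described above can be sketched as follows, under the assumption that the factor is simply the ratio of the published to the measured value for the base year and that it is applied unchanged to later measurements. The numbers are invented for illustration and the function names are hypothetical.

```python
def scaling_factor(published_base, measured_base):
    """Ratio of the published value to the locally measured value
    for the base-year processor."""
    return published_base / measured_base


def scaled_value(measured, factor):
    """Locally measured score multiplied by the fixed base-year factor."""
    return measured * factor


# Hypothetical base year: published 500, measured locally 400,
# i.e. the published value is 25% above the measured one.
factor = scaling_factor(500.0, 400.0)

# Hypothetical later processor measured locally at 1200.
reported = scaled_value(1200.0, factor)

print(factor)
print(reported)
```

The factor stays fixed at its base-year value, so a widening gap between published and measured results for newer processors is not compensated.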
The SPEC CPU2000 benchmark suite was retired in February 2007 and replaced by the CPU2006 suite.
- Re-calculate the CPU requirements in the MoU using SPEC CPU2006.
- Provide “cookbook” on how to use the benchmark (e.g. the annex used for procurements at GridKa and CERN).
- Details should come from the HEPiX benchmarking working group.
More information is also available on the HEPiX CPU technology tracking pages: http://hepix.caspur.it/afs/hepix.org/project/ptrack/
L.Robertson asked whether the move to SPEC CPU2006 will be done rapidly.
M.Alef replied that this will be proposed at next HEPiX Benchmarking WG meeting.
The sites present were asked to state how they measured processor performance.
- GridKa: 125% of their measured values
- CERN: The performance used for procurement is the SPEC benchmark run in a specific way, similar to that of GridKa, but CERN uses the gcc flags as chosen by the Architects Forum (reducing the performance by a few percent). For reporting they use these tests with the results scaled up to 130%.
- IN2P3: For accounting purposes they report the values observed on site by using a specific set of programs to estimate the relative performance of new processors. For procurement they install 30% more capacity than what they are supposed to pledge.
- FNAL: The manufacturer values did not match the observed performance. Using some CMS application codes they estimate the relative performance of new processors.
- PIC: No internal benchmarks, they use the values from the vendors.
- NIKHEF: The vendors run the benchmarks, one copy for each processor.
- INFN: Compare the system with some internal benchmark programs. They have not compared the results with the published spec.org performance. For the reporting they use the values from the vendors.
- BNL: Benchmarking intensively using ATLAS programs. They scale down 30% or more from the vendors’ values, and increase the number of machines bought to compensate. The reporting is with the scaled-down effective values.
- RAL: No scaling done, they report the vendors’ values. Therefore they are probably under-delivering.
- NDGF: The site admin is using the vendors’ benchmarks. Accounting is done using estimates because there are several sites included in NDGF and not all is precisely reported.
- ASGC: Use the benchmark values given by the vendors.
- TRIUMF: For accounting they previously reported values estimated from their ATLAS benchmarks, but since the February MB meeting they have been using the vendors’ values. They will publish whichever standard is agreed.
Experiments expressed their opinion:
- CMS: Basing estimates on performance at the time of the TDR (2005). If the new machines diverge from the benchmarks the requirements would need to be recalculated.
- LHCb: The estimates were based on the effective processor performance at the time of the TDR, so are probably 20% too small now. LHCb does a quick benchmark: when the application starts it evaluates the CPU speed to estimate the duration of the job, and the result is archived for bookkeeping of their CPU usage and platforms.
I.Fisk asked whether it would be a better idea to re-introduce something like the old “CERN Benchmarks” made of standard HEP programs.
J.Gordon replied that SPECint seemed correct in 2001 and T.Cass added that the CERN Benchmarks were in FORTRAN and it would imply a major effort to collect and continuously port C++ benchmarks to new compilers.
H.Meinhard said that they want to check the appropriateness of the CPU2006 benchmark. Only if there is a major problem should one envisage HEP-specific benchmarks. Maintaining it and checking its validity over time would be a major effort.
M.Kasemann stressed that it is important that all sites use the same criteria.
J.Gordon added that the amount of equipment is bound by the amount of funds each site has. So this new standard benchmarking would change the values reported, but not what can be actually procured by the sites.
J.Templon added that it may be a problem of calibrating the applications to the recent architectures (Woodcrest, etc.) and that one should also work on improving how CPU and memory are used by HEP applications or libraries.
L.Robertson added that in some cases the HEP applications are also not the same as in 2001, because they have changed and new versions are in use.
L.Robertson concluded by proposing that new machines installed should be benchmarked with a scaling factor for the time being, until the HEPiX working group completes its work.
A proposal should be circulated and discussed at the MB in a couple of weeks.
7. Summary of New Actions
The full Action List, current and past items, will be in this wiki page before next MB meeting.