LCG Management Board
Tuesday 5 June 2007 16:00-18:00 - F2F Meeting at CERN
(Version 1 9.6.2007)
A.Aimar (notes), D.Barberis, I.Bird, N.Brook, F.Carminati, L.Dell’Agnello, F.Donno, M.Ernst, X.Espinal, I.Fisk, S.Foffano, D.Foster, J.Gordon, C.Grandi, F.Hernandez, J.Knobloch, M.Lamanna, E.Laure, H.Marten, P.Mato, H.Meinhard, P.McBride, B.Panzer, R.Pordes, L.Robertson (chair), J.Shiers, O.Smirnova, R.Tafirout, J.Templon
Mailing List Archive:
Tuesday 12 June 2007 16:00-17:00 - Phone Meeting
1. Minutes and Matters arising (Minutes)
1.1 Minutes of Previous Meeting
The minutes of the previous MB meeting were distributed only on Tuesday.
Note: No comments received during the week.
1.2 Site Reliability Data - May 2007 (Reliability Data)
During May 2007 the Reliability Reports were collected at the Operations meeting. Not all sites reported their down times in their weekly reports. A.Aimar will distribute the reports to the MB members so that they can complete them and mail them back within a week. Each MB representative should insist that their Operations representative fill in the weekly reports more completely.
12 Jun 2007 - MB Members complete Site Reliability Reports for May 2007 and send them back to A.Aimar.
1.3 Job Reliability Data (Slides)
The procedure to generate the reports is in place. Every month the job reliability reports show the efficiency as seen by the users, without taking into account the job attempts. The Job Reliability Reports are always available on the LCG Planning page (http://cern.ch/lcg/planning). ATLAS jobs, those
2. Action List Review (List of actions)
Actions that are late are highlighted in RED.
Done later in this meeting.
Not done. Will be done in the next couple of weeks.
Postponed to 31 July 2007.
3. Benchmarking Conclusions (Slides) - M.Alef
M.Alef presented the follow-up to the presentation given in March 2007 and proposed the next steps and the conclusions on how to benchmark CPU for the LCG procurement activities in 2007 (waiting for the conclusions of the HEPiX working group on benchmarking).
Summary of the 20 March presentation:
At present, GridKA runs SPEC CPU2000 in the real environment used by the VOs at FZK:
- Run on SL (3 or 4)
- gcc 3.4.x -O3 -funroll-loops -march
- One copy of the benchmark per core, running simultaneously
- Using a scaling factor: 1.25
The SPEC CPU2000 benchmark suite was retired in February 2007 and replaced by the CPU2006 suite. In the longer term the HEPiX benchmarking working group should investigate whether the SPEC CPU2006 suite is appropriate for measuring the relative performance of HEP codes on different processors and provide a "cookbook" of how to do this.
The short-term proposal is contained in the note distributed to the MB "Proposal for an interim CPU capacity metric for WLCG" (Link, CERN NICE login required) by L.Robertson, H.Meinhard and M.Alef. This uses the methodology of GridKA (i.e. run one copy of the SPEC CPU2000 code per core), uses the compiler flags as specified by the Architects Forum (-O2 -pthread -fPIC), with a scaling factor of 1.5 (reflecting the lower level of compiler optimization compared with the current GridKA benchmark test).
A HOWTO section with benchmark script and configuration files is available on the HEPiX pages: see http://hepix.caspur.it/processors/ .
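The arithmetic of the interim metric proposed above can be sketched as follows. This is only an illustration: the scaling factor of 1.5 comes from the proposal, while the function names, the example scores and the assumption that per-core results are simply summed per box are hypothetical and not taken from the note.

```python
# Illustrative sketch of the interim capacity metric.
# ASSUMPTIONS (not from the note): per-core results are summed per box,
# and all names and example numbers below are hypothetical.

SCALING_FACTOR = 1.5  # value given in the interim proposal


def box_capacity(per_core_scores, scaling_factor=SCALING_FACTOR):
    """Capacity of one box: scaled sum of the per-core benchmark results.

    per_core_scores holds one SPEC CPU2000 result per core, obtained by
    running one copy of the benchmark on every core simultaneously.
    """
    return scaling_factor * sum(per_core_scores)


def site_capacity(boxes, scaling_factor=SCALING_FACTOR):
    """Capacity of a site: sum over all boxes."""
    return sum(box_capacity(scores, scaling_factor) for scores in boxes)


if __name__ == "__main__":
    # Hypothetical quad-core box with made-up per-core results.
    example_box = [1000.0, 990.0, 1010.0, 1000.0]
    print(box_capacity(example_box))
    print(site_capacity([example_box, example_box]))
```

The same functions reproduce the current GridKA convention by passing scaling_factor=1.25.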
Some questions followed the proposal via email.
Q: Is the SPECint_rate metric more appropriate than SPECint2000?
A: For SPEC CINT2000 it is not suitable, because the numbers differ by a factor of around 80 with multi-core hardware compared with the results obtained by running one copy of CINT2000 per core. With SPEC CINT2006 the difference is only about 1% when using multi-core machines.
Q: Power consumption is a more significant consideration when comparing processors than CPU performance.
A: This is an important purchasing consideration and talks and discussions took place at the last HEPiX conference. Some sites have defined penalty points in the tenders for hardware requiring higher power consumption.
Q: "The compiler flags used will be specified from time to time by the Architects Forum ...". Is it not a problem if the compiler flags are continually changing?
A: The sites and the AF should stay in contact and always discuss compiler updates. Using recent compilers allows better use of the hardware provided by the sites, as J.Templon commented at the MB of 20 March: "The VOs should move to recent compilers in order to better use the hardware that the sites provide..."
- Moving from gcc 3.4.x to 4.x.x: + 5...10%
- Moving from gcc to Intel or PGI compilers: + 10...20%
- Moving from dynamic to static linking: + 7% (Intel, x86_64), +30% (Opteron, i386)
However, to be useful, the benchmark must be run with the parameters used by the experiments.
L.Robertson noted that the discussion is taking place because a realistic normalised way to compare acquisitions at the different sites is needed, in the period until HEPiX will have defined which standards to use in the longer-term.
J.Templon proposed using SPECint2006 directly if the difference is only 1%. H.Meinhard replied that the 1% refers to the difference between SPECint2006_rate on dual-core and quad-core hardware and the result of running one copy of CINT2006 per core. It is not the difference between SPECint2006 and the relative performance of HEP applications on different processors. The difference with HEP applications, and how they scale on multi-core hardware, still needs to be evaluated.
H.Meinhard said that the HEPiX benchmarking group will not manage to provide a report before the end of 2007, or later.
L.Dell’Agnello reported that INFN is running tests comparing SPECint2006 with the performance of the LHC experiments’ software on several platforms (Intel, AMD, single and multi-core machines). The work will continue during the month of June. If the results are meaningful they will be presented to the HEPiX working group and to the MB.
D.Barberis noted that the LHC Experiments have the applications ready; therefore, if the systems are made available, they can easily benchmark them and find the scaling factors of their applications on multi-core hardware.
F.Carminati agreed that ALICE also can run their benchmarks if they are given access to the platforms.
A discussion followed about the purpose of the “HEP scaling factor”. The experiment requirements were estimated in 2005, at the time of writing the TDRs, using the processors available at that time, and were converted to SPECint2000 units using the numbers published by the manufacturers at that time. When the benchmark was run on the same processors using the Karlsruhe process, with the gcc compiler and HEP optimisation levels, it gave a lower result, which had to be increased by 50% to equal the manufacturer’s number. The manufacturers’ measures have continued to increase relative to those measured by the Karlsruhe process. Using the manufacturers’ measures to assess capacity will therefore provide less real capacity than is expected by the experiments. Using the
10 July 2007 - INFN will send a summary of their findings about HEP applications benchmarking to the MB.
The MB agreed that, until the HEPiX benchmarks are defined, the proposal distributed by L.Robertson is accepted.
4. Accounting Grid and Non-Grid Submitted Jobs (Slides) - J.Gordon
When the MB agreed to move to APEL for grid accounting the MB had asked for a proposal on how to account for non-grid usage too. J.Gordon presented his proposal.
APEL sensors identify grid usage by matching entries in the gatekeeper log, messages, and batch system logs. Only grid usage is reported to the central repository.
Systems which instead use their local accounting either:
- Report all work
- Or can choose whether to report Grid or all.
- But cannot report grid and non-grid usage separately.
When APEL selects grid jobs it could also log the total CPU etc. for jobs which do not match any DN, on the assumption that this is non-grid work.
The proposal is that non-grid usage would be reported only “per Unix group, per site, per month” and stored in a parallel table, not in the GOC database tables. The APEL Portal would then show similar views to the grid use and/or compare with Grid use.
This will be without the VO identification because the mapping between local site Unix groups and VOs is not available to APEL.
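The proposed summarisation can be sketched as follows. This is a minimal illustration, not APEL code: the record layout, the field names and the function name are hypothetical, and the only rule taken from the proposal is that jobs matching no DN are treated as non-grid and reported per Unix group, per site, per month.

```python
from collections import defaultdict

# Illustrative sketch only; this is NOT the APEL implementation.
# ASSUMPTION: each job record is a dict with the hypothetical keys
# "site", "unix_group", "month", "user_dn" and "cpu_seconds".


def summarise_non_grid(job_records):
    """Total the jobs and CPU of records that match no DN.

    Records without a DN are assumed to be non-grid work and are
    aggregated per (site, Unix group, month); no VO is attached.
    """
    totals = defaultdict(lambda: {"jobs": 0, "cpu_seconds": 0})
    for job in job_records:
        if job.get("user_dn"):
            continue  # matched a DN: grid job, reported through the usual path
        key = (job["site"], job["unix_group"], job["month"])
        totals[key]["jobs"] += 1
        totals[key]["cpu_seconds"] += job["cpu_seconds"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up records for a hypothetical site.
    records = [
        {"site": "SITE-A", "unix_group": "atlas", "month": "2007-05",
         "user_dn": "/DC=example/CN=Some User", "cpu_seconds": 100},
        {"site": "SITE-A", "unix_group": "atlas", "month": "2007-05",
         "user_dn": None, "cpu_seconds": 40},
        {"site": "SITE-A", "unix_group": "atlas", "month": "2007-05",
         "user_dn": None, "cpu_seconds": 60},
    ]
    for key, summary in summarise_non_grid(records).items():
        print(key, summary)
```

Keeping these totals in a structure separate from the grid records mirrors the idea of a parallel table that leaves the existing GOC database tables unchanged.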
Are sites happy to publish the non-grid use externally from their site? Or is this an invasion of privacy?
It is not obvious how to restrict publishing to LHC use only.
It will take some effort to extend APEL but not an excessive amount if the GOC database is not modified:
- Current priorities are UserDN, FQAN, GOCDB3 migration
- Could be completed this year
Is all this worth it? Or will non-grid use die away?
C.Grandi clarified that DGAS is now working at INFN, which uses the DGAS-to-APEL application to publish the information in APEL. J.Gordon replied that for now it does not seem to work fully. C.Grandi added that DGAS can associate local user accounts with VOs if that is useful.
J.Gordon added that storing local information in APEL would require a significant change to the job record in the GOC database. The sites currently submit only a monthly total and the number of jobs, not each individual job. The proposal is to use a separate parallel tree in order not to change the GOC tables.
F.Hernandez stated that currently CC-IN2P3 is sending to the APEL repository the accounting information for both grid and non-grid jobs. However, the accounting web portal doesn't allow you to distinguish between those 2 usages of the site because in the APEL database schema there is no field for storing the type of job. As CC-IN2P3 has its own batch system it is not using APEL directly. The BQS accounting database is queried to build the job records to be sent to the APEL repository (through RGMA) in the appropriate schema. Records for non-grid jobs sent to the APEL repository contain the VO name but not the DN of the submitter because it is an attribute relevant for grid jobs only.
I.Bird and T.Cass proposed instead to modify the job record by adding a field that records how a job was submitted, with the site providing the VO that corresponds to the local Unix group.
D.Barberis added that this feature is needed because the experiments want to know the amount of work that is grid and non-grid at each site, and also to identify the specific user at the site, if needed.
T.Cass added that CERN is currently not publishing its non-grid usage, which is 75% of the usage at CERN.
I.Bird noted that it is essential for EGEE to be able to distinguish between grid and non-grid usage.
The MB would like this issue investigated further. It is necessary to distinguish between grid and non-grid work, and non-grid work should be reported with as much information as possible (including the VO and the user identity), as needed by the experiments.
The meeting had to end at this point, and the decision should be made at the next MB.
5. SRM 2.2 Update (Slides) - F.Donno
F.Donno provided the monthly update on the SRM 2.2 implementations.
5.1 General Status
The general SRM 2.2 status is:
- At the moment 18 endpoints are under S2 tests, running the latest release of the SRM v2.2 implementation for 5 flavours: CASTOR, dCache, DPM, StoRM, and BeStMan.
- They include both “development” endpoints (where the main development takes place) and deployment-only endpoints.
- The deployment endpoints allowed discovering several configuration issues.
- These are mostly test instances.
Slides 3 to 6 show the current status of the tests.
Most issues are solved; also the dCache configuration issues mentioned in slide 6 are being solved.
5.2 Specific Implementations
Note: Due to lack of time, F.Donno did not present the following slides in detail. Their content is reproduced here as is.
- Static space reservation manually created. Need to reserve dteam special space tokens for SAM tests.
- Pins are not honored. VOs need to negotiate with the sites the size of the disk buffers and tuning of the garbage collector.
- Tape0Disk1 storage class implemented at the moment by switching off the garbage collector. However, CASTOR cannot handle gracefully the situation where the disk fills up. Requests are kept in queue waiting for the space to be freed up by the garbage collector. Disk1 storage issues will be probably addressed by the end of this year.
- Very slow PutDone. This has stopped us from performing real stress tests on the CASTOR instance. It can crash the server or make it unresponsive. A fix for this problem is available in latest releases and will be deployed in production in the coming months.
- No quota provided for the moment. Since the space is statically partitioned between VOs, it is the partition that guarantees that a user does not exceed the allocated space.
- No support for VOMS groups/roles ACLs at the moment.
- Many use cases fixed. Still some minor issues.
- To create a file in SRM you need to execute srmPrepareToPut, transfer, and srmPutDone. The SRM protocol establishes that the existence of the SURL starts when the srmPrepareToPut operation is completed. For dCache the SURL exists only when file transfer is initiated. This has consequences on the clients that have to deal with this situation. Two concurrent srmPrepareToPut succeed, but the associated transfers are then synchronized, one of them failing. This avoids the need to explicitly abort Put requests for transfers that have not been initiated.
- dCache does not clean up after failed operations. It is up to the client to issue an explicit srmAbort [File] Request and remove the corresponding SURLs before retrying. Other clients do not behave the same (see later). However, this behaviour is compliant with the SRM protocol specification.
- Different interpretation of space reservation in case of Tape1Disk0 storage class: the space dynamically reserved by the user is decreased by the size of the files migrated to tape. Users have to re-issue a space reservation request if more space is needed. Other implementations use the reserved space as a window for streaming files to tape. An upcoming dCache update will support this as well.
- Soft quota provided for the moment. Transfers are controlled so that the user cannot write new files in a full space. However transfers initiated when space was available are not aborted if space is insufficient (if gridftp is used even these transfers are aborted if the space allocation is exceeded). No quota for space reservation: users can reserve as much space as they want if the space is available.
- No support for VOMS groups/roles ACLs at the moment. Foreseen by the end of 2007.
- SURL created at the completion of srmPrepareToPut operation. However, two concurrent srmPrepareToPut with the overwrite flag enabled succeed. ATTENTION: The implementation does not flag the user that another concurrent srmPrepareToPut is being executed. A second srmPrepareToPut with overwrite flag set automatically aborts the first one. This has been done to avoid the need to explicitly abort Put requests and to remove the corresponding SURL for transfers that have not been initiated.
- DPM v1.6.5 does provide garbage collector for expired space. However, this release is not yet in certification. Sites installing DPM v1.6.4 should be careful.
- Soft quota provided for the moment. Transfers are controlled so that the user cannot write new files in a full space. However transfers initiated when space was available are not aborted if space becomes insufficient. No quota for space reservation: users can reserve as much space as they want if the space is available. This will be provided as soon as accounting will be implemented in DPM.
- Space tokens are strictly connected to paths specific per VO. The SRM protocol establishes that only the space token determines the storage class and there is no connection between space tokens and the SRM namespace (for some implementations the path can determine the tape set where the files must reside). In StoRM, the path does determine the storage class at the moment.
- Space reservation is guaranteed. For other implementations space reservation is on a best-effort basis: a user can request a given amount of space. However, if another user on the same system exceeds their reserved space, the first user may not be able to benefit from the space reserved. StoRM instead pre-allocates the blocks at the file system level, therefore the space reservation is guaranteed.
- Quota available through the underlying file system: there is no direct support for quota in StoRM. If the file system supports it then sys admins can decide to enable it. However, for GPFS, some time ago it was observed that enabling quota slows down the performance.
- Simple security mechanism: at the moment it is possible for a member of LHCb to write in the ATLAS space.
- PERMANENT files cannot live in volatile space. The SRM protocol specifies that permanent files created in volatile space should continue to live in a default space when the volatile space expires (as per decisions taken during the WLCG workshop in January and various discussions on e-mail).
- Quota available through the underlying file system: there is no direct support for quota in BeStMan. If the file system supports it then sys-admins can decide to enable it.
- No support for ACLs based on VOMS groups and roles.
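The put sequence and the client-side cleanup described in the bullets above (srmPrepareToPut, transfer, srmPutDone, with an explicit abort and SURL removal before a retry) can be illustrated with a toy model. This is a sketch only: ToyEndpoint is a hypothetical in-memory stand-in, not a real SRM client library, and its method names merely echo the operations named in the slides.

```python
# Toy model of the put sequence; NOT a real SRM client API.
# ASSUMPTION: every class, method and URL here is hypothetical.


class ToyEndpoint:
    """In-memory stand-in for an endpoint that does not clean up
    after failed operations."""

    def __init__(self, fail_first_transfers=0):
        self.surls = set()       # SURLs that currently exist
        self.pending = {}        # request token -> SURL
        self._next_token = 0
        self._failures_left = fail_first_transfers

    def prepare_to_put(self, surl):
        if surl in self.surls:
            raise FileExistsError(surl)
        self.surls.add(surl)     # here the SURL exists once the call completes
        self._next_token += 1
        self.pending[self._next_token] = surl
        return self._next_token

    def transfer(self, token, data):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise OSError("simulated transfer failure")

    def put_done(self, token):
        del self.pending[token]

    def abort_request(self, token):
        self.pending.pop(token, None)

    def remove_surl(self, surl):
        self.surls.discard(surl)


def put_with_cleanup(endpoint, surl, data, attempts=3):
    """Run prepare/transfer/done; on failure abort the request and
    remove the SURL explicitly before retrying."""
    for attempt in range(1, attempts + 1):
        token = endpoint.prepare_to_put(surl)
        try:
            endpoint.transfer(token, data)
            endpoint.put_done(token)
            return attempt       # number of attempts that were needed
        except OSError:
            endpoint.abort_request(token)
            endpoint.remove_surl(surl)
    raise RuntimeError("put failed after %d attempts" % attempts)


if __name__ == "__main__":
    endpoint = ToyEndpoint(fail_first_transfers=1)
    print(put_with_cleanup(endpoint, "srm://example.org/data/file1", b"payload"))
```

Without the two cleanup calls in the except branch, the retry in this model would fail with FileExistsError, which is the situation the slides ask clients to handle.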
The issues currently being discussed in the GSSD group are Quota and VOMS ACLs.
A summary will be presented at the GDB. The implementation by DPM (already available) will be compared with the plans of the other implementations and with the requirements of the Experiments.
The next steps are:
- ATLAS performing FTS 2.0 transfers from CASTOR Production SRM v1 to various SRM v2 endpoints (dCache, DPM, StoRM) with REPLICA-ONLINE. FZK will probably make available CUSTODIAL-NEARLINE.
- LHCb interested in tests with StoRM (EIS).
- A few minor bugs found in FTS 2.0 for SRM v2 will be fixed.
The deployment endpoints allowed the testing team to find configuration issues (BNL, IN2P3, FZK, NDGF, and DESY not yet ready).
The experiments are reluctant to use the PPS, but ATLAS and LHCb agreed to use a mixed system in order to try the SRM 2.2 installations.
Collision of symbols when using both libshift.so and libdpm.so from ROOT and GFAL: a proposal is under study to solve this problem in the short to medium term. Tests will be executed and a report made available (check “Transparent data access”).
F.Carminati noted that the issues are due to the conflict between DPM and CASTOR.
F.Donno replied that this issue will not allow an application to use both DPM and CASTOR. But the issue she was referring to is the conflict between ROOT and GFAL: the version of libshift linked in ROOT is not the same as the one used in GFAL. Somehow they should use the same version.
H.Marten stated his worry about the fact that:
- the tests on SRM are still going on and
- the SRM 2.2 is not deployable yet even if
- the Experiments FDRs are scheduled for July.
What is the schedule for installing SRM 2.2 in Production? What will the sites that are not installing in PPS do?
L.Robertson replied that for the moment the sites should continue running SRM 1.1 in Production until SRM 2.2 is certified and tested by the experiments. The Experiments should also warn about changes in their schedules.
If the Experiments do not test it, CASTOR SRM 2.2 will go into production at the end of June 2007 so that it can be tuned during the summer.
6. Presentations Moved to the GDB on the Following Day
- SAM Status Update (Slides) - P.Nyczyk
- CASTOR Status Update (Slides) - B.Panzer
7. Summary of New Actions
The full Action List, current and past items, will be in this wiki page before next MB meeting.