LCG Management Board

Date/Time:

Tuesday 19 December 2006 at 16:00

Agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=a063279

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 2.1.2007)

Participants:

A.Aimar (notes), D.Barberis, T.Cass, Ph.Charpentier, B.Gibbard, J.Gordon, M.Lamanna, M.Kasemann, J.Knobloch, H.Marten, G.Merino, R.Pordes, L.Robertson (chair), J.Shiers

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Next Meeting:

Tuesday 9 January 2007 - 16:00-18:00 – Face-to-face Meeting

1.      Minutes and Matters arising (minutes)

 

1.1         Minutes of Previous Meeting

No comments. Minutes approved.

1.2         Documents Distributed at the MB

-          Storage Classes working group mandate (Working Group Mandate) - K.Bos
Feedback from the MB to K.Bos before the next MB meeting.

-          SL3 to SL4 Migration (Email) - T.Cass

 

2.      Action List Review (list of actions)

Actions that are late are highlighted in RED.

 

  • 25 Nov 06 - Sites should send to H.Renshall their procurement plans by end of next week.

Not done. The values from H.Renshall were distributed two weeks ago; therefore the procurement plans from the sites are due.

  • 19 December - J.Shiers and H.Renshall will report on the progress on the definition of targets and milestones for 2007 at the LCG ECM meeting.

On the way. Initial input from experiments received. A first proposal of the 2007 targets should be available early January 2007.

  • 19 Dec 2006 - K.Bos should distribute to the MB list the mandate, and participation, of the Storage Classes working group.

Done. Proposal distributed. Feedback from the MB is expected before the next MB meeting.

  • 19 Dec 2006 - The proposal to ALICE is to consider, as in the TDR, a value of 10**6 for the ALICE ion runs. L.Betev agreed that ALICE should confirm it within a week.

ALICE was not represented at today’s MB meeting.

 

3.      Update on Accounting (Presentation, Document) – J.Gordon

 

J.Gordon presented a summary of the status of the sites accounting activities, including tools and open issues.

 

The Document attached is a written summary of the Presentation.

3.1         Current Reporting

Currently the reporting is done manually. At the end of 2006 all Tier-1 sites could publish their data for grid-submitted work using automatic accounting, although different accounting systems are deployed at most sites; there is no news from NDGF about it.

The current site accounting reports include: “per VO and per site: normalised CPU time, wall clock time (for both grid and non-grid job submissions), disk allocated and used, and tape used”.

3.2         APEL Portal

The APEL Accounting Portal at RAL has been storing CPU accounting results for WLCG for more than a year.

 

CESGA has taken over development of the various reports so the EGEE support is now definitive. CESGA also monitors which EGEE sites are publishing and raises trouble tickets in GGUS when a site fails to publish for 30 days.
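
The 30-day rule above can be illustrated with a minimal sketch. The function name, the input format (a mapping from site name to the date of its last published record) and the site names are all hypothetical; the minutes do not describe how CESGA actually implements the check, nor how tickets are opened in GGUS.

```python
from datetime import date, timedelta

# Threshold taken from the minutes: a ticket is raised after 30 days without publishing.
STALE_AFTER = timedelta(days=30)


def sites_needing_ticket(last_published, today):
    """Return the sorted names of sites that have not published for 30 days or more.

    last_published: dict mapping site name -> date of the most recent
    accounting record, or None if the site has never published
    (hypothetical input format, for illustration only).
    """
    stale = []
    for site, last in last_published.items():
        if last is None or today - last >= STALE_AFTER:
            stale.append(site)
    return sorted(stale)


# Made-up example data; the site names are placeholders.
example = {
    "SITE-A": date(2006, 12, 18),
    "SITE-B": date(2006, 11, 1),
    "SITE-C": None,
}
print(sites_needing_ticket(example, date(2006, 12, 19)))
```

In this sketch a site that has never published is treated the same as a stale site; whether the real monitoring does so is not stated in the minutes.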

 

Manual checking of published results has revealed some gaps in the data collected; therefore for now the data needs to be compared with the manual values. Automatic SAM tests for APEL are under development to compare the results stored locally in the RGMA MON box with the central data.
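
The kind of comparison such a test could make is sketched below, assuming that both the local MON box and the central repository can be summarised as a number of job records per day. The function name and the data layout are assumptions made for illustration; the actual SAM test was still under development and its logic is not described in the minutes.

```python
def find_publishing_gaps(local_counts, central_counts):
    """Compare per-day record counts and return the days on which they differ.

    Both arguments map an ISO date string to a number of job records
    (hypothetical layout). The result maps each mismatching day to a
    (local, central) pair; a day missing on one side counts as 0.
    """
    gaps = {}
    for day in sorted(set(local_counts) | set(central_counts)):
        local = local_counts.get(day, 0)
        central = central_counts.get(day, 0)
        if local != central:
            gaps[day] = (local, central)
    return gaps


# Made-up numbers: one day missing centrally, one day only partially published.
local = {"2006-12-01": 120, "2006-12-02": 95, "2006-12-03": 110}
central = {"2006-12-01": 120, "2006-12-03": 100}
for day, (n_local, n_central) in find_publishing_gaps(local, central).items():
    print(day, "local:", n_local, "central:", n_central)
```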

 

Slide 6 (“Accounting Portal”) shows a chart of how a site publishes accounting data over time. The second graph shows the CPU SpecInt published as a function of time. In the example there appears to be no data from CERN for a couple of months (May-June), but this needs to be investigated.

3.3         APEL Sensors

APEL2 was released in production in gLite 3.0.2 Update 10. gLite 3.0.2u10 also contains patches to the gLite CE to correct errors in the Blah accounting log. The APEL sensors now work correctly with the gLite CE.

 

The main new features are:

-          More reliable publisher which can handle TCP connection timeouts with the archiver (was an old bug).

-          Encryption of UserDN using a 1024-bit RSA key. APEL is now ready for user-level accounting.

-          Support for the Blah accounting file on the gLite CE.

 

Not all sites report CPU accounting via the APEL sensors; some use other systems (DGAS, GRATIA, etc.). Some interrogate their own site accounting databases and publish directly using R-GMA. A tutorial on how to do so is available at 

http://goc.grid-support.ac.uk/gridsite/accounting/faq.html

3.4         Other Sensors

DGAS

INFN uses DGAS to collect accounting information and stores it in its own repositories (HLR) for each site. A new development, DGAS2APEL, takes information from the site HLR and publishes it via R-GMA into the APEL repository. This is deployed in production at 3 INFN sites and usage records are being successfully transferred to the central APEL repository.

 

R.Pordes added that OSG, for its Tier-2 sites, will also not use the APEL sensors but GRATIA, and will publish the data directly into the APEL repository.

3.5         User Level Accounting

Now APEL2 encrypts the user DN in the Usage Record. When a site switches on “external user publishing” the encrypted DN is sent to the central repository where it is decrypted to allow aggregation, and then re-encrypted.

 

No user DN information will be made available until the relevant policy documents are in place, approved, and signed by the relevant individuals.

 

A prototype portal has been developed to show information to the roles identified (see GDB talks from October and December). See also slides 10-14.

3.6         Storage Accounting

GridPP in the UK has developed storage accounting using values published in GLUE and harvesting them from the BDII. The results are published and summarised in the same way as for CPU, and some example visualisations (by CESGA) are shown, using data from GridPP sites. A roadmap exists for further development of the portal: http://goc02.grid-support.ac.uk/accountingDisplay/

 

This storage accounting has recently been extended to all EGEE sites. OSG are developing their own solution.

 

One of the difficulties can be to obtain detailed data from small sites that share their SEs among several VOs.

 

Visualisation of Storage Used per VO for Disk and Tape: http://goc02.grid-support.ac.uk/accountingDisplay/view.php

-          Select Resources via a Tree

-          Select time interval (last year, last month, last week, last day)

 

Slide 17, for instance, shows:

-          Data for RAL-LCG2, the UK Tier-2 sites.

-          Storage units are 1TB = 10^6 MB

-          Tape Used + Disk Used = Total
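
The unit convention and the total shown on the slide can be written out as a small sketch. The function name and the sample figures are made up for illustration; only the conversion factor (1 TB = 10^6 MB) and the relation Tape Used + Disk Used = Total come from the slide.

```python
# Conversion factor from the slide: 1 TB = 10^6 MB (decimal units).
MB_PER_TB = 10**6


def storage_summary_tb(disk_used_mb, tape_used_mb):
    """Convert disk and tape usage from MB to TB and add the total.

    Total is defined as Tape Used + Disk Used, as on the slide.
    """
    disk_tb = disk_used_mb / MB_PER_TB
    tape_tb = tape_used_mb / MB_PER_TB
    return {"disk_tb": disk_tb, "tape_tb": tape_tb, "total_tb": disk_tb + tape_tb}


# Made-up usage figures for one VO at one site.
print(storage_summary_tb(disk_used_mb=2_500_000, tape_used_mb=4_000_000))
```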

3.7         Issues on Accounting

3.7.1          CPU Reporting:

-          Reporting should be extended to Tier-2 sites.
The current web reports organise the sites by country and there is no identification of the Tier-2 sites in the database.
In addition, the “Tier-1 and Tier-2 mapping” depends on the VO and is not in the database.

 

Issue to follow:

The accounting reports required for the LCG need to be discussed and agreed.

 

-          Local versus GRID submission needs to be clarified.
Non-grid work is a significant fraction at some sites, but only the grid submissions are handled by APEL.

 

Issue to follow:

The issues on local vs grid-submitted CPU accounting should be clarified at next MB meeting in January.

 

-          The correctness and completeness of the data need manual checking, to verify that there are no bugs in the algorithms and that the “CPU normalization to SpecInt” uses the correct parameters for all sites.
A SAM test is being developed to check that a site is publishing accounting.
For a few months we should continue with both manual and automated reporting, in order to compare the results.

-          CPU versus wall clock.

-          How many accounting solutions do we need?
APEL and DGAS each do different tasks better, and the two accounting systems are being developed in parallel; the values are then all sent to the same APEL repository. This works for the LCG and is more of an EGEE issue; therefore the topic was not discussed further.

-          Use of VOMS.
Roles and groups are available and should be used by the VOs and sites.
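
As an illustration of why the “CPU normalization to SpecInt” parameters matter in the issues above, the sketch below scales raw CPU time by a per-site SpecInt rating. The reference value of 1000 SpecInt2000 units, the function name and the sample ratings are assumptions; the minutes do not state the exact formula or the parameters used by the sites.

```python
# Assumed reference power: 1000 SpecInt2000 units (not stated in the minutes).
REFERENCE_SI2K = 1000.0


def normalised_cpu_hours(cpu_seconds, si2k_rating):
    """Scale raw CPU time to hours on a reference machine.

    cpu_seconds: raw CPU time consumed by the job(s).
    si2k_rating: SpecInt2000 rating of the worker nodes (a site parameter).
    """
    if si2k_rating <= 0:
        raise ValueError("SpecInt rating must be positive")
    return cpu_seconds / 3600.0 * si2k_rating / REFERENCE_SI2K


# The same 2 hours of raw CPU time on two differently rated sites.
print(normalised_cpu_hours(7200, 1500))  # faster nodes
print(normalised_cpu_hours(7200, 800))   # slower nodes
```

Under this assumption a wrong rating at a single site scales all of its reported CPU time by the same factor, which is the kind of error that comparing manual and automated reports for a few months is meant to catch.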

3.7.2          User-Level Accounting:

-          Sites to deploy gLite 3.0.2u10 and start publishing encrypted DNs.
But the user information will not be distributed until the policies are approved.

-          The relevant policies need to be formulated and approved.
The security policy documents will be distributed by D.Kelsey for discussion and approval by the GDB in January or February 2007.

-          Feedback on the user-level reporting suggested at December GDB.

3.7.3          Storage-Level Accounting:

-          GLUE1.3 introduces new SE reporting concepts.
Are they sufficient for storage accounting?
Can they be implemented across all SEs?

-          Can one ever account “shared space” on SEs correctly?

3.8         Next Steps

-          Introduce T2 reporting using the APEL repository for CPU (now) and for storage (soon)

-          Sites to check the data being published for storage

-          Roll out DGAS2APEL across INFN so that information from Italy is collected centrally.

-          Check results from Storage Accounting, and develop information providers further

-          NDGF should provide an automated reporting solution.

-          Roll out user-level accounting while the policies are being approved.

 

Action:

22 Dec 2006 - Distribute a proposal on next steps on accounting, for approval at next MB face-to-face meeting.

 

4.      Update on Targets and Milestones for 2007 (Paper) – J.Shiers

 

J.Shiers presented the targets and milestones for 2007 as discussed with the ECM representatives of the experiments.

 

The Paper attached is in its 0.5 draft version.

The milestones and targets are not organised by quarter, but roughly each quarter represents:

-          Q1: Completion of the SC activities

-          Q2: Preparation for the full dress rehearsals (FDRs)

-          Q3: Execution of the FDRs

-          Q4: Machine run

 

The document was not discussed in detail but is distributed for the MB members to send their comments before next MB Meeting. 

 

However, some targets seem “overambitious” and should be discussed further (e.g. the ALICE heavy ion data rates for 2007, the ATLAS milestones on the DB phase 2 sites, and reaching “nominal rates” already in April 2007).

 

J.Shiers noted that the sites should be able to extract their milestones and targets from the existing goals. As an example he showed the Appendix of the document, where CMS measures the Service Metrics of each site during their activities.

Most sites were absent; therefore the discussion should take place in a future MB meeting.

 

J.Gordon noted that these metrics are very VO-related and that the Tier-1 sites will not know the reasons for the site failures. J.Shiers agreed that for each metric there should be a clear measurement for each Tier-1 site. This also needs to be clarified.

 

M.Kasemann noted that the CMS planning is being re-assessed and will be available only by the end of January 2007. He therefore reminded the MB that the timing of the CMS milestones could change. For example, MTCC3 will not take place in April 2007 but in the last month before the start of data-taking (likely October 2007).

 

L.Robertson noted that, as well as short “difficult-to-repeat” high-rate periods, it is also important to have lower target data rates sustained for longer periods. In this way the impact at the sites of overlapping usage by experiments will become apparent. He also added that for each experiment there should be a couple of metrics to measure a site. The sites need to have “simple” site-specific objectives that they have to reach.

 

J.Shiers concluded that the MB meeting of 9 January seems too early to converge on the 2007 targets and milestones, but that meeting could be used to advance the discussion and receive more information, at least for Q1 and Q2 2007.

 

 

5.      Long Term Planning for the CERN Computing Centre - T.Cass, L.Robertson

 

 

Postponed to next MB Meeting.

 

 

6.      AOB

 

 

L.Robertson thanked the MB members for the work and the progress made in 2006 and wished everybody a happy 2007.

 

7.      Summary of New Actions 

 

 

The full Action List, current and past items, will be in this wiki page before next MB meeting.

 

Action:

22 Dec 2006 - Distribute a proposal on next steps on accounting, for approval at next MB face-to-face meeting.

 

Issue to follow:

The accounting reports required for the LCG need to be discussed and agreed.

 

Issue to follow:

The issues on local vs grid-submitted CPU accounting should be clarified at next MB meeting in January.