LCG Management Board

Date/Time:

Tuesday 6 November 2007 16:00-18:00 – F2F Meeting

Agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=22185

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 - 13.11.2007)

Participants:

A.Aimar (notes), D.Barberis, O.Barring, I.Bird, Ph.Charpentier, L.Dell’Agnello, T.Doyle, M.Ernst, S.Foffano, J.Gordon, C.Grandi, A.Heiss, F.Hernandez, J.Knobloch, M.Lamanna, E.Laure, S.Lin, U.Marconi, G.Merino, B.Panzer, D.Petravick, L.Robertson (chair), Y.Schutz, J.Shiers, O.Smirnova, R.Tafirout

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Mailing List Archive:

https://mmm.cern.ch/public/archive-list/w/worldwide-lcg-management-board/

Next Meeting:

Tuesday 20 November 2007 16:00-17:00 – Phone Meeting

1.    Minutes and Matters arising (Minutes)

 

1.1      Minutes of Previous Meeting

The minutes of the previous MB meeting were approved.

1.2      Update on Site Names (Sites Names)

All names are now agreed and will progressively be used in all reports and tables.

 

The agreed names are shown in the table below.

 

WLCG Tier-0 and Tier-1 Site Names
12.11.2007

GOCDB Id          Site Name (in alphabetical order)
--------------    ---------------------------------
TRIUMF-LCG2       CA-TRIUMF
CERN-PROD         CERN *
FZK-LCG2          DE-KIT
pic               ES-PIC
IN2P3-CC          FR-CCIN2P3
INFN-T1           IT-INFN-CNAF
NDGF-T1           NDGF *
SARA-MATRIX       NL-T1
Taiwan-LCG2       TW-ASGC
RAL-LCG2          UK-T1-RAL
USCMS-FNAL-WC1    US-FNAL-CMS
BNL-LCG2          US-T1-BNL

(*) No ISO Country Code for CERN and NDGF.

 

1.3      Sites Reliability and Job Efficiency Reports for October 2007 (SR and JE Tables)

A.Aimar will write to the sites asking them to complete the site reliability reports that are still incomplete after the weekly Operations reports.

 

Update 13.11.2007:  the JE Tables for ATLAS Ganga seem incorrect.

 

2.    Action List Review (List of actions)

Actions that are late are highlighted in RED.

·         D.Barberis agreed to clarify with the Reviewers the kind of presentations and demos that they are expecting from the Experiments at the Comprehensive Review.

Done two weeks ago.

 

·         6 November 2007 - J.Gordon and A.Aimar will send a proposal to the MB for an agreement on the acceptance of pilot jobs and glexec.

Done, by L.Robertson

 

·         21 October 2007 - Sites should send to H.Renshall their resource acquisition plans for CPU, disk and tape until April 2008.

Not done.
The only Sites that have sent their acquisition plans are: TW-ASGC, US-T1-BNL, DE-KIT and FR-CCIN2P3. The others should send them to H.Renshall.

 

1.    SRM 2.2 Weekly Update - J.Shiers

 

 

SRM 2.2 Deployment

The process of upgrading sites to SRM v2.2 in production has started:

-       The first site - NDGF - is now running SRM v2.2 in production.

-       FZK is in the process of upgrading.

-       CASTOR2 SRM 2.2 services are being set up this week.

 

LHCb SRM Testing

LHCb testing was presented at today's GSSD.

 

The need for the experiments to prepare for SRM v2.2 - running in SRM v2.2 mode - well before CCRC'08 was emphasized at today's GSSD.
This will be followed in CCRC'08 phone and F2F meetings.

 

CCRC Phases

It is expected that CCRC and SRM usage will go in two phases:

-       First focusing on the Tier0 + Tier1 sites

-       And then involving also the Tier2 sites

 

L.Robertson asked whether the testing by the Experiments is considered sufficient by all parties and deployment can proceed as planned.

A.Heiss replied that the installation is proceeding well in FZK and D.Barberis added that for ATLAS the tests seem sufficient for proceeding with the installations at the sites.

 

2.    Update on CCRC-08 Planning (CCRC'08 Meetings, Slides) - J.Shiers


 

Changes - J.Shiers commented on the changes compared to the previous weekly summaries. The proposals are:

-       Use the January (pre-)GDB to review metrics, the tools to drive tests, and monitoring tools

-       Use the March GDB to analyse CCRC'08 phase 1

-       Launch the May challenge at the WLCG workshop (April 21-25, 2008)

-       Schedule a mini-workshop after the challenge to summarize and extract lessons learned

-       Document performance and lessons learned within 4 weeks.

 

News - The Tier-2 coordinators have been nominated and are now added to the CCRC mailing list.

There have been excellent Tier-0 to Tier-1 transfers from both ATLAS and CMS.

 

F2F CCRC Meeting - The sites have expressed the need to clarify more explicitly, and in more detail, what the experiments expect from each site. There are quite a lot of details that need to be worked through this year (in time for the February challenge). The same will be true for the May challenge.

For the December GDB it is not clear that a half-day F2F meeting will be enough.

 

SRM 2.2 - SRM v2.2 was explicitly listed by 3 experiments (#1) as a pre-requisite for CCRC’08, and implicitly by ALICE, as it is required for Tier-0 to Tier-1 FTS transfers.

The CCRC Workshop should be in June (12-13) just following the GDB.

 

Documentation - It is important to document the lessons learnt because they are of lasting value.
One possibility is a paper, e.g. in Computer Physics Communications or IEEE Transactions on Nuclear Science.

 

3.    LHCC Comprehensive Review Agenda (Agenda) - L.Robertson

 

L.Robertson showed the Agenda of the LHCC Comprehensive Review.

 

The only change is the move of “Asia-Pacific Tier-2s (30')” by Glenn Moloney (Univ. of Melbourne) to the morning of the second day, and of “Management, Planning and Communication” to 17:00 on the first day.

 

D.Barberis reported that the agreement with the referees is that the Experiments will provide “walk-through” demonstrations of their applications, not complete demonstrations actually running. For instance, ATLAS will provide an end-to-end demonstration, from data selection in the catalogue to submitting jobs, retrieving the output and finally producing plots. This will be presented with screen shots and explanations of what needs to be done to execute the applications. The other Experiments will prepare something similar.

 

L.Robertson will send a reminder to all speakers of the LHCC Comprehensive Review.

 

4.    ATLAS Quarterly Report and Plans (Slides) - D.Barberis

D.Barberis presented the quarterly report on progress and plans of the ATLAS Experiment.

Please refer to the Slides for more details.

4.1      Recent news from ATLAS

Slides 3-5 show the status of the construction and the schedule, according to which the pit should be closed from the end of April.

4.2      M4 Cosmic Run

ATLAS had a cosmic run this summer (23 August – 3 September), called M4, with the first large-scale export of data to the Tier-1 sites.

M4 had these characteristics:

-       Using 4 SFOs with a data rate < ~250 MB/s

-       Data written into Castor 2 (~40 TB)

-       Full Tier-0 operation

-       RAW data subscribed to Tier-1 tape

-       ESD data subscribed to Tier-1 disk

-       ESD data subscribed from Tier-1s to Tier-2s

-       M4 data analysed at Tier-2s

 

By the end of the M4 challenge all the goals above were reached (see slides 7-8).

4.3      Data and Streaming Decisions

Slide 9: There is no ‘obvious’ right way to stream; therefore flexibility is vital. The overlaps, of events in more than one stream, vary with luminosity.

 

All Streaming (RAW, ESD, AOD) is based on trigger decisions. The baseline is to have ~5 physics streams, plus express stream and calibration streams.

The physics streams are “inclusive”; i.e. one event may be in more than one stream depending on the triggers (e+γ, μ+Bphys, jets, τ+Et miss, minbias), and there can be overlaps of ~10%. The ESD streams will be the same as the RAW streams; the AOD streams come from central production and reprocessing.

 

ATLAS has also been working on the definition of Derived Physics Datasets (DPD), which are used to represent many derivations (skimmed AOD, data collections, augmented AOD, other formats). In each case the aim is faster access, smaller data and more portable formats.

 

Decisions have been taken (see slides 11 and 12) about the data on disk at the Tier-2 sites.

There will be ~30 Tier-2 sites of very different sizes, containing some of the ESD and RAW data:

-       In 2007: 10% of RAW and 30% of ESD in Tier-2 cloud

-       In 2008:  30% of RAW and 150% of ESD in Tier-2 cloud

-       In 2009 and after: 10% of RAW and 30% of ESD in Tier-2 cloud

-       This data will largely be ‘pre-placed’ in early running, with recall of small samples through the group production at the Tier-1s

 

Additional access to ESD and RAW data will be available in the CAF: about 1/18 of the RAW and 10% of the ESD.

 

In total there will be about 10 copies of full AOD on disk at the sites.

 

In order to perform On Demand Analysis ATLAS will need:

-       Restricted Tier 2 sites and CERN Analysis Facility (CAF)

-       Most ATLAS Tier-2 data should be ‘placed’ with a limited lifetime, as needed by the users (a few months)

-       Role and group based quotas are important

-       A study group has been launched to define what a Tier-3 is and how end-user analysis will work on the Tier-3 sites.

 

Event Size and Performance (slides 16-17) have been studied and improved between successive releases (Release 12 vs. Release 13):

-       ESD Size: Rel 12 (~1700kB) => Rel 13 (~800kB)

-       AOD Size: Rel 12 (~200kB) => Rel 13 (~250kB)
Truth reduced, added e gamma/muon track + calo cells
Trigger EDM size x0.5 but larger exploratory menu and lower thresholds
Still duplications (muon tracking, jet collections, etc.)

 

Slides 19-20 show the status of the Database Replications.

Leveraging the work of the 3D Project infrastructure, the ATLAS Conditions DB replication is now in production, with data flowing to all ATLAS Tier-1 sites.

 

In addition, ATLAS users are now starting to use the TAGs Database:

-       It supports direct navigation to events (RAW, ESD, AOD)

-       A selection of e.g. 5% of events via a TAG query is about 20 times faster than reading all events and rejecting 95% of them.

 

Slides 21-23 show the tests and the current usages of the TAGs Database.

4.4      Distributed Activities

Distributed Simulation Production continues all the time on the 3 Grids (EGEE, OSG and NorduGrid) and recently reached 1M events/day.

The rate is limited by the needs and by the availability of data storage more than by resources. Currently ~50% is done at Tier-2 sites, ~30% at Tier-1 sites and 6% at the Tier-0 (slide 25).

Validation of simulation and reconstruction with release 13 is in progress, while large-scale reconstruction will start soon for the detector paper and the FDR.

 

For Distributed Analysis, GANGA simplifies the running of ATLAS (and LHCb) applications on a variety of Grid and non-Grid back-ends (as shown in slide 26). ATLAS end users are learning to use the appropriate tools (such as Ganga) to send jobs to their input data, rather than copying files to their local computing clusters and running locally.

 

The export from Tier-0 to the Tier-1 sites works. The last data throughput tests (slide 28) showed that all obstacles to data export from CERN have been identified and removed:

-       An export rate of ~1.2 GB/s could be sustained for prolonged periods using an incomplete set of Tier-1s.

-       BNL took less than their nominal rate (but we know they can take a lot more).

-       ASGC was not included but will join next time (November), as its problems have since been fixed.

 

The Throughput Tests will also continue (a few days/month) until all data paths are shown to perform at nominal rates:

a) Tier-0 → Tier-1s → Tier-2s for real data distribution

b) Tier-2 → Tier-1 → Tier-1s → Tier-2s for simulation production

c) Tier-1 → Tier-1 for reprocessing

4.5      Distributed Computing Re-organization

The Distributed Computing structure has been reorganized. Until now we had two separate areas within Software & Computing, covering respectively the development and operation of Grid Tools & Services. This structure turned out to be less than optimal to ensure good communication between developers and operators, and also cross-communication between activity areas.

 

To overcome this situation, we decided to create a “Distributed Computing” project that includes both development and operations activities, within which people can be assigned to tasks in a more flexible way, since in the near future the needs of operations have to set the priorities for everybody:

-       Kors Bos, currently Computing Operations Coordinator, will lead the Distributed Computing Project

-       Jim Shank will be Deputy Distributed Computing PL

-       Massimo Lamanna will be responsible for all development activities

-       Alexei Klimentov will be responsible for all operation activities

 

The first task of the people named above (plus D.Barberis) is to write down, in close consultation with all people currently involved in these activities:

-       A description of scope and organisation of the Distributed Computing project

-       The global system architecture that can be achieved by mid-2008

-       The work plan to get to that architecture

-       The list of deliverables and milestones, taking external constraints into account (M* runs, SRM2.2 readiness, FDR, CCRC, etc)

-       The manpower needed and available for each task

 

As soon as this is completed (in 2-3 weeks), the new organisation will become effective.

4.6      Evolution of Production System

During the ATLAS Computing Operations Meeting in the Software & Computing Week it was discussed and decided that the ATLAS production system will evolve towards having just one way of submitting and running production jobs on the OSG and EGEE Grid resources.

A suite of ATLAS and Middleware tools and services (the new names Pallas and Palette were proposed) will be selected to make this happen.

Two important choices of input to the baseline system were made already during the meeting: the Panda pilot job technology and the LCG File Catalogue (LFC) will be used.

 

While this may have longer term implications for distributed analysis, the decision does not imply that the same tool will be used for that purpose; both the problems to be addressed and the scale are rather different.

In the short term, while the developers work together to turn the currently available set of tools into a coherent and modular system suitable for the longer-term production needs of ATLAS, the production will continue at full speed with the system used till now, with the usual bug fixes and with the minimal evolutions needed for good operation.

 

It was realised that for NorduGrid this evolution would not be straightforward, as pilot jobs do not really fit that architecture, which is already performing very well. NorduGrid and NDGF support the idea of having just one way of submitting jobs to all the grids.

 

A complete and concise technical documentation and a proof of concept of the new system must however be provided before any decision can be made. This concerns both the "pilot job" option of submitting jobs and the choice of the file catalogue.

4.7      Schedule and Plans

FDR must test the full ATLAS data flow system, end to end

-       SFO → Tier-0 → calib/align/recon → Tier-1s → Tier-2s → analyse

-       Stage-in (Tier-1s) → reprocess → Tier-2s → analyse

-       Simulate (Tier-2s) → Tier-1s → Tier-2s → analyse

 

The SFO→Tier-0 tests interfere with cosmic data-taking.

 

We must decouple these tests from the global data distribution and distributed operation tests as much as possible. CCRC’08 must test the full distributed operations at the same time for all LHC experiments.

 

Software releases

-       13.0.30.3: Week of 05-09 Nov

-       13.0.4: Week of 19-23 Nov

-       13.1.0: Week of 5-9 Nov (note clash with 13.0.30.3 - should be manageable)

-       13.2.0: Week of 3-7 Dec

-       14.0.X: Staged release build starts week of 17-21 Dec; base release 14.0.0 available mid-to-end February 2008

-       15.0.X (tentative): Mid 2008

 

Cosmic runs

-       M6: (Not earlier than) second half of February 2008

-       Continuous mode: Start late April 2008 (depends on LHC schedule)

 

FDR

-       Phase I: February 2008 (before M6)

-       Phase II: April 2008 (before start of continuous data-taking mode)

 

CCRC’08

-       Phase I: February 2008 (coincides with FDR/I)

-       Phase II: May 2008 (in parallel with cosmic data-taking activities)

 

5.    LHCb Quarterly Report and Plans (Slides) - U.Marconi

 

U.Marconi presented the quarterly report with progress and plans for the LHCb Experiment.

Please refer to the Slides for the details.

 

Slides 2-3 show the data flow and distribution.

One can note that all LHCb off-line activities – Reconstruction, Stripping and User Analysis – will take place at the Tier-1 sites, except Simulation, which runs at the Tier-2 sites. All data – RAW, DST, MC – will be both at CERN and at the Tier-1 sites, but the rDST – reduced DST – will be at the Tier-1 sites only.

User Analysis will use ROOT-tuple files.

5.1      DC’06 and 2007 Activities

DC’06 - running since June 2006 - has produced 80% of the whole LHCb production. Slide 4 shows the details over time and by site. LHCb also simulated the distribution of Tier-0 data: RAW data was collected at CERN and distributed to the Tier-1s, emulating real data taking, for reconstruction and stripping at the sites.

 

From February 2007 onwards:

-       Event reconstruction at the Tier-1s of RAW data files no longer in cache, which have to be recalled from tape.
A reconstruction job, for instance, uses 20 MC RAW data files as input.

-       The rDST data output has to be uploaded locally to the Tier-1.

 

From June 2007 onwards:

-       Event stripping at the Tier-1s.
A stripping job uses 2 rDST files as input.
It accesses the 40 corresponding MC RAW files for the full reconstruction of the selected events.

-       DST files have to be distributed to the Tier-1s.

5.2      Feedback Received

The feedback collected from the users is that:

-       Reconstruction is easy for first prompt processing, while it is difficult for re-processing, when files have to be staged.
Jobs are put in the DIRAC central queue only when the files are staged.

-       Too much instability in the SEs was noticed.
Checking availability and enabling/disabling SEs in the DMS is a full-time job.

-       Staging at some sites is extremely slow. Problems with the SE software? Problems with the configuration (number of servers, number of tape drives)?

-       Some files are not retrievable from tape: registered in our LFC, found using srm-get-metadata but fail to get a tURL (error in lcg-gt).

-       Space shortage problems were encountered with Disk1TapeX at the sites.
Datasets need to be cleaned up to free space: painful with SRM v1.

-       Not easy to monitor the storage usage:
Developed a specific agent reporting every day from LFC.
Agents checking integrity between SEs and catalogs.
VOBOX helps but needs guidance to avoid DoS.

 

-       Need to establish a protocol to get warning from site to set a flag in LFC indicating the replica is temporarily unavailable (not used for matching jobs).

-       On our side it may help to tune the number of stage requests issued in one go trying to optimise the recall from tape.

-       Inconsistencies between SRM tURLs and root access.

-       Problems with ROOT finding the HOME directory at RAL, fixed by providing an additional library (compatibility mode on SLC4).

-       Unreliability of rfio, problems with rootd protocol authentication on the Grid (now fixed by ROOT).
lcg-gt returning a tURL on dCache but not staging files, workaround with dccp, then fixed by dCache.

5.3      Tests and Development

The SL4 migration was straightforward for LHCb applications. Problems were found with middleware clients used by those applications: dCache, gfal, lfc, etc.

It is essential to test sites permanently with the SAM framework: CE, SE, SRM.

 

The SRM v2 tests passed successfully. Several SE migration plans are ongoing at RAL, PIC, CNAF and SARA (to NIKHEF). This requires a large effort from LHCb, in particular concerning the changes of replica information in the LFC.

The required VOMS set of groups/roles is not available. With a default set of roles/groups there are still difficulties in obtaining the proper mapping, in particular for SGM and PRD. This induces difficulties in LFC registration (it is impossible for us to modify the internal mapping of DNs and FQANs; we have to go through the administrators).

 

Slide 10 shows the increasing usage of GANGA and of user analysis jobs from January 2007 to date.

5.4      CCRC08 Goals

The main LHCb goals for CCRC’08 are:

-       Test the full chain: from DAQ to Tier-0 to Tier-1’s.

-       Test data transfer and data access running concurrently (current tests have tested individual components).

-       Test DB services at sites: conditions DB and LFC replicas.

-       Tests in May will include the analysis component.

-       Test the LHCb prioritisation approach to balance production and analysis at the Tier-1 centres.

-       Test sites response to “chaotic activity” going on in parallel to the scheduled production activity.

 

The tasks that LHCb plans to execute are:

-       RAW data distribution from the pit to the Tier-0 centre.
Using rfcp into CASTOR from pit to the T1D0 storage class.

-       RAW data distribution from the Tier-0 to the Tier-1 centres.
Using FTS. Storage class T1D0.

-       Reconstruction of the RAW data at CERN and at the Tier-1s for the production of rDST data.
Using SRM 2.2. Storage class T1D0

-       Stripping of data at CERN and at T1 centres.
Input data: RAW and rDST on T1D0.
Output data: DST on T1D1
Using SRM 2.2.

-       Distribution of DST data to all other centres
Using FTS - T0D1 (except CERN T1D1)

-       Preparation of RAW data will occur over the next few months.
We need to merge existing MC datasets into ~2 GB files.

 

The activities and goals for the February and May challenges are:

-       February activities:
Maintain the equivalent of 2 weeks data taking.

-       May activities:
Maintain the equivalent of 1 month of data taking.
Run fake analysis activity in parallel to production-type activities using generic agents.
Generic agents are the LHCb baseline; they need to be an integral part of CCRC’08.

 

6.    WLCG policy on pilot jobs (Paper) - L.Robertson

 

L.Robertson presented the proposal that he had distributed to the MB.

 

The issue of pilot jobs has been discussed several times at MB and GDB meetings; the proposal distributed collects all the decisions on the subject and asks for the approval of the MB.

 

The goal of the document is to define the requirements that, once met, will ensure that pilot jobs are allowed at all sites.

 

Here is the text of the proposal as distributed before the discussion.

 

WLCG policy on pilot jobs submitting work on behalf of third parties

 

The topic of pilot jobs has been discussed several times in the GDB, and in particular at the last two meetings. At the October meeting it was agreed to make a proposal to the MB to adopt a policy requiring that sites support pilot jobs submitting work on behalf of third parties.

 

A summary note was prepared by J.Gordon (17/10/07) and presented to the MB on 23 October. This identified a number of issues and made recommendations for a pilot job policy. After discussion the following policy was agreed:

WLCG sites must allow job submission by the LHC VOs using pilot jobs that submit work on behalf of other users. It is mandatory to change the job identity to that of the real user to avoid the security exposure of a job submitted by one user running under the credentials of the pilot job user.

 

Implementation of this policy is subject to the following pre-requisites:

1.       The identity change and sub-job management must be executed by a commonly agreed mechanism that has been reviewed by a recognized group of security experts. At present the only candidate is glexec, and a positive review by the EGEE security team would be sufficient.  

2.       All experiments wishing to use this service must publish a description of the distributed parts of their pilot job frameworks. A positive recommendation on the security aspects of the framework by the JSPG to the MB is required.

3.       glexec testing:  glexec must be tested with the commonly used batch systems (BQS, PBS, PBS pro, Condor, LSF, SGE).

4.       LCAS/LCMAPS: the server version of LCAS/LCMAPS must be completed, certified and deployed.

 

This policy should be incorporated in the Grid Multi-User Pilot Jobs Policy document, currently in draft form.

 

 

L.Dell’Agnello noted that INFN agrees with the requirements in the document, but would like each site to make an assessment again once the requirements are met, in order to ensure that the solution adopted satisfies the sites’ requirements from the security point of view.

 

J.Gordon noted that a “pre-agreement” ensures that the sites will accept the pilot jobs once the tests are executed and if the review is positive. If the sites want to keep the option of refusing pilot jobs even when the requirements are met, then it is useless to execute the tests and the security review.

 

L.Robertson noted that a new assessment should be limited to checking whether the requirements above are met and that there are no fundamental issues. A later refusal based on new or other issues would cause the whole pilot job discussion to restart from zero and would not be compatible with the solutions adopted by the Experiments.

 

L.Robertson then suggested first checking whether the requirements (1 to 4 above) are agreed.

 

T.Doyle asked that point 3 should state that glexec must be “analysed, tested and pass the tests with..”.

C.Grandi noted that glexec is part of CREAM and will be certified and reviewed, on its Security aspects, by security experts outside JRA1.

D.Barberis asked that no changes interfere with the current productions, which are run by the production managers and use a pilot job mechanism.

 

F.Hernandez asked what happens if a batch system has problems with the current glexec implementation.

L.Robertson replied that in that case the issue should be discussed at the MB again: it all depends on the problem found, and in general a technical solution could be studied.

 

L.Robertson, replying to an email from R.Pordes, noted that OSG is not mentioned because none of the pre-requisites refers specifically to OSG or to particular grid infrastructures.

D.Petravick explained that OSG does not think that the JSPG is the best body to review the frameworks; usually the OSG approval would go via the VDT process (it is not known whether VDT has reviewed glexec).

He added that for OSG it would be fine if the members selected for the security review are also approved by OSG.

 

J.Gordon explained that it is not the JSPG itself that will execute the review; the JSPG will choose reviewers with adequate expertise (security experts, batch experts, etc.).

L.Robertson proposed to change the text to the requirement that “the security aspects are approved by the EGEE and OSG security teams.”

 

Ph.Charpentier asked for a clear timescale by which these pre-requisites will be verified. For instance, a date should be set by which the LCAS/LCMAPS development, certification and deployment must be completed.

C.Grandi replied that a prototype is currently being tested at NIKHEF, but a clear date is not defined. The release could be by the end of 2007, but certification and deployment will come later.

 

L.Robertson noted that the MB will have to discuss the timescale and the reviewers’ membership in a couple of weeks and expects information on the LCAS/LCMAPS development and on the names of the reviewers selected by the JSPG.

 

F.Hernandez asked how this policy will affect the Tier-2 sites and how the information will be distributed to them.

L.Robertson replied that the Tier-2 sites will be informed at the GDB.

 

Decision:

Coming back to the initial issue, L.Robertson proposed to specify explicitly that the MB will have to decide whether the requirements are met.

The MB approved the proposal.

 

Update: New proposal distributed by L.Robertson after the MB meeting.

 

WLCG policy on pilot jobs submitting work on behalf of third parties

 

The topic of pilot jobs has been discussed several times in the GDB, and in particular at the last two meetings. At the October meeting it was agreed to make a proposal to the MB to adopt a policy requiring that sites support pilot jobs submitting work on behalf of third parties.

 

A summary note was prepared by J.Gordon (17/10/07) and presented to the MB on 23 October. This identified a number of issues and made recommendations for a pilot job policy. After discussion the following policy statement was proposed and endorsed by the MB meeting on 6 November:

WLCG sites must allow job submission by the LHC VOs using pilot jobs that submit work on behalf of other users. It is mandatory to change the job identity to that of the real user to avoid the security exposure of a job submitted by one user running under the credentials of the pilot job user.

 

Implementation of this policy is subject to the following pre-requisites:

1.       The identity change and sub-job management must be executed by a commonly agreed mechanism that has been reviewed by a recognized group of security experts. At present the only candidate is glexec, and a positive review by the security teams of each of the grid infrastructures (OSG, EGEE) would be sufficient.  

2.       All experiments wishing to use this service must publish a description of the distributed parts of their pilot job frameworks. A positive recommendation to the MB on the security aspects of the framework by a team of experts with representatives of OSG and EGEE is required. The frameworks should be compatible with the draft JSPG Grid Multi-User Pilot Jobs Policy document.

3.       glexec testing:  glexec must be integrated and successfully tested with the commonly used batch systems (BQS, PBS, PBS pro, Condor, LSF, SGE).

4.       LCAS/LCMAPS: the server version of LCAS/LCMAPS must be completed, certified and deployed.

 

The policy will come into effect when the MB agrees that all of the above pre-requisites have been met.

 

 

 

7.    AOB

 

 

D.Barberis noted that the INFN pledges for 2008, in the Resource Planning Tables, were reduced considerably and this may cause problems for ATLAS and probably for the other Experiments.

 

He asked that whenever a site changes its pledges the Experiments be officially informed by the site, rather than discovering it by themselves via the Resource Planning tables.

 

8.    Summary of New Actions

 

The full Action List, with current and past items, will be in this wiki page before the next MB meeting.