-- JamieShiers - 21 Feb 2006

Summary of SC4 Workshop in Mumbai

A successful - and sometimes animated - SC4 workshop was held in Mumbai prior to CHEP 2006. Good progress was made towards agreement in a number of areas, including storage management, services (support and operation) and the overall goals of the experiments for the SC4 service phase.

The slides - and conclusions of some areas - can be found on the agenda page (see under 'Hot Links' of the SC Wiki).

See also the file "sc4-expt-plans.ppt" on the Service Challenge talks and documents page, which includes the issues listed below.

Decisions

  • For support issues, it has been agreed that we will use helpdesk@ggus.org (or www.ggus.org) from now on. The existing support lists will be closed down by May 2006 at the latest.

Outstanding Issues

  • The details of the T1<->T1 transfers still need to be finalised. A "dTeam" phase should be foreseen, to ensure that the basic infrastructure is set up. Similarly for T1->T2. A possible scenario follows:
    • All Tier1s need to set up an FTS service and configure channels to enable transfers to/from all other Tier1s.
    • dTeam transfers at 5 MB/s (10 MB/s?) need to be demonstrated between each T1 and all other T1s (a rough sizing of the resulting transfer mesh is sketched after this list).

  • ATLAS:
    • It is understood that the Tier0 exercise in March does not include any transfers to external sites - this is foreseen for June.
    • The data rates presented for data export to Tier1s - 720MB/s - do not include the full ESD to BNL. This was mentioned verbally at the workshop. If confirmed, it would add 80MB/s to both the total rate and the rate to BNL, bringing the total export rate for ATLAS to 800MB/s.
    • Job submission rates per target grid need to be defined.
  • ALICE:
    • An LFC service is required at all sites serving ALICE as a local file catalog (task force communication).
    • The "proof@caf" issue has not been discussed at the Tier0.
    • xrootd is requested at all sites. This is being negotiated on a site-by-site basis.
  • CMS:
    • A new schedule needs to be produced, taking into account the official dates for SC4 and the gLite release schedule.
    • The exact implications of the "trivial file catalog" implementation - in terms of the precise service(s) that sites need to deploy - need to be defined urgently.
    • There has been some clarification of the job submission plans, namely gLite 3.0 WMS for LCG sites and Condor-G for OSG sites. The split of the job load across the grids / sites also needs to be defined and agreed.
    • CMS stress the need to test the entire data management chain with files >2GB, to ensure that these are fully supported by all relevant components and services.
  • LHCb:
    • The mention of "xrootd" post-SC4 is assumed to imply an evaluation and does not imply a specific service request.

  • General:
    • The detailed schedule and resource requirements need to be discussed and agreed once the above issues are resolved.
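
A rough sizing of the T1<->T1 "dTeam" mesh proposed in the first item above, as a minimal Python sketch. It assumes the 11 Tier-1s listed in the throughput table further down and a symmetric full mesh at a uniform per-channel rate; both the site count and the uniform rate are illustrative assumptions, not agreed parameters.

    # Illustrative sizing of a full T1<->T1 dTeam transfer mesh.
    # Assumption: the 11 Tier-1s from the throughput table below, each
    # transferring to every other Tier-1 at the same per-channel rate.
    N_TIER1 = 11
    for per_channel in (5, 10):                 # MB/s per channel (5, or the mooted 10)
        channels = N_TIER1 * (N_TIER1 - 1)      # directed channels in a full mesh
        per_site = (N_TIER1 - 1) * per_channel  # inbound (= outbound) MB/s each site sustains
        aggregate = channels * per_channel      # total MB/s flowing in the mesh
        print(f"{per_channel} MB/s/channel: {channels} channels, "
              f"{per_site} MB/s per site, {aggregate} MB/s aggregate")
    # 5 MB/s/channel:  110 channels,  50 MB/s per site,  550 MB/s aggregate
    # 10 MB/s/channel: 110 channels, 100 MB/s per site, 1100 MB/s aggregate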

Experiment Production Plans

ALICE

The first point of this year's PDC'06/SC4 plan is the scheduled rerun of the SC3 T0 disk -> T1 disk transfers (max 150MB/s). These will be scheduled transfers through the FTD-FTS system and the target T1s are CNAF, IN2P3 Lyon, GridKa and RAL. Data generated during PDC'05 and available at CERN will be used. The amount of data to be transferred to each centre will depend on the available storage capacity; a possible scenario is to remove the data from the target SE once it has been successfully transferred. The target for the exercise is an aggregate throughput of 150 MB/s sustained for 7 days.
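
As a back-of-the-envelope check on the volumes implied by this exercise (the even split across the four target Tier-1s is an assumption for illustration; as noted above, the actual shares depend on available storage):

    # Volume implied by 150 MB/s aggregate sustained for 7 days,
    # shared among the 4 target Tier-1s (CNAF, IN2P3 Lyon, GridKa, RAL).
    rate_mb_s, days, sites = 150, 7, 4
    total_tb = rate_mb_s * 86400 * days / 1e6   # MB -> TB (decimal units)
    print(f"~{total_tb:.0f} TB in total, ~{total_tb / sites:.0f} TB per Tier-1 if split evenly")
    # ~91 TB in total, ~23 TB per Tier-1 if split evenly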

In parallel to the file transfers, we will continue to run jobs to test the stability of the complete system.

The requirement for LFC as a local catalog at all sites was clarified.

ATLAS

ATLAS' SC4 requests are summarised as follows:

  • March-April (pre-SC4): 3-4 weeks for internal Tier-0 tests (Phase 0)
  • April-May (pre-SC4): tests of distributed operations on a "small" testbed (the pre-production system)
  • Last 3 weeks of June: Tier-0 test (Phase 1) with data distribution to Tier-1s (720MB/s + full ESD to BNL)
  • 3 weeks in July: distributed processing tests (Part 1)
  • 2 weeks in July-August: distributed analysis tests (Part 1)
  • 3-4 weeks in September-October: Tier-0 test (Phase 2) with data to Tier-2s
  • 3 weeks in October: distributed processing tests (Part 2)
  • 3-4 weeks in November: distributed analysis tests (Part 2)

CMS

CMS emphasised the requirement to test the entire chain using files larger than 2GB (to make sure that no hidden limitations remain).

The timeline presented below needs to be aligned with the official SC4 schedule.

The overall timeline (copied from previous minutes - see CMS SC4 workshop presentation for details) is as follows:

March 1st CMS expects to be able to integrate the analysis batch submission tool (CRAB) into gLite 3.0 pre-production as soon as it is available. The plan is for 6 weeks of functionality and stability testing. Total resource requirements are modest and can be met by the available pre-production sites.

Integration of the new CMS production environment to submit to gLite 3.0 is expected in the same time frame.

This should allow CMS to exercise the two main processing applications needed for the remainder of SC4.

March 15 CMS expects a release of PhEDEx that can use FTS to drive transfers.

April 1 CMS would like to begin low-level continuous transfers between sites that support CMS. The goal is 20 MB/s (~2 TB/day) continuous running. Three groups have been identified to supervise the transfer systems. CMS has also developed a heartbeat monitor for PhEDEx.

There is also a ramp-up plan to demonstrate specific numbers of TB per day between tiers. The numbers should be agreed by next week.
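
For reference, a one-line sanity check on the continuous-transfer goal quoted above (20 MB/s, quoted as roughly 2 TB/day):

    # 20 MB/s sustained over a full day, in decimal TB.
    rate_mb_s = 20
    print(f"{rate_mb_s} MB/s sustained = {rate_mb_s * 86400 / 1e6:.2f} TB/day")
    # 20 MB/s sustained = 1.73 TB/day, i.e. roughly the 2 TB/day quoted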

April 15 Begin production-scale running on gLite 3.0 with the simulation and analysis applications. The goal by the end of the year is to have successfully demonstrated 50k-100k jobs submitted per day.

June 1 A 10TB sample of data in the new CMS Event Data Model is expected to be available for transfer and analysis access.

May 29 - June 12 A two-week period of running to demonstrate the low-level functionality of all elements of the CMS computing model.

July-August CMS expects production for the 2006 data challenge at a rate of 25M events per month. This should not require more than the CMS share of the computing facilities.

September Preparations for Computing Software Analysis Challenge 2006 (CSA06)

October Execute CSA06.

LHCb

In preparation for the SC4 production phase (June onwards), LHCb foresee generating 100M B-physics + 100M minimum bias events (event generation, detector simulation & digitization). This will require 3.7 MSI2k·months of CPU (roughly 2-3 months of running) and 125 TB on MSS at the Tier-0 (keeping the MC truth). Production is foreseen to start in mid-March, ramping up to full production by the end of March.
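
The per-event figures implied by these numbers can be estimated as follows (an illustrative sketch only; the 2-3 month spread of the CPU requirement simply follows the range quoted above):

    # Rough per-event and sustained-CPU figures implied by the LHCb plan.
    events = 200e6                    # 100M B-physics + 100M minimum bias
    storage_tb = 125                  # MSS at Tier-0, keeping the MC truth
    cpu_msi2k_months = 3.7
    print(f"~{storage_tb * 1e6 / events:.2f} MB on MSS per event")   # ~0.62 MB/event
    for months in (2, 3):
        print(f"over {months} months: ~{cpu_msi2k_months / months:.1f} MSI2k sustained")
    # over 2 months: ~1.9 MSI2k sustained; over 3 months: ~1.2 MSI2k sustained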

Preparing for SC4 Disk-Disk and Disk-Tape Throughput Tests in April

These are the well-known rates that should be achieved in MB/s.

It is important to emphasise that these are daily averages sustained over extended periods - not one-time peaks.

| Site | Disk-Disk (MB/s) | Disk-Tape (MB/s) |
| ASGC | 100 | 75 |
| TRIUMF | 50 | 50 |
| BNL | 200 | 75 |
| FNAL | 200 | 75 |
| NDGF | 50 | 50 |
| PIC | 100 | 75 |
| RAL | 150 | 75 |
| SARA | 150 | 75 |
| IN2P3 | 200 | 75 |
| FZK | 200 | 75 |
| CNAF | 200 | 75 |

As usual, we will first run the disk-disk throughput test and then disk-tape.

(The rate for BNL assumes that a full copy of the ESD is exported there.)

(In July, the disk-tape rates go up to full nominal, i.e. to the disk-disk rates in the table above.)
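
For convenience, a short sketch that aggregates the per-site targets in the table above and expresses them as daily volumes (the totals are derived directly from the table; they are not separately quoted targets):

    # Sum the per-site targets and convert to decimal TB/day
    # (the rates are daily averages, not peaks).
    disk_disk = {"ASGC": 100, "TRIUMF": 50, "BNL": 200, "FNAL": 200, "NDGF": 50,
                 "PIC": 100, "RAL": 150, "SARA": 150, "IN2P3": 200, "FZK": 200, "CNAF": 200}
    disk_tape = {"ASGC": 75, "TRIUMF": 50, "BNL": 75, "FNAL": 75, "NDGF": 50,
                 "PIC": 75, "RAL": 75, "SARA": 75, "IN2P3": 75, "FZK": 75, "CNAF": 75}
    for label, rates in (("disk-disk", disk_disk), ("disk-tape", disk_tape)):
        total = sum(rates.values())             # MB/s out of CERN
        print(f"{label}: {total} MB/s aggregate = {total * 86400 / 1e6:.0f} TB/day")
    # disk-disk: 1600 MB/s aggregate = 138 TB/day
    # disk-tape:  775 MB/s aggregate =  67 TB/day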
