-- JamieShiers - 17 Feb 2006
Present
Conference-call abandoned as too few sites/experiments had called in by timeout.
Summary of SC4 Workshop in Mumbai
A successful - and sometimes animated - SC4 workshop was held in Mumbai prior to CHEP 2006. Good progress was made towards agreement in a number of areas, including storage management, services (support and operation) and the overall goals of the experiments for the SC4 service phase.
The slides - and the conclusions for some areas - can be found on the agenda page (see under 'Hot Links' of the SC Wiki).
Experiment Production Plans
ALICE
The first point of this year’s PDC’06/SC4 plan is the scheduled rerun of the SC3 T0 disk to T1 disk transfers (max 150 MB/s). These will be scheduled transfers through the FTD-FTS system, and the target T1s are CNAF, IN2P3 Lyon, GridKa and RAL. Data generated during PDC’05 and available at CERN will be used. The amount of data transferred to each centre will depend on the available storage capacity; one possible scenario is to delete the data on the target SE once it has been transferred successfully. The target for the exercise is an aggregate throughput of 150 MB/s sustained for 7 days.
In parallel to the file transfers, we will continue to run jobs to test the stability of the complete system.
The requirement for LFC as a local catalog at all sites was clarified.
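As a rough sanity check on the storage implications of the ALICE transfer target above, the sustained rate can be converted into a total volume. This is a minimal sketch, assuming decimal units (1 TB = 10^6 MB) and a perfectly sustained rate; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_volume_tb(rate_mb_s: float, days: float) -> float:
    """Total data moved, in TB, at a sustained rate (assumes 1 TB = 1e6 MB)."""
    return rate_mb_s * SECONDS_PER_DAY * days / 1e6


if __name__ == "__main__":
    # ALICE target: 150 MB/s aggregate for 7 days.
    print(f"{total_volume_tb(150, 7):.2f} TB in total")
```

Under these assumptions the exercise moves about 90.7 TB in total, which may be why deleting data on the target SE after a successful transfer is being considered.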
ATLAS
ATLAS' SC4 requests are summarised as follows:
- March-April (pre-SC4): 3-4 weeks for internal Tier-0 tests (Phase 0)
- April-May (pre-SC4): tests of distributed operations on a “small” testbed (the pre-production system)
- Last 3 weeks of June: Tier-0 test (Phase 1) with data distribution to Tier-1s (720MB/s + full ESD to BNL)
- 3 weeks in July: distributed processing tests (Part 1)
- 2 weeks in July-August: distributed analysis tests (Part 1)
- 3-4 weeks in September-October: Tier-0 test (Phase 2) with data to Tier-2s
- 3 weeks in October: distributed processing tests (Part 2)
- 3-4 weeks in November: distributed analysis tests (Part 2)
CMS
CMS emphasised the requirement to test the entire chain using files larger than 2GB (to make sure that no hidden limitations remain).
The timeline presented below needs to be aligned with the official SC4 schedule.
The overall timeline (copied from previous minutes - see the CMS SC4 workshop presentation for details) is as follows:
March 1st
CMS expects to be able to integrate analysis batch submission (CRAB) into the gLite 3.0 pre-production service as it becomes available. The plan is for 6 weeks of functionality and stability testing. Total resource requirements are modest and can be met by the available pre-production sites.
Integration of the new CMS production environment to submit to gLite 3.0 is expected in the same time frame. This should allow CMS to exercise the two main processing applications needed for the remainder of SC4.
March 15
CMS expects the release of a PhEDEx version that can use FTS to drive transfers.
April 1
CMS would like to begin low-level continuous transfers between sites that support CMS. The goal is 20MB/s (about 2TB/day) of continuous running. Three groups have been identified to supervise the transfer systems. CMS has also developed a heartbeat monitor for PhEDEx.
There is also a ramp-up to demonstrate particular numbers of TB per day between tiers. The numbers should be agreed by next week.
April 15
Begin production-scale running on gLite 3.0 with simulation and analysis applications. The goal by the end of the year is to have successfully demonstrated 50k-100k jobs submitted per day.
June 1
We expect a 10TB sample of the new CMS Event Data Model data to be available for transfer and analysis access.
May 29 - June 12
Two-week period of running to demonstrate the low-level functionality of all elements of the CMS computing model.
July-August
CMS expects production for the 2006 data challenge at a rate of 25M events per month. This should not require more than the CMS share of computing facilities.
September
Preparations for the Computing, Software and Analysis Challenge 2006 (CSA06)
October
Execute CSA06.
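The CMS targets in the timeline above mix several units (MB/s, TB/day, jobs per day, events per month). The sketch below converts them to common per-second or per-day figures. It assumes decimal units (1 TB = 10^6 MB) and, for the monthly figure, a 30-day month; the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def mb_s_to_tb_day(rate_mb_s: float) -> float:
    """Convert a sustained rate in MB/s to TB/day (assumes 1 TB = 1e6 MB)."""
    return rate_mb_s * SECONDS_PER_DAY / 1e6


def per_day_to_per_second(count_per_day: float) -> float:
    """Convert a daily count (e.g. jobs/day) to an average per-second rate."""
    return count_per_day / SECONDS_PER_DAY


def per_month_to_per_second(count_per_month: float, days_in_month: int = 30) -> float:
    """Convert a monthly count to a per-second rate (30-day month is an assumption)."""
    return count_per_month / (days_in_month * SECONDS_PER_DAY)


if __name__ == "__main__":
    print(f"20 MB/s          = {mb_s_to_tb_day(20):.2f} TB/day")
    print(f"50k jobs/day     = {per_day_to_per_second(50_000):.2f} jobs/s")
    print(f"100k jobs/day    = {per_day_to_per_second(100_000):.2f} jobs/s")
    print(f"25M events/month = {per_month_to_per_second(25e6):.1f} events/s")
```

With these assumptions, 20 MB/s corresponds to roughly 1.7 TB/day, consistent with the rounded 2TB/day figure quoted for April 1.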
LHCb
In preparation for the SC4 production phase (June onwards), LHCb foresees generating 100M B-physics + 100M min-bias events (event generation, detector simulation & digitization). This will require 3.7 MSI2k · month (~2-3 months) and 125 TB on MSS at Tier-0 (keeping the MC truth). Production is foreseen to start in mid-March, ramping up to the full rate by the end of March.
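The LHCb figures above can be turned into an average CPU capacity and an approximate size per event. This is a back-of-the-envelope sketch only: it assumes the work is spread evenly over the production period, that the full 125 TB is used by the 200M generated events, and decimal units (1 TB = 10^9 kB). The function names are illustrative.

```python
def average_power_msi2k(work_msi2k_months: float, duration_months: float) -> float:
    """Average CPU capacity (MSI2k) needed to finish the work in the given time."""
    return work_msi2k_months / duration_months


def kb_per_event(total_tb: float, n_events: float) -> float:
    """Average storage per event in kB (assumes 1 TB = 1e9 kB)."""
    return total_tb * 1e9 / n_events


if __name__ == "__main__":
    for months in (2, 3):
        power = average_power_msi2k(3.7, months)
        print(f"{months} months -> {power:.2f} MSI2k on average")
    # 100M B-physics + 100M min-bias events.
    print(f"{kb_per_event(125, 200e6):.0f} kB per event")
```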
Site Summaries
- ASGC
- join CMS PhEDEx transfers with CASTOR-SC (with the new version of PhEDEx)
- migrate all CASTOR SC pool nodes to kernel 2.6; will see whether this helps improve performance during the CMS PhEDEx rerun (previously we reached about 80 MB/s)
- new DQ2 deployed at ASGC; starting ATLAS DM this week
- complete internal applications for tape procurement (somewhat late)
- CMS T1 status report this Thu
- usage report
- schedule for performance issue for SC4
- report of internal testing (disk I/O and disk/tape)
- tracking down problems with BNL third-party replication (CASTOR and dCache)
Preparing for SC4 Disk-Disk and Disk-Tape Throughput Tests in April
These are the well-known target rates, in MB/s, that each site should achieve.
It is important to emphasise that these are daily averages sustained over extended periods - not one-time peaks.
| Site | Disk-Disk (MB/s) | Disk-Tape (MB/s) |
| ASGC | 100 | 75 |
| TRIUMF | 50 | 50 |
| BNL | 200 | 75 |
| FNAL | 200 | 75 |
| NDGF | 50 | 50 |
| PIC | 100 | 75 |
| RAL | 150 | 75 |
| SARA | 150 | 75 |
| IN2P3 | 200 | 75 |
| FZK | 200 | 75 |
| CNAF | 200 | 75 |
As usual, we will first run the disk-disk throughput test and then disk-tape.
(In July, the disk-tape rates go up to full-nominal, i.e. the disk-disk rates in the table above).
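To make the "daily average" criterion concrete, the rates in the table above can be summed and compared against the volume a site actually moves in a day. This is a minimal sketch, assuming decimal units (1 TB = 10^6 MB); the dictionary and function names are illustrative and not part of any SC tooling.

```python
SECONDS_PER_DAY = 86_400

# (disk-disk, disk-tape) target rates in MB/s, copied from the table above.
TARGETS_MB_S = {
    "ASGC": (100, 75),
    "TRIUMF": (50, 50),
    "BNL": (200, 75),
    "FNAL": (200, 75),
    "NDGF": (50, 50),
    "PIC": (100, 75),
    "RAL": (150, 75),
    "SARA": (150, 75),
    "IN2P3": (200, 75),
    "FZK": (200, 75),
    "CNAF": (200, 75),
}


def aggregate_targets(targets):
    """Return the summed (disk-disk, disk-tape) target rates in MB/s."""
    disk_disk = sum(dd for dd, _ in targets.values())
    disk_tape = sum(dt for _, dt in targets.values())
    return disk_disk, disk_tape


def daily_average_mb_s(tb_moved_in_day):
    """Average rate in MB/s implied by the volume (TB) moved in one day."""
    return tb_moved_in_day * 1e6 / SECONDS_PER_DAY


def met_daily_target(site, tb_moved_in_day, disk_tape=False):
    """True if the day's average rate reaches the site's target."""
    target = TARGETS_MB_S[site][1 if disk_tape else 0]
    return daily_average_mb_s(tb_moved_in_day) >= target


if __name__ == "__main__":
    dd, dt = aggregate_targets(TARGETS_MB_S)
    print(f"Aggregate disk-disk: {dd} MB/s, disk-tape: {dt} MB/s")
    # Hypothetical example: RAL moves 13 TB to disk in one day.
    print("RAL met disk-disk target:", met_daily_target("RAL", 13.0))
```

Summed over the eleven sites, the April targets come to 1600 MB/s disk-disk and 775 MB/s disk-tape.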
Move of SC Operations into Mainstream
(Mail from Nick (Nicholas.Thackray@cernNOSPAMPLEASE.ch) follows)
Dear ROC managers
As part of the move to bring the SC sites into the main grid operations stream (with the aim of eventually merging the production and SC sites at each institute), I would like to request that we put the Tier-1 SC sites through the usual Site Registration Procedure, as is carried out for all sites joining the EGEE infrastructure (a link to the policy document is here: https://edms.cern.ch/document/503198/).
The list of Tier-1 SC sites (with associated contact details) is:
The information required by the Site Registration Policy is:
1) The full name of the participating institute, applying to become a site.
2) The abbreviated name of the site to be published in the Information System.
3) The name, email address and telephone number of the Site manager.
4) The name, email address and telephone number of the Site Security Contact.
5) The email address of a managed list for contact with Resource Administrators at the site.
6) The email address of a managed list for contact with the site security incident response team.
7) The name of the ROC providing support for the site.
For point 2) I would like to request that we use the naming convention of “SC-xxx-yyy” where xxx is some easily recognized name for the institution and yyy is an optional addition to distinguish different sites at an institute (for example, SC-MyDesk-laptop and SC-MyDesk-desktop).
I would like to reach some agreement on this at next week’s ROC managers’ meeting.
Best regards,
Nick
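The “SC-xxx-yyy” naming convention proposed in the mail above could be checked mechanically along the following lines. The mail does not say which characters are allowed in xxx and yyy, so restricting them to letters and digits is an assumption made for this sketch, and the function name is hypothetical.

```python
import re

# Assumption: the institute name (xxx) and the optional suffix (yyy)
# consist of letters and digits only; the mail does not specify this.
SITE_NAME_RE = re.compile(
    r"SC-(?P<institute>[A-Za-z0-9]+)(?:-(?P<suffix>[A-Za-z0-9]+))?"
)


def parse_sc_site_name(name):
    """Return (institute, suffix) for a valid name, or None if it does not match.

    The suffix is None when the optional yyy part is absent.
    """
    match = SITE_NAME_RE.fullmatch(name)
    if match is None:
        return None
    return match.group("institute"), match.group("suffix")


if __name__ == "__main__":
    for candidate in ("SC-MyDesk-laptop", "SC-MyDesk-desktop", "SC-MyDesk", "MyDesk-SC"):
        print(candidate, "->", parse_sc_site_name(candidate))
```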
AOB