-- JamieShiers - 02 Feb 2006

Present

  • Sites: CNAF, FZK, ASGC, RAL, BNL, TRIUMF, SARA, PIC, NDGF, DESY, CCIN2P3

Apologies

CMS (see report below).

Analysis of Tape Throughput Tests

This week we have been running disk-tape tests at >= 50 MB/s with various sites, while continuing the disk-disk tuning for a few sites that did not reach their full potential last week. FNAL went from 80 to 250 MB/s after configuring a newer (not yet released) dCache version, and ASCC went from 40 to 110 MB/s after adding more nodes, switching from ext3 to xfs, and tuning some CASTOR-1 parameters.

The CERN-SARA link suffered a few days of downtime due to an equipment problem never seen before, which required a lengthy analysis.

The c2sc3srm cluster is needed for other purposes now. We have cancelled the remaining requests that were generated automatically and stopped the load generator.

We will re-enable the channels so that they can be used for traffic with the old WAN cluster as source or destination.

We will set up a load generator for ASCC and can do the same for other interested sites; note, however, that we share the WAN cluster with the experiments, so we must not overload it.

Site Reports

  • CNAF:
    • FTS 1.4 upgrade underway.
    • CASTOR2 installation ongoing.
    • More tape drives.

  • FZK: Tape tests run in the background - also used for other activities.

  • RAL: did not take part in the tape tests.
    • Focussing effort on T2 transfers - these are now up to speed, e.g. 700 Mb/s to Lancaster.
    • Installing a new 2 x 1 Gb/s link to CERN.

  • NDGF: no issues

  • PIC: no issues

  • SARA: obtained 50 MB/s - limited by number of drives and number of tapes.

  • ASGC: Would appreciate information from other CASTOR sites on suitable setups and configurations.

  • DESY:
    • DESY has participated in the disk-to-tape re-run since Monday afternoon with a scheduled break from Tuesday morning until Wednesday evening. Please find the summary plot attached (plot 1) showing the dCache to tape throughput for the time interval from 1 February until this morning.
    • Average throughput during Monday night and after the restart on Wednesday was 100 MB/s. Since the usability of our shared 1 Gbps WAN link for interactive users was seriously affected, we looked into either cutting back on the SC-3 load or trying a feature of our WAN router, called policy routing, that can be used to limit the bandwidth for particular hosts or subnets. Since DESY wanted to maintain the best possible performance for the SC-3 related traffic, but without the negative side-effects of a saturated link, we decided to configure traffic shaping on the WAN router for the dCache pool nodes used in the SC setup.
    • Four pool nodes were assigned to buffer the files received from CERN, transparently migrating them to tape (STK 9940b) managed by the Open Storage Manager (OSM). Up to 3 tape drives out of a pool of 20 were allowed to be used in parallel. The bandwidth observed between each pool node and the tape movers was close to the limit of the 9940b drives at 30 MB/s each over extended periods of time. The bandwidth of imported data was therefore well balanced with respect to the migration of data to tape (plot 2).
    • The storage nodes used for the Service Challenges are part of DESY's production environment, published as an LCG SE. The latter currently supports 13 different VOs including a variety of production-grade applications, e.g. worldwide grid-based ZEUS MC production, CMS event analysis over the Grid (CRAB) and very I/O-demanding applications like digitization with pile-up for CMS events (60 running jobs require ~400 MB/s). Part of last week's program was to let the SE serve all the applications listed above at the same time. We have not seen any performance degradation or interference due to simultaneously active applications, which demonstrates the good scalability of dCache. We have observed, however, that CPU and disk subsystem performance exceeds the capability of a single GigE connection of a disk server: while the GigE link is fully utilized, the average CPU consumption of the disk server is at 25% (plot 3). (A rough arithmetic check of these numbers is sketched after the site reports below.)

  • BNL: We have just stopped the SC3 tape run at BNL since it was affecting production activity today. The tape write run had already lasted more than five days and we achieved the goal of 50 MB/s. A summary plot of the BNL tape write rate (30 January - 4 February 2006) is attached.

  • CCIN2P3
    • Obtained > 50 MB/s.
    • Used 2 different MSS configurations:
      • Direct access to tapes (Monday to Tuesday): 5 drives used.
      • Intermediate disk in HPSS (Wednesday to Saturday): 4 drives used.
    • Issues:
      • Read performance issue on dCache disk during intensive writing (when the GridFTP rate is 100 MB/s).
      • GridFTP server stability issue (stuck transfers that were impossible to kill).
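
As a rough cross-check of the DESY figures reported above, the following sketch works through the arithmetic. It is a minimal, illustrative calculation only: the drive, job and rate numbers come from the report, while the ~125 MB/s GigE ceiling and the variable names are our assumptions.

# Back-of-the-envelope check of the DESY figures reported above (Python; illustrative only).
# Drive, job and rate numbers are taken from the report; the GigE ceiling is an assumption.

GIGE_MB_S = 1000 / 8                        # ~125 MB/s usable ceiling of one GigE link, ignoring protocol overhead

# Tape migration: up to 3 STK 9940b drives at ~30 MB/s each vs. ~100 MB/s average import
tape_capacity_mb_s = 3 * 30                 # 90 MB/s
import_rate_mb_s = 100
print(f"tape migration ~{tape_capacity_mb_s} MB/s vs import ~{import_rate_mb_s} MB/s -> roughly balanced")

# Disk-server I/O: 60 CMS pile-up digitization jobs need ~400 MB/s in aggregate
per_job_mb_s = 400 / 60                     # ~6.7 MB/s per job
jobs_per_gige = GIGE_MB_S / per_job_mb_s    # ~19 such jobs saturate one disk server's NIC
print(f"~{per_job_mb_s:.1f} MB/s per job; one GigE NIC (~{GIGE_MB_S:.0f} MB/s) "
      f"saturates at ~{jobs_per_gige:.0f} jobs per disk server")

This is consistent with the observation that the network link, not the CPU (at ~25% utilization), is the first bottleneck on a single disk server.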

Experiment Reports

LHCb

  • The 1.5 TB of seed data for the LHCb T1-T1 exercise is now in place on disk at the 6 LHCb Tier 1s. We are therefore ready to begin the T1-T1 exercise and would like to give some advance warning to the sites and the FTS channel managers. I hope to submit the first set of jobs around 3pm.
  • The first test will involve submitting an FTS job to each channel in the matrix of T1-T1 channels (each job consists of 100 files of ~200 MB each). This means each T1 endpoint will have 10 concurrent FTS jobs accessing it (5 outgoing and 5 incoming); a rough sizing of this load is sketched after this list. If this first test is successful, I will continue submitting such a set of jobs regularly, roughly every 1-1.5 hours.
  • Only 1/6 of transfers succeeded initially, due to various configuration problems: channels set inactive, and uncertainty about where to submit jobs. LHCb is following up these issues with the sites.
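
To make the implied load concrete, here is a small sizing sketch of the plan above. It is illustrative only: the per-job figures come from the report, while the list of Tier 1 names and the full-matrix assumption are ours.

# Rough sizing of the LHCb T1-T1 test described above (Python; illustrative only).
from itertools import permutations

tier1s = ["CNAF", "FZK", "IN2P3", "PIC", "RAL", "SARA"]     # assumed set of the 6 LHCb Tier 1s
files_per_job, file_size_mb = 100, 200
job_volume_gb = files_per_job * file_size_mb / 1000          # ~20 GB per FTS job

channels = list(permutations(tier1s, 2))                     # 30 directed T1-T1 channels, one FTS job each
jobs_at_one_endpoint = sum(1 for src, dst in channels if "RAL" in (src, dst))   # 5 outgoing + 5 incoming

print(f"{len(channels)} FTS jobs per round, {job_volume_gb:.0f} GB each "
      f"(~{len(channels) * job_volume_gb / 1000:.1f} TB per round)")
print(f"each endpoint (e.g. RAL) sees {jobs_at_one_endpoint} concurrent jobs")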

Atlas

  • Internal T0 test last week.
  • Preparing plan for Mumbai.
  • Planning for Alice PDC '06.

Alice

  • Adding T2 sites in Russia
  • Seeing problems with long job queues at some sites

CMS

March 1: CMS expects to be able to integrate the analysis batch submission tool (CRAB) into gLite 3.0 pre-production as it becomes available. Plan for 6 weeks of functionality and stability testing. The total resource requirements are modest and can be met by the available pre-production sites.

Integration of the new CMS production environment to submit to gLite 3.0 is expected in the same time frame.

This should allow CMS to exercise the two main processing applications needed for the remainder of SC4.

March 15: CMS expects a release of PhEDEx that can use FTS to drive transfers.

April 1: CMS would like to begin low-level continuous transfers between sites that support CMS. The goal is 20 MB/s (~2 TB/day) continuous running. Three groups have been identified to supervise the transfer systems. CMS has also developed a heartbeat monitor for PhEDEx.
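
As a quick check of the target above (a minimal sketch; the helper name is ours), a sustained 20 MB/s corresponds to a little under 2 TB per day:

# Convert the sustained CMS transfer target into a daily volume (Python; illustrative helper).
def rate_to_daily_tb(rate_mb_s: float) -> float:
    """Daily volume in TB for a sustained rate in MB/s (1 TB = 10^6 MB)."""
    return rate_mb_s * 86400 / 1e6

print(f"20 MB/s sustained ~= {rate_to_daily_tb(20):.2f} TB/day")   # ~1.73 TB/day, i.e. roughly 2 TB/day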

There is also a ramp to demonstrate particular numbers of TB per day between tiers. The numbers should be agreed by next week.

April 15: Begin production-scale running on gLite 3.0 with simulation and analysis applications. The goal by the end of the year is to have successfully demonstrated 50k-100k jobs submitted per day.

June 1: We expect a 10 TB sample of data in the new CMS Event Data Model for transfer and analysis access.

May 29 - June 12: Two-week period of running to demonstrate the low-level functionality of all elements of the CMS computing model.

July-August: CMS expects production for the 2006 data challenge at a rate of 25M events per month. This should not require more than the CMS share of computing facilities.

September: Preparations for the Computing Software Analysis Challenge 2006 (CSA06).

October: Execute CSA06.

Final preparation for SC4 Workshop at TIFR in Mumbai

  • Registration desk open Thursday from 16:00 to 18:00 and Friday from 08:00.

Topic attachments
  • BNLsummary.png (2.3 K, 2006-02-06, JamieShiers) - BNL SC3 disk-tape throughput
  • SC3-CRAB-02-2006.gif (58.1 K, 2006-02-06, JamieShiers) - plot 3
  • SC3-import-02-2006.gif (65.1 K, 2006-02-06, JamieShiers) - plot 2
  • SC3-re-run-summary-02-2006.pdf (410.8 K, 2006-02-06, JamieShiers) - Better versions of DESY plots in 1 file
  • SC3-tape-02-2006.gif (13.3 K, 2006-02-06, JamieShiers) - plot 1