CCRC'08 Calendar

Month Sites downloads Sites Experiments Experiment Activity Deployment Task Event
Oct 07 T0-CERN
  • The 01/10 from 14:40 until 16:30: CERN CASTORPUBLIC DOWN. LHC users not affected but DTEAM and SAM tests might be.
  • The 04/10 from 9:00 until 13:00: Upgrade of the CASTORLHCB Castor instance at CERN
  • From 26/10 (16:30) until 02/11 (9:00): Draining to move to SLC4 submission
  • The 29/10 from 8:00 to 11:00: intervention on the batch system
  • The 11/10 from (9:00) until (11:59): downtime of the Castor stager Oracle DB. All SRM 1.1 services will be affected
  • The 17/10 from (10:00) until (13:00): tape service interruption
  • The 16/10 from (13:15) to (13:16): access to LFC database will be stopped. A modification script will then update to the new DESY-HH srm-endpoint name for the Atlas VO. A backup is run prior to the change.
  • From 18/10 (12:15) until 29/10 (19:20)
  • From 30/10 (19:40) to 31/10 (19:20): Reconfiguration
  • From 31/10 (18:35) until 07/11 (11:30): performance problems; needs to be drained
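As a rough illustration of how the length of a multi-day entry in the list above can be worked out, the sketch below parses the "From DD/MM (HH:MM) until DD/MM (HH:MM)" form used in this calendar. The function name `downtime_hours`, the regular expression and the explicit `year` argument are assumptions made for this example (the entries carry no year of their own), and time zones are ignored.

```python
import re
from datetime import datetime

# Hypothetical pattern for entries such as
# "From 26/10 (16:30) until 02/11 (9:00): ..."
ENTRY = re.compile(
    r"From\s+(\d{1,2})/(\d{1,2})\s*\((\d{1,2}):(\d{2})\)\s+(?:until|to)\s+"
    r"(\d{1,2})/(\d{1,2})\s*\((\d{1,2}):(\d{2})\)"
)

def downtime_hours(entry: str, year: int) -> float:
    """Return the length of a multi-day downtime entry in hours."""
    match = ENTRY.search(entry)
    if match is None:
        raise ValueError(f"unrecognised entry: {entry!r}")
    d1, m1, h1, min1, d2, m2, h2, min2 = map(int, match.groups())
    start = datetime(year, m1, d1, h1, min1)
    # Assumption: an end month earlier than the start month means
    # the downtime runs into the following year.
    end_year = year + 1 if m2 < m1 else year
    end = datetime(end_year, m2, d2, h2, min2)
    return (end - start).total_seconds() / 3600

hours = downtime_hours(
    "From 26/10 (16:30) until 02/11 (9:00): Draining to move to SLC4 submission",
    2007,
)
print(f"{hours:.1f} h")
```

For the SLC4 draining entry this gives 160.5 hours, i.e. a little under seven days.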
ALICE: FDR phase 1

CMS: CSA07; s/w release 1_7
SRM v2.2 deployment starts CCRC'08 kick-off
Nov 07 T0-CERN
  • The 14/11 from 9:00 to 14:00: Upgrade of the Public CASTOR2 stager at CERN - affecting OPS and DTEAM VOs
  • From 15/11(14:34) until 05/12 (14:34): under deployment
  • The 29/11 from 9:00 until 10:30: ce101-ce123 down
  • From 15/11(15:25) until 05/12 (14:25): endpoint not fully configured yet

  • The 13/11 from (8:00) until (19:00): On Tuesday, Nov 13, the INFN - CNAF Tier1 Castor instance will be upgraded to the latest version of the Castor software.
  • The 21/11 from (16:00) until (18:00): Apply Oracle critical patch
  • From 30/11 (17:16) until 03/12 (12:00): due to an emergency intervention on one of our switches on Monday, and since this involves storage shared by WNs, we shall close batch queues tonight to allow running jobs to complete, and open them again once hardware maintenance is finished.
  • The 6/11 (from 9:00 until 23:00): Upgrade to dCache version 1.8. All VOs using the GridKa SE are affected. FTS data transfers are stopped during this time. See also "news" on the GGUS website
ALICE: FDR phase 1+2

CMS: 2007 analysis completed
SRM v2.2 continues (through year end at Tier0/Tier1 sites and some Tier2s) WLCG Comprehensive Review
WLCG Service Reliability workshop 26-30
Dec 07 T0-CERN
  • The 10/12 from (9:00) until (10:00): Apply fix to the LHCb LFC DB and upgrade the LFC server software to the latest version, 1.6.7.
  • The 10/12 from (9:00) until (12:00): Scheduled downtime for CERN VOMRS and VOMS services for upgrade
  • From 14/12 (13:39) until 19/12 (14:00): SLC4 upgrade affecting several CEs
  • From 19/12 (14:30) until 31/12 (10:50): SLC4 upgrade affecting several CEs
  • From 31/12 (10:37) until the 04/01 (21:00): SLC4 upgrade affecting several CEs
  • The 13/12 from (9:00) until (16:00): Upgrade to version 1.6.7

ALICE: FDR phase 1+2
11th-21st: 1st Commissioning exercise

CMS: s/w release 1_8
SRM v2.2 continues (through year end at Tier0/Tier1 sites and some Tier2s) Christmas and New Year
Jan 08 T0-CERN:
  • The 17/01 from (14:00) until (15:00): Intervention on LHCb LFC servers requested by LHCb. The downtime allows a database script to be run
  • From 23/01(16:00) until 30/01(16:00) : Migration to 64 bit submission.
  • The 24/01 from (10:00) until (16:00): CASTORCMS intervention
  • The 28/01 from (10:00) until (13:00): Upgrade to latest patch (glite 1589). No user-visible downtime foreseen.
  • The 19/01 from (9:00) until (12:25): CASTORATLAS upgrade to version 2.1.6
  • The 29/01 from (9:00) until (22:00): Move to SLC4 and LFC version 1.6.8. No user-visible downtime foreseen.

  • The 04/01 from (9:37) until (13:30): due to power outage CNAF Center was down
  • From 10/11 (9:00) to 11/11 (15:30): CASTOR databases migration to Oracle RAC
  • The 28/01 from (14:00) until (18:00): The LHCb LFC database will be migrated to new hardware
  • The 30/01 from (11:30) until (18:30): LFC upgrade to version 1.6.8-1
  • The 31/01 from (10:00) until (14:00): SRM will be migrated to a new hardware configuration
T0: running SRM 1.3-12

CNAF: running SRM 1.3-11. 2 StoRM frontend hosts for ATLAS and CMS and 1 backend. 1 endpoint under definition for LHCb (1 frontend, 1 backend). 3 GridFTP servers for ATLAS and CMS and 2 for LHCb

running SRM 1.3-11. Reaching 90% of the 2008 CPU pledge, and a few TB short in disk and 500 TB of tape. Full pledge likely by 1st April

running SRM 1.3-8 upgrading to 1.3-12

Full 2008 pledged resources available


ATLAS: Pre-tests1:
Raw data distribution from PIT to T0
using rfcp into CASTOR from PIT: T1D0
Raw data distribution from T0 to T2
Using FTS T1D0
SRM v2.2 continues at Tier2s

SRM 1.3-12 released

DPM 1.6.7 current version in use

StoRM porting on SLC4 ongoing
Feb 08
CCRC'08 phase I (from 4th to 29th)
  • From 14/02 (9:57) until 15/02 (19:57): Possible HW error on the machine. Need to drain it for further investigation
  • From 16/02 (8:58) until 29/02 (8:58): we need to drain the box because of a possible hardware problem
  • The 27/02 from (9:00) until (13:00): CASTORPUBLIC upgrade to 2.1.6
  • From 28/02 (9:54) until 29/02 (9:54): Possible hardware problem in
  • The 05/02 from (10:00) until (14:00): due to CERN SRM rollback we change the downtime to ordinary database management
  • The 12/02 from (12:00) until (15:00): movement of server rack
  • The 13/02 from (10:00) until (12:00): apply patch to the LFC and FTS Oracle database. Operation should be transparent
  • The 13/02 from (10:00) until (14:00): We need to temporarily power off a disk rack to fix a potential issue. This will affect our GPFS storage pool; hence, jobs utilizing GPFS-based storage may crash during the intervention. Job queues have been closed in preparation of this outage.
  • The 14/02 from (13:00) until (20:00): Services related to INFN-T1 library will be at risk due to power maintenance
  • The 26/02 from (14:30) until (16:30): CPU patch on Oracle database
T0: running Castor 2.1.6-10 for ATLAS and CMS while 2.1.4-10 for LHCb and ALICE. On target to deliver full pledges by 1 April

Castor2.1.4-10. StoRM v1.3.19. Unlikely to have all disk capacity by 1st April. Waiting for further info

Castor2.1.4-10. MoU cpu pledge delivered

: Castor 2.1.4-10

All sites have plans to upgrade the Castor version after CCRC phase I

dCache 1.8.0-12p4 (private)
All hardware for April is on site and ready for installation. October additional acquisitions are 1150 KSi2K CPU, 600 TB disk and 800 TB tape for ALICE and 50 TB disk for CMS to reach full 2008 pledges

dCache 1.8.0-12p4

dCache 1.8.0-12p4 (private)

dCache 1.8.0-12p4
SARA: dCache 1.8.0-12p4

: dCache 1.8.0-12p4. SRM v2 configuration and space tokens definition implementation

BNL: dCache 1.8.0-12p4

: ordered 3000 KSi2K CPU, 1 PB disk and 480 TB of tape, therefore meeting the April 2008 pledged resources on CPU and tape; planning for remaining disk in summer, up to 1 PB contingent on the real needs

Final disk orders now to be useable in May



ALICE: FDR phases 1+2+3
Startup of the 2nd Commissioning exercise delayed 2 weeks. Same suite of subdetectors as in Dec. T1 recons. included in this phase
From the beginning of the month, FTS transfer tests with dCache T1 sites only. At the end of the month transfers to CNAF (Castor2) successfully achieved. Successfully transferring to Lyon, SARA, FZK. Pending RAL and small issues with NDGF.
Important topic: T2 storage solutions. Several T2 sites successfully installing DPM+xrootd. HowTo manual in preparation for all T2 sites

Week 1: Together with CCRC08. Setup of ATLAS site services. Testing of the SRM2.2 endpoints.
Week 2: FDR exports
Week 3: Real CCRC exercise. Data produced with load generator pushed out of CERN. Data distributed to all 10 T1 according to the MoU
Week 4: T1-T2 replication within cloud. Replication of M5 data to T1 tapes. Reprocessing foreseen for the end of the week. T0-T1 transfers until Monday

CMS: CSA08 phase 1: Functional blocks
  • Functionality and performance tests
  • Re-reconstruction at T1
  • Cooldown of magnet
    • private global runs (2 days/week) and private mini-daq
  • 1st week: T0-T1 to 3 T1s. T1-T1 among 3 T1. T1-T2 for 4 T1. T2-T1 to 4 T1
  • 2nd week: T0-T1 to 4 T1s. T1-T1 among 4 T1. T1-T2 for 3 T1. T2-T1 to 3 T1
  • 3rd week: T0-T1 to 7 T1s
  • 4th week: T0-T1 to all T1s. T1-T2 for all T1. T2-T1 to all T1

  • Major Activities: Maintain equivalent of 2 weeks data taking, assuming a 50% machine cycle efficiency. Used quasi-fake RAW files of ~2 GB
  • New version of DIRAC (DIRAC3) available for CCRC'08
  • LFC: Use of local T1 instance at RAL, IN2P3 and CNAF. Use "streaming" to populate the read-only instance at T1 from T0
  • Raw data distribution from PIT to T0 (rfcp into CASTOR from PIT T1D0) and T0-T1 (FTS T1D0)
  • Week1(16th): Ramp up to half the data rates and continuing until the full data rate achieved for the beginning of the week2. Transfer data for 2h at full rate
  • Establishment of a stable operation procedure for a minimum period of 24h. Mimic LHC beam time efficiency by running data transfers for 6h followed by a 6h break
  • Week2(25th): Data transfer continuing at full rate: 6h on 6h off to mimic LHC operation
  • Reconstruction of raw data at T0 and T1 sites (production of rDST data T1D0 and use of SRM2.2)
  • From 27th LHCb attempts to reconstruct the data already distributed
  • The stripping phase and the distribution of DST to all T1 centres will not occur during the last 2 weeks of February. If progress permits we would like to extend the challenge by a week to test this aspect
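The 6h-on/6h-off transfer pattern in the list above matches the 50% machine cycle efficiency assumed for the challenge. A minimal sketch of that arithmetic follows; the function names `duty_cycle` and `transfer_hours` are hypothetical and only serve this illustration.

```python
def duty_cycle(on_hours: float, off_hours: float) -> float:
    """Fraction of wall-clock time spent transferring data."""
    return on_hours / (on_hours + off_hours)

def transfer_hours(total_hours: float, on_hours: float, off_hours: float) -> float:
    """Hours of actual transfer within a window of total_hours."""
    return total_hours * duty_cycle(on_hours, off_hours)

# 6h of transfers followed by a 6h break, as in the list above
efficiency = duty_cycle(6, 6)
per_day = transfer_hours(24, 6, 6)
print(f"duty cycle: {efficiency:.0%}, transfer time per 24h: {per_day:.0f}h")
```

With these inputs the duty cycle is 50%, i.e. 12 hours of transfers in each 24h period.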
SRM v2.2 complete at Tier2s

Castor2: 2.1.6-10 and 2.1.4-10 versions co-existing. 10 software problems reported and 6 problems fixed (in 2-3 days average)

1.8.0-12.p4 available in all T1 sites, while version 1.8.0-13 is in development. Release date not yet known. The dCache team will not force sites to upgrade; the decision is left to sites and experiments

v.1.3.19 available
EGEE User Forum 11-14 Feb
LHCC referees meeting
Mar 08 T0-CERN:
  • The 02/03 from (9:30) until (10:00): lsf master node upgrade to new version and kernel security update
  • The 02/03 from (12:55) until (13:00): still problems with main master: need to reboot it. Related to the previous input
  • The 03/03 from (15:32) until (16:10): repeated LSF master batch daemon crashes. Under investigation
  • The 03/03 from (17:55) until (21:55): still problems with LSF master batch daemon. The problem is being debugged.
  • The 06/03 from (9:00) until (13:00): 2.1.6 CASTORLHCB upgrade
  • The 18/03 from (9:00) until (10:30): CASTORCMS 2.1.6-11 update
  • The 19/03 from (9:00) until (10:30): CASTORATLAS 2.1.6-11 update
  • The 19/03 from (14:00) until (15:30): CASTORLHCB 2.1.6-11 update
  • The 05/03 from (10:00) until (12:00): recompiling the Oracle views after applying the Jan-08 CPU patch
  • From 21/03 (9:00) until 08/04 (10:00): T1-INFN downtime due to computing rooms reorganization
  • From 10/03 (10:00) until 12/03 (12:00): (Extension) General Outage for maintenance
  • The 11/03 (from 06:00 until 06:01): General Outage for maintenance
  • The 11/03 (from 06:00 until 11:00): Possible perturbations due to general maintenance of IN2P3-CC
  • The 11/03 (from 06:00 until 18:00): General outage for maintenance
  • The 11/03 (from 06:00 until 18:00): Possible perturbations due to general maintenance at IN2P3-CC
  • The 11/03 (from 11:00 to 12:27): Oracle security patch
  • The 11/03 (from 12:00 until 18:00): Possible perturbations due to general maintenance at IN2P3-CC
  • The 12/03 (from 16:31 until 16:32): Attempt to take this node out of maintenance for SAM
  • From 01/03 (23:00) until 02/03 (12:00): Scheduled power maintenance, also covering the installation of a new power generator
  • From 02/03 (20:42) until 03/03 (7:00): inaccessible file server for quanta HPC cluster
  • The 19/03 (from 00:30 until 17:00): This maintenance includes: 1) Castor2 upgrade from 2.1.4-10 to 2.1.6-10 2) Hardware migration for Oracle RAC. This will also affect the FTS, LFC and Castor services at ASGC
  • From the 04/03 (14:00) until 05/03 (02:00): dCache upgrade to 1.8.0-12p6
  • The 03/03 from (8:30) until (9:30): Backbone routers receive new firmware
  • The 10/03 (from 7:00 until 8:00): Network maintenance; problems are unlikely to occur
  • From 20/03 (13:00) until 25/03 (11:00): The information system is broken
  • The 03/03 (from 9:30 until 18:00): network configuration changes
  • From 03/03 (10:00) until 04/03 (15:00): CMS Castor service classes unavailable due to backend servers being removed to swap backplane. This is to fix a hardware issue.
  • The 04/04 (from 09:49 until 10:11): All SRMs paused while central file server is rebooted
  • From 04/04 (14:00) until 05/04 (18:00): LHCb Service Classes unavailable due to a hardware intervention on the backend disk servers. This is to fix a known hardware problem with backplane.
  • The 07/03 (from 14:04 until 14:30): Emergency RAC Backend maintenance
  • The 07/03 (from 14:56 until 16:00): Oracle RAC maintenance work
  • The 11/03 (from 08:00 until 10:40): FTS downtime for Oracle and system patching
  • From 11/04 (10:00) until 12/04 (16:30): Atlas Castor service classes unavailable due to a hardware intervention on the disk servers. This is to fix a known hardware problem with the backplanes. Also the Atlas stager database will be migrated to Oracle RAC during this period
  • From 12/03 (19:40) until 13/03 (17:00): Problem with disk2disk for CMS. The number is growing; we are getting more and more diskcopies per file. The Castor team has decided to stop the CMS instance for tonight
  • The 18/03 (from 08:30 until 09:00): RAL Site external network connectivity broken due to installation of new firewall
  • The 19/03 (from 09:30 until 12:00): Downtime of CMS Castor instance at RAL-LCG2 for upgrade to 2.1.6 - abandoned due to encountered problems
  • From 25/03 (11:00) until 31/03 (14:00): - MySQL maintenance
  • The 28/03 (from 10:30 until 17:00): New hardware being added to 3D rack - not expected to interrupt service
  • The 31/03 (from 13:02 until 20:00): RAL-LCG2 CPU Farm Unavailable.
  • From 03/03 (12:55) until 21/03 (17:55): Relocating cluster
  • The 07/03 (from 12:00 until 16:00): Oracle patching of the NDGF 3D service
  • The 07/03 (from 13:30 until 13:48): Switch of power supply will lead to a short outage on the host
  • The 13/03 (from 13:00 until 18:10): Upgrade of dCache to 1.8.0-13
  • The 27/03 (from 13:50 until 14:50): Short service restarts on the central dcache nodes
  • The 28/03 (13:43) until the 31/03 (14:43): Site is being upgraded.

  • The 06/03 (from 19:00 until 21:00): Apply Oracle CPU JAN2008 patch, then recompile Oracle views
  • The 13/03 (from 21:00 until 23:00): ATLAS - Apply Oracle CPU JAN2008 patch, then recompile Oracle views
  • The 19/03 (from 17:00 until 21:00): dCache pool node adding RAM and new 10g cards. Faster disks on SRM. WN BIOS upgrade

RAL: Move to Castor 2.1.6. Tape migration issues under investigation. Repacking testing using CCRC'08 data (Not done). Backend DBs moving to Oracle RAC (Not done). Alice-xrootd (to be done)

INFN: Will have the storage at the beginning of May and CPU by mid-May

ASGC: 21st March: 2MW generator installed to provide backup power. 19th March: Castor upgrade from 2.1.4 to 2.1.6


ALICE: FDR phases 1+2+3

ATLAS: Production for FDR2
(8 weeks)
  • Continuing FTS exercises (raw data distribution to T1)
  • FDR-1 Re-processing (reconstruction at T1 pending)
  • Tests of SRM2.2 at CERN
  • ATLAS conventions for using directories published in twiki for T1 information M6 (3rd-7th)
  • Cleanup CCRC08 data disk and tape
  • atldata Castor pool ready for M6
  • Muon calibration streams setup
  • T0-T2 Michigan (needs to be added to BDII and GOCDB) and Munich
  • T0-T1-T2 stream to Rome and Naples via CNAF
  • Need to ask for extra channels
  • Take cosmics data
  • Functional tests, nominal rate and file transfers (10th-14th). T1-T1 Lyon and PIC and CNAF-SARA (25th-28th)
  • Throughput tests, push to a max of 200% of the nominal rate (17th-20th)

LHCb: Stripping and DST distribution to all T1 sites

  • CMSSW 1.8.0 sample production (slipped into April)
  • Low i test beam-pipe baked-out (also into April)
SRM: remove the restriction on number of daemons, improve the database garbage collection and provide more admin tools if required

SRM 1.3-15 migrated to SL4 (mid March)

CREAM CE: Once the candidate passes the internal stress tests (INFN in Feb) a patch will be prepared and certification will follow. Finally it will pass to PPS. All the operations foreseen in March

WMS in SLC4 (release to PPS)

glexec on the WN: internal tests in March. Still 3-4 months away from production

StoRM: Release >=1.3.20 available since the end of this month
CCRC F2F - GDB review
Easter 21-24 March
Apr 08 T0-CERN
  • The 07/04 from (9:00) until (13:00): CASTORALICE 2.1.6 upgrade
  • The 08/04 from (8:00) until (13:00): Upgrade of Atlas Castor-2 MSS backend at CERN
  • The 14/04 from (11:00) until (18:00): Software upgrade on all Quattor managed servers (all T-0 services) that includes a new kernel. VOBOXEs should be rebooted by the experiment responsible after. The software upgrade includes a new castor client 2.1.6-12 (PPS nodes available)
  • From 17/04 (13:26) until 09/05 (17:15): draining CEs for hardware replacement
  • The 17/04 from (14:00) until (18:00): LFC downtime (due to scheduled DB hardware migration, and hostname change for LOCAL ATLAS catalog)
  • The 22/04 from (8:00) until (14:00): Upgrade of the public Castor service at CERN, affecting OPS and DTEAM VOs
  • The 29/04 from (8:00) until (10:00): Software and kernel upgrades in the batch system and CEs
  • The 29/04 from (9:00) until (12:00): Upgrade of CASTORCMS to v 2.1.7-4
  • From 27/04 (16:00) until 01/05 (20:15): CE Hardware & Middleware upgrade

  • The 14/04 (from 16:30 until 18:30): dCache downtime: PNFS database repartitioning
  • From 15/04 (14:00) until the 16/04 (14:00): Power maintenance at BNL
  • The 01/04 (from 10:30 until 18:00): Upgrade of LHCb instance to Castor 2.1.6
  • From 01/04 (12:00) until 02/04 (16:00): Internal Oracle problem affecting any Put/Get request to srm-v2 node
  • The 02/04 (from 11:04 until 16:04): Oracle patch being installed on SRM nodes
  • The 02/04 (from 15:10 until 17:10): ORACLE Patch
  • The 04/04 (from 10:40 until 17:00): RAL-LCG2: lcgrb01 urgent maintenance needed
  • The 08/04 (08:30) until the 14/04 (13:00): Upgrade of CMS Castor instance to 2.1.6
  • From 09/04 (08:30) until 10/04 (08:10): Upgrade of LHCb Castor instance to 2.1.6
  • From 09/04 (10:46) until 10/04 (08:10): Down for OPS due to back-end instance being upgraded
  • The 09/04 (from 10:48 until 20:00): LHCb CASTOR backend in maintenance
  • From 09/04 (12:30) until 10/04 (08:10): RAL-LCG2: OPS tests failure because of LHCb castor 2.1.6 upgrade
  • The 09/04 (from 12:31 until 19:00): Down for OPS and LHCb while LHCb castor instance is upgraded.
  • From 10/04 (08:30) until 11/04 (11:00): Atlas instance upgraded to 2.1.6 and Backplane swapout on 12 disk servers in atlasStripInput atlasT0Raw
  • From 11/04 (18:02) until 16/04 (18:00): LHCB instance at risk due to disk2disk problems in castor 2.1.6-12
  • The 14/04 (from 10:00 until 18:00): RAL-LCG2 - lcgmon01 upgrade to SL4 glite 3.1 MON
  • The 16/04 (from 09:00 until 18:00): RAL-LCG2: - LFC MySQL to Oracle backend migration
  • The 22/04 (from 09:30 until 10:00): Short (5 minute) OPN interruption to install fibre tap
  • The 22/04 (from 10:00 until 11:00): RAL-LCG2 OPN routing change
  • The 16/04 (from 04:00 until 06:00): Limited impact on the BDII, LFC and FTS services, since the blade chassis has a problem with its network module.
  • The 24/04 (from 08:30 until 09:30): service update affecting the LFC
  • From 03/04 (11:00) until 04/04 (17:00): Some dCache pools down due to electrical work - some ATLAS files will be unavailable
  • From 03/04 (12:00) until 04/04 (17:00): NDGF-T1 FTS server downtime due to electrical maintenance work
  • The 24/04 (from 08:00 until 10:00): New routing equipment activated and local routing to FTS and some dCache pools changed
  • The 28/04 (from 10:00 until 11:30): Upgrade of dCache to CCRC baseline

  • The 22/04 from (10:30) until (10:45): the hosts management intervention is DELETED
  • From 26/04 (9:00) until 28/04 (12:00): Hardware maintenance affecting various storage devices at INFN-T1
  • The 05/03 (from 11:00 to 13:00): dCache upgrade to 1.8.0-13
  • From 15/03 (9:00) until 20/03 (8:00): Yearly electrical equipment maintenance in the machine room

  • From 16/04 (19:00) until 17/04 (7:00): Scheduled FTS Database Maintenance
  • The 23/04 (from 19:00 until 20:00): srm switch-over to faster hardware (sync the database and reboot into new srm node)
  • The 03/04 (from 15:50 until 17:50): PIC FTS stopped due to overload of the DB. Unscheduled intervention to fix it
  • The 17/04 (from 15:00 until 17:00): FTS sched downtime due to migration to new h/w of the Oracle backend
  • From 18/04 (14:52) until 18/05 (18:00): Downtime because of hardware failure of ce-test
  • The 19/04 (from 01:00 until 04:00): network maintenance tasks by GEANT and Maintenance by Interoute on the circuit between Aguilana and Geneva.
  • The 22/04 (from 00:00 until 06:00): network maintenance tasks by GEANT
  • From 29/4 (16:00) until 03/05 (00:00): Yearly electrical building maintenance (Draining of the CE queues)
  • From 29/04 (16:32) until 30/04 (12:00): Unexpected LHC-OPN network outage for the CERN-PIC link
  • From 29/04 (20:00) until 30/04 (00:00): Network intervention which may cause intermittent glitches
  • From 30/04 (12:00) until 03/05 (00:00): Yearly electrical building maintenance

  • The 01/04 (from 7:00 to 23:00): Complete downtime for hardware maintenance and basic OS and firmware updates
  • From the 01/04 (23:00) until 02/04 (12:00): LFC migration attempt (MySQL to Oracle) still ongoing
  • The 02/04 (from 12:00 until 15:00): LFC migration attempt (MySQL to Oracle) failed
  • The 16/04 (from 9:00 until 12:00): dCache update to patch level 14
  • The 18/04 (from 7:30 until 22:00): Due to the LFC DB migration from MySQL to Oracle, GridKa/FZK's LFC service will be down on Friday 18/04/2008 from 5:30 UTC to 20:00 UTC (LHCb LFC will not be affected by this).
  • The 18/04 (from 14:52 until 16:50): Due to network problems (e.g. with DNS), the services at GridKa are currently degraded
  • The 25/04 (from 11:00 until 15:00): dcache update to fix broken lcg-utils srm interaction
  • From 03/04 (10:30) until 04/04 (19:00): Upgrade to SL4, because of failover the VOMS service itself should stay available, but the VOMS User Interface won't be available during downtime.
  • The 07/04 (from 19:00 until 22:00): network configuration maintenance
  • The 09/04 (from 9:00 until 15:00): prolonged downtime upgrade dCache
  • The 10/04 (from 8:20 until 10:00): unscheduled maintenance dCache
  • The 10/04 (from 13:10 until 20:30): dCache problems again
  • The 14/04 (from 8:00 until 18:00): Upgrade to SL4 affecting the ATLAS VOBOX and, from 8:00 to 10:00, affecting the LFC node
  • The 28/04 (from 10:35 until 19:00): user administration software migration maintenance
  • The 30/04 (from 06:06 until 10:40): outage due to problems with user administration

NDGF: All CPU in place. Will ramp up disk and tape following demand. Confident not to run out of 2008 pledges

ASGC: Current CPU capacity = 2730 KSi2K, 2008/9 pledge = 340 KSi2K. Current disk capacity = 1190 TB, 2008/9 pledge = 1500 TB. Current tape media capacity = 800 TB, pledge = 1300 TB.
4th April: ASGC AMS-CERN upgraded from 5 to 10G

2008 CPU pledged resources delivery has to be returned but hopefully ready for May. Tape pledge OK (2470 TB) and 50% of disk pledged

: 2007 pledges resources available between middle and end of May and full 2008 pledges in November

: Reaching 80% of 2008 pledge in terms of CPU and disk by end of April. Ramp up tape capacity steadily to reach full pledge by October

Full pledges expected 1st of April. Storage ramp up - 180 more servers. Move LFC to Oracle. Network reconfiguration (in May)

FZK: no changes from the report of February in terms of pledges




  • CMSSW 2.0 release (production start-up MC samples). Two weeks of testing
  • iCSA08 proceeding: sample generation
    • Planning and discussing with commissioning/physics stakeholders
    • Pre-production started
  • CCRC'08:
    • Transfers: Cessy--> CERN: first successful transfers already during the pre-challenge in Feb. Plan now to demonstrate target and sustainability
    • Processing at CERN: Reconstruction at T0 and tape writing successful in Feb. Test of application of large conditions payloads at T0. Check the Castor setup at CERN. Development of new components
  • Continuing T0-T1, T1-T1, T1-T2 and T2-T1 transfers
  • Analysis: Phase 0 preparation previous to CCRC'08 May
ALICE: FDR phases 1+2+3
  • Finishing the 2nd commissioning exercise by the 10th
  • Week of 21st: Raw data distribution (low scale full chain tests). Reconstruction (Data copied to WN; conditions DB access). Stripping (V low level tests started). Analysis (Testing Ganga/DIRAC3)
  • Week of 28th: Stripping (castor sites dCache sites). DST dist (part of stripping tests). Analysis (Testing Ganga/DIRAC3)

ATLAS: Production of FDR2
  • L1Calo+Calo run: (28th-29th)
  • Throughput Tests: (31st-4th) and (28th-30th)
  • Functional Tests: T1-T1 RAL and NDGF (7th-11th) and BNL and FZK (21st-25th)
  • FDR1 Re-processing with M5 dataset at all T1 sites
  • File merging required at T0
  • Good implementation of SRM2.2 at T2
  • Next ATLAS Jamboree (21st-25th)
Castor2: Version 2.1.7 ready the 1st week of the month. Deployment before CCRC'08 phase II

StoRM 1.4.0 released during the 1st week of April

AMGA-Oracle: probably in production

WLCG SSWG: first proposal for an Addendum to the WLCG SRM v2.2 Usage Agreement circulated the 9th of April.
Finalize this document with the experiments by 21-25th of this month for WLCG MB approval.
Decision on baseline versions for May
ISGC & A-P Tier2 workshop, Taipei

WLCG Collaboration workshop 21-25 Apr
May 08
CCRC'08 phase II (from 5th to 30th)
  • From 08/05 (10:00) until 15/05 (14:00): Draining for change to 64 bit submission. (32 bit resources are being decommissioned.)
  • From 20/05 until 22/05: relocation of physical hardware to other datacentre
  • The 02/05 (from 04/05 until 05/05): Castor 2.1.6-12-1 will be patched with 2.1.6-12-2 hotfix and SRM will be upgraded to SRM 2.1.3-21
  • From 01/05 (20:20) until 02/05 (18:00): Hardware and middleware upgrade
  • The 02/05 (from 16:16 until 17:05): dCache outage due to a power failure [FIXED]

  • The 01/05 (from 08:30 until 09:30): Possible short interruption in connectivity during network switch configuration.

  • The 09/05 (from 11:00 until 14:00): Patch upgrade of FTS
  • The 12/05 (from 11:00 until 11:30): Short outage on NDGF-T1 SRM service due to dCache upgrade
  • From 14/05 (09:00) until 15/05 (09:19): Network maintenance at - some pools and data unavailable
  • From the 01/05 (15:00) until 02/05 (00:00): dCache upgrade
  • From 14/05 (22:30) until 15/05 (01:00): Network upgrade at our RREN (Anella Científica) that could cause a network outage (at both Internet and OPN connections)
  • The 15/05 (from 02:00 until 08:00): GEANT's scheduled maintenance at Interoute's dark fibre
T0 (CERN) + CAF: 15850 KSi2K CPU (full pledge) currently available. 2800 TB on disk available (out of the 5549 TB pledge). Remainder earliest at beginning of June. 12050 TB tape (full pledge) currently available

PIC: 1200 KSi2K CPU rising to 2008/9 pledge of 1500 by beginning of June. 600 TB disk rising to 2008/9 pledge of 970 TB by beginning of June. 540 TB of tape rising to 740 TB by 1 July and full 2008/9 pledge of 960 TB by 1 October.

NDGF: 2172 KSi2K CPU (full 2008/9 pledge). 385 TB disk rising to 1079 TB by September. 273 TB tape rising to 930 TB by September

ASGC: CPU expansion proposal was delayed and has been merged with the disk expansion planning. Paper work ready at the end of May. In terms of Tape, new tape procurement paperwork ready in May (beginning of the month) and another month for delivery to reach MoU level.
Discuss with experiments support and Castor team how to best configure Castor to meet CCRC objectives. Add additional RAC nodes for Castor DB. Investigate DNS HA solutions for grid services. Improve number and quality of recovery procedures for on-call for 24x7 operations.

TRIUMF: 910 KSi2K of CPU, 500 TB disk, 390 TB of tape

US-ATLAS: 2400 KSi2K CPU and order for another 3400 that should be available for June and exceed the 2008/09 pledge of 4844 KSi2K. 1100 TB disk and expecting delivery of 1200 TB early May to be available by end May. Remaining 1000 TB to meet the 2008/09 pledge of 2136 TB to be ordered for October delivery when additional space will become available. 1800 TB of tape exceeding the 2008/9 pledge of 1715 TB

US-CMS: installed CPU will be 3000 KSi2K and increase of 1300 KSi2K to reach 2008/9 pledge expected end of May. 1700 TB disk available and remaining 300 TB to reach 2008/9 pledge ready for May. 1600 TB tape available with another 1000 TB on order. will order up to pledge of 4700 TB as needed

RAL: Increase to full 2008/9 CPU pledge of 3139 KSi2K. Full 2008/9 disk pledge of 1920 TB delivered and in place. Enough for CCRC requirements and will achieve full pledge by end May. Tape media already at 2008/9 pledge level of 2070 TB

CC-IN2P3: Meet the 2008/9 pledge resources of 4240 KSi2K for 5 May start. Currently have 1500 TB disk installed or 63% of 2008/9 pledge so need to acquire remaining 880 TB to meet the pledge. Current planned availability is September but new purchasing rules may lead to delay
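
The CC-IN2P3 disk figures above can be cross-checked with a short calculation. This is only a sketch: the function name `pledge_status` is hypothetical, and the total pledge is derived here as installed plus remaining capacity rather than quoted from the report.

```python
def pledge_status(installed_tb: float, remaining_tb: float):
    """Return (implied total pledge in TB, fraction of pledge installed)."""
    total = installed_tb + remaining_tb
    return total, installed_tb / total

# CC-IN2P3 disk: 1500 TB installed, 880 TB still to be acquired
total, fraction = pledge_status(1500, 880)
print(f"implied pledge: {total:.0f} TB, installed: {fraction:.0%}")
```

The result, 1500 TB out of an implied 2380 TB, is consistent with the quoted 63%.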

NL-T1: 774 KSi2K CPU on 5th May to 1677 KSi2K CPU end May and 2008/9 pledge of 4382 in November. 253 TB disk on 5 May rising to 1059 TB end May and 2510 TB in November. 200 TB of tape media on 5 May rising to 719 TB (as needed on short notice) end May and 1813 TB in November

FZK: 2293 TB of disk, 2449 TB of tape, 4522 KSi2K of CPU.
Storage improvements: dCache upgraded to 1.8.0-14, added more pools, improved TSS interface to TSM tape system.
LFC: moved backend from MySQL to Oracle RAC (3 nodes), new more powerful frontend
Additional CEs and BDIIs

INFN: 1300 KSi2K of CPU with together 1700 during June. 570 TB of disk with remaining 730 TB in June/July (Adding 70 TB now to help with May run). 1000 TB of tape with remaining 500 TB in May



ALICE: FDR phases 1+2+3
  • AliEn version 2-15 in place for the 3rd commissioning exercise (beginning the 18th)
  • Before the startup of the RUN III, reconstruction of Feb/March RAW @T1 with new version of AliRoot
  • MC production:
    • Round 1: 5th-15th
    • Round 2: from 16th
  • Decommissioning SL3+gLite3.1 VOBOXes at all sites. Sites are forced to migrate and validate the VOBOXes before the 18th
  • Major tasks: Registration of data to Castor and Grid. Replication to T1. Conditions data gathering and publication on Grid. Quasi-online reconstruction, pass 1 at T0 and pass 2 at T1 (special emphasis). Quality control. ESD replication to CAF/T2
  • Expecting to reach above 80% of monthly volume during the real data taking approach (70% achieved in Feb.) and 100% p+p rate in the subsequent months to LHC start
  • MSS operation: RAW data chunks of 10GB (tested in April by the DAQ group). Pre-staging of data sets prior to processing and replication (already in production at T0)
  • T2 sites: continuing the deployment of xrootd enabled storage. analysis of MC and ESD from RAW

  • 1st Week: 3 days TRT + 3 days SCT. ID combined running including Pixel DAQ. Functional tests using data generator (First week Monday through Thursday)
  • 2nd Week: Calo + L1Calo + HLT. T1-T1 tests for all sites (repetition). Second week Tuesday through Sunday
  • 3rd Week: Throughput tests using data generator (Monday through Thursday). Muon + L1Mu + HLT
  • 4th Week: ID+DAQ+HLT. Beam pipe closure. Contingency (Monday through Sunday). Remove all test data.
  • Detector commissioning with cosmic rays: Each week Thursday through Sunday
  • Reprocessing M5 data: Each week Tuesday through Sunday
  • Clean-up: Each Monday

  • initial iCSA08
    • Automated PhEDEx transfer subscriptions to CAF
    • Automated CAF job triggering
    • CAF workflow integration in CRAB server
    • Systematic monitoring of cmscaf queue and castor pool
  • CMSSW 2.1 release (all basic sw components ready for LHC, new T0 prod tools)
  • CCRC'08:
    • T0-->T1 transfers: check with ATLAS for the best superposition. Tests aim to run during the full challenge
    • T1-->T1 transfers: check with ATLAS for superposition. Tests run the whole month, adding 1 T1 at a time in the first 2 weeks and letting each of them run for 2 weeks
    • T1--> regional T2 transfers: each region decides on 1 week + another repetition week (if needed). Week 4 stands as the default repetition week for all
    • T1 --> non regional T2: all four weeks
    • T1 --> T1 transfers: each region decides on 1 week + another repetition week (if needed). Week 4 stands as the default repetition week for all
    • Analysis: Phase 1: controlled job submissions. Phase 2: chaotic job submission. Phase 3: stop-watch

LHCb: FDR2 = 2 x FDR1

  • Major activities: Maintain equivalent of 1 month data taking assuming a 50% machine cycle efficiency. Run fake analysis activity in parallel to production tape analysis.
  • Testing of Ganga/DIRAC3
  • Conditions DB at T0 and T1: No plans to test replication of the conditions DB during CCRC'08
  • LFC: Use of static information replicated using "streaming" from CERN to T1
  • Stripping on rDST files
    • 17.5 DST files produced during the process corresponding to 16TB of data
  • SRM and Data Access
    • Test copying data locally in May
  • Transfers Pit-> T0 (using rfcp)
    • 84 TB of data from pit to T0 (~52k files)
    • Same 52k RAW files from CERN to be distributed over T1 centres
    • 14% of rDST production at CERN, remaining 86% at T1s
  • Transfers T0-T1 (FTS)
  • Transfers T1-T1 (FTS)
    • operation not tested during Feb. phase
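
The LHCb transfer figures above imply an average RAW file size and a rough CERN/T1 split of the workload. The following is a minimal Python sketch of that arithmetic. It assumes decimal units (1 TB = 1000 GB) and, purely for illustration, applies the 14%/86% rDST production share directly to the file count; the function names are hypothetical.

```python
# Figures quoted in the LHCb bullets; "~52k files" is approximate.
TOTAL_TB = 84
N_FILES = 52_000
CERN_SHARE = 0.14  # share of rDST production at CERN (the rest at T1s)


def average_file_size_gb(total_tb, n_files):
    """Average file size in GB, assuming 1 TB = 1000 GB."""
    return total_tb * 1000 / n_files


def split_files(n_files, cern_share):
    """Split a file count between CERN and the T1s (assumed per-file split)."""
    cern = round(n_files * cern_share)
    return cern, n_files - cern


avg_gb = average_file_size_gb(TOTAL_TB, N_FILES)
cern_files, t1_files = split_files(N_FILES, CERN_SHARE)
print(f"average RAW file: {avg_gb:.2f} GB")
print(f"files processed at CERN: {cern_files}, at T1s: {t1_files}")
```

Under these assumptions the average RAW file is about 1.6 GB, with roughly 7.3k files handled at CERN and the rest at the T1 centres.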
Storage Baseline Versions
DPM: version 1.6.7-4
CASTOR: SRM v 1.3-20, backend 2.1.6-12
dCache: version 1.8.0-15
StoRM: version 1.3.20 available since the end of March 08
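
A site can compare its deployed storage version against the baseline list above by comparing the numeric fields rather than the raw strings. The following is a minimal Python sketch; the helpers `parse_version` and `meets_baseline` and the `BASELINE` dictionary are illustrative assumptions, and the version syntax is assumed to be digits separated by dots and dashes, as in the list.

```python
import re

# Baseline versions copied from the list above.
BASELINE = {
    "DPM": "1.6.7-4",
    "CASTOR SRM": "1.3-20",
    "CASTOR backend": "2.1.6-12",
    "dCache": "1.8.0-15",
    "StoRM": "1.3.20",
}


def parse_version(text):
    """Turn '1.8.0-15' into (1, 8, 0, 15) so fields compare numerically."""
    return tuple(int(field) for field in re.findall(r"\d+", text))


def meets_baseline(product, deployed):
    """True if the deployed version is at or above the baseline."""
    return parse_version(deployed) >= parse_version(BASELINE[product])


# Example: the dCache release quoted for FZK earlier in this calendar.
print(meets_baseline("dCache", "1.8.0-14"))
```

Comparing integer tuples avoids the string-comparison pitfall where "1.6.10" would sort before "1.6.7".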

CCRC May recommended versions
LCG CE --> Patch #1752
FTS T0 (T1) --> Patch #1740 (#1671)
gFAL/lcg_util --> Patch #1738
DPM 1.6.7-4 --> Patch #1706
Many holidays (~1 per week)
First proton beams in LHC
Jun 08 T0 (CERN) + CAF: 2749 TB disk pledge

PIC: Remaining CPU resources ready (up to the 1500 KSi2K pledge). Remaining disk space (up to 970 TB)

RAL: expecting 1PB of disk

US-ATLAS: 3400 KSi2K should be available by 6 June

ASGC: Delivery of pledged resources (CPU and tape) to reach MoU level. Upgrade of HK and JP network connectivity to 2.5 Gbps in June, improving connectivity to China and Korea

INFN: 730 TB of disk


  • FDR2 Re-processing
  • Full dress rehearsal: June 2nd through 10th
  • July: ATLAS running??
LHCb: Until July migrating to Dirac-3

  • Cosmic run at 4T
  • CMS week in Cyprus
  • Distributed data transfers: proposal of running a deletion exercise in "non-custodial" data from 10-15 days (under discussion)
Planning for October 08 in terms of SRMv.1:
Change LFC file catalogue entries:

ATLAS will change catalogue entries per cloud once SRM v1 is decommissioned (done at CERN, in progress at other T1s).
LHCb needs multiple interventions in coordination with sites

Change experiments' catalogue entries:
CMS Trivial File Catalogue: SRM v2 entries in place for T1 and T2. SRM v1 entries need to be removed
ATLAS: October 08

Remove SRM v1.1 entries from BDII and trigger refresh of FTS caches:
To be done while sites migrate in October 08

Change FTS configuration so that SRM v2 is the default:
Can be done while migration takes place: October 08

lcg-utils/gfal with default SRM v2 whenever ready: to be scheduled.
CCRC'08 post-mortem workshop (Jun 13-14)

-- PatriciaMendezLorenzo - 24 Jan 2008

Topic revision: r25 - 2008-05-13 - PatriciaMendezLorenzo