-- JamieShiers - 10 Nov 2006
Week of 061113
Open Actions from last week: Spikiness of ALICE transfers still to be understood. Latest news: no progress.
A maintenance intervention on the CERN firewall will run on Tuesday the 14th of November 2006 between 6:00AM and 7:30AM CET.
Impact:
Some short instability in off-site network connectivity may be experienced during that time period.
T0-T1 traffic via the LHCOPN won't be affected.
c2public software upgrade expected, likely Tuesday morning. Latest news: may slip.
Chair: H.Renshall
Gmod: M.Dimou
Smod: V.Lefebure
Monday:
Log: Nothing
New Actions:
Discussion: ALICE transfers stopped at about 19.00 CET on Sunday, but the FTS team report no hanging transfers. New castor-lib-compat (backwards-compatible libshift) rollout today.
Tuesday:
Log: The ALICE server proxy expired on Sunday evening (hence no transfers) and was overloaded for an hour from 16.30 on Monday.
New Actions: Understand the overnight (Monday/Tuesday) transfer failures.
Discussion: Multi-VO transfer tests to IN2P3 are now scheduled for the week of 27 November. A new VO for EELA is to be created (TK). All transfers except those to SARA stopped from about 22.00 GMT until 06.00, with reported timeouts in SRM get. Overnight the ALICE LSF scheduler had a problem.
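The expired delegated proxy behind Sunday's transfer stoppage is the kind of failure a routine pre-expiry check catches. Below is a minimal sketch of such a check; the function name and the six-hour renewal margin are assumptions for illustration (in practice the remaining lifetime would come from a client query such as voms-proxy-info --timeleft):

```python
from datetime import datetime, timedelta

# Hypothetical helper: warn when a proxy is expired or close to expiring,
# so it can be renewed before transfers start failing.
def proxy_needs_renewal(expiry: datetime, now: datetime,
                        margin: timedelta = timedelta(hours=6)) -> bool:
    """Return True when less than `margin` of proxy lifetime remains."""
    return expiry - now <= margin

# Illustrative figures only: a proxy lapsing around 19.00 on Sunday evening.
now = datetime(2006, 11, 12, 18, 0)
expiry = datetime(2006, 11, 12, 19, 0)
print(proxy_needs_renewal(expiry, now))  # True: inside the renewal margin
```

Run periodically (e.g. from cron), this would have flagged the proxy well before the transfer queue drained.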
Wednesday:
Log: Stable overnight high transfer rates were achieved for ALICE to IN2P3 (100 MB/s) and CMS to FZK (350 MB/s).
New Actions: PB asked ALICE to check their MonALISA monitoring of their file transfer queues: it was reporting 180 queued transfers while FTS showed only 90.
Discussion: The Monday-night transfer problems were due to a disk failure on the Oracle server hosting castorpublic. This caused dteam SRM requests to hang and eventually block all SRM request slots.
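The MonALISA-versus-FTS discrepancy above (180 queued reported against 90) is the sort of thing an automated cross-check between monitoring sources can flag. A minimal sketch, with a hypothetical function name and an arbitrary tolerance of 10 files:

```python
# Hypothetical cross-check between monitoring sources that should agree
# on queue depth (e.g. an experiment-side view versus the FTS view).
def queue_mismatch(counts: dict, tolerance: int = 10) -> bool:
    """Flag when reported queue depths differ by more than `tolerance`."""
    return max(counts.values()) - min(counts.values()) > tolerance

# Figures from the Wednesday action above.
readings = {"MonALISA": 180, "FTS": 90}
print(queue_mismatch(readings))  # True: the two views disagree badly
```

Such a check does not say which source is wrong, only that reconciliation is needed, which is exactly the action assigned to ALICE here.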
Thursday:
Log:
New Actions: castoratlas upgrade to be done this morning.
Discussion: The scheduled castorlhcb upgrade has already been completed. ML will try the TB reboot procedure for HALinux on a myproxy server this afternoon.
Friday:
Log:
New Actions:
Discussion: The HALinux reboot was successful and will be documented in the Technical factors Twiki. VOALICE01 and 02 were reinstalled with RAID-1 instead of JBOD. There are now up to 800 nodes in the GSSDATLAS TDAQ cluster. More SLC4 batch capacity will be added. Disks will be added to atlprod today and the WAN pools will be reduced.