-- HarryRenshall - 29 Feb 2008
Week of 080303
Open Actions from last week:
Monday:
See the weekly joint operations meeting minutes
Additional Material:
Tuesday:
elog review:
Experiment report(s):
Core services (CERN) report:
DB services (CERN) report:
Monitoring / dashboard report:
Release update:
Questions/comments from sites/experiments:
AOB: No meeting due to pre-GDB.
Wednesday:
elog review:
Experiment report(s):
Core services (CERN) report:
DB services (CERN) report:
Monitoring / dashboard report:
Release update:
Questions/comments from sites/experiments:
AOB: No meeting due to GDB.
Thursday:
elog review: 3 new, all from LHCb: Nikhef job queue limit too short (notified); LFC failures on 4 March with no broadcast (it was sent to a restricted set); and PIC FQAN mapping not looking at VOMS extensions (fixed).
Experiment report(s):
ALICE: ALICE is transferring to 5 sites, leaving only RAL, which is not set up for ALICE and is in the plan for March. Things are therefore going fine at present.
ATLAS: M6 cosmics starts tomorrow; today is cleanup day. Disks at CERN are 100% full.
CMS: They are following up on the recent LSF failures and will be having a meeting with B.Panzer.
LHCb: Started setting up for the May phase of the CCRC, notably the workflow for the stripping jobs. They are negotiating for longer (in CPU time) job queues at their T1 sites and for the remaining LFC mirrors (at FZK, PIC and SARA) to be ready for May.
Core services (CERN) report:
DB services (CERN) report: Oracle critical patch upgrades (transparent to users) have been done at BNL and CNAF; TRIUMF will be done today and FZK on 15 March.
Monitoring / dashboard report:
Release update:
Questions/comments from sites/experiments:
AOB:
Friday:
elog review: No new items since Wednesday
Experiment report(s):
ATLAS (SC): In the last few days a new release of the ATLAS site services, to be used during M6, has been deployed, though testing is not complete (the last test had a bug). Export of M6 data will start soon, with a total of 50 TB to be exported to the Tier 1 sites by Monday (a rate of about 200 MB/sec). Another activity has been the cleaning up of ATLAS disk pools at CERN, also in preparation for M6.
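As a rough cross-check of the quoted export rate, the short sketch below computes the average rate needed to move 50 TB in a given window. The window length is an assumption, not something stated above: it is taken as about three days (Friday to Monday), and decimal units are assumed (1 TB = 10^12 bytes, 1 MB = 10^6 bytes). The function name is purely illustrative.

```python
# Back-of-envelope check of the quoted M6 export rate.
# Assumptions (not stated in the minutes): decimal units and an export
# window of roughly three days (Friday to Monday).

TB = 10**12                      # bytes in one terabyte (decimal)
MB = 10**6                       # bytes in one megabyte (decimal)
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mb_per_s(volume_tb: float, window_days: float) -> float:
    """Average rate in MB/s needed to move volume_tb within window_days."""
    return volume_tb * TB / (window_days * SECONDS_PER_DAY) / MB


if __name__ == "__main__":
    rate = average_rate_mb_per_s(50, 3)
    print(f"{rate:.0f} MB/s")    # prints "193 MB/s"
```

Under these assumptions the sustained average comes out at about 193 MB/sec, consistent with the "about 200 MB/sec" quoted; a shorter effective window would push the required rate higher.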
CMS (AS): Some site SRM services are not showing up on the CMS dashboard, so we must check how sites are publishing them.
Core services (CERN) report:
DB services (CERN) report:
The intervention on Tuesday included a planned downtime of 30 minutes for all applications, as announced by email.
At the same time there was a planned upgrade of the LFC application, which Ignacio reported at the Wednesday morning meeting. It seems this intervention downgraded the Oracle client by mistake, and the LFC was not working for some time after the database intervention. It was discussed that this should have been put on the StatusBoard; I've checked and it is there now (I believe it was added by Ignacio):
http://it-support-servicestatus.web.cern.ch/it-support-servicestatus/ScheduledInterventionsArchive/080304-LCGR.htm
Indeed, Miguel announced the DB intervention, but we overlooked his mail and missed the crucial point that the LFC would be affected, and thus didn't schedule any LFC downtime. Apologies about that.
I've just talked to Miguel, and things are clear for next time (for both sides).
The LFCs reconnected automatically to the database once it was up again.
But they went down again because of another problem indirectly caused by the Monthly Scheduled Linux Upgrade.
This is now fixed, and won't happen again under the same circumstances.
Monitoring / dashboard report:
Release update:
Questions/comments from sites/experiments: Derek Ross (RAL) asked if RAL still needed to maintain local RBs and WMS. For ATLAS, SC said they do not use any outside CERN, where they use only WMS. We need to clarify this in general.
AOB: