--
JamieShiers - 26 Sep 2005
Week of 050926
Open Actions from last week:
- check possibility of using LCG Quattor WG components for LFC/DPM/... (Jan/Vlado/Sophie)
TO TEST
- FTS Quick Fixes for BDII
DONE
- Need to restart castor2 db to apply patch - schedule for gap between ALICE and CMS (Vlado) SCHEDULED, WAITING 4th OCT
On Call: Patricia + David
Monday:
Log:
- CASTOR problem on Saturday at 3 AM. High load again, but trigger & garbage collector off this time -> to be investigated (Olof)
New Actions:
- FTS Quick Fixes RPMs done. Installation to be done on Tuesday morning (Gavin + Patricia)
DONE on lxshare026d
- FTS server not visible from the VO box -> deploy another one (Simone + Gavin).
- Test the new myproxy server installed for FTS + inform the users (Simone + Gavin).
DONE
Discussion:
- Olof : SRM problems at CNAF (reported by Daniele Bonacorsi) investigated : the problems are on the target.
- Olof : David Cameron's problems are being investigated as well.
- Jan : 30 TB put on cmsprod on Friday -> Everybody : no problem seen so far.
- Simone : new myproxy server installed for FTS
Tuesday:
Log: Nothing
New Actions:
IMPORTANT : Upgrade tomorrow morning :
- CASTOR upgrade to fix last week's problems. Already announced by email (Olof).
- At the same time, database change to stop the number of processes from growing and crashing the instance.
Discussion:
Wednesday:
Log: Nothing
Actions:
- Network intervention to be announced to the SC mailing lists -> to follow up (Jan/Sophie)
NOT SC3 AREA ROUTER - NO PROBLEM
- Check problem with grid-mapfile for unosat on LFC boxes (Sophie/Patricia)
TEMPORARY FIX - WAITING FOR MARIA
Discussion:
- Olof : upgrade exercised yesterday, tests performed. Went fine.
- Ben : will take advantage of the intervention to add an SRM fix.
- Olof/Jan : another intervention foreseen : switch/router intervention by network group.
Thursday:
Log: Nothing
Actions:
- Check why the router intervention didn't get announced to SC mailing lists (Sophie/Jan)
NOT SC3 AREA ROUTER - NO PROBLEM
- intervention on the Pilot FTS (QF) -> today at 10:00 (Patricia/Gavin)
DONE
- Quick Fixes to solve the NDGF FTS problem (Gavin)
- check access to lxb2088 for Simone
DONE
+ change FTS configuration to use the new myproxy server (Gavin)
- LFC unosat : doesn't work for Patricia anymore -> to check (Sophie/Patricia)
TEMPORARY FIX - WAITING FOR MARIA
Discussion:
- Interventions : went fine (db + castor + router). Delay because of Quattor upgrading the RPMs. Problem with 64-bit machines - also fixed during the intervention.
- NDGF FTS problem : too many pending jobs - query checking for pending jobs got bigger and bigger.
Friday:
Log: Nothing
Actions:
- Monday : intervention on the FTS myproxy server. Announce it today (Gavin)
- FTS : reduce privileges on the DB accounts TODAY. No database stop required (Gavin).
- Olof suspects that Fermilab is still using castorgrid instead of castorgridsc -> to check (Olof).
Discussion:
- Jamie : useful Operations Workshop at RAL. More integration with SFT (Site Functional Tests). LCG/EGEE/OSG integration under way. VO boxes.
- Important : Vlado & Jan are not at CERN next week. Olof is their backup.
- CASTOR2.1 development in progress : meeting with RAL, progress satisfactory.
- FTS visible from outside : installed, not yet configured, not yet tested.
- Eric : nothing to report. No growing processes (in CASTOR DB) noticed by the new monitoring in place so far.
--
SophieLemaitre - 30 Sep 2005