Week of 050718
Open Actions from last week:
- LEMON sensors for FTS - Gavin
WAITING ON FIO
- Gavin: install latest FTS version and test
- Move all users over to castor2 - James
- Hourly tests (like SFT) - Simone FIRST VERSION DONE
- Jan/Sophie/Laurence : How to run "central" LEMON sensors
LEAVE
- All: Go through the 2nd-level support procedures
- Ben: check oplapro79 to see if system killed SRM process
NOTHING FOUND
- Olof: castor2 - try moving one node
DONE
- bdflush parameters. lxshare220d.
- James/Ben: check oplapro80 access for Ben.
DONE
On shift: Jean-Philippe/Jamie
Monday:
Log: Many interventions.
- FTS005 died NO_CONTACT - came back up
- /tmp 92% full on sc3-fts - Gavin - FTS dumping core
- castorgridsc - 1 LSF plugin problem.
- poor throughput in LSF - restarting LSF helps.
New Actions:
- Gavin: Investigate core dumping on glite-url-copy
- Re-check the operator levels.
Discussion:
- hourly tests - would like to try with the new version of lcg_utils
- Olof: Wait until this afternoon to move castor1 nodes into WAN pool
Tuesday:
Log:
Actions:
- Testing of L2 procedures - JPB for FTS, JDS for LFC
- All users have now been moved to CASTOR2. Castor1 pools will be moved over today by Vlado
- LEMON sensors? no news
- New FTS version: being installed / tested
- Hourly tests - still need to be deployed
- Daily publishing of FTS plots can now be done properly
- bdflush parameters: to be done
- Core dumping in glite-url-copy: Gav
- Operator levels: to be checked
New actions:
- Move castor1 nodes into the WAN pool
- GDB - monitoring talk? Possibly by a member of the Gridview team
AOB:
- CASTOR LSF monitoring pages now available - will be added to the monitoring pages
- GDB tomorrow is open - please attend!
Wednesday:
Log:
- Rate down - initially not clear why. Cause: accidental temporary removal of PhEDEx pools (affected DESY) due to a misconfiguration. Now understood.
- Problems with the Spanish site - certificate problem. Refreshed CRLs - back now.
- CRL proxy now set up (Thorsten). Recipe given to Vlado. CRLs on all nodes 'soon'.
- CASTOR2 migration - diskservers now in WAN pool. Problem now is how to populate them. Diskservers are chosen on the basis of capability, and as these are low compared to the IA64 ones they will rarely be selected. Could move them to a separate service class temporarily and then back. Olof will do this this morning.
Actions:
- LEMON sensors. Still pending.
- Testing of latest FTS - Paolo.
- LFC smoke tests being checked. Partly a learning process and partly debugging (JDS)...
- Core dumps in glite-url-copy. Mail from Gav classifying the dumps. Majority are inside globus - Ben will investigate - most are from transfers to Triumf. Maarten can also help... Core file size limit is set to 1KB - this results in no dump at all... Gav will set up a cron job to 'archive' core dumps.
- Operator levels - no calls for the FTS production box at night. Need to raise the level with FIO.
- monitoring talk by member of gridview team.
New actions:
- fts alarms - james -> vlado
AOB:
- Problem with Oracle backups for the CASTOR ns - requires 5 minutes of downtime. No backup currently being done! We can live like this until end of July unless we schedule some interventions.
- DB statistics being gathered without restart - action removed.
Thursday:
Log:
Actions:
Friday:
Log:
Actions: