Week of 050718

Open Actions from last week:
  • LEMON sensors for
    • FTS - Gavin WAITING ON FIO
  • Gavin: install last FTS version and test
  • Move all users over to castor2 - James
  • Hourly tests (like SFT) - Simone FIRST VERSION DONE
  • Jan/Sophie/Laurence : How to run "central" LEMON sensors LEAVE
  • All: Go through the 2nd level support
  • Ben: check oplapro79 to see if system killed SRM process NOTHING FOUND
  • Olof: castor2 - Try with one node moving DONE
  • bdflush parameters. lxshare220d.
  • James/Ben: check oplapro80 for access for ben. DONE

On shift: Jean-Philippe/Jamie


Log: Many interventions.
  • FTS005 died NO_CONTACT - came back up
  • /tmp 92% on sc3-fts - Gavin - FTS dumping core
  • castorgridsc - 1 LSF plugin problem.
    • poor throughput in LSF - restarting LSF helps.

New Actions:

  • Gavin: Investigate core dumping on glite-url-copy
  • Check again the operator levels.


  • hourly tests - like to try with new version of lcg_utils
  • Olof: Wait until this afternoon to move castor1 nodes into WAN pool


  • RAS


  • Testing of L2 procedures - JPB for FTS, JDS for LFC
  • All users have now been moved to CASTOR2. Castor1 pools will be moved over today by Vlado
  • LEMON sensors? no news
  • New FTS version: being installed / tested
  • Hourly tests - still need to be deployed
  • Daily publishing of FTS plots can now be done properly
  • BD flushing: to be done
  • Core dumping on glite URL copy: Gav
  • Operator levels: to be checked

New actions:

  • Move of castor1 nodes into wan pool
  • GDB - monitoring talk? One of Gridview team


  • castor lsf monitoring pages now available - will be added to monitoring pages
  • GDB tomorrow is open - please attend!


  • Rate down - not clear why. Accidental temporary removal of PhEDEx pools (affected DESY). Misconfiguration. Understood.
  • Problems with Spanish - cert problem. Refreshed CRLs - back now.
  • CRL proxy now setup (Thorsten). Recipe given to Vlado. CRLs on all nodes 'soon'.
  • CASTOR2 migration - diskservers now in wan pool. Problem now is how to populate them. Nodes are chosen on the basis of capability and, as theirs is low compared to IA64, they will rarely be selected. Could move them to a separate service class temporarily and then back. Olof will do this this morning.


  • LEMON sensors. Still pending.
  • Testing of latest FTS - Paolo.
  • LFC smoke tests being checked. Partly a learning process and partly debugging (JDS)...
  • Core dumps on gLite URL copy. Mail from gav classifying dumps. Majority inside globus - Ben will investigate - most transfers to Triumf. Maarten can also help... Core filesize set to 1KB - this results in no dump at all... gav will set up cron job to 'archive' core dumps.
  • operator levels - no calls on fts production box at night. Need to up level with FIO.
  • monitoring talk by member of gridview team.
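
As a rough illustration of the core-dump 'archive' cron job mentioned above (not the script actually used), here is a minimal Python sketch. The function name, the `core*` filename pattern and both directory arguments are assumptions made for this example; the real dump location and naming on the FTS nodes are not recorded in these notes.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_core_dumps(src_dir, archive_dir, pattern="core*"):
    """Compress core files from src_dir into archive_dir and remove the originals.

    All names here are hypothetical: the pattern and directories are
    placeholders, not the layout of the real FTS nodes.
    """
    src = Path(src_dir)
    dest = Path(archive_dir)
    dest.mkdir(parents=True, exist_ok=True)

    archived = []
    for core in sorted(src.glob(pattern)):
        # Skip directories or other non-regular entries that match the pattern.
        if not core.is_file():
            continue
        # Tag the archive with the dump's modification time so that
        # successive dumps with the same name do not overwrite each other.
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(core.stat().st_mtime))
        target = dest / f"{core.name}.{stamp}.gz"
        with core.open("rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        # Free the space in the source directory once the copy is written.
        core.unlink()
        archived.append(target)
    return archived
```

Run periodically, something like this would keep the dumps available for classification while stopping them from filling the directory they are written to (compare the /tmp 92% entry in the log). The archive directory needs to be on a filesystem with enough free space.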

New actions:

  • fts alarms - james -> vlado


  • problem with oracle backups for castor ns - requires 5 minutes down time. No backup currently being done! can live like this until end July unless we schedule some interventions.
  • DB statistics being gathered without restart - action removed.


  • FTS: 17:00 daemon died - reasons unknown. Nothing in logs. No core files. watchdog script had a problem due to lockfile(?) Fixed (Paolo) - QF of config service: Gav -> Alberto test. bug open, moved to critical.
  • gridview problem - archiver went down 22:00 - 05:00. Running 2 archives (prod + backup). Will copy data over.
  • Traffic went down at 05:00 to zero - no alarms. Came back after one hour. Gav - check FTS logs. Around 360MB/s since. SARA down. Tried rebooting dCache several times. Complete reinstall of dCache from scratch(!) incl. upgrade to latest version.
  • Mail from Ron - organise a dCache workshop. Suggest to Michael.
  • 1000 files copied into 9 extra ia32 nodes.
  • Ben has tagged version of SRM that fixes many of current bugs. Test now - deploy eventually Monday?
  • Deploy new version of LSF plug-in today (as patch). Should avoid regular restart of LSF.


  • Lemon sensor for FTS - Gav to follow with FIO.
  • Mail from James announcing tape tests. Purge all outstanding jobs. Special tape details for some sites. Ask for others. Probably turn 3-4 on this morning.
  • Testing of new inter-VO FTS on-going
  • FTS/LFC smoke-tests ongoing.
  • SC3 service phase resource scheduling needed for PEB Tuesday.




Topic revision: r5 - 2005-07-21 - JamieShiers