Week of 050718

Open Actions from last week:
  • LEMON sensors for
    • FTS - Gavin WAITING ON FIO
  • Gavin: install last FTS version and test
  • Move all users over to castor2 - James
  • Hourly tests (like SFT) - Simone FIRST VERSION DONE
  • Jan/Sophie/Laurence : How to run "central" LEMON sensors LEAVE
  • All: Go through the 2nd level support
  • Ben: check oplapro79 to see if system killed SRM process NOTHING FOUND
  • Olof: castor2 - Try with one node moving DONE
  • bdflush parameters. lxshare220d.
  • James/Ben: check oplapro80 for access for ben. DONE

On shift: Jean-Philippe/Jamie

Monday:

Log: Many interventions.
  • FTS005 died NO_CONTACT - came back up
  • /tmp 92% on sc3-fts - Gavin - FTS dumping core
  • castorgridsc - 1 LSF plugin problem.
    • poor throughput in LSF - restarting LSF helps.

New Actions:

  • Gavin: Investigate core dumping on glite-url-copy
  • Check again the operator levels.

Discussion:

  • hourly tests - like to try with new version of lcg_utils
  • Olof: Wait until this afternoon to move castor1 nodes into WAN pool

Tuesday:

Log:
  • RAS

Actions:

  • Testing of L2 procedures - JPB for FTS, JDS for LFC
  • All users have now been moved to CASTOR2. Castor1 pools will be moved over today by Vlado
  • LEMON sensors? no news
  • New FTS version: being installed / tested
  • Hourly tests - still need to be deployed
  • Daily publishing of FTS plots can now be done properly
  • BD flushing: to be done
  • Core dumping on glite URL copy: Gav
  • Operator levels: to be checked

New actions:

  • Move of castor1 nodes into wan pool
  • GDB - monitoring talk? One of Gridview team

AOB:

  • castor lsf monitoring pages now available - will be added to monitoring pages
  • GDB tomorrow is open - please attend!

Wednesday

Log:
  • Rate down - not clear why. Accidental temporary removal of PhEDEx pools (affected DESY). Misconfiguration. Understood.
  • Problems with Spanish - cert problem. Refreshed CRLs - back now.
  • CRL proxy now setup (Thorsten). Recipe given to Vlado. CRLs on all nodes 'soon'.
  • CASTOR2 migration - diskservers now in WAN pool. Problem now is how to populate them: servers are chosen on the basis of capability and, as these are low compared to IA64, they will rarely be selected. Could move them to a separate service class temporarily and then back. Olof will do this this morning.

Actions:

  • LEMON sensors. Still pending.
  • Testing of latest FTS - Paolo.
  • LFC smoke tests being checked. Partly a learning process and partly debugging (JDS)...
  • Core dumps on gLite URL copy. Mail from Gav classifying dumps. Majority inside globus - Ben will investigate - most transfers to Triumf. Maarten can also help... Core filesize set to 1KB - this results in no dump at all... Gav will set up cron job to 'archive' core dumps.
  • operator levels - no calls on fts production box at night. Need to up level with FIO.
  • monitoring talk by member of gridview team.

New actions:

  • fts alarms - james -> vlado

AOB:

  • Problem with Oracle backups for castor ns - requires 5 minutes of downtime. No backup currently being done! Can live like this until end of July unless we schedule some interventions.
  • DB statistics being gathered without restart - action removed.

Thursday

Log:

Actions:

Friday

Log:

Actions:

Topic revision: r4 - 2005-07-20 - JamieShiers
 