Week of 060828

Open Actions from last week:

Chair: James

GMOD: Piotr

SMOD: Ulrich

Monday:

Log: Castor problems over the weekend - gridmapfile mapping for maarten changed to dteamsgm during Sunday; atlas srm-durable filled up, which impacted central SRM and all GSSDATLAS services. Problems with VOMS over the weekend - one RAC node went down, and there were problems when it started up again - ~6 hours of problems on Sunday morning

New Actions:

  • CMS want the DB split for Central LFC to happen on Wednesday - OK
  • Investigate problem with VOMS (Phys DB) - DONE
  • Investigate problem with Gridmap generation (James/Jan/Miguel) - DONE
  • CMS see bursts of errors (still continuing Monday morning) - Michael to send timestamps, Miguel to investigate.
Discussion:

Tuesday:

Log: Nothing

New Actions:

Discussion:

  • VOMS problem was due to not using long JDBC strings and OCI - this prevented the application from failing over while one node of the RAC was down. Procedure passed to the service admins
  • Problem with gridmap generation was an old bug which is fixed in the latest version. Details passed to service manager
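
For illustration, the kind of long JDBC string mentioned in the VOMS item can be sketched as below. This is a minimal sketch, not the procedure that was passed to the service admins: the helper name rac_connect_url, the host names and the service name are hypothetical, and the FAILOVER_MODE settings are only one plausible choice. It builds a long-form Oracle connect descriptor that lists every RAC node, so a client using the OCI driver can try another node when one is down.

```python
def rac_connect_url(hosts, service_name, port=1521, driver="oci"):
    """Build a long-form Oracle JDBC URL listing every RAC node.

    All argument values are supplied by the caller; nothing here is
    taken from the real VOMS configuration.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # One ADDRESS entry per RAC node, so the client has alternatives.
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={h})(PORT={port}))" for h in hosts
    )
    descriptor = (
        "(DESCRIPTION="
        # FAILOVER=on lets connection attempts move on to the next address.
        f"(ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on){addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})"
        # Assumed failover settings; these apply to the OCI driver.
        "(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)))"
        ")"
    )
    return f"jdbc:oracle:{driver}:@{descriptor}"


if __name__ == "__main__":
    # Hypothetical node and service names, for illustration only.
    print(rac_connect_url(["rac-node1.example.org", "rac-node2.example.org"],
                          "voms_service"))
```

A short string naming a single host gives the client nowhere else to go when that node is down, which matches the behaviour seen on Sunday.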

Wednesday:

Log: FTS Problems overnight. VOMS102 - TOMCAT_WRONG

Actions:

  • FTS procedure to be checked since it didn't work properly overnight when run by operator (Gavin) - DONE
  • VOMS102 - mail was sent to one person, not list. (Harry/James) DONE
  • VOMS102 - why is there no automated procedure for restart? (Harry)
  • New version lcg-gridftp-mon with cronjob (Maarten) - PENDING till next week
  • LHCb complain their traffic is not seen in GridView (Harry/Phool) - DONE

Discussion:

  • FTS problems caused by ALICE overloading the FTS WS. Also some problems with stuck agents with DB locks - need to investigate. Dirk will see whether they can give us alarms on rowlock contention
  • Michael reports FNAL-CERN is ok again - 21.7TB overnight
  • Phone quality still bad for people dialing in ;(

Thursday:

Log: Intervention went ok on LFC. CMS happy

Actions:

  • Monb001 - needs reboot - open Remedy ticket (James) - DONE
  • Some glite rpms will be upgraded today on grid services (mostly R-GMA, edg-mkgridmap) (Jan) - SLC4 DONE, SLC3 on Monday
  • Monitoring of essential daemons on RB needed (GMOD, James)

  • ALICE have asked to replace their VOBOX (Harry)
  • CMS have asked for 2 'rb102's for production usage. Will need 1 new machine (prob from ALICE) and a WN for the test service currently on rb102. (Harry)

Discussion:

  • Gridview problem was down to how lhcbprod was mapped in GridView.
  • Lock contention still happening on FTS. Under investigation with DB + software experts
  • Security updates for SLC4 inc kernel have been released

Friday:

Log: RAS

Actions:

  • FTS - disable actuators when going into maintenance so we don't have to turn off monitoring (Gavin)
  • 4AM Gridftp alarms - use new Lemon features to smooth out these regular transient alarms (Jan)
  • Update LFC_SLOWREADDIR to smooth out transient alarms (James) - DONE
  • ALICE VOBOX to be replaced today (lxb2065) - Ulrich

Discussion:

  • Performance issues on RAID for RBs - in discussion with Linux.Support
  • C2SC4 cluster will be offline for 1 week starting 11 Sept. This is to create a castor2 public stager out of it, with a dedicated "SC4" pool
  • CMS still in setup phase for the 150MB/s export

Topic revision: r7 - 2007-02-02 - FlaviaDonno
 