Week of 051024

Open Actions from last week:

  • Preparation of intervention this coming Sat/Sun/Mon/Tue DONE
  • move VOBOXes to Quattor
  • Pilot to move to 1.4.1 (Gavin) FOR Intervention

  • check possibility of using LCG Quattor WG components for LFC/DPM/... (Jan/Vlado/Sophie) TO TEST
  • Escalate problem with lost packets on loopback I/F to linux.support (Olof) IN PROGRESS

On Call: James and Jamie

Monday:

Log:
  • Friday afternoon: FTS channel agent daemon died; no core file. Suspected due to rebuild of password file at same time (rare bug). Restarted quickly.
  • Friday night - 23:10, LHCb stopped contacting FTS. Problem seemed to be on their side.
  • CASTOR2 - one file system full Sunday morning (problem in one of the Oracle procedures). Understood. Roll up fix for next Monday. Tim - should this be alarmed to operator logs? Olof - currently not alarmed, maybe should be... (TonyO - yes!)

New Actions:

  • Intervention plan by area for review Wednesday DONE

Discussion:

Tuesday:

Log: Nothing

New Actions:

  • Intervention plan by area for review Wednesday DONE
  • Vlado - report and followup problem on NIC with Linux-support DONE
  • new machine for Romain (Jan VE) - lxshare220d DONE
  • new sensors for FTS - Gavin - detected all agents down at 2AM - need to test (Gavin) DONE
  • lcg-mon-gridftp to 1.2.0 (James) DONE

Discussion:

  • NCM for DPM - Vlado

Wednesday:

Log: Nothing

New Actions:

  • Look at queries in FTS for locking problem (Gavin/Paolo)
  • LHCb want T1-T1 channels. Need to discuss with fts-support after. (Gavin/Roberto/James) DONE

Discussion:

  • FTS - restart of agents was due to bad logrotate. Fix will come after 1.4.1
  • FTS - seems to be lock contention on DB or very long queries - LHCb saw problem.
  • LFC - still have performance concerns; may still want a read-only insecure catalog

Thursday:

Log: Problem with R-GMA - Gridview not updated. LCG_MON_GRIDFTP wrong on lxshare26d

Actions:

  • follow up bug in lcg-mon-gridftp code with non-existent/empty log files (Maarten) DONE
  • R-GMA gridftp producers seem ok - followup with second level procedures (James) DONE
  • Shut down FTS on Sunday night to drain queues - announce (James) DONE
  • How to disable automatic action for FTS? (Gavin) DONE
  • Create list of machines in intervention (James) DONE

Discussion:

  • SRM released next week will fix the multi-file srm copy problem.
  • Do sites have to upgrade clients to FTS 1.4.1 - no, they're backwards compatible.

Friday:

Log: Yujun reported problems with transfer. Castor stager crashed ~3AM.

Actions:

  • Send out new version of plan (James) DONE
  • Add contact details to plan (James) DONE
  • Meeting at 2pm (James) DONE
  • FTS no transfers to PIC for 3 hours (Gavin) DONE

Discussion:

  • Could not find Yujun's problem - perhaps a network problem
  • stager crashed - was restarted - will be investigated today
  • DB load seen yesterday. Normal in terms of query performance.
  • Configuration for FTS/LFC database limits needs to be checked

Topic revision: r7 - 2005-10-28 - unknown
 