Project Management

Tape Gateway release plans

Bug discovery of 7/9/2010 (German, Steve, Eric)

  • Definition of a plan of action for a fast release of the tape gateway to production.
    • Constraint: it has to be compatible with rtcpclientd so that operations can fall back to rtcpclientd in case of problems.
  • Conclusion: 14 step plan:
    • Rework on the mighunter (underway by Steve)
    • Multiple tape copies fix (still have to decide for the re-migration post-recall)
    • Locking problem (underway by Eric)
    • Truncation of files (removal of tapecopies by stager before end of migration)
    • Analysis of the removal of the TapeGatewayRequest table (or at least the triggers populating it)
    • Bugfix: Gateway should not spin getting tapes from the VMGR for pending streams that have no tape copies. (Compare streamsToDo vs tg_getStreamsWithoutTape)
    • Back and forth (tapegateway <-> rtcpclientd mechanics). If DDL is no longer needed, make it a procedure.
    • Improve logging of the gateway (needs service class plus something else (Steve?))
    • Improve tape bridge logging (needs VID + drive unit)
    • Code review
    • Validation test
    • Plan release and backport accordingly
    • Acceptance test of the backport
    • Run in C2 repack (target date as of 26 Aug: 1st Nov)
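
The spinning bugfix listed in the plan above can be illustrated with a minimal Python sketch. The `Stream` record and its fields (`stream_id`, `tape_copy_count`, `has_tape`) are hypothetical and do not reflect the actual stager schema; the sketch only shows the intended filtering, namely asking the VMGR for a tape only for pending streams that actually have tape copies.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    stream_id: int
    tape_copy_count: int  # hypothetical field: tape copies attached to the stream
    has_tape: bool        # hypothetical field: True once a tape is assigned


def streams_needing_tape(streams):
    """Return the pending streams for which a VMGR tape request makes sense.

    Streams that already have a tape, or that have no tape copies at all,
    are skipped so that the caller does not keep polling for them.
    """
    return [s for s in streams if not s.has_tape and s.tape_copy_count > 0]


pending = [
    Stream(stream_id=1, tape_copy_count=0, has_tape=False),  # empty: skip
    Stream(stream_id=2, tape_copy_count=3, has_tape=False),  # needs a tape
    Stream(stream_id=3, tape_copy_count=5, has_tape=True),   # already served
]
selected_ids = [s.stream_id for s in streams_needing_tape(pending)]
print(selected_ids)
```

In the real system this selection would presumably happen in the database query itself rather than in client code.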

Planning meeting of 26/8/2010 (German, Steve, Eric)

  • Definition of a plan of action for a fast release of the tape gateway to production.
    • Constraint: it has to be compatible with rtcpclientd so that operations can fall back to rtcpclientd in case of problems.
  • Conclusion: 13 step plan:
    • Rework on the mighunter (underway by Steve)
    • Multiple tape copies fix (still have to decide for the re-migration post-recall)
    • Locking problem (underway by Eric)
    • Truncation of files (removal of tapecopies by stager before end of migration)
    • Analysis of the removal of the TapeGatewayRequest table (or at least the triggers populating it)
    • Back and forth (tapegateway <-> rtcpclientd mechanics). If DDL is no longer needed, make it a procedure.
    • Improve logging of the gateway (needs service class plus something else (Steve?))
    • Improve tape bridge logging (needs VID + drive unit)
    • Code review
    • Validation test
    • Plan release and backport accordingly
    • Acceptance test of the backport
    • Run in C2 repack (target date as of 26 Aug: 1st Nov)

Past regular meetings

Thursdays, 15:00 - 16:00 in glassbox 2nd floor (31-2-029)

18/6/09

02/07/09

  • Agenda (ProjectNotes020709)
    • Status of tpcp
    • Status of aggregator
    • Status of gateway
    • Plans for next week
    • AOB
      • Support for multi-segmented files in aggregator/tape gw
      • rtcpd logging: DLF and/or local files and/or syslog...
      • Planning status - any update needed?

13/07/09 (Monday, exceptionally)

16/07/09

  • Agenda (ProjectNotes160709)
    • Status of tpcp
    • Status of aggregator
    • Status of gateway
    • Plans for next week
    • AOB

30/07/09

06/08/09

  • Agenda (ProjectNotes060809)
    • Status of readtp/writetp/dumptp + aggregator + gateway
    • Timelines, and plans for testing until 2.1.9-0 and 2.1.9-1
    • Plans for next week
    • AOB

12/08/09

  • Agenda (ProjectNotes120809)
    • Status of readtp/writetp/dumptp + aggregator + gateway
    • Update on timelines, and plans for testing until 2.1.9-0 and 2.1.9-1. Questions from yesterday's meeting with tape operations:
      • c2itdc availability for our tests?
      • validating 2.1.9 tape servers with 2.1.8 stagers?
      • Dealing with the VDQM "card house" effect when issuing aggregator requests to tape servers not running the aggregator?
        • install by hand aggregatord on 2.1.8 boxes?
    • Plans for next week
    • AOB

27/08/09

  • Agenda (ProjectNotes270809)
    • Test results and problem determination after stress testing on lxcastordev03
    • Plans for next week
    • AOB

03/09/09

  • Agenda (ProjectNotes030909)
    • Test results and problem determination - status
    • Timelines for releases and deployment (as discussed in today's coordination meeting)
    • AOB

25/09/09

  • Agenda (ProjectNotes250909)
    • Catch-up on Gateway/Aggregator test results and problem determination – status
      • Remaining functional tests: multiple tape copies, issues with recalls
      • Changes to rtcpd for bypassing syslog logging of debug messages (and dropping of useless debug messages)
      • Others
    • AOB

01/10/09

  • Agenda (ProjectNotes011009)
    • Feedback from planning meeting
    • Testing / C2ITDC status
      • Mighunter memleak status
      • BUSY and FULL|BUSY problem status
      • Looping tape recalls problem status
      • Tape overflow issue status
      • Aggregator VMGR check status
      • Upgrade scripts
      • Integrating STK drive
      • Next tests to perform (see also previous minutes)
    • AOB

08/10/09

  • Agenda (ProjectNotes081009)
    • Feedback from planning meeting
    • Testing / C2ITDC status
      • Mighunter memleak status
      • Mighunter performance status
      • Gateway memleak correction status
      • Tape performance issues
      • Aggregator / VDQM request merging status
      • Aggregator VMGR check status
      • SVN branch merging
      • Next tests to be run
    • AOB

22/10/09

  • Agenda: (ProjectNotes231009)
  • Feedback from planning meeting:
    • 2.1.9-3 release (MigHunter)
    • 2.1.9-4 release (mid-January, tapegateway)
  • C2ITDC status (still down due to nameserver problem)
  • TapeGateway problems to be tackled:
    • SR #108583: Cleaning up the tape tables when tapes are deleted. A possible way out is to restart the file request from the beginning by querying the name server again (if this is not yet the case). This would address SR #108583 as well as the problem described in SR #109980.
    • SR #110702: Internal error on segment check causes high mount rate
    • The tg_getFileToRecall() PL/SQL procedure returns -1 both when there are no segments and when the bestFileSystemForSegment() procedure has raised an exception. It would be better if the two cases had different return codes. The case of bestFileSystemForSegment() raising an exception needs to be logged explicitly and made visible, because it means the system will loop mounting and dismounting the same tapes for recall without ever actually doing any recalls.
    • How does the tape gateway invalidate the disk copies of unfinished pending recalls when a premature end of session is sent by the aggregator?
    • A DB deadlock which needs investigation:
      • 2009-10-11T03:26:07.168742+02:00 c2itdcsrv102 tapegatewayd[22111]: LVL=Error TID=22118 MSG="Worker: db error while retrieving file to migrate" mountTransactionId=153522 errorCode="Internal error" errorMessage="Error caught in getFileToMigrate ORA-00060: deadlock detected while waiting for resource ORA-06512: at 'CASTOR_STAGER.TG_DEFAULTMIGRSELPOLICY', line 122 ORA-06512: at 'CASTOR_STAGER.TG_GETFILETOMIGRATE', line 69 ORA-06512: at line 1"
    • Error caught in getFileToRecall ORA-01403: no data found
      • 2009-10-11T13:38:00.756981+02:00 c2itdcsrv102 tapegatewayd[22111]: LVL=Warn TID=22139 MSG="Worker: received end notification error report" mountTransactionId=153880 errorcode=1015 errorMessage="Failed to process RTCPD sockets: File=ClientTxRx.cpp Line=876 Function=throwEndNotificationErrorReport: Client error report : Error caught in getFileToRecall ORA-01403: no data found"
    • Getting a volume can take the tapegateway a very long time: every 5th request takes more than 10 seconds. This can cause rtcpd to time out while waiting (after 1 minute). What takes the DB so long to answer? In principle this should be a quick query.
  • SVN merging status / issues
  • German’s migration/recall tests: status
  • AOB
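
The return-code point raised above for tg_getFileToRecall() can be sketched as follows. This is Python rather than PL/SQL, and the status names, the value -2, and the function signatures are hypothetical. It only illustrates keeping "no segments" apart from "file system selection failed" so that the second case can be logged explicitly.

```python
import logging
from enum import IntEnum

log = logging.getLogger("tapegateway-sketch")


class RecallStatus(IntEnum):
    OK = 0
    NO_SEGMENTS = -1          # nothing left to recall
    FS_SELECTION_FAILED = -2  # hypothetical new code for the exception case


def get_file_to_recall(segments, best_file_system):
    """Pick the next segment to recall and a file system to receive it.

    `best_file_system` is a stand-in callable for the file system
    selection step; it may raise if no file system can be chosen.
    """
    if not segments:
        return RecallStatus.NO_SEGMENTS, None
    segment = segments[0]
    try:
        file_system = best_file_system(segment)
    except Exception as exc:
        # Log explicitly: this is the case that leads to mount/dismount loops.
        log.error("file system selection failed for segment %r: %s", segment, exc)
        return RecallStatus.FS_SELECTION_FAILED, None
    return RecallStatus.OK, (segment, file_system)


def always_fails(segment):
    raise RuntimeError("no file system available")


empty_result = get_file_to_recall([], always_fails)
failed_result = get_file_to_recall(["seg1"], always_fails)
ok_result = get_file_to_recall(["seg1"], lambda segment: "fs01")
print(empty_result[0].name, failed_result[0].name, ok_result[0].name)
```

With distinct codes, the caller can treat an empty recall list as a normal end of work while treating the selection failure as an error worth reporting.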

06/11/09

  • Agenda: (ProjectNotes061109)
    • TapeGateway problems to be tackled (see minutes of previous meeting)
    • c2itdc next tests - "hip" test, migHunter tests, overhead tests
    • AOB

19/11/09

  • Agenda: (ProjectNotes191109)
    • Performance problems and multiple requests - proposed changes to tape aggregator
    • Outstanding issues
    • Testing plans

  • Agenda: (ProjectNotes031209)
    • Planning for xmas et al absences:
    • Gateway + aggregatord testing status and outstanding issues
    • Packaging
    • Further tests

Release planning

Time sheets for tape development team

Mailing lists

-- GermanCancio - 29 Jul 2009

Topic revision: r28 - 2010-09-07 - EricCano