Project Management
Tape Gateway release plans
Bug discovery of 7/9/2010 (German, Steve, Eric)
- Definition of a plan of action for a fast release of the tape gateway to production.
- Constraint: has to be compatible with rtcpclientd so operations can fall back on rtcpclientd in case of problems.
- Conclusion: 14-step plan:
- Rework on the mighunter (underway by Steve)
- Multiple tape copies fix (still have to decide for the re-migration post-recall)
- Locking problem (underway by Eric)
- Truncation of files (removal of tapecopies by stager before end of migration)
- Analysis of the removal of the TapeGatewayRequest table (or at least the triggers populating it)
- Bugfix: Gateway should not spin getting tapes from the VMGR for pending streams that have no tape copies. (Compare streamsToDo vs tg_getStreamsWithoutTape)
- Back and forth (tapegateway <-> rtcpclientd mechanics). If DDL is not needed anymore, make it a procedure.
- Improve logging of gateway (need service class plus something else (Steve?))
- Improve tape bridge logging (need VID + drive unit)
- Code review
- Validation test
- Plan release and backport accordingly
- Acceptance test of the backport
- Run in C2 repack (target date as of 26 Aug: 1st Nov)
Planning meeting of 26/8/2010 (German, Steve, Eric)
- Definition of a plan of action for a fast release of the tape gateway to production.
- Constraint: has to be compatible with rtcpclientd so operations can fall back on rtcpclientd in case of problems.
- Conclusion: 13-step plan:
- Rework on the mighunter (underway by Steve)
- Multiple tape copies fix (still have to decide for the re-migration post-recall)
- Locking problem (underway by Eric)
- Truncation of files (removal of tapecopies by stager before end of migration)
- Analysis of the removal of the TapeGatewayRequest table (or at least the triggers populating it)
- Back and forth (tapegateway <-> rtcpclientd mechanics). If DDL is not needed anymore, make it a procedure.
- Improve logging of gateway (need service class plus something else (Steve?))
- Improve tape bridge logging (need VID + drive unit)
- Code review
- Validation test
- Plan release and backport accordingly
- Acceptance test of the backport
- Run in C2 repack (target date as of 26 Aug: 1st Nov)
Past regular meetings
Thursdays, 15:00 - 16:00 in glassbox 2nd floor (31-2-029)
18/6/09
- Agenda: (ProjectNotes18609)
- Status of tpcp
- Status of aggregator
- Status of gateway
- Plans for next week
- AOB
02/07/09
- Agenda (ProjectNotes020709)
- Status of tpcp
- Status of aggregator
- Status of gateway
- Plans for next week
- AOB
- Support for multi-segmented files in aggregator/tape gw
- rtcpd logging: DLF and/or local files and/or syslog...
- Planning status - any update needed?
13/07/09 (Monday, exceptionally)
- Agenda (ProjectNotes130709)
- Status of tpcp
- Status of aggregator
- Status of gateway
- Plans for next week
- AOB
16/07/09
- Agenda (ProjectNotes160709)
- Status of tpcp
- Status of aggregator
- Status of gateway
- Plans for next week
- AOB
30/07/09
- Agenda (ProjectNotes300709)
- Status of readtp/writetp/dumptp + aggregator + gateway
- Testing the aggregator and gateway on ITDC - requires deployment (and early release) of aggregator.
- New service review meeting on Thursdays
- Plans for next week
- AOB
06/08/09
- Agenda (ProjectNotes060809)
- Status of readtp/writetp/dumptp + aggregator + gateway
- Timelines, and plans for testing until 2.1.9-0 and 2.1.9-1
- Plans for next week
- AOB
12/08/09
- Agenda (ProjectNotes120809)
- Status of readtp/writetp/dumptp + aggregator + gateway
- Update on timelines, and plans for testing until 2.1.9-0 and 2.1.9-1. Questions from yesterday's meeting with tape operations:
- c2itdc availability for our tests?
- validating 2.1.9 tape servers with 2.1.8 stagers?
- Dealing with the VDQM "card house" effect when issuing aggregator requests to tape servers not running the aggregator?
- install by hand aggregatord on 2.1.8 boxes?
- Plans for next week
- AOB
27/08/09
- Agenda (ProjectNotes270809)
- Test results and problem determination after stress testing on lxcastordev03
- Plans for next week
- AOB
03/09/09
- Agenda (ProjectNotes030909)
- Test results and problem determination - status
- Timelines for releases and deployment (as discussed in today's coordination meeting)
- AOB
25/09/09
- Agenda (ProjectNotes250909)
- Catch-up on Gateway/Aggregator test results and problem determination – status
- Remaining functional tests: multiple tape copies, issues with recalls
- Changes to rtcpd for bypassing syslog logging of debug messages (and dropping of useless debug messages)
- Others
- AOB
01/10/09
- Agenda (ProjectNotes011009)
- Feedback from planning meeting
- Testing / C2ITDC status
- Mighunter memleak status
- BUSY and FULL|BUSY problem status
- Looping tape recalls problem status
- Tape overflow issue status
- Aggregator VMGR check status
- Upgrade scripts
- Integrating STK drive
- Next tests to perform (see also previous minutes)
- AOB
08/10/09
- Agenda (ProjectNotes081009)
- Feedback from planning meeting
- Testing / C2ITDC status
- Mighunter memleak status
- Mighunter performance status
- Gateway memleak correction status
- Tape performance issues
- Aggregator / VDQM request merging status
- Aggregator VMGR check status
- SVN branch merging
- Next tests to be run
- AOB
22/10/09
- Agenda: (ProjectNotes231009)
- Feedback from planning meeting:
- 2.1.9-3 release (MigHunter)
- 2.1.9-4 release (mid-January, tapegateway)
- C2ITDC status (still down due to nameserver problem)
- TapeGateway problems to be tackled:
- sr #108583: Cleaning up the tape tables when tapes are deleted. A possible way out is to restart the file request from the beginning by querying the name server again (if this is not yet the case). This would address #108583 as well as the problem described in another SR (SR #109980).
- sr #110702: Internal error on segment check causes high mount rate
- tg_getFileToRecall() PL/SQL procedure gives a return code of -1 in the case where there are no segments and in the case where the bestFileSystemForSegment() procedure has raised an exception. It would be better if the two cases had different return codes. We need to explicitly log and see the case of bestFileSystemForSegment() raising an exception because it means the system will loop mounting and dismounting the same tapes for recall but never actually doing any recalls.
- How does the tape gateway invalidate the disk copies of unfinished pending recalls when the aggregator sends a premature end of session?
- A DB deadlock which needs investigation:
- 2009-10-11T03:26:07.168742+02:00 c2itdcsrv102 tapegatewayd[22111]: LVL=Error TID=22118 MSG="Worker: db error while retrieving file to migrate" mountTransactionId=153522 errorCode="Internal error" errorMessage="Error caught in getFileToMigrate ORA-00060: deadlock detected while waiting for resource ORA-06512: at 'CASTOR_STAGER.TG_DEFAULTMIGRSELPOLICY', line 122 ORA-06512: at 'CASTOR_STAGER.TG_GETFILETOMIGRATE', line 69 ORA-06512: at line 1"
- Error caught in getFileToRecall ORA-01403: no data found
- 2009-10-11T13:38:00.756981+02:00 c2itdcsrv102 tapegatewayd[22111]: LVL=Warn TID=22139 MSG="Worker: received end notification error report" mountTransactionId=153880 errorcode=1015 errorMessage="Failed to process RTCPD sockets: File=ClientTxRx.cpp Line=876 Function=throwEndNotificationErrorReport: Client error report : Error caught in getFileToRecall ORA-01403: no data found"
- The tapegateway can take very long to get a volume: every 5th request takes more than 10 seconds. This can cause rtcpd to time out while waiting (after 1 minute). What takes the DB so long to answer back? In principle this should be a quick query.
- SVN merging status / issues
- German’s migration/recall tests: status
- AOB
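Note on the tg_getFileToRecall() item above: the idea of giving the two failure cases different return codes can be sketched as follows. This is only a Python illustration of the proposal, not the PL/SQL procedure itself; the code values, the exception class and the function signature are all hypothetical.

```python
from enum import IntEnum


class RecallResult(IntEnum):
    # Hypothetical codes; the real procedure currently returns -1 for both failures.
    OK = 0
    NO_SEGMENTS = -1
    FILESYSTEM_SELECTION_FAILED = -2


class FileSystemSelectionError(Exception):
    """Stand-in for an exception raised by bestFileSystemForSegment()."""


def get_file_to_recall(segments, best_file_system_for_segment, log):
    """Return (code, payload), keeping the two failure cases apart."""
    if not segments:
        # Nothing left to recall: a normal condition, no logging needed.
        return RecallResult.NO_SEGMENTS, None
    segment = segments[0]
    try:
        file_system = best_file_system_for_segment(segment)
    except FileSystemSelectionError as exc:
        # Log explicitly: left unnoticed, this case leads to tapes being
        # mounted and dismounted repeatedly without any recall taking place.
        log(f"bestFileSystemForSegment failed for segment {segment}: {exc}")
        return RecallResult.FILESYSTEM_SELECTION_FAILED, None
    return RecallResult.OK, (segment, file_system)
```

With separate codes the caller can treat "no segments" as a normal end of work and raise an alarm only on the selection failure.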
06/11/09
- Agenda: ( ProjectNotes061109)
- TapeGateway problems to be tackled (see minutes of previous meeting)
- c2itdc next tests - "hip" test, migHunter tests, overhead tests
- AOB
19/11/09
- Agenda: (ProjectNotes191109)
- Performance problems and multiple requests - proposed changes to tape aggregator
- Outstanding issues
- Testing plans
- Agenda: (ProjectNotes031209)
- Planning for Christmas and other absences
- Gateway + aggregatord testing status and outstanding issues
- Packaging
- Further tests
Release planning
Time sheets for tape development team
Mailing lists
--
GermanCancio - 29 Jul 2009