Streams Operations Manual
- connect locally to the RAC node as streams administrator user (strmadmin)
Streams Monitoring Tools
Checking the trace files and alert.log for problems
- messages about each capture process, propagation, and apply process are recorded in trace files for the database in which the processes are running
- run bdump to go to the bdump directory and examine the alert .log file
- trace files related to capture problems: sid_cnnn_iiiii.trc
- trace files related to propagation problems: sid_jnnn_iiiii.trc
- trace files related to apply problems: sid_pnnn_iiiii.trc
Troubleshooting Capture Problems
- Check capture process status:
- you can check whether a capture process is enabled, disabled, or aborted by querying the DBA_CAPTURE data dictionary view:
- select capture_name, status from dba_capture;
- if the capture process is disabled, then try restarting it.
- if the capture process is aborted, then you might need to correct an error before you can restart it successfully. The following query shows when the capture process aborted and the error that caused it to abort:
- select capture_name, status_change_time, error_message from dba_capture where status = 'ABORTED';
- Common capture issues:
- ORA-04031: unable to allocate x bytes of shared memory ("streams pool","unknown object","streams pool","kwqbsinfy:lcr") or ORA-23603 STREAMS enqueue aborted due to low SGA
- Oracle is running low on memory allotted for Streams.
- Check that propagation jobs and apply processes are enabled (and consuming messages).
- Troubleshooting propagation or apply problems, enabling any propagation or apply which might be disabled.
- Alternative: increase the memory dedicated to Streams (streams_pool_size initialization parameter) if possible.
- Re-start the capture process. If the problem persists, bounce the database can help.
- Capture process loops on startup
- select capture_name, state from gv$streams_capture; shows state : 'initalizing' or 'dictionary initialization'.
- there may be a missing logfile which cannot be opened.
- check metalink Note:313279.1
- ORA-01291: missing logfile
- required log files were removed before they are scanned by the capture process.
- determine the missing SCN range (v$logmnr_logs view) and restore the relevant log files.
- ORA-00600: [kwqbcpagent: subid] and ORA-00600 [4450] reported by the qmon processes after node restart
- being fixed.
- workaround: stop and restart the capture process.
- ORA-01280: Fatal Logminer Error
- Also ORA-04030: Out of process memory when ...
- Check memory consumption
- Re-start the capture process
- WARNING: no base object information defined in logminer dictionary!!! or MISSING Streams multi-version data dictionary!!!
- use the values associated with the objn field to determine the objects causing this problem.
- populate the data dictionary for a particular object running exec dbms_capture_adm.prepare_table_instantiation('
%object_name%
'); or exec dbms_capture_adm.prepare_schema_instantiation(' %schema_name%
');
- these errors can be safely ignored
- Capture process (running in Real Time mode) is in "WAITING FOR REDO" state during more than 1 hour (Please note that, if the capture process is running in archive log mode, this is the normal state for the capture process)
- select capture_name, parameter, value from dba_capture_parameters where parameter='DOWNSTREAM_REAL_TIME_MINE';
- In a downstream environment, the following errors are logged on the source database: ORA-00270: error creating archive log and ORA-16055: FAL request rejected.
- Cause: the archive log area is running out of space.
- Check using "~/work/monitor_streams/minimum_archivelog.sql" script which is the minimum archive log necessary for the capture process.
- Remove old archive logs. In a downstream environment, only the archive logs received from the source database (.../archivelog/from%SOURCE%/) are really necessary for the capture process.
- Error Message: Capture latency higher than default limit of 90 mins. Please check. (mail from the Streams monitoring tool)
- Capture process is in "CAPTURING CHANGES" status BUT nothing is being captured (LCRs captured and LCRs enqueued = 0/s) - only in downstream environment
- One (or more) of the source node are down: last archived log files have not been sent to the downstream database
- Check archived log files names missing on downstream and send and register them manually:
- Create or replace source directory (destination directory already created)
- @source
db use dbms_file_transfer.put_file(source_directory, file, destination_directory, file, db_link_downstream); exec dbms_file_transfer.put_file('SOURCE','thread_3_seq_34712.2101.656058743','DEST','log_3_34712_561483088.arc','LHCBDSC.CERN.CH');
- Register file with the capture process @downstream
database: alter database register or replace logfile 'logfile' for 'capture_name'; alter database register or replace logical logfile '+STRMDSC_RECODG1/LHCBDSC/ARCHIVELOG/FROMLHCBR/log_3_34712_561483088.arc' for 'STRMADMIN_CAPTURE_LFC';
- Capture process is in "PAUSED BY FLOW CONTROL" or "ENQUEUING MESSAGE" status:
- check the source queue, probably a big amount of LCRs are spilled over to disk.
- check which site has not consumed the LCRs: select subscriber_address, cnum_msgs, message_lag from gv$buffered_subscribers; (message_lag > 0).
- check if the destination site is down:
- If yes, then contact the destination dba in order to fix the problem.
- If no, then check the propagation and apply status.
- Start the capture process:
- exec dbms_capture_adm.start_capture('
%capture_name%
');
- check capture process status and state:
- select capture_name, status from dba_capture; - must be 'enabled'
- select capture_name, state from gv$streams_capture; - it will move from 'initializing' or 'dictionary initialization' to 'enqueuing message' or 'capturing changes' or 'creating LCR'.
- Note: when the capture process is restarted it might scan old archive log files. The column required_checkpoint_scn in dba_capture indicates the SCN at which the capture process restarts. Low activity period may cause checkpoint to go behind
- script which provides the minimum archive log necessary to restart the capture process: itrac37:~/work/monitor_streams/minimum_archivelog.sql
- Stop the capture process:
- exec dbms_capture_adm.stop_capture('
%capture_name%
');
Troubleshooting Propagation Problems
- Check propagation job status:
- you can check whether a propagation job is enabled, disabled, or aborted by querying the DBA_PROPAGATION data dictionary view:
- select propagation_name, status from dba_propagation;
- if the status is 'disabled' or 'aborted', then check the error message:
- select propagation_name, destination_dblink, status, error_date, error_message from dba_propagation;
- diagnose and correct the problem based on the error message (common issues: 'TNS: lost contact', 'TNS: no listener', 'TNS: connection timeout', etc).
- propagation job is disabled after 16 attempts.
- enable the propagation job.
- if the propagation job is enabled, but is not propagating messages:
- check for errors related to the propagation job (alert log file):
- propagation job might be not disabled yet (before 16 attempts) - check previous point
- or it might be paused by flow control (example: Propagation Schedule for (STRMADMIN.STREAMS_QUEUE_ONL_CAT, "STRMADMIN"."STREAMS_QUEUE_ONL_APT"@ATLR.CERN.CH) encountered following error: ORA-25307: Enqueue rate too high, flow control enabled) - check the apply status then try stopping and restarting the propagation.
- Note: stop_propagation and start_propagation procedures may hang due to the BUG:5330663
OEMAGENT BLOCKING STREAMS PROCESSES: oemagent session holds a shareable lock on the streams queue while the session which is attempting to manage the streams processes has requested an exclusive lock on the same object.
- workaround: disable the oem agent, clean the remaining sessions, stop and start the propagation job, and re-enable the oem agent.
- Enable the propagation job:
- exec dbms_propagation_adm.start_propagation('
%propagation_name%
');
- check propagation job status.
- Disable the propagation job:
- exec dbms_propagation_adm.stop_propagation('
%propagation_name%
');
Troubleshooting Apply Problems
- Check apply process status:
- you can check whether a apply process is enabled, disabled, or aborted by querying the DBA_APPLY data dictionary view:
- select apply_name, status from dba_apply;
- if the apply process is disabled, then try restarting it.
- if the apply process is aborted, then you might need to correct an error before you can restart it successfully. The following query shows when the apply process aborted and the error that caused it to abort:
- select apply_name, status_change_time, error_message from dba_apply where status = 'ABORTED';
- if the apply process is enabled, but changes are not applied:
- check that the apply process queue is receiving the messages to be applied
- select inst_id, queue_name, (num_msgs -spill_msgs) mem_msg, spill_msgs from gv$buffered_queues;
- select apply_name, total_received,total_applied, total_errors, (total_assigned -(total_rollbacks + total_applied)) being_applied from gv$streams_apply_coordinator;
- you can also use the web streams monitor tool or the monitor scripts (queues_status and apply_status).
- no transactions/LCRs are reaching the destination
- verify that the capture process and the propagation job are enabled and configured properly
- transactions are reaching the destination site, but are not being dequeued
- transactions/LCRs are reaching the destination site and being dequeued, but not being applied
- check if any of the apply servers is dequeuing LCRs:
SELECT SERVER_ID, STATE, APPLIED_MESSAGE_NUMBER, MESSAGE_SEQUENCE
FROM V$STREAMS_APPLY_SERVER
WHERE APPLY_NAME = 'APPLY_NAME'
ORDER BY SERVER_ID;
- If these values do not change (or do it very slowly) for one or more apply servers, then the apply server may not be performing well.
- determine the specific DML or DDL SQL statement being run by the apply server:
SELECT t.SQL_TEXT FROM V$SESSION s, V$SQLTEXT t, V$STREAMS_APPLY_SERVER a
WHERE a.APPLY_NAME = 'APPLY_NAME' AND a.SERVER_ID = X AND s.SID = a.SID AND s.SQL_ADDRESS = t.ADDRESS AND s.SQL_HASH_VALUE = t.HASH_VALUE
ORDER BY PIECE;
- determine the operation being performed currently by the specified apply server (returns the owner and name of the table on which the operation is being performed and the cost of the operation):
SELECT p.OPERATION, p.OPTIONS, p.OBJECT_OWNER, p.OBJECT_NAME, p.COST
FROM V$SQL_PLAN p, V$SESSION s, V$STREAMS_APPLY_SERVER a
WHERE a.APPLY_NAME = 'APPLY_NAME' AND a.SERVER_ID = X AND s.SID = a.SID AND p.HASH_VALUE = s.SQL_HASH_VALUE;
- make sure that, for each table to which the apply process applies changes, every key column has an index.
- check the error queue
- select apply_name, total_errors from gv$streams_apply_coordinator;
- check for apply errors by querying the dba_apply_error data dictionary view
- check apply process latency
- check for trace files
- Common apply issues:
- ORA-1031: Insufficient Privileges
- the apply user (strmadmin) doesn't have privileges to manipulate the object for which DML and DDL changes are applied
- explicitly grant the required privileges to the apply user
- re-execute the error transactions using dbms_apply_adm.execute_all_errors('
%apply_name%
');
- ORA-01403 No Data Found
- typically, it occurs when an apply process tries to update an existing row and the old_values in the row LCR do not match the current values at the destination database
- metalink Note: 265201.1
- ORA-26714: User error encountered while applying
- Start the apply process:
- exec dbms_apply_adm.start_apply('
%apply_name%
');
- check apply process status
- Stop the apply process:
- exec dbms_apply_adm.stop_apply('
%apply_name%
');
How to skip a transaction in a Streams environment
On the capture side
- Identify the transaction number:
On the apply side
- Identify the transaction number:
How to prevent a change from being captured
- First, you need to check whether the capture process is configured to capture "tagged" changes or not:
- select streams_name, rule_name, include_tagged_lcr from dba_streams_rules;
- Note that the downstream boxes for ATLAS and LHCB conditions are always configured to capture "tagged" LCRs. LHCB LFC is configured to NOT capture "tagged" LCRs.
- Set a tag in the session which will run the changes.
- EXEC DBMS_STREAMS.SET_TAG(tag => HEXTORAW('17'));
- Make the changes.
- If you are going to make other changes in the current session that you want to replicate destination databases, then reset the tag for the session to an appropriate value, as in the following example:
- EXEC DBMS_STREAMS.SET_TAG(tag => NULL);
How to reduce the high watermark of Advanced Queueing objects
Among the tasks that the Queue Monitor (QMON) process performs is space management on the dequeue IOT/index associated with an AQ table in order to maintain the dequeue performance.
In locally managed tablespaces using automatic space segment management (ASSM) QMON will not perform these space management tasks with the result that the dequeue IOT will continue to grow in size.
For this reason, it is recommended to perform on regular basis a dynamic shrink of the AQ objects. You can use the following script:
-- as sysdba
grant execute on sys.dbms_system to strmadmin;
-- as strmadmin
create or replace procedure deqiotcoalesce
as
v_rebuild_statement VARCHAR2(1000);
pre_rebuild_statement VARCHAR2(1000);
post_rebuild_statement VARCHAR2(1000);
err_msg VARCHAR2(100);
cursor c is
select qt.owner owner, qt.queue_table queuetable
from dba_queue_tables qt, dba_tablespaces ts, dba_tables t where ts.segment_space_management='AUTO' and qt.queue_table=t.table_name and
ts.tablespace_name=t.tablespace_name and qt.owner=t.owner and qt.recipients='MULTIPLE' and qt.owner='STRMADMIN';
BEGIN
sys.dbms_output.enable(1000000);
sys.dbms_system.ksdwrt(1,'Dequeue IOT/index coalesce procedure starting execution at ' ||
to_char(sysdate, 'DD-MM-YYYY HH24:MI:SS'));
BEGIN
FOR c_rec IN c LOOP
pre_rebuild_statement:= 'ALTER TABLE '||c_rec.owner||'.'||c_rec.queuetable||' ENABLE ROW MOVEMENT';
v_rebuild_statement := 'ALTER TABLE '||c_rec.owner||'.'||c_rec.queuetable||' SHRINK SPACE CASCADE';
post_rebuild_statement:= 'ALTER TABLE '||c_rec.owner||'.'||c_rec.queuetable||' DISABLE ROW MOVEMENT';
sys.dbms_system.ksdwrt(1,'Executing '||pre_rebuild_statement||'; '||v_rebuild_statement||'; '||post_rebuild_statement||'.');
EXECUTE IMMEDIATE pre_rebuild_statement;
EXECUTE IMMEDIATE v_rebuild_statement;
EXECUTE IMMEDIATE post_rebuild_statement;
sys.dbms_system.ksdwrt(1,'Executed successfully.');
pre_rebuild_statement:= 'ALTER TABLE '||c_rec.owner||'.AQ$_'||c_rec.queuetable||'_P ENABLE ROW MOVEMENT';
v_rebuild_statement := 'ALTER TABLE '||c_rec.owner||'.AQ$_'||c_rec.queuetable||'_P SHRINK SPACE CASCADE';
post_rebuild_statement:= 'ALTER TABLE '||c_rec.owner||'.AQ$_'||c_rec.queuetable||'_P DISABLE ROW MOVEMENT';
sys.dbms_system.ksdwrt(1,'Executing '||pre_rebuild_statement||'; '||v_rebuild_statement||'; '||post_rebuild_statement||'.');
EXECUTE IMMEDIATE pre_rebuild_statement;
EXECUTE IMMEDIATE v_rebuild_statement;
EXECUTE IMMEDIATE post_rebuild_statement;
sys.dbms_system.ksdwrt(1,'Executed successfully.');
v_rebuild_statement := 'ALTER TABLE '||c_rec.owner||'.AQ$_'||c_rec.queuetable||'_I SHRINK SPACE';
sys.dbms_system.ksdwrt(1,'Executing '||v_rebuild_statement||'.');
EXECUTE IMMEDIATE v_rebuild_statement;
sys.dbms_system.ksdwrt(1,'Executed successfully.');
v_rebuild_statement := 'ALTER TABLE '||c_rec.owner||'.AQ$_'||c_rec.queuetable||'_T SHRINK SPACE';
sys.dbms_system.ksdwrt(1,'Executing '||v_rebuild_statement||'.');
EXECUTE IMMEDIATE v_rebuild_statement;
sys.dbms_system.ksdwrt(1,'Executed successfully.');
v_rebuild_statement := 'ALTER TABLE '||c_rec.owner||'.AQ$_'||c_rec.queuetable||'_H SHRINK SPACE';
sys.dbms_system.ksdwrt(1,'Executing '||v_rebuild_statement||'.');
EXECUTE IMMEDIATE v_rebuild_statement;
sys.dbms_system.ksdwrt(1,'Executed successfully.');
v_rebuild_statement := 'ALTER TABLE '||c_rec.owner||'.AQ$_'||c_rec.queuetable||'_D SHRINK SPACE';
sys.dbms_system.ksdwrt(1,'Executing '||v_rebuild_statement||'.');
EXECUTE IMMEDIATE v_rebuild_statement;
sys.dbms_system.ksdwrt(1,'Executed successfully.');
END LOOP;
EXCEPTION
WHEN OTHERS THEN
sys.dbms_system.ksdwrt(1,'Dequeue IOT/index coalesce procedure ended execution unsuccessfully at ' || to_char(sysdate, 'DD-MM-YYYY HH24:MI:SS'));
raise_application_error(-20001,err_msg);
END;
sys.dbms_system.ksdwrt(1,'Dequeue IOT/index coalesce procedure ended successful execution at '|| to_char(sysdate, 'DD-MM-YYYY HH24:MI:SS'));
EXCEPTION
WHEN OTHERS THEN
sys.dbms_system.ksdwrt(1,'Dequeue IOT/index coalesce procedure ended execution unsuccessfully at ' || to_char(sysdate, 'DD-MM-YYYY HH24:MI:SS'));
raise_application_error(-20002,err_msg);
END;
/
-- as strmadmin
exec deqiotcoalesce();
Streams re-synchronization after Recovery operation
Performing Database Recovery in a Streams environment: Description, Steps and Tests
Streams Split &Merge solution
When one site goes down, the source queue is filled with the LCRs/events which have not been consumed by the down site, affecting to the performance of the whole configuration. In order to avoid that, the following solution is proposed:
- The idea is to split the capture process in two capture processes:
- The original one (real-time downstream capture in our setup) for the sites working well
- New capture process for the site/s unstable/s
- New capture queue and new propagation job are defined
- The original propagation job is dropped so the spilled LCRs are removed from the original source queue
- METHOD #1:
- IMPORTANT: This method only can be used if the following condition is true:
- select oldest_message_number from dba_apply_progress; (destination database) > select max_checkpoint_scn from dba_capture; (source or downstream database) - FOR ALL CAPTURE PROCESSES RUNNING -
- Please use generate_split.sql: to automatically generate the split script
- or generate_split_avoidORA600.sql: To automatically generate the split script and avoid the ORA 600 error when dropping propagation
- METHOD #2:
- For both methods, please check the streams recommendations for capture and propagation jobs and apply them.
- If the Streams environment is running a downstream capture in real time:
- wait until all the standby logs are unassigned: select * from v$standby_logs;
- change capture parameter "downstream_real_time_mine" to "Y": exec dbms_capture_adm.set_parameter('
%capture_name%
', parameter => 'downstream_real_time_mine', value=> 'Y'); Note: This operation will re-start the capture process.
- Once the site is stable again, both capture processes are merged in the original one
Note: you will need to adjust some parameters before generating the scripts
New automatic procedures
If patches 7263055 + 7480651 to fix ORA-600 [KWQBMCRCPTS101] when dropping propagation have been applied, you can use:
- Streams administrator user must have dba privileges granted directly (without using dba role)
- Connected as streams administrator (strmadmin), the following procedures:
- split: it will drop the original propagation for the site which is down (or will be down) and will prepare/create the configuration to synchronize it using
- resynchronize_site: this procedure is automatically executed if the destination site is available. If not, the output will indicate you the parameters you should use when the site is available again.
- Temporary capture is not started automatically, has to be done manually later (when the site is up again)
exec split('STREAMS_PROP_STREVA_STRMTEST','STREAMS_CAP_TEMP','STRM_QUEUE_TEMP','STREAMS_PROP_TEMP');
Original Capture: STRMADMIN_CAPTURE_STREVA
Original Queue: STREAMS_QUEUE_STREVA_CA Primary Inst: 1 Secondary Inst: 2
Source database name: D3R.CERN.CH
Capture Rule Set name: RULESET$_18
Propagation Rule Set name:
Destination queue name: STREAMS_QUEUE_STREVA_AP
Destination db link: STRMTEST.CERN.CH
Destination is down - execute resynchronize_site manually later
exec resynchronize_site('STRMTEST.CERN.CH','STREAMS_CAP_TEMP','STRM_QUEUE_TEMP',1,2,'STREAMS_PROP_TEMP','STREAMS_QUEUE_STREVA_AP','RULESET$_18','');
Stopping original capture....
Original capture process STRMADMIN_CAPTURE_STREVA successfully stopped.
Removing original propagation....
Original propagation job STREAMS_PROP_STREVA_STRMTEST successfully dropped.
Starting original capture....
Original capture process STRMADMIN_CAPTURE_STREVA successfully started.
PL/SQL procedure successfully completed.
-- As 'STREAMS_PROP_STREVA_STRMTEST' is already disabled, you will need to run the following statement manually, when site is up again, to re/synchronize:
exec resynchronize_site('STRMTEST.CERN.CH','STREAMS_CAP_TEMP','STRM_QUEUE_TEMP',1,2,'STREAMS_PROP_TEMP','STREAMS_QUEUE_STREVA_AP','RULESET$_18','');
-- Otherwise, it is run automatically.
-
- Merge: this procedure merges the original Streams configuration with the temporary streams components created during the split procedure.
- It might happen that the capture process re-starts in a old archived log file (depending on the checkpoints frequency and the replication workload).
- The best way to avoid the capture process to start in a very old archived log or scn is:
- stop original capture
- switch log file at source database
- check that oldest apply scn at "split site" > oldest apply scn at "original sites"
- stop temporary capture
- wait until temporary queue is empty (all events have been applied at the destination site)
- move back temporary propagation to original propagation
- start original capture
- Temporary processes must be cleaned manually!
exec merge('STRMADMIN_CAPTURE_STREVA','STREAMS_CAP_TEMP','STREAMS_PROP_STREVA_STRMTEST','STREAMS_PROP_TEMP');
Stopping original capture....
Original capture process STRMADMIN_CAPTURE_STREVA successfully stopped.
Stopping clone capture....
Clone capture process STREAMS_CAP_TEMP successfully stopped.
Stopping clone propagation....
Clone propagation job STREAMS_PROP_TEMP successfully stopped.
Propagation job STREAMS_PROP_STREVA_STRMTEST to destination STRMTEST.CERN.CH has been successfully added.
Starting original capture....
Original capture process STRMADMIN_CAPTURE_STREVA successfully started.
Merge procedure has finished successfully. Please clean temporary processes and queues!
PL/SQL procedure successfully completed.
How to replace the Streams setup if downstream database crashes
The easiest and fastest solution is to use the source database for the downstream capture setup as replacement, because all the archive logs and latest Streams dictionary are available.
(Use a new database as replacement will imply to move all the archive logs manually from the source database to the new database)
- Disable log transport services at source: alter system set log_archive_dest_state_2=DEFER scope=both;
-
(1)
Check the latest Streams dictionary which can be used to re-create the capture process at source: select distinct first_change#, name, first_time from v$archived_log where dictionary_begin='YES' order by 3;
-
(2)
Check the oldest message number applied at each of the destinations: select apply_name, oldest_message_number from dba_apply_progress;
- Streams parameters must be already configured (probably they are not on all instances), just check!
- Streams administrator and appropriate database links are already created.
- Use the scripts available on the Streams scripts Repository to setup the downstream capture and be sure to implement the following changes:
- It is a good idea to create the logminer tablespace, if it does not exist.
- Skip the database links creation, they have been already created (under strmadmin user).
- Create the capture process and indicate the following parameters:
- first_scn = first_change# of latest Streams dictionary
(1)
- the archive log which contains the first change number and all subsequent archive logs must be available
- start_scn = oldest_message_number from the apply side
(2)
- first_scn <= start_scn
- If oldest_message_number is different between all the destinations, you can choose the lowest one or create different captures with different parameters.
- Skip parameter "downstream_real_time_mine" if you are not creating a downstream capture process.
- Check Streams Recommendations and adjust the capture parameters.
- Create the propagation jobs for each destination.
- Start the capture process.
Configure back the downstream database
- Remove old setup: capture process/es, propagation jobs and queue/s.
- Enable log transport services at source: alter system set log_archive_dest_state_2=ENABLE scope=both;
- Check that the archive log files are shipped to the downstream database correctly.
- Build a new Streams dictionary:
SET SERVEROUTPUT ON
DECLARE
scn NUMBER;
BEGIN
DBMS_CAPTURE_ADM.BUILD ( first_scn => scn );
DBMS_OUTPUT.PUT_LINE ('First SCN: ' || scn);
END;
/
-
(1)
select distinct first_change#, name, FIRST_TIME from v$archived_log where dictionary_begin='YES' order by 3;
- Stop capture process. Wait until the source queue is empty. Stop propagation jobs.
-
(2)
Check the oldest message number applied at each of the destinations: select apply_name, oldest_message_number from dba_apply_progress;
- Use the scripts available on the Streams scripts Repository to setup the downstream capture and be sure to implement the following changes: