find which sessions or SQLs generate excessive redo log
Once a while we found excessive redo logs were generated in our production databases. In order to troubleshoot this issue, the first step is to find out sessions or sqls. Depending on whether the issue is still happening or not, we can use different methods to figure out the low-hanging fruit.
a.) if the issue is already gone, we can use logminer to find which operation caused this issue by follwing MOS 300395.1: How To Determine The Cause Of Lots Of Redo Generation Using LogMiner. However, there is not redo statistic in v$logmnr_contents, so we may have to guess which statement generates most of redo records based on execution count.
b.) if the issue is still happening, we have more diagnostic methods to find out sessions or SQLs.
To find out which sessions generate excessive redo,we can calcuate the ‘redo size’ delta from v$sesstat between two snapshots to find out the session which generates more redo, or simply use the Tanel Poder’s snapper.sql.
Sometimes we may find several sessions generate excessive redo when connection pool is used, and we want to find which sql from these sessions cause the issue. unfortunately, there is not a view which can tell us how much redo generated for a specific SQL.
As we know, when a transation changes a block in regular tablespaces, it generates redo records and undo records as well, and the undo consumption is recorded in v$transation(The USED_UBLK column of v$transaction view shows the number of undo blocks used and the USED_UREC column shows the number of undo records used by the transaction), Therefore, we can monitor undo usage to narrow down which SQLs generate excessive redo by sampling the sessions repeatedly.
Below is the code I used.
define snapper_sleep="&1" define snapper_count="&2" set termout on serveroutput on declare cursor cur_trans is select s.sid ,s.sql_id,s.serial#,t.USED_UBLK from v$session s, v$transaction t where s.saddr=t.ses_addr; begin dbms_output.put_line('seq# sid serial# sql_id used_ublk '); for c in 1..&snapper_count loop for trans_rec in cur_trans loop dbms_output.put_line(c ||' '||trans_rec.sid ||' '|| trans_rec.serial# ||' '|| trans_rec.sql_id ||' '|| trans_rec.USED_UBLK); end loop; dbms_lock.sleep(&snapper_sleep); end loop; end; /
Note, one transaction may include multiple DMLs, and the number of undo blocks is calcualted from the start of this transaction, not from the start of current SQL, so USED_UBLK in this veiw may not be the number of undo blocks generated by the current DML. However, we should be able to find out which SQLs generate more undo/redo from samples.
Below is example.
SQL> @test3 0.5 120 old 11: for c in 1..&snapper_count loop new 11: for c in 1..120 loop old 18: dbms_lock.sleep(&snapper_sleep); new 18: dbms_lock.sleep(0.5); seq# sid serial# sql_id used_ublk 43 246 55295 bqnss5311qu48 8 44 246 55295 bqnss5311qu48 11 45 246 55295 bqnss5311qu48 20 46 246 55295 bqnss5311qu48 35 47 246 55295 bqnss5311qu48 39 48 246 55295 bqnss5311qu48 40 49 246 55295 bqnss5311qu48 50 50 246 55295 bqnss5311qu48 71 51 246 55295 bqnss5311qu48 87 52 246 55295 ayqqj8sga53j7 1 55 256 11051 bqnss5311qu48 19 56 256 11051 bqnss5311qu48 42 57 256 11051 bqnss5311qu48 55 58 256 11051 bqnss5311qu48 83 59 256 11051 bqnss5311qu48 93 60 134 4155 bqnss5311qu48 35 61 134 4155 bqnss5311qu48 42 62 134 4155 bqnss5311qu48 56 62 134 4155 bqnss5311qu48 1 63 134 4155 bqnss5311qu48 82 64 202 3243 2hakxczvxbacn 2 65 202 3243 bqnss5311qu48 32 66 202 3243 bqnss5311qu48 59 67 202 3243 9jyxc671ps26n 78 68 202 3243 bqnss5311qu48 83 69 27 56385 9jyxc671ps26n 1 70 27 56385 9jyxc671ps26n 17 71 27 56385 9jyxc671ps26n 30 71 27 56385 9jyxc671ps26n 0 72 27 56385 9jyxc671ps26n 46 73 310 17635 9jyxc671ps26n 2 74 310 17635 9jyxc671ps26n 18 75 310 17635 9jyxc671ps26n 29 76 310 17635 9jyxc671ps26n 38 77 310 17635 9jyxc671ps26n 45 78 310 17635 9jyxc671ps26n 57 79 212 5157 9jyxc671ps26n 10 80 212 5157 9jyxc671ps26n 24 81 212 5157 9jyxc671ps26n 39 82 212 5157 9jyxc671ps26n 54 83 57 3099 9jyxc671ps26n 13 84 57 3099 9jyxc671ps26n 23 85 57 3099 9jyxc671ps26n 39 86 57 3099 9jyxc671ps26n 54 87 256 11051 ayqqj8sga53j7 1 88 256 11051 ayqqj8sga53j7 1 89 256 11051 ayqqj8sga53j7 1 PL/SQL procedure successfully completed.
From above samples, we can see sql_id: bqnss5311qu48 & 9jyxc671ps26n generated more undo/redo.
no trace file was generated for 10053 event
Recently one of our database was upgraded from 10.2.0.3 to 11.2.0.3, and we got the performance issue for a query. When I investigated the performance for this query, I purged the cursor from shared pool, set the event 10053 , and reran the query in order to check why optimizer choose a bad execution plan. However, when I checked the trace directory, nothing was generated there. Then I tried a simple query:
SQL>alter session set events '10053 trace name context forever'; System altered. SQL>select /* test1 */ * from dual; D - X
Still got nothing. I tried 10046 event, and the trace was generated successfully. So I decided to check the non-default parameters set in this database, and found trace_enabled was set to false. After I changed this parameter to true, I was able to get trace file.
When I did the same testing in 10.2.0.3 database, I was able to get the trace file for 10053 event no matter which value trace_enabled was set.
Searched on MOS, and found a bug reported.
Bug 7304656 : TRACE_ENABLED=FALSE DISABLES EVENT 10053 TRACE IN 11G
Looks likely this bug impacts 11G release 2 as well, and it hasn’t been fixed in 11.2.0.3.
ORA-00060 deadlock: TX S wait on X
Development team reported they got the ORA-00060: deadlock error, but they couldn’t figure out what was the root cause, and why a session locked itself, so they requested DBA to assist them.
Below is the deallock graph in the dump file
25 Deadlock graph: 26 ---------Blocker(s)-------- ---------Waiter(s)--------- 27 Resource Name process session holds waits process session holds waits 28 TX-00230025-0001cdbd 31 439 X 31 439 S 29 session 439: DID 0001-001F-0000051A session 439: DID 0001-001F-0000051A 30 Rows waited on: 31 Session 439: no row 32 Information on the OTHER waiting sessions: 33 End of information on OTHER waiting sessions.
Deadlock usually occurs when two or more sessions are trying to access the same resources, but each session holds the the resource that is requested by the other session, so none of them can move forward. So usually in deadlock graph, two or more sessions are involved.
However, in above case, only one session is involved. It is called self dead lock, that is, a session locks itself. Checked their code, and found autonomous_transaction was used. This makes things clear. If main transaction holds the lock, and waits for autonomous_transaction to be completed, but autonomous_transaction is waiting for the lock which is hold by main transaction in incompatible mode, thus self dead lock occurs. In this case, the main transaction of session 439 held TX lock on in X mode while autonomous_transaction was waiting for TX lock in S mode.
Usually, when we do DML on a row, TX lock in X mode is requested, so under which case will a transaction request for TX lock in 4(S) mode ?
At least under following two situations, a transaction will wait for TX lock in S mode
1. There is an unique constraint. Two transactions try to operate on row which has the same value on the unique constraint, if two transactions could be committed, the unique constraint would be violated .
session1: insert one row to the tab_lock table
TEST@db10g>select sys_context('userenv', 'sid') sid from dual; SID ------------------------------ 159 TEST@db10g>create table tab_lock(c1 number unique, c2 number); Table created. TEST@db10g>insert into tab_lock values(1,1); 1 row created.
session2: also insert one row which has same c1 value as session 1 into the tab_lock table,
SYS@db10g>select sys_context('userenv', 'sid') sid from dual; SID ------------------------------ 150 TEST@db10g>insert into tab_lock values(1,2);
session 2 is hung now
session 3: check the lock hold and requested by session 1 and session 2:
select type||'-'||id1||'-'||id2 "resource",sid, decode(LMODE,0,'None',1,'Null',2,'Row Share',3,'Row Exclusive',4,'Share',5,'Share Row Exclusive',6,'Exclusive') lock_hold, decode(REQUEST,0,'None',1,'Null',2,'Row Share',3,'Row Exclusive',4,'Share',5,'Share Row Exclusive',6,'Exclusive') lock_request,ctime,block from v$lock where type in ('TX','TM') and sid in (159,150) order by sid; resource SID LOCK_HOLD LOCK_REQUEST CTIME BLOCK ------------------------- ------ -------------------- -------------------- ---------- ---------- TX-131114-322 150 None Share 6 0 TM-51358-0 150 Row Exclusive None 6 0 TX-262177-323 150 Exclusive None 6 0 TX-131114-322 159 Exclusive None 12 1 TM-51358-0 159 Row Exclusive None 12 0
The transaction in session 2 is waiting for TX lock in S mode
2. There is a foreign key constrain. Doing the DML on child table, but DML on parent table has not committed yet. If DML on child table could be committed, but DML on parent table got rollback, the foreign key constrain would be violated
session 1: delete or update parent table, but hasn’t committed yet;
TEST@db10g>create table p (c1 number primary key); Table created. TEST@db10g>create table c (c1 number,c2 number, CONSTRAINT fk_c1 FOREIGN KEY (c1) REFERENCES p(c1)); Table created. TEST@db10g>insert into p values(1); 1 row created. TEST@db10g>commit; Commit complete. TEST@db10g>insert into c values(1,1); 1 row created. TEST@db10g>commit; Commit complete. TEST@db10g>insert into p values(2); 1 row created. TEST@db10g>commit; Commit complete. TEST@db10g>delete from p where c1=2; 1 row deleted. or update p set c1=3 where c1=2;
session 2: insert or update a row in child table, but c1 value of this row has been deleted or updated by session 1
TEST@db10g>insert into c values(2,2); or update c set c1=3 where c1=1;
session 2 is hung now
session 3: check the lock hold and requested by session 1 and session 2:
resource SID LOCK_HOLD LOCK_REQUEST CTIME BLOCK ------------------------- ------ -------------------- -------------------- ---------- ---------- TX-393220-332 150 None Share 3 0 TM-51443-0 150 Row Share None 3 0 TX-589865-343 150 Exclusive None 3 0 TM-51446-0 150 Row Exclusive None 3 0 TX-393220-332 159 Exclusive None 18 1 TM-51443-0 159 Row Exclusive None 18 0
The transaction in session 2 is waiting for TX lock in S mode
Finally, I can reproduce this issue by the following example:
TEST@db10g>INSERT INTO tab_lock VALUES (1,1); 1 row created. TEST@db10g>DECLARE PRAGMA AUTONOMOUS_TRANSACTION; BEGIN INSERT INTO tab_lock VALUES (1,1); COMMIT; END; / 2 3 4 5 6 7 DECLARE * ERROR at line 1: ORA-00060: deadlock detected while waiting for resource ORA-06512: at line 4
trace file
Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TX-00040018-0000014a 15 159 X 15 159 S session 159: DID 0001-000F-00000013 session 159: DID 0001-000F-00000013 Rows waited on: Session 159: no row Information on the OTHER waiting sessions: End of information on OTHER waiting sessions.
database connection issue caused by wrong TNS_ADMIN setting
Users reported their applications could not connect to databases, and thus stopped working after their databases were patched last weekend. They got “ORA-28040: No matching authentication protocol” error either from the front end or the log files. This error is related with SQLNET.ALLOWED_LOGON_VERSION, which is used to specify which authentication protocols are allowed by the client or database. If the client and database server do not have at least one matching version, then authentication fails with an error. For more information, please refer to: http://docs.oracle.com/cd/B28359_01/network.111/b28317/sqlnet.htm#BIIEAFHA
However, no one touched the sqlnet.ora in both client and server side during patching. TNS_ADMIN is not supposed to be set on database server side, and the file sqlnet.ora in $ORACLE_HOME/network/admin doesn’t include SQLNET.ALLOWED_LOGON_VERSION parameter. So things look weird.
Luckily, Oracle provides a procedure called get_env in dbms_system package , which can be used to assist the investigation under such situation. Environment variables including TNS_ADMIN and ORACLE_HOME can be retrieved from the server by dbms_system.get_env.
sqlplus 'sys/xxx@xxx</a> as sysdba' SQL> var tmp_str varchar2(255); SQL> exec dbms_system.get_env('ORACLE_HOME',:tmp_str); PL/SQL procedure successfully completed. SQL> print tmp_str TMP_STR -------------------------------------------------------------------------------- /orabin/rdbms/10203 SQL> exec dbms_system.get_env('TNS_ADMIN',:tmp_str); PL/SQL procedure successfully completed. SQL> print tmp_str TMP_STR -------------------------------------------------------------------------------- /orabin/rdbms/11.2.0.3/network/admin
From above output, we can see the TNS_ADMIN was set to
/orabin/rdbms/11.2.0.3/network/admin while oracle_home was set to /orabin/rdbms/10203, however, we are not supposed to use this configuration. Obviously someone set TNS_ADMIN by mistake.
I did see SQLNET.ALLOWED_LOGON_VERSION is set to 10 in the sqlnet.ora under /orabin/rdbms/11.2.0.3/network/admin, which is the root cause of our database connection issue. After correcting the settings, all applications are able to login the databases.
restore controlfile failed with RMAN-06026: some targets not found – aborting restore
Tried to restore control file, but got the following error message:
RMAN> { 2> allocate channel t1 device type 'SBT' ; 3> restore controlfile; 4> } allocated channel: t1 channel t1: sid=4388 devtype=SBT_TAPE channel t1: Veritas NetBackup for Oracle - Release 7.1 (2011020313) Starting restore at 09-APR-12 released channel: t1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of restore command at 04/09/2012 17:02:36 RMAN-06026: some targets not found - aborting restore RMAN-06101: no channel to restore a backup or copy of the control file
From error message, looks like the backup of control didn’t exist, so rman complained “some targets not found” . However, from list command there were several backups of controlfile available.
RMAN> list backup of controlfile; List of Backup Sets =================== ...................... BS Key Type LV Size Device Type Elapsed Time Completion Time ------- ---- -- ---------- ----------- ------------ --------------- 80925 Incr 1 17.75M SBT_TAPE 00:00:27 28-MAR-12 BP Key: 80969 Status: AVAILABLE Compressed: NO Tag: TAG20120328T130241 Handle: df_TEST_1_b8n70nok_779116308 Media: 521043 Control File Included: Ckp SCN: 2583318674299 Ckp time: 28-MAR-12 BS Key Type LV Size Device Type Elapsed Time Completion Time ------- ---- -- ---------- ----------- ------------ --------------- 81464 Full 17.75M SBT_TAPE 00:00:30 28-MAR-12 BP Key: 81466 Status: AVAILABLE Compressed: NO Tag: TAG20120328T131226 Handle: c-381083800-20120328-01 Media: 521043 Control File Included: Ckp SCN: 2583318674322 Ckp time: 28-MAR-12 BS Key Type LV Size Device Type Elapsed Time Completion Time ------- ---- -- ---------- ----------- ------------ --------------- 81740 Full 17.75M SBT_TAPE 00:00:30 28-MAR-12 BP Key: 81751 Status: AVAILABLE Compressed: NO Tag: TAG20120328T131734 Handle: c-381083800-20120328-02 Media: 521043 Control File Included: Ckp SCN: 2583318674495 Ckp time: 28-MAR-12
What’s the problem for above restore command? Was anything wrong with backups. Let’s try to restore controlfile from backup file directly.
RMAN> run { 2> allocate channel t1 device type 'SBT'; 3> restore controlfile from 'c-381083800-20120328-02'; 4> } allocated channel: t1 channel t1: sid=4390 devtype=SBT_TAPE channel t1: Veritas NetBackup for Oracle - Release 7.1 (2011020313) Starting restore at 10-APR-12 channel t1: restoring control file channel t1: restore complete, elapsed time: 00:05:40 output filename=/oradata/test/u01/ctrl1.ctl output filename=/oradata/test/u02/ctrl2.ctl output filename=/oradata/test/u02/ctrl3.ctl Finished restore at 10-APR-12 released channel: t1
Although restoring controlfile succeeded by specifying the backup file, I wanted to figure out what’s wrong w/pervious restore command. Obviously there was no problem for backup or backupset. For rman, there is a debug option which can be used for troubleshooting. Reran the first restore command with debug option to generate a trace file.
rman target <user/pwd> catalog <user/pwd> debug trace='debug.out' RMAN> { 2> allocate channel t1 device type 'SBT' ; 3> restore controlfile; 4> }
The following information was extracted from the trace file:
findcontrolfilebackup:
DBGSQL: EXEC SQL AT RCVCAT begin dbms_rcvman . findControlfileBackup ( allCopies => TRUE ) ; end ; [17:17:44.559]
DBGSQL: sqlcode=0 [17:17:44.562]
DBGRCVMAN: ENTERING findControlfileBackup
DBGRCVMAN: findControlfileBackup onlyStandby is set to FALSE
DBGRCVMAN: thisBackupAge= 0
DBGRCVMAN: findControlfileBackup allCopies is TRUE
DBGRCVMAN: EXITING findControlfileBackup with: UNAVAILABLE
The trace gave the same information that rman couldn’t find any valid backup for controlfile,which was consistent w/the error message we got.
There was some inforamtion about incarnation, just serveral lines above findControlfileBackup .
DBGSQL: EXEC SQL AT RCVCAT begin dbms_rcvman . setDatabase ( upper ( :dbname:dbname_i ) , :rlscn , :rltime , :fhdbi:fhdbi_i , :db_uniqu
e_name ) ; dbms_rcvman . setCanApplyAnyRedo ( TRUE ) ; end ; [17:17:44.500]
……
DBGRCVMAN: ENTERING setDatabase
DBGRCVMAN: setDatabase db_id is null
DBGRCVMAN: this_db_unique_name= TEST
DBGRCVMAN: Fetching incarnation records
DBGRCVMAN: incarnation record id=0
DBGRCVMAN: icprs=2583312497322,icprc=07-MAR-12
DBGRCVMAN: icrls=2583318941550,icrlc=03-APR-12
DBGRCVMAN: icpinc=67118
DBGRCVMAN: icflg=CURRENT
DBGRCVMAN: incarnation record id=1
DBGRCVMAN: icprs=2478183021921,icprc=17-JAN-04
DBGRCVMAN: icrls=2583312497322,icrlc=07-MAR-12
DBGRCVMAN: icpinc=67324
DBGRCVMAN: icflg=PARENT
DBGRCVMAN: incarnation record id=2
DBGRCVMAN: icprs=,icprc=
DBGRCVMAN: icrls=2478183021921,icrlc=17-JAN-04
DBGRCVMAN: icpinc=
DBGRCVMAN: icflg=PARENT
DBGRCVMAN: number of incarnation=3
DBGRCVMAN: EXITING setDatabase
DBGRCVMAN: canApplyAnyRedo is set to TRUE
DBGANY: current incarnation now matches [17:17:44.518] (krmksetdb)
The information here told us that there were three incarnations for this database, and the reset SCN of current incarnation is 2583318941550. If we checked the backupsets against this SCN, we would find the all backups of controlfile were taken before this SCN, that is, there was no backup of controlfile for current incarnation. This is why the restore command failed. Verified the incarnation information in RMAN.
RMAN> set dbid=381083800; executing command: SET DBID database name is "TEST" and DBID is 381083800 RMAN> list incarnation; RMAN>
Couldn’t find any incarnation inforamtion from this command, but there is a view called rc_database_incarnation in the recovery catalog . Next, let’s check that view.
SQL> select DBINC_KEY,PARENT_DBINC_KEY,CURRENT_INCARNATION, RESETLOGS_CHANGE# ,RESETLOGS_TIME,PRIOR_RESETLOGS_CHANGE#,PRIOR_RESETLOGS_TIME,STATUS 2 from rc_database_incarnation 3 where NAME='TEST'; DBINC_KEY PARENT_DBINC_KEY CUR RESETLOGS_CHANGE# RESETLOGS PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS ---------- ---------------- --- ------------------ --------- ----------------------- --------- -------- 67118 67324 NO 2583312497322 07-MAR-12 2478183021921 17-JAN-04 PARENT 67324 NO 2478183021921 17-JAN-04 PARENT 82366 67118 YES 2583318941550 03-APR-12 2583312497322 07-MAR-12 CURRENT
We got all the incarnation information for this database,which is consistent w/the inforamtion in trace file. The current incarnation is set to 82366.
OK, let’s reset the incarnation to the previous one and rerun the restore command to see what would happen.
RMAN> RESET DATABASE TO INCARNATION 67118; Database reset to incarnation 67118 RMAN> run { 2> allocate channel t1 device type 'SBT' ; 3> restore controlfile ; 4> } .... channel t1: restore complete, elapsed time: 00:05:45 output filename=/oradata/test/u01/ctrl1.ctl output filename=/oradata/test/u02/ctrl2.ctl output filename=/oradata/test/u03/ctrl3.ctl Finished restore at 11-APR-12 released channel: t1 RMAN>
Controlfile is restored successfully.
If the list command of rman told us the incarnation with the backupset information, it would be more easier for us to find the issue.
script to calculate the shared memory used by oracle
Got the following error when starting a new database on solaris 10.
SQL> startup nomount; ORA-27102: out of memory SVR4 Error: 22: Invalid argument
There are more than 20 databases running on this server. The shared memory reached the shrehold defined in /etc/project.
The following script was used to calcuate the total allocated shared memory to Oracle on the server.
PROJECT="oracle" SharedMem=`ipcs -Jbm | grep $PROJECT | awk '{print $7}' ` total=0 num=0 for shm in ${SharedMem[@]} do total=`expr $total + $shm` num=`expr $num + 1` done echo "shared memory segment used by $PROJECT: number of segments: $num total size: $total Bytes"
significant unaccounted-for time
User reported one of their reports ran very slow. Multipass sort was observered from AWR report, and the sort operation in execution plan took most of time, so the issue seemed related w/the workarea.
Database version: 10.2.0.3.
Automatic pga memory management was used for this database, and the pga_aggregate_target was set to 24M only while sga had 2G. Obviously this paramater was set too low, which could be confirmed from pga advisor too. The performance of this query got improved a lot by increasing the PGA.
However, I was more insterested in the ET unaccounted-for time since lots of ET unacounted-for time was found for this query. From below report, ET unaccounted-for time was two times of CPU time, and Non_idle wait time was 0, so what led to this significant ET unaccount-for time? Why non idle wait time was 0 if we had to sort to disk?
Moreover, when this query was running, I didn’t see lots of “direct path read temp” from EM console, which was one of my expectations when Oracle was short of work area and had to sort to disk.
From top activity, I only saw some CPU usage, but no “direct path read temp” at all, so where was the “direct path read temp”? Did “direct patch read temp” never happen? If it did happen, why it didn’t show up here?
So I collected the session event statisitcs before and after runing. According to result below,this “direct path read temp” did occur, and the session waited on this event for more than 1 million. But if you checked total waited time, how could Oracle do million reads in less than 2 seconds?
SQL> select c.name event, b.TOTAL_WAITS - a.TOTAL_WAITS TOTAL_WAITS, b.TOTAL_TIMEOUTS - a.TOTAL_TIMEOUTS TOTAL_TIMEOUTS, b.TIME_WAITED_MICRO -a .TIME_WAITED_MICRO TIME_WAITED_MICRO from run_events a, run_events b, V$EVENT_NAME c where a.EVENT_ID = b.EVENT_ID and a.runid = 'before' and b.runid = 'after' and a.event_id=c.event_id and b.TOTAL_WAITS - a.TOTAL_WAITS > 0 order by TIME_WAITED_MICRO desc; EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED_MICRO ------------------------------ ----------- -------------- ----------------- SQL*Net message from client 2 0 1740927244 direct path read temp 1216782 0 1842493 direct path write temp 33136 0 112112 SQL*Net more data from client 1 0 23 SQL*Net message to client 2 0 6
The answer comes from Oracle document for this event(http://docs.oracle.com/cd/B16240_01/doc/doc.102/e16282/oracle_database_help/oracle_database_wait_bottlenecks_direct_path_read_pct.html)
Although it describes the “direct path read”,”direct path read temp” is similiar.
For this database, synchronous I/O was used, which also surprised me.
SQL> show parameter filesystemio NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ filesystemio_options string NONE
Thus when the session performs “direct path read temp”, it is blocked, but the session doesn’t wait on “direct path read temp” event at this moement. It shows a a wait on “direct path read temp” when it picks up the data when I/O request is completed, and this wait will return immediately since no physical I/O is required and the data are ready. Therefore, although Oracle session waits on I/O to be completed, but this time doesn’t get counted as “direct path read temp”. Actually it doesn’t get counted either in any wait event, or in CPU time, so this is the reason we saw significant unaccounted-for time here, and why the total waited time is less than 2 seconds although we got more than 1 milion “direct path read temp”.
This behaviour can be confirmed from ASH and AWR report also.
From ASH, the most elapsed time of this sql was spent on CPU+Wait for CPU,but the AWR report for the same time period reported CPU time was only 35% of elapsed time .In ASH the sessions get counted on CPU if they are not known to be on a WAIT event(http://oracledoug.com/serendipity/index.php?/archives/1432-Time-Matters-DB-CPU.html), so even if a proccess is sleeping or in running queue, the time would be calcualted as CPU+Wait for CPU. However, AWR gets CPU time from OS level, so only the time actually spending on CPU is counted. This significant unaccounted-for time was the time the process was in sleeping or in run queue.