Hint: gather plan statistics

My favourite method when analyzing poorly performing SQL is the gather_plan_statistics hint. You get so much more valuable information when diagnosing execution plans. Here's how the story usually goes.

You are faced with a poorly performing SQL statement. The first thing you do is check the explain plan, and it looks OK. Indexes are being used, estimated rows are low, so why the heck is the statement producing so much logical I/O (LIO)?


---------------------------------------------------------
| Id  | Operation                    | Name    | E-Rows |
---------------------------------------------------------
|   1 |  NESTED LOOPS                |         |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TABLE_1 |      1 |
|*  3 |    INDEX RANGE SCAN          | INDEX_1 |      1 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| TABLE_2 |      1 |
|*  5 |    INDEX UNIQUE SCAN         | INDEX_2 |      1 |
---------------------------------------------------------

You re-run the statement with the gather_plan_statistics hint and display the plan again using dbms_xplan.display_cursor, setting the third (format) parameter to 'ALLSTATS LAST', which shows you the runtime statistics of the last execution:

select *
from   table( dbms_xplan.display_cursor ( '8g51ffxg663m3' , null , 'ALLSTATS LAST' ) ) ;
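For completeness, the hint goes into the statement text itself. A minimal sketch, with hypothetical table and column names:

select /*+ gather_plan_statistics */
       t1.some_column, t2.another_column
from   table_1 t1, table_2 t2
where  t2.id = t1.table_2_id
and    t1.status = 'OPEN';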

In the display_cursor output you get additional columns:

  • Starts: how many times the step was performed
  • A-Rows: actual rows processed
  • A-Time: actual elapsed time spent
  • Buffers: number of logical reads


--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS                |         |      1 |      1 |    501 |00:00:00.09 |   27921 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TABLE_1 |      1 |      1 |   9456 |00:00:00.03 |     255 |
|*  3 |    INDEX RANGE SCAN          | INDEX_1 |      1 |      1 |   9456 |00:00:00.01 |      57 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| TABLE_2 |   9456 |      1 |    501 |00:00:00.10 |   27666 |
|*  5 |    INDEX UNIQUE SCAN         | INDEX_2 |   9456 |      1 |   9222 |00:00:00.06 |   18444 |
--------------------------------------------------------------------------------------------------

Now it's obvious that the estimated rows are way off and you can start working on that.
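What "working on that" involves depends on the cause, but a common first step is simply refreshing the optimizer statistics on the tables involved. A minimal sketch, with TABLE_1 standing in for your table:

begin
  dbms_stats.gather_table_stats(
    ownname => user,      -- current schema
    tabname => 'TABLE_1', -- the table with the bad estimate
    cascade => true       -- gather index statistics as well
  );
end;
/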

enq: TM - contention

Every now and then my coworkers ask me to diagnose locking issues, a.k.a. "the application is frozen". A quick glance at gv$session or gv$active_session_history shows a lot of sessions waiting on the event "enq: TM - contention".

select inst_id, sid, serial#, event
from gv$session
where event = 'enq: TM - contention';

select inst_id, sample_time, session_id, session_serial#, event
from gv$active_session_history
where event = 'enq: TM - contention';

The documentation tells us that this wait event can be caused by missing indexes on foreign key constraints. Scripts exist to find those constraints, but it certainly doesn't make sense to blindly create indexes for all foreign keys. For example: you have a code table for currencies (USD, EUR, ...) and a table of item prices with a foreign key constraint pointing to the currency table. You rarely delete a currency or query item prices by currency, so there really is no point in indexing the currency column. The index would consume additional space, and every insert, update, or delete on item prices would do additional work to maintain it.
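A minimal sketch of such a script, assuming single-column foreign keys and checking only whether the foreign key column is the leading column of some index:

select c.table_name, c.constraint_name, cc.column_name
from user_constraints c, user_cons_columns cc
where c.constraint_type = 'R'
and cc.constraint_name = c.constraint_name
and not exists (
      select null
      from user_ind_columns ic
      where ic.table_name = cc.table_name
      and ic.column_name = cc.column_name
      and ic.column_position = 1
    );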

When locking takes place, you basically want to know two things:
  • which index on which table is missing, and
  • which sql statement caused the locking.

Step # 1 - find the table


First, you query ASH (active session history) and find the object id, which identifies the table that was locked due to the missing index. The object id is reported in the P2 column.

select inst_id, sample_time, session_id, session_serial#, p2 object_id
from gv$active_session_history
where event = 'enq: TM - contention';

INST_ID SAMPLE_TIME           SESSION_ID SESSION_SERIAL# OBJECT_ID
------- --------------------- ---------- --------------- ---------
      1 9.7.2009 19:31:30,419         28              38     15705
      1 9.7.2009 19:31:29,419         28              38     15705
      1 9.7.2009 19:31:28,419         28              38     15705
      1 9.7.2009 19:31:27,410         28              38     15705
      1 9.7.2009 19:31:26,409         28              38     15705
      1 9.7.2009 19:31:25,409         28              38     15705
      1 9.7.2009 19:31:24,409         28              38     15705

You get the table name from the all_objects view.

select object_name
from all_objects
where object_id = 15705;

Now that you know the table, you can focus on the foreign key constraints of that particular table. The ones that don't have indexes are candidates for the next step: finding the sql that started the locking.

Step # 2 - find the sql


All the sql statements that haven't aged out of the SGA are in the gv$sql view, so chances are you will find the statement that caused the lock. By searching for delete operations on the referenced (parent) tables, you will soon find it.

select inst_id, sql_text, sql_id, first_load_time, parsing_user_id, module, action, program_id
from gv$sql
where upper(sql_text) like 'DELETE%[REFERENCED TABLE]%'
order by first_load_time desc;

Since the statement must have been run before the locking took place, the first_load_time must be some time before the first sample_time from ASH. In my experience, the difference is usually about a second:

  • first_load_time in gv$sql is 9.7.2009 19:31:23
  • earliest sample_time in gv$active_session_history is 9.7.2009 19:31:24,409

Of course, this is not always the case. The delete could have happened minutes or even hours earlier with the transaction still active, thus still holding the lock on the table, and by that time the statement could have aged out of the SGA. Despite this, I have been able to track down the sql statement and the missing index successfully with this method.
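If the cursor has already aged out, the AWR history views may still have it, assuming AWR captured the statement and you are licensed for the Diagnostics Pack. A sketch along the lines of the gv$sql query above:

select sql_id, sql_text
from dba_hist_sqltext
where upper(sql_text) like 'DELETE%[REFERENCED TABLE]%';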


Once you know which sql is causing the problem, it's up to the developers to create the index or to prevent users from issuing the delete statement. One solution is to flag the row as deleted and then physically delete it during a nightly job.
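A minimal sketch of that approach, with hypothetical table and column names:

-- during the day the application only flags the row
update currency
set deleted = 'Y'
where currency_code = :code;

-- a nightly job then physically deletes the flagged rows,
-- when a short lock on the child table is acceptable
delete from currency
where deleted = 'Y';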

Interpreting histogram values

When querying the user_histograms view, you might get null values in the endpoint_actual_value column. You can work around this issue and get the value by converting endpoint_value. For a varchar column, you can use the hexstr function that Tom Kyte suggests. For a date column, you can get the value with the to_date function, specifying the format mask 'J' (Julian day):

select table_name "TABLE"
, column_name "COLUMN"
, endpoint_number "NUMBER"
, endpoint_value "VALUE"
, endpoint_actual_value "ACTUAL_VALUE"
, to_date (endpoint_value, 'J') "DATE_VALUE"
from user_histograms
where table_name = :table_name
and column_name = :column_name
order by 1, 2, 3;

TABLE COLUMN NUMBER   VALUE ACTUAL_VALUE DATE_VALUE
----- ------ ------ ------- ------------ ----------
TABLE COLUMN      0 2454774              3.11.2008
TABLE COLUMN      1 2454775              4.11.2008
TABLE COLUMN      7 2454778              7.11.2008
TABLE COLUMN      8 2454783              12.11.2008
TABLE COLUMN      9 2454790              19.11.2008
TABLE COLUMN     10 2454792              21.11.2008
TABLE COLUMN     11 2454797              26.11.2008
TABLE COLUMN     12 2454799              28.11.2008
TABLE COLUMN     13 2454802              1.12.2008
TABLE COLUMN     24 2454804              3.12.2008
TABLE COLUMN     26 2454806              5.12.2008
TABLE COLUMN     27 2454813              12.12.2008
TABLE COLUMN     28 2454816              15.12.2008
TABLE COLUMN     29 2454817              16.12.2008
TABLE COLUMN     30 2454820              19.12.2008
TABLE COLUMN     31 2454823              22.12.2008
TABLE COLUMN     32 2454825              24.12.2008
TABLE COLUMN     35 2454830              29.12.2008
TABLE COLUMN     36 2454831              30.12.2008
TABLE COLUMN     37 2454837              5.1.2009
TABLE COLUMN     38 2454859              27.1.2009
TABLE COLUMN     68 2455002              19.6.2009

You can obtain a "graphical" representation of the histogram by using the analytic function LAG to compute the difference between consecutive endpoint numbers and padding a string accordingly:

select to_date (endpoint_value, 'J') date_endpoint_value
, rpad('X', endpoint_number - lag(endpoint_number,1,0) over (order by endpoint_number), 'X') bucket_size
from user_histograms
where table_name = :table_name
and column_name = :column_name
order by date_endpoint_value;

DATE_ENDPOINT_VALUE BUCKET_SIZE
------------------- ------------------------------
03.11.2008          X
04.11.2008          X
07.11.2008          XXXXXX
12.11.2008          X
19.11.2008          X
21.11.2008          X
26.11.2008          X
28.11.2008          X
01.12.2008          X
03.12.2008          XXXXXXXXXXX
05.12.2008          XX
12.12.2008          X
15.12.2008          X
16.12.2008          X
19.12.2008          X
22.12.2008          X
24.12.2008          X
29.12.2008          XXX
30.12.2008          X
05.01.2009          X
27.01.2009          X
19.06.2009          XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

For some fancy graphics you can use Oracle APEX. Based on the previous query, you can generate graphs like this one:

[Screenshot: APEX chart of the histogram buckets]

You can easily build an APEX application for a graphical representation of all your histograms.

Let's go!

I think it's about time for me to start blogging. Mainly about Oracle, but maybe some other stuff will come around as well.