Full-stack Philosophies

James Morle's Blog

RSS Feed

The Oracle Wait Interface Is Useless (sometimes) – Part One: The Problem Definition

Posted on 2:25 pm November 9, 2009 by James Morle

So here we go, this is part one of this experiment in blogging and co-writing. Tanel has actually written some good stuff already for this, but I wanted to try and formalise things under a common title and make it easier to follow between our sites.

I thought it would be logical to start this process by producing a more concrete problem definition, so that's the focus of this part. It's unlikely that we will come up with a complete method in this initial work, but hopefully the wheels will at least turn a little by the end of it!

So first of all, why would I dare to say that the Oracle Wait Interface is useless? Well, partly because I quite like titles that are a little bit catchy, and partly because it is indeed sometimes useless. The emphasis is on the word sometimes, though, because the Oracle Wait Interface is still the single most useful feature in any database product. Wow - that's quite a claim, isn't it? This isn't the place to fully explain why that is, and many others have written great works on this subject already. Check out Cary Millsap's works, notably his book, Optimizing Oracle Performance, which focuses in great detail on this subject. For the sake of this article, however, here's why it is so useful: It tells you where the time goes. Think about it: If something is running too slowly, knowing where the time is used up is the single piece of information required to focus on the right subject for tuning.

So what's wrong with the Oracle wait interface? Just one thing, actually - it is designed to  provide visibility of relatively slow waits. The reason for this is simply that there is a slight overhead in timing every single wait. If that overhead becomes a noticeable proportion of the actual wait itself then the measurement becomes inaccurate (and makes the problem worse). On UNIX-like platforms (yes, that includes Linux), the actual timing interface is implemented using gettimeofday(2) system calls, one before the event and one after the event. This call gives microsecond granularity of timing, at least in theory (on my Opteron 280 test machine, gettimeofday() calls take 1.5 microseconds). So, using this kind of mechanism for events that take a relatively long time makes perfect sense - disk I/O, for example, that will take at least three orders of magnitude longer to complete than the timing calls themselves. Conversely, they make no sense at all for calls that take even as little as 50 microseconds, as the 3 microsecond penalty for measuring the wait is 6% of the actual event time itself at that point. There you go, that's the beginning of the justification that the wait interface is useless,  in a nutshell.

But hang on, isn't 50 microseconds really, really fast? Well no, actually, it isn't. Taking Intel's Nehalem processor (with Quickpath) as an example, a memory latency is around the 50 ns range - three orders of magnitude faster than the 50 microsecond cut-off that I just arbitrarily invented. Memory access is also the slowest thing that a CPU can do (without factoring in peripheral cards) - in this case the CPU has to wait for about 150 cycles while that memory access takes place. So it's very possible to have a function call that does fairly complex work and is still an order of magnitude or two faster than the gettimeofday() system call.

Time for an example. Actually, this is a variation on the example that made me start thinking about this - I had been perfectly happy with the Oracle Wait Interface until this point for 99% of cases!

OK, so a user just called you, complaining that his query is simply not ever completing. Following the usual course of action, we might take a few samples from v$session_wait (or v$session from 10g onwards) to look at the current wait state for the process, just in case it's something obvious:

SID EVENT                          PROCESS                  STATE

---- ------------------------------ ------------------------ -------------------
 8 SQL*Net message to client      15032                    WAITED SHORT TIME
Well, that isn't too revealing. Let's now turn on extended sql tracing to get a list of all wait events as they transition:
SQL> oradebug setospid 15033
Oracle pid: 20, Unix process pid: 15033, image: oracle@elise03.sa.int (TNS V1-V3)
SQL> oradebug event 10046 trace name context forever, level 12
Statement processed.

After a few seconds, let's see if there is anything in the trace file:

*** 2009-11-09 10:54:36.934
*** SESSION ID:(8.10393) 2009-11-09 10:54:36.934
*** CLIENT ID:() 2009-11-09 10:54:36.934
*** SERVICE NAME:(SYS$USERS) 2009-11-09 10:54:36.934
*** MODULE NAME:(sqlplus@elise03.sa.int (TNS V1-V3)) 2009-11-09 10:54:36.934
*** ACTION NAME:() 2009-11-09 10:54:36.934

Received ORADEBUG command (#1) 'event 10046 trace name context forever, level 12' from process 'Unix process pid: 15082, image: <none>'

*** 2009-11-09 10:54:36.935
Finished processing ORADEBUG command (#1) 'event 10046 trace name context forever, level 12'

No, this is not truncated output - there is nothing in this file at all, apart from the actual invocation of the tracing. OK, so what's next? How about looking at 'top' to see if the process if busy:

top - 10:53:15 up 2 days, 19:50,  4 users,  load average: 1.04, 0.55, 0.57
Tasks: 148 total,   2 running, 146 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.8%us,  0.4%sy,  0.0%ni, 73.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3995468k total,  3877052k used,   118416k free,   115432k buffers
Swap:  6029304k total,   170388k used,  5858916k free,  3289460k cached

PID   USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15033 oracle    25   0 2901m 1.1g 1.1g R 99.9 28.5   3:01.79 oracle
3284  oracle    15   0 2903m  54m  46m S  1.7  1.4  60:19.33 oracle
15071 oracle    16   0 2902m  24m  21m S  1.3  0.6   0:00.04 oracle
15069 oracle    15   0 12740 1112  816 R  0.7  0.0   0:00.07 top

Crikey, our process is more or less consuming 100% of a CPU! So, we are not waiting for anything, but the user still has no results, and the process is very busy doing something. Let's now try the next logical step - let's truss/strace the process:

[oracle@elise03 trace]$ strace -tp 15033
Process 15033 attached - interrupt to quit
11:28:06 gettimeofday({1257766086, 104118}, NULL) = 0
11:28:06 getrusage(RUSAGE_SELF, {ru_utime={2270, 615813}, ru_stime={0, 634903}, ...}) = 0
11:28:06 gettimeofday({1257766086, 104402}, NULL) = 0
11:28:06 getrusage(RUSAGE_SELF, {ru_utime={2270, 615813}, ru_stime={0, 634903}, ...}) = 0
11:28:08 gettimeofday({1257766088, 105930}, NULL) = 0
11:28:08 getrusage(RUSAGE_SELF, {ru_utime={2272, 609510}, ru_stime={0, 634903}, ...}) = 0
11:28:08 gettimeofday({1257766088, 106186}, NULL) = 0
11:28:08 getrusage(RUSAGE_SELF, {ru_utime={2272, 609510}, ru_stime={0, 634903}, ...}) = 0
11:28:10 gettimeofday({1257766090, 110887}, NULL) = 0
11:28:10 getrusage(RUSAGE_SELF, {ru_utime={2274, 611206}, ru_stime={0, 634903}, ...}) = 0
11:28:10 gettimeofday({1257766090, 111143}, NULL) = 0
11:28:10 getrusage(RUSAGE_SELF, {ru_utime={2274, 611206}, ru_stime={0, 634903}, ...}) = 0
Not much to see here. Those system calls are emitted every two seconds, and are almost certainly just the result of Oracle updating the statistics for the database time model (v$sess_time_model), but we're really not doing much at all in terms of system calls. Hang on a minute, the database time model - surely that will help us here?
SQL> select stat_name,value from v$sess_time_model where sid=8;
STAT_NAME                                          VALUE
--------------------------------------------- ----------
DB time                                       2802976117
DB CPU                                        2801767067
background elapsed time                                0
background cpu time                                    0
sequence load elapsed time                             0
parse time elapsed                                 74853
hard parse elapsed time                            66626
sql execute elapsed time                      2802910770
connection management call elapsed time            21308
failed parse elapsed time                              0
failed parse (out of shared memory) elapsed t          0
hard parse (sharing criteria) elapsed time         62605
hard parse (bind mismatch) elapsed time                0
PL/SQL execution elapsed time                       5377
inbound PL/SQL rpc elapsed time                        0
PL/SQL compilation elapsed time                    18303
Java execution elapsed time                            0
repeated bind elapsed time                             0
RMAN cpu time (backup/restore)                         0

19 rows selected.

SQL>

At last, we have some kind of a symptom, and confirmation that the user is indeed actually trying to do useful work rather than just spinning on the CPU. The symptom is that there is an increasing amount of time being allocated as 'sql execute elapsed time'.  In my opinion, this is where the time model statistics fail to deliver when the opportunity is presented to it. There are just 19 statistics reported here on 11gR2, and the only help we are given from its output is that we are using a lot of DB time, a lot of DB CPU and a lot of sql execute time. So we can surmise that we are doing a very CPU intensive query, and that's about it - no finer granularity provided and this would seem to be the logical place for such instrumentation...

OK, so we are now just confirmed - we have a nasty query that is just using CPU and never waiting for anything. Let's not start guessing at this stage what the problem is, let's try and find out the real answer. At this stage, we might jump to v$sql_plan_statistics_all to find out what is going on. These statistics are only updated when statistics_level is set to ALL, and even then do not update until the statement is complete. In our little example here, the query is already running - we can't set statistics_level=all, and we can't wait until the query completes - it might never do so! Of course, it's probably OK to get the user to kill the query and restart with statistics_level=all, or even the ctrl-c would allow the stats to update. However, let's assume neither of those things are possible, as this is only an example case.
So what techniques can we use to find the problem? One of them might be the new SQL Execution Monitoring in 11g (which looks very nice), if you have the required license - but that is a very specific case where the problem is a SQL execution problem. What if it is not a SQL execution problem? We need a more general method for finding the answer. That's the subject of part two of this blog, over to Tanel for Part Two!


23 comments on “The Oracle Wait Interface Is Useless (sometimes) – Part One: The Problem Definition

    • Alex,

      Yes there was, and I could locate the statement successfully in the shared pool and associated views. The query is a deliberate disaster area, designed to do bad things in a certain way. :)

      James

      • Something like
        select /*+ use_nl(a b) */ count(*) from (
        select 1 num from dual connect by level <= 100000) a,
        (
        select 1 num from dual connect by level <= 100000) b
        where a.num=b.num;
        ?

    • What James has set up with his blog entry (and I will continue) is that even though you could look into the execution plan of the problem statement, this doesn't fill the information gap WHY the query is slow. The execution plan only shows you what kind of operations and in which order would be done when someone runs the statement, but it doesn't tell you what's actually happening and where the time is going.

      So, the execution plan may reveal what's wrong if there's something obvious and you get lucky - but this only works in subset of problem cases and we don't want to be dependent on luck!

      Stay tuned to my "reply" to this post :)

      • Exactly. This example just happens to be something weird with SQL. It might not be, though, and we are trying to speed up and generalise the diagnostic procedure for any situation. For example, what if it was some PL/SQL (and so no plan) just burning CPU one day, when it always used to run fine? PL/SQL also has diagnostic/profiling tools, but we just want to jump in more generally than that to find the specific problem, even if the problem cannot be reproduced in development.

      • Good news, so I will be equipped before the presentation. I make most out of presentations when the topic is familiar.

        I will attend only 1 day and attend all your presentations I hope you won't get tired after 3 presentations on the trot. See you in Birmingham

  1. Fun read - Nice set up for the example !
    It's refreshing to see a straight forward logical approach to the problem (instead of just dumping out reams of reports like statspack or awrrpt etc)

  2. I faced a similar issue long time back. And I tried the exact same ways mentioned in the above note to debug the issue. I even tried to "nice" the process using "priocntl". I realised the mistake when i looked at the plan. Alas, there was a Cartesian Join in it !

  3. I get a whiff of dtrace and systemtap scripts from the kitchen. Tanel's latchprofx script helped me a lot in debugging latching issues. I know that his new secret project will be equally helpful for all the DBA's.
    regards
    srivenu

  4. Pingback: Blogroll Report 06/11/2009-13/11/2009 « Coskan’s Approach to Oracle

  5. James,

    Here is a technique I use when troubleshooting runaway query. It is based on data from v$sql_plan_statistics already mentioned in your article. In my opinion, plan statistics have tremendous value and potentially a good alternative to 10046 traces. The biggest obstacle to a wide adoption is the horrible slowness caused by high CPU overhead in "statistics_level=all" mode. But it is OK to do that at session level, even in production system. So, the execution of the query is staged in a separate session where we can set statistics_level=all and recreate run environment. Here is the full sequence of actions:

    - Find SQL_ID / Child_number of runway SQL from v$session;
    - Find plan_hash_value from v$sql using SQL_ID and CHILD_NUMBER;
    - Assuming the query is using bind variables, capture bind variables from v$sql_bind_capture. There can be problems capturing bind variables, for brevity I will skip details. If no bind variables, skip this step;
    - In a separate session "alter session set statistics_level=all";
    - Start executing SQL in this session. If there were bind variables, set same values and, even more importantly, same datatypes as in Capture step;
    - If the query is absolutely hopeless, cancel execution (send break) after some time, even 10-15 secs should be sufficient;
    - Find SQL_ID of just executed query by looking in v$session.prev_sql_id. If sql text was not changed, the SQL_ID should be same as in step 1. Very important: it should have same Plan_hash_value, otherwise revise previous steps;
    - Examine plan statistics from v$sql_plan_statistics. The beauty of plan statistics is that it provides data even on partial execution. It can be a separate article dedicated to the interpreting plan statistics. Once becoming familiar with them and having a decent viewer, one should be able to see absolutely clear where the problem is. No guess work, it's all in numbers. 10-15 seconds of execution should be enough for the query to get to the steps where it will be running for lo-o-ong time, they can be easily identified.

    This recipe looks rather tedious, and it really is, unless the tool is used that does all that things. The point is, once I was free from all that routine, I was able to troubleshoot not only runaway queries (which are rare anyway) but simply expensive queries with a lot of gets/execution. I should add that it makes sense to use this technique in really complicated cases or when in doubts. Most often, problems are simple and a quick look at the execution plan is enough to spot the cause.

    • Hi Sergei,

      Again, that's a SQL-specific approach, but yes, if you can execute the problem query in a separate session with statistics_level=all, it will reveal where the work is taking place.
      Regards

      James

      • This is IF the problem is due a SQL execution issue anyway... That's only a subset of problems you experience. Sh*t can happen during parsing, opening/closing cursors, taking locks, latches, searching for segment space, allocating/freeing space etc etc etc etc etc you get the point!!!

        A truly systematic approach works in every case, so the starting point should NOT be just looking into execution plan in hope of finding something from there. This can actually mislead you.

        Looking into exec plan/and its execution profile is definitely a very useful thing to do - but not just as the STARTING POINT of troubleshooting...

  6. Tanel,

    What would be your approach in the context of the article (... a user just called you, complaining...)? More often than not, that will be a SQL problem. So doing the SQL sanity check is logical before doing more fancy stuff. The execution plan can be misleading, but not for experienced troubleshooter. The plan statistics help to interpret plan correctly. Actually, the plan statistics were mentioned in the article and, kind of, dismissed. I wanted to say, that they still could be used in the situation with very good results. I agree it is not always SQL, so what would you do?

  7. Hey James,

    Can I see 3 pstack outputs from that session supposedly "trying to do useful work rather than just spinning on the CPU?"

    That leads me to question number two. How can a query "simply not ever completing" ever be construed as "useful work?"

    Sorry if I'm sounding like a troll. I know you'll take it out on me the next time we beer.

  8. I agree with Kevin, mm.. for this case I'll make use of Tanel's "os_explain" tool to convert the Oracle kernel functions on pstack to a more readable format ala Explain Plan so I would know which row source is taking too long (if this is an SQL) or the specific function where it hangs/spin on CPU (if let's say doing a simple EXPDP)..

    But before this, I would look at (or diagnose) it end-to-end. Meaning, I'll have some tools that will help me to monitor/instrument the OS (cpu,memory,disk), application & database. And I should be able to look back and get some trends, get meaningful data, and must be able to correlate something from the different circles (areas) that I'm looking into. Much like the Venn diagram, when you find the overlapping areas and you were able to correlate it to the problem (with facts,numbers,& figures), then that's probably the culprit!

  9. Pingback: Beyond Oracle Wait Interface - Part 2 | Tanel Poder's blog: Core IT for Geeks and Pros

Leave a Reply