Full-stack Philosophies

James Morle's Blog

Log File Sync and AWR – Not Good Bedfellows

Posted on 11:32 am June 2, 2010 by James Morle

I'm sure many of you are already enlightened enough about the dangers of ratios and averages, the most famous blunder of all being the 'cache hit ratio'. It's not that all ratios and averages are useless, but it is surely the case that most are useless, and that all can be misleading. This posting is about an example of a misleading average, one that hides the truth from the reader unless they look elsewhere.

I am actually a big fan of AWR (and Statspack). They are chock full of misleading averages and ratios, but that does not mean they don't have value. Often it is more a case of reading what is absent from the report, reading between the lines and so forth, that makes them so valuable. I'm also frequently asked to fix problems that are no longer occurring - without the history that AWR provides this would be an impossible task (assuming that, without AWR, the Active Session History would likewise be unavailable).

I was recently asked to look over some AWR reports from a performance testing environment to determine what was going on. As usual, I started by looking over the top section of the reports to get a feel for the workload and to look at the wait events. Here's the load profile:

                 Per Second   Per Transaction
                 ----------   ---------------
Redo size:       140,604.47            325.65
Logical reads:    16,712.80             38.71
Block changes:       904.42              2.09
Physical reads:        7.59              0.02
Physical writes:      70.76              0.16
User calls:        2,069.02              4.79
Parses:              342.31              0.79
Hard parses:           0.07              0.00
Sorts:                52.54              0.12
Logons:                0.47              0.00
Executes:            342.82              0.79
Transactions:        431.77

A few things immediately stand out on this load profile:

  • this is not a really busy system
  • there is virtually no I/O to or from the datafiles
  • it is using bind variables (the hard parse rate is low)
  • it is still making parse calls for every statement (parses = executes)
  • it commits or rolls back very aggressively (transactions are high compared to executes)

It's the latter point that is most notable here. Transactions is actually higher than executes, meaning there are more commit/rollback calls than there are calls to execute SQL. So, we know we have an application that is committing pretty much after every statement. Let's move on to the wait events:

Event                       Waits  Time(s)  Avg Wait(ms)  % Total  Wait Class
-----------------------   -------  -------  ------------  -------  ----------
log file sync             357,552   22,427            63     90.4  Commit
CPU time                             1,636                    6.6
log file parallel write   275,265      972             4      3.9  System I/O
read by other session       7,848      533            68      2.1  User I/O
db file parallel write     87,473      325             4      1.3  System I/O

It's pretty clear from the wait events that this really is a commit-heavy application. I validated this with the developer - each transaction is basically an INSERT or an UPDATE followed by a commit. But here's the mystery - why are the 'log file sync' wait times so much higher than the 'log file parallel write' times? An average write time of 4ms isn't fantastic by any stretch of the imagination, but it's not bad either - so where are the other 59ms going? I had a bit of a think about this and found the answer - the truth is hidden between the lines, and disguised by the averaging inside AWR.

The key to this mystery is what the specific wait events mean, and which sessions wait for them:

  • log file sync - user sessions wait on this event after a commit is issued
  • log file parallel write - LGWR is the only process that waits on this event, and it is the actual I/O (and associated setup) that stamps the redo buffer information onto disk

The process flow goes a bit like this for a session making a commit:

  1. User session issues a commit and goes into a 'log file sync' wait
  2. LGWR is posted to flush the redo buffer to disk
  3. LGWR issues the I/O and waits on 'log file parallel write'
  4. LGWR returns from I/O, clears wait, and posts the user session to signal completion
  5. User session clears 'log file sync' wait

There are other operations that I have deliberately omitted from that description for clarity (they are not really relevant for the purposes of this discussion). In the case where there are many sessions wanting to commit, the process is slightly different because of Oracle's group commit (or piggyback commit) functionality. If a second session issues a commit while LGWR is busy writing the redo for the first session, it too will wait on log file sync. If a third, fourth, fifth and sixth session issue commits, they will also wait on log file sync. They will all wait for LGWR to finish writing the redo required by the first session, at which point LGWR is immediately posted again to write the redo for the newly waiting sessions. These sessions (2-6) will continue to wait until this second write is complete.

What this means is that we have a strange kind of queue. It can only be two entries deep (the session currently being taken care of by LGWR and the session(s) that committed after the first session) and infinitely wide (all the subsequent sessions are handled as a single request). What it means in terms of AWR, however, is that the averages get distorted beyond all recognition.
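
To put some flesh on that queue, below is a toy simulation of it - a minimal sketch in Python, emphatically not Oracle internals - with one LGWR servicing group commits. The commit pattern and write times are invented purely for illustration:

    # Toy model of the group-commit queue: sessions wait on 'log file sync'
    # (lfs) while LGWR waits on 'log file parallel write' (lfpw).
    def simulate(commits, write_ms_for):
        """commits: sorted commit times in ms; write_ms_for(k): duration of
        the k-th LGWR write. Returns total lfs ms, total lfpw ms, writes."""
        lfs = lfpw = lgwr_free_at = 0.0
        i = k = 0
        while i < len(commits):
            start = max(lgwr_free_at, commits[i])   # LGWR picks up the waiters
            end = start + write_ms_for(k)
            lfpw += end - start                     # LGWR's wait: the I/O itself
            # Group commit: every session waiting when the write starts is
            # covered by it, and clears 'log file sync' when it completes
            while i < len(commits) and commits[i] <= start:
                lfs += end - commits[i]
                i += 1
            lgwr_free_at = end
            k += 1
        return lfs, lfpw, k

    # 342 sessions commit evenly across one second
    commits = [i * 1000.0 / 342 for i in range(342)]

    # All writes fast (0.5ms): the two totals track each other almost exactly
    print(simulate(commits, lambda k: 0.5))   # ~(171ms, 171ms, 342 writes)

    # One 1000ms outlier write, then fast again: lfs balloons to ~171,000ms
    # while lfpw accrues only ~1000ms
    print(simulate(commits, lambda k: 1000.0 if k == 0 else 0.5))

One slow write, and the per-session event racks up a hundred times more wait time than the LGWR event that caused it - exactly the distortion quantified below.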

I'll tell you now that the underlying problem with this system was a response time problem on redo writes, validated by looking at the LGWR trace file, which was littered with 'slow write' warnings (what a great feature, introduced in 10g!). I believe the write cache on the storage was getting saturated, but I have not yet been able to confirm that with the client's storage team. So why was this not being reflected in the 'log file parallel write' wait events? The reason is that the slow writes were very slow (some greater than one second), but the majority of the writes were very fast. Consider the following set of write times:

  • 3000 writes take 0.5ms
  • 20 writes take 1000ms
  • 2000 writes take 1ms

The total time for all those writes is 23,500ms. Divide that by the number of requests (5,020), and the average write time is 4.68ms, close to the number reported in the AWR report. The trouble is that 85% of the time is spent in only 0.4% of the calls, and the mass of fast writes drags the average down, hiding the outliers (even with very rapid AWR sampling this would be the case). That's the first part of the problem.
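
A quick sanity check of that arithmetic (plain Python, same hypothetical figures):

    # Reproducing the 'log file parallel write' average from the model above
    writes = [(3000, 0.5), (20, 1000.0), (2000, 1.0)]   # (count, ms per write)
    total_ms = sum(n * ms for n, ms in writes)          # 23,500ms in total
    count = sum(n for n, _ in writes)                   # across 5,020 writes
    print(total_ms / count)        # ~4.68ms: the average AWR would report
    print(20 * 1000.0 / total_ms)  # ~0.85: 85% of the time...
    print(20 / count)              # ~0.004: ...in 0.4% of the calls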

The second part of the problem is what happens when one of these outlier 1000ms I/O times takes place. We can see that the average (!) execute rate was 342/s, and let's assume that all of these executions were DML that had a commit after them. We could use the Transactions rate, but that will also include a bunch of rollbacks, and I happen to know from the developer that each execute is a DML operation. So, we have an arrival rate of 342 commits per second and occasional peaks of I/O time of 1000ms. As soon as one of these spikes occurs, 342 sessions will start to accumulate a portion of that 1000ms (depending upon where in that period they issued the commit). If we assume that they arrive evenly spread out across that 1000ms, we can use the area of a triangle to calculate the total time clocked up to log file sync during that period:

  • Area of a right-angled triangle = 1/2 × base × height
  • Base = 1000ms
  • Height = number of sessions waiting, up to a maximum of 342
  • Answer = 171,000ms

So, the total wait time attributed to log file parallel write is only 1000ms during this period, but the total wait time attributed to log file sync is 171,000ms. You see what I did there? I concealed the calculation of an average inside something called the 'area of a triangle'. Shh, don't tell anyone.
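
The same figure drops straight out if you compute it honestly as an average - 342 uniformly spread sessions waiting a mean of 500ms each:

    # The triangle is an average in disguise: sessions arriving uniformly
    # during a 1000ms write wait 500ms each on average
    sessions, write_ms = 342, 1000.0
    print(0.5 * write_ms * sessions)   # area of the triangle: 171,000ms
    print((write_ms / 2) * sessions)   # mean wait x session count: identical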

When this slow write is over, the sessions will once again wait for the fast writes, perhaps one or two sessions at a time waiting for sub-millisecond writes, which means the average time is not dragged down quite so far as it is for the 'log file parallel write' event. Let's apply the same theoretical model that we used for log file parallel write and see:

  • 3,000 single log file sync waits at 0.5ms = 1,500ms
  • 20 periods of log file sync madness described above = 3,420,000ms
  • 2,000 single log file sync waits at 1ms = 2,000ms

The total number of waits here is 5,000 + 6,840 (that is, 20 × 342) = 11,840, compared to 5,020 for log file parallel write. The total wait time, however, is 3,423,500ms, making an average wait time of 289ms that would be reported by AWR for 'log file sync', compared to a 4.68ms 'log file parallel write' time. That's somewhat more exaggerated than the actual figures in the AWR report, but it does highlight the issue quite well.
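
In code form (again using the hypothetical numbers), the two averages AWR would report come out like this:

    # Totalling the 'log file sync' model and comparing the two averages
    lfs_waits = 3000 + 20 * 342 + 2000                # 11,840 waits
    lfs_ms = 3000 * 0.5 + 20 * 171000.0 + 2000 * 1.0  # 3,423,500ms
    print(lfs_ms / lfs_waits)   # ~289ms average 'log file sync'
    print(23500.0 / 5020)       # ~4.68ms average 'log file parallel write'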

AWR is a useful tool, but issues like this are masked by it. Use ASH or session-based tracing for these kinds of problems, and the answer will become clear!


23 comments on “Log File Sync and AWR – Not Good Bedfellows”

  1. Hi James,

    Very interesting post and I would like to discuss two things.

    1. What about the inter-session communication between LGWR and a number of sessions? I'm wondering whether this time should be taken into consideration - what if LGWR has to send a confirmation to, say, 100 sessions?

    2. An average of any statistic can mask the real problem. If we are going to use an average, we should also look at the standard deviation, use the median, or best of all add a histogram to the value.

    regards,
    Marcin

    • Hi Marcin,

      Point 1 is valid, and may be a contributor. But I think the mathematics speak for themselves as to where the time is going.
      Point 2 is the whole 'point' of the article - averages (means) suck!

      Cheers

      James

    • Yeah, I did consider that aspect at first, but the type of workload, the number of processes, the amount of free CPU cycles, and the relatively small write sizes make the 'LGWR busy' argument incorrect in this case. And technically speaking, the 'log file sync' wait event is not actually waiting for an I/O at all - it's waiting for a post from LGWR!

        • Hi,

          I don't have one saved, but I did indeed trace the LGWR process to confirm this. However, it was not really necessary in this case because the LGWR now dumps warnings anyway as soon as the writes exceed 500ms.
          Cheers

          James

  2. James,
    Very nice post, very informative. I have one question:

    Did you see anything in the Wait Event Histogram section of the AWR report in terms of the spread of the wait durations?

  3. Pingback: Log Buffer #191, A Carnival of the Vanities for DBAs | The Pythian Blog

  4. Hi,
    Really an interesting post. As you said, the application is using bind variables because the hard parse rate is very low, but it is still making parse calls for every statement (parses = executes). What could be the probable reasons for this? Why does parses = executes?

    Regards,
    Anand

    • Anand,

      When 'hard parsing' is low but 'parsing' remains high, it means that the application is still making explicit parse requests, and Oracle still has to do much of the work of the parse in this case. It's normally because of a naive application that always follows the parse->bind->execute->fetch phases for a statement, even though the parse phase only needs to be done once, like this:
      1 parse()
      2 bind() -> execute() -> fetch()
      3 cancel()
      4 repeat from (2)

      Does that help?

      James

      • Hi James,

        Thanks for the input. I have applications where parse->bind->execute->fetch is happening for most of the queries. What should I recommend or look into?

        Regards,
        Anand

        • Hi Anand,

          I would recommend that you first investigate whether or not this is causing you a real problem. Trace some sessions and see where the time goes, and check the cost of the parsing in v$sysstat and v$sys_time_model. If parsing is causing you a problem, you need to look into the application. There are many ways to improve this, but the best one by far is to fix the application so that it parses a given statement only once (or a few times) and re-uses that same cursor. This is as simple as moving things like prepareStatement() to the outside of loops, and having a cancel() at the bottom of the loop (a Java example).
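
          As a rough illustration in Python (using the cx_Oracle driver; the connection details and table are made up), the parse-once pattern looks something like this:

            import cx_Oracle

            # Hypothetical connection and table, purely for illustration
            conn = cx_Oracle.connect("scott", "tiger", "orcl")
            cur = conn.cursor()

            # Parse once, outside the loop...
            cur.prepare("insert into orders (id, amount) values (:1, :2)")

            # ...then re-execute the already-parsed statement inside it.
            # Passing None reuses the prepared statement with no parse call.
            for row in [(1, 9.99), (2, 19.99)]:
                cur.execute(None, row)

            conn.commit()
            cur.close()
            conn.close()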
          Regards

          James

  5. Insightful, as usual 🙂

    One question: under what circumstances can the number of waits on 'log file parallel write' be higher than on 'log file sync'? (i.e., the opposite of the example above) Using your explanation of the process flow of a commit, I can't think of a reason for this, but I have seen it.

    Thanks.

    • Hi Andy,

      Yes, this can also happen, for the same reason (the relationship of lfs to lfpw). If your sessions are committing infrequently but many of the log writes are slow, then there is a reasonable chance that many of the sessions will wait on lfs for only a portion of the lfpw time. lfpw will also be happening for the other, non-commit, reasons such as log buffer space and timeouts, so the slow writes will continue even in the absence of the commits...
      Cheers

      James

  6. I personally came across this type of issue around 18 months back, on a 10g RAC 2-node database in an active/active configuration. We used to experience bizarre slowdowns in the system, even under low workloads.

    We scratched our heads for many months and looked at it from all sorts of angles. Our I/O subsystem was performing well, which was reflected in low 'log file parallel write' times. I finally traced the log writer and found the issue when I noticed that the sporadic slowdowns were tied up in the 'control file write' event. Upon further investigation I found that our control file had bloated to its maximum of 330MB or so. I then rebuilt the control file and we've never experienced a slowdown again.

    Why the bloated control file? That's a whole other blog entry - for the blog I don't write.

  7. Hi James,

    Are these assumptions?

    Consider the following set of write times:

    • 3000 writes take 0.5ms
    • 20 writes take 1000ms
    • 2000 writes take 1ms

    If not, how did you get that information?

    Thanks

    • Hi Antony,

      The quoted numbers are a simplified model based upon actual observations on a real production system. Exact numbers can be obtained from v$event_histogram, though you would not be able to differentiate any 0.5ms writes this way - it would require stats from the OS or from strace/truss.
      Cheers

      James

  8. Pingback: Using SSD disks for redo logs | Oracle mechanics

  9. Pingback: How not to troubleshoot log file sync waits | Oracle Diagnostician

  10. Pingback: Troubleshooting ‘Log File Sync’ Waits | Oleksandr Denysenko's Blog
