James Morle's Blog
Log File Sync and AWR – Not Good Bedfellows
Posted on 11:32 am June 2, 2010 by James Morle
I'm sure many of you are already enlightened enough about the dangers of ratios and averages, the most famous blunder of all being the 'cache hit ratio'. It's not that all ratios and averages are useless, but it is surely the case that most are useless, and that all can be misleading. This posting is about an example of a misleading average, one that hides the truth from the reader unless they look elsewhere.
I am actually a big fan of AWR (and Statspack). They are chock full of misleading averages and ratios, but that does not mean they don't have value. Often it is more a case of reading what is absent from the report, reading between the lines and so forth, that makes them so valuable. I'm also frequently asked to fix problems that are no longer occurring; without the history that AWR provides this would be an impossible task (and if AWR were unavailable, Active Session History presumably would be too).
I was recently asked to look over some AWR reports from a performance testing environment to determine what was going on. As usual, I started by looking over the top section of the reports to get a feel for the workload and to look at the wait events. Here's the load profile:
                            Per Second    Per Transaction
                            ----------    ---------------
Redo size:                  140,604.47             325.65
Logical reads:               16,712.80              38.71
Block changes:                  904.42               2.09
Physical reads:                   7.59               0.02
Physical writes:                 70.76               0.16
User calls:                   2,069.02               4.79
Parses:                         342.31               0.79
Hard parses:                      0.07               0.00
Sorts:                           52.54               0.12
Logons:                           0.47               0.00
Executes:                       342.82               0.79
Transactions:                   431.77
A few things immediately stand out on this load profile:
- this is not a really busy system
- there is virtually no I/O to or from the datafiles
- it is using bind variables (the hard parse count is low)
- it is still making a parse call for every statement (parses = executes)
- it commits or rolls back very aggressively (transactions are high compared to executes)
It's the latter point that is most notable here. Transactions is actually higher than executes, meaning there are more commit/rollback calls than there are calls to execute SQL. So, we know we have an application that is committing pretty much after every statement. Let's move on to the wait events:
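As a sanity check, the ratios described above can be computed directly from the load-profile figures (the values are copied from the report; the dictionary below is just for illustration):

```python
# Sanity-checking the load-profile ratios (figures copied from the
# AWR 'Load Profile' section above; the dict name is just illustrative).
per_second = {
    "executes": 342.82,
    "parses": 342.31,
    "hard_parses": 0.07,
    "transactions": 431.77,
}

# Soft parse on every execute: parses ~= executes
print(per_second["parses"] / per_second["executes"])        # ~0.999

# More commit/rollback calls than executes: a commit after every statement
print(per_second["transactions"] / per_second["executes"])  # ~1.26
```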
Event                        Waits   Time(s)  Avg Wait(ms)  % Total  Wait Class
-----                        -----   -------  ------------  -------  ----------
log file sync              357,552    22,427            63     90.4  Commit
CPU time                               1,636                    6.6
log file parallel write    275,265       972             4      3.9  System I/O
read by other session        7,848       533            68      2.1  User I/O
db file parallel write      87,473       325             4      1.3  System I/O
It's pretty clear from the wait events that this really is a commit-heavy application. I validated this with the developer - each transaction is basically an INSERT or an UPDATE followed by a commit. But here's the mystery - why are the 'log file sync' wait times so much higher than the 'log file parallel write' times? An average write time of 4ms isn't fantastic by any stretch of the imagination, but it's not bad either - so where are the other 59ms going? I had a bit of a think about this and found the answer - the truth is hidden between the lines, and disguised by the averaging inside AWR.
The key to this mystery is what the specific wait events mean, and which sessions wait for them:
- log file sync - user sessions wait on this event after a commit is issued
- log file parallel write - LGWR is the only process that waits on this event, and it is the actual I/O (and associated setup) that stamps the redo buffer information onto disk
The process flow goes a bit like this for a session making a commit:
- User session issues a commit and goes into a 'log file sync' wait
- LGWR is posted to flush the redo buffer to disk
- LGWR issues the I/O and waits on 'log file parallel write'
- LGWR returns from I/O, clears wait, and posts the user session to signal completion
- User session clears 'log file sync' wait
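The handshake above can be sketched with a toy two-thread model. This is purely an illustration of the posting protocol, not Oracle code, and the 4ms sleep is an arbitrary stand-in for the redo write:

```python
# A toy sketch (an assumption-laden model, not Oracle internals) of the
# commit handshake: the committing session blocks in 'log file sync'
# until LGWR, which itself blocks in 'log file parallel write' for the
# duration of the I/O, posts it back.
import threading
import time

redo_flushed = threading.Event()   # LGWR posts this when the write completes

def lgwr():
    # 'log file parallel write': the physical redo I/O (pretend it takes 4ms)
    time.sleep(0.004)
    redo_flushed.set()             # post the waiting session(s)

def user_session():
    t0 = time.perf_counter()
    threading.Thread(target=lgwr).start()   # the commit posts LGWR
    redo_flushed.wait()                     # the 'log file sync' wait
    print(f"log file sync: {(time.perf_counter() - t0) * 1000:.1f} ms")

user_session()
```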
There are other operations that I have deliberately omitted from that description for clarity (they are not really relevant for the purposes of this discussion). In the case where there are many sessions wanting to commit, the process is slightly different, because of Oracle's group commit (or piggyback commit) functionality. If a second session issues a commit while LGWR is busy writing the redo for the first session, it too will wait on 'log file sync'. If a third, fourth, fifth and sixth session issue commits, they will also wait on 'log file sync'. They all wait for LGWR to finish writing the redo required for the first session, at which point LGWR is immediately posted again to write the redo for the newly waiting sessions. These sessions (2-6) will continue to wait until this second write is complete.
What this means is that we have a strange kind of queue. It can only be two entries deep (the session currently being taken care of by LGWR and the session(s) that committed after the first session) and infinitely wide (all the subsequent sessions are handled as a single request). What it means in terms of AWR, however, is that the averages get distorted beyond all recognition.
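That queueing behaviour can be modelled in a few lines of code. This is a simplified sketch under two assumptions not stated in the original: every write takes a fixed time, and LGWR batches everyone who has committed by the time it becomes free:

```python
# A minimal model of the two-deep, infinitely-wide group-commit queue:
# while LGWR writes batch N, all new committers pile into batch N+1.
import collections

def simulate(commit_times, write_ms):
    """commit_times: ms at which each session issues COMMIT.
    write_ms: fixed duration of each LGWR write.
    Returns the 'log file sync' wait of each session, in commit order."""
    waiting = collections.deque(sorted(commit_times))
    now, waits = 0.0, []
    while waiting:
        # The batch is everyone who has committed by the time LGWR is free
        batch = []
        while waiting and waiting[0] <= now:
            batch.append(waiting.popleft())
        if not batch:               # LGWR idle until the next committer arrives
            now = waiting[0]
            continue
        now += write_ms             # LGWR writes the whole batch at once
        waits.extend(now - t for t in batch)
    return waits

# Six sessions committing 1ms apart, with 4ms writes:
print(simulate([0, 1, 2, 3, 4, 5], 4.0))   # [4.0, 7.0, 6.0, 5.0, 4.0, 7.0]
```

Even in this tiny example the later sessions wait for up to two writes (7ms) although each individual write took only 4ms - the 'log file sync' times are already drifting away from the 'log file parallel write' times.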
I'll tell you now that the underlying problem with this system was a response time problem on redo writes, validated by looking at the LGWR trace file, which was littered with 'slow write' warnings (what a great feature, introduced in 10g!). I believe the write cache on the storage was getting saturated, but have not yet been able to confirm that with the client's storage team. So why was this not being reflected in the 'log file parallel write' wait events? The reason is that the slow writes were very slow (some greater than one second), but the majority of the writes were very fast. Consider the following set of write times:
- 3000 writes take 0.5ms
- 20 writes take 1000ms
- 2000 writes take 1ms
The total time for all those writes is 23,500ms. Divide that by the number of requests (5020), and the average write time is 4.68ms, close to the number reported in the AWR report. The trouble is that 85% of the time is spent in only 0.4% of the calls; the sheer mass of fast writes drags the average down and hides the outliers (even very rapid AWR sampling would not help here). That's the first part of the problem.
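Running the arithmetic above as a quick check (same figures as in the bullets):

```python
# The write-time distribution from the bullets above: the mass of fast
# writes pulls the average down and hides the twenty 1000ms outliers.
writes = [0.5] * 3000 + [1000.0] * 20 + [1.0] * 2000

total = sum(writes)                     # 23,500 ms
avg = total / len(writes)               # ~4.68 ms, as in the AWR report
outlier_share = (20 * 1000.0) / total   # ~0.85: 85% of time in 0.4% of calls
print(round(avg, 2), round(outlier_share * 100, 1))
```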
The second part of the problem is what happens when one of these outlier 1000ms I/O times takes place. We can see that the average (!) execute rate was 342/s, and let's assume that all of these executions were DML that had a commit after them. We could use the Transactions rate, but that will also include a bunch of rollbacks, and I happen to know from the developer that each execute is a DML operation. So, we have an arrival rate of 342 commits per second and occasional peaks of I/O time of 1000ms. As soon as one of these spikes occurs, 342 sessions will start to accumulate a portion of that 1000ms (depending upon where in that period they issued the commit). If we assume that they arrive evenly spread out across that 1000ms, we can use the area of a triangle to calculate the total time clocked up to log file sync during that period:
- Area of right-angled triangle = 1/2 x base x height
- Base = duration of the slow write = 1000ms
- Height = number of sessions waiting, maximum 342
- Total 'log file sync' time = 1/2 x 1000ms x 342 = 171,000ms
So, the total wait time attributed to log file parallel write is only 1000ms during this period, but the total wait time attributed to log file sync is 171,000ms. You see what I did there? I concealed the calculation of an average into something called 'area of a triangle'. Shh, don't tell anyone.
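The same sleight of hand as a snippet, to make the units explicit (the result is session-milliseconds of 'log file sync' time accumulated during a single slow write):

```python
# The triangle shortcut: 342 sessions arriving evenly across one slow
# 1000ms write each wait for whatever portion of the write remains.
base_ms = 1000.0        # duration of the slow write
sessions = 342          # commits arriving during that write
total_sync_ms = 0.5 * base_ms * sessions   # area of the triangle
print(total_sync_ms)    # 171000.0 session-ms of 'log file sync'
```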
When this slow write is over, the sessions will once again be waiting on fast writes - perhaps one or two sessions at a time waiting for sub-millisecond writes - which means the average is not dragged down nearly as far as it is for the 'log file parallel write' event. Let's apply the same theoretical write-time model to the 'log file sync' waits to see:
- 3000 single log file sync waits at 0.5ms = 1500ms
- 20 periods of log file sync madness described above = 3,420,000 ms
- 2000 single log file sync waits at 1ms = 2000ms
Total number of waits here is 5000 + 6,840 (that is, 20 x 342) = 11,840, compared to 5,020 for 'log file parallel write'. The total wait time, however, is 3,423,500ms, making an average wait time of 289ms which would be reported by AWR for 'log file sync', compared to the 4.68ms 'log file parallel write' time. That's somewhat more exaggerated than the actual figures in the AWR report, but it highlights the issue quite well.
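Recomputing the whole model in one place (note that 2000 waits at 1ms contribute 2,000ms, giving a grand total of 3,423,500ms; the average still comes out at about 289ms):

```python
# The full model: wait time accounted per session ('log file sync')
# rather than per LGWR write ('log file parallel write').
fast_waits_ms = 3000 * 0.5 + 2000 * 1.0   # sessions riding the fast writes
slow_waits_ms = 20 * (0.5 * 1000 * 342)   # twenty 'triangle' episodes
n_waits = 3000 + 2000 + 20 * 342          # 11,840 'log file sync' waits

avg_sync = (fast_waits_ms + slow_waits_ms) / n_waits
print(round(avg_sync, 1))                 # ~289 ms, vs 4.68 ms per write
```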
AWR is a useful tool, but issues like this are masked by its averages. Use ASH or session-level tracing for these kinds of problems, and the answer will become clear!