TOC PREV NEXT INDEX

Scale Abilities Ltd Logo


Part V
Implementing Oracle
Chapter 9
Scalable Transaction Processing
9.1 Scalable Applications
This chapter demonstrates techniques for gaining maximum scalability from a single system image. This should be considered an essential prerequisite before considering Parallel Server. If the system does not perform and scale adequately as a single entity, attempting to accept Parallel Server will provide even worse scalability.
9.1.1 Application Design
The application design is the single most efficient area to optimize. Unfortunately, database engineers are often left to tune the database into efficient operation without input into the application. This is like trying to tune a lawn mower into a racing car. If the fundamental design of the application is flawed, little can be done on the server side to make the application scale and perform adequately. This is why the first portion of this chapter is dedicated to things that the database server engineer can do to point the application developers in the right direction.
Maintaining Concurrency
When an application is being designed for a large user population, it is important that the system be built to provide maximum concurrency. In order to satisfy this requirement, the application developer needs to develop the application using many of the same rules employed by the developer of an operating system. One of the most vital things that the application developer needs to do is to eliminate data-based contention in the application.
Data-based contention occurs when many users of the application are competing to write to the same physical records. Oracle must maintain the data integrity of the database, and so locking is utilized to prevent concurrent writes to the same piece of data.
Oracle employs row-level locking as the default locking mechanism for maintaining data integrity. This means essentially that when a write is to occur to a particular row in the database, the database automatically locks only that row. The rest of the rows in the table, and indeed in the data block, will be accessible to other sessions for read and write. This is a major plus for the database engineer, because it means that false lock conflicts do not occur in Oracle. However, the system is still not immune to problems if locks are allowed to persist, and this should be carefully explained to the application developer. In an order-processing system, for example, the application might be designed to work as follows.
To ensure that an order is taken in its entirety, without encountering out-of-stock problems at commit time, the designer of the application decides that the inventory required for the order should be decremented as the order is taken. The entire order can then be committed as a unit at the end of the order, in the safe knowledge that all the inventory is available.
This design would work fine as a single-user system, but this really defeats the object. The problem with this design is that the first user to take an order for a particular item will hold a lock on the inventory for that item, and until the order is complete, this lock will be held and no other users will be able to use this item in their orders.
There are at least two more scalable ways to implement this system. The first way involves not doing anything until the very end of the transaction, leaving all locking operations until right before the commit. This way, locks are held only for short periods, and so lock collisions are prevented. Unfortunately, this design is also defective, because this design can easily result in deadlocks (more on this later in this section).
The second option is to update and commit each line item as it is entered. This also holds locks for very short periods but is not subject to deadlock because only one row is locked at any given time. However, if the session is terminated halfway through the order entry process, the inventory is corrupted because the stock is decremented without the order being completed. With a small change, this design can be made to work.
If the application creates a new order record at the start of the order, immediately followed by a commit, then there is a master record for all work that follows. When the line item is updated to reflect the decrement, a record can be inserted into an ORDER_HISTORY table at the same time, referencing the order number. As an INSERT cannot be blocked by other locks, this cannot result in a deadlock, and the order is committed as it goes along. If the session is terminated, the entry process can be continued at a later stage by querying the order record on screen and continuing the order.
With this design, all locks are held very briefly (before being committed) and yet the atomicity of the transaction is maintained. In addition, the transaction is completely restartable.
Restartability is very important in building any large system, for both online processing and batch processing. In the case of online processing, things must be restartable, or at least recoverable from a business transaction perspective, in order to lessen the impact of the inevitable system or network failures. In the case of the order entry application, it would not be acceptable for the stock to be unaccounted for in the event of a failure, nor is it practical to have developers manually patch the data after every such incident.
In the case of batch processing, all batch jobs need to be written under the assumption that they will be killed off part way through. This sounds severe, but batch processing has a nasty habit of interfering with the response time of online sessions, and it is likely that there will be occasions of batch overrun that necessitate the killing of such jobs. If the job is not restartable, it will need to be started from the beginning the next evening, and will be even less likely to clear the backlog plus the new transactions.
In the first design presented, the design was deemed to be prone to deadlocks. Deadlocks present a real problem in database systems, and thus systems should be designed around them as much as possible. In the example, we could have two order-processing users entering the following order:
User 1 User 2
1 x Foamy Soap 6 x Green Sponges
3 x Green Sponges 3 x Foamy Soap

If both of these sessions were to attempt their commits simultaneously, each would manage to update and lock the first line item in its order. When each session got to its next line item, the other user would already have this record locked, and a deadlock would be evident. There would be no way out of this situation, other than for one of the sessions to rollback and reattempt the operation after a backoff period. Oracle would intervene at this point and carry this out against one of the users, at which point an error would be reported to the session: ORA-00060: deadlock detected while waiting for resource.
The creation of a nonblocking application is the responsibility of the application designer, not the database engineer, and especially not the user. After all, if a user is able to provoke deadlocks and general lock contention, then the user will do so all day long.
9.1.2 Scalable SQL
Many texts refer to efficient SQL, but the real goal in building very large systems is to produce scalable SQL. The term "scalable" can be interpreted in many different ways when referring to SQL statements, and we will cover each in turn.
Poor Efficiency
Inefficient SQL is bad for everyone. It not only causes poor response times for the user that submits the call, but also affects all other users as a result of one or more of the following:
· Overuse of processor resources
· Overuse of disk resources
· Abuse of the buffer cache
SQL statements can be classified as inefficient for a variety of different reasons, the impact of which is dependent on the particular blend of inefficiency that the statements provoke. For example, a full table scan requires a great deal of CPU in addition to the disk bandwidth it consumes; a long index range scan pollutes the buffer cache and causes a great deal of scattered I/O.
Most new applications should now be using the cost-based optimizer (CBO), which is required in order to use the new execution plans and new features such as partitioned tables effectively. All discussions in this book assume the use of the CBO, not the rule-based optimizer.
The ultimate goal for data retrieval in a TP system is to go straight to the location of the single required row and read it in; for now, we are not concerned about whether the record is cached or not. In most cases, this kind of lookup is not possible, because some kind of overhead is required to locate the row initially. If we assumed that B-tree indexes were this lookup mechanism, the number of Oracle blocks we would expect to need in order to read the data would be from two to five, depending on the depth of the B-tree.
If all queries in the system could be written this way, life would be good. Unfortunately, it is a rare day that queries can be written this efficiently. In practice, the act of normalizing the data model implies that several fetches could be necessary in order to retrieve the single item of data required, probably in the form of a multitable join. In addition, user-friendly applications require that the user be able to search using wildcards to some degree and that certain data applies only to specific ranges of dates. All of these requirements push up the cost of data retrieval significantly.
Oracle 8.1 makes a significant improvement in the usability of the CBO in a development environment by providing static plans and user-definable plans for the CBO to use. This allows a small development database to be set up to provide the same execution plans from the CBO as from the production database. Before this innovation, developing SQL for a CBO environment was difficult to do in a reliable way.
One thing that should be considered essential when writing SQL for the CBO, whether tested against a stable plan environment or not, is the use of optimizer hints. It is almost always true that the developer knows a great deal more about the data than the CBO is able to, and the use of hints takes some of the gamble out of the CBO. The specific gamble in question is that of the CBO electing to use a different execution plan than expected, owing to a change in one or more of the statistics. The CBO is more than capable of sabotaging a query that was previously not a problem, particularly in very dynamic environments, such as a system that has recently gone into production. New systems are especially prone to erratically changing plans as a result of the rapid proportional growth of some of the tables in the system, compared with the tables it joins against.
Using a simple query against V$SQLAREA, offending SQL can easily be identified:
col gets_per_exe form 9999999
SELECT hash_value,buffer_gets/executions gets_per_exe
FROM V$SQLAREA
WHERE executions>0
AND buffer_gets/executions>10
ORDER BY 2 desc
/
HASH_VALUE GETS_PER_EXE
---------- ------------
3359911567 2865
4279157786 1188
607327990 121
32929334 102
3013728279 56
2303392424 51
3565234031 37
1425443843 34
3382451116 34
731738013 32
2407357363 26
1592329314 24
3441224864 20
1714733582 19
1428100621 15
2918884618 14
4261939565 14
3127019320 13
1737259834 12
365454555 11
20 rows selected.
SQL>

Included in the V$SQLAREA view is the MODULE column. This column should be populated using the DBMS_UTILITY.SET_APPLICATION_INFO procedure by every module in the application. This is often the only way to determine the source of an SQL statement. Once this column is populated, the query can be confined to online modules, because batch jobs will always dominate the list if this is not the case.
When the priority for fixing these statements is being determined, consideration should be given to the execution rate. If the query that uses 102 buffer gets per call is executed 10,000 times more than the query that uses 2,865 gets, it is clearly a higher priority.
One of the best ways to tune a statement is to put a trace on a session that calls it, using the method detailed under "Tracing Sessions by Setting Events" in Section 3.5.3. Once a tracefile that contains the query has been obtained, a simple test case can be put together:
PARSING IN CURSOR #1 len=38 dep=0 uid=22 oct=3 lid=22 tim=0 hv=3024638592 ad='83842e20'
select * from sys.dual where 1234=:b0
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=0,tim=0
BINDS #1:
bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0 size=24 offset=0
bfp=4014b328 bln=22 avl=03 flg=05
value=1234
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1650815232 p2=1 p3=0
FETCH #1:c=0,e=0,p=0,cr=1,cu=4,mis=0,r=1,dep=0,og=4,tim=0

The query can be pulled out of the tracefile and put into a script, using bind variables as in the original query.1 The tracefile can also be checked for the number of buffer gets it is expected to need for those particular bind combinations. This is determined by checking the cr= token in the FETCH line. A reproducible test case can now be constructed for this query:
variable b0 number
begin
:b0:=1234;
end;
/
alter session set timed_statistics=true;
alter session set events '10046 trace name context forever, level 12';
select * from sys.dual where 1234=:b0;
exit

This test case should produce exactly the same result as the original query in the application.
When looking at high buffer gets, consider the following potential causes:
· Data skew in the underlying table/index
· Index column ordering
· Cost of joins
The first two items are related, in that they both concern cardinality problems in the table. It is important to keep indexes as selective as possible when using B-tree indexes.
The cost of the join can make or break a query, and it is occasionally worthwhile to denormalize the model a little for specific performance cases. For example, if an intermediate table is used, would the addition of another column make driving off this table more efficient?
One common misconception among SQL programmers is that it is good to try to do everything in one statement. There are some cases in which this is correct, but more often it is a very bad thing for performance. If the data is known, it is often better to break complex correlated subquery statements into two cursors and glue them together with procedural logic. Unfortunately for the programmer, this often means more typing in order to set up the variables required for coding.
Another thing to look out for is the number of records being returned by a query. For most kinds of online usage, it is not useful to return a large number of rows for the user to look through. If anything, this normally points to poor user-friendliness on the part of the application. Lists of values are best kept to a small number of rows, where "small" means a maximum of about 30 records. Obviously, there are cases in which this is inappropriate or impossible, but these situations should be exceptions to the general rule.
Large data sets can be spotted by once again using the V$SQLAREA view. With the same type of query that was used to obtain the number of buffer gets, we can obtain the average number of rows returned by a query:
col rows_per_exe form 9999999
SELECT hash_value,rows_processed/executions rows_per_exe
FROM V$SQLAREA
WHERE executions>0
AND rows_processed/executions>1500
ORDER BY 2 desc
/
HASH_VALUE ROWS_PER_EXE
---------- ------------
603042213 30651
3490405195 12721
3189300438 9600
647382473 6840
3399070700 4962
893287616 3807
327344171 3529
3137157321 2668
2437259511 2666
2056336900 2513
1767743086 2415
1623323819 2317
3691742970 1853
3575556459 1749
88760771 1692
1074455672 1520
980033159 1520
17 rows selected.
SQL>

In this example, the average number of rows per execution was restricted to 1,500 in order to reduce the list returned. It can be seen that some of these queries are returning well over a reasonable number of rows. Once the MODULE filter has been applied, this list can be worked on in order to ensure that reasonable requests are being made to the database.
Avoiding High-Contention SQL
In addition to making SQL "reasonable" and efficient, other factors need to be considered in order to ensure that the SQL does not create excessive contention in the database. These factors fall into two main areas:
1. Write-read contention: reading sessions affected by writers
2. Read-read contention: sessions competing to pin the same physical block
When a heavily read table is updated, the blocks are subject to cleanout at the next read, as discussed in Section 5.5.3. This can cause big slowdowns for the next reader of the block. The following tkprof output shows the cost of a given query for which no block cleanout is required:
select sum(MONEY)
from
MY_CASH where CASH_ID between 10000200 and 90004611
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.78 1.17 417 420 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.79 1.18 417 420 0 1

The important fields to look at are the cpu and elapsed on Fetch. Compare them with the same query that needs to perform block cleanout on each block as it passes through:
select sum(MONEY)
from
MY_CASH where CASH_ID between 10000200 and 90004611
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.01 0 0 0 0
Fetch 2 1.18 3.02 417 838 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.18 3.03 417 838 0 1

The block cleanout is clearly expensive, even in this small example. The implication of this is that a relatively fast update performed during online operation of the database can affect the response times of queries for a long time afterwards. In addition, if many other sessions are also trying to read the data, they will queue up waiting on the event "buffer busy waits"-that is, waiting for the cleanout to be performed by the first session that accesses the block.
In addition to contention between writers and readers, contention between two readers can also occur. Although this occurs only at significantly higher read rates than write-read contention, it can still present response time problems if not managed adequately. Consider the following query:
SELECT /*+ INDEX (test_tab,ti1) */ b from test_tab where a in ('AA','AB'
,'AC','AD','AE','AF','AG','AH','AI','AJ','AK','AL','AM','AN','AO','AP','AQ','AR'
,'AS','AT','AU','AV','AW','AX','AY','AZ','A[','A\','A]','A^','A_','A`','Aa','Ab'
,'Ac')
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.03 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.02 0.01 0 71 0 35
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.04 0.04 0 71 0 35

In this query, the B-tree is traversed for each of the values specified in the IN clause. For this particular example, the BLEVEL2 of the index is 0, meaning that it is so small that there are no root or branch blocks-only a single leaf block. Therefore, there is one consistent read for each of the rows, and one for the table reference (plus an extra get as a result of a second fetch attempt by SQL*Plus). The conclusion, therefore, is that the B-tree must be traversed from root to leaf for every fetch from it.
While this does not present a large problem for this particular (small) query, it can present a problem in larger tables/indexes. For example, if an index has a BLEVEL of 2, this means that there is a root and a branch block above every leaf. Therefore, three blocks must be traversed to find any given leaf. The key problem is that every access has to go through the root block-a classic bottleneck.
Although we are reading only the root block, we need to take out a latch on the hash chain that the dba belongs to in order to pin the buffer. If several queries with long IN lists all hit this index at the same time, latch contention will result on the hash chain. This shows up as an increased number of misses and sleeps in v$latch and results in response time problems as Oracle serializes on one latch:
select hladdr "LATCH ADDRESS",
dbarfil "FILE#",
dbablk "BLOCK#",
gets,
lc.sleeps
from x$bh bh,
v$latch_children lc
where lc.addr=bh.hladdr
and state!=0 /* ie not FREE */
and sleeps>10000
order by sleeps desc
/
LATCH_AD RFILE# BLOCK# STAT GETS SLEEPS
-------- ---------- ---------- ---- ---------- ----------
A2F07930 311 26123 XCUR 90757897 103036
9C0A1080 157 7683 XCUR 373660716 95946
9C18A048 305 133251 XCUR 97899571 62771
9C18A048 304 476892 XCUR 97899571 62771
9C18A048 322 206671 XCUR 97899571 62771

This script pulls out all hash chain latches that have been slept on more than 10,000 times since start-up, identifying "hot" chains. Depending on the number of hash chains configured for the instance, one or more blocks will be reported in the output. Although it is likely that only one buffer is being contended for, all other dbas that hash to that chain will also be affected by the slow down. It can be seen from the output that the hash chain latch at address A2F07930 has been slept on about twice as many times as the latch at address 9C18A048, for the same number of gets. This indicates some kind of burst access for this buffer, such as large IN lists in a query.
If it is a root block that is contended for, the block number will correspond to the first block of the index segment after the segment header. Once a problem of this type has been identified, action can be taken to improve it. Such problems are discussed in Section 9.2.
Sharing SQL
One of the major changes in Oracle for release 7 was the addition of the shared pool and shared SQL areas. Prior to this, all SQL was parsed privately for each user who executed it. Shared SQL allows far greater throughput to be achieved on the database server, as a result of the reduction in work required to interpret the request. Full implementation of shared SQL should be considered essential for a large-scale application, because the cost of not doing so is very high-higher than it was prior to the introduction of the shared SQL facility.
Contrary to this need is the misuse of the dynamic SQL feature of Oracle. This feature is now present in all of the Oracle tools, including PL/SQL and Pro*C, and allows dynamic construction of SQL statements based on the flow of the code and the variables passed to it. Unfortunately, this very powerful facility is mostly abused as an easy way of constructing queries, resulting in unique pieces of SQL for every request. The proper use of dynamic SQL is to allow more structured program flow and to implement complex requests, not to embed literals in statements and to break the SQL sharing.
Dynamic SQL can still be employed in large-scale systems as long as its use is carefully managed. The important thing to remember when using dynamic SQL in a piece of code is to keep the number of potential permutations as low as possible. For example, consider a module to which are passed a variable number of parameters that are eventually needed as predicates in the where clause. Anywhere from one to 400 parameters could be passed to this module, all of which must be supplied to the query in the module. The temptation is to construct the where clause of the statement dynamically for every parameter supplied:
inputs : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... 399, 400
sprintf(preds,"WHERE xyz in (");
for (i=0;i<nparams;i++) {
sprintf( tmps, "%d,",inp[i]);
strcat(preds,tmps);
}
sprintf(preds+strlen(preds)-1,")");

Although easy to do, this is about the worst thing that can be done here, because it creates a totally unique statement for every execution. There are basically two levels of optimization that can be performed on this statement. The first simply uses the same code base as the original, but constructs the IN clause out of bind variables and assigns variables to those variables at the same time. While not ideal, this cuts down on the number of possibilities enormously, because any call with the same number of parameters will reuse the same cached version of the SQL statement.
A better way to do this is to define a finite number of cursors. The first cursor may allow for as many as ten parameters, the second up to 20, and so on. If a standard increment of ten parameters were maintained, this would result in a maximum of 40 cursors in this example, which would be very much more manageable.
In order to do this, the number of input parameters determines which query should be used according to the following formula:

Pro*C code for constructing a finite number of "buckets" of bind variables would look a little like this:
All the detail of setting up for dynamic SQL method 4 has been omitted, concentrating instead on the actual construction and binding of the statement. The trick in this particular example is to bind the unused placeholders as NULL. No value can equal NULL, even NULL (IS NULL is the only test that will identify a NULL value). Therefore, binding the unused variables to be NULL ensures that no data is returned for them.
The "bucket" that would be created for an input parameter count of 1 to 10 would look like this:
WHERE EMPNO IN (:b0,:b1,:b2,:b3,:b4,:b5,:b6,:b7,:b8,:b9)

The Cost of Binding
Limiting the number of permutations of a given statement is the only way that dynamic SQL can be adopted. In the very dynamic case of any number of inputs discussed above, this could also have been achieved by creating a single cursor with hundreds of bind variables. The unused variables could then be bound as NULL as in the bucket example above. However, this brings about another problem and should also be avoided.
There is a measurable cost associated with each bind operation. Each one results in work being performed on both the client side and the server side, and if the number of redundant (NULL bound) variables is high, an enormous amount of CPU can be wasted doing redundant work. To demonstrate this, I wrote a Pro*C program that executes a simple SELECT statement 500 times. The query looks like this:
SELECT one_column FROM my_table WHERE one_column IN ( :b0, :b1 );

Both of the bind variables were set to be NULL, and so no data was returned by this query. I also wrote another program that behaved exactly the same way, but bound 400 NULL variables instead of two. Table 9.1 shows the results when this program was run against a PA-8200-based database server.
The 400-bind version of the code took 333 times as much CPU to perform the same net work (returning no data). Some of this CPU will be used parsing the much larger SQL statement, and some will be spent performing the actual binding operations. At the end of the day, it doesn't matter where it is being spent-excessive binding should be avoided wherever possible.
Poorly Bound SQL
When writing SQL for an application, it is easy to fall into the trap of poorly bound SQL. This is not another reference to the use of bind variables, but to the careful definition of data sets within an SQL statement. A single query may have a minor footprint one day and a huge footprint the next day, all based on the data supplied to the statement. This falls into two main categories:
· Poor date definition
· Poor input enforcement
Poor Date Definition
Most transactional systems have an abundance of date-related rules. The dates affected by these rules can range from dates on which transactions are made to dates that determine the eligibility of certain records in the database. The implication of this is that there are also an abundance of queries in the system that are constructed as follows:
SELECT xyz FROM abc WHERE SYSDATE between START_DATE and END_DATE;

As soon as such a query is written, it is guaranteed to perform a range scan of some description on the index in the date columns. If the date ranges are understood by the developer, and the data is protected from skew through the application, this still does not present a huge problem. However, it is more common for date ranges to grow quickly out of control, particularly on a new, growing system. This results in increased I/O on the system, a polluted buffer cache, and wasted CPU.
In one such scenario, certain information is considered active only if the supplied date lies between the start and end dates for that information, such as seasonal types of products in a supermarket. A certain type of chocolate Easter egg, for example, is available for order by the store manager only between January and April of the current year. The query that retrieves available products excludes items that do not have a start_date in the past and an end_date in the future.
The problem with this type of data is that it rapidly becomes very nonselective because of the way in which it is administrated by the users of the system. In this example, it is likely that 99 percent of all the available products would be made active forever, making any index that leads on the start and end dates completely useless. The server has no choice but to range scan most of the index, going back to the table for more information when necessary.
It is important to prevent this kind of data skew if the application is to scale well: The more data that goes into a table, the more work it takes to retrieve data from that table. If there is another attribute of the query that is more selective, that attribute should be made the leading edge of the index. Otherwise, it is likely that a full table scan is a better alternative than the huge index range scan. However, this is typically not a good solution either, and a design change needs to be investigated.
In the case of this example, it may be more practical to have a B-tree index on an "inactive flag," which is updated periodically by a batch routine. The flag would have a value only if the product was inactive for the current period, therefore keeping the number of keys in the index low. More typically, however, it is not just SYSDATE that is compared against the start and end dates, but rather some arbitrary supplied date. In this case, the adoption of an inactivity flag would not provide the required result.
Poor Input Enforcement
The most common problem in many large systems is the lack of enforcement of controls on user input. If user input is not carefully controlled, the users have the power to wreak havoc on the system with very large, unbounded queries.
The classic unbounded query is a name search. Names are typically indexed columns and are normally quite selective apart from well-known exceptions. In a search for a customer profile, for example, the provision of a few last name characters and the first initial dramatically reduces the number of keys that need to be scanned in the index. If users are allowed to be unreasonable here, they will be unreasonable here-they simply don't know any better. Good practice in the name example is to require three characters of the last name before a wildcard can be added. Names of fewer than three characters don't really need the wildcard functionality to speed the user along.
I encountered an interesting case study in a furniture store one day. A price ticket was not visible for a certain table, and so I asked an employee to look it up on the computer. Unfortunately, the name of the table began with an accented character, and the employee was supplied with a U.S. keyboard that had no accented characters. I noticed that the application was Oracle, with the familiar face of SQL*Forms showing itself on the terminal.
The employee proceeded to input the only string he could in order to find the item: "%bo". Of course, while the server was grinding its way through the entire B-tree index on the product name, the employee was whining and complaining to me that the computer system was broken and that he wished they would get a new one. This is a classic problem-in this instance complicated by the need for a special European keyboard on the terminal, combined with some application field enforcement.
9.1.3 Transaction Processing Monitors
Traditional client/server applications are known as two-tier applications. Of the two tiers, the lower tier is the database, and the top tier is the application program (see Figure 9.1).
In this example, Oracle8i and SQL*Forms 4.5 have been used, in a classic two-tier pairing. In the upper tier, SQL*Forms has all of the logic to display the forms to the users, to implement the business logic, and to access the database. In the lower tier, Oracle8i consists of the database itself and a transaction manager.
The term "transaction manager" is defined by The Open Group3 as a piece of software that coordinates a transaction and enforces its atomicity. In formal terms, a transaction manager coordinates the transaction across all the involved resource managers, which is The Open Group's name for any shared, consistent resource. The resource manager in this case is the actual Oracle database.
In a single-database environment, the transaction manager used is the Oracle Server transaction manager, and the resource manager is the physical database. All the atomicity required for a two-tier transaction is managed within the Oracle Server, with the application logic issuing commit calls directly to the database.
The upper tier in this example contains the Forms application, which in itself consists of two logically different components: business logic and presentation logic. The same physical executable is responsible for displaying the forms to the user, following the business logic, and performing data access to the database.
This type of architecture has several problems:
· Every user of the system needs a direct database connection, wasting valuable resources on the database server.
· The same developers are responsible for both the presentation logic and the business logic.
· There is operational anarchy: No operational control over transaction flow is possible, meaning that any user can execute any part of the application at any time.
· Reuse of common business logic is difficult because of the lack of a defined interface.
· Interfacing to any other software is difficult.
These disadvantages offset the benefits of this solution: rapid development times, relatively uncoordinated development, and use of 4GL development tools. These problems are also difficult to avoid within a two-tier architecture. The way to solve them is to adopt a transaction processing monitor, creating a multitier architecture.
What Is a Transaction Processing Monitor?
A transaction processing monitor, or TP monitor, is often described as middleware. This means that it is software that resides between other software components. In the case of a TP monitor, this middle position lies between the presentation logic and the business logic, as shown in Figure 9.2.
In its simplest form, a TP monitor provides the following services:
· Standard, shared interfaces between software modules
· Communication and queuing facilities
The TP monitor provides a platform on which the presentation layer of an application can be created in complete isolation from the underlying business logic by providing these facilities. The TP monitor effectively provides the "glue" that holds the presentation layer and the business logic together.
In a TP monitor environment, things get renamed a little. The presentation software is called the client. The business logic exists in discrete programs called servers. These servers are advertised to the client as services.
The model provided by the TP monitor is that of a service provider and a client. The TP monitor provides all the communications and interfacing between the client and the servers, and takes care of advertising services to the clients.
For example, assume that the three-tier application in Figure 9.2 is the model used for a banking application. A Pro*C or OCI server could be created within the middle tier that provides the account balance for a given account number. This server is advertised by the TP monitor as an available service, and client connections can call on this service when required. This is achieved using a standard call from the TP monitor's libraries, passing the account number as a parameter for the server. The actual language, physical location, presentation style, and other features of the client requesting the information are totally irrelevant. The banking application architecture may look a little like the arrangement shown in Figure 9.3.
There are two completely different types of clients in this architecture. The first is the automated teller network, which may consist of a variety of different presentation types and operating environments across the country. The second is the cashier within the bank, running the full suite of banking services. There are three servers available in this example: a debit server that debits an account, a credit server that credits an account, and a balance server that returns the current balance of the account.
The clients connect to the TP monitor only; there is no direct connection to the database. When the clients connect to the TP monitor, the advertised services are made available to them. The clients then send requests to the servers, in order for the server to perform the work on their behalf.
A request may be synchronous or asynchronous, depending on the requirement. If the client does not need to wait for completion of the requested task, an asynchronous call is appropriate. The server may send a response back to the client or may just complete the task. Again, this depends on the function of the server; the debit server may not send back any information, whereas the account balance server must do so.
The TP monitor also provides queuing services for the application. If the debit function is not required to take immediate effect, for example, the requests to that server may pass through a queue before being processed by the actual server. This queue can be opened or closed by the application administrator, thus controlling the throughput of the system within the application tier.
The use of request queuing is an important attribute of the TP monitor from the database server perspective. If the server system is capable of only a known number of transactions per second, the flow of requests to the database can be restricted to this number of transactions by restricting the number of available servers, keeping outstanding requests in a serial queue on the application server. This means that the waits are passive in the application tier, rather than spinning in an active wait on the database server, trying to get resource that is not available.
Further gains are made on the database server as a result of the reduced connection count on the database server. The reason the number of connections is reduced is that there are no longer any user connections into the database. Only the application servers connect to the database, thus reducing the number of required connections. The number of the application server processes required to perform the work requested by all the clients may be as few as one-tenth the number of clients, owing to the amount of "think time" that occurs in OLTP applications. The reduction of connections to the database means reductions in memory usage, scheduling overhead, context switching, and file descriptors on the server.
When an application is distributed over a wide area, a TP monitor becomes important for another reason: its communication model. When a user is a long distance from the database server, latencies are induced by the sheer distance the network covers. If the application were two-tier, and the user were 10,000 miles from the database server, these latencies would be very serious and certainly noticeable to the end user.
If the application were character-based, this would result in a delay between the keystrokes typed on the user's dumb terminal and the echo sent back by the UNIX client in the datacenter. If the application were running nearer to the user, such as on a local PC, the delays would be incurred through the use of Net8 over a wide area. Owing to the highly conversational nature of SQL*Net, several round trips can be required for each and every SQL statement sent to the database, multiplying the network latencies accordingly.
A TP monitor gets around these problems somewhat by shifting the communication model into a request/response mode. This means that the client needs only to send a single request and receive a single response for a given function. This imposes only one set of latencies over the wide area, as opposed to a large number if the communication were like Net8 or terminal character echo.
In addition to providing these benefits, TP monitors allow seamless integration of homogeneous and heterogeneous systems. An application can access a UNIX/Oracle system and a mainframe system within the same transaction, transparently to the user. This capability is provided by the TP monitor abstraction from the actual resource manager; the presentation layer is not concerned with where the data resides. The TP monitor can also facilitate intersystem data bridges and interfaces to third-party services such as computer-telephone integration (CTI) through the same abstraction.
Optionally, a TP monitor can take on the role of transaction manager, communicating with Oracle over the standard XA interface. This becomes particularly important when multiple middleware servers are associated with a single transaction, or when multiple resource managers (database) are involved in the transaction. This moves the responsibility for ensuring the atomicity of the transaction to the TP monitor, which must communicate with all of the systems involved using a two-phase commit protocol. This kind of integration comes at a price, however, and should be used only where necessary, such as in our banking application (see Figure 9.4).
This more closely resembles the way in which a real banking application would need to be architected. Moving the transaction manager into the middle tier is the only way that interaccount funds transfers, for example, could be implemented. The two servers involved in such a transaction, the debit and credit servers, are performing components of an atomic transaction, and therefore must either both succeed or both fail.
Alternatively, a "transfer server" could be created, performing both the debit and credit operations within a standard Oracle transaction. This would work well also, except when multiple databases were thrown into the mix. If, for example, there were a savings account on one database and a checking account on another, the transfer transaction would have to be coodinated between two servers. In this instance, there would be no alternative except locating the transaction management in the middle tier.
In summary, a TP monitor provides the following benefits for a large-scale application:
· There is lower server resource usage owing to reduced database server connections.
· The application developer can be focused on business logic or presentation logic, depending on skillset and assignment.
· The queuing facility allows "throttling" of work requests to prevent thrashing on the server.
· Common business logic is easily reused by different modules owing to the standard interfaces.
· Bridges and integration of other products can be developed using the same interfaces as those used in the rest of the application.
· Distributed applications can be built across heterogeneous and homogenous systems.
The deployment of a TP monitor also carries several disadvantages:
· An all-or-nothing approach of decoupling all users from the database (and connecting them into the TP monitor) is required to gain the most benefit.
· Application servers are generally developed in a language such as C, rather than using one of the rapid development tools.
· Skills in the development of multitier applications are not as prevalent as those used in two-tier applications.
· Software licences are required for the TP monitor software.
These negatives are easily overshadowed by the positives of using multitier technology for very large applications. For UNIX systems, there are few choices in TP monitor selection, with BEA Tuxedo being far and away the most popular. Tuxedo offers all the functionality associated with a TP monitor and runs on a variety of UNIX platforms.
Object Request Brokers
The latest development in multitier applications is the object request broker, or ORB. One of the common standards for an ORB is the Common Object Request Broker Architecture (CORBA) specification set out by The Object Management Group (OMG). The ORB presents a combination of the concepts of the TP monitor and object-oriented programming, providing a framework for distributed objects.
Distributed objects allow fully encapsulated software modules to be written, with the ORB acting as the glue that holds all the objects together. If the ORB conforms to CORBA 2.0 standard or greater, it can also provide transaction management services and security, thus providing an enhanced TP monitor environment. This new environment can be built on heterogeneous systems and in virtually any language that the ORB can interface with through the Interface Definition Language (IDL).
IDL is a language-neutral interface language, used to interface the software components to the ORB. Using IDL, stub code can be produced to allow for compile-time dependency checks and to allow software components to be produced in isolation from the complexities of the ORB and the remote procedure call (RPC) mechanisms associated with it.
Much of the actual technology supporting ORB operation is still in its infancy, though many of the industry's largest independent software vendor (ISVs) are adopting this strategy for the future. Most notably, Oracle8i provides significant ORB support. Of course, care should be taken when adopting an ORB as the middleware architecture for transactionally intensive systems, because the supporting software is relatively new. When software is as new as this, it takes time before it is ready for heavy loading.
9.1.4 Purge
A database should never be confused with a data dump. In OLTP-based systems, this means that the database must reach a level of equilibrium in its size; data must be deleted at the same rate at which it is inserted.
In a typical transactional database, there are two main types of data:
1. Reference data
2. Transactional data
The reference data is not normally subject to purging, because it reaches a static size early in the life of the application. The transactional data, on the other hand, grows every day the system is used and can become unmanageable very quickly. A banking application, for example, inserts many rows into each account, which are mostly read only once-when the monthly statement is produced. In this way, transactional tables can be viewed as insert-mostly tables.
After a period of time, this data no longer needs to reside in the production database and should be removed in order to maintain a static database size. The period of time and the rules for deletion have to be defined by the business, because there are likely to be operational and legal reasons driving the rules. The actual deletion of the data should be performed as a scheduled batch process, using deletion code that has been carefully developed.
The reasons for performing a thorough and regular purge are numerous. First, there are the date-range driven queries, as described in Section 9.1.2. As the database grows larger, these queries will become more resource-intensive as a result of the increased date range in the database transactional tables. Often this data is scanned by badly written queries even though the business is no longer concerned with it.
Second, there is the clear problem of available disk space on the database server. Things cannot be allowed to grow unconstrained, because this would result in an ever-increasing amount of disk connected to the system. In addition to this, the database must be backed up on a regular basis, and this becomes more difficult to achieve when the database is larger than the backup subsystem can deal with in the backup window. At this point, the backup needs to be spread out over several nights, complicating the recovery process in the event of a failure.
Lastly, the batch window will grow in direct proportion to the database size. This occurs because batch jobs that operate on transactional data are likely to perform a full table scan of the table in question, which would be the optimal access path for a batch job that processes a large proportion of the table. If the table were to grow without purge, the proportion of records that the batch job should consider would become slight in relation to the number of old records, thus slowing down the batch routine.
Rapid Deletion Techniques
Deletion of data often comes in tandem with insertion of data elsewhere; it is rare that data can simply be removed without storing it elsewhere first. New features in Oracle8 and Oracle8.1 have helped in both of these areas, making purge a good deal easier to implement today than in prior releases.
The first of these features is the use of partitioned objects. Partitioning a table sometimes allows the easiest method of deletion of all-the ability to drop a whole range of data from the database. This is not always available, because it depends on whether your partition key is also your purge key (i.e., date stamp) and also on whether all the indexes built on the table are partition-local (that is, no index covers more than one partition).
The ability to remove a partition can be combined with the new capability of Oracle8.1 to detach a tablespace and reattach it to another database. This allows very rapid relocation of purged data into another database for further processing. The procedure for doing this is as follows:
1. "Exchange" the designated partition with a standard table.
2. Set the tablespace to be READ ONLY.
3. Export the metadata from the catalog using
exp userid=<user/pass> transport_tablespace=y tablespaces=<tablespace>

4. Copy the tablespace datafile(s) to the target database.
5. Import the tablespace into the target using
imp userid=<user/pass> transport_tablespace=y datafiles='<tablespace datafiles>'

6. Check that all is OK.
7. Drop tablespace in production, including contents.
8. Manipulate imported tablespace in target database.
The impact on production response times for this method is close to zero, because no deletes or inserts occur. Of course, this method requires the purge criteria to be based on the same key as the partition key, because nonpartitioned columns could be spread across many partitions.
While it is often impractical to purge by partition key, an alternative is possible when the purge criteria are strictly timestamp-based and the partition key is a system-generated one, such as a sequence number. Transactional tables often meet these criteria, because the data is inserted at one end and is used less and less as it gets older. In this case, a correlation can be made between the age of the row and the partition key, assuming that the partitioning is based on the primary key. For example, although an ORDERS table may be partitioned on its primary key of ORDER_NUM, the value of ORDER_NUM will only ever increase over time. Therefore, a preliminary query can be executed to determine the corresponding key value for a given timestamp, and the purge can be executed against that derived value. This allows the partition relocation method of purging to be used.
If a partition-based purge is not possible, the only alternative is to execute deletes against the table, having previously extracted that data for insertion into the reporting environment. This is never an easy operation, because the delete process requires several resources:
· CPU
· Locks
· Undo allocation and CR blocks
· Block cleanout
Of these, CPU is the least problematic. The other three resources directly affect the operation of online sessions and batch processing, and so must be carefully managed. The first of these is the required allocation of locks to perform the delete. Normally, this should not be a huge problem either, because the data to be purged should not be getting actively updated anyway. The only legitimate activity on these rows would be reads from full table scans. However, if there are badly written batch routines out there that lock irrelevant data, they will interfere with the purge routines, and vice versa.
There are two attributes to be concerned with for the undo allocation. First, there is the requirement to have sufficient rollback segment space to allow a rollback of the entire operation. In the case of a delete, the rollback segment must include sufficient information to rebuild the entire row, should a rollback be necessary. This means that large delete operations require a large amount of rollback space with which to work.
The second attribute of undo allocation is the provision of consistent read views to other sessions. Though the rows are deleted using row-level locks, the block itself is managed by Oracle at a full block granularity. This means that any access to that block must go through the CR mechanism, regardless of whether or not the row being read is deleted. In addition, until the commit of the delete session, all other sessions connected to the database are able to read the data, thus requiring block reconstruction from the rollback segment.
If other long-running queries take place on the table being purged, all the undo information for the delete must remain for the duration of that query, even across commits by the delete session. If there is insufficient space for this, the long-running query will eventually get a "snapshot too old" error and terminate. Likewise, any updates or deletes of data in the table by online sessions must be retained in the rollback segment of the online session until the delete has completed its transaction.
Finally, there is the unavoidable issue of block cleanout. This will need to be done at the end of the purge session, whether explicitly by the purge routine or implicitly by unsuspecting users. Clearly the best solution is to perform a scan of the data explicitly after the purge, in order to perform the block cleanout operation. If CPU is available, this can be performed in parallel, speeding up the operation.
The actual delete process itself can be approached in three ways:
1. Serial delete session
2. Many delete sessions on the same data
3. Oracle parallel DML
If speed of delete is not an issue, the first option is best. This is the simplest option to develop and maintain, and so should be adopted where volumes permit. Where the volumes are too high for a serial delete, one of the parallel options can be used. The first parallel option is to use multiple delete sessions, each working on different portions of the same data set. This is fairly easy to develop but requires careful planning in order to keep the sessions in physically separate blocks. Even if the physical rows are kept distinct between sessions, multiple sessions that delete from the same block will result in unnecessary CR block creation, slowing down the process.
Probably the best option is to use Oracle's parallel DML functionality, which allows the purge code to be mostly written as the serial delete scenario while Oracle takes care of parallelizing the deletes in an efficient manner. In MPP architectures, this includes disk affinity-sending the delete streams to the node that is physically connected to the disk. In addition, the parallel delete slaves use a modified two-phase commit protocol to ensure that they effectively operate in the same atomic transaction. Block cleanout is still necessary after parallel DML and should be performed explicitly by the purge routines (i.e., perform a scan of the data).
9.2 Tuning Transaction Processing Systems
9.2.1 Goals for Tuning
When tuning a system, it is important to have clear goals set before you start. Without such goals, tuning effort will be misdirected, difficult to learn from, and hard to quantify. Perhaps most important is the learning factor; there are lessons to be learned in every tuning exercise, and it's important to be able to understand the impact of individual changes and come to a conclusion about how effective a specific tuning exercise can be.
The thing to do when planning a tuning exercise is to identify the "low-hanging fruit." Picking off the easy wins at the start allows more time for the more complex tuning along the line. The kinds of candidates for initial tuning are
· Code with high execution rate
· Resource-intensive code
The first of these is simple to determine, by querying the V$SQLAREA view for statements with a high execution count. The second is more difficult, as it is not practical to enable the timed_statistics parameter in a busy production database. Therefore, this kind of statement must be found by indirect means:
· High number of buffer gets per execution
· High number of disk reads per execution
· Very large statements
· Latch contention
· Lock collisions
You are looking for things that prevent the system from delivering good response times, either directly from the query execution time or indirectly by starving other sessions of system resources.
Once the targets have been identified, a careful evaluation of options needs to be made-that is, where the best fix for each problem lies.
9.2.2 Where to Tune
The effectiveness of tuning varies depending on what is being tuned. There is a hierarchy of impact (see Figure 9.5), where the item at the top of the hierarchy makes the largest difference and the item at the bottom makes the smallest difference (unless it is grossly wrong) .
This is an important concept to grasp and to sell to others. It is a common misconception of the uninformed that application and requirement problems can be fixed by tuning the system. This is not true, and it is important that this be understood by all. If the system is performing badly, all areas of the hierarchy must be checked in parallel. A good rule of thumb, however, is that the impact of tuning increases by an order of magnitude (a factor of 10) as you move up each stage of the hierarchy.
Requirement
This is the high-level definition of the system. The customer typically submits a wish list, and this list is then refined into a deliverable set of requirements. Most of the time, some requirements slip through that are not essential to the customer and are extremely detrimental to the performance of the database.
One common example is the infamous default query-by-example screen. In such a screen, the user is presented with several attributes that can be either filled in or left blank. If they are left blank, all combinations of that field will be returned. If the management of a hotel had a centralized front of house system, they might have a screen containing the fields ROOM_NO, DATE, and HOTEL_NAME, for example. The customer might have specified that any of these fields be left blank in order to return all the combinations, thinking that they might come in useful one day. This is a simple request but is completely unnecessary in reality. Why would a receptionist in one hotel need to query who was staying in every room 301 for 1-APR-1998 in every hotel?
These screens need to be carefully evaluated and to be revisited after the selectivity is assessed on real data. It is unusual for a user to prefer to keep such functionality in preference to good response time.
Another example of requirement tuning might be where asynchronous processing occurs automatically for user convenience, preempting the next request. In an order entry system, for instance, there may be functionality that goes out and retrieves line-item descriptions and pricing for products associated with the one last entered. The user enters "Electric Drill," and the system also gets the pricing and descriptions for various types of drill bits. As there are likely to be many types of drill bits, the majority of this preemptive processing is redundant-the customer will not be ordering all of them. Careful analysis needs to be performed in order to ensure that the automatic calls are commonly used by the user rather than passed over.
Design
When the requirement is defined, an application enters the design phase. This is the molding of all the pieces of the requirement into a cohesive application plan. Included in this phase is the database design, covering the logical model right through to the first draft of the physical implementation. Errors at this stage can have enormous knock on effects.
One frequent mistake is the "space-biased" normalization approach. Database purists typically lean hard on the side of full normalization of the data model. They have the honorable intention of reducing the data duplication, thereby reducing the risk of logical (application level) corruption and keeping the size of the database down.
A better approach is often the "time-biased" approach to normalization, in which normalization is pursued vigorously where it remains efficient to retrieve the data but relaxed somewhat where excessive table joins are likely to result in poor response times. Changes in the data model are almost impossible to implement after the application has been written, because the scope of the changes becomes too large. Therefore, it's vital to get this part of the design correct up front.
Application
Once the requirement has been defined, the application is written. Typically, a large application is written by many people of various programming beliefs, styles, and talents. With that mixture, it is inevitable that a proportion of poor quality SQL and logic makes its way through. Nonscalable SQL (as described in Section 9.1.2), unnecessary calls, and poor logical flow can result in the database server being asked to do far more work than is necessary for the required result.
This is the easiest part of the system to identify from the database view of the world; all the SQL statements are recorded nicely in the shared pool, along with a selection of their execution statistics. Ordered lists produced from the shared pool are a good way to begin this process, because this will quickly identify the worst of the bunch. Working through this list is a very effective way of prioritizing the work required for improving the system response time.
Tuning of the application can typically yield 25 percent to 50 percent of the system CPU, depending on the severity of the initial problem. This even includes applications that are fundamentally well written: If a single statement that accounts for 20 percent of the CPU on the server is improved by 50 percent in a tuning exercise, then 10 percent of the system CPU will be returned as a result (20% ¥ 50% = 10%).
Database
The database is a service provider-it receives requests and processes them as quickly as it is able to. If the request is unreasonable, it will take a long time to service, and other requests will suffer as a result. This is a simplistic definition of what a database does, but serves well for putting things into perspective. The tuning that can be done at the database level can be viewed as improving only what is already there. Unless the database was very badly set up at the outset, it is unlikely that very large gains can be made simply by tuning it.
This declaration does not cover database work that should occur at the design stage-such work is undeniably effective. Nor is it saying that magic cannot be worked from the database angle; it certainly can, in most cases. However, it cannot make a square wheel round, and the higher-level aspects of the system should be considered with a higher priority than the database tuning.
System
The system is one level further removed from the database. It is so far removed from being able to make a big tuning impact that it is often a binary answer: Is the system configured correctly or not? If the system is not configured correctly-for example, if it allows heavy paging or swapping to occur-this will result in poor performance. If it is configured adequately, probably only very small percentage improvements can be made overall.
9.2.3 Tuning Modes
Tuning can be approached in several ways. These ways, or modes, are associated with different goals but often enjoy a great deal of synergy. Reducing system utilization, for example, frequently results in improved system response time. There are, however, different approaches to reaching this goal.
For a transaction processing system, there are three main modes of tuning:
· Reducing the utilization of the system
· Improving response times for specific areas of the application
· Increasing the throughput of batch processing
Reducing System Utilization
Reducing the overall system utilization is a very broad goal. Depending on the particular resource that needs to be reduced, different strategies can be followed. For a large database system, now that memory is less of an issue,4 the resources in question are CPU and I/O.
For CPU resources, it should first be established whether the system is also experiencing high I/O loads. Even if the I/O system is keeping up with the load without straying into high service times, the system could well be struggling to service all the requests and associated interrupts. Therefore, a CPU issue may in fact be a side effect of an I/O issue.
Assuming that there is not a high I/O loading, what is the cache hit ratio in the buffer cache? If the hit ratio is very high (>98 percent), the system could be doing a great deal of work without the I/O system making this evident. In this case, a more detailed analysis of the application's use of the system needs to be performed. This analysis should include a discovery of which modules perform the highest number of buffer gets (assuming that the application uses SET_APPLICATION_INFO in order to identify itself to the database) as an indicator of where to look. If the application is using a TP monitor, which server connections use the most CPU on a minute-by-minute basis?
When the system is not overloaded by one particular module of the application, this can mean one of many things:
· The system is undersized (requirements problem).
· The application is in need of tuning across the board (design problem).
· There is resource contention (database or design problem).
· Processing is occurring in the wrong tier (design problem).
An undersized system will almost certainly not benefit sufficiently from any tuning of the system, database, application, or design. If the system is simply too small for the required workload, changes need to be made in the requirements definition, or else additional compute resource needs to be purchased.
If all the modules of an application appear to be loading the system heavily, this is often attributed to a design problem with the application. For example, if many queries in the system need to incorporate a subquery or multitable joins, it is likely that the data model will need to be looked at. If the model looks sound, it could be that the coding standards laid down for the application were insufficient or not followed correctly. This will result in much of the application doing "naughty" things to the database server. In addition, the cardinality of indexes needs to be carefully monitored as tables grow. This is covered under "Improving Response Times."
The management of resource contention can become a very CPU-intensive task on a system that experiences heavy contention. Statements that run very fast in quiet periods will run slowly while contention occurs. Specifically, active waits on locks (i.e., latches) will use CPU just to try to acquire the latch with the minimum of wait time. In these situations, the contended latch must be quickly identified and action must be taken to reduce the contention. Such action varies from latch to latch, but typically involves the following:
· Library cache latch contention. Reduce parse rate by keeping cursors open, sharing SQL more effectively.
· Cache buffer chains. Determine why particular chains are being accessed more than others, and remedy.
Both of these forms of contention also directly impact the response time of the query; they will be covered in "Improving Response Times."
Finally, multitier application architectures allow processing to occur in several places. Often, the database is used to perform processing that is not data-dependent, thereby consuming precious resource from the least scalable part of the system (the single database). Examples include
· Packages with large amounts of non-data-related logic
· Use of SYS.DUAL to submit various ad-hoc requests to the database, such as getting the date, splitting strings using SUBSTR, and so on
This kind of practice should be discouraged whenever possible. The database server is often viewed by application programmers as a mysterious powerhouse that will run their code at lightning speed. Apart from this rarely being true, they also need to understand that they have only a very small piece of the total capacity of the server. Therefore, as much processing as possible should be performed in the application tier, saving the database server for real, data-related requests.
Improving Response Times
This approach targets improvement in specific pieces of code in the application. The improvement can come from changes in the SQL or some kind of physical intervention from the database end. As this is not an SQL tuning book, we will concentrate on database-centric modifications that improve statement response time.
Indexing
The most fundamental aspect to consider when trying to improve response time is the indexing strategy for the associated tables. Indexes are often defined inappropriately or have become inappropriate over time as a result of changes in the cardinality of the indexed columns. It is vital that a B-tree index be very selective (i.e., large number of distinct keys), because an index range scan very quickly becomes much more expensive to process than even a full table scan. If a particular query is found to have gone bad over time, it is almost certainly because of a change in the selectivity of the index. At this point, aggregate count() queries should be run on the underlying data in order to determine where the skew lies. The factor of performance speedup that can be expected from index improvements ranges from 1.5 to 1,000.
Queries can also create, or be exposed to, database contention, slowing down query response time as Oracle resolves the contention. The most common contention points for very large-scale Oracle systems are the library cache latches, the cache buffer latches, and the redo latches.
Library Cache
Library cache latch contention is normally caused by badly written SQL in the application. Starting with version 7.2, Oracle provides multiple child library cache latches to protect the library cache. The latch used by a given statement is determined by the hash value of the statement. The hashing algorithm used by Oracle for SQL statements is very effective, and so the child latch acquisitions are very well distributed among the available latches.
With at least as many latches as CPUs, the system is now capable of parsing SQL at full system bandwidth. However, if a large proportion of the statements in the application are not shared, the contention on the latches will increase for the following reasons:
· Hard parsing requires more time under the latch to complete in comparison with soft parsing.
· Memory management in a fragmented shared pool increases the work performed under the library cache latch.
The contention is not caused by any inherent serialization in Oracle, but as a result of the application consuming much of the system CPU cycles processing parse requests. The symptoms in this case include
· Increased miss and sleep counts for library cache latches
· Rapid consumption of free memory in the shared pool
In addition, overallocation of shared pool memory for the library cache results in contention for free memory by the other components of the shared pool. Most notably, the dictionary cache becomes squeezed down by the aggressive requirements of the library cache. The impact of this is that the system performs a great deal more recursive SQL than if the dictionary cache were allowed to grow to its optimum size.
Buffer Cache
The buffer cache is similar to the library cache in its latch coverage (cache buffer chain latches) and is subject to similar contention on a small handful of the many latches that cover it. The most common causes of contention for cache buffers chain latches all relate to index reads:
· B-tree indexes built on very small tables
· Multiple blocks covered by a common latch
· Root and branch block contention resulting from common search paths through the index
When an index is built on a very small table, it consists of a small number of blocks (possibly only one block). In this case, it is more efficient to do a fast full table scan because of the simpler management of such blocks in the cache. Dropping the index is the most effective solution of this problem.
The number of child latches used by Oracle is determined by the setting of _db_block_hash_buckets, which defaults to be a quarter of the configured db_block_buffers. This means that blocks share the same chain every db_block_buffers/4 in the default case. Access of any block in any given datafile is preceded by a hash of its dba (data block address), which resolves which buffer chain this block belongs to (see Figure 9.6).
On page 446, we looked at a query that will pull out any particular hot latch. If one of the child latches is being slept on a great deal more frequently, this can mean that one of the following is true:
· The latch covers more than one "hot" block.
· The latch covers one extremely "hot" block.
Whether or not the latch covers more than one hot block can be determined by running the query several times in succession and determining whether the same file and block numbers persist in the output. If so, the system will likely benefit from a finer granularity of cache buffer chains. In order to do this, the value of _db_block_hash_buckets simply needs to be increased accordingly, and the instance restarted.
If a single block is causing the problem, the solution will depend on the type of block that is contended for. The most common type of block that undergoes this contention is an index root or branch block, as previously discussed, and has typically been the most difficult problem to address prior to Oracle8.
In Oracle8, partitioned objects were introduced to increase the manageability of very large data objects, in addition to providing several performance improvements resulting from partition elimination. A less common use of partitioned objects is to directly address a problem that was not solvable prior to version 8-index root block contention. The theory is that if the root block is heavily contended for, but blocks further down the tree are not as highly contended for, then the index would benefit from having multiple root blocks in order to distribute the load over more blocks.
The multiple root blocks are created using standard partitioning of an index, where the partition key must be the leading edge of the index key. Although the index itself may well be very small (10MB, for example), it may be reasonable to partition the index into as many as 20 partitions. Although this somewhat increases the complexity of the database layout, it makes the index very much more scalable that it was before.
In addition to contention for buffers because of high read activity, contention can also result from contention for free buffers. A free buffer is required to write any new block into the cache, including any changes in an existing one. The only buffers that are classed as "free" are those that are not currently pinned and contain no unwritten data (i.e., must not be dirty). If there are no free buffers in the cache, Oracle must write dirty buffers to disk in order to allow those buffers to be made available for use by other sessions.
All these data block writes are performed by the database writer(s), and so this operation involves an expensive posting of the database writer in order to clean the dirty buffers. At this stage, the session that requires the free buffers is waiting for the operation to complete (free buffer waits), thus increasing the user response time.
If sessions are frequently left waiting for the database writer(s) to perform work, this directly impacts the response time for all users. Therefore, it makes sense to try to keep the cache reasonably clean at all times in order to preempt the requests for free buffers. This is achieved through the use of the db_block_max_dirty_target parameter in the init.ora.
Setting this parameter tells Oracle to try to limit the maximum number of dirty buffers to the specified number. Therefore, whenever the actual number of dirty buffers rises above this number, the database writer(s) will start to clean the cache (starting at the cold ends of their respective LRU chains), until the number falls below the threshold again.
The number of database writers and the number of LRUs relate directly to Oracle's ability to keep the number below the threshold. If there are insufficient writer processes to keep the cache clean, it will not be possible to keep it clean. In this case, the threshold should be raised or the number of writer processes increased.
This tuning of the cache also helps to reduce the impact of a checkpoint operation. If there are fewer dirty buffers in the cache at the time of the checkpoint, there is less work to do under the checkpoint.
Redo Allocation
Oracle maintains only one significant point of serialization,-the redo allocation latch. Recall from Section 6.8.2 that the redo allocation latch is the only latch that covers the redo buffer. Therefore, the commit rate of the system is ultimately governed by this latch. Because a commit is really the only operation that must wait for acknowledgment of completion, it is very important that the redo management process is as fast as possible.
There are several mechanisms for mitigating the impact of this serialization. First, there is the batching of multiple commits into a single "group commit." This is provided by default and is not associated with any tuning. Second, there is the existence of the redo copy latches. During tuning for a good response time, it is vital that virtually every copy into the redo buffer occurs under one of the redo copy latches instead of under the redo allocation latch. Therefore, the setting of log_small_entry_max_size should be kept very low in transactional systems to force copying under the redo copy latches.
Increasing Batch Throughput
Nearly all transactional systems have a Jeckyl and Hyde existence. During the day they run many thousands of comparatively tiny transactions, whereas at night they run a substantially fewer number of very heavy transactions during the batch cycle. Both of these modes of operation are suited to different types of tuning, and yet no system outage can be taken in between in order to change things.
The only approach that can be taken in this situation is to tune the system for the OLTP users and to accommodate the conflicting batch requirements in ways that do not affect the online user. In recent Oracle releases, the number of parameters that can be dynamically altered on the fly has increased dramatically. The situation in Oracle8i is that the profile of a session can be dramatically altered while retaining the tuning of the remainder of the system.
Of note in the list of parameters that can be changed on the session level are
· db_file_multiblock_read_count
· sort_area_size
· sort_area_retained_size
· sort_multiblock_read_count
· hash_join_enabled
· hash_area_size
· hash_multiblock_io_count
All of these parameters can be "opened up" to allow far greater resource consumption by the batch processes. As there are many fewer batch processes, the additional impact on the system is not as large as that of changing these parameters for all database connections.
In addition to tuning the batch sessions for more optimal execution, operational factors must be considered. Perhaps one of the largest impacts on the throughput of a given batch process is the interaction on it by other batch processes. The actual form of interference from the other batch jobs varies from maintaining consistent views of the data to block cleanout and transaction locks to CPU and disk resource contention.
Keeping batch processes from disrupting each other is quite a challenge in itself, but certain measures can be taken from the programmatic and tuning perspectives to make this job easier. The first thing to do is to carefully analyze the use of transaction locks within the batch programs. Batch routines should not assume exclusive access to a table unless this is actually 100 percent true-it only takes one outstanding lock to put the brakes on any access to a given table. Performing long select...for updates on a table in batch can be as bad as doing so during full online use.
Related to this is the commit frequency of the program. A commit certainly has an overhead for the system, but it is very lightweight in comparison with the restriction of having large amounts of uncommitted data in the database. This results in a great deal of consistent read activity for other programs and several outstanding data locks. Reasonable commit frequency for batch jobs varies among cases, but committing every 10,000 rows or so is reasonable for mass updates, and less for more in-depth processing.
Extending regular commits further, the job should also be designed to be fully restartable. It is almost inevitable that a job will occasionally run past the normal batch window and into the online processing peaks, at which times there is no alternative but to kill the job. If the job is not restartable, this can cause problems that range from manual cleanup to long rollback operations, and almost always means rerunning the job from the beginning the next evening.
Batch processing benefits greatly from the use of Oracle Parallel Query. This extends beyond simply speeding up the individual jobs themselves, allowing a different perspective to be taken on the organization of the batch cycle. Without parallel query, a typical batch cycle will employ "operator parallelism," which means that the batch operator fires off many batch jobs in parallel in order to get the most out of the multiple processors in the system.
The effect of this disorganized use of the system and database resources is that a great deal of overhead must be incurred to manage read consistency. Parallel query allows a different approach to the batch cycle wherein all the large jobs employ extensive parallel processing to use all the available processors on the machine. While any one of these jobs is executing, no other batch is run-the jobs are run sequentially in parallel.
Now that the jobs are running a great deal faster within themselves, there is no need to have many other jobs running at the same time-they will complete faster in series without the overhead of consistent read.
9.3 Chapter Summary
Scalable transaction processing is all about careful design, understanding, and balance. The application needs to be designed to take into account the number of users and the concurrency required as a result. The application developer, database engineer, and system administrators need to understand what is required to build a very large, concurrent system. Above all, the overriding principle is one of balance: Many thousands of database operations occur every second, and it is only through carefully balancing the requests and tuning that these operations can coexist.
This chapter has presented several aspects of transaction processing that need attention. There are plenty of others, however, most of which are application-specific. Therefore, the important factor in making a scalable system is to understand the concepts presented in the previous sections of this book. Once these concepts have been established, and their application has become more practiced, most of the problems that arise can be resolved using logical thought processes.
9.4 Further Reading
Andrade, J. M., et al. The TUXEDO System. Reading, MA: Addison-Wesley, 1996.
Slama, D., et al. Enterprise CORBA. Upper Saddle River, NJ: Prentice Hall, 1999.
Various. Oracle8i Tuning. Oracle RDBMS Documentation.
1
The reason that bind variables must be used is that the optimizer does not consider the values when servicing queries. This enhancement is proposed for a future release of Oracle8.

2
The B-tree level, as reported in dba_indexes.

3
The Open Group is the standards body that defined the XA specification, formerly known as XOpen.

4
Memory used to be an issue in large systems owing to both the price of the memory and the 32-bit address limitations of the operating system. The typical memory size of a large UNIX system in 1994 was 2GB, compared with 32 to 64GB today.



Scale Abilities Ltd
http://www.scaleabilities.co.uk
Voice: +44 1285 644533
info@scaleabilities.co.uk
TOC PREV NEXT INDEX