TOC PREV NEXT INDEX

Scale Abilities Ltd Logo


Chapter 6
Oracle8 in Operation
6.1 Introduction
This chapter will cover the other side of the Oracle database-the process and memory architecture. These are the aspects of Oracle that make it a scalable database as opposed to a static filestore, and a good understanding of these concepts is essential for building a very scalable Oracle system.
The combination of these operational aspects of Oracle form the Oracle kernel.
6.1.1 The Oracle Kernel
The Oracle kernel is not a kernel in the same sense that an operating system kernel is. All execution on the host operating system occurs in user mode, with standard system calls made to access system functionality. Oracle does not have any special privileges with regard to memory visibility or system function.
Instead, the Oracle kernel is a set of processes and shared a memory segment upon which they operate. The processes and shared memory combined form an instance of Oracle. The physical side of Oracle on which the instance works is an Oracle database. Although these two entities are very much intertwined, Oracle Parallel Server provides ways to allow multiple instances to operate on the same database. This will be covered in Section 6.9.
Having declared that Oracle is not like an operating system kernel, it is time to contradict myself in some respects. The Oracle kernel is very much like an operating system kernel in that it performs the following functions on behalf of user connections:
· File I/O (through the operating system)
· Caching
· Process scheduling
· Runtime memory management (especially when using multithreaded server)
However, Oracle does not provide the following operating system kernel functions:
· Virtual memory
· Cache coherent multiprocessor support
· Timesliced process execution
· Direct hardware interfacing
These functions are all provided by the operating system and are fundamental to the operation of Oracle. They are so fundamental, in fact, that most hardware vendors provide a variety of special facilities in their operating system kernels for Oracle to use. These facilities include
· Asynchronous I/O to take the I/O scheduling burden away from Oracle
· Physically locked virtual memory that is not subject to any further virtual memory management, such as paging
· The ability to stop processes from being preempted by the operating system (OS) kernel for short periods while holding latches
In this chapter, we will look in detail at some of the services provided by the Oracle kernel.
6.2 Process Architecture
This is the most straightforward part of the Oracle system to understand. An Oracle instance consists of two different types of processes:
· Shadow (or server) processes
· Background processes
All processes in an Oracle instance are considered to be part of the Oracle kernel. In fact, all background and shadow processes are invoked from the same binary image-that of $ORACLE_HOME/bin/oracle. In this way, Oracle can rely on the fact that all the connected processes will follow the same rules about accessing objects in the SGA, and will not corrupt other sessions as a result of invalid writes to the shared areas.
6.2.1 Shadow Processes
The shadow processes constitute the server-side portion of a user TWO_TASK connection and therefore are also known as "server processes." There are two different architectures provided in Oracle8 for database connections: the dedicated server and multithreaded server (MTS). Whichever option is adopted, the shadow process is the process that performs the actual request servicing within the database.
Each process attaches to the Shared Global Area (SGA) for read and write use of the global caches. The process waits for requests to come over the network either directly (dedicated server) or through a dispatcher (MTS). When a request comes in from the client application, the server process interprets the request through the Net8 software that forms part of the process.
When the Net8 work has been completed, the process acts directly on that request on behalf of the user. Although the shadow process has Oracle kernel privileges and can do anything at all to the database or SGA, the process will not allow this unless the user connection is authorized to do so. However, when the user session has permission, the shadow process will do all the I/O, sorting, package execution, and whatever else the user session has requested.
In this way, the shadow process can be considered a proxy worker for the client application. The client application never accesses the SGA or the database itself directly, but only through the shadow process. It is the responsibility of the shadow process to enforce the security of the database and to service all work on behalf of the user.
Dedicated Server
Dedicated server processes are private to the user. That is, there is one process for each connected user, and that process is dedicated to the processing of that user's requests only.
The oracle executable is renamed on execution to become something like
oracleSID (LOCAL=NO)

or
oracleSID (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))

for remote and local connections, respectively. Note that the actual oracle executable on disk is not renamed; only the name of the running process is changed.
The dedicated server process maintains its own private working space and caches in a private piece of memory called the Process Global Area (PGA). The working memory is also known as session memory and is the portion of memory that stores the context of the session for the user session.
The dedicated server process listens on the direct network connection that was setup when the user connected to the database. The network connection is private between the dedicated server shadow process and the user application client-no other processes can use this network connection.
With one physical process for each user connection, large dedicated server-based systems can put a large burden on the host operating system in scheduling the processes, in addition to an increased overhead in memory management.
Multithreaded Server (MTS)
The multithreaded server removes dedicated shadow processes from the system by switching user connections in and out of a pool of shared servers.
In Figure 6.1, three users are connected to the database using both dedicated server and the multi-threaded server. Where the dedicated server maintains a one-to-one relationship with its corresponding client process, the MTS clients are sharing just two servers from a common pool.
The key component to consider in an MTS configuration is the dispatcher process. The dispatcher process acts as a software switch, listening for incoming requests from the client and connecting the client to a free server when required. The client is said to have a virtual circuit to the server process. During the time that the server is processing the request for the client, it is unavailable for any other session. Really, shared servers are not multithreaded at all, but serially reusable.
In order to support this server reuse by multiple client sessions, Oracle moves the session information, which was formerly stored in the PGA of the server process, into the SGA. Specifically, the memory is allocated from the large pool, or from the shared pool if no large pool is defined. This memory allocation is not insignificant and can increase the size of the SGA by many hundreds of megabytes in a large system. In reality, the memory savings offered by MTS are not great, if any.
The increase in required SGA memory can have other consequences on some systems. Many 32-bit platforms have quite low restrictions on the maximum size of the SGA, for example. On some platforms, this can mean that the maximum size of the SGA is less than 1GB, which somewhat limits the maximum number of connections possible with MTS.
In addition, the enlarged SGA can cause problems with the virtual memory management of the host operating system. If the host does not support the locking of the SGA into real memory, each server process that attaches to the SGA must maintain page table entries (PTEs) for each of the (typically) 4KB pages that make up the SGA. With a 4GB SGA, this equates to around one million additional page table entries to manage for each process. If the SGA can be locked into memory, all processes can share a common set of page table entries and so this is less of a problem.
The MTS shared servers are classified as background processes by Oracle, as are the dispatchers. However, unlike the other "real" background processes, Oracle will not crash if the processes are killed from the operating system. The impact of killing shared servers is minimal; the termination will be detected, and a replacement server will be started. All sessions will be unaffected by this operation unless they were active in the shared server at that time. Killing a dispatcher will terminate all sessions connected through it.
6.3 Net8
Net8, formerly known as SQL*Net, is the software used on both sides of the client/server connection in order to provide a platform-independent transport layer between client applications and the database server. It has become increasingly complex over the last few years, but the principles remain the same from a tuning perspective.
We are not going to cover any of the Net8 functionality here; this is well covered in the Oracle documentation. Instead, there are a few basics that need to be understood, particularly if your application is to be rolled out over a large geographic area.
The fundamental thing to bear in mind with Net8 is that it operates over a network. A network, like any communication medium, has two primary measures of "speed": latency and bandwidth. It is frequently the latency aspect that is overlooked, resulting in poor application response times for the user.
If an application is written in SQL*Forms, and each user runs the actual form locally on their PC, the Net8 communication is between the actual user PC and the database server. If the two are separated by a wide area network (WAN), problems frequently arise as a result of the latency of the underlying network rather than any particular Oracle problem.
As an extreme example, imagine locating a worldwide database server in North America and user PCs as far away as Asia. This wide area network could be implemented using satellite technology, providing a high-bandwidth network at relatively low cost. Unfortunately, the latency of a satellite-based network is approximately 0.5 second for each hop between the client and the server. This means that a round trip takes about 1 second and forms the minimum response time for any database access.
Database access includes parsing, every fetch, and so on. This means that a query on sys.dual will take about 3 seconds using this network. Increasing the bandwidth of the network will not improve this problem at all, because it is the latency of the network that causes the problem.
Therefore, the first rule of Net8 should be to keep network latencies between the client and the server to a minimum, preferably on the same physical local area network (LAN) in the datacenter.
The second rule of Net8 also relates to the underlying network. There is a tunable for Net8, specified in the tnsnames.ora, called the SDU (session data unit). This is the size of the data that Net8 will send to the native network transport for sending across the network.
There are occasions when the default setting of the SDU is not appropriate and the value must be increased. Typical reasons for increasing this value include
· Client performing large array fetches from the server
· Large number of requests not fitting in one physical network frame
The second reason is of particular importance where the network has a higher latency than desired. One symptom of a small SDU is when sessions are frequently found to be in "SQL*Net more data to/from client" wait states.
6.4 The SGA Components
6.4.1 High-Level View
The SGA is the area of shared memory that Oracle creates on start-up. All Oracle processes (background and shadow processes) connect to this shared memory when they are started. The various areas of the SGA are provided to meet the following needs of the database system:
· Caching
· Concurrency
· Consistency
· Control
On its own, the SGA does nothing at all-it is just a piece of shared memory. It is the use of the SGA by the Oracle kernel that provides all of the above-listed attributes of the database.
The SGA, shown in Figure 6.2, is composed of four distinct components:
1. Fixed region
2. Variable region
3. Buffer cache
4. Redo buffer
All of these areas, with the exception of the redo buffer, are comprised of many other subareas.
6.4.2 Fixed Region
The fixed region is very small, typically on the order of 40-60KB. Within this region Oracle stores singleton latches, such as the redo allocation latch, in addition to the parent latches for variable lists of latches. The contents of this region are defined by a platform-specific assembly language source file called ksms.s. This file used to be resident in $ORACLE_HOME/rdbms/lib prior to release 7.3, but is no longer shipped. It was used as the mechanism for relocating the SGA, but this is now handled through init.ora parameters, and the assembly language file is no longer needed.
This is a shame, because the file was essentially a symbolic map of the fixed region of the SGA. For example, there was an entry showing
.set kcrfal_,sgabeg+19072

This shows that the structure for kcrfal (redo allocation) is stored in the fixed region, 19,072 bytes from the beginning of the SGA. Part of this structure is the actual latch itself.
The names of structures relate to the names of the software modules that contain them and therefore are considered Oracle Proprietary (and therefore cannot be published). Anyway, it's immaterial now, because ksms.s is no longer shipped.
Essentially, the fixed region contains the structures that relate to the contents of V$LATCH: If you subtract the lowest ADDR from v$latch from the highest ADDR, you will get an approximate size of the fixed region. It is called the fixed region because this component is sized at compile time. No amount of change in the init.ora will change the size and structure of the fixed region, only its location in memory.
Now is a good time to get into some specifics of how Oracle implements latches.
Anatomy of an Oracle Latch
In the example above, we state that the kcrfal structure starts at an offset of 19,072 bytes from the beginning of the SGA. This structure might look a little like that shown in Figure 6.3.
The latch is physically stored at the head of the structure. Due to the serialization provided by the system bus, we already know (from Chapter 2) that we can perform atomic writes to memory locations of certain sizes. The limit on the size of the location that can be atomically updated is dependent on the platform (processor) architecture.
In this example, we assume that a 32-bit atomic memory write is available. Therefore, the mechanism for acquiring the latch is simply a test-and-set operation on the memory location at 19,072 bytes into the SGA memory region. If the test-and-set operation is successful, you have acquired the latch and can continue to modify any other part of the structure, copy data into buffers, or modify several elements in a linked list of which this structure is the head. When the work is complete, the memory location is reset back to the "unset" value, and other sessions can allocate the latch.
You may have noticed that the latch location has been "padded" to 32 bytes in length. This, again, is just an example and assumes that the length of a single CPU cache line is 32 bytes on this platform. The reason for the padding is to guarantee that the latch itself is isolated in its own cache line and is not coexisting with another, unrelated latch. If this were not the case, it would result in false contention on the cache line, making the allocation of a little-used latch as difficult as that of a highly-contended-for latch.
6.4.3 Variable Region
The variable region contains the following elements:
· Shared pool
· Large pool
· Variable lists of latches (i.e., all the cache buffers chains latches)
· All lists (cache buffers chains, LRU lists, etc.)
In turn, the shared pool contents include the following elements:
· Library cache
· Dictionary cache
· NLS loadable objects
· Session and process context
· Enqueue resources
· Session memory (if running MTS and no LARGE_POOL is defined)
The large pool contains:
· Session memory (if running MTS)
· Parallel query buffers
· Oracle backup work areas
6.4.4 Buffer Cache
The buffer cache is an area of the SGA that is set aside for the caching of data blocks. From release Oracle8 onwards, it can contain up to three distinct "buffer pools," which are separately managed caches for different types of data usage:
· DEFAULT
· KEEP
· RECYCLE
The default buffer pool is the buffer cache that is created by default and has all objects assigned to it by default. This can be directly compared to the single buffer cache found in Oracle7. Optionally, two other buffer caches (pools) can be defined, which can be tuned to behave more suitably for specific situations. These two additional caches are the KEEP and RECYCLE caches (see Figure 6.4).
All three of the buffer pools have separate allocations of buffers and LRU lists that manage the buffers. The RECYCLE and KEEP pools are explicitly defined in the init.ora, whereas the DEFAULT pool allocates the remainder of the buffers and LRU lists for itself.
There are subtle differences between the three buffer pools, and some of these are highlighted next. One of the major differences, however, is that we now have the ability to individually tune the pools individually, dependent on their use.
RECYCLE Buffer Pool
The idea of the RECYCLE buffer pool is to use it more as a work area than a cache; this pool is used to store blocks that are virtually never reused after the initial read. Even though the blocks will rarely be reused, Oracle still needs to run all database blocks through the buffer cache in order to maintain consistent read views, and thus a buffer cache is still necessary for these blocks.
However, in a single buffer cache configuration, these single-use blocks could quickly dominate the buffer cache, implicitly flushing out other blocks that will be reused. Blocks that are frequently referenced are not affected by this cache pollution, because they will always be at the most recently used (MRU) end of the LRU lists. It is blocks that are reusable, but are not necessarily used very often, that are affected by this kind of cache pollution.
One operation that can pollute a cache is random single-row lookups on a table that is much larger than the cache itself. Each time one of these rows is read from disk, it must occupy an entire buffer in the cache. If these reads are part of an index-driven scan across the table, all of these buffers will need to be allocated from the cache, quickly dominating the space in the cache. Other blocks will need to be flushed from the cache to make space for these incoming blocks.
In this case, the table that is being accessed in this way can be moved to the RECYCLE pool. The idea of this pool is that we just give Oracle a work area for the blocks but don't really expect any kind of reusability. This having been said, the standard LRU mechanism still applies to this buffer pool but is managed in isolation from the other buffer pools. Therefore, if a few of the blocks in this table are reused from time to time (such as an index root block, as opposed to the leaf blocks), they will still remain in the cache if the access frequency merits the retention of the block.
When allocating a RECYCLE pool, we typically want it to be just large enough to allow contention-free allocation of the buffers. If the buffer is created too small, there will be too much demand for the number of buffers available, and potentially reusable blocks will be eliminated from the cache before they are requested again. However, this cache is typically not directly aiding system performance a great deal and so should not be wasting memory that could be used by the other caches. The sizing of the RECYCLE pool can require a little trial and error, and Oracle recommends an initial setting of one-quarter the number of buffers that the object occupies in the DEFAULT pool prior to being moved.
KEEP Pool
The KEEP pool is for the allocation of buffers for objects that are accessed with medium frequency, or those for which a consistent response time is desirable. Blocks that are accessed very often will remain in cache anyway, even in the DEFAULT or RECYCLE pools. The KEEP pool is for the blocks that are not accessed quite enough to remain in cache, or those with a mixture of hot blocks and warm blocks.
Using the KEEP pool, it is possible to assign just a few small objects to the cache and thus have the peace of mind of knowing that the buffers will never be aged out by unrelated blocks over time. This allows consistent response times for queries that access that object.
In order to support this a little more than the DEFAULT pool, some subtle changes are made in the algorithms used for the KEEP pool. First, to minimize cache pollution by excess CR versions of buffers (see Section 6.5.5), these buffers are not created in the KEEP pool. As they are assumed to be short-lived, CR versions of buffers in the KEEP pool are created in the DEFAULT pool in order to keep the content of the KEEP pool as static as possible.
In addition, the LRU algorithm used in the KEEP pool is slightly different. This will be discussed on Section 6.5.3.
6.5 Operation of the Buffer Cache
6.5.1 Introduction
The buffer cache is a portion of the SGA that is set aside for the caching of database blocks. As memory access is approximately 100,000 times faster than disk reads, it is highly desirable to service as many data requests from buffers in the cache as possible, without going to disk.
This is the ultimate goal of the buffer cache and is often the only information needed about the buffer cache in order to run a system fairly effectively. Simply keep an eye on your cache hit ratio:
Hit ratio = 1 - [physical reads/(db block gets - consistent gets)]1
However, when the buffer cache is pushed very hard, it is useful to understand the workings of the buffer cache.
The buffer cache is a set of buffers in memory, each sized as one Oracle block. When a query needs a block of data from disk, Oracle first checks the buffer cache for the presence of that block in memory. This search is performed using hash chains.
6.5.2 Hash Chains
The hash chains are fixed-depth lists of pointers to the actual buffers in the cache. A conceptual view of the list elements is shown in Figure 6.5.
When Oracle has determined which block it requires, it can then hash this dba2 to gain the ID of the hash chain that it should search. The buffer pool that the object is assigned to does not form part of this hashing algorithm; the entire set of hash chains is shared among all the buffer pools, with the chain ID determined by dba only. This means that an object that is cached in the RECYCLE pool could easily share the same hash chain as an object in the KEEP pool by virtue of dbas that hash to the same value.
Oracle maintains many hash chains, and they can be explicitly set using the _db_block_hash_buckets parameter in the init.ora. The maximum total number of elements in all of the hash chains is the same as db_block_buffers, because the hash chains are a list of the contents of the actual buffer cache.
Let's start an example that highlights this concept and the others that follow. In this example, we assume that we are using a single buffer pool, as in Oracle7.
A user submits a query to the database. It is determined that dba 153 is required as one of the blocks that is needed to execute this query. Oracle then hashes this dba, using the total number of hash chains as one of the hashing parameters. This yields a result of SEVEN, and so Oracle has determined that this dba will be listed in hash chain number 7 if it is present in the cache.
Oracle then starts at the top of hash chain 7 and checks the actual dbas of all the entries until it either hits the end of the chain or finds a match on the requested dba.
If the dba is matched in the hash chain, Oracle then checks to see if this buffer is OK to use. The information required to determine this is also likely to be stored within the hash chain element itself. One reason that the buffer may not be OK to use could be that the buffer is for an incompatible CR version of the block. If the buffer is not available for use, the hash chain is traversed further to determine whether more buffers store the same dba. If none of these buffers are available, or no further matching buffers are found for that dba, Oracle does not have a buffer in the cache that can satisfy the requirement and must go to the LRU list to create a new one (see Section 6.5.3).
If the buffer is found and deemed to be usable, Oracle can heat the buffer on the LRU list (again, see Section 6.5.3) and pin the buffer while it is in use. This pinning operation is a little like a latch but operates on a single element in the chain.
At this point, the buffer request has been serviced for the query, and Oracle can proceed to read the buffer and to read the actual contents of the buffer to act on the query. After use, the buffer is unpinned and is available for other sessions.
Twice in this section we have mentioned the LRU list, so let's go into some more detail on this.
6.5.3 LRU Chains
In order to have a cache that is based on frequency of use, a list that shows how often the blocks have been accessed needs to be maintained. In the Oracle kernel, this is implemented using an ordered LRU list, where the blocks with entries at one end of the list have been accessed most recently and the blocks entered at the other end of the list have been accessed least recently. This is the basis of a simple LRU chain.
In Chapter 1, we saw an example of an LRU chain in operation (see Section 1.5.4). This is the way Oracle managed its LRU chain right up until version 7.2. In this release, Oracle split the single LRU chain into multiple LRU chains in order to improve the scalability of the system.
We have two scenarios passed down from the section on hash chains:
1. Heating the buffer when it is located successfully in the hash chains
2. Finding a buffer to use when a block is not present in the cache, or when one is needed to create a CR version of the block
A simple view of the heating procedure is covered in Chapter 1 (see Section 1.5.4). Each time we access a block, it becomes the most recently used (MRU) buffer by implication. Therefore, the buffer is heated to the MRU of the LRU list. This heating process ensures that blocks that are not used as often as others will get implicitly aged down the LRU list by the heating process of the other blocks. When they get to the LRU end of the list, they become targets for replacement.
This is where the buffers come from in the second scenario. Oracle will go to the relevant LRU list when it needs a buffer and will take the first available buffer slot from the list. Note: The buffers at the LRU end of the list may not necessarily be available for use, because they could be dirty buffers (modified, but not written to disk).
The buffer is then heated to the top of the LRU list and is moved from its prior hash chain to the correct hash chain for the dba of the new block.
With the introduction of multiple LRU lists, sessions can no longer assume the location of a single LRU list. Instead, each session will round-robin between the multiple LRU lists, based on the buffer pool in which the object is cached. If an LRU is locked, the session will move onto the next LRU without waiting.
This is the basis of a simplified linear LRU mechanism, applied to the Oracle buffer cache. However, Oracle implements LRU slightly differently in order to gain better performance from the cache.
Oracle 8.1 Buffer Cache LRU
This description applies to Oracle 8.1 only. Prior (and probably future) releases of Oracle do not (will not) necessarily operate in this way.
When Oracle reads a block into the default buffer cache, it is not heated to the MRU end of the list immediately. The new block is generally inserted into the middle of the LRU list, because it is not known at this stage whether the block will be accessed enough to deserve a slot at the MRU end.
There then follows a short period of time during which the buffer will not be heated, even if it is accessed again. This timeout period is designed to be used in situations in which a block could be read and updated in quick succession and then never used again. One example of this would be a single row update: Oracle reads in the block in order to determine whether it meets the criteria for update and then modifies the block. This kind of activity could quickly dominate the MRU end of the cache in an OLTP system if the timeout period were not imposed. If the block continues to be accessed outside of this period, it will be heated to the MRU end of the list.
In the KEEP pool, the blocks are all assumed to be useful, especially because the kernel does not need to cater to CR buffers in its LRU algorithm. Therefore, all new blocks going into the KEEP pool are heated on first access.
Oracle 8.1 adds further optimizations for the LRU algorithm. Using the simple "read and heat" algorithm, every access to a block requires an LRU operation in order to heat the buffer, imposing a significant overhead in buffer access.4 In order to improve on this, 8.1 implements an LRU algorithm that keeps a reference count for each buffer. No heating occurs at all, unless the buffer hits the end of the LRU list and has a high enough reference count. This way, the required LRU manipulations are very much reduced.
6.5.4 Latch Coverage
The buffer cache is covered by a large number of latches, typically more than all other parts of the Oracle kernel combined. This is vital, because the buffer cache needs to be completely free of corruption and yet retain a high degree of concurrency in access.
First, every hash chain is protected by a latch. These latches are the "cache buffers chains" latches and, like all child latches, are reported in v$latch_children. This is one of the important aspects of a system to watch, especially when the application is very heavy on the buffer cache.
Viewing v$latch_children, the number of gets and sleeps can be determined for each child latch. It is possible and likely that a small number of these child latches are requested much more often than the others. This is sometimes attributed to a single block being heavy, but more often than not it is the result of several blocks hashing to the same chain. When this happens, a large proportion of accesses to the buffer cache become serialized through this latch. The following query will report on the hottest hash chains:
SELECT a.HLADDR,a.DBARFIL,a.DBABLK,b.GETS,b.SLEEPS
FROM v$latch_children b,
sys.x$bh a
WHERE a.HLADDR=b.ADDR
AND a.state != 0 /* ie, currently in use*/
AND b.sleeps > ( SELECT 100*avg(SLEEPS) from v$latch_children
WHERE name='cache buffers chains' )
ORDER BY b.SLEEPS desc

The output from this query might look a little like this:
HLADDR DBARFIL DBABLK GETS SLEEPS
-------- ---------- ---------- ---------- ----------
AA416498 158 3139 538830353 1588779
AA416498 208 374 538830353 1588779
AA416498 480 3343 538830353 1588779
AA41CED8 480 3543 1974346642 950985
AA41CED8 158 3339 1974346642 950985
AA33A8C8 157 12803 555346192 112729
AA3F5268 307 1283 148471828 93581
AA340B98 85 71701 263159467 81874
AA3C1498 480 783 554143110 78639
AA3E48C8 72 58417 358435709 46844

This shows several things about the operation of the hash chains in Oracle. First, there are multiple completely different blocks in the database on the same hash chain (HLADDR). This shows the hashing process sharing the chain among a diverse range of dbas. Second, it can be seen that there is still heavy contention for the first hash chain. There are 50 percent more sleeps occurring on the latch protecting this chain than on the next hottest chain, even though it has been acquired only one-quarter as often. This is likely to be caused by the duration of each latch operation that protects the chain, or by a series of "burst" allocation requests.
In this example, a good next step is to create more hash chains using the _db_block_hash_buckets parameter. The top two hash chains have two file/block combinations that could be the objects contended for: File 480, blocks 3343 and 3543; or File 158, blocks 3139 and 3339. The only way we can determine which of the objects is getting requested so much is to create more hash buckets and hopefully split up the dba ranges into different hash chains: The guilty object would then be clearly visible.
The second set of latches over the buffer cache come from every LRU chain being protected by a latch, shown as "cache buffer lru chains" in v$latch_children. There are many times fewer LRU chains than there are cache buffers chains, because there is a minimum ratio of buffers to LRUs enforced at start-up time of 50 buffers per chain.
6.5.5 CR Versions
CR (consistent read) block creation was first discussed in "Block Header" in Section 5.5, in the anatomy of an Oracle block. In order to support the CR model provided by Oracle, the buffer cache must be employed to supply buffers that contain versions of a given block that differ from its on-disk representation. There may be several different versions of the same block in cache at any one time. This is a CR block.
When a CR version of a block is needed by a query, it will first of all determine if there is any version of the block in cache at that point. If so, it will choose the version that has a version number closest to, but no lower than, the desired one. If the version number were lower than the desired number, the block would need to be rolled forward from redo log; CR is supported from undo information only.
If a buffer is found that can be used to construct the CR block, it is cloned onto a free buffer in the cache and assigned to the same hash chain as the original. If no version can be found, the block is read off disk.
Once the buffer is available for constructing the CR block, the last relevant piece of undo information is applied to the block, as determined by the ITL entries in the block. This rolls back both the contents of the block and the ITL entries themselves. This process repeats until the change number of the block matches that of the start of the query. At this stage, we have a CR version of the actual block.
As previously discussed, this block is never written to disk. It exists only in the cache, as a reflection of a previous version of a block. Writing the block to disk would mean corrupting the database.
6.5.6 DBWR Operation
In our discussion of the LRU list, we referred to a dirty buffer that could be found at the tail end of the LRU list. The list of dirty buffers is also maintained within the LRU structures. The reason for this is simple: There is no point in writing out dirty buffers if they are about to get updated in the very near future.
However, it can be seen that without a formal way to clean (write out) these buffers, the cache would quickly become full of dirty buffers. This never happens in reality, because a given session will search only a certain percentage of the LRU list for a clean, reusable buffer. If it does not find one before hitting the threshold, the session will post5 the database writer and wait on "free buffer waits" until buffers become available for reuse.
Every time the database writer is woken up to write out some dirty buffers, it will start at the cold end of each LRU and scan toward the hot end until a complete write batch full of dirty buffers is found. The size of this write batch is determined by the write batch size. This used to be a parameter in the init.ora but is now derived according to the algorithm in the Oracle8 Tuning Guide ("Internal Write Batch Size").
Another way in which the database writer is woken is by the checkpoint process. The checkpoint process doesn't actually do the writing associated with the checkpoint; it simply posts the database writer to write dirty blocks that are less than a given change number out to disk. This is an incremental checkpoint, because not all the dirty buffers need to be written out at each checkpoint. Each time a checkpoint occurs, this determines where rollforward recovery must start in the event of a failure prior to the next checkpoint.
Multiple database writers can operate on a given buffer cache/instance. Each database writer will work on LRUs in a round-robin fashion, ensuring that they do not contend with themselves for the latch. This implementation of true (i.e., independent) database writers is an improvement on Oracle7, where multiple database writers were implemented as a master/slave arrangement.
6.5.7 Flowchart Summary of Buffer Cache Operation
The flowchart presented in Figure 6.6 should serve as a useful reference for the operation of the buffer cache. It is not a complete view of the operations but provides a simplified version for ease of use. In particular, the two items marked with asterisks should be viewed in conjunction with the text in the relevant preceding sections.
6.6 Shared Pool Structure
The shared pool (see Figure 6.7) is a portion of the SGA, and itself contains several components.
These components compete for space within the space allocated to the shared pool. This means that if a poor cache hit ratio is being achieved in the library or dictionary cache, the other caches in the shared pool will be reduced in size proportionately, attempting to allow the badly controlled area to stabilize.
The "Other" portion of the shared pool is used to cache other structures needed for operation of the instance. The detail of the components in the shared pool (and the rest of the SGA) can be found by querying v$sgastat:
POOL NAME BYTES
----------- -------------------------- ----------
fixed_sga 47852
db_block_buffers 167772160
log_buffer 1048576
shared pool free memory 294079744
shared pool miscellaneous 5625880
shared pool transactions 3424000
shared pool db_files 264736
shared pool table columns 72928
shared pool SEQ S.O. 336000
shared pool KGK heap 24408
shared pool db_handles 1470000
shared pool KQLS heap 8121648
shared pool fixed allocation callback 3560
shared pool branches 960240
shared pool Checkpoint queue 347888
shared pool PLS non-lib hp 2104
shared pool ktlbk state objects 1712000
shared pool partitioning d 32024
shared pool db_block_hash_buckets 6062232
shared pool DML locks 1856000
shared pool State objects 2913312
shared pool KGFF heap 383656
shared pool trigger defini 18360
shared pool distributed_transactions- 392168
shared pool db_block_buffers 8683520