Thursday, November 1, 2012

Cache Coherency


GCS synchronizes global cache access, allowing only one instance at a time to modify the block. Thus, cache coherency is maintained in the RAC system by coordinating buffer caches located on separate instances.
GCS ensures that the data blocks cached in different buffer caches are kept consistent globally. That is why some people prefer to call cache fusion a ‘diskless cache coherency’ mechanism. This is true in a sense, because the earlier Oracle Parallel Server (OPS) product relied on ‘forced disk writes’ to maintain cache coherency.

Global Cache Service
·         GCS is the main controlling process for cache fusion.
·         It tracks the location and status (mode and role) of the data blocks, as well as the access privileges of the various instances.
·         GCS guarantees data integrity by employing global access levels.
·         It maintains block modes for data blocks in the global role.
·         It is also responsible for block transfers between instances.

In a RAC system, users can connect to multiple instances to run database queries. Typically, users will be connected to different nodes but access the same set of data or data blocks. This situation demands that the data consistency formerly confined to a single instance be extended to multiple instances. Therefore, buffer cache coherency across multiple instances must be maintained.
Instances require three main types of concurrency:
·         Concurrent reads on multiple instances: users on two different instances need to read the same set of blocks.
·         Concurrent reads and writes on different instances: a user intends to read a data block that was recently modified on another instance, and the read can be for either the current version of the block or a read-consistent previous version.
·         Concurrent writes on different instances: the same set of data blocks is modified by different users on different instances.

·         Cache coherency demands that block consistency be maintained even though there are multiple instances (each with a separate db_cache_size data buffer region) in which data blocks can reside or be brought in.
·         Oracle RAC achieves this through inter-instance block transfers using the Cache Fusion mechanism.
·         The Global Cache Service (GCS), which is implemented as a set of processes, coordinates this facility.
·         GCS also ensures that only one instance modifies the block at any given time. Even when the same data block is cached in different instances at the same time, global consistency is maintained.
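The rule that only one instance may modify a block at a time can be pictured with a small directory that records who holds each block. This is a minimal sketch in Python, not Oracle's implementation: the class names `BlockEntry` and `ToyGlobalCacheDirectory` and both request methods are hypothetical, invented purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BlockEntry:
    """Directory entry for one data block (hypothetical structure)."""
    exclusive_holder: str | None = None               # instance allowed to modify
    shared_holders: set[str] = field(default_factory=set)  # instances that may read


class ToyGlobalCacheDirectory:
    """Toy stand-in for the GCS bookkeeping; not Oracle's real data structure."""

    def __init__(self) -> None:
        self.entries: dict[int, BlockEntry] = {}

    def request_shared(self, instance: str, block_id: int) -> None:
        entry = self.entries.setdefault(block_id, BlockEntry())
        # A reader forces any exclusive holder down to shared access.
        if entry.exclusive_holder is not None:
            entry.shared_holders.add(entry.exclusive_holder)
            entry.exclusive_holder = None
        entry.shared_holders.add(instance)

    def request_exclusive(self, instance: str, block_id: int) -> None:
        entry = self.entries.setdefault(block_id, BlockEntry())
        # Granting exclusive access revokes every other holder's access,
        # so at most one instance can modify the block at any time.
        entry.shared_holders.clear()
        entry.exclusive_holder = instance


directory = ToyGlobalCacheDirectory()
directory.request_shared("inst1", 42)
directory.request_shared("inst2", 42)
directory.request_exclusive("inst2", 42)   # inst1 loses its shared access
```

The directory holds a single `exclusive_holder` field per block, which is what makes a second concurrent writer impossible in this toy model.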
Data Block Writing Method
Oracle follows the concept of Dirty Block and Past Image of the block. Let’s understand what they are.
Whenever a server process changes or modifies a data block, it becomes a dirty block. Once a server process makes changes to the data block, the user may commit transactions, or transactions may not be committed for quite some time. In either case, the dirty block is not immediately written back to disk.
Writing dirty blocks to disk takes place under the following two conditions:
·         When a server process cannot find a clean, reusable buffer after scanning a threshold number of buffers, then the database writer process writes the dirty blocks to disk.
·         When a checkpoint takes place, the database writer process writes the dirty blocks to disk.
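The two write triggers above can be sketched with a toy buffer cache. This is an illustrative Python model only: `ToyBufferCache`, `scan_threshold`, and the method names are assumptions made up for the example, and the real database writer is far more selective about which buffers it writes.

```python
from __future__ import annotations


class ToyBufferCache:
    """Toy single-instance buffer cache; names and behavior are illustrative."""

    def __init__(self, scan_threshold: int = 2) -> None:
        self.scan_threshold = scan_threshold
        self.buffers: dict[int, dict] = {}   # block_id -> {"data", "dirty"}
        self.disk: dict[int, str] = {}       # stand-in for the datafiles

    def modify(self, block_id: int, data: str) -> None:
        # A change makes the buffer dirty; nothing is written to disk yet.
        self.buffers[block_id] = {"data": data, "dirty": True}

    def _write_dirty(self) -> int:
        """Write every dirty buffer to disk and return how many were written."""
        written = 0
        for block_id, buf in self.buffers.items():
            if buf["dirty"]:
                self.disk[block_id] = buf["data"]
                buf["dirty"] = False
                written += 1
        return written

    def checkpoint(self) -> int:
        # Trigger 2: a checkpoint flushes the dirty buffers.
        return self._write_dirty()

    def find_free_buffer(self) -> int | None:
        # Trigger 1: scan up to scan_threshold buffers for a clean one.
        for scanned, (block_id, buf) in enumerate(self.buffers.items()):
            if scanned >= self.scan_threshold:
                break
            if not buf["dirty"]:
                return block_id
        # No clean buffer found within the threshold: force a write.
        self._write_dirty()
        return next(iter(self.buffers), None)


cache = ToyBufferCache(scan_threshold=2)
cache.modify(1, "a")
cache.modify(2, "b")        # both buffers dirty, disk still empty
cache.find_free_buffer()    # scan fails, so the dirty buffers are written
```

Note that `modify` never touches `disk`; only the two triggers do, which mirrors the point that a commit alone does not force the block out.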
As we are aware, a typical data block is not written to the disk immediately, even after it becomes dirty as the result of an update.
When another instance requests the same dirty data block for write or read purposes, the owning instance keeps an image of the block and ships the block itself to the requesting instance. This retained image of the block is called the past image (PI) and is kept in memory.
In the event of instance failure, Oracle can reconstruct the current version of the block by reading the PIs from memory. There can be more than one past image in memory, depending on how many times the data block was requested while it was dirty. The process of writing the blocks back to the I/O device (disk storage unit) depends on the checkpoint schedule defined by the DBA for the RAC cluster. Once the checkpoint interval is reached, Oracle’s Database Writer (DBWR) process initiates an asynchronous write of the dirty blocks to disk.
When the write takes place, a message is sent across Cache Fusion to change the status of the block in the other instances, and the past images (PIs) on all other instances are invalidated and discarded.
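The life cycle of a past image can be traced with a short Python sketch. It models only the write-request case, in which the owner hands over the current block and keeps a PI; `ToyInstance`, `ship_dirty_block`, and `write_to_disk` are hypothetical names for this illustration and do not correspond to Oracle internals.

```python
from __future__ import annotations


class ToyInstance:
    """Toy RAC instance holding current blocks and past images (illustrative)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.current: dict[int, str] = {}            # block_id -> current data
        self.past_images: dict[int, list[str]] = {}  # block_id -> retained PIs


def ship_dirty_block(owner: ToyInstance, requester: ToyInstance, block_id: int) -> None:
    """Ship a dirty block for writing; the owner keeps a past image in memory."""
    data = owner.current.pop(block_id)
    owner.past_images.setdefault(block_id, []).append(data)
    requester.current[block_id] = data


def write_to_disk(writer: ToyInstance, others: list[ToyInstance],
                  block_id: int, disk: dict[int, str]) -> None:
    """Write the current block, then discard every past image of it."""
    disk[block_id] = writer.current[block_id]
    for inst in [writer, *others]:
        inst.past_images.pop(block_id, None)


a, b, c = ToyInstance("A"), ToyInstance("B"), ToyInstance("C")
a.current[7] = "v1"
ship_dirty_block(a, b, 7)     # A keeps PI "v1"
b.current[7] = "v2"           # B modifies the block
ship_dirty_block(b, c, 7)     # B keeps PI "v2"; two PIs now exist in the cluster
disk: dict[int, str] = {}
write_to_disk(c, [a, b], 7, disk)   # the write invalidates both PIs
```

After the final call, `disk[7]` holds the latest version and neither A nor B retains a past image, matching the invalidation step described above.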
For more details, refer to Oracle Metalink Document Note # 139436.1 titled, “Understanding 9i Real Application Clusters Cache Fusion.”
Internal Lock Messaging in RAC
Remember, Oracle uses a lock escalation mechanism to maintain cache coherency. Only one buffer in the cluster can hold a given block in the exclusive “xcur” state at any one time, and to modify a block, an instance must first assign the xcur state to the buffer containing it.

For example, if another instance requests to read the same block in its most current version, Oracle sends a message to change the access mode from exclusive to shared, and the holding instance keeps a past image (PI) buffer if the buffer contained a dirty (changed) block. It then sends a “current read” version of the block to the requesting instance. The original instance keeps a copy in current mode, but the overall status of the block becomes global. Unlike the exclusive state, multiple copies of the block in shared current (scur) mode can be cached at any time.
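The downgrade from exclusive to shared described in this example can be written as a small state transition. The following Python sketch follows the description in the text rather than Oracle's actual code path; the functions `serve_current_read` and `count_exclusive` and the dictionary layout are assumptions made for the example, with only the state names xcur, scur, and pi taken from the discussion above.

```python
from __future__ import annotations


def serve_current_read(states: dict[str, set[str]], holder: str,
                       requester: str, dirty: bool) -> None:
    """Downgrade the holder from xcur to scur and give the requester scur.

    `states` maps an instance name to the buffer states it holds for one block.
    """
    if "xcur" not in states.get(holder, set()):
        raise ValueError(f"{holder} does not hold the block in xcur state")
    # The holder keeps a shared current copy, plus a past image if it was dirty.
    states[holder] = {"scur", "pi"} if dirty else {"scur"}
    # The requester receives a current-read copy in shared mode.
    states.setdefault(requester, set()).add("scur")


def count_exclusive(states: dict[str, set[str]]) -> int:
    """Number of instances holding the block in xcur; should never exceed 1."""
    return sum("xcur" in held for held in states.values())


states = {"inst1": {"xcur"}}
serve_current_read(states, "inst1", "inst2", dirty=True)
# inst1 now holds scur and pi, inst2 holds scur, and no instance holds xcur
```

Keeping the states in a plain dictionary makes the invariant easy to check: `count_exclusive` returns 1 before the request and 0 afterwards, while any number of instances may hold scur.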
In early versions of Oracle OPS, one master instance kept track of the lock status, so if the master instance crashed, the entire OPS system went down. Obviously, this was a serious shortcoming, remedied in RAC. In later versions of OPS and RAC, only the uncommitted transactions on the instance that goes down are lost. The other instances stay active.
In RAC there is still a master node: the first node to start up becomes the “master” node. However, this is strictly a bookkeeping method, and there are no repercussions to the cluster if the master node dies. The Cache Fusion mechanisms for the Global Cache Service (GCS) and the Global Enqueue Service (GES) are global resources, running on all nodes in the cluster, serving to maintain copies of the global dictionary.
Now that we understand the RAC block updating process, we are ready to move even deeper into RAC internals. Our next installment will examine RAC invalidation mechanisms.
