Friday, November 2, 2012

CLUSTER HA Components HA


Clusterware Components, Processes and Agents
Overview
·         Oracle Clusterware Version 11g Release 2 introduces the concept of the agent.
·         Agents are multi-threaded daemon programs that provide start, stop, check and clean actions for different resource types.
·         For example, the oraagent for crsd starts ASM, the Oracle listener and the SCAN listener.
·         Agents can also receive, process and forward events to clients.
·         The standard agents in Oracle Clusterware 11g Release 2 are oraagent, orarootagent and cssdagent. Additionally, there can be an application agent and a script agent.
·         Agents create their own log files. These log files are stored under ORA_CRS_HOME, in a directory associated with the name of the agent.
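
The start, stop, check and clean actions can be pictured as a small interface that every agent implements. The sketch below is only a conceptual model: the class and function names (ResourceAgent, DemoListenerAgent, ensure_online) are hypothetical and are not part of any Oracle API, and the "resource" is just an in-memory flag.

```python
from abc import ABC, abstractmethod


class ResourceAgent(ABC):
    """Hypothetical model of an agent's entry points (not Oracle's API)."""

    @abstractmethod
    def start(self) -> bool:
        """Bring the resource online; return True on success."""

    @abstractmethod
    def stop(self) -> bool:
        """Take the resource offline gracefully; return True on success."""

    @abstractmethod
    def check(self) -> str:
        """Report the current state as 'ONLINE' or 'OFFLINE'."""

    @abstractmethod
    def clean(self) -> None:
        """Forcefully reset the resource after a failed start or stop."""


class DemoListenerAgent(ResourceAgent):
    """Toy agent that tracks an in-memory flag instead of a real listener."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> bool:
        self.running = True
        return True

    def stop(self) -> bool:
        self.running = False
        return True

    def check(self) -> str:
        return "ONLINE" if self.running else "OFFLINE"

    def clean(self) -> None:
        self.running = False


def ensure_online(agent: ResourceAgent) -> str:
    """Check the resource; if it is not online, clean up and start it once."""
    if agent.check() != "ONLINE":
        agent.clean()
        agent.start()
    return agent.check()
```

A monitoring loop would call ensure_online periodically for each resource it owns; that loop is left out here to keep the sketch short.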

There are a number of different processes associated with Oracle Clusterware. These processes are rolled up into several different Clusterware components. The following table lists the components and their associated processes, and describes the function of each component/process:

Component | Process | Description
--- | --- | ---
Oracle High Availability Services (OHAS) | ohasd | Responsible for starting the rest of the Oracle Clusterware stack on a given node. ohasd is a brand new cluster startup framework in Oracle Clusterware 11g Release 2 that replaces the old init scripts.
Cluster Ready Service (CRS) | crsd | See the section titled CRS below for more information on this component and the crsd process.
Cluster Synchronization Service (CSS) | ocssd, cssdmonitor, cssdagent | See the section titled CSS below for more information on this component and the ocssd process.
Event Manager (EVM) | evmd, evmlogger | Responsible for publishing Clusterware events.
Cluster Time Synchronization Service (CTSS) | octssd | Provides time synchronization services in an Oracle 11g Release 2 cluster.
Oracle Notification Service (ONS) | ons, eons | A publish-and-subscribe service responsible for communicating Fast Application Notification (FAN) events.
Oracle Agent | oraagent | Works in conjunction with FAN to run scripts when specific FAN events occur.
Oracle Root Agent | orarootagent | Helps crsd manage resources that are owned by root.
Grid Naming Service (GNS) | gnsd | Provides gateway services between the multicast domain name service (mDNS) and external DNS services. GNS provides name resolution within a cluster.
Grid Plug and Play (GPnP) | gpnpd | Supports Grid Plug and Play services, new in Oracle Clusterware 11g Release 2. GPnP provides services that allow you to easily add or remove nodes from a given cluster.
Multicast domain name service (mDNS) | mdnsd | Handles DNS requests.


CRS
CRS is responsible for managing high availability (HA) operations within the cluster. The crsd process manages CRS operations. CRS manages two kinds of resources:
  • Cluster resources
  • Local resources 
A cluster resource is a resource that is cluster aware and is managed over the entire cluster via the crsctl command. Cluster resources are subject to cross-node switchover and failover. This means that a resource can be assigned to one or more nodes, but may be re-assigned to a different node (or failed over to a different node) on demand. Cluster resources are managed by the CRS daemon (crsd). CRS uses the OCR to manage the resource.
A local resource runs on each node of the cluster. Examples of cluster resources are RAC instances and listeners. CRS can control these services, starting them, stopping them and restarting them in the event of a failure.
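
The difference between the two kinds of resources can be illustrated with a toy model: a cluster resource has a single current placement that moves when its node fails, while a local resource simply exists on every surviving node. This is a simplified sketch, not how crsd is implemented; the ToyCluster class, its methods, and the resource names demo.clustered and demo.local are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyCluster:
    """Hypothetical in-memory model of resource placement."""

    nodes: Set[str]
    # Cluster resources: resource name -> node currently hosting it.
    placement: Dict[str, str] = field(default_factory=dict)
    # Local resources: run on every node and never move.
    local_resources: Set[str] = field(default_factory=set)

    def place(self, resource: str, node: str) -> None:
        """Assign a cluster resource to one node of the cluster."""
        if node not in self.nodes:
            raise ValueError(f"unknown node: {node}")
        self.placement[resource] = node

    def fail_node(self, failed: str) -> Dict[str, str]:
        """Remove a node and fail its cluster resources over to a survivor."""
        self.nodes.discard(failed)
        moved: Dict[str, str] = {}
        for resource, node in self.placement.items():
            if node == failed and self.nodes:
                # Pick the first surviving node alphabetically (arbitrary policy).
                moved[resource] = sorted(self.nodes)[0]
        self.placement.update(moved)
        return moved

    def local_instances(self, resource: str) -> List[str]:
        """Nodes on which a local resource is present."""
        return sorted(self.nodes) if resource in self.local_resources else []


cluster = ToyCluster(nodes={"node1", "node2", "node3"}, local_resources={"demo.local"})
cluster.place("demo.clustered", "node1")
moved = cluster.fail_node("node1")
print(moved)                                  # the cluster resource was relocated
print(cluster.local_instances("demo.local"))  # the local resource stays on survivors
```

The placement policy here (first surviving node in alphabetical order) is an arbitrary choice made to keep the example deterministic.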

CSS
CSS is a service that is responsible for determining which nodes of the cluster are available to the cluster. CSS also supports other cluster processes by providing node membership information and locking services. CSS uses the private interconnect for communications, as well as the Clusterware voting disks. Through a combination of heartbeat messages over the interconnect and the voting disks, CSS determines the status of each node of the cluster.
CSS is also responsible for interfacing with any third-party Clusterware vendors. In these configurations CSS will interface with the vendor Clusterware and maintain the node membership information.
The CSS service is critical to Clusterware operations as it fences the operations of the nodes of the cluster. For example, if the interconnect fails on a given node then the failed node will no longer be able to communicate with the rest of the cluster. Without CSS controlling the situation, the isolated node could cause severe issues on the cluster including corruption of database data. This is what is known as a split-brain condition.
To avoid split-brain conditions, CSS sends heartbeat messages across the cluster interconnect. If a node fails (for example, its interconnect fails or the node freezes), that node will no longer send heartbeat messages. The surviving nodes will detect that heartbeat messages from the node are no longer arriving. CSS then uses the voting disks to determine which node has gone offline, and works with Oracle Clusterware to evict the missing node from the cluster.
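
The detection logic described above can be sketched as follows. This is a deliberately simplified model and not Oracle's actual algorithm: the function names, the 30-second timeout, and the "healthy" / "isolated" / "down" labels are assumptions made for this example. It treats a node as a candidate for eviction when its network heartbeat is stale, and uses the last voting-disk heartbeat to distinguish a node that is cut off from one that has stopped entirely.

```python
from typing import Dict, List

ASSUMED_TIMEOUT_SECONDS = 30.0  # illustrative value chosen for this sketch


def classify_nodes(
    network_hb: Dict[str, float],
    disk_hb: Dict[str, float],
    now: float,
    timeout: float = ASSUMED_TIMEOUT_SECONDS,
) -> Dict[str, str]:
    """Classify each node from its last network and voting-disk heartbeat times."""
    status: Dict[str, str] = {}
    for node in sorted(network_hb):
        network_stale = now - network_hb[node] > timeout
        # A node with no recorded disk heartbeat is treated as stale.
        disk_stale = now - disk_hb.get(node, float("-inf")) > timeout
        if not network_stale:
            status[node] = "healthy"
        elif disk_stale:
            status[node] = "down"       # silent on both interconnect and disk
        else:
            status[node] = "isolated"   # still writing to disk, but cut off
    return status


def eviction_candidates(status: Dict[str, str]) -> List[str]:
    """Nodes that the surviving members would want removed from the cluster."""
    return [node for node, state in status.items() if state != "healthy"]


network = {"node1": 100.0, "node2": 100.0, "node3": 60.0, "node4": 50.0}
disk = {"node1": 100.0, "node2": 99.0, "node3": 98.0, "node4": 50.0}
state = classify_nodes(network, disk, now=100.0)
print(state)
print(eviction_candidates(state))
```

In this model an "isolated" node is the dangerous case: it is still running and could keep writing to shared storage, which is exactly the split-brain scenario that eviction is meant to prevent.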
CSS uses several different processes. Failure of any of these processes will result in a restart of the cluster. The CSS processes are:
  • CSS daemon (ocssd) – Manages cluster node membership information. It’s also used in non-RAC installs to provide Group Services (GS). ASM uses GS to register itself and its disk groups.
  • CSS Agent (cssdagent) – Monitors the cluster and provides fencing services (this was the oprocd daemon in previous versions). The CSS Agent is also responsible for monitoring vendor Clusterware.
  • CSS Monitor (cssdmonitor) – Monitors for node hangs, monitors the ocssd process for hangs, and is also responsible for monitoring vendor Clusterware.

Oracle Clusterware 11g Release 2 changes the way that Clusterware is started. In a Linux install, Clusterware is now started with one init script, init.ohasd, which replaces a number of scripts that were previously used. The ohasd daemon sets off a cascade of processes as outlined in the following graphic:


Note: This graphic only summarizes the processes started by Oracle Clusterware.
You can control the startup or shutdown of the cluster via the crsctl command. For example, use crsctl start cluster to start the cluster and crsctl stop cluster to stop the cluster. You can also use the crsctl check cluster command to check on the status of the cluster. See the section titled “Managing Oracle Clusterware” for more information on crsctl and managing Oracle Clusterware.
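
If you want to drive these three commands from a script, a thin wrapper is usually enough. The sketch below is one possible approach, assuming crsctl is on the PATH (or that you pass its full path) and that the calling user has the privileges the command needs. The function names are hypothetical; only the crsctl start cluster, crsctl stop cluster and crsctl check cluster commands come from the text above.

```python
import subprocess
from typing import List

ALLOWED_ACTIONS = ("start", "stop", "check")


def crsctl_cluster_command(action: str, crsctl_path: str = "crsctl") -> List[str]:
    """Build the argument list for 'crsctl <action> cluster'."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [crsctl_path, action, "cluster"]


def run_crsctl_cluster(action: str, crsctl_path: str = "crsctl") -> subprocess.CompletedProcess:
    """Run the command and return the completed process without raising on failure."""
    return subprocess.run(
        crsctl_cluster_command(action, crsctl_path),
        capture_output=True,
        text=True,
        check=False,
    )


# Building the command does not require a Clusterware installation.
print(" ".join(crsctl_cluster_command("check")))
```

Restricting the action to a fixed set and passing an argument list (rather than a shell string) keeps the wrapper from running anything other than the three commands it was written for.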
