Overview of Monitoring and Tuning Oracle RAC
Database Reliability Framework
The Database Reliability Framework (DRF) is a proactive and automatic monitoring and correction framework.
The Database Reliability Framework monitors various metrics across different layers of the database continuously for the purpose of detecting problems before any disruption of service occurs. DRF improves database availability by monitoring critical events in the database and taking corrective actions when these critical events hit certain thresholds.
After a problem is identified, an action is implemented automatically. Actions include resizing internal memory structures or changing the priority of Oracle RAC processes, depending on the identified problem. For example, consider a system which has high
redo waits with no I/O contention based on the metrics collected over time. If there is enough CPU resource available, then a possible action plan for reducing the
redo waits is to move the LGWR process to higher priority to ensure it gets enough CPU.
Verifying the Interconnect Settings for Oracle RAC
Use SQL statements to verify the interconnect settings for Oracle RAC.
The interconnect and internode communication protocols can affect Cache Fusion performance. In addition, the interconnect bandwidth, its latency, and the efficiency of the IPC protocol determine the speed with which Cache Fusion processes block transfers.
Cache Fusion A diskless cache coherency mechanism in Oracle RAC that provides copies of blocks directly from a holding instance's memory cache to a requesting instance's memory cache.
To verify the interconnect settings of the Oracle RAC database instance to which you are connected, query the
V$CONFIGURED_INTERCONNECTS views. For example:
Verify Interconnect Settings with V$CLUSTER_INTERCONNECTS
SQL> SELECT * FROM V$CLUSTER_INTERCONNECTS; NAME IP_ADDRESS IS_PUBLIC SOURCE --------------- -------------- --- ------------------------------- eth2 10.137.20.181 NO Oracle Cluster Repository
You can query the
GV$CLUSTER_INTERCONNECTS view to display the entries for all of the instances in the cluster.
Verify Interconnect Settings with V$CONFIGURED_INTERCONNECTS
SQL> SELECT * FROM V$CONFIGURED_INTERCONNECTS; NAME IP_ADDRESS IS_PUBLIC SOURCE --------------- --------------- --- ------------------------------- eth2 10.137.20.181 NO Oracle Cluster Repository eth0 10.137.8.225 YES Oracle Cluster Repository
Influencing Interconnect Processing
Once your interconnect is operative, you cannot significantly influence its performance. However, you can influence an interconnect protocol’s efficiency by adjusting the interprocess communication (IPC) buffer sizes.
The Oracle Cluster Registry (OCR) stores your system’s interconnect information. Use the Oracle Interface Configuration (OIFCFG) command-line utility
oifcfg getif command or the OCRDUMP utility to identify the interconnect that you are using. You can then change the interconnect that you are using by running an OIFCFG command.
Although you rarely need to set the
CLUSTER_INTERCONNECTS parameter, you can use it to assign a private network IP address or NIC as in the following example:
Performance Views in Oracle RAC
Each instance has a set of instance-specific views, which are prefixed with
You can also query global dynamic performance views to retrieve performance information from all of the qualified instances. Global dynamic performance view names are prefixed with
GV$ view retrieves the
V$ view information from all qualified instances. In addition to the
V$ information, each
GV$ view contains an extra column named
INST_ID of data type
INST_ID column displays the instance number from which the associated
V$ view information was obtained.
You can use the
INST_ID column as a filter to retrieve
V$ information from a subset of available instances. For example, the following query retrieves the information from the
V$LOCK view for instances 2 and 5:
SQL> SELECT * FROM GV$LOCK WHERE INST_ID = 2 OR INST_ID = 5;
Creating Oracle RAC Data Dictionary Views with CATCLUST.SQL
If you did not create your Oracle RAC database with the Database Configuration Assistant (DBCA), then you must run the
CATCLUST.SQL script to create views and tables related to Oracle RAC. You must have
SYSDBA privileges to run this script.
Automatic Workload Repository in Oracle RAC Environments
You can use Automatic Workload Repository to monitor performance statistics related to Oracle RAC databases.
Automatic Workload Repository (AWR) automatically generates snapshots of the performance data once every hour and collects the statistics in the workload repository. In Oracle RAC environments, each AWR snapshot captures data from all active instances in the cluster. The data for each snapshot set is captured from the same point in time. AWR stores the snapshot data for all instances in the same table and the data is identified by an instance qualifier. For example, the
BUFFER_BUSY_WAIT statistic shows the number of buffer waits on each instance. AWR does not store data that is aggregated from across the entire cluster. In other words, the data is stored for each individual instance.
Automatic Workload Repository (AWR) A built-in repository that exists in every Oracle database. At regular intervals, Oracle Database makes a snapshot of all of its vital statistics and workload information and stores them in the AWR.
ADDM can analyze performance using data collected from all instances and present it at different levels of granularity, including:
- Analysis for the entire cluster
- Analysis for a specific database instance
- Analysis for a subset of database instances
To perform these analyses, you can run the ADDM Advisor in ADDM for Oracle RAC mode to perform an analysis of the entire cluster; in Local ADDM mode to analyze the performance of an individual instance; or in Partial ADDM mode to analyze a subset of instances. Activate ADDM analysis using the advisor framework through Advisor Central in Oracle Enterprise Manager, or through the
DBMS_ADDM PL/SQL packages.
Active Session History Reports for Oracle RAC
This section describes Active Session History (ASH) reports for Oracle RAC under the following topics:
- Overview of ASH Reports for Oracle RAC
- ASH Report for Oracle RAC: Top Cluster Events
- ASH Report for Oracle RAC: Top Remote Instance
Overview of ASH Reports for Oracle RAC
ASH report statistics provide details about Oracle Database session activity. Oracle Database records information about active sessions for all active Oracle RAC instances and stores this data in the System Global Area (SGA). Any session that is connected to the database and using CPU is considered an active session.
ASH statistics that are gathered over a specified duration can be put into ASH reports. Each ASH report is divided into multiple sections to help you identify short-lived performance problems that do not appear in the ADDM analysis. Two ASH report sections that are specific to Oracle RAC are Top Cluster Events and Top Remote Instance.
ASH Report for Oracle RAC: Top Cluster Events
The ASH report Top Cluster Events section is part of the Top Events report that is specific to Oracle RAC. The Top Cluster Events report lists events that account for the highest percentage of session activity in the cluster wait class event along with the instance number of the affected instances. You can use this information to identify which events and instances caused a high percentage of cluster wait events.
ASH Report for Oracle RAC: Top Remote Instance
The ASH report Top Remote Instance section is part of the Top Load Profile report that is specific to Oracle RAC. The Top Remote Instance report shows cluster wait events along with the instance numbers of the instances that accounted for the highest percentages of session activity. You can use this information to identify the instance that caused the extended cluster wait period.
Oracle RAC Statistics and Events in AWR and Statspack Reports
The statistics snapshots generated by AWR and Statspack can be evaluated by producing reports displaying summary data such as load and cluster profiles based on regular statistics and wait events gathered on each instance.
Most of the relevant data is summarized on the Oracle RAC Statistics Page. This information includes:
- Global cache load profile
- Global cache efficiency percentages—workload characteristics
- Global cache and Enqueue Service (GES)—messaging statistics
Additional Oracle RAC sections appear later in the report:
- Global enqueue statistics
- Global CR statistics
- Global cache transfer statistics.
Analyzing Performance Using GCS and GES Statistics
The Global Cache Service (GCS) Global cache and Enqueue Service (GES)
You can monitor GCS performance by identifying data blocks and objects which are frequently used (hot) by all instances.
High concurrency on certain blocks may be identified by GCS wait events and times.
gc current block busy wait event indicates that the access to cached data blocks was delayed because they were busy either in the remote or the local cache. This could be caused by any of the following:
- The blocks were pinned
- The blocks were held up by sessions
- The blocks were delayed by a log write on a remote instance
- A session on the same instance was already accessing a block which was in transition between instances and the current session needed to wait behind it (for example,
gc current block busy)
V$SESSION_WAIT view to identify objects and data blocks with contention. The GCS wait events contain the file and block number for a block request in p1 and p2, respectively.
An additional segment statistic,
gc buffer busy, has been added to quickly determine the busy objects without having to query the
V$SESSION_WAIT view mentioned earlier.
The AWR infrastructure provides a view of active session history which can also be used to trace recent wait events and their arguments. It is therefore useful for hot block analysis. Most of the reporting facilities used by AWR and Statspack contain the object statistics and cluster wait class category, so that sampling of the views mentioned earlier is largely unnecessary.
Oracle RAC wait events :
There are four areas of wait class waits in a RAC database, contention, message, load and block wait tuning:
Contention wait event tuning in RAC: This is tuning for contention. The gc current block busy and gc cr block busy events indicate that the remote instance received the block after a remote instance processing delay, in many cases due to a log flush. High concurrency is evidenced by the gc buffer busy event which indicates that the block was pinned or held up by a session on a remote instance. It can also indicate that a session on the same instance has already requested the block.
gc current block busy
gc cr block busy
gc buffer busy
Message wait tuning in RAC: This Indicates that no block was received from being cached in any instance. Instead a global grant was given enabling the instance to read the block from disk. If this time is long, it may be that the frequently used SQL causes a lot of disk I/O (for the cr grant) or that the workload inserts a lot of data and needs to format new blocks (for the current grant).
gc current grant 2-way
gc cr grant 2-way
Load wait event tuning in RAC: Load wait events indicate a slowdown in the global caching services (GCS) layer. Cache fusion interconnect, load issues, or SQL execution against a large working set is frequently the root cause of the below wait events.
gc current block congested
gc cr block congested:
Block wait tuning in RAC: This includes block waits for two-way and three-way wait events. These waits also indicate that the remotely cached blocks were shipped without having been busy, pinned, or requiring a log flush operation.
gc current block 3-way
gc cr block 2-way
gc cr block 3-way
There is a script that you can download from MOSC Note 135714.1 called racdiag.sql. This script will collect items such as waiting sessions, GES lock information and system statistics.
select event_id, event, count(*) cnt from dba_hist_active_sess_history where snap_id between nnn and nnn+1 and wait_class_id=3871361733 group by event_id, event order by 3;