Skip to content

Monitoring and Tuning Oracle RAC Databases

    Overview of Monitoring and Tuning Oracle RAC

    Database Reliability Framework

    The Database Reliability Framework (DRF) is a proactive and automatic monitoring and correction framework.

    The Database Reliability Framework monitors various metrics across different layers of the database continuously for the purpose of detecting problems before any disruption of service occurs. DRF improves database availability by monitoring critical events in the database and taking corrective actions when these critical events hit certain thresholds.

    After a problem is identified, an action is implemented automatically. Actions include resizing internal memory structures or changing the priority of Oracle RAC processes, depending on the identified problem. For example, consider a system which has high redo waits with no I/O contention based on the metrics collected over time. If there is enough CPU resource available, then a possible action plan for reducing the redo waits is to move the LGWR process to higher priority to ensure it gets enough CPU. 

    Verifying the Interconnect Settings for Oracle RAC

    Use SQL statements to verify the interconnect settings for Oracle RAC.

    The interconnect and internode communication protocols can affect Cache Fusion performance. In addition, the interconnect bandwidth, its latency, and the efficiency of the IPC protocol determine the speed with which Cache Fusion processes block transfers.

    Cache Fusion
    A diskless cache coherency mechanism in Oracle RAC that provides copies of blocks directly from a holding instance's memory cache to a requesting instance's memory cache.

    To verify the interconnect settings of the Oracle RAC database instance to which you are connected, query the V$CLUSTER_INTERCONNECTS and V$CONFIGURED_INTERCONNECTS views. For example:

    Verify Interconnect Settings with V$CLUSTER_INTERCONNECTS

    CopySQL> SELECT * FROM V$CLUSTER_INTERCONNECTS;
    
    NAME             IP_ADDRESS       IS_PUBLIC SOURCE
    ---------------  --------------   ---       -------------------------------
    eth2             10.137.20.181    NO        Oracle Cluster Repository

    Note:

    You can query the GV$CLUSTER_INTERCONNECTS view to display the entries for all of the instances in the cluster.

    Verify Interconnect Settings with V$CONFIGURED_INTERCONNECTS

    SQL> SELECT * FROM V$CONFIGURED_INTERCONNECTS;
    
    NAME             IP_ADDRESS       IS_PUBLIC   SOURCE
    ---------------  ---------------  ---         -------------------------------
    eth2             10.137.20.181    NO          Oracle Cluster Repository
    eth0             10.137.8.225     YES         Oracle Cluster Repository

    Influencing Interconnect Processing

    Once your interconnect is operative, you cannot significantly influence its performance. However, you can influence an interconnect protocol’s efficiency by adjusting the interprocess communication (IPC) buffer sizes.

    The Oracle Cluster Registry (OCR) stores your system’s interconnect information. Use the Oracle Interface Configuration (OIFCFG) command-line utility oifcfg getif command or the OCRDUMP utility to identify the interconnect that you are using. You can then change the interconnect that you are using by running an OIFCFG command.

    Although you rarely need to set the CLUSTER_INTERCONNECTS parameter, you can use it to assign a private network IP address or NIC as in the following example:

    CLUSTER_INTERCONNECTS=10.0.0.1
    

    Performance Views in Oracle RAC

    Each instance has a set of instance-specific views, which are prefixed with V$.

    You can also query global dynamic performance views to retrieve performance information from all of the qualified instances. Global dynamic performance view names are prefixed with GV$.

    Querying a GV$ view retrieves the V$ view information from all qualified instances. In addition to the V$ information, each GV$ view contains an extra column named INST_ID of data type NUMBER. The INST_ID column displays the instance number from which the associated V$ view information was obtained.

    You can use the INST_ID column as a filter to retrieve V$ information from a subset of available instances. For example, the following query retrieves the information from the V$LOCK view for instances 2 and 5:

    SQL> SELECT * FROM GV$LOCK WHERE INST_ID = 2 OR INST_ID = 5;
    Creating Oracle RAC Data Dictionary Views with CATCLUST.SQL

    If you did not create your Oracle RAC database with the Database Configuration Assistant (DBCA), then you must run the CATCLUST.SQL script to create views and tables related to Oracle RAC. You must have SYSDBA privileges to run this script.

    Automatic Workload Repository in Oracle RAC Environments

    You can use Automatic Workload Repository to monitor performance statistics related to Oracle RAC databases.

    Automatic Workload Repository (AWR) automatically generates snapshots of the performance data once every hour and collects the statistics in the workload repository. In Oracle RAC environments, each AWR snapshot captures data from all active instances in the cluster. The data for each snapshot set is captured from the same point in time. AWR stores the snapshot data for all instances in the same table and the data is identified by an instance qualifier. For example, the BUFFER_BUSY_WAIT statistic shows the number of buffer waits on each instance. AWR does not store data that is aggregated from across the entire cluster. In other words, the data is stored for each individual instance.

    Automatic Workload Repository (AWR)
    A built-in repository that exists in every Oracle database. At regular intervals, Oracle Database makes a snapshot of all of its vital statistics and workload information and stores them in the AWR.

    ADDM can analyze performance using data collected from all instances and present it at different levels of granularity, including:

    • Analysis for the entire cluster
    • Analysis for a specific database instance
    • Analysis for a subset of database instances

    To perform these analyses, you can run the ADDM Advisor in ADDM for Oracle RAC mode to perform an analysis of the entire cluster; in Local ADDM mode to analyze the performance of an individual instance; or in Partial ADDM mode to analyze a subset of instances. Activate ADDM analysis using the advisor framework through Advisor Central in Oracle Enterprise Manager, or through the DBMS_ADVISOR and DBMS_ADDM PL/SQL packages.

    Active Session History Reports for Oracle RAC

    This section describes Active Session History (ASH) reports for Oracle RAC under the following topics:

    Overview of ASH Reports for Oracle RAC

    ASH report statistics provide details about Oracle Database session activity. Oracle Database records information about active sessions for all active Oracle RAC instances and stores this data in the System Global Area (SGA). Any session that is connected to the database and using CPU is considered an active session.

    ASH statistics that are gathered over a specified duration can be put into ASH reports. Each ASH report is divided into multiple sections to help you identify short-lived performance problems that do not appear in the ADDM analysis. Two ASH report sections that are specific to Oracle RAC are Top Cluster Events and Top Remote Instance.

    ASH Report for Oracle RAC: Top Cluster Events

    The ASH report Top Cluster Events section is part of the Top Events report that is specific to Oracle RAC. The Top Cluster Events report lists events that account for the highest percentage of session activity in the cluster wait class event along with the instance number of the affected instances. You can use this information to identify which events and instances caused a high percentage of cluster wait events.

    ASH Report for Oracle RAC: Top Remote Instance

    The ASH report Top Remote Instance section is part of the Top Load Profile report that is specific to Oracle RAC. The Top Remote Instance report shows cluster wait events along with the instance numbers of the instances that accounted for the highest percentages of session activity. You can use this information to identify the instance that caused the extended cluster wait period.

    Oracle RAC Statistics and Events in AWR and Statspack Reports

    The statistics snapshots generated by AWR and Statspack can be evaluated by producing reports displaying summary data such as load and cluster profiles based on regular statistics and wait events gathered on each instance.

    Most of the relevant data is summarized on the Oracle RAC Statistics Page. This information includes:

    • Global cache load profile
    • Global cache efficiency percentages—workload characteristics
    • Global cache and Enqueue Service (GES)—messaging statistics

    Additional Oracle RAC sections appear later in the report:

    • Global enqueue statistics
    • Global CR statistics
    • Global CURRENT served statistics
    • Global cache transfer statistics.
    Analyzing Performance Using GCS and GES Statistics
    The Global Cache Service (GCS)
    Global cache and Enqueue Service (GES)

    You can monitor GCS performance by identifying data blocks and objects which are frequently used (hot) by all instances.

    High concurrency on certain blocks may be identified by GCS wait events and times.

    The gc current block busy wait event indicates that the access to cached data blocks was delayed because they were busy either in the remote or the local cache. This could be caused by any of the following:

    • The blocks were pinned
    • The blocks were held up by sessions
    • The blocks were delayed by a log write on a remote instance
    • A session on the same instance was already accessing a block which was in transition between instances and the current session needed to wait behind it (for example, gc current block busy)

    Use the V$SESSION_WAIT view to identify objects and data blocks with contention. The GCS wait events contain the file and block number for a block request in p1 and p2, respectively.

    An additional segment statistic, gc buffer busy, has been added to quickly determine the busy objects without having to query the V$SESSION_WAIT view mentioned earlier.

    The AWR infrastructure provides a view of active session history which can also be used to trace recent wait events and their arguments. It is therefore useful for hot block analysis. Most of the reporting facilities used by AWR and Statspack contain the object statistics and cluster wait class category, so that sampling of the views mentioned earlier is largely unnecessary.

    Oracle RAC wait events :

    There are four areas of wait class waits in a RAC database, contention, message, load and block wait tuning:

    Contention wait event tuning in RAC: This is tuning for contention. The gc current block busy and gc cr block busy events indicate that the remote instance received the block after a remote instance processing delay, in many cases due to a log flush. High concurrency is evidenced by the gc buffer busy event which indicates that the block was pinned or held up by a session on a remote instance. It can also indicate that a session on the same instance has already requested the block.

    gc current block busy

    gc cr block busy

    gc buffer busy

    Message wait tuning in RAC: This Indicates that no block was received from being cached in any instance. Instead a global grant was given enabling the instance to read the block from disk. If this time is long, it may be that the frequently used SQL causes a lot of disk I/O (for the cr grant) or that the workload inserts a lot of data and needs to format new blocks (for the current grant).

    gc current grant 2-way

    gc cr grant 2-way

    Load wait event tuning in RAC: Load wait events indicate a slowdown in the global caching services (GCS) layer. Cache fusion interconnect, load issues, or SQL execution against a large working set is frequently the root cause of the below wait events.

    gc current block congested

    gc cr block congested:

    Block wait tuning in RAC: This includes block waits for two-way and three-way wait events. These waits also indicate that the remotely cached blocks were shipped without having been busy, pinned, or requiring a log flush operation.

    gc current block 3-way

    gc cr block 2-way

    gc cr block 3-way

    There is a script that you can download from MOSC Note 135714.1 called racdiag.sql. This script will collect items such as waiting sessions, GES lock information and system statistics.

    select
     event_id,
    event,
    count(*) cnt
    from
    dba_hist_active_sess_history
    where
     snap_id between nnn and nnn+1
     and
     wait_class_id=3871361733
    group by
     event_id,
     event
     order by 3;

    Also See:

    ORAchk

    Performance Views in Oracle RAC