Exadata General Administration

This section contains topics about the general administration of Oracle Exadata Database Machine.

Outline of Roles and Responsibilities

You should determine which individuals or groups are responsible for resolving any issue that arises.

Most IT organizations have teams of database administrators, system administrators, network administrators, and storage administrators. These administrators are responsible for system implementation and ongoing operations. In an Oracle Exadata Database Machine environment, it is usually more efficient and effective to have the database administrator take the lead role for Oracle Exadata Database Machine management, with assistance from the system administrator. This is because Oracle Exadata Database Machine is engineered to run Oracle Database, and administration is specific to Oracle Database and Oracle Exadata System Software. The other teams may have distinct responsibilities or act as a second level of support.

Usually there is one individual or group that has primary responsibility for any issue that arises. This individual or group receives the first contact from Oracle Enterprise Manager Cloud Control, the help desk, or operations team when there is an issue on the system. For Oracle Exadata Database Machine, the primary contact is typically the database administrator. If the database administrator needs assistance from another team to resolve the issue, then they collaborate to resolve the issue. Ownership of the issue should remain clear.

Common Administrative Tasks for Oracle Exadata Database Machine Management

Initial system deployment is usually performed by Oracle engineers. The primary responsibilities for the database administrator begin with typical operational tasks.

Common Administrative Tasks for Oracle Exadata Database Machine Management

Task or Event | Administrator | Actions
Slow performance | Database administrator, system administrator | Receive alerts from Oracle Enterprise Manager Cloud Control that performance thresholds have been exceeded. Review system performance, CPU, memory and I/O on all servers for unusual trends. Review database performance, wait events, locking, parallelism, and execution plans.
Patch application or upgrades | Database administrator, system administrator | Apply Oracle Exadata System Software patches or upgrades, and RDMA Network Fabric switch firmware upgrades. Apply Oracle Database patches, Oracle Grid Infrastructure patches or upgrades.
System outage or failure | Database administrator, system administrator | Connect to Integrated Lights Out Manager (ILOM), verify current system state, identify hardware issue or restart system, and review logs for root cause analysis. Check surviving instances for errors, monitor performance on surviving instances, and verify application functionality has not been disrupted.
Suspected network issues | Database administrator, system administrator | Inspect network interfaces for errors or dropped packets, check if any switches have restarted, and escalate to network administration team, as needed. Inspect database-side performance to assess impact, if any.
Backup database | Database administrator, system administrator | Run database backup routines, and ensure database server backups are completed.
Failed disk replacement | Database administrator, system administrator | Receive alerts about hardware replacement, verify Oracle Auto Service Request (ASR) has opened a service request, and verify operators will allow field service technician in the data center to replace drive or provide spare drive.

Understanding the Administrative Differences with Oracle Exadata Database Machine

Most administration tasks are similar on Oracle Exadata Database Machine servers as on traditional database servers and storage servers, but there are some differences.

The following list shows the differences and exceptions for Oracle Exadata Database Machine servers:

  • Oracle Exadata Database Machine database servers, RDMA Network Fabric switches, and other components are configured with settings based on testing and performance criteria. Changes to these settings, such as database server firmware or kernel parameters, made because of company policy or for other reasons should be reviewed for their potential impact on Oracle Exadata Database Machine.
  • Restarting a server incorrectly can disrupt the database. The storage servers have special procedures and guidelines that must be followed to minimize disruption, such as off-lining grid disks before restarting the server, and not restarting more than one server at a time.
  • Storage servers cannot be modified the same way as the database servers. Network changes, such as those for the NTP servers or DNS servers, are done using the ipconf utility. Network changes cannot be done manually by editing the configuration files. In addition, no software or additional packages can be installed on the storage servers. This restriction includes monitoring software. Storage server system updates are provided by Oracle Exadata System Software upgrades.
  • Storage servers do not require backups. Each storage server has a self-maintained internal USB drive or M.2 device that can be used for cell recovery. Backup clients cannot be installed on the storage servers.
  • Oracle wait events in Oracle Real Application Clusters (Oracle RAC) databases using storage servers may include events with %cell% in the name. These events are related to the storage servers.
  • The Oracle Database V$CELL views include rows for any database using Oracle Exadata Storage Server.
  • Oracle Automatic Storage Management (Oracle ASM) disk path names are of the format o/cell_ip_address/cell_griddisk_name, such as the following:

o/192.168.10.1/data_CD_01_dm01cel01

  • SQL plans may include the word STORAGE (for example, TABLE ACCESS STORAGE FULL) to indicate that some operations may be off-loaded to the storage servers.
  • Operations such as backup and recovery use Oracle Recovery Manager (RMAN), and all data for backup and recovery continues to pass through the database instances. The backup clients for RMAN should be installed on the database servers in Oracle Exadata Database Machine to facilitate integration with enterprise backup solutions in the same way as in traditional environments.
  • The practice of deploying one or more non-production environments for development, testing and quality assurance still applies for Oracle Exadata Database Machine environments.
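
The Oracle ASM disk path format described in the list above (o/cell_ip_address/cell_griddisk_name) can be split into its parts when you need to map a disk path back to a storage server. The following is a minimal sketch in POSIX shell. The function name parse_asm_path is hypothetical and is not an Oracle utility.

```shell
# Split an Oracle ASM disk path of the form o/cell_ip_address/cell_griddisk_name.
# parse_asm_path is a hypothetical helper name, not part of any Oracle tool.
# The function body runs in a subshell, so its variables do not leak.
parse_asm_path() (
    path=$1
    case $path in
        o/*/*) ;;                                   # expected Exadata form
        *) echo "not an Exadata grid disk path: $path" >&2; exit 1 ;;
    esac
    rest=${path#o/}        # strip the leading "o/"
    cell_ip=${rest%%/*}    # text before the next "/"
    griddisk=${rest#*/}    # text after it
    printf '%s %s\n' "$cell_ip" "$griddisk"
)

# Prints the cell IP address and grid disk name separated by a space.
parse_asm_path o/192.168.10.1/data_CD_01_dm01cel01
```

The sketch only checks the overall shape of the path. It does not validate the IP address or confirm that the grid disk exists.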

Powering On and Off Oracle Exadata Rack

This section contains the procedures for powering on and off the components of an Oracle Exadata Rack.

Non-emergency Power Procedures

When the outage is planned, use these procedures for powering on and off the components of Oracle Exadata Rack in an orderly fashion.

Powering On Oracle Exadata Rack

Oracle Exadata Rack is powered on by either pressing the power button on the front of the servers, or by logging in to the ILOM interface, and applying power to the system. When a database server is powered on and the operating system boots, Oracle Clusterware is automatically started, if it is installed. Oracle Clusterware then starts all resources that are configured to start automatically.

The power on sequence is as follows:

  1. Rack, including switches.

Ensure the switches have had power applied for a few minutes to complete power-on configuration before starting Exadata Storage Servers.

  2. Exadata Storage Servers.

Ensure all Exadata Storage Servers complete the boot process before starting the database servers. This may take five to ten minutes before all services start.

  3. Database servers.

Powering On Servers Remotely using ILOM

Servers can be powered on remotely using the Integrated Lights Out Manager (ILOM) interface.

The ILOM can be accessed using the Web console, the command-line interface (CLI), IPMI, or SNMP. For example, to apply power to server dm01cel01 using IPMI, where dm01cel01-ilom is the host name of the ILOM for the server to be powered on, run the following command from a server that has IPMItool installed:

# ipmitool -I lanplus -H dm01cel01-ilom -U root chassis power on

The preceding command causes the system to prompt for the password.
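
When several servers must be powered on in sequence, it can help to generate the commands first and review them before running anything. The following sketch prints, but does not run, one ipmitool command per server. It assumes that each ILOM host name is the server name followed by -ilom, as in the preceding example; that naming is a site convention, not a requirement. The function name ilom_power_cmds is hypothetical.

```shell
# Print (do not run) one ipmitool command per server, in the order given.
# Assumption: the ILOM host name is "<server>-ilom", as in the example above.
# ilom_power_cmds is a hypothetical helper, not an Oracle or IPMItool command.
ilom_power_cmds() {
    action=$1      # "on", "off", or "status"
    shift
    for server in "$@"; do
        printf 'ipmitool -I lanplus -H %s-ilom -U root chassis power %s\n' \
            "$server" "$action"
    done
}

# Storage servers are listed before database servers, matching the power-on sequence.
ilom_power_cmds on dm01cel01 dm01cel02 dm01db01
```

Each generated command still prompts for the ILOM password when it is run.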

Powering Off Oracle Exadata Rack

Power off the components of the Oracle Exadata Rack in the correct order.

The power off sequence for Oracle Exadata Rack is as follows:

  1. Database servers (Oracle Exadata Database Machine only).
  2. Exadata Storage Servers.
  3. Rack, including switches.

Powering Off Database Servers

Stop Oracle Clusterware before restarting or shutting down a database server. Oracle Clusterware is stopped using the following command:

crsctl stop cluster

The following procedure is the recommended shutdown procedure for database servers:

  1. Stop Oracle Clusterware using the following command:

# GRID_HOME/grid/bin/crsctl stop cluster

If any resources managed by Oracle Clusterware are still running after issuing the crsctl stop cluster command, then the command fails. Use the -f option to unconditionally stop all resources, and stop Oracle Clusterware.

  2. Shut down the operating system using the following command:

# shutdown -h now

Powering Off Oracle Exadata Storage Servers

Oracle Exadata Storage Servers are powered off and restarted using the Linux shutdown command.

The following command shuts down Oracle Exadata Storage Server immediately:

# shutdown -h now

When powering off Oracle Exadata Storage Servers, all storage services are automatically stopped.

If you use the -r option, then the shutdown command shuts down and then restarts Oracle Exadata Storage Server. The now argument indicates that you want to stop the server immediately.

# shutdown -r now

Another system command to reboot a server is the reboot command. However, shutdown -r now is the preferred command to restart a server. You should never use the reboot -f command to shut down Oracle Exadata Storage Servers.

Powering Off Multiple Servers at the Same Time

The dcli utility can be used to run the shutdown command on multiple servers at the same time. Do not run the dcli utility from a server that will be shut down. For example, to shut down all Exadata Storage Servers using the dcli utility, run the command from a database server. The command syntax is as follows:

# dcli -l root -g group_name shutdown -h now

In the preceding syntax, group_name is the file that contains the list of servers to shut down: cell_group for all Exadata Storage Servers, or dbs_group for all database servers.

The following command shows the syntax to shut down all Exadata Storage Servers at the same time:

# dcli -l root -g cell_group shutdown -h now
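
Because the dcli utility must not be run from a server that will be shut down, a simple guard can check the group file before the command is issued. The following sketch assumes that the group file lists one host name per line and that the names are written the same way as the name being checked. The function names host_in_group and safe_dcli_shutdown are hypothetical. The wrapper only prints the dcli command so that it can be reviewed first.

```shell
# Return success if host name $2 appears as a whole line in group file $1.
# host_in_group and safe_dcli_shutdown are hypothetical helpers.
host_in_group() {
    grep -qxF -- "$2" "$1"
}

# Dry run: refuse if this server is in the target group, otherwise print
# the dcli command. The second argument defaults to the local short host name.
safe_dcli_shutdown() {
    group_file=$1
    this_host=${2:-$(hostname -s)}
    if host_in_group "$group_file" "$this_host"; then
        echo "refusing: $this_host is listed in $group_file" >&2
        return 1
    fi
    echo "dcli -l root -g $group_file shutdown -h now"
}

# Example (prints the command when run from a server not listed in cell_group):
#   safe_dcli_shutdown cell_group
```

If the group file uses fully qualified names, pass the fully qualified name of the local server as the second argument.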

The following example shows the power off procedure for Oracle Exadata Rack when using the dcli utility to shut down multiple servers at the same time. The commands are run from a database server.

Example Powering Off Oracle Exadata Rack Using the dcli Utility

  1. Stop Oracle Clusterware on all database servers using the following command:

# GRID_HOME/grid/bin/crsctl stop cluster -all

  2. Shut down all remote database servers using the following command:

# dcli -l root -g remote_dbs_group shutdown -h now

In the preceding command, remote_dbs_group is the file that contains a list of all the remote database servers.

  3. Shut down all Exadata Storage Servers using the following command:

# dcli -l root -g cell_group shutdown -h now

In the preceding command, cell_group is the file that contains a list of all Exadata Storage Servers.

  4. Shut down the local database server using the following command:

# shutdown -h now

  5. Remove power from the rack.

Powering On and Off Network Switches

The network switches do not have power switches. They power off when power is removed, by way of the power distribution unit (PDU) or at the breaker in the data center.

Emergency Power-off Considerations

If there is an emergency, then power to Oracle Exadata Rack should be halted immediately. The following emergencies may require powering off Oracle Exadata Rack:

  • Natural disasters such as earthquake, flood, hurricane, tornado or cyclone.
  • Abnormal noise, smell or smoke coming from the machine.
  • Threat to human safety.

Emergency Power-off Procedure

To perform an emergency power-off procedure for Oracle Exadata Rack, turn off power at the circuit breaker or pull the emergency power-off switch in the computer room. After the emergency, contact Oracle Support Services to restore power to the machine.

Emergency Power-off Switch

Emergency power-off (EPO) switches are required when computer equipment contains batteries capable of supplying more than 750 volt-amperes for more than five minutes. Systems that have these batteries include internal EPO hardware for connection to a site EPO switch or relay. Use of the EPO switch removes power from Oracle Exadata Rack.

Using Auto Service Request to Manage Hardware Faults

Auto Service Request (ASR) is designed to automatically open service requests when specific Oracle Exadata Rack hardware faults occur.

Understanding Auto Service Request

When a hardware problem is detected, Oracle ASR Manager submits a service request to Oracle Support Services. In many cases, Oracle Support Services can begin work on resolving the issue before the database administrator is even aware the problem exists. Oracle Auto Service Request (ASR) is designed to automatically open service requests when specific Oracle Exadata Rack hardware faults occur.

To enable this feature, the Oracle Exadata Rack components must be configured to send hardware fault telemetry to the Oracle ASR Manager software. This service covers components in storage servers and Oracle Database servers, such as disks and flash cards.

Oracle ASR Manager must be installed on a server that has connectivity to Oracle Exadata Rack, and an outbound Internet connection using HTTPS or an HTTPS proxy. Oracle recommends that Oracle ASR Manager be installed on a server outside of Oracle Exadata Rack. The following are some of the reasons for the recommendation:

  • If the server or the rack containing Oracle ASR Manager goes down, then Oracle ASR Manager is unavailable for all of the Oracle Exadata Database Machine components that it supports. This is very important to consider when several Oracle Exadata Database Machines use the Oracle ASR Manager.
  • In order to submit a service request (SR), the server must be able to access the Internet.

Example of Exadata Storage Server SNMP Trap

This example shows the SNMP trap for a storage server disk failure. The corresponding hardware alert code appears in the sunHwTrapFaultMessageID field (HALRT-02001 in this example).

2011-09-07 10:59:54 server1.example.com [UDP: [192.85.884.156]:61945]:
RFC1213-MIB::sysUpTime.0 = Timeticks: (52455631) 6 days, 1:42:36.31
SNMPv2-SMI::snmpModules.1.1.4.1.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapHardDriveFault
SUN-HW-TRAP-MIB::sunHwTrapSystemIdentifier = STRING: Sun Oracle Database Machine 1007AK215C
SUN-HW-TRAP-MIB::sunHwTrapChassisId = STRING: 0921XFG004
SUN-HW-TRAP-MIB::sunHwTrapProductName = STRING: SUN FIRE X4270 M2 SERVER
SUN-HW-TRAP-MIB::sunHwTrapSuspectComponentName = STRING: SEAGATE ST32000SSSUN2.0T; Slot: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultClass = STRING: NULL
SUN-HW-TRAP-MIB::sunHwTrapFaultCertainty = INTEGER: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02001
SUN-HW-TRAP-MIB::sunHwTrapFaultUUID = STRING: acb0a175-70b8-435f-9622-38a9a55ee8d3
SUN-HW-TRAP-MIB::sunHwTrapAssocObjectId = OID: SNMPv2-SMI::zeroDotZero
SUN-HW-TRAP-MIB::sunHwTrapAdditionalInfo = STRING: Exadata Storage Server: cellname  Disk Serial Number:   E06S8K
server1.example.com failure trap.

Example of Oracle Database Server SNMP Trap

This example shows the SNMP trap from an Oracle database server disk failure. The corresponding hardware alert code appears in the sunHwTrapFaultMessageID field (HALRT-02007 in this example).

2011-09-09 10:59:54 dbserv01.example.com [UDP: [192.22.645.342]:61945]:
RFC1213-MIB::sysUpTime.0 = Timeticks: (52455631) 6 days, 1:42:36.31
SNMPv2-SMI::snmpModules.1.1.4.1.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapHardDriveFault
SUN-HW-TRAP-MIB::sunHwTrapSystemIdentifier = STRING: Sun Oracle Database Machine 1007AK215C
SUN-HW-TRAP-MIB::sunHwTrapChassisId = STRING: 0921XFG004
SUN-HW-TRAP-MIB::sunHwTrapProductName = STRING: SUN FIRE X4170 M2 SERVER
SUN-HW-TRAP-MIB::sunHwTrapSuspectComponentName = STRING: HITACHI H103030SCSUN300G Slot: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultClass = STRING: NULL
SUN-HW-TRAP-MIB::sunHwTrapFaultCertainty = INTEGER: 0
SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02007
SUN-HW-TRAP-MIB::sunHwTrapFaultUUID = STRING: acb0a175-70b8-435f-9622-38a9a55ee8d3
SUN-HW-TRAP-MIB::sunHwTrapAssocObjectId = OID: SNMPv2-SMI::zeroDotZero
SUN-HW-TRAP-MIB::sunHwTrapAdditionalInfo = STRING: Exadata Database Server: db03 Disk Serial Number: HITACHI H103030SCSUN300GA2A81019GGDE5E
dbserv01.example.com failure trap.
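
When reviewing captured trap text such as the preceding examples, the hardware alert code can be pulled out of the sunHwTrapFaultMessageID field with a short filter. The following sketch assumes that the trap has been saved as text in the layout shown above, with the field name followed by "= STRING:" and the code. The function name trap_fault_id is hypothetical.

```shell
# Print the hardware alert code found in SNMP trap text read from standard input.
# Assumption: the text uses the "sunHwTrapFaultMessageID = STRING: <code>" layout
# shown in the examples above. trap_fault_id is a hypothetical helper.
trap_fault_id() {
    sed -n 's/.*sunHwTrapFaultMessageID = STRING: *\([^ ]*\).*/\1/p'
}

# Feed two lines of the database server example through the filter.
printf '%s\n' \
    'SUN-HW-TRAP-MIB::sunHwTrapFaultCertainty = INTEGER: 0' \
    'SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID = STRING: HALRT-02007' |
    trap_fault_id
```

Lines that do not contain the field are ignored, so the whole trap can be piped through the function unchanged.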

Installing and Configuring ASR

Oracle recommends installing Oracle Auto Service Request (ASR) on a standalone server running Oracle Solaris or Oracle Linux.

After installation is complete, configure fault telemetry destinations for the servers on Oracle Exadata Database Machine. The Oracle Exadata Database Machine servers can be set up during initial configuration. Oracle Exadata Deployment Assistant (OEDA) collects the configuration information, and then configures the servers.

Monitoring the System Using Oracle Enterprise Manager Cloud Control

Oracle Exadata Database Machine can be monitored by Oracle Enterprise Manager Cloud Control agents using the Oracle Exadata Plug-in and the Oracle Systems Infrastructure Plug-in. The Oracle Exadata Database Machine is discovered and monitored as a system target in Oracle Enterprise Manager Cloud Control. Individual database servers, storage servers, and switches are grouped together under the system target for the Oracle Exadata Database Machine so they can be monitored as a group.

The Oracle Exadata Storage Server metrics are collected and managed by Management Server (MS). When used with Oracle Enterprise Manager Cloud Control, the metrics are presented as Oracle Enterprise Manager Cloud Control metrics.

All Exadata server alerts are delivered to Oracle Enterprise Manager Cloud Control using SNMP. The Exadata hardware and software components are monitored by Integrated Lights Out Manager (ILOM) and Oracle Exadata System Software in the following ways:

  • Hardware components are monitored by ILOM. When a hardware component reports a failure or an exceeded threshold, ILOM reports the failure as an SNMP trap to MS. MS processes the trap, creates an alert, and delivers the alert to the Oracle Enterprise Manager Cloud Control agent.
  • Hardware and software components are also monitored by MS directly. When a failure occurs or a threshold is exceeded, MS processes the trap, creates an alert, and delivers the alert to the Oracle Enterprise Manager Cloud Control agent.

From the end-user perspective, there is no difference between the two types of alerts. The alert message contains the corrective action to resolve the alert.

Monitoring the System Using Oracle Configuration Manager

Oracle Configuration Manager collects configuration information and uploads it to the Oracle repository.

When the configuration information is uploaded daily, Oracle Support Services can analyze the data and provide better service. When a service request is logged, the configuration data is associated with the service request. The following are some of the benefits of Oracle Configuration Manager:

  • Reduced time for problem resolution
  • Proactive problem avoidance
  • Improved access to best practices, and the Oracle knowledge base
  • Improved understanding of the customer’s business needs
  • Consistent responses and services

The Oracle Configuration Manager software is installed and configured in each ORACLE_HOME directory on a server. For clustered databases, only one instance is configured for Oracle Configuration Manager. A configuration script is run on every database on the server. The Oracle Configuration Manager collectors then send their data to a centralized Oracle repository.

Determining the Server Model

Use the exadata.img.hw command to determine the model of the cell or database server.

/usr/sbin/exadata.img.hw --get model

Overview of the dbmsrv Service

Starting with Oracle Exadata System Software release 12.1.2.1.0:

  • The database nodes now run the Management Server (MS). Previously MS ran only on the storage nodes.
  • The database nodes now run a new service called Database Machine Service (dbmsrv). This new service is based on the MS that runs on the storage servers and provides enhanced management capabilities to the database nodes.
  • Starting with Oracle Exadata System Software release 12.1.2.1.2, Management Server (MS) on the database nodes does not use sudo any more. This means that configuration for sudoers is no longer needed.

Prior to Oracle Exadata System Software release 12.1.2.1.2:

  • For security reasons, Management Server on the database nodes is not run as root. However, it needs root permission to run certain utilities that monitor the system, such as disk status, ILOM, power supply unit, and to send Oracle Auto Service Request (ASR) messages and alerts. To achieve this, a sudoers configuration file, dbmsvc_sudo_conf, is added to enable the Management Server users on the database nodes to run the utilities with root privilege.

You should not disable the dbmsrv service or dbserverd, or edit the sudoers configuration file. If the entries in the file are removed, then the dbmsrv service may not be able to monitor some parts of the system. For example, if a disk fails, it might not be possible to send an Oracle ASR message in time, and this may cause a disruption on the database node and delay recovery.

To manage the Management Server service on the database nodes in Oracle Exadata System Software release 12.1.2.1.0 and later, new users and groups were added.

Using a Script to Change User IDs and Group IDs for dbmsrv

Starting with Oracle Exadata System Software releases 18.1.12 and 19.1.2, you can use the migrate_ids.sh script to change the user and group IDs for the dbmsrv users.

You can change the user ID and group ID of the dbmsrv service users if there are conflicts with the default values (for example, if you are using LDAP or if you are using session management tools that require different values from the default values).

These steps are specific to the dbmsrv service users and groups only. Do not use them to modify the user and group IDs for other Oracle products.

  1. Navigate to the /opt/oracle.SupportTools directory.
  2. Run the migrate_ids.sh script.

The migrate_ids.sh script has the following syntax and options:

migrate_ids.sh [-uid username new_uid]
               [-gid group_name new_group_id]
               [-skipdirs directory_path[,directory_path]]

  • -uid: Specify the user name and the new user ID to migrate the user to a new UID.
  • -gid: Specify the group name and the new group ID to migrate the group to a new ID.
  • -skipdirs: Specify a list of absolute paths of directories to skip during the user or group ID migration.

The script searches all directories to find files that use the uid or gid being migrated so that the script can update the owner or group access to use the new uid or gid. The -skipdirs option allows you to specify which directories do not need to be searched. The specified directories and any files within them are skipped while changing the uid and gid values.

Using the -skipdirs option can be useful if you have large NFS directories that you want to skip to make the migration faster. However, if there are files in the directories being skipped that use the uid or gid being migrated, then those files are not updated. It is your responsibility to make sure that the directories being skipped with this option do not contain such files to ensure successful migration of the IDs.

Migrate the dbmadmin user to a new user ID

This example shows how to migrate only the uid of user dbmadmin to 3001.

migrate_ids.sh -uid dbmadmin 3001

Migrate the dbmusers group to a new group ID

This example shows how to migrate only the gid of group dbmusers to 4001.

migrate_ids.sh -gid dbmusers 4001

Migrate all dbmsrv users and groups to new values

This example shows how to migrate all the user and group IDs for dbmsrv to new values.

migrate_ids.sh -uid dbmsvc 3001 -gid dbmsvc 4001

migrate_ids.sh -uid dbmadmin 3002 -gid dbmadmin 4002

migrate_ids.sh -uid dbmmonitor 3003 -gid dbmmonitor 4003

migrate_ids.sh -gid dbmusers 4004

Migrate a user ID while skipping directories

This example shows how to migrate the user ID of user dbmadmin to 3001 while not searching the files in the /proc or /sys directories.

migrate_ids.sh -uid dbmadmin 3001 -skipdirs /proc,/sys
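
Before choosing a new ID, it can be useful to confirm that the value is not already assigned. The following sketch checks a passwd-format or group-format file for a numeric ID in the third field. The function name id_in_use is hypothetical and is not part of migrate_ids.sh. The check covers local files only; if users or groups come from LDAP, a lookup such as getent passwd 3001 or getent group 4001 is a closer test, because getent also consults the configured name services.

```shell
# Return success if numeric ID $2 already appears in the third field of the
# colon-separated file $1 (for example /etc/passwd or /etc/group).
# id_in_use is a hypothetical helper; it does not query LDAP.
id_in_use() {
    awk -F: -v id="$2" '$3 == id { found = 1 } END { exit !found }' "$1"
}

# Example: only suggest the migration when UID 3001 is free locally.
#   id_in_use /etc/passwd 3001 || echo "migrate_ids.sh -uid dbmadmin 3001"
```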

Manually Changing User IDs and Group IDs for dbmsrv

In releases earlier than Oracle Exadata System Software 18.1.12 and 19.1.2, which introduced the migrate_ids.sh script, you must change the user and group IDs for the dbmsrv users manually.

You can change the user ID and group ID of the dbmsrv service users if there are conflicts with the default values (for example, if you are using LDAP or if you are using session management tools that require different values from the default values).

If possible, you should upgrade to the latest version of Oracle Exadata System Software and use the migrate_ids.sh script instead of using the manual procedure.

These steps are specific to the dbmsrv service users and groups only. Do not use them to modify the user and group IDs for other Oracle products.

  1. Shut down the services on the database server. Run the following command as root or the dbmadmin user.

dbmcli -e alter dbserver shutdown services all

  2. Change the group ID of the group.
    1. Change the assigned group ID for the group.

Run the following command as root, where new_group_ID is the new group ID, and group_name is the name of the group you want to change:

groupmod -g new_group_ID group_name

For example:

groupmod -g 3001 dbmusers

    2. Update the files containing the old group ID.

Run the following command as root:

find / -gid old_group_ID -exec chgrp -h new_group_ID {} \;

For example:

find / -gid 11140 -exec chgrp -h 3001 {} \;

  3. Change the user ID.

This step must be done after changing the group ID; otherwise, you get a “GID does not exist” error.

    1. Change the user ID assigned to the user.

Run the following command as root, where new_user_ID is the new ID for the user, new_group_ID is the new group ID assigned in the previous step, and username is the name of the user you want to change.

usermod -u new_user_ID -g new_group_ID username

For example:

usermod -u 2998 -g 3001 dbmsvc

    2. Update the files containing the old user ID.

Run the following command as root:

find / -uid old_user_ID -exec chown -h new_user_ID {} \;

For example:

find / -uid 12137 -exec chown -h 2998 {} \;

  4. Reset the setuid bit on the executable files.

The setuid bit was changed by the chgrp and chown commands. Perform the following sub-steps as root.

    1. Modify the permissions for the dbrsMain executable.

chmod 6550 /opt/oracle/dbserver/dbms/bin/dbrsMain

    2. Modify the permissions for the exaCmdHelper executable.

chmod 4550 /opt/oracle/dbserver/dbms/bin/exaCmdHelper

  5. Restart the services on the database server.

Run the following command as the root or dbmadmin user:

dbmcli -e alter dbserver startup services all

State of Storage Server and Database Servers for Operations

Operation | Storage Server | Database Server
DNS server update | Online | Online
NTP server update | Online | Online
Time zone update | Offline | Online
Admin network IP address, netmask, gateway, or host name change | Offline | Online
Client network IP address, netmask, gateway, or host name change | Offline | Online
Integrated Lights Out Manager (ILOM) IP address change | Offline | Online if the ipmitool sunoem getval/setval command is supported
Other ILOM parameter change | Online if the ipmitool sunoem getval/setval command is supported | Online if the ipmitool sunoem getval/setval command is supported
RDMA Network Fabric IP address, netmask, or host name change | Offline | Online
Partition key (pkey) change | Offline | Online

Rescue Plan

In Oracle Exadata System Software releases earlier than 12.2.1.1.0, after a storage server or database server rescue, you need to re-run multiple commands to configure items such as IORM plans, thresholds, and storage server and database server notification settings.

Oracle Exadata System Software release 12.2.1.1.0 introduced an attribute called rescuePlan for the cell and dbserver objects. When you are done configuring your database servers and storage servers, you should save the value of the rescuePlan attribute to a file. The file should be saved to a remote server because the data on the rescued server will be erased in the event of a rescue. After you rescue the server, you can retrieve the file from the remote server and run the file to restore the settings. See the examples that follow.

For security reasons, the rescue plan does not include configurations that require a password.

Rescue Plan for a Storage Cell

The rescuePlan attribute for a storage server could look like this:

$ cellcli -e list cell attributes rescuePlan

CREATE ROLE "admin"
GRANT PRIVILEGE all actions ON diagpack all attributes WITH all options TO ROLE "admin"
CREATE ROLE "diagRole"
GRANT PRIVILEGE download ON diagpack all attributes WITH all options TO ROLE "diagRole"
GRANT PRIVILEGE create ON diagpack all attributes WITH all options TO ROLE "diagRole"
GRANT PRIVILEGE list ON diagpack all attributes WITH all options TO ROLE "diagRole"
ALTER CELL accessLevelPerm="remoteLoginEnabled", diagHistoryDays="7", metricHistoryDays="7", notificationMethod="mail,snmp",
 notificationPolicy="warning,critical,clear", snmpSubscriber=((host="localhost", port=162, community="public", type=asr)),
 bbuLearnCycleTime="2016-10-17T02:00:00-07:00", bbuLearnSchedule="MONTH 1 DATE 17 HOUR 2 MINUTE 0",
 alertSummaryStartTime="2016-09-21T17:00:00-07:00", alertSummaryInterval=weekly,
 hardDiskScrubInterval=biweekly, hardDiskScrubFollowupIntervalInDays="14"
ALTER IORMPLAN objective=basic

Rescue Plan for a Database Server

The rescuePlan attribute for a database server could look like this:

$ dbmcli -e list dbserver attributes rescuePlan

CREATE ROLE "listdbserverattrs"
GRANT PRIVILEGE list ON dbserver ATTRIBUTES bbuStatus, coreCount WITH all options TO ROLE "listdbserverattrs"
ALTER DBSERVER diagHistoryDays="7", metricHistoryDays="7", bbuLearnSchedule="MONTH 1 DATE 17 HOUR 2 MINUTE 0",
 alertSummaryStartTime="2016-09-26T08:00:00-07:00", alertSummaryInterval=weekly, pendingCoreCount="128" force

Creating a Rescue Plan script for a cell

The following command stores the commands in the rescuePlan attribute to a file called rescue_cell.cli located on a remote server.

$ cellcli -e list cell attributes rescuePlan >& /location/on/remote/server/rescue_cell.cli

If you need to rescue the server, you can run the script after the server rescue to restore the settings. The following command runs the rescue_cell.cli file using the CellCLI start command:

$ cellcli -e start /location/on/remote/server/rescue_cell.cli

Creating a Rescue Plan script for a database server

The following command stores the commands in the rescuePlan attribute to a file called rescue_db.cli located on a remote server.

$ dbmcli -e list dbserver attributes rescuePlan >& /location/on/remote/server/rescue_db.cli

If you need to rescue the server, you can run the script after the server rescue to restore the settings. The following command runs the rescue_db.cli file using the DBMCLI start command:

$ dbmcli -e start /location/on/remote/server/rescue_db.cli

Using ExaWatcher Charts

ExaWatcher is a utility that collects performance data on the storage servers and database servers of an Exadata system. The data collected includes operating system statistics, such as iostat, cell statistics (cellsrvstat), and network statistics.

About ExaWatcher Charts

ExaWatcher collects and presents performance data on the storage servers and database servers of Oracle Exadata Database Machine for a specified period of time.

To extract the data collected by ExaWatcher, run GetExaWatcherResults.sh and specify the start and end time of the desired time range. The results are then placed in a compressed archive file in a directory called ExtractedResults.

For example:

$ GetExaWatcherResults.sh --from 08/24/2016_17:00:00 --to 08/25/2016_17:00:00

In Oracle Exadata System Software release 12.2.1.1.0, GetExaWatcherResults.sh also generates HTML pages that contain charts for IO, CPU utilization, cell server statistics, and alert history. The IO and CPU utilization charts use data from iostat, CPU detail uses data from mpstat, and cell server statistics use data from cellsrvstat. Alert history will be retrieved for the specified time frame.

You can find the new charts in the resulting archive file. In the archive file, there is a subdirectory named: Charts.ExaWatcher.<hostname>/<timestamp>_<duration>/, for example, Charts.ExaWatcher.xxxxceladm13.oracle.com/2016_08_24_17_00_00_01h00m00s_0.

To view the HTML pages, move the archive file to a machine with a local browser that has access to the internet. Decompress the file, which is bz2-compressed, and then extract it with tar -xvf. You can then open Charts.ExaWatcher.<hostname>/<timestamp>_<duration>/index.html in a browser. The left panel on that page shows the following menu:

ExaWatcher Menu in the Left Panel

The CellSrvStat menu item is available only when run against a storage server. The Alert History menu item is available only if there were alerts during the requested time frame.
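The decompress and untar steps described above can be combined into one small helper. The following is a sketch under assumptions: the function name extract_exawatcher_archive is hypothetical, the archive name depends on your extraction run and is passed in as an argument, and the bzip2 and tar utilities are assumed to be installed on the viewing machine.

```shell
#!/bin/sh
# Hypothetical helper: unpack an ExaWatcher results archive for viewing.
#   $1 = path to the bz2-compressed archive copied from ExtractedResults
#   $2 = directory to unpack into
extract_exawatcher_archive() {
    archive="$1"
    dest="$2"

    mkdir -p "$dest" || return 1

    # Decompress to stdout and untar in one pass, leaving the archive intact.
    bzip2 -dc "$archive" | tar -xvf - -C "$dest"
}

# After extraction, locate the chart entry page to open in a browser:
#   find "$dest" -path '*Charts.ExaWatcher.*' -name index.html
```

Streaming through a pipe avoids writing an intermediate uncompressed tar file, which can be large for long time ranges.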

Requirements for Using ExaWatcher Charts

To view the HTML pages, the generated archive file must be moved to a machine with a local browser that has access to the internet.

Due to the complexity of the ExaWatcher charts, if the Oracle Exadata Rack resides in a restricted environment, and the generated HTML files or archive file cannot be moved to an environment that has access to the internet, then you will not be able to view the ExaWatcher charts.

IO Charts

IO charts show IO performance for an entire server or for individual disks in the storage server.

IOStat Summary

IOStat Summary shows a summary of IO performance for the entire server. The four charts shown on this page are:

Statistics for IOStat Summary

Statistic | Description
Flash IOPs, Hard Disk IOPs | Total reads per second, writes per second, and IO per second (reads per second + writes per second) for the server. This uses r/s and w/s from iostat.
Flash MB/s, Hard Disk MB/s | Total read MB per second, write MB per second, and IO MB per second. This uses rsec/s and wsec/s from iostat, converted into MB.

The statistics are shown for flash and hard disks, when applicable. On Exadata Extreme Flash, there are no hard disks. On database servers, there are no flash devices.
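The conversion from rsec/s and wsec/s into MB per second mentioned in the table can be reproduced by hand when checking a chart against raw iostat output. This sketch assumes that iostat reports 512-byte sectors and that 1 MB means 1048576 bytes. The exact rounding and MB definition used by ExaWatcher are not stated here, and the function name sectors_to_mb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert an iostat sectors-per-second value to MB/s.
#   $1 = rsec/s or wsec/s value from iostat
sectors_to_mb() {
    # 512 bytes per sector, 1048576 bytes per MB (both are assumptions).
    awk -v sectors="$1" 'BEGIN { printf "%.2f\n", sectors * 512 / 1048576 }'
}

# 2048 sectors/s * 512 bytes = 1048576 bytes/s
sectors_to_mb 2048    # prints 1.00
```

The IO MB per second figure is then the sum of the converted read and write values.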

If there is a suspected I/O performance problem, the IOPs and the MB/s statistics for the storage servers can be compared to the data sheet to determine if the storage is at maximum capacity. High read times observed on the database can also be correlated to the service time and average wait time from iostat, to determine if the high times could potentially be due to the storage server. Note that the database times would typically include IOs that are satisfied from flash cache, as well as hard disk. In addition, these charts enable you to visualize any peaks during the time frame.

The partial screenshot below shows the IOPs and MB/s charts for flash and hard disk:

IO Summary Charts

Below each chart, there is a range selector that you can use to drill down to a specific time within the chart. Moving the range selector on any chart affects all charts on the page.

IO Summary Charts Showing Range Selector

When you use the range selector, the displayed chart changes to show only the data for the time range specified by the range selector.

IOStat Detail

IOStat Detail shows performance for each disk on the storage server. The following charts are shown on this page:

Statistics for IOStat Detail

Statistic | Description
Flash Service Time, Hard Disk Service Time | Average service time per disk contrasted against the range of wait times.
Flash Wait Time, Hard Disk Wait Time | Average wait time per disk.

By default, the charts include a line that depicts the average across all disks on the server. The shaded background indicates the minimum and maximum range for the statistic. You can choose to display individual disks by using the drop-down selector.

If the background image has a wide range, then this can indicate possible differences in disk performance. You can use this metric to look more closely at each individual disk on the storage server to see if there is an imbalance. If the background image has a narrow range, then that indicates the disks are performing similarly.
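The average line and the shaded minimum-to-maximum range can be approximated outside the charts from per-disk values for one sample interval. The following is an illustrative sketch only: the function name disk_range, the two-column input format (device name, then a service or wait time), and the sample device names are all assumptions, not ExaWatcher output.

```shell
#!/bin/sh
# Hypothetical helper: summarize one statistic across disks.
# Reads "device value" pairs on stdin and prints the average, minimum,
# maximum, and spread (max - min) of the values.
disk_range() {
    awk '
        NR == 1 { min = $2 + 0; max = $2 + 0 }
        {
            sum += $2
            if ($2 < min) min = $2 + 0
            if ($2 > max) max = $2 + 0
        }
        END {
            if (NR > 0)
                printf "avg=%.2f min=%.2f max=%.2f spread=%.2f\n",
                       sum / NR, min, max, max - min
        }
    '
}

# Sample values for three disks; one disk is noticeably slower.
printf 'sdc 1.0\nsdd 2.0\nsde 6.0\n' | disk_range
# prints avg=3.00 min=1.00 max=6.00 spread=5.00
```

A large spread relative to the average corresponds to a wide shaded range on the chart and suggests looking at the individual disks.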

The individual disk IOPs and MB/s for a storage server can also be compared to the data sheet numbers to see if the disks are potentially hitting their maximum capacity.

IO Detail Charts

CPU Charts

The CPU charts show CPU utilization for the server. These statistics are from iostat (avg-cpu: %user, %system, %iowait).

CPU Charts

CPU Detail

The CPU detail charts show detailed information for CPU usage, including the average CPU utilization per CPU ID. These statistics are from mpstat.

CPU Detail Charts

Cell Server Charts

Cell server statistics are useful for tracking features that are specific to Exadata storage servers. This page displays statistics related to Smart Flash Cache and Smart IOs.

Cell Server Charts

Alert History

This page displays alerts that were present during the specified time frame. Alerts may be raised from errors or issues, which may result in IO performance issues on the servers.

Alert History