Exadata Storage Servers Administration

Oracle Exadata Storage Servers contain disks and memory devices that might require maintenance.

Managing Oracle Exadata Storage Servers

This section describes how to perform maintenance on Oracle Exadata Storage Servers.

Shutting Down Exadata Storage Server

When performing maintenance on Exadata Storage Servers, it may be necessary to power down or restart the cell.

If Exadata Storage Server is to be shut down when one or more databases are running, then you must verify that taking Exadata Storage Server offline will not impact Oracle ASM disk group and database availability. The ability to take Exadata Storage Server offline without affecting database availability depends on the level of Oracle ASM redundancy used on the affected disk groups. Availability also depends on the current status of disks in other Exadata Storage Servers that have mirror copies of data for the Exadata Storage Server that you are taking offline.
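Before proceeding, you can check whether the grid disks can be safely taken offline by examining the asmdeactivationoutcome attribute. For example:

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome

It is safe to take the Exadata Storage Server offline only when asmdeactivationoutcome is Yes for all grid disks.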

  1. Optional: Configure the grid disks to remain offline after restarting the cell.

If you are planning to have multiple restarts, or you want to control when the Exadata Storage Server becomes active again, then you can perform this step. Making the grid disks inactive allows you to verify the planned maintenance activity was successful before making the grid disks available again.

    a. Set the grid disks to inactive.

CellCLI> ALTER GRIDDISK ALL INACTIVE

    b. Wait at least 30 seconds, or until Oracle ASM has completed taking the corresponding Oracle ASM disks offline.

This step is very important if you are using versions of Oracle Exadata System Software before release 18.1. If you put the commands into a script, then make sure to add a sleep command with a value over 30 seconds.
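For example, a minimal sketch of such a script; the 60-second pause is illustrative:

cellcli -e ALTER GRIDDISK ALL INACTIVE
sleep 60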

  2. Stop the cell services.

CellCLI> ALTER CELL SHUTDOWN SERVICES ALL

The preceding command checks if any disks are offline, in predictive failure status, or need to be copied to their mirrors. If Oracle ASM redundancy is intact, then the command takes the grid disks offline in Oracle ASM, and then stops the cell services. If the following error is displayed, then it may not be safe to stop the cell services because a disk group may be forced to dismount due to reduced redundancy.

Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of ALL services was not successful.
CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be forced to dismount due to reduced redundancy.
Getting the state of CELLSRV services... running
Getting the state of MS services... running
Getting the state of RS services... running

If the CELL-01548 error occurs, then restore Oracle ASM disk group redundancy and retry the command when disk status is back to normal for all the disks.

  3. Shut down the Exadata Storage Server.
  4. After performing the maintenance, restart the Exadata Storage Server. The cell services are started automatically. As part of the Exadata Storage Server startup, all grid disks are automatically changed to ONLINE in Oracle ASM.
  5. Verify that all grid disks have been successfully brought online.

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus

Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks.

  6. Optional: Change the grid disk status to ONLINE.

This step is only necessary when step 1 has been performed. If step 1 was not performed, then the grid disks were set to online automatically when the Exadata Storage Server was restarted.

CellCLI> ALTER GRIDDISK ALL ACTIVE

Checking Status of a Rebalance Operation

When dropping or adding a disk, you can check the status of the Oracle ASM rebalance operation.

  • The rebalance operation may have completed successfully. Check the Oracle ASM alert logs to confirm.
  • The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running, as shown in the example query after this list.
  • The rebalance operation may have failed. Check the ERROR column of the V$ASM_OPERATION view to determine if the rebalance operation failed.
  • Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains Oracle ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations are queued.
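For example, the following query, run from an Oracle ASM instance, reports any running rebalance operations and their estimated time to completion; no rows and no errors in the Oracle ASM alert logs indicate the rebalance completed:

SQL> SELECT inst_id, operation, state, power, sofar, est_work, est_minutes, error_code
     FROM gv$asm_operation;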

Enabling Network Connectivity with the Diagnostic ISO

If a storage server does not restart, the diagnostic ISO may be needed to access the cell so it can be manually repaired.

The diagnostic ISO should be used after other boot methods, such as booting from a USB flash drive, do not work.

The following procedure enables networking with the diagnostic ISO so files can be transferred to repair the cell:

  1. Restart the system using the diagnostics.iso file.
  2. Log in to the diagnostics shell as the root user.

When prompted, enter the diagnostics shell.

Choose from following by typing letter in '()':
(e)nter interactive diagnostics shell. Must use credentials
from Oracle support to login (reboot or power cycle to exit
the shell),
(r)estore system from NFS backup archive,

Type e to enter the diagnostics shell and log in as the root user.

If prompted, log in to the system as the root user. If you are prompted for the root user password and do not have it, then contact Oracle Support Services.

  3. Use the following command to avoid pings:

alias ping="ping -c"

  4. Make a directory named /etc/network.
  5. Make a directory named /etc/network/if-pre-up.d.
  6. Add the following lines to the /etc/network/interfaces file:

iface eth0 inet static
address IP_address_of_cell
netmask netmask_of_cell
gateway gateway_IP_address_of_cell

  7. Bring up the eth0 interface using the following command:

ifup eth0

There may be some warning messages, but the interface is operational.

  8. Use either FTP or the wget command to retrieve the files to repair the cell.

Using Extended (XT) Storage Servers

Oracle Exadata Storage Server X8-2 Extended (XT) offers a lower-cost storage option that can be used for infrequently accessed, older, or regulatory data.

About Oracle Exadata Extended (XT) Storage Servers

Oracle Exadata Extended (XT) Storage Servers help you extend the operational and management benefits of Exadata Database Machine to rarely accessed data that must be kept online.

Each Oracle Exadata XT Storage Server includes twelve 14 TB SAS disk drives with 168 TB total raw disk capacity. To achieve a lower cost, Flash is not included, and licensing the Oracle Exadata System Software is optional. Hybrid Columnar Compression is included by default, but some software features are disabled without a license.

Oracle Exadata XT Storage Servers use the same RDMA Network Fabric as the other servers in your Oracle Exadata Rack. Oracle Exadata XT Storage Servers add storage capacity while remaining transparent to applications, transparent to SQL, and retaining the same operational model. You can use the same security model and encryption used for your other Exadata storage servers.

You can add Oracle Exadata XT Storage Servers to Oracle Exadata Racks, including Eighth Rack configurations, that are X4 or newer. You must add at least 2 servers initially. After adding the initial 2 servers, you can add additional Oracle Exadata XT Storage Servers as needed. To implement high redundancy, you must have a minimum of 3 Oracle Exadata XT Storage Servers. XT Storage Servers follow the same placement patterns as High Capacity (HC) and Extreme Flash (EF) Storage Servers.

An Oracle ASM disk group should use storage provided by only one type of storage server (HC, EF, or XT). After adding the Oracle Exadata XT Storage Servers to your rack, create new disk groups to use the storage. The default disk group name for XT storage servers is XTND. However, you can use a different name as required.
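For example, a minimal sketch of creating a disk group on the XT grid disks; the disk group name, grid disk prefix, and attribute values are illustrative and should be adapted to your environment:

SQL> CREATE DISKGROUP XTND NORMAL REDUNDANCY
     DISK 'o/*/XTND_*'
     ATTRIBUTE 'compatible.asm'='19.0.0.0.0',
               'compatible.rdbms'='19.0.0.0.0',
               'au_size'='4M';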

Oracle Exadata XT Storage Servers provide fully integrated storage for your Oracle Database. You can use the new disk group with database features such as Oracle Partitioning, Oracle Automatic Data Optimization, and Oracle Advanced Compression.

What Data Can Be Stored on Oracle Exadata Extended (XT) Storage Servers?

Oracle Exadata Extended (XT) Storage Servers are intended to provide lower-cost storage for infrequently accessed, older, or regulatory data.

Oracle Exadata Extended (XT) Storage Servers help you to keep all your required data online and available for queries. This includes data such as:

  • Historical data
  • Images, BLOBs, contracts, and other large table-based objects
  • Compliance and regulatory data
  • Local backups

The XT storage servers can also provide storage for development databases which have less stringent performance requirements compared to production databases.

Enabling Smart Scan on Exadata XT Storage Servers

If you purchase Oracle Exadata System Software licenses for your Oracle Exadata XT Storage Servers, then you can enable features such as Smart Scan and Storage Indexes to improve performance.

  1. Procure or transfer Exadata System Software Licenses.

All drives must be licensed to enable Smart Scan.

  2. Modify the enableSmartStorage attribute for the XT storage servers.

You do not need to stop the storage servers first. Simply run the following command on each XT storage server that is licensed:

cellcli -e ALTER CELL enableSmartStorage=true

  3. Verify the cell has been modified.

cellcli -e "LIST CELL ATTRIBUTES name, status, enableSmartStorage"
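If several XT storage servers are licensed, you can apply the same change to all of them at once with dcli. This sketch assumes a file named xt_cell_group that lists the licensed XT cells:

# dcli -l root -g xt_cell_group cellcli -e "ALTER CELL enableSmartStorage=true"

# dcli -l root -g xt_cell_group cellcli -e "LIST CELL ATTRIBUTES name, status, enableSmartStorage"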

Managing the Hard Disks of Oracle Exadata Storage Servers

Every Oracle Exadata Storage Server in Oracle Exadata Rack has a system area, which is where the Oracle Exadata System Software resides. In Oracle Exadata Database Machine X7 and later systems, two internal M.2 devices contain the system area. In all other systems, the first two disks of Oracle Exadata Storage Server are system disks, and the portions of these system disks are referred to as the system area.

In Oracle Exadata Database Machine X7 and later systems, all the hard disks in the cell are data disks. In systems prior to Oracle Exadata Database Machine X7, the non-system area of the system disks, referred to as data partitions, is used for normal data storage. All other disks in the cell are data disks.

Starting in Oracle Exadata System Software release 11.2.3.2.0, if there is a disk failure, then Oracle Exadata System Software sends an alert stating that the disk can be replaced, and, after all data has been rebalanced out from that disk, turns on the blue OK to Remove LED for the hard disk with predictive failure. In Oracle Exadata System Software releases earlier than 11.2.3.2.0, the amber Fault-Service Required LED was turned on for a hard disk with predictive failure, but not the blue LED. In these cases, it is necessary to manually check if all data has been rebalanced out from the disk before proceeding with disk replacement.

Starting with Oracle Exadata System Software release 18.1.0.0.0 and Oracle Exadata Database Machine X7 systems, there is an additional white Do Not Service LED that indicates when redundancy is reduced to inform system administrators or field engineers that the storage server should not be powered off for services. When redundancy is restored, Oracle Exadata System Software automatically turns off the Do Not Service LED to indicate that the cell can be powered off for services.

For a hard disk that has failed, both the blue OK to Remove LED and the amber Fault-Service Required LED are turned on for the drive indicating that disk replacement can proceed. The behavior is the same in all releases. The drive LED light is a solid light in Oracle Exadata System Software releases 11.2.3.2.0 and later; the drive LED blinks in earlier releases.

Monitoring the Status of Hard Disks

You can monitor the status of a hard disk by checking its attributes with the CellCLI LIST PHYSICALDISK command.

For example, a hard disk with a status of failed (in earlier releases, the status for failed hard disks was critical) or warning - predictive failure is probably having problems and needs to be replaced. The disk firmware maintains the error counters, and marks a drive with predictive failure when internal thresholds are exceeded. The drive, not the cell software, determines whether it needs replacement.

  • Use the CellCLI command LIST PHYSICALDISK to determine the status of a hard disk:

CellCLI> LIST PHYSICALDISK WHERE disktype=harddisk AND status!=normal DETAIL
         name:                   8:4
         deviceId:               12
         deviceName:             /dev/sde
         diskType:               HardDisk
         enclosureDeviceId:      8
         errOtherCount:          0
         luns:                   0_4
         makeModel:              "HGST    H7280A520SUN8.0T"
         physicalFirmware:       PD51
         physicalInsertTime:     2016-11-30T21:24:45-08:00
         physicalInterface:      sas
         physicalSerial:         PA9TVR
         physicalSize:           7.153663907200098T
         slotNumber:             4
         status:                 failed

When disk I/O errors occur, Oracle ASM performs bad extent repair for read errors due to media errors. The disks stay online, and no alerts are sent. When Oracle ASM gets a read error on a physically addressed metadata block, it does not have mirroring for the block, and takes the disk offline. Oracle ASM then drops the disk using the FORCE option.

Monitoring Hard Disk Controller Write-through Caching Mode

The hard disk controller on each Oracle Exadata Storage Server periodically performs a discharge and charge of the controller battery. During the operation, the write cache policy changes from write-back caching to write-through caching.

Write-through cache mode is slower than write-back cache mode. However, write-back cache mode has a risk of data loss if the Oracle Exadata Storage Server loses power or fails. For Oracle Exadata System Software releases earlier than release 11.2.1.3, the operation occurs every month. For Oracle Exadata System Software release 11.2.1.3.0 and later, the operation occurs every three months, for example, at 01:00 on the 17th day of January, April, July and October.

  • To change the start time for the learn cycle, use a command similar to the following:

CellCLI> ALTER CELL bbuLearnCycleTime="2013-01-22T02:00:00-08:00"

The time reverts to the default learn cycle time after the cycle completes.

  • To see the time for the next learn cycle, use the following command:

CellCLI> LIST CELL ATTRIBUTES bbuLearnCycleTime

Oracle Exadata Storage Server generates an informational alert about the status of the caching mode for logical drives on the cell, similar to the following:

HDD disk controller battery on disk controller at adapter 0 is going into a learn
cycle. This is a normal maintenance activity that occurs quarterly and runs for
approximately 1 to 12 hours. The disk controller cache might go into WriteThrough
caching mode during the learn cycle. Disk write throughput might be temporarily
lower during this time. The message is informational only, no action is required.

  • To view the status of the battery, use a command similar to the following example:

# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -a0

BBU status for Adapter: 0

BatteryType: iBBU08
Voltage: 3721 mV
Current: 541 mA
Temperature: 43 C

BBU Firmware Status:
Charging Status : Charging
Voltage : OK
Temperature : OK
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
I2c Errors Detected : No
Battery Pack Missing : No
Battery Replacement required : No
Remaining Capacity Low : Yes
Periodic Learn Required : No
Transparent Learn : No

Battery state:
GasGuageStatus:
Fully Discharged : No
Fully Charged : No
Discharging : No
Initialized : No
Remaining Time Alarm : Yes
Remaining Capacity Alarm: No
Discharge Terminated : No
Over Temperature : No
Charging Terminated : No
Over Charged : No

Relative State of Charge: 7 %
Charger System State: 1
Charger System Ctrl: 0
Charging current: 541 mA
Absolute state of charge: 0 %
Max Error: 0 %

Exit Code: 0x00

Replacing a Hard Disk Due to Disk Failure

A hard disk outage can cause a reduction in performance and data redundancy. Therefore, the disk should be replaced with a new disk as soon as possible. When the disk fails, the Oracle ASM disks associated with the grid disks on the hard disk are automatically dropped with the FORCE option, and an Oracle ASM rebalance follows to restore the data redundancy.

An Exadata alert is generated when a disk fails. The alert includes specific instructions for replacing the disk. If you have configured the system for alert notifications, then the alert is sent by e-mail to the designated address.

After the hard disk is replaced, the grid disks and cell disks that existed on the previous disk in that slot are re-created on the new hard disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced on them, based on the disk group redundancy and the ASM_POWER_LIMIT parameter.

The following procedure describes how to replace a hard disk due to disk failure:

  1. Determine the failed disk using the following command:

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status=failed DETAIL

The following is an example of the output from the command. The slot number shows the location of the disk, and the status shows that the disk has failed.

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status=failed DETAIL
         name:                   28:5
         deviceId:               21
         diskType:               HardDisk
         enclosureDeviceId:      28
         errMediaCount:          0
         errOtherCount:          0
         foreignState:           false
         luns:                   0_5
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       0705
         physicalInterface:      sas
         physicalSerial:         A01BC2
         physicalSize:           558.9109999993816G
         slotNumber:             5
         status:                 failed

  2. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.
  3. Replace the hard disk on Oracle Exadata Storage Server and wait for three minutes. The hard disk is hot-pluggable, and can be replaced when the power is on.
  4. Confirm the disk is online.

When you replace a hard disk, the disk must be acknowledged by the RAID controller before you can use it. This does not take long.

Use the LIST PHYSICALDISK command similar to the following to ensure the status is NORMAL.

CellCLI> LIST PHYSICALDISK WHERE name=28:5 ATTRIBUTES status

  5. Verify the firmware is correct using the ALTER CELL VALIDATE CONFIGURATION command.

In rare cases, the automatic firmware update may not work, and the LUN is not rebuilt. This can be confirmed by checking the ms-odl.trc file.
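For example, you might check the trace file for recent messages about the LUN; the trace file location shown here is typical for storage servers but can vary by release, and cell_name is a placeholder:

# grep -i lun /opt/oracle/cell/log/diag/asm/cell/cell_name/trace/ms-odl.trc | tail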

Replacing a Hard Disk Due to Disk Problems

You may need to replace a hard disk because the disk is in warning – predictive failure status.

The predictive failure status indicates that the hard disk will soon fail, and should be replaced at the earliest opportunity. The Oracle ASM disks associated with the grid disks on the hard drive are automatically dropped, and an Oracle ASM rebalance relocates the data from the predictively failed disk to other disks.

An alert is sent when the disk is removed. After replacing the hard disk, the grid disks and cell disks that existed on the previous disk in the slot are re-created on the new hard disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced based on disk group redundancy and the ASM_POWER_LIMIT parameter.

  1. Determine which disk is the failing disk.

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
        "warning - predictive failure" DETAIL

The following is an example of the output. The slot number shows the location of the disk, and the status shows the disk is expected to fail.

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
         "warning - predictive failure" DETAIL
         name:                   28:3
         deviceId:               19
         diskType:               HardDisk
         enclosureDeviceId:      28
         errMediaCount:          0
         errOtherCount:          0
         foreignState:           false
         luns:                   0_3
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       0705
         physicalInterface:      sas
         physicalSerial:         E07L8E
         physicalSize:           558.9109999993816G
         slotNumber:             3
         status:                 warning - predictive failure

  2. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.
  3. Wait until the Oracle ASM disks associated with the grid disks on the hard disk have been successfully dropped. To determine if the grid disks have been dropped, query the V$ASM_DISK_STAT view on the Oracle ASM instance, as shown in the example below.
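For example, a query similar to the following, with the disk name pattern adjusted to the affected cell disk, shows the state of the corresponding Oracle ASM disks. The drop is complete when the disks no longer appear, or appear with a HEADER_STATUS of FORMER:

SQL> SELECT name, header_status, mode_status
     FROM v$asm_disk_stat
     WHERE name LIKE '%CD_03%';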
  4. Replace the hard disk on Oracle Exadata Storage Server and wait for three minutes. The hard disk is hot-pluggable, and can be replaced when the power is on.
  5. Confirm the disk is online.

When you replace a hard disk, the disk must be acknowledged by the RAID controller before you can use it. This does not take long. Use the LIST PHYSICALDISK command to ensure the status is NORMAL.

CellCLI> LIST PHYSICALDISK WHERE name=28:3 ATTRIBUTES status

  6. Verify the firmware is correct using the ALTER CELL VALIDATE CONFIGURATION command.

Replacing a Hard Disk Due to Bad Performance

A single bad hard disk can degrade the performance of other good disks. It is better to remove the bad disk from the system than let it remain.

Starting with Oracle Exadata System Software release 11.2.3.2, an underperforming disk is automatically identified and removed from active configuration. Oracle Exadata Database Machine then runs a set of performance tests. When poor disk performance is detected by CELLSRV, the cell disk status changes to normal – confinedOnline, and the hard disk status changes to warning – confinedOnline.

The following conditions trigger disk confinement:

  • Disk stopped responding. The cause code in the storage alert log is CD_PERF_HANG.
  • Slow cell disk such as the following:
    • High service time threshold (cause code CD_PERF_SLOW_ABS)
    • High relative service time threshold (cause code CD_PERF_SLOW_RLTV)
  • High read or write latency such as the following:
    • High latency on writes (cause code CD_PERF_SLOW_LAT_WT)
    • High latency on reads (cause code CD_PERF_SLOW_LAT_RD)
    • High latency on reads and writes (cause code CD_PERF_SLOW_LAT_RW)
    • Very high absolute latency on individual I/Os happening frequently (cause code CD_PERF_SLOW_LAT_ERR)
  • Errors such as I/O errors (cause code CD_PERF_IOERR).

If the disk problem is temporary and passes the tests, then it is brought back into the configuration. If the disk does not pass the tests, then it is marked as poor performance, and Oracle Auto Service Request (ASR) submits a service request to replace the disk. If possible, Oracle ASM takes the grid disks offline for testing. If Oracle ASM cannot take the disks offline, then the cell disk status stays at normal – confinedOnline until the disks can be taken offline safely.

The disk status change is associated with the following entry in the cell alert history:

MESSAGE ID date_time info "Hard disk entered confinement status. The LUN
 n_m changed status to warning - confinedOnline. CellDisk changed status to normal
 - confinedOnline. Status: WARNING - CONFINEDONLINE  Manufacturer: name  Model
 Number: model  Size: size  Serial Number: serial_number  Firmware: fw_release
 Slot Number: m  Cell Disk: cell_disk_name  Grid Disk: grid disk 1, grid disk 2
 ... Reason for confinement: threshold for service time exceeded"

The following would be logged in the storage cell alert log:

CDHS: Mark cd health state change cell_disk_name  with newState HEALTH_BAD_
ONLINE pending HEALTH_BAD_ONLINE ongoing INVALID cur HEALTH_GOOD
Celldisk entering CONFINE ACTIVE state with cause CD_PERF_SLOW_ABS activeForced: 0
inactiveForced: 0 trigger HistoryFail: 0, forceTestOutcome: 0 testFail: 0
global conf related state: numHDsConf: 1 numFDsConf: 0 numHDsHung: 0 numFDsHung: 0

The following procedure describes how to remove a hard disk once the bad disk has been identified:

  1. Illuminate the hard drive service LED to identify the drive to be replaced using a command similar to the following, where disk_name is the name of the hard disk to be replaced, such as 20:2:

cellcli -e 'alter physicaldisk disk_name serviceled on'

  2. Find all the grid disks on the bad disk.

For example:

[root@exa05celadm03 ~]# cellcli -e "list physicaldisk 20:11 attributes name, id"
        20:11   RD58EA
[root@exa05celadm03 ~]# cellcli -e "list celldisk where physicalDisk='RD58EA'"
        CD_11_exa05celadm03     normal
[root@exa05celadm03 ~]# cellcli -e "list griddisk where cellDisk='CD_11_exa05celadm03'"
        DATA_CD_11_exa05celadm03        active
        DBFS_CD_11_exa05celadm03        active
        RECO_CD_11_exa05celadm03        active
        TPCH_CD_11_exa05celadm03        active

  3. Direct Oracle ASM to stop using the bad disk immediately.

SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name;

  4. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.
  5. Ensure that the Oracle ASM disks associated with the grid disks on the bad disk have been successfully dropped by querying the V$ASM_DISK_STAT view.
  6. Remove the badly-performing disk. An alert is sent when the disk is removed.
  7. When a new disk is available, install the new disk in the system. The cell disks and grid disks are automatically created on the new hard disk.

Replacing a Hard Disk Proactively

Oracle Exadata System Software has a complete set of automated operations for hard disk maintenance when a hard disk has failed or has been flagged as problematic. However, there are situations where a hard disk must be removed proactively from the configuration.

In the CellCLI ALTER PHYSICALDISK command, the DROP FOR REPLACEMENT option checks if a normal functioning hard disk can be removed safely without the risk of data loss. However, after the execution of the command, the grid disks on the hard disk are inactivated on the storage cell and set to offline in the Oracle ASM disk groups.

To reduce the risk of having a disk group without full redundancy and proactively replace a hard disk, follow this procedure:

  1. Identify the LUN, cell disk, and grid disk associated with the hard disk.

Use a command similar to the following, where X:Y identifies the hard disk name of the drive you are replacing.

# cellcli -e "list diskmap" | grep 'X:Y'

The output should be similar to the following:

   20:5            KEBTDJ          5                       normal  559G
    CD_05_exaceladm01    /dev/sdf
    "DATAC1_CD_05_exaceladm01, DBFS_DG_CD_05_exaceladm01,
     RECOC1_CD_05_exaceladm01"

To get the LUN, issue a command similar to the following:

CellCLI> list lun where deviceName='/dev/sdf'
         0_5     0_5     normal

  2. Drop the grid disk from the Oracle ASM disk groups in normal mode.

SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name;

  3. Wait for the ASM rebalance operation to complete.
  4. Drop the physical disk.

Use a command similar to the following, where X:Y identifies the hard disk name of the drive you are replacing.

CellCLI> alter physicaldisk X:Y drop for replacement

  5. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.
  6. Replace the hard disk with a new one.
  7. Verify the LUN, cell disk, and grid disk associated with the hard disk were created.

CellCLI> list lun lun_name

CellCLI> list celldisk where lun=lun_name

CellCLI> list griddisk where celldisk=celldisk_name

  8. Verify the grid disk was added to the Oracle ASM disk groups.

The following query should return no rows.

SQL> SELECT path,header_status FROM v$asm_disk WHERE group_number=0;

The following query shows whether all the failure groups have the same number of disks:

SQL> SELECT group_number, failgroup, mode_status, count(*) FROM v$asm_disk
     GROUP BY group_number, failgroup, mode_status;

Moving All Drives to Another Exadata Storage Server

It may be necessary to move all drives from one Exadata Storage Server to another Exadata Storage Server.

This need may occur when there is a chassis-level component failure, such as a motherboard or ILOM failure, or when troubleshooting a hardware problem.

  1. Back up the files in the following directories:
    • /etc/hosts
    • /etc/modprobe.conf
    • /etc/sysconfig/network
    • /etc/sysconfig/network-scripts
  2. Safely inactivate all grid disks and shut down Exadata Storage Server.
  3. Move the hard disks, flash disks, disk controller and USB flash drive from the original Exadata Storage Server to the new Exadata Storage Server.
  4. Power on the new Exadata Storage Server using either the service processor interface or by pressing the power button.
  5. Log in to the console using the service processor or the KVM switch.
  6. Check the files in the following directories. If they are corrupted, then restore them from the backups.
    • /etc/hosts
    • /etc/modprobe.conf
    • /etc/sysconfig/network
    • /etc/sysconfig/network-scripts
  7. Use the ifconfig command to retrieve the new MAC address for eth0, eth1, eth2, and eth3. For example:

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:14:4F:CA:D9:AE
          inet addr:10.204.74.184  Bcast:10.204.75.255  Mask:255.255.252.0
          inet6 addr: fe80::214:4fff:feca:d9ae/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:141455 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6340 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9578692 (9.1 MiB)  TX bytes:1042156 (1017.7 KiB)
          Memory:f8c60000-f8c80000

  8. Edit the ifcfg-eth0 file, ifcfg-eth1 file, ifcfg-eth2 file, and ifcfg-eth3 file in the /etc/sysconfig/network-scripts directory to change the HWADDR value based on the output from step 7. The following is an example of the ifcfg-eth0 file:

#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.204.74.184
NETMASK=255.255.252.0
NETWORK=10.204.72.0
BROADCAST=10.204.75.255
GATEWAY=10.204.72.1
HOTPLUG=no
IPV6INIT=no
HWADDR=00:14:4F:CA:D9:AE

  9. Restart Exadata Storage Server.
  10. Activate the grid disks using the following command:

CellCLI> ALTER GRIDDISK ALL ACTIVE

If the Oracle ASM disks on the cell have not been dropped, then they change to ONLINE automatically and start being used again.

  11. Validate the configuration using the following command:

CellCLI> ALTER CELL VALIDATE CONFIGURATION

  12. Activate the ILOM for ASR.

Managing Flash Disks on Oracle Exadata Storage Servers

Data is mirrored across Exadata Cells, and write operations are sent to at least two storage cells. If a flash card in one Oracle Exadata Storage Server has problems, then the read and write operations are serviced by the mirrored data in another Oracle Exadata Storage Server. No interruption of service occurs for the application.

If a flash card fails while in write-back mode, then Oracle Exadata System Software determines the data in the flash cache by reading the data from the surviving mirror. The data is then written to the cell that had the failed flash card. The location of the data lost in the failed flash cache is saved by Oracle Exadata System Software at the time of the flash failure. Resilvering then starts by replacing the lost data with the mirrored copy. During resilvering, the grid disk status is ACTIVE -- RESILVERING WORKING. If the flash cache is in write-through mode, then the data in the failed flash device is already present on the data grid disk, so there is no need for resilvering.
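You can observe the grid disk status, including the resilvering state described above, from CellCLI. For example:

CellCLI> LIST GRIDDISK ATTRIBUTES name, status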

Replacing a Flash Disk Due to Flash Disk Failure

Each Oracle Exadata Storage Server is equipped with flash devices.

Starting with Oracle Exadata Database Machine X7, the flash devices are hot-pluggable on the Oracle Exadata Storage Servers. When performing a hot-pluggable replacement of a flash device on Oracle Exadata Storage Servers for X7 or later, the disk status should be Dropped for replacement, and the power LED on the flash card should be off, which indicates the flash disk is ready for online replacement.

For Oracle Exadata Database Machine X6 and earlier, the flash devices are hot-pluggable on Extreme Flash (EF) storage servers, but not on High Capacity (HC) storage servers. On HC storage servers, you need to power down the storage server before replacing the flash disks.

To identify a failed flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE disktype=flashdisk AND status=failed DETAIL

The following is an example of the output from an Extreme Flash storage server:

    name:                   NVME_10
    deviceName:             /dev/nvme7n1
    diskType:               FlashDisk
    luns:                   0_10
    makeModel:              "Oracle NVMe SSD"
    physicalFirmware:       8DV1RA13
    physicalInsertTime:     2016-09-28T11:29:13-07:00
    physicalSerial:         CVMD426500E21P6LGN
    physicalSize:           1.4554837569594383T
    slotNumber:             10
    status:                 failed

The following is an example of the output from an Oracle Flash Accelerator F160 PCIe Card:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=failed DETAIL

         name:                   FLASH_5_1
         deviceName:             /dev/nvme1n1
         diskType:               FlashDisk
         luns:                   5_1
         makeModel:              "Oracle Flash Accelerator F160 PCIe Card"
         physicalFirmware:       8DV1RA13
         physicalInsertTime:     2016-11-30T21:24:45-08:00
         physicalSerial:         1030M03UYM
         physicalSize:           1.4554837569594383T
         slotNumber:             "PCI Slot: 5; FDOM: 1"
         status:                 failed

The following is an example of the output from a Sun Flash Accelerator F40 PCIe card:

         name:                   FLASH_5_3
         diskType:               FlashDisk
         luns:                   5_3
         makeModel:              "Sun Flash Accelerator F40 PCIe Card"
         physicalFirmware:       TI35
         physicalInsertTime:     2012-07-13T15:40:59-07:00
         physicalSerial:         5L002X4P
         physicalSize:           93.13225793838501G
         slotNumber:             "PCI Slot: 5; FDOM: 3"
         status:                 failed

For the PCIe cards, the name and slotNumber attributes show the PCI slot and the FDOM number. For Extreme Flash storage servers, the slotNumber attribute shows the NVMe slot on the front panel.

On Oracle Exadata Database Machine X7 and later systems, all flash disks are in the form of an Add-in-Card (AIC), which is inserted into a PCIe slot on the motherboard. The slotNumber attribute shows the PCI number and FDOM number, regardless of whether it is an EF or HC storage server.

If a flash disk is detected to have failed, then an alert is generated indicating that the flash disk, as well as the LUN on it, has failed. The alert message includes either the PCI slot number and FDOM number or the NVMe slot number. These numbers uniquely identify the field replaceable unit (FRU). If you have configured the system for alert notification, then an alert is sent by e-mail message to the designated address.

A flash disk outage can cause reduction in performance and data redundancy. The failed disk should be replaced with a new flash disk at the earliest opportunity. If the flash disk is used for flash cache, then the effective cache size for the storage server is reduced. If the flash disk is used for flash log, then flash log is disabled on the disk thus reducing the effective flash log size. If the flash disk is used for grid disks, then the Oracle Automatic Storage Management (Oracle ASM) disks associated with these grid disks are automatically dropped with the FORCE option from the Oracle ASM disk group, and a rebalance operation starts to restore the data redundancy.

The following procedure describes how to replace an FDOM due to disk failure on High Capacity storage servers that do not support online flash replacement. Replacing an NVMe drive on Extreme Flash storage servers is the same as replacing a physical disk: you can just remove the NVMe drive from the front panel and insert a new one. You do not need to shut down the storage server.

  1. Shut down the storage server.
  2. Replace the failed flash disk based on the PCI number and FDOM number. A white Locator LED is lit to help locate the affected storage server.
  3. Power up the storage server. The cell services are started automatically. As part of the storage server startup, all grid disks are automatically ONLINE in Oracle ASM.
  4. Verify that all grid disks have been successfully put online using the following command:

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus

         data_CD_00_testceladm10     ONLINE
         data_CD_01_testceladm10     ONLINE
         data_CD_02_testceladm10     ONLINE
         ...

Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks.

The new flash disk is automatically used by the system. If the flash disk is used for flash cache, then the effective cache size increases. If the flash disk is used for grid disks, then the grid disks are re-created on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced on them based on the disk group redundancy and ASM_POWER_LIMIT parameter.

About Flash Disk Degraded Performance Statuses

If a flash disk has degraded performance, you might need to replace the disk.

An alert is generated when a flash disk is in predictive failure, poor performance, write-through caching or peer failure status. The alert includes specific instructions for replacing the flash disk. If you have configured the system for alert notifications, then the alerts are sent by e-mail message to the designated address.

predictive failure

Flash disk predictive failure status indicates that the flash disk will fail soon, and should be replaced at the earliest opportunity. If the flash disk is used for flash cache, then it continues to be used as flash cache. If the flash disk is used for grid disks, then the Oracle ASM disks associated with these grid disks are automatically dropped, and Oracle ASM rebalance relocates the data from the predictively failed disk to other disks.

When a flash disk goes into predictive failure, the data on it is preserved by copying it elsewhere. If the flash disk is used for grid disks, then Oracle ASM re-partners the associated partner disks and performs a rebalance. If the flash disk is used for write-back flash cache, then the data is flushed from the flash disk to the grid disks.

To identify a predictive failure flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=  \
'warning - predictive failure' DETAIL
         name:               FLASH_1_1
         deviceName:         /dev/nvme3n1
         diskType:           FlashDisk
         luns:               1_1
         makeModel:          "Oracle Flash Accelerator F160 PCIe Card"
         physicalFirmware:   8DV1RA13
         physicalInsertTime: 2016-11-30T21:24:45-08:00
         physicalSerial:     CVMD519000251P6KGN
         physicalSize:       1.4554837569594383T
         slotNumber:         "PCI Slot: 1; FDOM: 1"
         status:             warning - predictive failure

poor performance

Flash disk poor performance status indicates that the flash disk demonstrates extremely poor performance, and should be replaced at the earliest opportunity. Starting with Oracle Exadata System Software release 11.2.3.2, an under-performing disk is automatically identified and removed from active configuration. If the flash disk is used for flash cache, then flash cache is dropped from this disk, reducing the effective flash cache size for the storage server. If the flash disk is used for grid disks, then the Oracle ASM disks associated with the grid disks on this flash disk are automatically dropped with the FORCE option, if possible. If DROP...FORCE cannot succeed due to offline partners, then the grid disks are automatically dropped normally, and Oracle ASM rebalance relocates the data from the poor performance disk to other disks.

Oracle Exadata Database Machine then runs a set of performance tests. When poor disk performance is detected by CELLSRV, the cell disk status changes to normal – confinedOnline, and the physical disk status changes to warning – confinedOnline. The following conditions trigger disk confinement:

  • Disk stopped responding. The cause code in the storage alert log is CD_PERF_HANG.
  • Slow cell disk such as the following:
    • High service time threshold (cause code CD_PERF_SLOW_ABS)
    • High relative service time threshold (cause code CD_PERF_SLOW_RLTV)
  • High read or write latency such as the following:
    • High latency on writes (cause code CD_PERF_SLOW_LAT_WT)
    • High latency on reads (cause code CD_PERF_SLOW_LAT_RD)
    • High latency on reads and writes (cause code CD_PERF_SLOW_LAT_RW)
    • Very high absolute latency on individual I/Os happening frequently (cause code CD_PERF_SLOW_LAT_ERR)
  • Errors such as I/O errors (cause code CD_PERF_IOERR).

If the disk problem is temporary and passes the tests, then it is brought back into the configuration. If the disk does not pass the tests, then it is marked as poor performance, and Oracle Auto Service Request (ASR) submits a service request to replace the disk. If possible, Oracle ASM takes the grid disks offline for testing. If Oracle ASM cannot take the disks offline, then the cell disk status stays at normal – confinedOnline until the disks can be taken offline safely.

To identify a poor performance flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS= \
'warning - poor performance' DETAIL
         name:                FLASH_1_4
         diskType:            FlashDisk
         luns:                1_4
         makeModel:           "Sun Flash Accelerator F20 PCIe Card"
         physicalFirmware:    D20Y
         physicalInsertTime:  2012-09-27T13:11:16-07:00
         physicalSerial:      508002000092e70FMOD2
         physicalSize:        22.8880615234375G
         slotNumber:          "PCI Slot: 1; FDOM: 3"
         status:              warning - poor performance

The disk status change is associated with the following entry in the cell alert history:

MESSAGE ID date_time info "Hard disk entered confinement status. The LUN
 n_m changed status to warning - confinedOnline. CellDisk changed status to normal
 - confinedOnline. Status: WARNING - CONFINEDONLINE  Manufacturer: name  Model
 Number: model  Size: size  Serial Number: serial_number  Firmware: fw_release
 Slot Number: m  Cell Disk: cell_disk_name  Grid Disk: grid disk 1, grid disk 2
 ... Reason for confinement: threshold for service time exceeded"

The following would be logged in the storage server alert log:

CDHS: Mark cd health state change cell_disk_name  with newState HEALTH_BAD_
ONLINE pending HEALTH_BAD_ONLINE ongoing INVALID cur HEALTH_GOOD
Celldisk entering CONFINE ACTIVE state with cause CD_PERF_SLOW_ABS activeForced: 0
inactiveForced: 0 trigger HistoryFail: 0, forceTestOutcome: 0 testFail: 0
global conf related state: numHDsConf: 1 numFDsConf: 0 numHDsHung: 0 numFDsHung: 0

If a flash disk exhibits extremely poor performance, then it is marked as poor performance. The flash cache on that flash disk is automatically disabled, and the grid disks on that flash disk are automatically dropped from the Oracle ASM disk group.

write-through caching

Flash disk write-through caching status indicates the capacitors used to support data cache on the PCIe card have failed, and the card should be replaced as soon as possible.

peer failure

Flash disk peer failure status indicates one of the flash disks on the same Sun Flash Accelerator PCIe card has failed or has a problem. For example, if FLASH_5_3 fails, then FLASH_5_0, FLASH_5_1, and FLASH_5_2 have peer failure status. The following is an example:

CellCLI> LIST PHYSICALDISK
         36:0            L45F3A          normal
         36:1            L45WAE          normal
         36:2            L45WQW          normal
         FLASH_5_0       5L0034XM        warning - peer failure
         FLASH_5_1       5L0034JE        warning - peer failure
         FLASH_5_2       5L002WJH        warning - peer failure
         FLASH_5_3       5L002X4P        failed

When CELLSRV detects a predictive or peer failure in any flash disk used for write-back flash cache and only one FDOM is bad, then the data on the bad FDOM is resilvered, and the data on the other three FDOMs is flushed. CELLSRV then initiates an Oracle ASM rebalance for the disks if there are valid grid disks. The bad disk cannot be replaced until these tasks are completed. MS sends an alert when the disk can be replaced.

Replacing a Flash Disk Due to Flash Disk Problems

Oracle Exadata Storage Server is equipped with four PCIe cards. Each card has four flash disks (FDOMs) for a total of 16 flash disks. The four PCIe cards are present on PCI slot numbers 1, 2, 4, and 5. Starting with Oracle Exadata Database Machine X7, you can replace the PCIe cards without powering down the storage server.

In Oracle Exadata Database Machine X6 and earlier systems, the PCIe cards are not hot-pluggable. The Oracle Exadata Storage Server must be powered down before replacing the flash disks or cards.

Starting with Oracle Exadata Database Machine X7, each flash card on both High Capacity and Extreme Flash storage servers is a field-replaceable unit (FRU). The flash cards are also hot-pluggable, so you do not have to shut down the storage server before removing the flash card.

On Oracle Exadata Database Machine X5 and X6 systems, each flash card on High Capacity and each flash drive on Extreme Flash are FRUs. This means that there is no peer failure for these systems.

On Oracle Exadata Database Machine X3 and X4 systems, because the flash card itself is a FRU, if any FDOM fails, the Oracle Exadata System Software automatically puts the remaining FDOMs on that card into peer failure status so that the data can be moved out in preparation for the flash card replacement.

On Oracle Exadata Database Machine V2 and X2 systems, each FDOM is a FRU. There is no peer failure for flash for these systems.

Determining when to proceed with disk replacement depends on the release, as described in the following:

  • For Oracle Exadata System Software releases earlier than 11.2.3.2:

Wait until the Oracle ASM disks have been successfully dropped by querying the V$ASM_DISK_STAT view before proceeding with the flash disk replacement. If the DROP command did not complete before the flash disk failed, then the Oracle ASM disks are automatically dropped with the FORCE option from the Oracle ASM disk group.

  • For Oracle Exadata System Software releases 11.2.3.2 and later:

An alert is sent when the Oracle ASM disks have been dropped, and the flash disk can be safely replaced. If the flash disk is used for write-back flash cache, then wait until none of the grid disks are cached by the flash disk. Use the following command to check the cachedBy attribute of all the grid disks. The cell disk on the flash disk should not appear in any grid disk's cachedBy attribute.

CellCLI> LIST GRIDDISK ATTRIBUTES name, cachedBy

If the flash disk is used for both grid disks and flash cache, then wait until receiving the alert, and the cell disk is not shown in any grid disk's cachedBy attribute.

The following procedure describes how to replace a flash disk on High Capacity storage servers for Oracle Exadata Database Machine X6 and earlier due to disk problems.

  1. Stop the cell services using the following command:

CellCLI> ALTER CELL SHUTDOWN SERVICES ALL

The preceding command checks if any disks are offline, in predictive failure status, or need to be copied to their mirrors. If Oracle ASM redundancy is intact, then the command takes the grid disks offline in Oracle ASM, and then stops the cell services. If the following error is displayed, then it may not be safe to stop the cell services because a disk group may be forced to dismount due to reduced redundancy.

Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of ALL services was not successful.
CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be
forced to dismount due to reduced redundancy.
Getting the state of CELLSRV services... running
Getting the state of MS services... running
Getting the state of RS services... running

If the error occurs, then restore Oracle ASM disk group redundancy and retry the command when disk status is back to normal for all the disks.

  2. Shut down the storage server.
  3. Replace the failed flash disk based on the PCI number and FDOM number. A white Locator LED is lit to help locate the affected storage server.
  4. Power up the storage server. The cell services are started automatically. As part of the storage server startup, all grid disks are automatically ONLINE in Oracle ASM.
  5. Verify that all grid disks have been successfully put online using the following command:

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus

Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks.

The new flash disk is automatically used by the system. If the flash disk is used for flash cache, then the effective cache size increases. If the flash disk is used for grid disks, then the grid disks are re-created on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced on them based on the disk group redundancy and the ASM_POWER_LIMIT parameter.

Enabling Write Back Flash Cache

Write operations serviced by flash instead of by disk are referred to as write-back flash cache.

Starting with Oracle Exadata System Software release 11.2.3.2.1, Exadata Smart Flash Cache can transparently cache frequently-accessed data to fast solid-state storage, improving query response times and throughput.

Enable Write Back Flash Cache for 11.2.3.3.1 or Higher

Enable write back Flash Cache on the storage servers to improve query response times and throughput.

For Oracle Exadata System Software release 11.2.3.3.1 or higher, you do not have to stop cell services or inactivate grid disks when changing the Flash Cache from Write Through mode to Write Back mode.

  1. Validate all the physical disks are in NORMAL state before modifying Exadata Smart Flash Cache.

The following command should return no rows:

# dcli -l root -g cell_group cellcli -e "list physicaldisk attributes name,status" | grep -v NORMAL

  2. Drop the Flash Cache.

# dcli -l root -g cell_group cellcli -e drop flashcache

  3. Set the flashCacheMode attribute to writeback.

# dcli -l root -g cell_group cellcli -e "alter cell flashCacheMode=writeback"

  4. Re-create the Flash Cache.

# dcli -l root -g cell_group cellcli -e create flashcache all

  5. Verify the flashCacheMode has been set to writeback.

# dcli -l root -g cell_group cellcli -e list cell detail | grep flashCacheMode

  6. Validate the grid disk attributes cachingPolicy and cachedBy.

# cellcli -e list griddisk attributes name,cachingpolicy,cachedby

Disabling Write Back Flash Cache

You can disable the Write-Back Flash Cache by enabling Write-Through Flash Cache.

Starting with Oracle Exadata System Software release 11.2.3.2.1, Exadata Smart Flash Cache can transparently cache frequently-accessed data to fast solid-state storage, improving query response times and throughput.

Write operations serviced by flash instead of by disk are referred to as write back flash cache.

Disable Write Back Flash Cache for Exadata X8M or Later Servers

On an Oracle Exadata Database Machine X8M or later system with PMEM cache, if the PMEM cache is in WriteBack mode, you must modify the PMEM cache before changing the flash cache mode from WriteBack to WriteThrough.

  1. For X8M or later systems with PMEM cache, if the PMEM Cache is in WriteBack mode:
    a. Flush the PMEM cache.

If the PMEM cache utilizes all available PMEM cell disks, you can use the ALL keyword as shown here.

# dcli -l root -g cell_group cellcli -e ALTER PMEMCACHE ALL FLUSH

Otherwise, list the specific disks using the CELLDISK="cdisk1 [,cdisk2] ..." clause.

    b. Drop the PMEM cache.

# dcli -l root -g cell_group cellcli -e DROP PMEMCACHE

    c. Modify the PMEM cache to use WriteThrough mode.

# dcli -l root -g cell_group cellcli -e ALTER CELL pmemCacheMode=WriteThrough

    d. Re-create the PMEM cache.

If the PMEM cache utilizes all available PMEM cell disks, you can use the ALL keyword as shown here. Otherwise, list the specific disks using the CELLDISK="cdisk1 [,cdisk2] ..." clause. If the size attribute is not specified, then the maximum size is allocated. All available space on each cell disk in the list is used for PMEM Cache.

# dcli -l root -g cell_group cellcli -e CREATE PMEMCACHE ALL

    e. Verify the pmemCacheMode has been set to writethrough.

# dcli -l root -g cell_group cellcli -e list cell detail | grep pmemCacheMode

  2. Validate all the physical disks are in NORMAL state before modifying the FlashCache.

# dcli -l root -g cell_group cellcli -e "LIST PHYSICALDISK ATTRIBUTES name,status" | grep -v NORMAL

The command should return no rows.

  3. Determine the amount of dirty data in the flash cache.

# cellcli -e "LIST METRICCURRENT ATTRIBUTES name,metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\' "

  4. Flush the Flash cache.

If the Flash cache utilizes all available Flash cell disks, you can use the ALL keyword instead of listing the flash disks.

# dcli -g cell_group -l root cellcli -e "ALTER FLASHCACHE CELLDISK=\'FD_02_dm01celadm12,
FD_03_dm01celadm12,FD_00_dm01celadm12,FD_01_dm01celadm12\' FLUSH"

  5. Check the progress of the flushing of flash cache.

The flushing process is complete when FC_BY_DIRTY is 0 MB.

# dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES name,metricvalue
 WHERE name LIKE \'FC_BY_DIRTY.*\' "

Or, you can check to see if the attribute flushstatus has been set to Completed.

# dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus,
flusherror" | grep FD

  6. After flushing of the flash cache completes, drop the flash cache.

# dcli -g cell_group -l root cellcli -e drop flashcache

  7. Modify the Flash cache to use WriteThrough mode.

# dcli -g cell_group -l root cellcli -e "ALTER CELL flashCacheMode=writethrough"

  8. Re-create the Flash cache.

If the Flash cache utilizes all available Flash cell disks, you can use the ALL keyword instead of listing the cell disks.

If you do not include the size attribute, then all available space on each cell disk in the list is used for Exadata Smart Flash Cache.

# dcli -l root -g cell_group cellcli -e "create flashcache celldisk=\'FD_02_dm01celadm12,
FD_03_dm01celadm12,FD_00_dm01celadm12,FD_01_dm01celadm12\'"

  9. Verify the flashCacheMode has been set to writethrough.

# dcli -l root -g cell_group cellcli -e list cell detail | grep flashCacheMode

Monitoring Exadata Smart Flash Cache Usage Statistics

Use the following methods to monitor Exadata Smart Flash Cache usage:

  • ExaWatcher reports

Flash Cache size and read, write, and population operation related stats are exposed in the Cell Server Charts and in the FlashCache related stats section.

  • AWR report, in the Exadata Statistics section.
    • Under Performance Summary you can find various statistics related to Flash Cache and its benefits.
    • Under Exadata Smart Statistics there is a section for Flash Cache with several different reports on Exadata Smart Flash Cache statistics.
  • Use the CellCLI LIST command to display and monitor metrics for the flash cache.
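
For example, the following command displays all current metric observations for the flash cache on a cell (a minimal sketch; the exact set of metric names varies by Oracle Exadata System Software release):

CellCLI> LIST METRICCURRENT WHERE objectType = 'FLASHCACHE'

You can also query an individual metric by name, such as LIST METRICCURRENT FC_BY_USED for the current flash cache usage.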

Disabling Flash Cache Compression

Flash cache compression can be disabled on Oracle Exadata Database Machine X4-2, Oracle Exadata Database Machine X3-2, and Oracle Exadata Database Machine X3-8 Full Rack systems. Oracle Exadata Database Machine X5-2, X5-8, and later systems do not have flash cache compression.

The following procedure describes how to disable flash cache compression:

  1. Flush the flash cache to save the user data on the flash cell disks.

# cellcli -e ALTER FLASHCACHE ALL FLUSH

For grid disks, the cachedby attribute should be null. Also, the number of dirty (unflushed) buffers will be 0 after the flush is complete.

# cellcli -e LIST METRICCURRENT FC_BY_DIRTY

          FC_BY_DIRTY     FLASHCACHE      0.000 MB
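
You can also confirm the cachedby attribute mentioned above with a quick check (a sketch; grid disk names depend on your configuration):

# cellcli -e LIST GRIDDISK ATTRIBUTES name, cachedby

Each grid disk should show an empty cachedby value once the flush is complete.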

  1. Remove the flash cache from the cell.

# cellcli -e DROP FLASHCACHE ALL

  1. Remove the flash log from the cell.

# cellcli -e DROP FLASHLOG ALL

  1. Drop the cell disks on the flash disks.

# cellcli -e DROP CELLDISK ALL FLASHDISK

  1. Disable Flash Cache Compression using the following commands, based on the system:
    • If Exadata Storage Cell Server image is 11.2.3.3.1 or higher and the Exadata Storage Cell is X3-2 or X4-2:

# cellcli -e ALTER CELL flashcachecompress=false

    • If Exadata Storage Cell Server image is 11.2.3.3.0 and the Exadata Storage Cell is X3-2:

# cellcli -e ALTER CELL flashCacheCompX3Support=true

# cellcli -e ALTER CELL flashCacheCompress=false

  1. Verify that Flash Cache Compression has been disabled by viewing the cell attributes.

# cellcli -e LIST CELL attributes name,flashCacheCompress

Correct values are FALSE or a null string.

  1. Verify that the size of the physical disks has decreased.

# cellcli -e LIST PHYSICALDISK attributes name,physicalSize,status WHERE disktype=flashdisk

The status should be NORMAL. Use the following information to validate the expected sizes when compression is OFF:

  • Aura 2.0/F40/X3:
    • Physical Disk Size: 93.13 G (OFF) or 186.26 G (ON)
    • Flash Cache Size: 1489 G (OFF) or 2979 G (ON)
  • Aura 2.1/F80/X4:
    • Physical Disk Size: 186.26 G (OFF) or 372.53 G (ON)
    • Flash Cache Size: 2979 G (OFF) or 5959 G (ON)

  1. Create the cell disks on the flash disks.

# cellcli -e CREATE CELLDISK ALL FLASHDISK

CellDisk FD_00_exampleceladm18 successfully created

...

CellDisk FD_15_exampleceladm18 successfully created

  1. Create the flash log.

# cellcli -e CREATE FLASHLOG ALL

Flash log exampleceladm18_FLASHLOG successfully created

Verify the flash log is in normal mode.

# cellcli -e LIST FLASHLOG DETAIL

  1. Create the flash cache on the cell.

# cellcli -e CREATE FLASHCACHE ALL

Flash cache exampleceladm18_FLASHCACHE successfully created

Verify the flash cache is in normal mode.

# cellcli -e LIST FLASHCACHE DETAIL

  1. Verify that flash cache compression is disabled.

# cellcli -e LIST CELL

The value of the flashCacheCompress attribute should be false.

Managing the RAM Cache on the Storage Servers

The Cell RAM Cache is a cache in front of the flash cache and acts as an extension of the database buffer cache. It is faster than the flash cache, but has a smaller capacity.

The Cell RAM Cache feature was introduced in Oracle Exadata System Software release 18c (18.1.0.0.0). Cell RAM Cache is disabled by default (ramCacheMode is set to auto).

About the Cell RAM Cache

The Cell RAM Cache provides much lower IO latency for online transaction processing (OLTP) reads.

In an OLTP workload, the cell single block physical read wait statistic typically shows up as a top consumer of database processing time. If these reads can be served from the Cell RAM Cache, then read latency and I/O wait time drop significantly, improving the performance of OLTP applications.

Alternatively, you can view the Cell RAM Cache as an extension of the database buffer cache. The buffer cache misses become Cell RAM Cache hits, which accelerates OLTP performance because you are getting more cache hits from the combined power of the buffer cache and the Cell RAM Cache.

Sizing Recommendations for the Cell RAM Cache

Use the buffer pool advisory section in Automatic Workload Repository (AWR) reports taken during peak OLTP workloads to determine the recommended size of the Cell RAM Cache.

The default size for the Cell RAM Cache on Exadata Storage Servers without the memory expansion kit is rather limited, so this feature is not enabled by default. To get the acceleration benefits, you should first install memory expansion kits on the storage servers. Then, you can enable the Cell RAM Cache and the Oracle Exadata System Software automatically sizes the Cell RAM Cache based on the free memory on Exadata Storage Servers.

The buffer pool advisory section in the AWR reports during peak OLTP workloads can help determine how you should configure the Cell RAM Cache. For example:

[Figure: AWR buffer pool advisory report used for Cell RAM Cache sizing]

In the above report, with the current buffer cache size (size factor of 1.00), the database performs approximately 338 million physical reads. If you increased the buffer cache by 87% (size factor of 1.87), you would reduce the physical reads to around 12 million.

If you created a Cell RAM Cache that is 87% of the buffer cache size, then 338,431,000 - 11,971,000 = 326,460,000 reads, or approximately 326 million reads, could be satisfied by the Cell RAM Cache. You can size the Cell RAM Cache based on the number of OLTP reads that you would like to benefit from the Cell RAM Cache. Divide the total size of the Cell RAM Cache by the number of available storage servers to get the target RAM size for each storage server.
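
As a worked example with assumed numbers: if the current buffer cache is 100 GB and the advisory suggests an 87% increase, then the target Cell RAM Cache is 100 GB * 0.87 = 87 GB. Spread across 14 storage servers, that is 87 / 14, or roughly 6.2 GB per cell.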

If you have an Oracle Real Application Clusters (Oracle RAC) database with multiple database instances, then each database instance has an independent buffer pool advisory in its respective AWR report. The physical read savings may vary from instance to instance. The advisory already accounts for buffer cache misses that are satisfied by a cache fusion block transfer from another instance, so it estimates only the actual storage reads after discounting those transfers. Sizing the Cell RAM Cache for Oracle RAC is therefore simple: decide how much additional buffer cache space you need for each instance, and add up the values. The sum is the amount of Cell RAM Cache that you need to provision.

Similarly, if you have multiple databases sharing the same set of cells, you can add up the additional buffer cache sizes for each database. The sum is the total amount of Cell RAM Cache you need to provision on the cells.

After you know what size Cell RAM Cache you want, you can then make a decision on which memory expansion kit suits your need best. You can add extra memory for both the database server and the storage server, and you can upgrade the servers in any order.

  • Expanding database server memory gives you more memory to use on a given server, and potentially a bigger buffer cache in addition to a larger SGA, PGA, and so on.
  • If you expand storage server memory, then the Cell RAM Cache is shared across all database servers. The additional memory can be leveraged by all database servers across all Oracle Virtual Machines (Oracle VM) and database instances.

If you run different workloads at different times across different database servers or Oracle VMs, or if you are running a consolidated environment with multiple databases, then expanding the storage server memory might provide more flexibility for maximizing the usage of the additional memory because that memory can be shared across all databases.

Enabling the Cell RAM Cache

To enable the Cell RAM Cache feature, on each Oracle Exadata Storage Server set the ramCacheMode cell attribute to on.

The Cell RAM Cache feature is disabled by default (ramCacheMode is set to auto). On each storage server, when you set ramCacheMode=on, the storage server automatically uses all available free memory to create the Cell RAM Cache.

  1. Change the value of the ramCacheMode attribute on each cell.

You can use dcli to modify multiple cells with a single command, or you can run the following command on each Oracle Exadata Storage Server.

CellCLI> ALTER CELL ramCacheMode=on

Cell host03celadm10 successfully altered

  1. Restart CellSrv.

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

Viewing the Cell RAM Cache Size

After the ramCacheMode attribute is set to on, the storage server automatically uses as much free memory as is available on the storage server to create the Cell RAM Cache.

The creation of RAM cache takes place asynchronously in the background. If you query the size of the Cell RAM Cache immediately after enabling this feature, you will not see an accurate size. You can monitor the cell alert.log to follow the progress of Cell RAM Cache creation.
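
For example, on an individual cell you can follow the progress with a command like the following (a sketch; the path shown assumes the default Oracle Exadata System Software diagnostic directory layout):

# tail -f /opt/oracle/cell/log/diag/asm/cell/$(hostname -s)/trace/alert.log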

  • To view the current status of the Cell RAM Cache, retrieve the ramCacheMode, ramCacheSize and ramCacheMaxSize attributes for the cell.

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize

     18.875G         On          18.875G
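
To check all cells at once, you can wrap the same command in dcli (a sketch assuming the same cell_group file used elsewhere in this section):

# dcli -g cell_group -l root "cellcli -e LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode,ramCacheSize"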

Changing the Size of the Cell RAM Cache

When enabled, the Cell RAM Cache automatically uses as much free memory as is available on the storage server.

You can determine the current size of the Cell RAM Cache by using the CellCLI command LIST CELL ATTRIBUTES ramCacheSize.

The ramCacheMaxSize attribute determines the maximum amount of memory that can be used for the Cell RAM Cache.

To limit the size of the Cell RAM Cache, modify the value of the ramCacheMaxSize attribute on each storage server.

You can use exadcli to modify multiple storage servers with a single command, or you can run the following command on each Oracle Exadata Storage Server.

CellCLI> ALTER CELL ramCacheMaxSize=1G

Cell host03celadm10 successfully altered

Limit the Size of the Cell RAM Cache

To limit the maximum size of the Cell RAM Cache to 1 GB, use the following command to modify multiple Oracle Exadata Storage Servers:

exadcli -c cell01,cell02,cell03 -l celladministrator alter cell ramCacheMaxSize=1G

Monitoring Cell RAM Cache Usage Statistics

Use the following methods to monitor Cell RAM Cache usage:

  • ExaWatcher reports

RamCache size and read, write, and population operation related stats are exposed via cellsrvstat in the "RamCache related stats" section.

  • AWR report, in the Memory Cache section.

There are three subsections:

  • Memory Cache Space Usage: For each cell, this table shows the space allocated for the Cell RAM Cache and the percentage of space used in the Cell RAM Cache for OLTP activity.
  • Memory Cache User Reads: For each cell, this table shows the read statistics for the Cell RAM Cache based on the number of reads and the amount of data read (MB).
  • Memory Cache Internal Writes: For each cell, this table shows the write statistics for the Cell RAM Cache based on the number of write requests and the amount of data written (MB).

Disabling the Cell RAM Cache

To disable the Cell RAM Cache feature, set the ramCacheMode cell attribute to off on each Oracle Exadata Storage Server.

  1. Change the value of the ramCacheMode attribute on each cell.

You can use dcli to modify multiple cells with a single command, or you can run the following command on each Oracle Exadata Storage Server.

CellCLI> ALTER CELL ramCacheMode=off

Cell host03celadm10 successfully altered

  1. Restart CellSrv.

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

Resizing Grid Disks

You can resize grid disks and Oracle ASM disk groups to shrink one with excess free space and increase the size of another that is near capacity.

Initial configuration of Oracle Exadata Database Machine disk group sizes is based on Oracle best practices and the location of the backup files.

  • For internal backups: allocation of available space is 40% for the DATA disk groups, and 60% for the RECO disk groups.
  • For external backups: allocation of available space is 80% for the DATA disk group, and 20% for the RECO disk group.

The disk group allocations can be changed after deployment. For example, the DATA disk group allocation may be too small at 60%, and may need to be resized to 80%.

If your system has no free space available on the cell disks and one disk group, for example RECO, has plenty of free space, then you can resize the RECO disk group to a smaller size and reallocate the free space to the DATA disk group. The free space available after shrinking the RECO disk group is at a non-contiguous offset from the existing space allocations for the DATA disk group. Grid disks can use space anywhere on the cell disks and do not have to be contiguous.

If you are expanding the grid disks and the cell disks already have sufficient space to expand the existing grid disks, then you do not need to first resize an existing disk group. You would skip steps 2 and 3 below where the example shows the RECO disk group and grid disks are shrunk (you should still verify the cell disks have enough free space before growing the DATA grid disks). The amount of free space the administrator should reserve depends on the level of failure coverage.

If you are shrinking the size of the grid disks, you should understand how space is reserved for mirroring. Data is protected by Oracle ASM using normal or high redundancy to create one or two copies of data, which are stored as file extents. These copies are stored in separate failure groups. A failure in one failure group does not affect the mirror copies, so data is still accessible.

When a failure occurs, Oracle ASM re-mirrors, or rebalances, any extents that are not accessible so that redundancy is reestablished. For the re-mirroring process to succeed, sufficient free space must exist in the disk group to allow creation of the new file extent mirror copies. If there is not enough free space, then some extents will not be re-mirrored and the subsequent failure of the other data copies will require the disk group to be restored from backup. Oracle ASM sends an error when a re-mirror process fails due to lack of space.

Determine the Amount of Available Space

To increase the size of the disks in a disk group, you must either have unallocated disk space available or reallocate space currently used by a different disk group.

  1. View the space currently used by the disk groups.

SELECT name, total_mb, free_mb, total_mb - free_mb used_mb, round(100*free_mb/total_mb,2) pct_free

FROM v$asm_diskgroup

ORDER BY 1;

NAME                             TOTAL_MB    FREE_MB    USED_MB   PCT_FREE

------------------------------ ---------- ---------- ---------- ----------

DATAC1                           68812800    9985076   58827724      14.51

RECOC1                           94980480   82594920   12385560      86.96

The example above shows that the DATAC1 disk group has only about 15% of free space available while the RECOC1 disk group has about 87% free disk space. The PCT_FREE displayed here is raw free space, not usable free space. Additional space is needed for rebalancing operations.
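
To see how much space is actually usable after reserving room for re-mirroring, you can also query the USABLE_FILE_MB and REQUIRED_MIRROR_FREE_MB columns (a sketch):

SQL> SELECT name, usable_file_mb, required_mirror_free_mb
  2  FROM v$asm_diskgroup
  3  ORDER BY 1;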

  1. For the disk groups you plan to resize, view the count and status of the failure groups used by the disk groups.

SELECT dg.name, d.failgroup, d.state, d.header_status, d.mount_status,

 d.mode_status, count(1) num_disks

FROM V$ASM_DISK d, V$ASM_DISKGROUP dg

WHERE d.group_number = dg.group_number

AND dg.name IN ('RECOC1', 'DATAC1')

GROUP BY dg.name, d.failgroup, d.state, d.header_status, d.mount_status,

  d.mode_status

ORDER BY 1, 2, 3;

NAME       FAILGROUP      STATE      HEADER_STATU MOUNT_S  MODE_ST  NUM_DISKS

---------- -------------  ---------- ------------ -------- -------  ---------

DATAC1     EXA01CELADM01  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM02  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM03  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM04  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM05  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM06  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM07  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM08  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM09  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM10  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM11  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM12  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM13  NORMAL     MEMBER        CACHED  ONLINE   12

DATAC1     EXA01CELADM14  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM01  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM02  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM03  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM04  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM05  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM06  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM07  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM08  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM09  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM10  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM11  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM12  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM13  NORMAL     MEMBER        CACHED  ONLINE   12

RECOC1     EXA01CELADM14  NORMAL     MEMBER        CACHED  ONLINE   12

The above example is for a full rack, which has 14 cells and 14 failure groups for DATAC1 and RECOC1. Verify that each failure group has at least 12 disks in the NORMAL state (num_disks). If you see disks listed as MISSING, or you see an unexpected number of disks for your configuration, then do not proceed until you resolve the problem.

Extreme Flash systems should see a disk count of 8 instead of 12 for num_disks.

  1. List the corresponding grid disks associated with each cell and each failure group, so you know which grid disks to resize.

SELECT dg.name, d.failgroup, d.path

FROM V$ASM_DISK d, V$ASM_DISKGROUP dg

WHERE d.group_number = dg.group_number

AND dg.name IN ('RECOC1', 'DATAC1')

ORDER BY 1, 2, 3;

NAME        FAILGROUP      PATH

----------- -------------  ----------------------------------------------

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_00_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_01_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_02_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_03_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_04_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_05_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_06_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_07_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_08_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_09_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_10_exa01celadm01

DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_11_exa01celadm01

DATAC1      EXA01CELADM02  o/192.168.74.44/DATAC1_CD_00_exa01celadm02

DATAC1      EXA01CELADM02  o/192.168.74.44/DATAC1_CD_01_exa01celadm02

DATAC1      EXA01CELADM02  o/192.168.74.44/DATAC1_CD_02_exa01celadm02

...

RECOC1      EXA01CELADM13  o/192.168.74.55/RECOC1_CD_00_exa01celadm13

RECOC1      EXA01CELADM13  o/192.168.74.55/RECOC1_CD_01_exa01celadm13

RECOC1      EXA01CELADM13  o/192.168.74.55/RECOC1_CD_02_exa01celadm13

RECOC1      EXA01CELADM14  o/192.168.74.56/RECOC1_CD_09_exa01celadm14

RECOC1      EXA01CELADM14  o/192.168.74.56/RECOC1_CD_10_exa01celadm14

RECOC1      EXA01CELADM14  o/192.168.74.56/RECOC1_CD_11_exa01celadm14 

336 rows returned.

  1. Check the cell disks for available free space.

Free space on the cell disks can be used to increase the size of the DATAC1 grid disks. If there is not enough available free space to expand the DATAC1 grid disks, then you must shrink the RECOC1 grid disks to provide the additional space for the desired new size of DATAC1 grid disks.

# dcli -g ~/cell_group -l root "cellcli -e list celldisk attributes name,freespace"

exa01celadm01: CD_00_exa01celadm01 0

exa01celadm01: CD_01_exa01celadm01 0

exa01celadm01: CD_02_exa01celadm01 0

exa01celadm01: CD_03_exa01celadm01 0

exa01celadm01: CD_04_exa01celadm01 0

exa01celadm01: CD_05_exa01celadm01 0

exa01celadm01: CD_06_exa01celadm01 0

exa01celadm01: CD_07_exa01celadm01 0

exa01celadm01: CD_08_exa01celadm01 0

exa01celadm01: CD_09_exa01celadm01 0

exa01celadm01: CD_10_exa01celadm01 0

exa01celadm01: CD_11_exa01celadm01 0

In this example, there is no free space available, so you must shrink the RECOC1 grid disks first to provide space for the DATAC1 grid disks. In your configuration there might be plenty of free space available and you can use that free space instead of shrinking the RECOC1 grid disks.

  1. Calculate the amount of space to shrink from the RECOC1 disk group and from each grid disk.

The minimum size to safely shrink a disk group and its grid disks must take into account the following:

  1. Space currently in use (USED_MB)
  2. Space expected for growth (GROWTH_MB)
  3. Space needed to rebalance in case of disk failure (DFC_MB), typically 15% of total disk group size

The minimum size calculation taking the above factors into account is:

Minimum DG size (MB) = ( USED_MB + GROWTH_MB ) * 1.15

  1. USED_MB can be derived from V$ASM_DISKGROUP by calculating TOTAL_MB - FREE_MB
  2. GROWTH_MB is an estimate specific to how the disk group will be used in the future and should be based on historical patterns of growth

For the RECOC1 disk group space usage shown in step 1, we see the minimum size it can shrink to assuming no growth estimates is:

Minimum RECOC1 size = (TOTAL_MB - FREE_MB + GROWTH_MB) * 1.15

= (94980480 - 82594920 + 0) * 1.15 = 14243394 MB = 13,910 GB
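
The same calculation can be run directly against V$ASM_DISKGROUP (a sketch assuming zero estimated growth; substitute your own GROWTH_MB estimate for the 0):

SQL> SELECT name, CEIL((total_mb - free_mb + 0) * 1.15) min_dg_mb
  2  FROM v$asm_diskgroup
  3  WHERE name = 'RECOC1';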

In the example output shown in Step 1, RECOC1 has plenty of free space and DATAC1 has less than 15% free. So, you could shrink RECOC1 and give the freed disk space to DATAC1. If you decide to reduce RECOC1 to half of its current size, the new size is 94980480 / 2 = 47490240 MB. This size is significantly above the minimum size we calculated for the RECOC1 disk group above, so it is safe to shrink it down to this value.

The query in Step 2 shows that there are 168 grid disks for RECOC1, because there are 14 cells and 12 disks per cell (14 * 12 = 168). The estimated new size of each grid disk for the RECOC1 disk group is 47490240 / 168, or 282,680 MB.

Find the closest 16 MB boundary for the new grid disk size. If you do not perform this check, then the cell will round down the grid disk size to the nearest 16 MB boundary automatically, and you could end up with a mismatch in size between the Oracle ASM disks and the grid disks.

SQL> SELECT 16*TRUNC(&new_disk_size/16) new_disk_size FROM dual;

Enter value for new_disk_size: 282680

NEW_DISK_SIZE

-------------

       282672

Based on the above result, you should choose 282672 MB as the new size for the grid disks in the RECOC1 disk group. After resizing the grid disks, the size of the RECOC1 disk group will be 47488896 MB.

  1. Calculate how much to increase the size of each grid disk in the DATAC1 disk group.

Ensure the Oracle ASM disk size and the grid disk sizes match across the entire disk group. The following query shows the combinations of disk sizes in each disk group. Ideally, there is only one size found for all disks and the sizes of both the Oracle ASM (total_mb) disks and the grid disks (os_mb) match.

SELECT dg.name, d.total_mb, d.os_mb, count(1) num_disks

FROM v$asm_diskgroup dg, v$asm_disk d

WHERE dg.group_number = d.group_number

GROUP BY dg.name, d.total_mb, d.os_mb;

NAME                             TOTAL_MB      OS_MB  NUM_DISKS

------------------------------ ---------- ---------- ----------

DATAC1                             409600     409600        168

RECOC1                             565360     565360        168

After shrinking the RECOC1 grid disks, the following space is left per disk for DATAC1:

Additional space for DATAC1 disks = RECOC1_current_size - RECOC1_new_size
                                  = 565360 - 282672 = 282688 MB

To calculate the new size of the grid disks for the DATAC1 disk group, use the following:

DATAC1 disks new size = DATAC1_disks_current_size + new_free_space_from_RECOC1
                      = 409600 + 282688 = 692288 MB

Find the closest 16 MB boundary for the new grid disk size. If you do not perform this check, then the cell will round down the grid disk size to the nearest 16 MB boundary automatically, and you could end up with a mismatch in size between the Oracle ASM disks and the grid disks.

SQL> SELECT 16*TRUNC(&new_disk_size/16) new_disk_size FROM dual;

Enter value for new_disk_size: 692288

NEW_DISK_SIZE

-------------

       692288

Based on the query result, you can use the calculated size of 692288 MB for the disks in the DATAC1 disk groups because the size is on a 16 MB boundary. If the result of the query is different from the value you supplied, then you must use the value returned by the query because that is the value to which the cell will round the grid disk size.

The calculated value of the new grid disk size will result in the DATAC1 disk group having a total size of 116304384 MB (168 disks * 692288 MB).

Shrink the Oracle ASM Disks in the Donor Disk Group

If there is no free space available on the cell disks, you can reduce the space used by one disk group to provide additional disk space for a different disk group.

This task is a continuation of an example where space in the RECOC1 disk group is being reallocated to the DATAC1 disk group.

Before resizing the disk group, make sure the disk group you are taking space from has sufficient free space.

  1. Shrink the Oracle ASM disks for the RECO disk group down to the new desired size for all disks.

Use the new size for the disks in the RECO disk group that was calculated in Step 5 of Determine the Amount of Available Space.

SQL> ALTER DISKGROUP recoc1 RESIZE ALL SIZE 282672M REBALANCE POWER 64;

If the specified disk group has quorum disks configured within the disk group, then the ALTER DISKGROUP ... RESIZE ALL command could fail with error ORA-15277. You can specify the storage server failure group names (the ones with FAILGROUP_TYPE REGULAR, not QUORUM) explicitly in the SQL command, for example:

SQL> ALTER DISKGROUP recoc1 RESIZE DISKS IN FAILGROUP exacell01 SIZE 282672M,

exacell02 SIZE 282672M, exacell03 SIZE 282672M REBALANCE POWER 64;
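
If you are unsure which failure groups are regular, you can list them from V$ASM_DISK (a sketch; the FAILGROUP_TYPE column reports REGULAR or QUORUM):

SQL> SELECT DISTINCT dg.name, d.failgroup, d.failgroup_type
  2  FROM v$asm_disk d, v$asm_diskgroup dg
  3  WHERE d.group_number = dg.group_number AND dg.name = 'RECOC1';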

Wait for rebalance to finish by checking the view GV$ASM_OPERATION.

SQL> set lines 250 pages 1000

SQL> col error_code form a10

SQL> SELECT dg.name, o.*

  2  FROM gv$asm_operation o, v$asm_diskgroup dg

  3  WHERE o.group_number = dg.group_number;

Proceed to the next step ONLY when the query against GV$ASM_OPERATION shows no rows for the disk group being altered.

  1. Verify the new size of the Oracle ASM disks using the following queries:

SQL> SELECT name, total_mb, free_mb, total_mb - free_mb used_mb,

  2   ROUND(100*free_mb/total_mb,2) pct_free

  3  FROM v$asm_diskgroup

  4  ORDER BY 1;

NAME                             TOTAL_MB    FREE_MB    USED_MB   PCT_FREE

------------------------------ ---------- ---------- ---------- ----------

DATAC1                           68812800    9985076   58827724      14.51

RECOC1                           47488896   35103336   12385560      73.92

SQL> SELECT dg.name, d.total_mb, d.os_mb, COUNT(1) num_disks

  2  FROM v$asm_diskgroup dg, v$asm_disk d

  3  WHERE dg.group_number = d.group_number

  4  GROUP BY dg.name, d.total_mb, d.os_mb;

NAME                             TOTAL_MB      OS_MB  NUM_DISKS

------------------------------ ---------- ---------- ----------

DATAC1                             409600     409600        168

RECOC1                             282672     565360        168

The above query example shows that the disks in the RECOC1 disk group have been resized to 282672 MB each, and that the total disk group size is 47488896 MB.

Increase the Size of the Grid Disks Using Available Space

You can increase the size used by the grid disks if there is unallocated disk space either already available, or made available by shrinking the space used by a different Oracle ASM disk group.

This task is a continuation of an example where space in the RECOC1 disk group is being reallocated to the DATAC1 disk group. If you already have sufficient space to expand an existing disk group, then you do not need to reallocate space from a different disk group.

  1. Check that the cell disks have the expected amount of free space.

After completing the tasks to shrink the Oracle ASM disks and the grid disks, you would expect to see the following free space on the cell disks:

# dcli -g ~/cell_group -l root "cellcli -e list celldisk attributes name,freespace"

exa01celadm01: CD_00_exa01celadm01 276.0625G

exa01celadm01: CD_01_exa01celadm01 276.0625G

exa01celadm01: CD_02_exa01celadm01 276.0625G

exa01celadm01: CD_03_exa01celadm01 276.0625G

exa01celadm01: CD_04_exa01celadm01 276.0625G

exa01celadm01: CD_05_exa01celadm01 276.0625G

exa01celadm01: CD_06_exa01celadm01 276.0625G

exa01celadm01: CD_07_exa01celadm01 276.0625G

exa01celadm01: CD_08_exa01celadm01 276.0625G

exa01celadm01: CD_09_exa01celadm01 276.0625G

exa01celadm01: CD_10_exa01celadm01 276.0625G

exa01celadm01: CD_11_exa01celadm01 276.0625G

  1. For each storage cell, increase the size of the DATA grid disks to the desired new size. The following examples show the commands for the first and last cells; run the equivalent command on every cell in the rack (a scripted sketch follows the examples).

dcli -c exa01celadm01 -l root "cellcli -e alter griddisk DATAC1_CD_00_exa01celadm01 \

,DATAC1_CD_01_exa01celadm01 \

,DATAC1_CD_02_exa01celadm01 \

,DATAC1_CD_03_exa01celadm01 \

,DATAC1_CD_04_exa01celadm01 \

,DATAC1_CD_05_exa01celadm01 \

,DATAC1_CD_06_exa01celadm01 \

,DATAC1_CD_07_exa01celadm01 \

,DATAC1_CD_08_exa01celadm01 \

,DATAC1_CD_09_exa01celadm01 \

,DATAC1_CD_10_exa01celadm01 \

,DATAC1_CD_11_exa01celadm01 \

size=692288M"

dcli -c exa01celadm14 -l root "cellcli -e alter griddisk DATAC1_CD_00_exa01celadm14 \

,DATAC1_CD_01_exa01celadm14 \

,DATAC1_CD_02_exa01celadm14 \

,DATAC1_CD_03_exa01celadm14 \

,DATAC1_CD_04_exa01celadm14 \

,DATAC1_CD_05_exa01celadm14 \

,DATAC1_CD_06_exa01celadm14 \

,DATAC1_CD_07_exa01celadm14 \

,DATAC1_CD_08_exa01celadm14 \

,DATAC1_CD_09_exa01celadm14 \

,DATAC1_CD_10_exa01celadm14 \

,DATAC1_CD_11_exa01celadm14 \

size=692288M"
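
Rather than typing the command for all 14 cells, you can generate the per-cell commands with a small shell loop (a sketch assuming the example naming convention exa01celadm01 through exa01celadm14, with 12 DATAC1 grid disks per cell):

for i in $(seq -w 1 14); do
  cell="exa01celadm${i}"
  # Build the comma-separated list of the 12 DATAC1 grid disks on this cell
  disks=$(for d in $(seq -w 0 11); do printf 'DATAC1_CD_%s_%s,' "$d" "$cell"; done)
  dcli -c "$cell" -l root "cellcli -e alter griddisk ${disks%,} size=692288M"
done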

  1. Verify the new size of the grid disks associated with the DATAC1 disk group using the following command:

dcli -g cell_group -l root "cellcli -e list griddisk attributes name,size \
where name like \'DATAC1.*\'"

exa01celadm01: DATAC1_CD_00_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_01_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_02_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_03_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_04_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_05_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_06_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_07_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_08_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_09_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_10_exa01celadm01 676.0625G

exa01celadm01: DATAC1_CD_11_exa01celadm01 676.0625G

Instead of increasing the size of the DATA disk group, you could instead create new disk groups with the new free space or keep it free for future use. In general, Oracle recommends using the smallest number of disk groups needed (typically DATA, RECO, and DBFS_DG) to give the greatest flexibility and ease of administration. However, there may be cases, perhaps when using virtual machines or consolidating many databases, where additional disk groups or available free space for future use may be desired.

If you decide to leave free space on the grid disks in reserve for future use, see My Oracle Support note 1684112.1 for the steps to allocate free space to an existing disk group at a later time.

Increase the Size of the Oracle ASM Disks

You can increase the size used by the Oracle ASM disks after increasing the space allocated to the associated grid disks.

This task is a continuation of an example where space in the RECOC1 disk group is being reallocated to the DATAC1 disk group.

You must have completed the task of resizing the grid disks before you can resize the corresponding Oracle ASM disk group.

  1. Increase the Oracle ASM disks for DATAC1 disk group to the new size of the grid disks on the storage cells.

SQL> ALTER DISKGROUP datac1 RESIZE ALL;

This command resizes the Oracle ASM disks to match the size of the grid disks.

If the disk group has quorum disks configured within it, then the ALTER DISKGROUP ... RESIZE ALL command could fail with error ORA-15277. As a workaround, you can specify the storage server failure group names (the ones with FAILGROUP_TYPE REGULAR, not QUORUM) explicitly in the SQL command, for example:

SQL> ALTER DISKGROUP datac1 RESIZE DISKS IN FAILGROUP exacell01, exacell02, exacell03;

  1. Wait for the rebalance operation to finish.

SQL> set lines 250 pages 1000

SQL> col error_code form a10

SQL> SELECT dg.name, o.* FROM gv$asm_operation o, v$asm_diskgroup dg WHERE o.group_number = dg.group_number;

Do not continue to the next step until the query returns zero rows for the disk group that was altered.

  1. Verify that the Oracle ASM disks and disk groups are at the desired sizes.

SQL> SELECT name, total_mb, free_mb, total_mb - free_mb used_mb,

     ROUND(100*free_mb/total_mb,2) pct_free

     FROM v$asm_diskgroup

     ORDER BY 1;

NAME                             TOTAL_MB    FREE_MB    USED_MB   PCT_FREE

------------------------------ ---------- ---------- ---------- ----------

DATAC1                          116304384   57439796   58864588      49.39

RECOC1                           47488896   34542516   12946380      72.74

SQL>  SELECT dg.name, d.total_mb, d.os_mb, COUNT(1) num_disks

      FROM  v$asm_diskgroup dg, v$asm_disk d

      WHERE dg.group_number = d.group_number

      GROUP BY dg.name, d.total_mb, d.os_mb;

NAME                             TOTAL_MB      OS_MB  NUM_DISKS

------------------------------ ---------- ---------- ----------

DATAC1                             692288     692288        168

RECOC1                             282672     282672        168

The results of the queries show that the RECOC1 and DATAC1 disk groups and their disks have been resized.

Using the Oracle Exadata System Software Rescue Procedure

In the rare event that both system disks fail simultaneously, you must use the Oracle Exadata Storage Server rescue functionality provided on the Oracle Exadata System Software CELLBOOT USB flash drive.

About the Oracle Exadata System Software Rescue Procedure

The rescue procedure is necessary when system disks fail, the operating system has a corrupt file system, or there was damage to the boot area.

If only one system disk fails, then use CellCLI commands to recover.

If you are using normal redundancy, then there is only one mirror copy for the cell being rescued. The data may be irrecoverably lost if that single mirror also fails during the rescue procedure. Oracle recommends that you take a complete backup of the data on the mirror copy, and immediately take the mirror copy cell offline to prevent any new data changes to it before attempting a rescue. This ensures that all data residing on the grid disks on the failed cell and its mirror copy is inaccessible during the rescue procedure.

The Oracle Automatic Storage Management (Oracle ASM) disk repair timer has a default repair time of 3.6 hours. If you know that you cannot perform the rescue procedure within that time frame, then use the Oracle ASM rebalance procedure to rebalance the disks before you perform the rescue procedure.

When using high redundancy disk groups, such as having more than one mirror copy in Oracle ASM for all the grid disks of the failed cell, then take the failed cell offline. Oracle ASM automatically drops the grid disks on the failed cell after the configured Oracle ASM time out, and starts rebalancing data using mirror copies. The default timeout is two hours. If the cell rescue takes more than two hours, then you must re-create the grid disks on the rescued cells in Oracle ASM.

It is important to note the following when using the rescue procedure:

  • The rescue procedure can potentially rewrite some or all of the disks in the cell. If this happens, then you can lose all the content on those disks without possibility of recovery.

Use extreme caution when using this procedure, and pay attention to the prompts. Ideally, you should use the rescue procedure only with assistance from Oracle Support Services, and when you have decided that you can afford the loss of data on some or all of the disks.

  • The rescue procedure does not destroy the contents of the data disks or the contents of the data partitions on the system disks unless you explicitly choose to do so during the rescue procedure.
  • Starting in Oracle Exadata System Software release 11.2, the rescue procedure restores the Oracle Exadata System Software to the same release. This includes any patches that existed on the cell as of the last successful boot. Note the following about using the rescue procedure:
    • Cell configuration information, such as alert configurations, SMTP information, administrator e-mail address, and so on is not restored.
    • The network configuration that existed at the end of last successful run of /usr/local/bin/ipconf utility is restored.
    • The SSH identities for the cell, and the root, celladmin and cellmonitor users are restored.
    • Integrated Lights Out Manager (ILOM) configurations for Oracle Exadata Storage Servers are not restored. Typically, ILOM configurations remain undamaged even in case of Oracle Exadata System Software failures.
  • The rescue procedure does not examine or reconstruct data disks or data partitions on the system disks. If there is data corruption on the grid disks, then do not use the rescue procedure. Instead use the rescue procedure for Oracle Database and Oracle ASM.

After a successful rescue, you must reconfigure the cell, and if you had chosen to preserve the data, then import the cell disks. If you chose not to preserve the data, then you should create new cell disks, and grid disks.
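
For example, if you chose to preserve the data, the cell disks can typically be brought back with the IMPORT CELLDISK command (a sketch; confirm the exact post-rescue steps for your release with Oracle Support Services):

CellCLI> IMPORT CELLDISK ALL FORCE

If you chose not to preserve the data, re-create the cell disks with CREATE CELLDISK ALL, and then re-create the flash log, flash cache, and grid disks as appropriate for your configuration.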

Performing Rescue Using the CELLBOOT USB Flash Drive

You can use the CELLBOOT USB flash drive to perform the rescue procedure.

  1. Connect to the Oracle Exadata Storage Server using the console.
  2. Start the Oracle Exadata Storage Server and enter the boot options menu.

During the initial boot sequence, you will see something like the following:

Press any key to enter the menu

Booting Exadata_DBM_0: CELL_USB_BOOT_trying_C0D0_as_HD1 in 4 seconds…

Booting Exadata_DBM_0: CELL_USB_BOOT_trying_C0D0_as_HD1 in 3 seconds…

Press any key to see the menu.

Note that with older versions of Oracle Exadata System Software, you may see the "Oracle Exadata" splash screen. If the splash screen appears, press any key on the keyboard. The splash screen remains visible for only 5 seconds.

  1. In the list of boot options, scroll down to the last option, CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode, and then press Enter.
  2. When prompted, select the option to reinstall the Oracle Exadata System Software. Then, confirm your selection.

For example:

         Choose from the following by typing letter in '()':

           (e)nter interactive diagnostics shell.

             Use diagnostics shell password to login as root user

             (reboot or power cycle to exit the shell),

           (r)einstall or try to recover damaged system,

Select: r

[INFO     ] Reinstall or try to recover damaged system

Continue (y/n) [n]: y

  1. If prompted, specify the rescue root password.

If you do not have the required password, then contact Oracle Support Services.

  1. When prompted, specify whether you want to erase the data partitions and data disks.

Specify n to preserve existing data on the storage server.

If you specify y, you will permanently erase all of the data on the storage server. Do not specify this option unless you are sure that it is safe.

For example:

Do you want to erase data partitions and data disks (y/n)  [n]: n

  1. If prompted, specify the root password.

If you do not have the required password, then contact Oracle Support Services.

You should now see a message and shell prompt indicating that you are in rescue mode. For example:

======================= NOTE =================================

=                                                            =

= -- YOU ARE IN RESCUE MODE AFTER FIRST PHASE OF RESCUE --   =

= Imaging pre-boot phase finished with success.              =

= Execute reboot to continue installation.                   =

=                                                            =

==============================================================

-sh-4.1#

  1. Using the rescue prompt, reboot the storage server to complete the rescue process.

For example:

-sh-4.1# shutdown -r now

The rescue process typically takes between 45 and 90 minutes to complete. The storage server may reboot a few times during the rescue process. An on-screen message indicates when the rescue process is completed. For example:

Run validation checkconfigs - PASSED

2020-08-17 18:14:01 -0600 The first boot completed with SUCCESS