SQL Server In-Memory OLTP Overview, Part 2

Application Pattern for Partitioning Memory-Optimized Tables

In-Memory OLTP supports an application design pattern that lavishes performance resources on relatively current data. This pattern can apply when current data is read or updated far more frequently than older data is. In this case, we say the current data is active or hot, and the older data is cold.

The main idea is to store hot data in a memory-optimized table. On a weekly or monthly basis, older data that has become cold is moved to a partitioned table. The partitioned table stores its data on disk rather than in memory.

Typically, this design uses a datetime key to enable the move process to efficiently distinguish between hot versus cold data.

Advanced partitioning

The design intends to mimic having a partitioned table that also has one memory-optimized partition. For this design to work, you must ensure that the tables all share a common schema. The code sample later in this article shows the technique.

New data is presumed to be hot by definition. Hot data is inserted and updated in the memory-optimized table. Cold data is maintained in the traditional partitioned table. Periodically, a stored procedure adds a new partition. The partition contains the latest cold data that has been moved out of the memory-optimized table.

If an operation needs only hot data, it can use natively compiled stored procedures to access the data. Operations that might access hot or cold data must use interpreted Transact-SQL, to join the memory-optimized table with the partitioned table.
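
For illustration, here is a minimal sketch of that split. It assumes the dbo.SalesOrders_hot table and dbo.SalesOrders view that are created later in this article; the procedure name is hypothetical.

SQL

-- Hot-only access can use a natively compiled procedure (hypothetical name).
CREATE PROCEDURE dbo.usp_GetHotOrdersByCustomer @cust_id INT
   WITH NATIVE_COMPILATION, SCHEMABINDING
AS BEGIN ATOMIC
   WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
   SELECT so_id, so_date, so_total
      FROM dbo.SalesOrders_hot
      WHERE cust_id = @cust_id;
END;
GO

-- Access that can span hot plus cold data must use interpreted T-SQL,
-- for example through the dbo.SalesOrders view defined later in this article.
SELECT cust_id, SUM(so_total) AS so_total_alltime
   FROM dbo.SalesOrders
   GROUP BY cust_id;
GO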

Add a partition

Data that has recently become cold must be moved into the partitioned table. The steps for this periodic partition swap are as follows:

  1. For the data in the memory-optimized table, determine the datetime that is the boundary or cutoff between hot versus newly cold data.
  2. Insert the newly cold data, from the In-Memory OLTP table, into a cold_staging table.
  3. Delete the same cold data from the memory-optimized table.
  4. Swap the cold_staging table into a partition.
  5. Add the partition.

 

Maintenance window

One of the preceding steps is to delete the newly cold data from the memory-optimized table. There is a time interval between this deletion and the final step that adds the new partition. During this interval, any application that attempts to read the newly cold data will not find it. Therefore, run the swap during a maintenance window, when little or no activity needs the date range being moved.

Code Sample

The following Transact-SQL sample is presented as a series of smaller code blocks, purely for ease of presentation. You can concatenate them into one script for testing.

As a whole, the T-SQL sample shows how to use a memory-optimized table with a partitioned disk-based table.

The first phases of the T-SQL sample create the database, and then create objects such as tables in the database. Later phases show how to move data from a memory-optimized table into a partitioned table.

Create a database

This section of the T-SQL sample creates a test database. The database is configured to support both memory-optimized tables and partitioned tables.

SQL

CREATE DATABASE PartitionSample;
GO
 
-- Add a FileGroup, enabled for In-Memory OLTP.
-- Change file path as needed.
 
ALTER DATABASE PartitionSample
    ADD FILEGROUP PartitionSample_mod
    CONTAINS MEMORY_OPTIMIZED_DATA;
 
ALTER DATABASE PartitionSample
    ADD FILE(
        NAME = 'PartitionSample_mod',
        FILENAME = 'c:\data\PartitionSample_mod')
    TO FILEGROUP PartitionSample_mod;
GO

Create a memory-optimized table for hot data

This section creates the memory-optimized table that holds the latest data, which is mostly still hot data.

SQL

USE PartitionSample;
GO
 
-- Create a memory-optimized table for the HOT Sales Order data.
-- Notice the index that uses datetime2.
 
CREATE TABLE dbo.SalesOrders_hot (
   so_id INT IDENTITY PRIMARY KEY NONCLUSTERED,
   cust_id INT NOT NULL,
   so_date DATETIME2 NOT NULL INDEX ix_date NONCLUSTERED,
   so_total MONEY NOT NULL,
   INDEX ix_date_total NONCLUSTERED (so_date desc, so_total desc)
) WITH (MEMORY_OPTIMIZED=ON);
GO

Create a partitioned table for cold data

This section creates the partitioned table that holds the cold data.

SQL

-- Create a partition and table for the COLD Sales Order data.
-- Notice the index that uses datetime2.
 
CREATE PARTITION FUNCTION [ByDatePF](datetime2) AS RANGE RIGHT
   FOR VALUES();
GO
 
CREATE PARTITION SCHEME [ByDateRange]
   AS PARTITION [ByDatePF]
   ALL TO ([PRIMARY]);
GO
 
CREATE TABLE dbo.SalesOrders_cold (
   so_id INT NOT NULL,
   cust_id INT NOT NULL,
   so_date DATETIME2 NOT NULL,
   so_total MONEY NOT NULL,
   CONSTRAINT PK_SalesOrders_cold PRIMARY KEY (so_id, so_date),
   INDEX ix_date_total NONCLUSTERED (so_date desc, so_total desc)
) ON [ByDateRange](so_date);
GO

Create a table to store cold data during move

This section creates the cold_staging table. A view that unions the hot and cold data from the two tables is also created.

SQL

-- A table used to briefly stage the newly cold data, during moves to a partition.
 
CREATE TABLE dbo.SalesOrders_cold_staging (
   so_id INT NOT NULL,
   cust_id INT NOT NULL,
   so_date datetime2 NOT NULL,
   so_total MONEY NOT NULL,
   CONSTRAINT PK_SalesOrders_cold_staging PRIMARY KEY (so_id, so_date),
   INDEX ix_date_total NONCLUSTERED (so_date desc, so_total desc),
   CONSTRAINT CHK_SalesOrders_cold_staging CHECK (so_date >= '1900-01-01')
);
GO
 
-- A view, for retrieving the aggregation of hot plus cold data.
 
CREATE VIEW dbo.SalesOrders
AS SELECT so_id,
          cust_id,
          so_date,
          so_total,
          1 AS 'is_hot'
       FROM dbo.SalesOrders_hot
   UNION ALL
   SELECT so_id,
          cust_id,
          so_date,
          so_total,
          0 AS 'is_hot'
       FROM dbo.SalesOrders_cold;
GO

Create the stored procedure

This section creates the stored procedure that you run periodically. The procedure moves newly cold data from the memory-optimized table into the partitioned table.

SQL

-- A stored procedure to move all newly cold sales orders data
-- to its staging location.
 
CREATE PROCEDURE dbo.usp_SalesOrdersOffloadToCold @splitdate datetime2
   AS
   BEGIN
      BEGIN TRANSACTION;
 
      -- Insert the cold data as a temporary heap.
      INSERT INTO dbo.SalesOrders_cold_staging WITH (TABLOCKX)
      SELECT so_id , cust_id , so_date , so_total
         FROM dbo.SalesOrders_hot WITH (SERIALIZABLE)
         WHERE so_date <= @splitdate;
 
      -- Delete the moved data from the hot table.
      DELETE FROM dbo.SalesOrders_hot WITH (SERIALIZABLE)
         WHERE so_date <= @splitdate;
 
      -- Update the partition function, and switch in the new partition.
      ALTER PARTITION SCHEME [ByDateRange] NEXT USED [PRIMARY];
 
      DECLARE @p INT = (
        SELECT MAX(partition_number)
            FROM sys.partitions
            WHERE object_id = OBJECT_ID('dbo.SalesOrders_cold'));
 
      EXEC sp_executesql
        N'ALTER TABLE dbo.SalesOrders_cold_staging
            SWITCH TO dbo.SalesOrders_cold PARTITION @i',
        N'@i int',
        @i = @p;
 
      ALTER PARTITION FUNCTION [ByDatePF]()
      SPLIT RANGE( @splitdate);
 
      -- Modify a constraint on the cold_staging table, to align with new partition.
      ALTER TABLE dbo.SalesOrders_cold_staging
         DROP CONSTRAINT CHK_SalesOrders_cold_staging;
 
      DECLARE @s nvarchar( 100) = CONVERT( nvarchar( 100) , @splitdate , 121);
      DECLARE @sql nvarchar( 1000) = N'alter table dbo.SalesOrders_cold_staging 
         add constraint CHK_SalesOrders_cold_staging check (so_date > ''' + @s + ''')';
      PRINT @sql;
      EXEC sp_executesql @sql;
 
      COMMIT;
END;
GO

Prepare sample data, and demo the stored procedure

This section generates and inserts sample data, and then runs the stored procedure as a demonstration.

SQL

-- Insert sample values into the hot table.
INSERT INTO dbo.SalesOrders_hot VALUES(1, SYSDATETIME(), 1);
GO
INSERT INTO dbo.SalesOrders_hot VALUES(1, SYSDATETIME(), 1);
GO
INSERT INTO dbo.SalesOrders_hot VALUES(1, SYSDATETIME(), 1);
GO
 
-- Verify that the hot data is in the table, by selecting from the view.
SELECT * FROM dbo.SalesOrders;
GO
 
-- Treat all data in the hot table as cold data:
-- Run the stored procedure, to move (offload) all sales orders to date to cold storage.
DECLARE @t datetime2 = SYSDATETIME();
EXEC dbo.usp_SalesOrdersOffloadToCold @t;
 
-- Again, read hot plus cold data from the view.
SELECT * FROM dbo.SalesOrders;
GO
 
-- Examine the partitions of the cold table.
SELECT OBJECT_NAME( object_id) , * FROM sys.dm_db_partition_stats ps
   WHERE object_id = OBJECT_ID( 'dbo.SalesOrders_cold');
 
-- Insert more data into the hot table.
INSERT INTO dbo.SalesOrders_hot VALUES(2, SYSDATETIME(), 1);
GO
INSERT INTO dbo.SalesOrders_hot VALUES(2, SYSDATETIME(), 1);
GO
INSERT INTO dbo.SalesOrders_hot VALUES(2, SYSDATETIME(), 1);
GO
 
-- Read hot plus cold data from the view.
SELECT * FROM dbo.SalesOrders;
GO
 
-- Again, run the stored procedure, to move all sales orders to date to cold storage.
DECLARE @t datetime2 = SYSDATETIME();
EXEC dbo.usp_SalesOrdersOffloadToCold @t;
 
-- Read hot plus cold data from the view.
SELECT * FROM dbo.SalesOrders;
GO
 
-- Again, examine the partitions of the cold table.
-- Each run of the stored procedure adds a new partition.
SELECT OBJECT_NAME( object_id) , partition_number , row_count
  FROM sys.dm_db_partition_stats ps
  WHERE object_id = OBJECT_ID( 'dbo.SalesOrders_cold')
    AND index_id = 1;

Drop all the demo objects

Remember to remove the demo database from your test system.

SQL

-- You must first leave the context of the PartitionSample database.
 
-- USE <A-Database-Name-Here>;
GO
 
DROP DATABASE PartitionSample;
GO
 

Implementing MERGE Functionality in a Natively Compiled Stored Procedure

The Transact-SQL code sample in this section demonstrates how you can simulate the T-SQL MERGE statement in a natively compiled module. The sample uses a table variable with an identity column, iterates over the rows in the table variable, and for each row performs an update if the condition matches, or an insert if the condition does not match.

Here is the T-SQL MERGE statement that is not supported inside a natively compiled module, and that the code sample simulates.

SQL

MERGE INTO dbo.Table1 t
    USING @tvp v
    ON t.Column1 = v.c1
    WHEN MATCHED THEN
        UPDATE SET Column2 = v.c2
    WHEN NOT MATCHED THEN
        INSERT (Column1, Column2) VALUES (v.c1, v.c2);

Here is the T-SQL to achieve the workaround and simulate MERGE.

SQL

DROP PROCEDURE IF EXISTS dbo.usp_merge1;
go

DROP TYPE IF EXISTS dbo.Type1;
go

DROP TABLE IF EXISTS dbo.Table1;
go

-----------------------------
-- target table and table type used for the workaround
-----------------------------

CREATE TABLE dbo.Table1
(
    Column1  INT  NOT NULL  PRIMARY KEY NONCLUSTERED,
    Column2  INT  NOT NULL
)
    WITH (MEMORY_OPTIMIZED = ON);
go

CREATE TYPE dbo.Type1 AS TABLE
(
    c1  INT  NOT NULL,
    c2  INT  NOT NULL,
    RowID    INT  NOT NULL  IDENTITY(1,1),

    INDEX ix_RowID HASH (RowID) WITH (BUCKET_COUNT=1024)
)
    WITH (MEMORY_OPTIMIZED = ON);
go

-----------------------------
-- stored procedure implementing the workaround
-----------------------------

CREATE PROCEDURE dbo.usp_merge1
    @tvp1 dbo.Type1 READONLY
    WITH
    NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC
    WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT,
            LANGUAGE = N'us_english')

    DECLARE  @i INT = 1,  @c1 INT,  @c2 INT;

    WHILE @i > 0
    BEGIN
        SELECT @c1 = c1, @c2 = c2
            FROM @tvp1
            WHERE RowID = @i;

        -- Test whether the row exists in the TVP; if not, we end the loop.
        IF @@ROWCOUNT = 0
            SET @i = 0;
        ELSE
        BEGIN
            -- Try the update.
            UPDATE dbo.Table1
                SET   Column2 = @c2
                WHERE Column1 = @c1;

            -- If there was no row to update, we insert.
            IF @@ROWCOUNT = 0
                INSERT INTO dbo.Table1 (Column1, Column2)
                    VALUES (@c1, @c2);

            SET @i += 1;
        END
    END
END
go

-----------------------------
-- test to validate the functionality
-----------------------------

INSERT dbo.Table1 VALUES (1,2);
go

SELECT N'Before-MERGE' AS [Before-MERGE], Column1, Column2
    FROM dbo.Table1;
go

DECLARE @tvp1 dbo.Type1;
INSERT @tvp1 (c1, c2) VALUES (1,33), (2,4);
EXECUTE dbo.usp_merge1 @tvp1;
go

SELECT N'After-MERGE' AS [After-MERGE], Column1, Column2
    FROM dbo.Table1;
go

Console

    /****  Actual output:

    Before-MERGE   Column1   Column2
    Before-MERGE      1         2

    After-MERGE   Column1   Column2
    After-MERGE      1        33
    After-MERGE      2         4
    ****/

SQL Server Management Objects Support for In-Memory OLTP

This topic describes items in SQL Server Management Objects (SMO) that support In-Memory OLTP.

SMO types and members

The following types and members are in the namespace Microsoft.SqlServer.Management.Smo, and they support In-Memory OLTP:

  • DurabilityType (enumeration)
  • FileGroup.FileGroupType (property)
  • FileGroup.FileGroup (constructor)
  • FileGroupType (enumeration)
  • Index.BucketCount (property)
  • IndexType.NonClusteredHashIndex (enumeration member)
  • Index.IsMemoryOptimized (property)
  • Server.IsXTPSupported (property)
  • StoredProcedure.IsNativelyCompiled (property)
  • StoredProcedure.IsSchemaBound (property)
  • Table.Durability (property)
  • Table.IsMemoryOptimized (property)
  • UserDefinedTableType.IsMemoryOptimized (property)

C# code example

Assemblies referenced by the compiled code example

  • Microsoft.SqlServer.ConnectionInfo.dll
  • Microsoft.SqlServer.Management.Sdk.Sfc.dll
  • Microsoft.SqlServer.Smo.dll
  • Microsoft.SqlServer.SqlEnum.dll

Actions taken in the code example

  1. Create a database with memory-optimized filegroup and memory-optimized file.
  2. Create a durable memory-optimized table with a primary key, nonclustered index, and a nonclustered hash index.
  3. Create columns and indexes.
  4. Create a user-defined memory-optimized table type.
  5. Create a natively compiled stored procedure.

Source code

C#

using Microsoft.SqlServer.Management.Smo;
using System;

public class A
{
   static void Main(string[] args)
   {
      Server server = new Server("(local)");

      // Create a database with memory-optimized filegroup and memory-optimized file.
      Database db = new Database(server, "MemoryOptimizedDatabase");
      db.Create();
      FileGroup fg = new FileGroup(
         db,
         "memOptFilegroup",
         FileGroupType.MemoryOptimizedDataFileGroup);
      db.FileGroups.Add(fg);
      fg.Create();
      // Change this path if needed.
      DataFile file = new DataFile(
         fg,
         "memOptFile",
         @"C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\MSSQLmemOptFileName");
      file.Create();

      // Create a durable memory-optimized table with a primary key,
      // a nonclustered index, and a nonclustered hash index.
      // Define the table as memory-optimized and set the durability.
      Table table = new Table(db, "memOptTable");
      table.IsMemoryOptimized = true;
      table.Durability = DurabilityType.SchemaAndData;

      // Create columns.
      Column col1 = new Column(table, "col1", DataType.Int);
      col1.Nullable = false;
      table.Columns.Add(col1);
      Column col2 = new Column(table, "col2", DataType.Float);
      col2.Nullable = false;
      table.Columns.Add(col2);
      Column col3 = new Column(table, "col3", DataType.Decimal(2, 10));
      col3.Nullable = false;
      table.Columns.Add(col3);

      // Create indexes.
      Index pk = new Index(table, "PK_memOptTable");
      pk.IndexType = IndexType.NonClusteredIndex;
      pk.IndexKeyType = IndexKeyType.DriPrimaryKey;
      pk.IndexedColumns.Add(new IndexedColumn(pk, col1.Name));
      table.Indexes.Add(pk);

      Index ixNonClustered = new Index(table, "ix_nonClustered");
      ixNonClustered.IndexType = IndexType.NonClusteredIndex;
      ixNonClustered.IndexKeyType = IndexKeyType.None;
      ixNonClustered.IndexedColumns.Add(
         new IndexedColumn(ixNonClustered, col2.Name));
      table.Indexes.Add(ixNonClustered);

      Index ixNonClusteredHash = new Index(table, "ix_nonClustered_Hash");
      ixNonClusteredHash.IndexType = IndexType.NonClusteredHashIndex;
      ixNonClusteredHash.IndexKeyType = IndexKeyType.None;
      ixNonClusteredHash.BucketCount = 1024;
      ixNonClusteredHash.IndexedColumns.Add(
         new IndexedColumn(ixNonClusteredHash, col3.Name));
      table.Indexes.Add(ixNonClusteredHash);

      table.Create();

      // Create a user-defined memory-optimized table type.
      UserDefinedTableType uDTT = new UserDefinedTableType(db, "memOptUDTT");
      uDTT.IsMemoryOptimized = true;

      // Add columns.
      Column udTTCol1 = new Column(uDTT, "udtCol1", DataType.Int);
      udTTCol1.Nullable = false;
      uDTT.Columns.Add(udTTCol1);
      Column udTTCol2 = new Column(uDTT, "udtCol2", DataType.Float);
      udTTCol2.Nullable = false;
      uDTT.Columns.Add(udTTCol2);
      Column udTTCol3 = new Column(uDTT, "udtCol3", DataType.Decimal(2, 10));
      udTTCol3.Nullable = false;
      uDTT.Columns.Add(udTTCol3);

      // Add index.
      Index ix = new Index(uDTT, "IX_UDT");
      ix.IndexType = IndexType.NonClusteredHashIndex;
      ix.BucketCount = 1024;
      ix.IndexKeyType = IndexKeyType.DriPrimaryKey;
      ix.IndexedColumns.Add(new IndexedColumn(ix, udTTCol1.Name));
      uDTT.Indexes.Add(ix);

      uDTT.Create();

      // Create a natively compiled stored procedure.
      // NOTE: a natively compiled procedure body must be an atomic block; replace
      // the placeholder with a valid BEGIN ATOMIC ... END body before running.
      StoredProcedure sProc = new StoredProcedure(db, "nCSProc");
      sProc.TextMode = false;
      sProc.TextBody = "--Type body here";
      sProc.IsNativelyCompiled = true;
      sProc.IsSchemaBound = true;
      sProc.ExecutionContext = ExecutionContext.Owner;
      sProc.Create();
   }
}

Altering Memory-Optimized Tables

Schema and index changes on memory-optimized tables can be performed by using the ALTER TABLE statement. In SQL Server 2016 and Azure SQL Database, ALTER TABLE operations on memory-optimized tables are OFFLINE, meaning that the table is not available for querying while the operation is in progress. The database application can continue to run, but any operation that accesses the table is blocked until the alteration completes. It is possible to combine multiple ADD, DROP, or ALTER operations in a single ALTER TABLE statement.

ALTER TABLE

The ALTER TABLE syntax is used for making changes to the table schema, as well as for adding, deleting, and rebuilding indexes. Indexes are considered part of the table definition:

  • The syntax ALTER TABLE … ADD/DROP/ALTER INDEX is supported only for memory-optimized tables.
  • The standalone CREATE INDEX, DROP INDEX, and ALTER INDEX statements, and the PAD_INDEX option, are not supported for indexes on memory-optimized tables.

The following types of alterations are supported:

  • Changing the bucket count
  • Adding and removing an index
  • Changing, adding and removing a column
  • Adding and removing a constraint

Schema-bound Dependency

Natively compiled stored procedures are required to be schema-bound, meaning they have a schema-bound dependency on the memory optimized tables they access, and the columns they reference. A schema-bound dependency is a relationship between two entities that prevents the referenced entity from being dropped or incompatibly modified as long as the referencing entity exists.

For example, if a schema-bound natively compiled stored procedure references a column c1 from table mytable, column c1 cannot be dropped. Similarly, if there is such a procedure with an INSERT statement without column list (e.g., INSERT INTO dbo.mytable VALUES (…)), then no column in the table can be dropped.
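
The following minimal sketch uses hypothetical object names to show the dependency in action. It assumes a database that is already configured for In-Memory OLTP.

SQL

CREATE TABLE dbo.mytable (
   id INT NOT NULL PRIMARY KEY NONCLUSTERED,
   c1 INT NOT NULL
) WITH (MEMORY_OPTIMIZED = ON);
GO

CREATE PROCEDURE dbo.usp_read_c1
   WITH NATIVE_COMPILATION, SCHEMABINDING
AS BEGIN ATOMIC
   WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
   SELECT id, c1 FROM dbo.mytable;
END;
GO

-- Fails: the schema-bound natively compiled procedure references column c1.
ALTER TABLE dbo.mytable DROP COLUMN c1;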

Logging of ALTER TABLE on memory-optimized tables

On a memory-optimized table, most ALTER TABLE scenarios now run in parallel and result in an optimization of writes to the transaction log. The optimization is achieved by only logging the metadata changes to the transaction log. However, the following ALTER TABLE operations run single-threaded and are not log-optimized.

The single-threaded operation in this case would log the entire content of the altered table to the transaction log. A list of single-threaded operations follows:

  • Alter or add a column to use a large object (LOB) type: nvarchar(max), varchar(max), or varbinary(max).
  • Add or drop a COLUMNSTORE index.
  • Almost anything that affects an off-row column.
    • Cause an on-row column to move off-row.
    • Cause an off-row column to move on-row.
    • Create a new off-row column.
    • Exception: Lengthening an already off-row column is logged in the optimized way.
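
As an illustration of the distinction, both of the following statements add a column to the table used in the examples below, but only the second involves a LOB type and therefore runs single-threaded with full logging. The column names are hypothetical, and the first statement is assumed to keep the new column on-row.

SQL

-- Log-optimized: only the metadata change is logged (new column stays on-row).
ALTER TABLE Sales.SalesOrderDetail_inmem ADD Note NVARCHAR(100) NULL;
GO

-- Single-threaded and fully logged: nvarchar(max) is a LOB type.
ALTER TABLE Sales.SalesOrderDetail_inmem ADD LongNote NVARCHAR(MAX) NULL;
GO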

Examples

The following example alters the bucket count of an existing hash index. This rebuilds the hash index with the new bucket count while other properties of the hash index remain the same.

SQL

ALTER TABLE Sales.SalesOrderDetail_inmem
       ALTER INDEX imPK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID
              REBUILD WITH (BUCKET_COUNT=67108864);
GO

The following example adds a column with a NOT NULL constraint and with a DEFAULT definition, and uses WITH VALUES to provide values for each existing row in the table. If WITH VALUES is not used, each row has the value NULL in the new column.

SQL

ALTER TABLE Sales.SalesOrderDetail_inmem
       ADD Comment NVARCHAR(100) NOT NULL DEFAULT N'' WITH VALUES;
GO

The following example adds a primary key constraint to an existing column.

SQL

CREATE TABLE dbo.UserSession (
   SessionId int not null,
   UserId int not null,
   CreatedDate datetime2 not null,
   ShoppingCartId int,
   index ix_UserId nonclustered hash (UserId) with (bucket_count=400000)
)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY);
GO

ALTER TABLE dbo.UserSession
       ADD CONSTRAINT PK_UserSession PRIMARY KEY NONCLUSTERED (SessionId);
GO

The following example removes an index.

SQL

ALTER TABLE Sales.SalesOrderDetail_inmem
       DROP INDEX ix_ModifiedDate;
GO

The following example adds an index.

SQL

ALTER TABLE Sales.SalesOrderDetail_inmem
       ADD INDEX ix_ModifiedDate (ModifiedDate);
GO

The following example adds multiple columns, with an index and constraints.

SQL

ALTER TABLE Sales.SalesOrderDetail_inmem
       ADD    CustomerID int NOT NULL DEFAULT -1 WITH VALUES,
              ShipMethodID int NOT NULL DEFAULT -1 WITH VALUES,
              INDEX ix_Customer (CustomerID);
GO

Troubleshooting Hash Indexes for Memory-Optimized Tables

When creating a hash index for a memory-optimized table, the number of buckets needs to be specified at create time. In most cases the bucket count would ideally be between 1 and 2 times the number of distinct values in the index key.

However, even if the BUCKET_COUNT is moderately below or above the preferred range, the performance of your hash index is likely to be acceptable. At minimum, consider giving your hash index a BUCKET_COUNT roughly equal to the number of rows you predict your memory-optimized table will grow to have.

Suppose your growing table has 2,000,000 rows, but the prediction is that it will grow 10 times, to 20,000,000 rows. Start with a bucket count that is 10 times the number of rows in the table. This gives you room for an increasing quantity of rows.

  • Ideally you would increase the bucket count when the quantity of rows reaches the initial bucket count.
  • Even if the quantity of rows grows to 5 times larger than the bucket count, the performance is still good in most situations.

Suppose a hash index has 10,000,000 distinct key values.

  • A bucket count of 2,000,000 would be about as low as you could accept. The degree of performance degradation could be tolerable.
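
The following sketch makes the sizing arithmetic concrete, using the growth scenario above and a hypothetical table. SQL Server rounds the requested BUCKET_COUNT up to the next power of two, so a request of 20,000,000 yields 2^25 = 33,554,432 buckets.

SQL

-- Hypothetical table, sized for the predicted row count of 20,000,000.
CREATE TABLE dbo.SalesOrderBig (
   so_id BIGINT NOT NULL
      PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 20000000),
   so_total MONEY NOT NULL
) WITH (MEMORY_OPTIMIZED = ON);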

Too many duplicate values in the index?

If the hash indexed values have a high rate of duplicates, the hash buckets suffer longer chains.

Assume you have the same SupportEvent table from the earlier T-SQL syntax code block. The following T-SQL code demonstrates how you can find and display the ratio of all values to unique values:

SQL

-- Calculate ratio of:  Rows / Unique_Values.
DECLARE @allValues float(8) = 0.0, @uniqueVals float(8) = 0.0;

SELECT @allValues = Count(*) FROM SupportEvent;

SELECT @uniqueVals = Count(*) FROM
  (SELECT DISTINCT SupportEngineerName
      FROM SupportEvent) as d;

-- If (All / Unique) >= 10.0, use a nonclustered index, not a hash.
SELECT Cast((@allValues / @uniqueVals) as float) as [All_divby_Unique];
go

  • A ratio of 10.0 or higher means a hash would be a poor type of index. Consider using a nonclustered index instead.

Troubleshooting hash index bucket count

This section discusses how to troubleshoot the bucket count for your hash index.

Monitor statistics for chains and empty buckets

You can monitor the statistical health of your hash indexes by running the following T-SQL SELECT. The SELECT uses the dynamic management view (DMV) named sys.dm_db_xtp_hash_index_stats.

SQL

SELECT
  QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(OBJECT_NAME(h.object_id)) as [table],
  i.name                 as [index],
  h.total_bucket_count,
  h.empty_bucket_count,
  FLOOR((
    CAST(h.empty_bucket_count as float) /
      h.total_bucket_count) * 100)
                         as [empty_bucket_percent],
  h.avg_chain_length,
  h.max_chain_length
FROM
       sys.dm_db_xtp_hash_index_stats as h
  JOIN sys.indexes                    as i
          ON h.object_id = i.object_id
         AND h.index_id  = i.index_id
  JOIN sys.memory_optimized_tables_internal_attributes ia
          ON h.xtp_object_id = ia.xtp_object_id
  JOIN sys.tables t
          ON h.object_id = t.object_id
WHERE ia.type = 1
ORDER BY [table], [index];

Compare the SELECT results to the following statistical guidelines:

  • Empty buckets:
    • 33% is a good target value, but a larger percentage (even 90%) is usually fine.
    • When the bucket count equals the number of distinct key values, approximately 33% of the buckets are empty.
    • A value below 10% is too low.
  • Chains within buckets:
    • An average chain length of 1 is ideal if there are no duplicate index key values. Chain lengths up to 10 are usually acceptable.
    • If the average chain length is greater than 10, and the empty bucket percent is greater than 10%, the data has so many duplicates that a hash index might not be the most appropriate type.

Demonstration of chains and empty buckets

The following T-SQL code block gives you an easy way to test a SELECT from sys.dm_db_xtp_hash_index_stats. The code block completes in about 1 minute. Here are the phases of the code block:

  1. Creates a memory-optimized table that has a few hash indexes.
  2. Populates the table with thousands of rows.
    a. A modulo operator is used to configure the rate of duplicate values in the StatusCode column.
    b. The loop inserts 262,144 rows in approximately 1 minute.
  3. PRINTs a message asking you to run the earlier SELECT from sys.dm_db_xtp_hash_index_stats.

SQL

DROP TABLE IF EXISTS SalesOrder_Mem;
go

CREATE TABLE SalesOrder_Mem
(
  SalesOrderId   uniqueidentifier  NOT NULL  DEFAULT newid(),
  OrderSequence  int               NOT NULL,
  OrderDate      datetime2(3)      NOT NULL,
  StatusCode     tinyint           NOT NULL,

  PRIMARY KEY NONCLUSTERED
      HASH (SalesOrderId)  WITH (BUCKET_COUNT = 262144),

  INDEX ix_OrderSequence
      HASH (OrderSequence) WITH (BUCKET_COUNT = 20000),

  INDEX ix_StatusCode
      HASH (StatusCode)    WITH (BUCKET_COUNT = 8),

  INDEX ix_OrderDate       NONCLUSTERED (OrderDate DESC)
)
  WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
go

--------------------
SET NOCOUNT ON;

-- Same as PK bucket_count.  68 seconds to complete.
DECLARE @i int = 262144;

BEGIN TRANSACTION;
WHILE @i > 0
BEGIN
  INSERT SalesOrder_Mem
      (OrderSequence, OrderDate, StatusCode)
    VALUES
      (@i, GetUtcDate(), @i % 8);  -- Modulo technique.
  SET @i -= 1;
END
COMMIT TRANSACTION;

PRINT 'Next, you should query:  sys.dm_db_xtp_hash_index_stats .';
go

The preceding INSERT loop does the following:

  • Inserts unique values for the primary key index, and for ix_OrderSequence.
  • Inserts a couple hundred thousand rows, which contain only 8 distinct values for StatusCode. Therefore, there is a high rate of value duplication in the index ix_StatusCode.

For troubleshooting when the bucket count is not optimal, examine the following output of the SELECT from sys.dm_db_xtp_hash_index_stats. For these results, we added WHERE OBJECT_NAME(h.object_id) = 'SalesOrder_Mem' to the SELECT copied from the earlier section about monitoring statistics.

The SELECT results are displayed next, artificially split into two narrower results tables for better display.

  • Here are the results for bucket count.

    IndexName             total_bucket_count   empty_bucket_count   EmptyBucketPercent
    ix_OrderSequence      32768                13                   0
    ix_StatusCode         8                    4                    50
    PK_SalesOrd_B14003…   262144               96525                36

  • Next are the results for chain length.

    IndexName             avg_chain_length   max_chain_length
    ix_OrderSequence      8                  26
    ix_StatusCode         65536              65536
    PK_SalesOrd_B14003…   1                  8

Let us interpret the preceding results tables for the three hash indexes:

ix_StatusCode:

  • 50% of the buckets are empty, which is good.
  • However, the average chain length is very high at 65536.
    • This indicates a high rate of duplicate values.
    • Therefore, using a hash index is not appropriate in this case. A nonclustered index should be used instead.
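
A plausible remediation, using the ALTER TABLE ... ADD/DROP INDEX syntax shown earlier in this article, is the following sketch.

SQL

-- Replace the duplicate-heavy hash index with a nonclustered index.
ALTER TABLE SalesOrder_Mem DROP INDEX ix_StatusCode;
GO
ALTER TABLE SalesOrder_Mem ADD INDEX ix_StatusCode NONCLUSTERED (StatusCode);
GO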

ix_OrderSequence:

  • 0% of the buckets are empty, which is too low.
  • The average chain length is 8, even though all values in this index are unique.
    • Therefore the bucket count should be increased, to reduce the average chain length closer to 2 or 3.
  • Because the index key has 262144 unique values, the bucket count should be at least 262144.
    • If future growth is expected, the bucket count should be higher.
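
Using the ALTER INDEX ... REBUILD syntax shown earlier in this article, a rebuild such as the following would reduce the average chain length. The new bucket count is an assumption that covers the 262,144 unique values and leaves room for doubling.

SQL

ALTER TABLE SalesOrder_Mem
   ALTER INDEX ix_OrderSequence
   REBUILD WITH (BUCKET_COUNT = 524288);
GO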

Primary key index (PK_SalesOrd_…):

  • 36% of the buckets are empty, which is good.
  • The average chain length is 1, which is also good. No change is needed.

Balancing the trade-off

OLTP workloads focus on individual rows. Full table scans are not usually in the performance-critical path for OLTP workloads. Therefore, the trade-off to balance is memory utilization versus the performance of equality tests and insert operations.
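
To make the memory side of the trade-off concrete: each hash bucket is an 8-byte pointer, so the bucket array alone for the 33,554,432-bucket example earlier in this article costs about 256 MB. This back-of-the-envelope sketch ignores the memory used by the rows and chains themselves.

SQL

-- Back-of-the-envelope: memory for the bucket array alone, at 8 bytes per bucket.
SELECT CAST(33554432 AS BIGINT) * 8 / 1048576 AS bucket_array_megabytes;  -- 256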