Application Pattern for Partitioning Memory-Optimized Tables
In-Memory OLTP supports an application design pattern that lavishes performance resources on relatively current data. This pattern can apply when current data is read or updated far more frequently than older data is. In this case, we say the current data is active or hot, and the older data is cold.
The main idea is to store hot data in a memory-optimized table. On a weekly or monthly basis, older data that has become cold is moved to a partitioned table. The partitioned table stores its data on disk rather than in memory.
Typically, this design uses a datetime key to enable the move process to efficiently distinguish between hot versus cold data.
Advanced partitioning
The design intends to mimic having a partitioned table that also has one memory-optimized partition. For this design to work, you must ensure that the tables all share a common schema. The code sample later in this article shows the technique.
New data is presumed to be hot by definition. Hot data is inserted and updated in the memory-optimized table. Cold data is maintained in the traditional partitioned table. Periodically, a stored procedure adds a new partition. The partition contains the latest cold data that has been moved out of the memory-optimized table.
If an operation needs only hot data, it can use natively compiled stored procedures to access the data. Operations that might access hot or cold data must use interpreted Transact-SQL, to join the memory-optimized table with the partitioned table.
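For illustration, here is a hedged sketch of both access paths, using the dbo.SalesOrders_hot table and dbo.SalesOrders view that the code sample later in this article creates; the procedure name usp_GetRecentHotOrders is hypothetical.
SQL
-- Hot-only reads can use a natively compiled procedure (the name is hypothetical).
CREATE PROCEDURE dbo.usp_GetRecentHotOrders @since datetime2
    WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC
    WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT so_id, cust_id, so_date, so_total
        FROM dbo.SalesOrders_hot
        WHERE so_date >= @since;
END;
GO

-- Reads that can span hot and cold data must use interpreted T-SQL,
-- here through the view that unions the two tables.
SELECT so_id, cust_id, so_date, so_total
    FROM dbo.SalesOrders
    WHERE cust_id = 1;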
Add a partition
Data that has recently become cold must be moved into the partitioned table. The steps for this periodic partition swap are as follows:
- For the data in the memory-optimized table, determine the datetime that is the boundary or cutoff between hot versus newly cold data.
- Insert the newly cold data, from the In-Memory OLTP table, into a cold_staging table.
- Delete the same cold data from the memory-optimized table.
- Swap the cold_staging table into a partition.
- Add the partition.
Maintenance window
One of the preceding steps is to delete the newly cold data from the memory-optimized table. There is a time interval between this deletion and the final step that adds the new partition. During this interval, any application that attempts to read the newly cold data will fail.
Code Sample
The following Transact-SQL sample is displayed as a series of smaller code blocks, only for ease of presentation. You can append them into one large code block for testing.
As a whole, the T-SQL sample shows how to use a memory-optimized table with a partitioned disk-based table.
The first phases of the T-SQL sample create the database, and then create objects such as tables in the database. Later phases show how to move data from a memory-optimized table into a partitioned table.
Create a database
This section of the T-SQL sample creates a test database. The database is configured to support both memory-optimized tables and partitioned tables.
SQL
CREATE DATABASE PartitionSample;
GO

-- Add a FileGroup, enabled for In-Memory OLTP.
-- Change file path as needed.
ALTER DATABASE PartitionSample
    ADD FILEGROUP PartitionSample_mod
    CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE PartitionSample
    ADD FILE (
        NAME = 'PartitionSample_mod',
        FILENAME = 'c:\data\PartitionSample_mod')
    TO FILEGROUP PartitionSample_mod;
GO
Create a memory-optimized table for hot data
This section creates the memory-optimized table that holds the latest data, which is mostly still hot data.
SQL
USE PartitionSample;
GO

-- Create a memory-optimized table for the HOT Sales Order data.
-- Notice the index that uses datetime2.
CREATE TABLE dbo.SalesOrders_hot (
    so_id INT IDENTITY PRIMARY KEY NONCLUSTERED,
    cust_id INT NOT NULL,
    so_date DATETIME2 NOT NULL INDEX ix_date NONCLUSTERED,
    so_total MONEY NOT NULL,
    INDEX ix_date_total NONCLUSTERED (so_date DESC, so_total DESC)
)
WITH (MEMORY_OPTIMIZED = ON);
GO
Create a partitioned table for cold data
This section creates the partitioned table that holds the cold data.
SQL
-- Create a partition and table for the COLD Sales Order data.
-- Notice the index that uses datetime2.
CREATE PARTITION FUNCTION [ByDatePF](datetime2)
    AS RANGE RIGHT FOR VALUES();
GO

CREATE PARTITION SCHEME [ByDateRange]
    AS PARTITION [ByDatePF]
    ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.SalesOrders_cold (
    so_id INT NOT NULL,
    cust_id INT NOT NULL,
    so_date DATETIME2 NOT NULL,
    so_total MONEY NOT NULL,
    CONSTRAINT PK_SalesOrders_cold PRIMARY KEY (so_id, so_date),
    INDEX ix_date_total NONCLUSTERED (so_date DESC, so_total DESC)
)
ON [ByDateRange](so_date);
GO
Create a table to store cold data during move
This section creates the cold_staging table. A view that unions the hot and cold data from the two tables is also created.
SQL
-- A table used to briefly stage the newly cold data, during moves to a partition.
CREATE TABLE dbo.SalesOrders_cold_staging (
    so_id INT NOT NULL,
    cust_id INT NOT NULL,
    so_date DATETIME2 NOT NULL,
    so_total MONEY NOT NULL,
    CONSTRAINT PK_SalesOrders_cold_staging PRIMARY KEY (so_id, so_date),
    INDEX ix_date_total NONCLUSTERED (so_date DESC, so_total DESC),
    CONSTRAINT CHK_SalesOrders_cold_staging CHECK (so_date >= '1900-01-01')
);
GO

-- A view, for retrieving the aggregation of hot plus cold data.
CREATE VIEW dbo.SalesOrders
AS
    SELECT so_id, cust_id, so_date, so_total, 1 AS 'is_hot'
        FROM dbo.SalesOrders_hot
    UNION ALL
    SELECT so_id, cust_id, so_date, so_total, 0 AS 'is_cold'
        FROM dbo.SalesOrders_cold;
GO
Create the stored procedure
This section creates the stored procedure that you run periodically. The procedure moves newly cold data from the memory-optimized table into the partitioned table.
SQL
-- A stored procedure to move all newly cold sales orders data
-- to its staging location.
CREATE PROCEDURE dbo.usp_SalesOrdersOffloadToCold @splitdate datetime2
AS
BEGIN
    BEGIN TRANSACTION;

    -- Insert the cold data as a temporary heap.
    INSERT INTO dbo.SalesOrders_cold_staging WITH (TABLOCKX)
        SELECT so_id, cust_id, so_date, so_total
        FROM dbo.SalesOrders_hot WITH (SERIALIZABLE)
        WHERE so_date <= @splitdate;

    -- Delete the moved data from the hot table.
    DELETE FROM dbo.SalesOrders_hot WITH (SERIALIZABLE)
        WHERE so_date <= @splitdate;

    -- Update the partition function, and switch in the new partition.
    ALTER PARTITION SCHEME [ByDateRange] NEXT USED [PRIMARY];

    DECLARE @p INT = (
        SELECT MAX(partition_number)
            FROM sys.partitions
            WHERE object_id = OBJECT_ID('dbo.SalesOrders_cold'));

    EXEC sp_executesql
        N'ALTER TABLE dbo.SalesOrders_cold_staging
            SWITCH TO dbo.SalesOrders_cold PARTITION @i',
        N'@i int',
        @i = @p;

    ALTER PARTITION FUNCTION [ByDatePF]() SPLIT RANGE(@splitdate);

    -- Modify a constraint on the cold_staging table, to align with the new partition.
    ALTER TABLE dbo.SalesOrders_cold_staging
        DROP CONSTRAINT CHK_SalesOrders_cold_staging;

    DECLARE @s nvarchar(100) = CONVERT(nvarchar(100), @splitdate, 121);
    DECLARE @sql nvarchar(1000) = N'
        ALTER TABLE dbo.SalesOrders_cold_staging
            ADD CONSTRAINT CHK_SalesOrders_cold_staging
                CHECK (so_date > ''' + @s + ''');';
    PRINT @sql;
    EXEC sp_executesql @sql;

    COMMIT;
END;
GO
Prepare sample data, and demo the stored procedure
This section generates and inserts sample data, and then runs the stored procedure as a demonstration.
SQL
-- Insert sample values into the hot table.
INSERT INTO dbo.SalesOrders_hot VALUES (1, SYSDATETIME(), 1);
GO
INSERT INTO dbo.SalesOrders_hot VALUES (1, SYSDATETIME(), 1);
GO
INSERT INTO dbo.SalesOrders_hot VALUES (1, SYSDATETIME(), 1);
GO

-- Verify that the hot data is in the table, by selecting from the view.
SELECT * FROM dbo.SalesOrders;
GO

-- Treat all data in the hot table as cold data:
-- Run the stored procedure, to move (offload) all sales orders to date to cold storage.
DECLARE @t datetime2 = SYSDATETIME();
EXEC dbo.usp_SalesOrdersOffloadToCold @t;

-- Again, read hot plus cold data from the view.
SELECT * FROM dbo.SalesOrders;
GO

-- Retrieve the name of every partition.
SELECT OBJECT_NAME(object_id), *
    FROM sys.dm_db_partition_stats ps
    WHERE object_id = OBJECT_ID('dbo.SalesOrders_cold');

-- Insert more data into the hot table.
INSERT INTO dbo.SalesOrders_hot VALUES (2, SYSDATETIME(), 1);
GO
INSERT INTO dbo.SalesOrders_hot VALUES (2, SYSDATETIME(), 1);
GO
INSERT INTO dbo.SalesOrders_hot VALUES (2, SYSDATETIME(), 1);
GO

-- Read hot plus cold data from the view.
SELECT * FROM dbo.SalesOrders;
GO

-- Again, run the stored procedure, to move all sales orders to date to cold storage.
DECLARE @t datetime2 = SYSDATETIME();
EXEC dbo.usp_SalesOrdersOffloadToCold @t;

-- Read hot plus cold data from the view.
SELECT * FROM dbo.SalesOrders;
GO

-- Again, retrieve the name of every partition.
-- The stored procedure can modify the partitions.
SELECT OBJECT_NAME(object_id), partition_number, row_count
    FROM sys.dm_db_partition_stats ps
    WHERE object_id = OBJECT_ID('dbo.SalesOrders_cold')
        AND index_id = 1;
Drop all the demo objects
Remember to remove the demo database from your test system.
SQL
-- You must first leave the context of the PartitionSample database.
-- USE <A-Database-Name-Here>; GO
DROP DATABASE PartitionSample;
GO
Implementing MERGE Functionality in a Natively Compiled Stored Procedure
The Transact-SQL code sample in this section demonstrates how you can simulate the T-SQL MERGE statement in a natively compiled module. The sample uses a table variable with an identity column, iterates over the rows in the table variable, and for each row performs the update if the condition matches, and an insert if the condition does not match.
Here is the T-SQL MERGE statement that you wish was supported inside a native proc, and that the code sample simulates.
SQL
MERGE INTO dbo.Table1 t
USING @tvp v
ON t.Column1 = v.c1
WHEN MATCHED THEN
UPDATE SET Column2 = v.c2
WHEN NOT MATCHED THEN
INSERT (Column1, Column2) VALUES (v.c1, v.c2);
Here is the T-SQL to achieve the workaround and simulate MERGE.
SQL
DROP PROCEDURE IF EXISTS dbo.usp_merge1;
go
DROP TYPE IF EXISTS dbo.Type1;
go
DROP TABLE IF EXISTS dbo.Table1;
go
-----------------------------
-- target table and table type used for the workaround
-----------------------------
CREATE TABLE dbo.Table1
(
Column1 INT NOT NULL PRIMARY KEY NONCLUSTERED,
Column2 INT NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON);
go
CREATE TYPE dbo.Type1 AS TABLE
(
c1 INT NOT NULL,
c2 INT NOT NULL,
RowID INT NOT NULL IDENTITY(1,1),
INDEX ix_RowID HASH (RowID) WITH (BUCKET_COUNT=1024)
)
WITH (MEMORY_OPTIMIZED = ON);
go
-----------------------------
-- stored procedure implementing the workaround
-----------------------------
CREATE PROCEDURE dbo.usp_merge1
@tvp1 dbo.Type1 READONLY
WITH
NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'us_english')
DECLARE @i INT = 1, @c1 INT, @c2 INT;
WHILE @i > 0
BEGIN
SELECT @c1 = c1, @c2 = c2
FROM @tvp1
WHERE RowID = @i;
-- test whether the row exists in the TVP; if not, we end the loop
IF @@ROWCOUNT=0
SET @i = 0
ELSE
BEGIN
-- try the update
UPDATE dbo.Table1
SET Column2 = @c2
WHERE Column1 = @c1;
-- if there was no row to update, we insert
IF @@ROWCOUNT=0
INSERT INTO dbo.Table1 (Column1, Column2)
VALUES (@c1, @c2);
SET @i += 1
END
END
END
go
-----------------------------
-- test to validate the functionality
-----------------------------
INSERT dbo.Table1 VALUES (1,2);
go
SELECT N'Before-MERGE' AS [Before-MERGE], Column1, Column2
FROM dbo.Table1;
go
DECLARE @tvp1 dbo.Type1;
INSERT @tvp1 (c1, c2) VALUES (1,33), (2,4);
EXECUTE dbo.usp_merge1 @tvp1;
go
SELECT N'After-MERGE' AS [After-MERGE], Column1, Column2
FROM dbo.Table1;
go
Console
/**** Actual output:
Before-MERGE Column1 Column2
Before-MERGE 1 2
After-MERGE Column1 Column2
After-MERGE 1 33
After-MERGE 2 4
****/
SQL Server Management Objects Support for In-Memory OLTP
This topic describes items in SQL Server Management Objects (SMO) that support In-Memory OLTP.
SMO types and members
The following types and members are in the namespace Microsoft.SqlServer.Management.Smo, and they support In-Memory OLTP:
- DurabilityType (enumeration)
- FileGroup.FileGroupType (property)
- FileGroup.FileGroup (constructor)
- FileGroupType (enumeration)
- Index.BucketCount (property)
- IndexType.NonClusteredHashIndex (enumeration member)
- Index.IsMemoryOptimized (property)
- Server.IsXTPSupported (property)
- StoredProcedure.IsNativelyCompiled (property)
- StoredProcedure.IsSchemaBound (property)
- Table.Durability (property)
- Table.IsMemoryOptimized (property)
- UserDefinedTableType.IsMemoryOptimized (property)
C# code example
Assemblies referenced by the compiled code example
- Microsoft.SqlServer.ConnectionInfo.dll
- Microsoft.SqlServer.Management.Sdk.Sfc.dll
- Microsoft.SqlServer.Smo.dll
- Microsoft.SqlServer.SqlEnum.dll
Actions taken in the code example
- Create a database with memory-optimized filegroup and memory-optimized file.
- Create a durable memory-optimized table with a primary key, nonclustered index, and a nonclustered hash index.
- Create columns and indexes.
- Create a user-defined memory-optimized table type.
- Create a natively compiled stored procedure.
Source code
C#
using Microsoft.SqlServer.Management.Smo;
using System;

public class A {
    static void Main(string[] args) {
        Server server = new Server("(local)");

        // Create a database with memory-optimized filegroup and memory-optimized file.
        Database db = new Database(server, "MemoryOptimizedDatabase");
        db.Create();
        FileGroup fg = new FileGroup(db, "memOptFilegroup", FileGroupType.MemoryOptimizedDataFileGroup);
        db.FileGroups.Add(fg);
        fg.Create();

        // Change this path if needed.
        DataFile file = new DataFile(fg, "memOptFile",
            @"C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\MSSQLmemOptFileName");
        file.Create();

        // Create a durable memory-optimized table with primary key, nonclustered index and nonclustered hash index.
        // Define the table as memory optimized and set the durability.
        Table table = new Table(db, "memOptTable");
        table.IsMemoryOptimized = true;
        table.Durability = DurabilityType.SchemaAndData;

        // Create columns.
        Column col1 = new Column(table, "col1", DataType.Int);
        col1.Nullable = false;
        table.Columns.Add(col1);
        Column col2 = new Column(table, "col2", DataType.Float);
        col2.Nullable = false;
        table.Columns.Add(col2);
        Column col3 = new Column(table, "col3", DataType.Decimal(2, 10));
        col3.Nullable = false;
        table.Columns.Add(col3);

        // Create indexes.
        Index pk = new Index(table, "PK_memOptTable");
        pk.IndexType = IndexType.NonClusteredIndex;
        pk.IndexKeyType = IndexKeyType.DriPrimaryKey;
        pk.IndexedColumns.Add(new IndexedColumn(pk, col1.Name));
        table.Indexes.Add(pk);

        Index ixNonClustered = new Index(table, "ix_nonClustered");
        ixNonClustered.IndexType = IndexType.NonClusteredIndex;
        ixNonClustered.IndexKeyType = IndexKeyType.None;
        ixNonClustered.IndexedColumns.Add(new IndexedColumn(ixNonClustered, col2.Name));
        table.Indexes.Add(ixNonClustered);

        Index ixNonClusteredHash = new Index(table, "ix_nonClustered_Hash");
        ixNonClusteredHash.IndexType = IndexType.NonClusteredHashIndex;
        ixNonClusteredHash.IndexKeyType = IndexKeyType.None;
        ixNonClusteredHash.BucketCount = 1024;
        ixNonClusteredHash.IndexedColumns.Add(new IndexedColumn(ixNonClusteredHash, col3.Name));
        table.Indexes.Add(ixNonClusteredHash);

        table.Create();

        // Create a user-defined memory-optimized table type.
        UserDefinedTableType uDTT = new UserDefinedTableType(db, "memOptUDTT");
        uDTT.IsMemoryOptimized = true;

        // Add columns.
        Column udTTCol1 = new Column(uDTT, "udtCol1", DataType.Int);
        udTTCol1.Nullable = false;
        uDTT.Columns.Add(udTTCol1);
        Column udTTCol2 = new Column(uDTT, "udtCol2", DataType.Float);
        udTTCol2.Nullable = false;
        uDTT.Columns.Add(udTTCol2);
        Column udTTCol3 = new Column(uDTT, "udtCol3", DataType.Decimal(2, 10));
        udTTCol3.Nullable = false;
        uDTT.Columns.Add(udTTCol3);

        // Add index.
        Index ix = new Index(uDTT, "IX_UDT");
        ix.IndexType = IndexType.NonClusteredHashIndex;
        ix.BucketCount = 1024;
        ix.IndexKeyType = IndexKeyType.DriPrimaryKey;
        ix.IndexedColumns.Add(new IndexedColumn(ix, udTTCol1.Name));
        uDTT.Indexes.Add(ix);
        uDTT.Create();

        // Create a natively compiled stored procedure.
        StoredProcedure sProc = new StoredProcedure(db, "nCSProc");
        sProc.TextMode = false;
        sProc.TextBody = "--Type body here";
        sProc.IsNativelyCompiled = true;
        sProc.IsSchemaBound = true;
        sProc.ExecutionContext = ExecutionContext.Owner;
        sProc.Create();
    }
}
Altering Memory-Optimized Tables
Schema and index changes on memory-optimized tables can be performed by using the ALTER TABLE statement. In SQL Server 2016 and Azure SQL Database, ALTER TABLE operations on memory-optimized tables are OFFLINE, meaning that the table is not available for querying while the operation is in progress. The database application can continue to run, but any operation that accesses the table is blocked until the alteration completes. It is possible to combine multiple ADD, DROP, or ALTER operations in a single ALTER TABLE statement.
The ALTER TABLE syntax is used for making changes to the table schema, as well as for adding, deleting, and rebuilding indexes. Indexes are considered part of the table definition:
- The syntax ALTER TABLE … ADD/DROP/ALTER INDEX is supported only for memory-optimized tables.
- Outside of an ALTER TABLE statement, the standalone statements CREATE INDEX, DROP INDEX, and ALTER INDEX, as well as the PAD_INDEX option, are not supported for indexes on memory-optimized tables.
The following types of alterations are supported:
- Changing the bucket count
- Adding and removing an index
- Changing, adding and removing a column
- Adding and removing a constraint
Natively compiled stored procedures are required to be schema-bound, meaning they have a schema-bound dependency on the memory optimized tables they access, and the columns they reference. A schema-bound dependency is a relationship between two entities that prevents the referenced entity from being dropped or incompatibly modified as long as the referencing entity exists.
For example, if a schema-bound natively compiled stored procedure references a column c1 from table mytable, column c1 cannot be dropped. Similarly, if there is such a procedure with an INSERT statement without column list (e.g., INSERT INTO dbo.mytable VALUES (…)), then no column in the table can be dropped.
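As a minimal sketch of this dependency (the object names follow the hypothetical example above, and the database is assumed to have a MEMORY_OPTIMIZED_DATA filegroup), the final statement fails while the procedure exists:
SQL
CREATE TABLE dbo.mytable (
    id int NOT NULL PRIMARY KEY NONCLUSTERED,
    c1 int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON);
GO

CREATE PROCEDURE dbo.usp_ReadMytable
    WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC
    WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT id, c1 FROM dbo.mytable;
END;
GO

-- Fails while dbo.usp_ReadMytable exists, because the procedure
-- is schema-bound to column c1.
ALTER TABLE dbo.mytable DROP COLUMN c1;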
Logging of ALTER TABLE on memory-optimized tables
On a memory-optimized table, most ALTER TABLE scenarios now run in parallel and result in an optimization of writes to the transaction log. The optimization is achieved by only logging the metadata changes to the transaction log. However, the following ALTER TABLE operations run single-threaded and are not log-optimized.
The single-threaded operation in this case would log the entire content of the altered table to the transaction log. A list of single-threaded operations follows:
- Alter or add a column to use a large object (LOB) type: nvarchar(max), varchar(max), or varbinary(max). A sketch follows this list.
- Add or drop a COLUMNSTORE index.
- Almost anything that affects an off-row column:
  - Causing an on-row column to move off-row.
  - Causing an off-row column to move on-row.
  - Creating a new off-row column.
  - Exception: Lengthening an already off-row column is logged in the optimized way.
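For instance, the following hedged sketch adds a LOB column to the Sales.SalesOrderDetail_inmem table used in the examples below; the column name Notes is hypothetical. Per the preceding list, this ALTER runs single-threaded and logs the entire table content.
SQL
-- Hypothetical LOB column; this ALTER runs single-threaded and logs
-- the entire content of the altered table.
ALTER TABLE Sales.SalesOrderDetail_inmem
    ADD Notes nvarchar(max) NULL;
GO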
The following example alters the bucket count of an existing hash index. This rebuilds the hash index with the new bucket count while other properties of the hash index remain the same.
SQL
ALTER TABLE Sales.SalesOrderDetail_inmem
ALTER INDEX imPK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID
REBUILD WITH (BUCKET_COUNT=67108864);
GO
The following example adds a column with a NOT NULL constraint and with a DEFAULT definition, and uses WITH VALUES to provide values for each existing row in the table. If the column were nullable and WITH VALUES were not used, each existing row would have NULL in the new column.
SQL
ALTER TABLE Sales.SalesOrderDetail_inmem
ADD Comment NVARCHAR(100) NOT NULL DEFAULT N'' WITH VALUES;
GO
The following example adds a primary key constraint to an existing column.
SQL
CREATE TABLE dbo.UserSession (
SessionId int not null,
UserId int not null,
CreatedDate datetime2 not null,
ShoppingCartId int,
index ix_UserId nonclustered hash (UserId) with (bucket_count=400000)
)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY) ;
GO
ALTER TABLE dbo.UserSession
ADD CONSTRAINT PK_UserSession PRIMARY KEY NONCLUSTERED (SessionId);
GO
The following example removes an index.
SQL
ALTER TABLE Sales.SalesOrderDetail_inmem
DROP INDEX ix_ModifiedDate;
GO
The following example adds an index.
SQL
ALTER TABLE Sales.SalesOrderDetail_inmem
ADD INDEX ix_ModifiedDate (ModifiedDate);
GO
The following example adds multiple columns, with an index and constraints.
SQL
ALTER TABLE Sales.SalesOrderDetail_inmem
ADD CustomerID int NOT NULL DEFAULT -1 WITH VALUES,
ShipMethodID int NOT NULL DEFAULT -1 WITH VALUES,
INDEX ix_Customer (CustomerID);
GO
Troubleshooting Hash Indexes for Memory-Optimized Tables
When you create a hash index for a memory-optimized table, the number of buckets must be specified at create time. In most cases, the bucket count would ideally be between 1 and 2 times the number of distinct values in the index key.
However, even if the BUCKET_COUNT is moderately below or above the preferred range, the performance of your hash index is likely to be tolerable or acceptable. At minimum, consider giving your hash index a BUCKET_COUNT roughly equal to the number of rows you predict your memory-optimized table will grow to have.
Suppose your growing table has 2,000,000 rows, but the prediction is it will grow 10 times to 20,000,000 rows. Start with a bucket count that is 10 times the number of rows in the table. This gives you room for an increased quantity of rows.
- Ideally you would increase the bucket count when the quantity of rows reaches the initial bucket count.
- Even if the quantity of rows grows to 5 times larger than the bucket count, the performance is still good in most situations.
Suppose a hash index has 10,000,000 distinct key values.
- A bucket count of 2,000,000 would be about as low as you could accept. The degree of performance degradation could be tolerable.
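As a minimal sketch of this sizing guidance (the table and column names are hypothetical), a table currently holding 2,000,000 rows that is predicted to grow to 20,000,000 could be created with a bucket count matching the predicted size:
SQL
-- Hypothetical table. BUCKET_COUNT is sized for the predicted 20,000,000 rows,
-- about 10 times the current 2,000,000-row size.
CREATE TABLE dbo.SalesEvent (
    event_id bigint NOT NULL,
    event_date datetime2 NOT NULL,
    PRIMARY KEY NONCLUSTERED HASH (event_id) WITH (BUCKET_COUNT = 20000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO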
Too many duplicate values in the index?
If the hash indexed values have a high rate of duplicates, the hash buckets suffer longer chains.
Assume you have the same SupportEvent table from the earlier T-SQL syntax code block. The following T-SQL code demonstrates how you can find and display the ratio of all values to unique values:
SQL
-- Calculate ratio of: Rows / Unique_Values.
DECLARE @allValues float(8) = 0.0, @uniqueVals float(8) = 0.0;
SELECT @allValues = Count(*) FROM SupportEvent;
SELECT @uniqueVals = Count(*) FROM
(SELECT DISTINCT SupportEngineerName
FROM SupportEvent) as d;
-- If (All / Unique) >= 10.0, use a nonclustered index, not a hash.
SELECT Cast((@allValues / @uniqueVals) as float) as [All_divby_Unique];
go
- A ratio of 10.0 or higher means a hash would be a poor type of index. Consider using a nonclustered index instead, as sketched below.
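Here is a hedged sketch of that remedy for the SupportEvent table; the name of the existing hash index, ix_hash_SupportEngineerName, is an assumption for illustration:
SQL
-- Replace the hash index with a nonclustered (range) index when the
-- rows-to-unique-values ratio is 10.0 or higher. Index names are hypothetical.
ALTER TABLE SupportEvent
    DROP INDEX ix_hash_SupportEngineerName;
GO
ALTER TABLE SupportEvent
    ADD INDEX ix_SupportEngineerName NONCLUSTERED (SupportEngineerName);
GO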
Troubleshooting hash index bucket count
This section discusses how to troubleshoot the bucket count for your hash index.
Monitor statistics for chains and empty buckets
You can monitor the statistical health of your hash indexes by running the following T-SQL SELECT. The SELECT uses the dynamic management view (DMV) named sys.dm_db_xtp_hash_index_stats.
SQL
SELECT
QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(OBJECT_NAME(h.object_id)) as [table],
i.name as [index],
h.total_bucket_count,
h.empty_bucket_count,
FLOOR((
CAST(h.empty_bucket_count as float) /
h.total_bucket_count) * 100)
as [empty_bucket_percent],
h.avg_chain_length,
h.max_chain_length
FROM
sys.dm_db_xtp_hash_index_stats as h
JOIN sys.indexes as i
ON h.object_id = i.object_id
AND h.index_id = i.index_id
JOIN sys.memory_optimized_tables_internal_attributes ia ON h.xtp_object_id=ia.xtp_object_id
JOIN sys.tables t on h.object_id=t.object_id
WHERE ia.type=1
ORDER BY [table], [index];
Compare the SELECT results to the following statistical guidelines:
- Empty buckets:
- 33% is a good target value, but a larger percentage (even 90%) is usually fine.
- When the bucket count equals the number of distinct key values, approximately 33% of the buckets are empty.
- A value below 10% is too low.
- Chains within buckets:
- An average chain length of 1 is ideal in case there are no duplicate index key values. Chain lengths up to 10 are usually acceptable.
- If the average chain length is greater than 10, and the empty bucket percent is greater than 10%, the data has so many duplicates that a hash index might not be the most appropriate type.
Demonstration of chains and empty buckets
The following T-SQL code block gives you an easy way to test SELECT * FROM sys.dm_db_xtp_hash_index_stats;. The code block completes in about 1 minute. Here are the phases of the following code block:
- Creates a memory-optimized table that has a few hash indexes.
- Populates the table with thousands of rows:
  a. A modulo operator is used to configure the rate of duplicate values in the StatusCode column.
  b. The loop inserts 262,144 rows in approximately 1 minute.
- PRINTs a message asking you to run the earlier SELECT from sys.dm_db_xtp_hash_index_stats.
SQL
DROP TABLE IF EXISTS SalesOrder_Mem;
go
CREATE TABLE SalesOrder_Mem
(
SalesOrderId uniqueidentifier NOT NULL DEFAULT newid(),
OrderSequence int NOT NULL,
OrderDate datetime2(3) NOT NULL,
StatusCode tinyint NOT NULL,
PRIMARY KEY NONCLUSTERED
HASH (SalesOrderId) WITH (BUCKET_COUNT = 262144),
INDEX ix_OrderSequence
HASH (OrderSequence) WITH (BUCKET_COUNT = 20000),
INDEX ix_StatusCode
HASH (StatusCode) WITH (BUCKET_COUNT = 8),
INDEX ix_OrderDate NONCLUSTERED (OrderDate DESC)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
go
--------------------
SET NOCOUNT ON;
-- Same as PK bucket_count. 68 seconds to complete.
DECLARE @i int = 262144;
BEGIN TRANSACTION;
WHILE @i > 0
BEGIN
INSERT SalesOrder_Mem
(OrderSequence, OrderDate, StatusCode)
Values
(@i, GetUtcDate(), @i % 8); -- Modulo technique.
SET @i -= 1;
END
COMMIT TRANSACTION;
PRINT 'Next, you should query: sys.dm_db_xtp_hash_index_stats.';
go
The preceding INSERT loop does the following:
- Inserts unique values for the primary key index, and for ix_OrderSequence.
- Inserts a couple hundred thousand rows that contain only 8 distinct values for StatusCode. Therefore, there is a high rate of value duplication in index ix_StatusCode.
For troubleshooting when the bucket count is not optimal, examine the following output of the SELECT from sys.dm_db_xtp_hash_index_stats. For these results, we added WHERE OBJECT_NAME(h.object_id) = 'SalesOrder_Mem' to the SELECT from the earlier monitoring section.
The SELECT results are displayed below, artificially split into two narrower results tables for better display.
- Here are the results for bucket count.
| IndexName | total_bucket_count | empty_bucket_count | EmptyBucketPercent |
| --- | --- | --- | --- |
| ix_OrderSequence | 32768 | 13 | 0 |
| ix_StatusCode | 8 | 4 | 50 |
| PK_SalesOrd_B14003… | 262144 | 96525 | 36 |
- Next are the results for chain length.
| IndexName | avg_chain_length | max_chain_length |
| --- | --- | --- |
| ix_OrderSequence | 8 | 26 |
| ix_StatusCode | 65536 | 65536 |
| PK_SalesOrd_B14003… | 1 | 8 |
Let us interpret the preceding results tables for the three hash indexes:
ix_StatusCode:
- 50% of the buckets are empty, which is good.
- However, the average chain length is very high at 65536.
- This indicates a high rate of duplicate values.
- Therefore, using a hash index is not appropriate in this case. A nonclustered index should be used instead.
ix_OrderSequence:
- 0% of the buckets are empty, which is too low.
- The average chain length is 8, even though all values in this index are unique.
- Therefore the bucket count should be increased, to reduce the average chain length closer to 2 or 3.
- Because the index key has 262144 unique values, the bucket count should be at least 262144.
- If future growth is expected, the bucket count should be higher. A rebuild sketch follows these interpretations.
Primary key index (PK_SalesOrd_…):
- 36% of the buckets are empty, which is good.
- The average chain length is 1, which is also good. No change is needed.
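To apply the recommendation for ix_OrderSequence, you could rebuild the index with a larger bucket count. The following is a minimal sketch; the value 524288 is an assumption that doubles the 262,144 unique key values to leave room for growth:
SQL
-- Rebuild the hash index with a bucket count of at least the 262,144
-- unique key values; 524288 (an assumed value) leaves room for growth.
ALTER TABLE SalesOrder_Mem
    ALTER INDEX ix_OrderSequence
    REBUILD WITH (BUCKET_COUNT = 524288);
GO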
Balancing the trade-off
OLTP workloads focus on individual rows, and full table scans are not usually in the performance-critical path. Therefore, the trade-off you must balance is between memory utilization and the performance of equality tests and insert operations.