Always On Availability Group on Linux

This article describes how to create a SQL Server Always On Availability Group (AG) for high availability on Linux. There are two configuration types for AGs. A high availability configuration uses a cluster manager to provide business continuity. This configuration can also include read-scale replicas. This article explains how to create the AG for high availability.

You can also create an AG without a cluster manager for read-scale. The AG for read scale only provides read-only replicas for performance scale-out. It does not provide high availability.

Configurations that guarantee high availability and data protection require either two or three synchronous commit replicas. With three synchronous replicas, the AG can automatically recover even if one server is not available.

All servers must be either physical or virtual, and virtual servers must be on the same virtualization platform. This requirement is because the fencing agents are platform specific.

Roadmap

The steps to create an AG on Linux servers for high availability are different from the steps on a Windows Server failover cluster. The following list describes the high-level steps:

  1. Configure SQL Server on three cluster servers.
  2. Create the AG. This article covers this step.
  3. Configure a cluster resource manager, like Pacemaker.

Production environments require a fencing agent, like STONITH, for high availability. The demonstrations in this documentation do not use fencing agents. The demonstrations are for testing and validation only.

A Linux cluster uses fencing to return the cluster to a known state. The way to configure fencing depends on the distribution and the environment. Currently, fencing is not available in some cloud environments.

  4. Add the AG as a resource in the cluster.

Prerequisites

Before you create the availability group, you need to:

  • Set up your environment so that all the servers that will host availability replicas can communicate.
  • Install SQL Server.

  1. Set the computer name. To set the computer name, edit /etc/hostname. The following command lets you edit /etc/hostname with vi:

Bash

sudo vi /etc/hostname

  2. Configure the hosts file.

The hosts file on every server contains the IP addresses and names of all servers that will participate in the availability group.

The following command returns the IP address of the current server:

Bash

sudo ip addr show

Update /etc/hosts. The following script lets you edit /etc/hosts with vi:

Bash

sudo vi /etc/hosts

The following example shows /etc/hosts on node1 with additions for node1, node2, and node3. In this document, node1 refers to the server that hosts the primary replica, and node2 and node3 refer to the servers that host the secondary replicas.

127.0.0.1   localhost localhost4 localhost4.localdomain4

::1       localhost localhost6 localhost6.localdomain6

10.128.18.12 node1

10.128.16.77 node2

10.128.15.33 node3
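After you update /etc/hosts on every node, it's worth confirming that each node has an entry before continuing. The following sketch checks a hosts file for the three node names from the example above; the temporary file is a stand-in for /etc/hosts.

```shell
# Minimal sketch: confirm a hosts file has an entry for every cluster node.
# The temp file stands in for /etc/hosts; the node names match the example above.
HOSTS_FILE=$(mktemp)
cat > "$HOSTS_FILE" <<'EOF'
127.0.0.1    localhost
10.128.18.12 node1
10.128.16.77 node2
10.128.15.33 node3
EOF

missing=0
for n in node1 node2 node3; do
    # -w matches whole words, so "node1" does not match "node10"
    grep -qw "$n" "$HOSTS_FILE" || { echo "missing entry: $n"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all node entries present"
rm -f "$HOSTS_FILE"
```

On a real server, point HOSTS_FILE at /etc/hosts instead of the temporary file.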

Enable AlwaysOn availability groups and restart mssql-server

Enable AlwaysOn availability groups on each node that hosts a SQL Server instance. Then restart mssql-server. Run the following script:

Bash

sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1

sudo systemctl restart mssql-server

Enable an AlwaysOn_health event session

You can optionally enable AlwaysOn availability groups extended events to help with root-cause diagnosis when you troubleshoot an availability group. Run the following command on each instance of SQL Server:

SQL

ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE = ON);

GO

For more information about this XE session, see Always On extended events.

Create a certificate

The SQL Server service on Linux uses certificates to authenticate communication between the mirroring endpoints.

The following Transact-SQL script creates a master key and a certificate. It then backs up the certificate and secures the file with a private key. Update the script with strong passwords. Connect to the primary SQL Server instance. To create the certificate, run the following Transact-SQL script:

SQL

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Master_Key_Password>';

CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';

BACKUP CERTIFICATE dbm_certificate
   TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
   WITH PRIVATE KEY (
           FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
           ENCRYPTION BY PASSWORD = '<Private_Key_Password>'
       );

At this point, your primary SQL Server replica has a certificate at /var/opt/mssql/data/dbm_certificate.cer and a private key at /var/opt/mssql/data/dbm_certificate.pvk. Copy these two files to the same location on all servers that will host availability replicas. Use the mssql user, or give the mssql user permission to access these files.

For example, on the source server, the following command copies the files to the target machine. Replace <node2> with the name of the server that hosts the secondary replica.

Bash

cd /var/opt/mssql/data

scp dbm_certificate.* root@<node2>:/var/opt/mssql/data/

On each target server, give permission to the mssql user to access the certificate.

Bash

cd /var/opt/mssql/data

chown mssql:mssql dbm_certificate.*
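Before you continue, you can confirm that the ownership change took effect. The following sketch is a hypothetical helper demonstrated against a temporary file owned by the current user; on a replica, you would run it against /var/opt/mssql/data/dbm_certificate.* and expect mssql:mssql.

```shell
# Hypothetical check: report a file's owner and group (GNU stat, as on RHEL).
# Demonstrated on a temp file owned by the current user; on a replica you
# would check /var/opt/mssql/data/dbm_certificate.* and expect mssql:mssql.
owner_of() { stat -c '%U:%G' "$1"; }

f=$(mktemp)
result=$(owner_of "$f")
expected="$(id -un):$(id -gn)"
[ "$result" = "$expected" ] && echo "ownership as expected: $result"
rm -f "$f"
```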

Create the certificate on secondary servers

The following Transact-SQL script creates a master key and a certificate from the backup that you created on the primary SQL Server replica. Update the script with strong passwords. The decryption password is the same password that you used to create the .pvk file in a previous step. To create the certificate, run the following script on all secondary servers:

SQL

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Master_Key_Password>';

CREATE CERTIFICATE dbm_certificate
    FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
    WITH PRIVATE KEY (
        FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
        DECRYPTION BY PASSWORD = '<Private_Key_Password>'
    );

Create the database mirroring endpoints on all replicas

Database mirroring endpoints use the Transmission Control Protocol (TCP) to send and receive messages between the server instances that participate in database mirroring sessions or host availability replicas. The database mirroring endpoint listens on a unique TCP port number.

The following Transact-SQL script creates a listening endpoint named Hadr_endpoint for the availability group. It starts the endpoint and gives connection permission to the certificate that you created. Before you run the script, replace the values between < ... >. Optionally, you can include an IP address with LISTENER_IP = (0.0.0.0). The listener IP address must be an IPv4 address; 0.0.0.0 listens on all addresses.

Update the following Transact-SQL script for your environment on all SQL Server instances:

SQL

CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_PORT = <5022>)
    FOR DATABASE_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = CERTIFICATE dbm_certificate,
        ENCRYPTION = REQUIRED ALGORITHM AES
    );

ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;

If you use SQL Server Express Edition on one node to host a configuration-only replica, the only valid value for ROLE is WITNESS. Run the following script on SQL Server Express Edition:

SQL

CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_PORT = <5022>)
    FOR DATABASE_MIRRORING (
        ROLE = WITNESS,
        AUTHENTICATION = CERTIFICATE dbm_certificate,
        ENCRYPTION = REQUIRED ALGORITHM AES
    );

ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;

The listener port must be open on the firewall.
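For example, if the server uses firewalld, you might open the default endpoint port like this (a sketch assuming port 5022 and the public zone; adjust both for your environment):

```shell
sudo firewall-cmd --zone=public --add-port=5022/tcp --permanent
sudo firewall-cmd --reload
```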

Create the AG

The examples in this section explain how to create the availability group by using Transact-SQL. You can also use the SQL Server Management Studio Availability Group Wizard. When you create an AG with the wizard, the wizard returns an error when you join the replicas to the AG. To fix the error, grant ALTER, CONTROL, and VIEW DEFINITION permissions on the AG to the Pacemaker login on all replicas. After permissions are granted on the primary replica, join the nodes to the AG through the wizard. For HA to function properly, grant the permissions on all replicas.

Create the AG for high availability on Linux. Use the CREATE AVAILABILITY GROUP statement with CLUSTER_TYPE = EXTERNAL.

  • CLUSTER_TYPE = EXTERNAL specifies that an external cluster entity manages the AG. Pacemaker is an example of an external cluster entity.
  • When the AG cluster type is external, set FAILOVER_MODE = EXTERNAL for the primary and secondary replicas. This setting specifies that the replica interacts with an external cluster manager, like Pacemaker.

The following Transact-SQL scripts create an AG for high availability named ag1. The script configures the AG replicas with SEEDING_MODE = AUTOMATIC. This setting causes SQL Server to automatically create the database on each secondary server. Update the following script for your environment. Replace the <node1>, <node2>, and <node3> values with the names of the SQL Server instances that host the replicas. Replace <5022> with the port that you set for the database mirroring endpoint. To create the AG, run the following Transact-SQL on the SQL Server instance that hosts the primary replica.

  • Create AG with three synchronous replicas

SQL

CREATE AVAILABILITY GROUP [ag1]
    WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
    FOR REPLICA ON
        N'<node1>'
        WITH (
            ENDPOINT_URL = N'tcp://<node1>:<5022>',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
        ),
        N'<node2>'
        WITH (
            ENDPOINT_URL = N'tcp://<node2>:<5022>',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
        ),
        N'<node3>'
        WITH (
            ENDPOINT_URL = N'tcp://<node3>:<5022>',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
        );

ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;

  • Create AG with two synchronous replicas and a configuration replica:

SQL

CREATE AVAILABILITY GROUP [ag1]
    WITH (CLUSTER_TYPE = EXTERNAL)
    FOR REPLICA ON
        N'<node1>' WITH (
            ENDPOINT_URL = N'tcp://<node1>:<5022>',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
        ),
        N'<node2>' WITH (
            ENDPOINT_URL = N'tcp://<node2>:<5022>',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
        ),
        N'<node3>' WITH (
            ENDPOINT_URL = N'tcp://<node3>:<5022>',
            AVAILABILITY_MODE = CONFIGURATION_ONLY
        );

ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;

  • Create AG with two synchronous replicas:

Include two replicas with synchronous availability mode. For example, the following script creates an AG called ag1. node1 and node2 host replicas in synchronous mode, with automatic seeding and automatic failover.

SQL

CREATE AVAILABILITY GROUP [ag1]
    WITH (CLUSTER_TYPE = EXTERNAL)
    FOR REPLICA ON
        N'node1' WITH (
            ENDPOINT_URL = N'tcp://node1:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
        ),
        N'node2' WITH (
            ENDPOINT_URL = N'tcp://node2:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC
        );

ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;

You can also configure an AG with CLUSTER_TYPE=EXTERNAL using SQL Server Management Studio or PowerShell.

Join secondary replicas to the AG

The Pacemaker user requires ALTER, CONTROL, and VIEW DEFINITION permissions on the availability group on all replicas. To grant permissions, run the following Transact-SQL script on the primary replica after the availability group is created, and on each secondary replica immediately after it is added to the availability group. Before you run the script, replace <pacemakerLogin> with the name of the Pacemaker user account. If you do not have a login for Pacemaker, create a SQL Server login for Pacemaker.

SQL

GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::ag1 TO <pacemakerLogin>;

GRANT VIEW SERVER STATE TO <pacemakerLogin>;

The following Transact-SQL script joins a SQL Server instance to an AG named ag1. Update the script for your environment. On each SQL Server instance that hosts a secondary replica, run the following Transact-SQL to join the AG.

SQL

ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);

ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;

Add a database to the availability group

Ensure that the database you add to the availability group is in full recovery mode and has a valid log backup. If this is a test database or a newly created database, take a database backup. On the primary SQL Server, run the following Transact-SQL script to create and back up a database called db1:

SQL

CREATE DATABASE [db1];

ALTER DATABASE [db1] SET RECOVERY FULL;

BACKUP DATABASE [db1]

   TO DISK = N'/var/opt/mssql/data/db1.bak';

On the primary SQL Server replica, run the following Transact-SQL script to add a database called db1 to an availability group called ag1:

SQL

ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];

Verify that the database is created on the secondary servers

On each secondary SQL Server replica, run the following query to see if the db1 database was created and is synchronized:

SQL

SELECT * FROM sys.databases WHERE name = 'db1';

GO

SELECT DB_NAME(database_id) AS 'database', synchronization_state_desc FROM sys.dm_hadr_database_replica_states;

Configure RHEL Cluster for SQL Server Availability Group

This section explains how to create a three-node availability group cluster for SQL Server on Red Hat Enterprise Linux. The clustering layer is based on the Red Hat Enterprise Linux (RHEL) High Availability Add-On, which is built on top of Pacemaker.

The following sections walk through the steps to set up a Pacemaker cluster and add an availability group as a resource in the cluster for high availability.

Roadmap

The steps to create an availability group on Linux servers for high availability are different from the steps on a Windows Server failover cluster. The following list describes the high-level steps:

  1. Configure SQL Server on the cluster nodes.
  2. Create the availability group.
  3. Configure a cluster resource manager, like Pacemaker. These instructions are in this document.

The way to configure a cluster resource manager depends on the specific Linux distribution.

  4. Add the availability group as a resource in the cluster.

Configure high availability for RHEL

To configure high availability for RHEL, enable the high availability subscription and then configure Pacemaker.

Enable the high availability subscription for RHEL

Each node in the cluster must have an appropriate subscription for RHEL and the High Availability Add-On. Follow these steps to configure the subscription and repositories:

  1. Register the system.

Bash

sudo subscription-manager register

Provide your user name and password.

  2. List the available pools for registration.

Bash

sudo subscription-manager list --available

From the list of available pools, note the pool ID for the high availability subscription.

  3. Update the following script. Replace <pool id> with the pool ID for high availability from the preceding step. Run the script to attach the subscription.

Bash

sudo subscription-manager attach --pool=<pool id>

  4. Enable the repository.

RHEL 7

Bash

sudo subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms

RHEL 8

Bash

sudo subscription-manager repos --enable=rhel-8-for-x86_64-highavailability-rpms

Configure Pacemaker

After you register the subscription, complete the following steps to configure Pacemaker:

  1. On all cluster nodes, open the Pacemaker firewall ports. To open these ports with firewalld, run the following command:

Bash

sudo firewall-cmd --permanent --add-service=high-availability

sudo firewall-cmd --reload

If the firewall doesn't have a built-in high-availability configuration, open the following ports for Pacemaker:

  • TCP: ports 2224, 3121, and 21064
  • UDP: port 5405

  2. Install Pacemaker packages on all nodes.

Bash

sudo yum install pacemaker pcs fence-agents-all resource-agents

  3. Set the password for the default user that is created when you install the Pacemaker and Corosync packages. Use the same password on all nodes.

Bash

sudo passwd hacluster

  4. To allow nodes to rejoin the cluster after a reboot, enable and start the pcsd service and Pacemaker. Run the following commands on all nodes.

Bash

sudo systemctl enable pcsd

sudo systemctl start pcsd

sudo systemctl enable pacemaker

  5. Create the cluster. To create the cluster, run the following commands:

RHEL 7

Bash

sudo pcs cluster auth <node1> <node2> <node3> -u hacluster -p <password for hacluster>

sudo pcs cluster setup --name <clusterName> <node1> <node2> <node3>

sudo pcs cluster start --all

sudo pcs cluster enable --all

RHEL 8

For RHEL 8, you need to authenticate the nodes separately. Manually enter the username and password for hacluster when prompted.

Bash

sudo pcs host auth <node1> <node2> <node3>

sudo pcs cluster setup <clusterName> <node1> <node2> <node3>

sudo pcs cluster start --all

sudo pcs cluster enable --all

  6. Install the SQL Server resource agent for SQL Server. Run the following command on all nodes.

Bash

sudo yum install mssql-server-ha

After Pacemaker is configured, use pcs to interact with the cluster. Run all commands on one node of the cluster.
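For example, to review the state of the cluster nodes and resources at any point, run pcs status on one node:

```shell
sudo pcs status
```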

Configure fencing (STONITH)

Pacemaker cluster vendors require STONITH to be enabled and a fencing device configured for a supported cluster setup. STONITH stands for “shoot the other node in the head.” When the cluster resource manager cannot determine the state of a node or of a resource on a node, fencing brings the cluster to a known state again.

Resource-level fencing ensures that there is no data corruption during an outage by configuring a resource. For example, you can use resource-level fencing to mark the disk on a node as outdated when the communication link goes down.

Node-level fencing ensures that a node does not run any resources. It does this by resetting the node. Pacemaker supports a wide variety of fencing devices. Examples include an uninterruptible power supply or management interface cards for servers.

Because the node-level fencing configuration depends heavily on your environment, disable it for this tutorial (you can configure it later). The following script disables node-level fencing:

Bash

sudo pcs property set stonith-enabled=false

Disabling STONITH is for testing purposes only. If you plan to use Pacemaker in a production environment, plan a STONITH implementation for your environment and keep it enabled.

Set cluster property cluster-recheck-interval

cluster-recheck-interval indicates the polling interval at which the cluster checks for changes in the resource parameters, constraints, or other cluster options. If a replica goes down, the cluster tries to restart the replica at an interval that is bound by the failure-timeout value and the cluster-recheck-interval value. For example, if failure-timeout is set to 60 seconds and cluster-recheck-interval is set to 120 seconds, a restart is tried at an interval that is greater than 60 seconds but less than 120 seconds. We recommend that you set failure-timeout to 60s and cluster-recheck-interval to a value that is greater than 60 seconds. Setting cluster-recheck-interval to a small value is not recommended.

To update the cluster-recheck-interval property value to 2 minutes, run:

Bash

sudo pcs property set cluster-recheck-interval=2min

To update the start-failure-is-fatal property value to true, run:

Bash

sudo pcs property set start-failure-is-fatal=true

To update the ag_cluster resource property failure-timeout to 60s, run:

Bash

sudo pcs resource update ag_cluster meta failure-timeout=60s

Create a SQL Server login for Pacemaker

  1. On all SQL Server instances, create a server login for Pacemaker. The following Transact-SQL creates a login:

Transact-SQL

USE [master]

GO

CREATE LOGIN [pacemakerLogin] WITH PASSWORD = N'<Pacemaker_Login_Password>'

ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]

At the time of availability group creation, the Pacemaker user requires ALTER, CONTROL, and VIEW DEFINITION permissions on the availability group, after it's created but before any nodes are added to it.

  2. On all SQL Server instances, save the credentials for the SQL Server login.

Bash

echo 'pacemakerLogin' >> ~/pacemaker-passwd

echo '<Pacemaker_Login_Password>' >> ~/pacemaker-passwd

sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd

sudo chown root:root /var/opt/mssql/secrets/passwd

sudo chmod 400 /var/opt/mssql/secrets/passwd # Only readable by root
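As a quick sanity check, you can confirm the resulting file mode is 400. The following sketch demonstrates the check against a temporary file; on a node, you would inspect /var/opt/mssql/secrets/passwd (which itself requires root to read).

```shell
# Sketch: verify a credentials file is locked down to mode 400 (GNU stat).
# Demonstrated on a temp file; on a node, check /var/opt/mssql/secrets/passwd.
f=$(mktemp)
chmod 400 "$f"
mode=$(stat -c '%a' "$f")
[ "$mode" = "400" ] && echo "permissions ok: $mode"
rm -f "$f"
```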

Create availability group resource

To create the availability group resource, use the pcs resource create command and set the resource properties. The following command creates an ocf:mssql:ag master/subordinate type resource for the availability group named ag1.

RHEL 7

Bash

sudo pcs resource create ag_cluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=60s master notify=true

RHEL 8

With RHEL 8, the create syntax has changed. In RHEL 8, the term master has changed to promotable. Use the following create command instead of the preceding command:

Bash

sudo pcs resource create ag_cluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=60s promotable notify=true

 Note

When you create the resource, and periodically afterwards, the Pacemaker resource agent automatically sets the value of REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT on the availability group based on the availability group’s configuration. For example, if the availability group has three synchronous replicas, the agent will set REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT to 1.

Create virtual IP resource

To create the virtual IP address resource, run the following command on one node. Replace <10.128.16.240> with an available static IP address from your network.

Bash

sudo pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=<10.128.16.240>

There is no virtual server name equivalent in Pacemaker. To use a connection string that points to a server name instead of an IP address, register the virtual IP resource address and the desired virtual server name in DNS. For disaster recovery configurations, register the desired virtual server name and IP address with the DNS servers on both the primary and DR sites.

Add colocation constraint

Almost every decision in a Pacemaker cluster, like choosing where a resource should run, is done by comparing scores. Scores are calculated per resource. The cluster resource manager chooses the node with the highest score for a particular resource. If a node has a negative score for a resource, the resource cannot run on that node.

On a Pacemaker cluster, you can manipulate the decisions of the cluster with constraints. Constraints have a score. If a constraint has a score lower than INFINITY, Pacemaker regards it as a recommendation. A score of INFINITY means the constraint is mandatory.

To ensure that the primary replica and the virtual IP resources run on the same host, define a colocation constraint with a score of INFINITY. To add the colocation constraint, run the following command on one node.

RHEL 7

When you create the ag_cluster resource in RHEL 7, it creates the resource as ag_cluster-master. Use the following command for RHEL 7:

Bash

sudo pcs constraint colocation add virtualip ag_cluster-master INFINITY with-rsc-role=Master

RHEL 8

When you create the ag_cluster resource in RHEL 8, it creates the resource as ag_cluster-clone. Use the following command for RHEL 8:

Bash

sudo pcs constraint colocation add virtualip with master ag_cluster-clone INFINITY with-rsc-role=Master

Add ordering constraint

The colocation constraint has an implicit ordering constraint. It moves the virtual IP resource before it moves the availability group resource. By default, the sequence of events is:

  1. User issues pcs resource move to the availability group primary from node1 to node2.
  2. The virtual IP resource stops on node 1.
  3. The virtual IP resource starts on node 2.
  4. The availability group primary on node 1 is demoted to secondary.
  5. The availability group secondary on node 2 is promoted to primary.

To prevent the IP address from temporarily pointing to the node with the pre-failover secondary, add an ordering constraint.

To add an ordering constraint, run the following command on one node:

RHEL 7

Bash

sudo pcs constraint order promote ag_cluster-master then start virtualip

RHEL 8

Bash

sudo pcs constraint order promote ag_cluster-clone then start virtualip

Manually fail over the availability group with pcs. Do not initiate a failover with Transact-SQL. For instructions, see Failover.
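For example, on RHEL 7 a manual failover to another node looks like the following sketch. <targetNode> is a placeholder for the node that should host the primary replica; on RHEL 8, use the ag_cluster-clone resource name instead. Verify the exact pcs resource move options for your pcs version.

```shell
sudo pcs resource move ag_cluster-master <targetNode> --master
```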