Failover Cluster Instances – SQL Server on Linux

This article describes the concepts related to SQL Server failover cluster instances (FCIs) on Linux.

The Clustering Layer

  • On RHEL, the clustering layer is based on the Red Hat Enterprise Linux (RHEL) HA add-on.
  • On SLES, the clustering layer is based on the SUSE Linux Enterprise High Availability Extension (HAE).

Both the RHEL HA add-on and the SUSE HAE are built on Pacemaker.

As the following diagram shows, storage is presented to two servers. Clustering components (Corosync and Pacemaker) coordinate communications and resource management. One of the servers has the active connection to the storage resources and SQL Server. When Pacemaker detects a failure, the clustering components manage moving the resources to the other node.

At this point, SQL Server’s integration with Pacemaker on Linux is not as tightly coupled as it is with WSFC on Windows. From within SQL Server, there is no knowledge of the cluster's presence; all orchestration is from the outside in, and Pacemaker controls the service as a standalone instance. Also, a virtual network name is specific to WSFC; there is no equivalent in Pacemaker. @@servername and sys.servers are expected to return the node name, while the cluster DMVs sys.dm_os_cluster_nodes and sys.dm_os_cluster_properties will return no records. To use a connection string that points to a server name rather than an IP address, register the IP address used to create the virtual IP resource (as explained in the following sections) in your DNS server with the chosen server name.
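Once that DNS registration exists, clients can connect by name instead of by IP address. A minimal sketch, assuming the virtual IP 10.0.0.99 has been registered under the hypothetical name sqlfci1:

Bash

# Connects to whichever node currently holds the virtual IP resource

sqlcmd -S sqlfci1 -U sa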

Number of Instances and Nodes

One key difference with SQL Server on Linux is that there can only be one installation of SQL Server per Linux server. That installation is called an instance. This means that unlike Windows Server, which supports up to 25 FCIs per Windows Server failover cluster (WSFC), a Linux-based FCI will only have a single instance. This one instance is also a default instance; there is no concept of a named instance on Linux.

A Pacemaker cluster can only have up to 16 nodes when Corosync is involved, so a single FCI can span up to 16 servers. An FCI implemented with Standard Edition of SQL Server supports up to two nodes of a cluster even if the Pacemaker cluster has the maximum 16 nodes.

In a SQL Server FCI, the SQL Server instance is active on either one node or the other.

IP Address and Name

On a Linux Pacemaker cluster, each SQL Server FCI needs its own unique IP address and name. If the FCI configuration spans multiple subnets, one IP address will be required per subnet. The unique name and IP address(es) are used to access the FCI so that applications and end users do not need to know which underlying server of the Pacemaker cluster is hosting the FCI.

The name of the FCI in DNS should be the same as the name of the FCI resource that gets created in the Pacemaker cluster. Both the name and IP address must be registered in DNS.
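As a quick check, you can confirm from any client that the registered name resolves to the virtual IP; the name sqlfci1 below is a placeholder:

Bash

nslookup sqlfci1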

Shared Storage

All FCIs, whether they are on Linux or Windows Server, require some form of shared storage. This storage is presented to all servers that can possibly host the FCI, but only a single server can use the storage for the FCI at any given time. The options available for shared storage under Linux are:

  • iSCSI
  • Network File System (NFS)
  • Server Message Block (SMB)

Under Windows Server, there are slightly different options. One option not currently supported for Linux-based FCIs is the ability to use a disk that is local to the node for TempDB, which is SQL Server's temporary workspace.

In a configuration that spans multiple locations, what is stored at one data center must be synchronized with the other. In the event of a failover, the FCI must be able to come online and see the same storage. Achieving this requires some external method of storage replication, whether via the underlying storage hardware or a software-based utility.

The process for presenting shared storage is the same for the different supported methods:

  • Configure the shared storage
  • Mount the storage as a folder to the servers that will serve as nodes of the Pacemaker cluster for the FCI
  • If required, move the SQL Server system databases to shared storage
  • Test that SQL Server works from each server connected to the shared storage

One major difference with SQL Server on Linux is that while you can configure the default user data and log file location, the system databases must always exist at /var/opt/mssql/data. On Windows Server, there is the ability to move the system databases including TempDB. This fact plays into how shared storage is configured for an FCI.

The default paths for non-system databases can be changed using the mssql-conf utility. You can also store SQL Server data and transaction log files in other locations, even if they are not configured as defaults, as long as those locations have the proper security; in that case, the location must be specified explicitly.
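As a minimal sketch, the defaults can be changed with mssql-conf; the paths below are placeholders for directories that already exist and that the mssql user can access:

Bash

# Hypothetical paths; create and secure them before pointing SQL Server at them

sudo /opt/mssql/bin/mssql-conf set filelocation.defaultdatadir /data

sudo /opt/mssql/bin/mssql-conf set filelocation.defaultlogdir /log

# Restart SQL Server for the new defaults to take effect

sudo systemctl restart mssql-server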

Configure RHEL failover cluster instance (FCI) cluster for SQL Server

This section provides instructions to create a two-node shared disk failover cluster for SQL Server on Red Hat Enterprise Linux. The clustering layer is based on the Red Hat Enterprise Linux (RHEL) HA add-on built on top of Pacemaker. The SQL Server instance is active on either one node or the other.

As the following diagram shows, storage is presented to two servers. Clustering components (Corosync and Pacemaker) coordinate communications and resource management. One of the servers has the active connection to the storage resources and SQL Server. When Pacemaker detects a failure, the clustering components manage moving the resources to the other node.

The following sections walk through the steps to set up a failover cluster solution.

To complete the following end-to-end scenario, you need two machines to deploy the two-node cluster and another server for the NFS server. The following steps outline how to configure these servers.

Set up and configure the operating system on each cluster node

The first step is to configure the operating system on the cluster nodes. For this walkthrough, use RHEL with a valid subscription for the HA add-on.

Install and configure SQL Server on each cluster node

  1. Install and set up SQL Server on both nodes.
  2. Designate one node as primary and the other as secondary, for purposes of configuration. These terms are used throughout this guide.
  3. On the secondary node, stop and disable SQL Server.

The following example stops and disables SQL Server:

Bash

sudo systemctl stop mssql-server

sudo systemctl disable mssql-server

At setup time, a Server Master Key is generated for the SQL Server instance and placed at /var/opt/mssql/secrets/machine-key. On Linux, SQL Server always runs as a local account called mssql. Because it's a local account, its identity isn't shared across nodes. Therefore, you need to copy the encryption key from the primary node to each secondary node so that each local mssql account can access it to decrypt the Server Master Key.
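The copy itself is not scripted in this article. A minimal sketch, assuming SSH connectivity between the nodes and the default secrets path shown above:

Bash

# Run on the primary node; <secondaryNode> is a placeholder

sudo scp /var/opt/mssql/secrets/machine-key root@<secondaryNode>:/var/opt/mssql/secrets/machine-key

# Then, on the secondary node, make sure the local mssql account owns the key

sudo chown mssql:mssql /var/opt/mssql/secrets/machine-key

sudo chmod 600 /var/opt/mssql/secrets/machine-key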

  4. On the primary node, create a SQL Server login for Pacemaker and grant the login permission to run sp_server_diagnostics. Pacemaker uses this account to verify which node is running SQL Server.

Bash

sudo systemctl start mssql-server

Connect to the SQL Server master database with the sa account and run the following:

SQL

USE [master]

GO

CREATE LOGIN [<loginName>] WITH PASSWORD = N'<loginPassword>'

ALTER SERVER ROLE [sysadmin] ADD MEMBER [<loginName>]

Alternatively, you can set the permissions at a more granular level. The Pacemaker login requires VIEW SERVER STATE to query health status with sp_server_diagnostics, and membership in the setupadmin role plus the ALTER ANY LINKED SERVER permission to update the FCI instance name with the resource name by running sp_dropserver and sp_addserver.
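As a sketch of that granular alternative, the same permissions could be granted with sqlcmd instead of adding the login to sysadmin; <loginName> is the placeholder used above:

Bash

# Run against the master database; supply the sa password when prompted

sqlcmd -S localhost -U sa -d master -Q "GRANT VIEW SERVER STATE TO [<loginName>];"

sqlcmd -S localhost -U sa -d master -Q "ALTER SERVER ROLE [setupadmin] ADD MEMBER [<loginName>];"

sqlcmd -S localhost -U sa -d master -Q "GRANT ALTER ANY LINKED SERVER TO [<loginName>];"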

  5. On the primary node, stop and disable SQL Server.
  6. Configure the hosts file for each cluster node. The hosts file must include the IP address and name of every cluster node.

Check the IP address for each node. The following script shows the IP address of your current node.

Bash

sudo ip addr show

Set the computer name on each node. Give each node a unique name that is 15 characters or less. Set the computer name by adding it to /etc/hosts. The following script lets you edit /etc/hosts with vi.

Bash

sudo vi /etc/hosts

The following example shows /etc/hosts with additions for two nodes named sqlfcivm1 and sqlfcivm2.

Bash

127.0.0.1   localhost localhost4 localhost4.localdomain4

::1       localhost localhost6 localhost6.localdomain6

10.128.18.128 sqlfcivm1

10.128.16.77 sqlfcivm2

In the next section you will configure shared storage and move your database files to that storage.

Configure shared storage and move database files

There are a variety of solutions for providing shared storage. This walkthrough demonstrates configuring shared storage with NFS. We recommend following best practices and using Kerberos to secure NFS.
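As a hedged sketch only (this walkthrough does not configure Kerberos), a Kerberos-secured export adds a sec option to the /etc/exports entry; krb5p is one of the Kerberos security flavors and assumes a working Kerberos realm:

Bash

/mnt/nfs  10.8.8.0/24(rw,sync,sec=krb5p)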

On the NFS server, do the following:

  1. Install nfs-utils

Bash

sudo yum -y install nfs-utils

  2. Enable and start rpcbind:

Bash

sudo systemctl enable rpcbind && sudo systemctl start rpcbind

  3. Enable and start nfs-server:

Bash

sudo systemctl enable nfs-server && sudo systemctl start nfs-server

  4. Edit /etc/exports to export the directory you want to share. You need one line for each share. For example:

Bash

/mnt/nfs  10.8.8.0/24(rw,sync,no_subtree_check,no_root_squash)

  5. Export the shares:

Bash

sudo exportfs -rav

  6. Verify that the paths are shared and exported. Run the following from the NFS server:

Bash

sudo showmount -e

  7. Add an exception in SELinux:

Bash

sudo setsebool -P nfs_export_all_rw 1

  8. Open the firewall on the server:

Bash

sudo firewall-cmd --permanent --add-service=nfs

sudo firewall-cmd --permanent --add-service=mountd

sudo firewall-cmd --permanent --add-service=rpc-bind

sudo firewall-cmd --reload

Configure all cluster nodes to connect to the NFS shared storage

Do the following steps on all cluster nodes.

  1. Install nfs-utils

Bash

sudo yum -y install nfs-utils

  2. Open up the firewall on clients and NFS server:

Bash

sudo firewall-cmd --permanent --add-service=nfs

sudo firewall-cmd --permanent --add-service=mountd

sudo firewall-cmd --permanent --add-service=rpc-bind

sudo firewall-cmd --reload

  3. Verify that you can see the NFS shares on client machines:

Bash

sudo showmount -e <IP OF NFS SERVER>

  4. Repeat these steps on all cluster nodes.

Mount database files directory to point to the shared storage

  1. On the primary node only, save the database files to a temporary location. The following script creates a new temporary directory, copies the database files to the new directory, and removes the old database files. Because SQL Server runs as the local user mssql, you need to make sure that after transferring data to the mounted share, the local user has read-write access to the share.

$ sudo su mssql

$ mkdir /var/opt/mssql/tmp

$ cp /var/opt/mssql/data/* /var/opt/mssql/tmp

$ rm /var/opt/mssql/data/*

$ exit

  2. On all cluster nodes, edit the /etc/fstab file to include the mount command:

Bash

<IP OF NFS SERVER>:<shared_storage_path> <database_files_directory_path> nfs timeo=14,intr

The following script shows an example of the edit.

10.8.8.0:/mnt/nfs /var/opt/mssql/data nfs timeo=14,intr

  3. Run the mount -a command for the system to update the mounted paths:
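Bash

sudo mount -a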
  4. Copy the database and log files that you saved to /var/opt/mssql/tmp to the newly mounted share /var/opt/mssql/data. This only needs to be done on the primary node. Make sure that you give read-write permissions to the mssql local user.

$ sudo chown mssql /var/opt/mssql/data

$ sudo chgrp mssql /var/opt/mssql/data

$ sudo su mssql

$ cp /var/opt/mssql/tmp/* /var/opt/mssql/data/

$ rm /var/opt/mssql/tmp/*

$ exit

  5. Validate that SQL Server starts successfully with the new file path. Do this on each node, but run SQL Server on only one node at a time. The nodes cannot both run SQL Server simultaneously, because they would both try to access the data files at the same time (to avoid accidentally starting SQL Server on both nodes, use a File System cluster resource to make sure the share is not mounted twice by the different nodes). The following commands start SQL Server, check the status, and then stop SQL Server:

Bash

sudo systemctl start mssql-server

sudo systemctl status mssql-server

sudo systemctl stop mssql-server

At this point both instances of SQL Server are configured to run with the database files on the shared storage. The next step is to configure SQL Server for Pacemaker.

Install and configure Pacemaker on each cluster node

  1. On both cluster nodes, create a file to store the SQL Server username and password for the Pacemaker login. The following commands create and populate this file:

Bash

sudo touch /var/opt/mssql/secrets/passwd

echo '<loginName>' | sudo tee -a /var/opt/mssql/secrets/passwd

echo '<loginPassword>' | sudo tee -a /var/opt/mssql/secrets/passwd

sudo chown root:root /var/opt/mssql/secrets/passwd

sudo chmod 600 /var/opt/mssql/secrets/passwd   

  2. On both cluster nodes, open the Pacemaker firewall ports. To open these ports with firewalld, run the following commands:

Bash

sudo firewall-cmd --permanent --add-service=high-availability

sudo firewall-cmd --reload

If you’re using another firewall that doesn’t have a built-in high-availability configuration, the following ports need to be opened for Pacemaker to be able to communicate with other nodes in the cluster (a sketch for a plain iptables setup follows this list):

  • TCP: Ports 2224, 3121, 21064
  • UDP: Port 5405
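For example, a minimal sketch for a plain iptables firewall (an assumption; adapt the commands to whatever firewall you actually run):

Bash

sudo iptables -A INPUT -p tcp -m multiport --dports 2224,3121,21064 -j ACCEPT

sudo iptables -A INPUT -p udp --dport 5405 -j ACCEPT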
  3. Install Pacemaker packages on each node:

Bash

sudo yum install pacemaker pcs fence-agents-all resource-agents

  4. Set the password for the default user that is created when installing Pacemaker and Corosync packages. Use the same password on both nodes:

Bash

sudo passwd hacluster

  5. Enable and start the pcsd service and Pacemaker. This allows nodes to rejoin the cluster after a reboot. Run the following commands on both nodes:

Bash

sudo systemctl enable pcsd

sudo systemctl start pcsd

sudo systemctl enable pacemaker

  6. Install the FCI resource agent for SQL Server. Run the following commands on both nodes:

Bash

sudo yum install mssql-server-ha

Create the cluster

  1. On one of the nodes, create the cluster.

Bash

sudo pcs cluster auth <nodeName1 nodeName2 …> -u hacluster

sudo pcs cluster setup --name <clusterName> <nodeName1 nodeName2 …>

sudo pcs cluster start --all

  2. Configure the cluster resources for SQL Server, the File System, and the virtual IP, and push the configuration to the cluster. You need the following information:
    • SQL Server Resource Name: A name for the clustered SQL Server resource.
    • Floating IP Resource Name: A name for the virtual IP address resource.
    • IP Address: The IP address that clients will use to connect to the clustered instance of SQL Server.
    • File System Resource Name: A name for the File System resource.
    • device: The NFS share path.
    • directory: The local path where the share is mounted.
    • fstype: The file share type (for example, nfs).

Update the values from the following script for your environment. Run on one node to configure and start the clustered service.

Bash

sudo pcs cluster cib cfg

sudo pcs -f cfg resource create <sqlServerResourceName> ocf:mssql:fci

sudo pcs -f cfg resource create <floatingIPResourceName> ocf:heartbeat:IPaddr2 ip=<ip Address>

sudo pcs -f cfg resource create <fileShareResourceName> Filesystem device=<networkPath> directory=<localPath> fstype=<fileShareType>

sudo pcs -f cfg constraint colocation add <floatingIPResourceName> <sqlServerResourceName>

sudo pcs -f cfg constraint colocation add <fileShareResourceName> <sqlServerResourceName>

sudo pcs cluster cib-push cfg

For example, the following script creates a SQL Server clustered resource named mssqlha and a floating IP resource with the IP address 10.0.0.99. It also creates a Filesystem resource and adds constraints so that all resources are colocated on the same node as the SQL Server resource.

Bash

sudo pcs cluster cib cfg

sudo pcs -f cfg resource create mssqlha ocf:mssql:fci

sudo pcs -f cfg resource create virtualip ocf:heartbeat:IPaddr2 ip=10.0.0.99

sudo pcs -f cfg resource create fs Filesystem device="10.8.8.0:/mnt/nfs" directory="/var/opt/mssql/data" fstype="nfs"

sudo pcs -f cfg constraint colocation add virtualip mssqlha

sudo pcs -f cfg constraint colocation add fs mssqlha

sudo pcs cluster cib-push cfg

After the configuration is pushed, SQL Server will start on one node.

  3. Verify that SQL Server is started:

Bash

sudo pcs status

The following example shows the results when Pacemaker has successfully started a clustered instance of SQL Server.

fs     (ocf::heartbeat:Filesystem):    Started sqlfcivm1

virtualip     (ocf::heartbeat:IPaddr2):      Started sqlfcivm1

mssqlha  (ocf::mssql:fci): Started sqlfcivm1

PCSD Status:

 sqlfcivm1: Online

 sqlfcivm2: Online

Daemon Status:

 corosync: active/disabled

 pacemaker: active/enabled

 pcsd: active/enabled

Operate RHEL failover cluster instance (FCI) for SQL Server

This section describes how to perform the following tasks for SQL Server on a shared disk failover cluster with Red Hat Enterprise Linux:

  • Manually fail over the cluster
  • Monitor a failover cluster SQL Server service
  • Add a cluster node
  • Remove a cluster node
  • Change the SQL Server resource monitoring frequency

Architecture description

The clustering layer is based on Red Hat Enterprise Linux (RHEL) HA add-on built on top of Pacemaker. Corosync and Pacemaker coordinate cluster communications and resource management. The SQL Server instance is active on either one node or the other.

The following diagram illustrates the components in a Linux cluster with SQL Server.

Fail over the cluster manually

The resource move command creates a constraint that forces the resource to start on the target node. After executing the move command, executing resource clear removes the constraint, so the resource can be moved again or fail over automatically later.

Bash

sudo pcs resource move <sqlResourceName> <targetNodeName> 

sudo pcs resource clear <sqlResourceName>

The following example moves the mssqlha resource to a node named sqlfcivm2, and then removes the constraint so that the resource can move to a different node later.

Bash

sudo pcs resource move mssqlha sqlfcivm2

sudo pcs resource clear mssqlha

Monitor a failover cluster SQL Server service

View the current cluster status:

Bash

sudo pcs status 

View live status of cluster and resources:

Bash

sudo crm_mon

View the resource agent logs at /var/log/cluster/corosync.log.
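For example, to follow the log as the cluster runs:

Bash

sudo tail -f /var/log/cluster/corosync.log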

Add a node to a cluster

  1. Check the IP address for each node. The following script shows the IP address of your current node.

Bash

ip addr show

  2. The new node needs a unique name that is 15 characters or less. By default in Red Hat Enterprise Linux, the computer name is localhost.localdomain. This default name may not be unique and is too long. Set the computer name on the new node by adding it to /etc/hosts. The following script lets you edit /etc/hosts with vi:

Bash

sudo vi /etc/hosts

The following example shows /etc/hosts with additions for three nodes named sqlfcivm1, sqlfcivm2, and sqlfcivm3.

127.0.0.1   localhost localhost4 localhost4.localdomain4

::1         localhost localhost6 localhost6.localdomain6

10.128.18.128 sqlfcivm1

10.128.16.77 sqlfcivm2

10.128.14.26 sqlfcivm3

The file should be the same on every node.

  3. Stop the SQL Server service on the new node.
  4. Follow these instructions to mount the database file directory to the shared location:

From the NFS server, install nfs-utils

Bash

sudo yum -y install nfs-utils

Open up the firewall on clients and NFS server

Bash

sudo firewall-cmd --permanent --add-service=nfs

sudo firewall-cmd --permanent --add-service=mountd

sudo firewall-cmd --permanent --add-service=rpc-bind

sudo firewall-cmd --reload

Edit /etc/fstab file to include the mount command:

Bash

<IP OF NFS SERVER>:<shared_storage_path> <database_files_directory_path> nfs timeo=14,intr

Run mount -a for the changes to take effect.

  5. On the new node, create a file to store the SQL Server username and password for the Pacemaker login. The following commands create and populate this file:

Bash

sudo touch /var/opt/mssql/secrets/passwd

echo '<loginName>' | sudo tee -a /var/opt/mssql/secrets/passwd

echo '<loginPassword>' | sudo tee -a /var/opt/mssql/secrets/passwd

sudo chown root:root /var/opt/mssql/secrets/passwd

sudo chmod 600 /var/opt/mssql/secrets/passwd

  6. On the new node, open the Pacemaker firewall ports. To open these ports with firewalld, run the following commands:

Bash

sudo firewall-cmd --permanent --add-service=high-availability

sudo firewall-cmd --reload

  7. Install Pacemaker packages on the new node:

Bash

sudo yum install pacemaker pcs fence-agents-all resource-agents

  8. Set the password for the default user that is created when installing Pacemaker and Corosync packages. Use the same password as on the existing nodes:

Bash

sudo passwd hacluster

  9. Enable and start the pcsd service and Pacemaker. This allows the new node to rejoin the cluster after a reboot. Run the following commands on the new node:

Bash

sudo systemctl enable pcsd

sudo systemctl start pcsd

sudo systemctl enable pacemaker

  10. Install the FCI resource agent for SQL Server. Run the following commands on the new node:

Bash

sudo yum install mssql-server-ha

  11. On an existing node of the cluster, authenticate the new node and add it to the cluster:

Bash

sudo pcs cluster auth <nodeName3> -u hacluster

sudo pcs cluster node add <nodeName3>

The following example adds a node named vm3 to the cluster.

Bash

sudo pcs cluster auth vm3 -u hacluster

sudo pcs cluster node add vm3

Remove nodes from a cluster

To remove a node from a cluster run the following command:

Bash

sudo pcs cluster node remove <nodeName>

Change the sqlservr resource monitoring interval

Bash

sudo pcs resource op monitor interval=<interval>s <sqlResourceName>

The following example sets the monitoring interval to 2 seconds for the mssqlha resource:

Bash

sudo pcs resource op monitor interval=2s mssqlha

Troubleshoot Red Hat Enterprise Linux shared disk cluster for SQL Server

When troubleshooting the cluster, it may help to understand how the three daemons work together to manage cluster resources.

  • Corosync: Provides quorum membership and messaging between cluster nodes.
  • Pacemaker: Resides on top of Corosync and provides state machines for resources.
  • PCSD: Manages both Pacemaker and Corosync through the pcs tools.

PCSD must be running in order to use pcs tools.
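A quick way to confirm that on any node:

Bash

sudo systemctl status pcsd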

Current cluster status

sudo pcs status returns basic information about the cluster, quorum, nodes, resources, and daemon status for each node.

An example of healthy Pacemaker quorum output would be:

Cluster name: MyAppSQL

Last updated: Wed Oct 31 12:00:00 2016  Last change: Wed Oct 31 11:00:00 2016 by root via crm_resource on sqlvmnode1

Stack: corosync

Current DC: sqlvmnode1 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum

3 nodes and 1 resource configured

Online: [ sqlvmnode1 sqlvmnode2 sqlvmnode3 ]

Full list of resources:

mssqlha (ocf::mssql:fci): Started sqlvmnode1

PCSD Status:

sqlvmnode1: Online

sqlvmnode2: Online

sqlvmnode3: Online

Daemon Status:

corosync: active/disabled

pacemaker: active/enabled

In the example, partition with quorum means that a majority quorum of nodes is online. If the cluster loses its majority quorum, pcs status will return partition WITHOUT quorum and all resources will be stopped.

Online: [ sqlvmnode1 sqlvmnode2 sqlvmnode3 ] returns the names of all nodes currently participating in the cluster. If any nodes are not participating, pcs status returns OFFLINE: [<nodename>].

PCSD Status shows the cluster status for each node.

Reasons why a node may be offline

Check the following items when a node is offline (a short diagnostic sketch follows the list).

  • Firewall

The following ports need to be open on all nodes for Pacemaker to be able to communicate.

  • TCP: 2224, 3121, 21064
  • UDP: 5405
  • Pacemaker or Corosync services running
  • Node communication
  • Node name mappings
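The following commands can help check these items; the node name sqlvmnode2 is a placeholder for whichever node appears offline:

Bash

# Confirm the cluster daemons are running on the offline node

sudo systemctl status corosync pacemaker pcsd

# Verify node-to-node communication

ping -c 3 sqlvmnode2

# Confirm name resolution matches /etc/hosts on every node

getent hosts sqlvmnode2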