Upgrade Availability Groups on Linux

The following sections explain how to perform a rolling upgrade with SQL Server instances on Linux with availability groups.

Upgrade steps on Linux

When availability group replicas are on instances of SQL Server on Linux, the cluster type of the availability group is either EXTERNAL or NONE. An availability group that is managed by a cluster manager other than Windows Server Failover Cluster (WSFC) has cluster type EXTERNAL; Pacemaker with Corosync is an example of an external cluster manager. An availability group with no cluster manager has cluster type NONE. The upgrade steps outlined here are specific to availability groups of cluster type EXTERNAL or NONE.

The order in which you upgrade instances depends on their role, and on whether they host synchronous or asynchronous replicas. Upgrade instances of SQL Server that host asynchronous secondary replicas first. Then upgrade instances that host synchronous secondary replicas.

Before you begin, back up each database.

  1. Stop the resource on the node hosting the secondary replica targeted for upgrade.

Before running the upgrade command, stop the resource so that the cluster doesn't monitor it and fail it unnecessarily. The following example adds a location constraint on the node that causes the resource to stop. Update ag_cluster-master with the resource name and nodeName1 with the node hosting the replica targeted for upgrade.

Bash

pcs constraint location ag_cluster-master avoids nodeName1
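
To confirm that the constraint is in place and the resource stopped on that node, you can check the cluster state. A minimal sketch, assuming the ag_cluster-master resource and nodeName1 node from the example above:

Bash

# list location constraints to confirm the "avoids" rule was added
sudo pcs constraint location

# confirm that the ag_cluster-master resource is stopped on nodeName1
sudo pcs status resources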

  2. Upgrade SQL Server on the secondary replica.

The following example upgrades the mssql-server and mssql-server-ha packages.

Bash

sudo yum update mssql-server

sudo yum update mssql-server-ha
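
To verify that the upgrade completed on this node, you can query the installed package versions. A minimal sketch for RHEL-based systems; on SLES or Ubuntu, use the zypper or apt equivalents:

Bash

# confirm the installed package versions after the upgrade
rpm -q mssql-server mssql-server-ha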

  3. Remove the location constraint.

The constraint was added before the upgrade to keep the resource stopped; removing it allows the cluster to start and monitor the resource on the node again. Update the constraint name to match your resource and node names.

Bash

pcs constraint remove location-ag_cluster-master-nodeName1--INFINITY

As a best practice, ensure that the resource is started (use the pcs status command) and that the secondary replica is in a connected and synchronized state after the upgrade.
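
To confirm from SQL Server that the upgraded secondary replica is connected and synchronized, you can query the availability group dynamic management views. A minimal sketch; run it on the instance that hosts the primary replica:

Transact-SQL

SELECT ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON drs.replica_id = ar.replica_id;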

  4. After all secondary replicas are upgraded, manually fail over to one of the synchronous secondary replicas.

For availability groups with EXTERNAL cluster type, use the cluster management tools to fail over; availability groups with NONE cluster type should use Transact-SQL to fail over. The following example fails over an availability group with the cluster management tools. Replace <targetReplicaName> with the name of the synchronous secondary replica that will become primary:

Bash

sudo pcs resource move ag_cluster-master <targetReplicaName> --master

The following steps only apply to availability groups that do not have a cluster manager.

If the availability group cluster type is NONE, manually fail over. Complete the following steps in order:

a. The following command sets the primary replica to the secondary role. Replace ag1 with the name of your availability group. Run the Transact-SQL command on the instance of SQL Server that hosts the primary replica.

Transact-SQL

ALTER AVAILABILITY GROUP [ag1] SET (ROLE = SECONDARY);

b. The following command sets a synchronous secondary replica to the primary role. Run the following Transact-SQL command on the target instance of SQL Server (the instance that hosts the synchronous secondary replica).

Transact-SQL

ALTER AVAILABILITY GROUP [ag1] FAILOVER;
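
After the failover completes (by either method), you can confirm that the roles changed as expected. A minimal sketch; run it on the new primary replica:

Transact-SQL

SELECT ar.replica_server_name,
       ars.role_desc,
       ars.operational_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ars.replica_id = ar.replica_id;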

  5. After failover, upgrade SQL Server on the old primary replica by repeating the preceding procedure.

The following example adds the location constraint, upgrades the mssql-server and mssql-server-ha packages, and then removes the constraint.

Bash

# add constraint for the resource to stop on the upgraded node
# replace 'nodeName2' with the name of the cluster node targeted for upgrade
pcs constraint location ag_cluster-master avoids nodeName2

Bash

# upgrade mssql-server and mssql-server-ha packages

sudo yum update mssql-server

sudo yum update mssql-server-ha

Bash

# remove the constraint; make sure the resource is started and the replica is connected and synchronized
pcs constraint remove location-ag_cluster-master-nodeName2--INFINITY

  6. For an availability group with an external cluster manager (cluster type EXTERNAL), clean up the location constraint that was created by the manual failover.

Bash

sudo pcs constraint remove cli-prefer-ag_cluster-master 
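
To confirm that no stale constraints remain, you can list the cluster's constraints. A minimal sketch:

Bash

# list all remaining constraints; the cli-prefer entry should be gone
sudo pcs constraint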

  7. Resume data movement for the newly upgraded secondary replica (the former primary replica). This step is required when a higher version instance of SQL Server is transferring log blocks to a lower version instance in an availability group. Run the following command on the new secondary replica (the previous primary replica).

Transact-SQL

ALTER DATABASE database_name SET HADR RESUME;
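
To identify which availability databases on this replica are still suspended, you can query the local database replica states. A minimal sketch:

Transact-SQL

SELECT DB_NAME(database_id) AS database_name,
       is_suspended,
       suspend_reason_desc
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;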

After upgrading all servers, you can fail back to the original primary if necessary.
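
For cluster type EXTERNAL, failing back mirrors the earlier manual failover: move the resource to the original node and then clean up the constraint that the move creates. A sketch, assuming the original primary is on nodeName1:

Bash

# move the primary role back to the original node
sudo pcs resource move ag_cluster-master nodeName1 --master

# remove the location constraint created by the move
sudo pcs constraint remove cli-prefer-ag_cluster-master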

Drop an availability group

To delete an availability group, run DROP AVAILABILITY GROUP. If the cluster type is EXTERNAL or NONE, run the command on every instance of SQL Server that hosts a replica. For example, to drop an availability group named group_name, run the following command:

Transact-SQL

DROP AVAILABILITY GROUP group_name

Configure multiple-subnet Always On Availability Groups and failover cluster instances

When an Always On Availability Group (AG) or failover cluster instance (FCI) spans more than one site, each site usually has its own networking. This often means that each site has its own IP addressing. For example, Site A’s addresses start with 192.168.1.x and Site B’s addresses start with 192.168.2.x, where x is the part of the IP address that is unique to the server. Without some sort of routing in place at the networking layer, these servers will not be able to communicate with each other. There are two ways to handle this scenario: set up a network that bridges the two different subnets, known as a VLAN, or configure routing between the subnets.

VLAN-based solution

Prerequisite: For a VLAN-based solution, each server participating in an AG or FCI needs two network cards (NICs) for proper availability (a dual port NIC would be a single point of failure on a physical server), so that it can be assigned IP addresses on its native subnet as well as one on the VLAN. This is in addition to any other network needs, such as iSCSI, which also needs its own network.

The IP address creation for the AG or FCI is done on the VLAN. In the following example, the VLAN has a subnet of 192.168.3.x, so the IP address created for the AG or FCI is 192.168.3.104. Nothing additional needs to be configured, since there is a single IP address assigned to the AG or FCI.
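
For reference, here is a sketch of creating that single IP address resource on the VLAN subnet with Pacemaker; the resource name virtualip matches the name used later in this article, and the netmask is an assumption for the example network:

Bash

# create the IP address resource for the AG or FCI on the VLAN subnet (netmask assumed)
sudo pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=192.168.3.104 cidr_netmask=24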

Configuration with Pacemaker

In the Windows world, a Windows Server Failover Cluster (WSFC) natively supports multiple subnets and handles multiple IP addresses via an OR dependency on the IP address. On Linux, there is no OR dependency, but there is a way to achieve a proper multi-subnet configuration natively with Pacemaker, as shown in the following steps. You cannot do this simply by using the normal Pacemaker command line to modify a resource. You need to modify the cluster information base (CIB). The CIB is an XML file that contains the Pacemaker configuration.

Update the CIB

  1. Export the CIB.

Red Hat Enterprise Linux (RHEL) and Ubuntu

Bash

sudo pcs cluster cib <filename>

SUSE Linux Enterprise Server (SLES)

Bash

sudo cibadmin -Q > <filename>

Where filename is the name of the file to export the CIB to.

  2. Edit the file that was generated. Look for the <resources> section. You will see the various resources that were created for the AG or FCI. Find the one associated with the IP address. Add an <instance_attributes> section with the information for the second IP address, either above or below the existing one, but before <operations>. It is similar to the following syntax:

XML

<instance_attributes id="<NameForAttribute>">
    <nvpair id="<NameForIP>" name="ip" value="<IPAddress>"/>
</instance_attributes>

where NameForAttribute is the unique name for this attribute, NameForIP is the name associated with the IP address, and IPAddress is the IP address for the second subnet.

The following shows an example.

XML

<instance_attributes id="virtualip-instance_attributes">
    <nvpair id="virtualip-instance_attributes-ip" name="ip" value="192.168.1.102"/>
</instance_attributes>

By default, there is only one instance_attributes section in the exported CIB XML file. If there are two subnets, you need two entries. Here is an example of entries for two subnets:

XML

<instance_attributes id="virtualip-instance_attributes1">
    <rule id="Subnet1-IP" score="INFINITY" boolean-op="or">
        <expression id="Subnet1-Node1" attribute="#uname" operation="eq" value="Node1" />
        <expression id="Subnet1-Node2" attribute="#uname" operation="eq" value="Node2" />
    </rule>
    <nvpair id="IP-In-Subnet1" name="ip" value="192.168.1.102"/>
</instance_attributes>

<instance_attributes id="virtualip-instance_attributes2">
    <rule id="Subnet2-IP" score="INFINITY">
        <expression id="Subnet2-Node1" attribute="#uname" operation="eq" value="Node3" />
    </rule>
    <nvpair id="IP-In-Subnet2" name="ip" value="192.168.2.102"/>
</instance_attributes>

The boolean-op="or" attribute is used when the subnet has more than one server.

  3. Import the modified CIB and reconfigure Pacemaker.

RHEL/Ubuntu

Bash

sudo pcs cluster cib-push <filename>

SLES

Bash

sudo cibadmin -R -x <filename>

where filename is the name of the CIB file with the modified IP address information.

Check and verify failover

  1. After the CIB is successfully applied with the updated configuration, ping the DNS name associated with the IP address resource in Pacemaker. It should reflect the IP address associated with the subnet currently hosting the AG or FCI.
  2. Fail the AG or FCI to the other subnet, as shown in the sketch after this list.
  3. After the AG or FCI is fully online, ping the DNS name associated with the IP address. It should reflect the IP address in the second subnet.
  4. If desired, fail the AG or FCI back to the original subnet.
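
A minimal sketch of this check; the DNS name aglisten.contoso.com is a hypothetical example, and the resource name follows the availability group examples earlier in this article:

Bash

# ping the DNS name; the reply address should match the subnet currently hosting the AG or FCI
ping -c 4 aglisten.contoso.com

# fail over to a node in the other subnet, then repeat the ping
sudo pcs resource move ag_cluster-master <nodeInOtherSubnet> --master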

For a CSS post that shows how to configure the CIB for three subnets, see Configure multiple-subnet AlwaysOn Availability Group by modifying CIB.