High availability disaster recovery (HADR)

What is HADR?

“HADR” stands for High Availability Disaster Recovery.

HADR is a feature of DB2 that replicates data in real time from a primary database to one or more secondary databases, also known as standby databases. The primary database is open for read and write operations, while the standby databases are at most readable (and only if reads on standby is enabled). Standby databases are divided into the principal standby and auxiliary standbys.

Principal Standby: This is the standby database to which the synchronization mode configured on the primary server applies. Only one database can be the principal standby at a time, but the role can rotate among the other standby databases in the cluster.
Auxiliary Standby: This refers to any other standby database that is not the principal standby. Auxiliary standbys are recommended for Disaster Recovery (DR). The synchronization mode used for these databases is always SUPERASYNC, which I will explain later.
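
As a rough illustration of this split, the sketch below classifies a list of standbys. It assumes a "|"-separated list of host:port entries (the shape used by DB2's HADR_TARGET_LIST parameter) in which the first entry is the standby that inherits the primary's synchronization mode, marked `principal` here. The `Standby` class and `classify_standbys` function are hypothetical names used only for this example, not part of DB2.

```python
from dataclasses import dataclass

MAX_STANDBYS = 3  # the article states that HADR supports up to three standbys


@dataclass
class Standby:
    address: str    # "host:port" entry taken from the target list
    role: str       # "principal" or "auxiliary"
    sync_mode: str  # effective synchronization mode for this standby


def classify_standbys(target_list: str, primary_sync_mode: str) -> list[Standby]:
    """Split a '|'-separated target list into principal and auxiliary standbys."""
    entries = [e.strip() for e in target_list.split("|") if e.strip()]
    if not entries:
        raise ValueError("at least one standby is required")
    if len(entries) > MAX_STANDBYS:
        raise ValueError(f"at most {MAX_STANDBYS} standbys are supported")

    standbys = []
    for index, address in enumerate(entries):
        if index == 0:
            # The first entry uses the mode configured on the primary.
            standbys.append(Standby(address, "principal", primary_sync_mode))
        else:
            # Every other standby always runs in SUPERASYNC.
            standbys.append(Standby(address, "auxiliary", "SUPERASYNC"))
    return standbys


if __name__ == "__main__":
    # Host names and ports are made up for the example.
    for standby in classify_standbys("hostB:4000|hostC:4000|hostD:4000", "NEARSYNC"):
        print(standby)
```

Whatever mode is configured on the primary, only the first standby is affected by it; the remaining entries stay in SUPERASYNC.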

High availability itself is based on duplicating the hardware and software so that, if something happens to our primary server, another server can take over as the primary, avoiding a service interruption for our users.

High availability is also very useful when we want to perform maintenance on our servers or database without affecting system availability. For example, to apply operating system patches or upgrade the database version, maintenance can start on the standby servers. Next, we switch roles so that one of the already patched standby servers becomes the new primary. Finally, we apply the patches to the server that used to be the primary.

What Does It Offer?

High Availability Disaster Recovery (HADR) provides a high availability solution for partial and complete site failures. HADR protects against data loss by replicating data changes from a source database, referred to as the primary database, to target databases, referred to as standby databases. HADR supports up to three remote standby servers.

A partial site failure can be due to hardware, network, or software (DB2® database system or operating system) failures. Without HADR, a partial site failure requires restarting the database management system (DBMS) server that contains the database. The time it takes to restart the database and the server it resides on is unpredictable.

It may take several minutes before the database returns to a consistent state and is available. With HADR, a standby database can take over in seconds. Furthermore, it can redirect clients that used the original primary database to the new primary database through automatic client redirection or retry logic in the application.
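
The "retry logic in the application" mentioned above can be sketched as follows. This is a minimal illustration, not DB2's automatic client reroute: `connect` is a hypothetical callable supplied by the application (for example, a wrapper around its database driver) that raises `ConnectionError` when a server is unreachable, and the server names are placeholders.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[str], T],
    servers: Sequence[str],
    attempts_per_server: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Try each server in order, retrying a few times before moving on."""
    last_error: Optional[Exception] = None
    for server in servers:
        for _ in range(attempts_per_server):
            try:
                return connect(server)
            except ConnectionError as error:
                # Remember the failure and wait before the next attempt.
                last_error = error
                time.sleep(delay_seconds)
    raise ConnectionError(f"no server reachable: {list(servers)}") from last_error
```

Listing the original primary first and the standby second means the application keeps using the primary while it is healthy and falls through to the new primary after a takeover.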

A complete site failure may occur when a disaster, such as a fire, causes the entire site to be destroyed. However, since HADR uses TCP/IP for communication between the primary and standby databases, they can be located in different places. For instance, the primary database could be in your headquarters in one city, while a standby database could be in your sales office in another city.

If a disaster occurs at the primary site, data availability is maintained by promoting the remote standby database to become the primary database with full DB2 functionality.

After a failover operation occurs, you can recover the original primary database and return it to its primary role; this procedure is known as failback. You can initiate a failback only if you can make the former primary database consistent with the new primary database.

After reintegrating the old primary database into the HADR configuration as a standby database, you can switch the roles of the databases. This operation would allow the original primary database to become the primary database again.
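
The failover, reintegration, and role-switch sequence described above can be sketched as an ordered list of DB2 command line processor commands. The function name and the "where to run" labels are illustrative assumptions; the sketch only builds the command strings and does not execute anything. Step 2 succeeds only if the old primary can be made consistent with the new primary, as noted above.

```python
def failover_and_switch_back_commands(db_name: str) -> list[tuple[str, str]]:
    """Return (where to run, command) pairs for a forced failover and the switch back."""
    return [
        # 1. The primary site is lost: promote the standby without
        #    waiting for any cooperation from the failed primary.
        ("standby host", f"TAKEOVER HADR ON DB {db_name} BY FORCE"),
        # 2. The old primary is repaired: reintegrate it as a standby
        #    of the new primary.
        ("old primary host", f"START HADR ON DB {db_name} AS STANDBY"),
        # 3. Once the reintegrated standby has caught up, switch roles
        #    so the original primary becomes the primary again.
        ("old primary host", f"TAKEOVER HADR ON DB {db_name}"),
    ]


if __name__ == "__main__":
    # "SAMPLE" is a placeholder database name.
    for location, command in failover_and_switch_back_commands("SAMPLE"):
        print(f"[{location}] db2 {command}")
```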

How to Keep Servers in Sync?

Servers are kept in sync by sending logs from the primary server to the secondary server.

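It is important to note that, in normal operation (when the primary and standby are in peer state), the logs transferred from the primary server come from the log buffer, not from the log files. The log files are read only when a standby has fallen behind and needs to catch up.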

The primary server contains a sending buffer for HADR, as well as a log buffer and log files, while the secondary server contains a receiving buffer for HADR, along with a log buffer and log files.

The logs will be sent to the standby server depending on the synchronization mode chosen. It is essential to understand the synchronization modes to make the right decision before configuring HADR on our databases. Let's see what happens in each of them.

SYNC Mode (Synchronous)

In SYNC mode, a COMMIT is considered successful only when the logs have been written to disk on both the primary and the secondary database.

Here is what happens in detail. When a COMMIT occurs on the primary database, the log pages are flushed from the primary's log buffer to its log files on disk. The sending buffer for HADR on the primary server then sends the same log pages across the network to the receiving buffer for HADR on the secondary server. The secondary server writes them to its own log files on disk and only then sends a confirmation to the primary server. The COMMIT is considered successful once the primary has both completed its own disk write and received this confirmation.

As we can see, this mode offers the strongest data protection. However, it can impact database performance, because every commit waits for two disk writes and depends on the network response time.

This synchronization mode is recommended for databases that are in close proximity, as well as for a fast network. This means it is a good option for High Availability (HA).

NEARSYNC Mode (Almost Synchronous)

In this mode, logs are written to disk in the primary server and, in parallel, sent to the secondary server. When the receiving buffer for HADR in the secondary server receives the logs, it sends a confirmation to the primary server before writing them to disk.

Note that the database does not wait for the logs to be written in the secondary server, which improves performance. However, there is still a network dependency, so it is recommended that the databases be in close proximity. It is also recommended for High Availability (HA), but not for Disaster Recovery (DR) due to the dependency on geographic distance.

There could be data loss if both the primary and secondary servers are lost at the same time, and we cannot recover the primary server, with the logs existing only in the memory of the secondary server at the time of the loss. This scenario is very unlikely, which is why many prefer to take the risk to improve performance.

ASYNC Mode (Asynchronous)

Logs are written to disk in the primary server and, in parallel, sent to the secondary server. The primary server does not wait for the secondary server to confirm receipt of the data.

This mode reduces the dependency on geographic proximity and improves performance. However, if the primary server is lost, there is a greater risk of losing logs that are in transit to the secondary server. For that reason, this synchronization mode is more appropriate for disaster recovery (DR), where the standby database is needed only if we lose our primary data center.

SUPERASYNC Mode (Super Asynchronous)

When a COMMIT occurs, the logs are written to disk in the primary database, and this is considered a successful COMMIT. There is no verification that the data has been sent to the auxiliary database, so performance is not impacted, but the risk of data loss is high.

This mode is recommended for disaster recovery (DR) when your HADR configuration has a third or fourth server.
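
To compare the four modes side by side, the sketch below estimates how long a COMMIT waits in each one. It is a deliberately simplified model and not a measurement: it assumes that SYNC performs its steps serially, that NEARSYNC overlaps the primary's disk write with the network round trip, and that handing data to the network in ASYNC costs nothing. The function name and parameters are hypothetical.

```python
def commit_wait_ms(
    mode: str,
    primary_write_ms: float,
    one_way_network_ms: float,
    standby_write_ms: float,
) -> float:
    """Estimate the time a COMMIT waits under each synchronization mode."""
    round_trip_ms = 2 * one_way_network_ms
    if mode == "SYNC":
        # Both disk writes and the full round trip are on the commit path.
        return primary_write_ms + round_trip_ms + standby_write_ms
    if mode == "NEARSYNC":
        # The primary's write and the send/confirm run in parallel, and the
        # standby confirms before writing to disk.
        return max(primary_write_ms, round_trip_ms)
    if mode in ("ASYNC", "SUPERASYNC"):
        # The primary does not wait for any confirmation from the standby.
        return primary_write_ms
    raise ValueError(f"unknown synchronization mode: {mode}")


if __name__ == "__main__":
    # Example figures are invented: 2 ms disk writes, 5 ms one-way latency.
    for mode in ("SYNC", "NEARSYNC", "ASYNC", "SUPERASYNC"):
        print(mode, commit_wait_ms(mode, 2.0, 5.0, 2.0))
```

In this model the network latency dominates SYNC and NEARSYNC as soon as the servers are far apart, which is why both are suggested for nearby servers, while ASYNC and SUPERASYNC trade that wait for a higher risk of lost log data.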
