Manage High Availability Errors

This section describes the concepts of High Availability error management. It also describes the following use cases:

  • Disconnection issues between the appliances (short or long outage)
  • Service issues
  • Hardware installation problems
  • Conflict warnings

Conflict Management

In a High Availability Dual Mode deployment, the two nodes are symmetric.

You can perform daily operations from either node. As long as the nodes are synchronized, each operation is replicated to the other node.

Important: Although you can carry out most management operations from either node, use the same node whenever possible and limit the use of the other node to emergency situations. This greatly reduces the number of conflicts to manage.

When the two nodes become desynchronized, all features remain available on both nodes, with two exceptions: the add domain and delete domain operations.

As a result, data update conflicts can occur if the same data is updated with different values on the two nodes (for example, during a short network outage, or simultaneously on both nodes).

For example, while the appliances are disconnected, you could update the same field with different values on both nodes, or create the same user name on both nodes. This results in a data conflict when the appliances are reconnected.

The following sections describe how such conflicts are managed and audited. The Monitoring and Reporting section describes how they are monitored.

If you run into any conflicts, contact HID Global Technical Support for assistance with resolving them.

Data Conflict Management

The system resolves potential database conflicts based on the following principles:

  • Timing – when two conflicting updates are identified, the last one wins and the earlier one is discarded. This applies to conflicting data updates and data creations.
    For example, during a short outage, an administrator updates a field for a user at T0 on the first appliance, and another operator updates the same field at T0+1 on the second appliance. Once the appliances are reconnected, the later update (made on the second appliance) is written to the database, and the update made on the first appliance is discarded.

  • Deletion wins – when a deletion conflicts with a data update, the deletion wins.
  • Most usage wins – when a token has a higher total usage (that is, the total number of failed and successful authentications) on one of the nodes, the usage information from that node is used to update the database, instead of relying on the last update time.
Note:  
  • If conflicts are detected between the two appliances when they are reconnected, you will receive a warning.
  • If conflict resolution fails for some objects, you will receive both a warning and a short list of items to check, once the appliances are in a synchronized state. See Data Replication in Dual Mode.
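The principles above can be sketched as a small Python function. This is a hypothetical model for illustration only, not the ActivID implementation: the `Change` record, its field names, the order in which the rules are checked (deletion first, then usage for tokens, then timing), and the tie-breaking on equal timestamps are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record of one conflicting change to the same row, as seen on
# one node. None of these names come from the ActivID product.
@dataclass
class Change:
    node: str                          # "A" or "B"
    kind: str                          # "insert", "update" or "delete"
    timestamp: float                   # when the change was made on that node
    value: Optional[dict] = None       # new column values (None for a delete)
    token_usage: Optional[int] = None  # failed + successful authentications (tokens only)


def resolve(a: Change, b: Change) -> Change:
    """Return the winning change according to the documented principles."""
    # Deletion wins: a delete beats a conflicting update.
    if a.kind == "delete" and b.kind == "update":
        return a
    if b.kind == "delete" and a.kind == "update":
        return b
    # Most usage wins: for token rows, the higher total usage beats timing.
    if a.token_usage is not None and b.token_usage is not None:
        if a.token_usage != b.token_usage:
            return a if a.token_usage > b.token_usage else b
    # Timing: the last change wins and the earlier one is discarded.
    # Assumption: on equal timestamps, the first argument is kept.
    return a if a.timestamp >= b.timestamp else b


if __name__ == "__main__":
    admin = Change("A", "update", timestamp=0.0, value={"email": "a@example.com"})
    operator = Change("B", "update", timestamp=1.0, value={"email": "b@example.com"})
    print(resolve(admin, operator).node)  # the T0+1 update from node B wins
```

Checking deletion before usage is a design choice of this sketch; the documentation does not state how the rules are prioritized when they overlap.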

Conflict Resolution Auditing

Automatic changes to database content due to conflict resolution are audited in a dedicated table (that is, not through the ActivID Server audit feature). Auditing is local: each node audits only the changes made on that node.

The following is audited:

  • Date and time of the conflict resolution
  • Date and time of the conflict
  • Security domain and database table name
  • Change type - Insert/Update/Delete
  • Row with conflicting data
  • Row before change
  • Row after conflict resolution
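To illustrate the shape of an audit entry, the following Python sketch models one conflict resolution record with the fields listed above. The class and field names are hypothetical, the actual table schema is not documented here, and using `None` for the missing row of an insert or a delete is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ConflictAuditRecord:
    """Hypothetical model of one entry in the conflict resolution audit."""
    resolved_at: datetime        # date and time of the conflict resolution
    conflict_at: datetime        # date and time of the conflict
    security_domain: str         # security domain
    table_name: str              # database table name
    change_type: str             # "Insert", "Update" or "Delete"
    conflicting_row: dict        # row with conflicting data
    row_before: Optional[dict]   # row before change (assumed None for an insert)
    row_after: Optional[dict]    # row after conflict resolution (assumed None for a delete)

    def __post_init__(self) -> None:
        # Reject change types other than the three documented ones.
        if self.change_type not in ("Insert", "Update", "Delete"):
            raise ValueError(f"unexpected change type: {self.change_type}")


if __name__ == "__main__":
    record = ConflictAuditRecord(
        resolved_at=datetime(2024, 1, 1, 12, 5),
        conflict_at=datetime(2024, 1, 1, 12, 0),
        security_domain="Domain1",          # hypothetical domain name
        table_name="users",                 # hypothetical table name
        change_type="Update",
        conflicting_row={"email": "a@example.com"},
        row_before={"email": "a@example.com"},
        row_after={"email": "b@example.com"},
    )
    print(record.change_type, record.row_before, "->", record.row_after)
```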

The conflict resolution audit cannot be accessed until it is archived. It is archived together with the ActivID Authentication Server audit archive, and purged together with the ActivID Authentication Server audit purge. The conflict resolution audit archive is uploaded to the same location as the ActivID Authentication Server audit archive, with the file name Audit_conflict_XXXXXXXX.tar.

Manage Short Outage

A short outage can occur for the following reasons:

  • A short power outage on one of the appliances.
  • A short network disconnection between the two appliances.
  • Hardware maintenance that requires shutting down one of the appliances for a short period of time.

As long as the outage is short (that is, as long as the other appliance can store the synchronization data until synchronization resumes), it does not impact the activity on the other appliance.

During the outage, the two databases are no longer synchronized, but the difference is recoverable. The active appliance continues to record database updates and sends them to the other appliance when it is up and running again.

  • There is no noticeable interruption of service for end users. Some sessions on the ActivID Management Console or Self-Service Portal might be closed, but you can log on to the applications again.
  • There is no data loss.
  • A RADIUS authentication could be aborted, but the next one might be successful on both nodes or only on one node.
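The recording behavior described above, where the active appliance keeps database updates while the other appliance is unreachable and sends them once it is back, can be sketched as a bounded queue. This is a hypothetical Python model, not the appliance's replication mechanism: the `ReplicationBuffer` class is invented for illustration, and it assumes the storage limit is expressed as a number of updates. When the queue overflows, the difference is treated as no longer recoverable, which corresponds to the long outage case.

```python
from collections import deque
from typing import Any, Callable


class ReplicationBuffer:
    """Hypothetical queue of updates held while the peer node is unreachable."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity   # assumed max number of updates the node can hold
        self.pending: deque = deque()
        self.recoverable = True

    def record(self, update: Any) -> None:
        if not self.recoverable:
            return                 # backlog already dropped; nothing more is queued
        if len(self.pending) >= self.capacity:
            # Storage exhausted: the difference can no longer be replayed.
            self.pending.clear()
            self.recoverable = False
            return
        self.pending.append(update)

    def flush(self, send: Callable[[Any], None]) -> int:
        """Send queued updates in their original order; return how many were sent."""
        sent = 0
        while self.pending:
            send(self.pending.popleft())
            sent += 1
        return sent


if __name__ == "__main__":
    buf = ReplicationBuffer(capacity=1000)
    buf.record("update user1")
    buf.record("update token42")
    received: list = []
    print(buf.flush(received.append), received)
```

Replaying in the original order matters because later updates to the same row must overwrite earlier ones on the peer.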

The maximum time the two nodes can remain out of sync while the difference is still recoverable depends on the following factors:

  • The activity of the service and the number of authentications per second
  • The network bandwidth between the two nodes
  • The lag time between the two nodes
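To see how these factors interact, a back-of-the-envelope model helps. The following Python sketch assumes that pending updates grow linearly with the authentication rate, that the storage available for them has a fixed size, and that after reconnection the backlog drains only through the bandwidth not consumed by new traffic. All parameters (including the hypothetical `bytes_per_auth`) are illustrative inputs, not product figures.

```python
def max_recoverable_outage(storage_bytes: float, auths_per_sec: float,
                           bytes_per_auth: float) -> float:
    """Seconds of outage before pending updates exceed the available storage."""
    growth = auths_per_sec * bytes_per_auth      # backlog growth in bytes/s
    return float("inf") if growth == 0 else storage_bytes / growth


def catch_up_time(backlog_bytes: float, bandwidth_bytes_per_sec: float,
                  auths_per_sec: float, bytes_per_auth: float,
                  lag_sec: float) -> float:
    """Seconds needed to drain the backlog after the nodes reconnect."""
    # New updates keep arriving while the backlog is sent, so only the
    # spare bandwidth reduces the backlog.
    spare = bandwidth_bytes_per_sec - auths_per_sec * bytes_per_auth
    if spare <= 0:
        return float("inf")                      # the nodes never catch up
    return lag_sec + backlog_bytes / spare


if __name__ == "__main__":
    # 1 GB of storage, 100 authentications/s, 2,000 bytes per authentication
    print(max_recoverable_outage(1_000_000_000, 100, 2_000))
    print(catch_up_time(100_000_000, 1_200_000, 100, 2_000, 0.05))
```

With these example inputs, the model allows about 5,000 seconds (roughly 83 minutes) of outage. Doubling the authentication rate halves that window, and a link whose bandwidth barely exceeds the live traffic makes the catch-up time grow without bound.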

When you restart Node B or the network connection is restored, the databases are automatically synchronized.

Because recovery is automatic, the Synchronization Status switches back to the Synchronized state without any manual intervention.

If you do not want to trigger the automatic recovery process, you can click Cancel Synchronization when the node is in the Out of Synchronization state.

Manage Long Outage

When Node B is down for a long period, Node A automatically cancels the synchronization.

Note: If you plan the shutdown in advance, it is recommended that you perform a manual Cancel Synchronization. This updates the status on both nodes.

In this case, the two databases are no longer synchronized, and the difference cannot be recovered automatically.

To perform a manual recovery, you have two options:

  • Repair the connection issue between the appliances, and initiate the synchronization again.

Note: The Initiate Synchronization option is available only when the connection between the appliances is restored.
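The synchronization status transitions described for short and long outages can be summarized as a small state machine. The following Python sketch is a simplified, hypothetical model: the `SyncNode` class and the "Synchronization cancelled" state name are assumptions, and initiating synchronization is shown as an instant transition, whereas on the appliances it is a process that takes time.

```python
from enum import Enum


class Status(Enum):
    SYNCHRONIZED = "Synchronized"
    OUT_OF_SYNC = "Out of Synchronization"    # difference still recoverable
    CANCELLED = "Synchronization cancelled"   # hypothetical name; manual recovery needed


class SyncNode:
    """Hypothetical model of one node's synchronization status."""

    def __init__(self) -> None:
        self.status = Status.SYNCHRONIZED
        self.connected = True

    def disconnect(self) -> None:
        self.connected = False
        if self.status is Status.SYNCHRONIZED:
            self.status = Status.OUT_OF_SYNC

    def reconnect(self) -> None:
        self.connected = True
        if self.status is Status.OUT_OF_SYNC:
            self.status = Status.SYNCHRONIZED  # short outage: automatic recovery

    def outage_too_long(self) -> None:
        if self.status is Status.OUT_OF_SYNC:
            self.status = Status.CANCELLED     # long outage: automatic cancellation

    def cancel_synchronization(self) -> None:
        # Manual cancel, e.g. before a planned shutdown or to skip auto recovery.
        self.status = Status.CANCELLED

    def initiate_synchronization(self) -> None:
        if not self.connected:
            raise RuntimeError("Initiate Synchronization requires a restored connection")
        if self.status is Status.CANCELLED:
            self.status = Status.SYNCHRONIZED  # simplified: real resync takes time


if __name__ == "__main__":
    node = SyncNode()
    node.disconnect()
    node.outage_too_long()
    node.reconnect()
    print(node.status)   # still cancelled: reconnecting alone does not recover
    node.initiate_synchronization()
    print(node.status)
```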

Manage Hardware Issues

Note: This section is only applicable to hardware appliances.

When a serious hardware issue occurs, the Dual Mode configuration must be stopped manually.

For example, if the second appliance (Node B) experiences a hardware failure:

  1. Shut down the second appliance.
  2. Set the first appliance back to Single Mode.
  3. If the hardware failure on the second appliance can be fixed, repair it and perform a factory reset. Otherwise, replace it with a new hardware appliance.
  4. Set the first appliance to Dual Mode, and download its configuration file.
  5. Set the second appliance to Dual Mode using the first appliance’s configuration file.