Auto Configuration Roll-Back for Cloud Connectivity

The Needs / Function Introduction

Incorrect device configurations pushed by EnGenius Cloud (such as an incorrect VLAN configuration, incorrect IP address settings, an incorrect management VLAN configuration, or configuration file incompatibility after a firmware upgrade) and unexpected software or hardware anomalies in devices (such as system failures, insufficient memory, application or protocol crashes, and uplink port problems) often cause a device to lose its connection to EnGenius Cloud. Usually, the only solution is for an operator to go on-site and connect directly to the device for further configuration and troubleshooting. This not only wastes manpower but also prolongs device downtime, leading to network instability for downstream devices.

To let devices attempt to resolve issues automatically without on-site support, speed up fault recovery, reduce labor costs, and minimize downtime, we have introduced an automatic self-healing mechanism for EnGenius devices. If a device loses its connection to the cloud, this mechanism can attempt to (1) roll back the configuration to a previous stable configuration or (2) reboot the device. It handles connectivity issues gracefully, giving administrators an opportunity to quickly address urgent network disconnections.

How it works

What is Stable Configuration and How is it Generated

Stable Configuration is a key feature of EnGenius Cloud, designed to preserve connectivity after configuration updates. When EnGenius Cloud pushes a new configuration to a device, the device may lose its connection as a result. The primary purpose of Stable Configuration is to let the device quickly revert to the previous stable connection settings (the Stable Configuration) when a disconnection occurs after EnGenius Cloud pushes a configuration to it.

Therefore, whenever a new configuration is pushed to the device, the configuration goes through a state machine that determines whether it can become the next stable configuration, as illustrated in Figure 1.

A stable configuration is generated when a device remains connected to the cloud and has not rebooted within 30 minutes after EnGenius Cloud pushes a configuration change to it. Essentially, a stable configuration is the latest configuration pushed by EnGenius Cloud that has not resulted in a lost connection or been followed by a reboot within the subsequent 30 minutes.

A 30-minute duration is selected to confirm that the configuration is reliable, ensuring the device maintains a successful connection to the Cloud over time.

Figure 1. How to Generate a Stable Configuration

Step 1. EnGenius Cloud pushes a configuration to the EnGenius device at time T1.

Step 2. After the configuration is pushed, the device proceeds to Step 3 if it meets both of the following requirements for 30 consecutive minutes; otherwise, it enters the self-healing mechanism:

(1) Remains connected to the cloud

(2) Does not reboot

Step 3. The EnGenius device copies the running configuration applied at time T1 to the stable configuration.
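The three steps above can be sketched as a small state tracker. This is only an illustrative model, not the device firmware: the class `ConfigTracker`, its method names, and the state labels `"pending"`, `"stable"`, and `"self-healing"` are hypothetical names chosen for this example. Only the 30-minute window and the two conditions come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

STABLE_WINDOW_S = 30 * 60  # 30 consecutive minutes, as described above


@dataclass
class ConfigTracker:
    """Hypothetical model of the Figure 1 flow (not the actual firmware)."""
    running: Optional[str] = None  # configuration currently applied
    stable: Optional[str] = None   # last configuration that proved stable
    pushed_at: float = 0.0         # time of the most recent push (seconds)
    state: str = "stable"          # "stable", "pending" or "self-healing"

    def on_push(self, config: str, now: float) -> None:
        # Step 1: the cloud pushes a configuration at time T1.
        self.running = config
        self.pushed_at = now
        self.state = "pending"

    def on_tick(self, now: float, connected: bool, rebooted: bool) -> str:
        # Only a pending configuration is evaluated.
        if self.state != "pending":
            return self.state
        # Step 2: both conditions must hold for the whole window.
        if not connected or rebooted:
            self.state = "self-healing"
        elif now - self.pushed_at >= STABLE_WINDOW_S:
            # Step 3: copy the running configuration to the stable one.
            self.stable = self.running
            self.state = "stable"
        return self.state
```

For example, after `on_push("cfg-T1", 0.0)` a tick at 600 seconds returns `"pending"`, and a tick at 1800 seconds with the device still connected and not rebooted returns `"stable"` with `stable` set to `"cfg-T1"`.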

What Are Self-Healing Mechanisms, Why Are They Needed, and When Are They Activated

The device's automatic self-healing mechanism gracefully handles the "lose connection to the cloud" condition and helps administrators quickly resolve urgent device connection issues.

What is the definition of "lose connection to the cloud" for EnGenius devices?

  • For EnGenius Gateway: Loss of WAN connection. The gateway is continuously unable to ping 8.8.8.8 or a domain for 2 hours.

  • For EnGenius AP/Switch/PDU/Extender: Gateway ARP unreachability. The device continuously fails the gateway ARP reachability test or is unable to ping the gateway for 2 hours.

The 2-hour duration is chosen to ensure that a rollback is not triggered too easily, so that temporary disconnections or device reboots do not prevent new configurations from being applied. Without the rollback self-healing mechanism, a disconnected device cannot be managed remotely by the Cloud for diagnostics or firmware upgrades, and someone must travel on-site to access the Local Status Page (LSP) or reboot the device.
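The "continuous failure for 2 hours" rule can be sketched as follows. This is a minimal illustration under stated assumptions: `ConnectionLossDetector` and `record_probe` are hypothetical names, and the probe itself (a ping to 8.8.8.8 for a gateway, or a gateway ARP/ping check for other devices) is reduced to a boolean result supplied by the caller.

```python
from typing import Optional

LOSS_THRESHOLD_S = 2 * 60 * 60  # 2 hours of continuous failure


class ConnectionLossDetector:
    """Hypothetical helper that tracks a streak of failed reachability probes."""

    def __init__(self) -> None:
        # Time of the first failure in the current streak, if any.
        self.first_failure_at: Optional[float] = None

    def record_probe(self, ok: bool, now: float) -> bool:
        """Record one probe result; return True once the loss is confirmed."""
        if ok:
            # Any successful probe ends the streak, so a temporary
            # disconnection never accumulates toward the threshold.
            self.first_failure_at = None
            return False
        if self.first_failure_at is None:
            self.first_failure_at = now
        return now - self.first_failure_at >= LOSS_THRESHOLD_S
```

Because a single successful probe resets the streak, a device that recovers after 90 minutes starts counting from zero at its next failure, which is the behavior the 2-hour threshold is meant to provide.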

When to Activate Self-healing mechanisms

When the device loses connection to the cloud, it activates a different self-healing mechanism depending on the cause of the disconnection: (1) a configuration pushed by the cloud, or (2) other issues unrelated to cloud-pushed settings, such as device malfunctions. We expect the device to reconnect to EnGenius Cloud after the self-healing process.

  • (1) Configuration pushed by the cloud

    When the cloud pushes a configuration to the device, and the new configuration has not yet become a stable configuration, the device may lose connection. Since this situation is often caused by incorrect configuration settings leading to disconnection, the device will activate a self-healing mechanism to revert to the Stable Configuration state prior to the erroneous settings, as illustrated in Figure 2.

Figure 2. Self-healing mechanism - Rollback of the running configuration to the Stable Configuration

Step 1. EnGenius Cloud pushes a configuration to the EnGenius device at time T2.

Step 2 and Step 3. If (1) the EnGenius device loses connection and has not restored connectivity within 2 hours, and (2) the new configuration applied at time T2 has NOT become the stable configuration, the device activates the self-healing mechanism and reverts to the previous stable configuration, such as the one applied at time T1.

Step 4. When the device reconnects to the cloud, it will send an event log to EnGenius Cloud containing information about reverting to the stable configuration.

  • (2) Other issues unrelated to cloud push settings

    If the device was in a stable connection state before the disconnection, the running configuration has already been copied as the stable configuration. A lost connection in this situation could indicate an issue with the upstream network equipment or a problem with the device's own system or hardware.

    • For EnGenius Gateway: it will attempt to reboot once after 8 consecutive hours of lost connection to restore normal operation, as illustrated in Figure 3. Rebooting the gateway will restore the EnGenius Gateway system (software & hardware) to a healthy and stable state, including CPU, memory, and application operations. Once rebooted, the EnGenius gateway may reconnect to the Cloud, and all downstream devices on the gateway's LAN side may return to their normal routing paths, resolving urgent issues.

    Figure 3. EnGenius Gateway Self-healing mechanism - Reboot

Step 1. EnGenius Gateway loses connection.

Step 2. The device is in a Stable State, meaning the Stable Configuration is the same as the running configuration, and the EnGenius Gateway remains disconnected from the cloud for 8 consecutive hours.

Step 3. The EnGenius Gateway activates the self-healing mechanism and reboots the device once.

Step 4. When the gateway reconnects to the Cloud, it sends an event log to EnGenius Cloud recording that the "Reboot" action was activated by the self-healing mechanism.

    • For EnGenius APs, Switches, PDUs, and Extenders:

      • EnGenius Switches, APs, Extenders, and PDUs will NOT reboot.

      • Connection losses are often caused by other network devices in the path between the device and its gateway, so network administrators usually need to troubleshoot further. For instance, if an AP can't connect, they should check for failures or routing problems between the AP and its gateway.
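The two cases above (rollback for an unproven configuration, a single reboot for a gateway in a stable state) can be summarized in one decision function. This is a rough sketch under stated assumptions, not the actual device logic: the `Device` class, the `self_heal` and `on_reconnect` functions, and the queued event strings are hypothetical. The 2-hour and 8-hour thresholds, and the rule that a Stable State means the stable configuration equals the running configuration, come from the text.

```python
from dataclasses import dataclass, field
from typing import List

ROLLBACK_AFTER_S = 2 * 60 * 60  # rollback threshold from the text
REBOOT_AFTER_S = 8 * 60 * 60    # gateway reboot threshold from the text


@dataclass
class Device:
    """Hypothetical device model used only for this illustration."""
    kind: str      # "gateway", "ap", "switch", "pdu" or "extender"
    running: str   # configuration currently applied
    stable: str    # last stable configuration
    rebooted_once: bool = False
    pending_events: List[str] = field(default_factory=list)


def self_heal(dev: Device, disconnected_for_s: float) -> str:
    """Return the action taken: "rollback", "reboot" or "none"."""
    in_stable_state = dev.running == dev.stable
    if not in_stable_state:
        # Case (1): the pushed configuration never became stable.
        if disconnected_for_s >= ROLLBACK_AFTER_S:
            dev.running = dev.stable
            dev.pending_events.append("Rollback Configuration")
            return "rollback"
        return "none"
    # Case (2): the configuration is stable, so the cause lies elsewhere.
    if (dev.kind == "gateway" and not dev.rebooted_once
            and disconnected_for_s >= REBOOT_AFTER_S):
        dev.rebooted_once = True
        dev.pending_events.append("Reboot")
        return "reboot"
    return "none"  # APs, switches, PDUs and extenders never reboot


def on_reconnect(dev: Device) -> List[str]:
    """Return the queued event logs and clear the queue."""
    sent, dev.pending_events = dev.pending_events, []
    return sent
```

Event logs are queued and only flushed in `on_reconnect`, mirroring the fact that the device can report a rollback or reboot only after the cloud is reachable again. The sketch does not model what happens if a device stays disconnected after a rollback, since this article does not describe that case.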

How can users know if the device has activated the self-healing mechanism to roll back its configuration?

When an EnGenius device rolls back to a "Stable Configuration" via an activated self-healing mechanism and reconnects to the cloud, users are notified through the following methods:

  1. Event Log - Navigate to ANALYZE > Device Event to view the Event log with Rollback Configuration Type.

    Event Log Page

    - An additional "Rollback Configuration" Type can be used for filtering.

    Event Type - Rollback Configuration

  2. Alert - Click the bell/alert icon in the top right corner to view the event log of the rollback configuration.

  3. Email - When a rollback configuration event occurs and the user has enabled email notifications, EnGenius Cloud sends notification emails to the administrator based on the configured interval.

  4. App Notification - When the event occurs and the user has enabled Mobile App notifications, EnGenius Cloud immediately sends an App notification to the administrator.

  5. Configuration Status on the Device's Detail page - Navigate to MANAGE > AP/Switch/Gateway/PDU/Extender > Select Specific Device > Click Details. Users can view the configuration status there.

  6. Configuration Status in the device's status on the device list page - Users can see the "Configuration Rollback" status. For details, refer to "Device status" in the Device Status and Warning Enhancement section.

  7. Cloud Configuration on the Local Status Page (LSP) - When the device rolls back to a stable configuration, it may still be disconnected, or the upstream connection may be unable to reach the internet, preventing the user from accessing EnGenius Cloud to troubleshoot the issue. In that case, users can access the Local Status Page (LSP) to check the current configuration status. The "Cloud Configuration" field on the LSP shows whether the self-healing mechanism has been activated to revert the device to its Stable Configuration.

"Cloud Configuration" ONLY appears when the device's settings revert to a stable configuration. This means that once the cloud pushes new configuration files again, this information will disappear.

Q&A

How can users utilize existing information to handle incorrect configurations?

Once the device has been restored to a Stable Configuration through the self-healing mechanism, users and FAEs can access the change log page & Event log page to track configuration changes in chronological order, as the Stable Configuration includes timestamps.

Move device between Networks

When the device moves to another network, all rollback data & Stable Configuration will be erased and reset to default values.

Availability

AP: v1.x.85

Switch: v1.2.100

Extender: v1.0.25

Gateway: v1.2.65

PDU: v1.0.25
