Description: Network Outage
Date: 29 Oct, 13:30 - 14:30 hrs
Severity: Complete outage
We use LXC containers to provide high availability (HA) and redundancy for all core assets within our business. As part of our normal growth, our DevOps team continues to add new server infrastructure to our core clusters.
This outage was caused by the addition of a new physical node to our LXC container cluster. We believe the fault was a software issue within the distributed filesystem required for HA and redundancy between containers.
This should not have happened, and we will continue investigating the underlying cause. In the meantime, the simple resolution is to avoid adding new nodes (or servers) during operational hours.
Mike Johnstone | CTO