At face value, immutable infrastructure seems like a process best suited to large businesses that deal with constantly changing infrastructure. Managing infrastructure at scale is a valid use case, but it is not the only one, and a large environment isn’t a requirement to benefit.
This article follows on from the previous article “Introducing Immutable Infrastructure” and covers several use cases.
Managing infrastructure at scale
Headaches caused by small configuration inconsistencies grow rapidly as an infrastructure scales out. The monitoring footprint increases, which reduces the signal-to-noise ratio and allows configuration drift to remain hidden until it is too late.
Immutable infrastructure prevents configuration drift by validating the current state against what is defined.
A key point to make is that most configuration managers only check settings they are configured to check and ignore the rest, creating a blind spot.
Typically, it is not feasible to configure every possible setting to close this blind spot. It is preferable to specify settings only where needed and leave the rest at their defaults; think minimum viable product.
As the number of systems increases, so does the chance of configuration changes outside the scope of settings specified.
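The blind spot described above can be illustrated with a minimal Python sketch. The setting names and values are hypothetical, and the two helper functions are illustrative rather than part of any real configuration manager. The sketch assumes that both the desired and the actual state can be represented as flat key/value dictionaries.

```python
def find_drift(desired: dict, actual: dict) -> dict:
    """Return managed settings whose actual value differs from the desired one."""
    return {
        key: (want, actual.get(key))
        for key, want in desired.items()
        if actual.get(key) != want
    }


def find_unmanaged(desired: dict, actual: dict) -> set:
    """Return settings present on the system but absent from the definition."""
    return set(actual) - set(desired)


# Hypothetical example: only two settings are declared in code.
desired = {"ntp_server": "10.0.0.1", "ssh_root_login": "no"}
actual = {
    "ntp_server": "10.0.0.1",
    "ssh_root_login": "yes",     # drift in a managed setting: detected
    "ip_forwarding": "enabled",  # changed by hand, never declared: the blind spot
}

drift = find_drift(desired, actual)
unmanaged = find_unmanaged(desired, actual)
print("Drifted settings:", drift)
print("Unmanaged settings:", unmanaged)
```

A check that only walks the declared settings reports the first problem and never sees the second, which is why rebuilding from the definition is a stronger guarantee than re-applying it.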
The primary mechanism to counter drift is not just to apply configuration but to rebuild infrastructure components periodically. For example, Microsoft destroys and rebuilds servers on a monthly basis in Azure, something not possible without immutable infrastructure.
Servers fail all the time, and beyond a certain scale there is always something in a failed state. This applies to both hardware and software. Immutable infrastructure makes it possible to replace or rebuild failed components rapidly.
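A periodic rebuild policy can be sketched as a simple age check. The following Python example is illustrative only: the `Server` type, the server names, and the 30-day threshold are assumptions, and the actual rebuild step (destroying and re-provisioning from code) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Server:
    name: str
    built_at: datetime


def due_for_rebuild(servers, now, max_age=timedelta(days=30)):
    """Return the names of servers built at least max_age ago."""
    return [s.name for s in servers if now - s.built_at >= max_age]


# Hypothetical fleet; dates are fixed so the example is reproducible.
now = datetime(2024, 3, 1)
fleet = [
    Server("web-01", datetime(2024, 1, 15)),  # 46 days old
    Server("web-02", datetime(2024, 2, 20)),  # 10 days old
]

stale = due_for_rebuild(fleet, now)
print("Rebuild candidates:", stale)
```

In practice a scheduler would run a check like this and hand the resulting list to whatever tooling rebuilds servers from their definitions.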
Immutable infrastructure simplifies the replacement of physical devices as the latest configuration is immediately available.
Approaches similar to server replacement can be used for software infrastructure components, such as a Kubernetes management node.
Managed service providers
Backstory time. I have spent most of my time in the IT industry working at managed service providers and system integrators. A common theme across my employers was that each had a stack they were comfortable deploying and managing. Similar infrastructure and configurations were present at the majority of clients.
The point is that, by offering clients a modular, scalable, and standardized infrastructure, investment in code-based deployment and management could have had a significant ROI. Build times would have been reduced, knowledge would have been transferable between clients, and certain services could have been automated, increasing profit margins.
Moving to an immutable infrastructure approach requires upfront investment in R&D and client research. If done successfully, each client looks much like the others, which means technicians can move between clients without needing client-specific knowledge.
There can be some benefit in reducing the risk that comes with deploying updates. However, the application layer may differ significantly between clients, which can limit this benefit.
Immutable infrastructure increases the speed at which infrastructure can be built or rebuilt, which translates to faster turnaround times for project delivery. Additionally, a common platform enables an MSP to keep spare devices on hand that can be rapidly deployed to clients in the event of an outage, with immutable infrastructure getting each device configured for use quickly.
Even with the best intentions, change management can be a thorn in the side, with a common gripe being the need to answer the same questions every time you make a change. From combining unrelated changes to shadow IT, engineers are quite skilled at sidestepping the change process.
For immutable infrastructure to work with confidence, code and configurations must be checked multiple times, from commit through to post-run validation. These tests form part of the CI/CD pipeline, which ensures that they run efficiently and are not skipped.
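The idea of checks that cannot be skipped can be sketched as an ordered list of stages where a failure stops everything after it. This Python example is a simplified illustration, not a real CI/CD system: the stage names are hypothetical and each check is a stand-in function that returns a pass or fail result.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Run stages in order and stop at the first failure.

    Returns (passed, results), where results records every stage that ran.
    """
    results = []
    for name, check in stages:
        ok = check()
        results.append((name, ok))
        if not ok:
            return False, results  # later stages never run
    return True, results


# Hypothetical stages; the lambdas stand in for real checks.
stages = [
    ("commit checks", lambda: True),
    ("build", lambda: False),               # simulated failure
    ("post-run validation", lambda: True),  # not reached
]

passed, results = run_pipeline(stages)
print("Pipeline passed:", passed)
print("Stages run:", results)
```

The recorded results double as evidence: each entry is a definite yes or no that can be attached to a change record.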
From a change management perspective, tests provide definite answers to the questions from the change board. A thorough and reliable testing strategy can therefore pave the way to improving the change process and automating away lengthy CIPs.
Errors at any stage of the process can trigger actions that prevent unexpected impacts or a system left in an inconsistent state. For example, if a change to networking devices is executed and route tables do not converge, the last successful configuration could be reapplied to roll back to a known good state.
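The rollback behaviour described above can be sketched in Python as follows. Everything here is an assumption made for illustration: the device is modelled as a plain dictionary, and `routes_converged` is a stand-in check that simply requires a default route, whereas a real convergence test would query the devices themselves.

```python
import copy


def apply_with_rollback(device: dict, new_config: dict, validate) -> bool:
    """Apply new_config; restore the last good config if validation fails."""
    last_good = copy.deepcopy(device["config"])
    device["config"] = copy.deepcopy(new_config)
    if validate(device):
        return True
    device["config"] = last_good  # roll back to the known good state
    return False


def routes_converged(device: dict) -> bool:
    """Stand-in validation: hypothetical rule that a default route must exist."""
    return "0.0.0.0/0" in device["config"].get("routes", {})


# Hypothetical router and a change that drops the default route.
router = {"config": {"routes": {"0.0.0.0/0": "10.0.0.1"}}}
bad_change = {"routes": {"10.1.0.0/16": "10.0.0.2"}}

applied = apply_with_rollback(router, bad_change, routes_converged)
print("Change applied:", applied)
print("Active config:", router["config"])
```

Because the previous configuration is captured before the change, the rollback needs no human decision: a failed validation is enough to trigger it.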