Immutable infrastructure is the end goal of adopting Infrastructure as Code into an environment. Infrastructure configuration is performed entirely through code and configuration files, with validation checks ensuring that the running configuration always matches the defined desired state.
Achieving a genuinely immutable environment requires significant process maturity, from infrastructure management through to code management. The business must be able to trust the operations teams who write the code and configurations and, by extension, the code itself.
There are no shortcuts to building trust and maturity; both are built through successful and unsuccessful experiences. Learning from the experiences of others is a great way to avoid repeating someone else’s mistakes, but it is not a shortcut.
The adoption of infrastructure as code practices provides gradual steps towards immutable infrastructure, building a framework for deploying code-based configuration. The reason to differentiate immutable infrastructure and infrastructure as code is the keyword ‘immutable.’
Achieving immutability means that there is no difference between the settings defined and those implemented. This requires end-to-end automation of code shipping and validation; configuration updates need to begin once a new configuration is accepted into the master repository.
Periodic checks need to take place to detect configuration drift and trigger remediation tasks or alerts upon finding drift.
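As a rough illustration of such a periodic check, the sketch below compares a desired state with the running configuration and reports every mismatch. The setting names, the find_drift helper, and the sample values are all hypothetical; a real check would read the running state from the device or platform API rather than from a dictionary.

```python
from typing import Any, Dict, Tuple

# Hypothetical sentinel meaning "setting is absent on the running system".
MISSING = "<missing>"


def find_drift(desired: Dict[str, Any], running: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (desired_value, running_value)} for every mismatch.

    Only settings present in the desired state are checked, so settings
    that are not yet under management are ignored.
    """
    drift = {}
    for key, want in desired.items():
        have = running.get(key, MISSING)
        if have != want:
            drift[key] = (want, have)
    return drift


# Illustrative values only (addresses come from a documentation range).
desired_state = {
    "ntp_servers": ["192.0.2.10", "192.0.2.11"],
    "dns_servers": ["192.0.2.53"],
    "syslog_host": "192.0.2.20",
}
running_state = {
    "ntp_servers": ["192.0.2.10"],  # one server was removed by hand
    "dns_servers": ["192.0.2.53"],
}

for setting, (want, have) in find_drift(desired_state, running_state).items():
    # A real workflow would raise an alert or start remediation here.
    print(f"DRIFT {setting}: desired={want!r} running={have!r}")
```

Scheduling this comparison and wiring the result into alerting or remediation is where most of the real effort goes; the comparison itself is the easy part.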
The definition of infrastructure is not limited to physical hardware; it extends to software platforms such as Kubernetes, Ceph, vSphere, and OpenStack.
API exposure through interfaces such as REST and Redfish is improving the ability to interact with and manage endpoints programmatically. Standard APIs make that interaction easier but do not simplify the task at hand.
Putting immutable infrastructure into practice
For the rest of this article, I’d like to highlight some of the key concepts for immutable infrastructure.
Defining the desired state: To begin, you’ll need to define the settings that are required for the infrastructure; this could be DNS or NTP settings or anything else. If the target infrastructure already exists (brownfield), as-built documentation could be a good initial source of information. Don’t forget to validate it against what is configured in the running environment, as the documentation could be out of date.
Most configuration managers don’t require all settings to be applied from the start. You could start with a limited scope to build an understanding or validate the concept before expanding the settings covered. If you take this approach, it can be helpful to maintain a list of total settings and track the coverage status of each one.
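One lightweight way to keep that list is as data alongside the code. The following is only a sketch: the Setting class, the helper names, and the example settings are my own assumptions and do not come from any particular configuration manager.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Setting:
    name: str
    managed: bool  # True once the setting is defined in code


def coverage(settings: List[Setting]) -> float:
    """Fraction of known settings that are managed through code (0.0 to 1.0)."""
    if not settings:
        return 0.0
    return sum(s.managed for s in settings) / len(settings)


def unmanaged(settings: List[Setting]) -> List[str]:
    """Names of settings that still need to be brought under management."""
    return [s.name for s in settings if not s.managed]


# Hypothetical inventory of every setting the environment depends on.
inventory = [
    Setting("dns_servers", managed=True),
    Setting("ntp_servers", managed=True),
    Setting("syslog_host", managed=False),
    Setting("snmp_community", managed=False),
]

print(f"coverage: {coverage(inventory):.0%}")
print("still to do:", ", ".join(unmanaged(inventory)))
```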
Tests: Tests are run at various stages to decrease the risk of mistakes and increase the overall chance of success. Proper testing provides a high percentage of coverage, accounting for multiple success and failure scenarios that might occur.
Unit tests test the code at a function level, checking that the output of a function matches an expected value for a specified input. Test-driven development dictates writing the unit tests first, then writing the code to pass them.
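To make this concrete, here is a small sketch of a function and two unit tests for it. The render_ntp_config function and its output format are hypothetical examples of my own; the tests are plain functions of the kind a runner such as pytest would collect.

```python
from typing import List


def render_ntp_config(servers: List[str]) -> str:
    """Render one 'server <address> iburst' line per NTP server.

    The output format is an assumption made for this example.
    """
    if not servers:
        raise ValueError("at least one NTP server is required")
    return "\n".join(f"server {s} iburst" for s in servers) + "\n"


def test_renders_one_line_per_server():
    # Specified input, expected output: the core of a unit test.
    result = render_ntp_config(["192.0.2.10", "192.0.2.11"])
    assert result == "server 192.0.2.10 iburst\nserver 192.0.2.11 iburst\n"


def test_rejects_empty_server_list():
    # Failure scenarios deserve tests too.
    try:
        render_ntp_config([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty server list")
```

Following a test-first approach, the two test functions would be written before render_ntp_config exists and would fail until it is implemented.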
Integration tests check how code functions, or entire systems, work together. They can be difficult to perform because every component involved needs to be available in a representative state.
Orchestrators: Avoiding service interruption when making infrastructure changes usually requires additional tasks to run first, for example, moving workloads from one host to another before starting a task.
Orchestrators link multiple workflows and provide external logic specific to the task. Before configuring a switch, for example, the orchestrator might check that its redundant pair is functioning and then gracefully fail over services.
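The switch example could be sketched roughly as follows. Every name here is hypothetical, and the three callables stand in for whatever the switch or platform API actually provides; the point is only the ordering and the refusal to continue when a pre-check fails.

```python
from typing import Callable, List


class PreCheckFailed(Exception):
    """Raised when a pre-check fails and the change must not proceed."""


def orchestrate_switch_change(
    peer_is_healthy: Callable[[], bool],
    fail_over_to_peer: Callable[[], None],
    apply_config: Callable[[], None],
    log: List[str],
) -> None:
    """Run the pre-check, then fail over, then apply the change, in that order."""
    if not peer_is_healthy():
        log.append("abort: redundant peer unhealthy")
        raise PreCheckFailed("redundant peer is not healthy")
    log.append("peer healthy")
    fail_over_to_peer()
    log.append("services failed over")
    apply_config()
    log.append("config applied")


# Stand-in callables; real ones would call the switch or platform API.
events: List[str] = []
orchestrate_switch_change(
    peer_is_healthy=lambda: True,
    fail_over_to_peer=lambda: None,
    apply_config=lambda: None,
    log=events,
)
print(events)
```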
CI/CD Pipelines: Pipelines handle moving code through different development environments into production, triggering validation and compliance tests.
The goal of using pipelines is to automate getting the code from the developer’s computer, integrating it with the main code base, and delivering it. In this case, the code could be configuration file changes.
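Stripped of tooling, a pipeline is an ordered list of stages that stops at the first failure. The sketch below shows only that idea; the stage names are hypothetical and each lambda stands in for real work such as running tests or pushing a configuration.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; stop at the first one that returns False.

    Returns (succeeded, names_of_stages_that_ran).
    """
    ran = []
    for name, step in stages:
        ran.append(name)
        if not step():
            return False, ran
    return True, ran


# Hypothetical stages; each lambda stands in for real work.
stages: List[Stage] = [
    ("lint", lambda: True),
    ("unit-tests", lambda: True),
    ("compliance-checks", lambda: True),
    ("deploy-to-staging", lambda: True),
    ("deploy-to-production", lambda: True),
]

ok, ran = run_pipeline(stages)
print("pipeline succeeded" if ok else f"pipeline stopped at {ran[-1]}")
```

Because a failed stage halts everything after it, a configuration that fails its compliance checks never reaches production.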
Workflow Execution: A trigger is an event that starts a workflow. A trigger could be a user manually starting the workflow, a scheduled task, or another event.
A scheduled trigger runs periodically and may require a ‘splay’ to stagger start times. A splay adds slight randomization to the start time within a given scope, typically ± the configured value.
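Here is a minimal sketch of how a splay might be calculated. The splayed_delay function and its symmetric flag are my own invention: the default follows the ± behaviour described above, while some configuration managers instead apply splay only as a positive delay between zero and the configured value, which the flag also covers.

```python
import random
from typing import Optional


def splayed_delay(interval: float, splay: float, symmetric: bool = True,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next run.

    symmetric=True  -> interval plus a random offset in [-splay, +splay]
    symmetric=False -> interval plus a random delay in [0, splay]
    The result is never negative.
    """
    if interval < 0 or splay < 0:
        raise ValueError("interval and splay must be non-negative")
    rng = rng or random.Random()
    low = -splay if symmetric else 0.0
    return max(0.0, interval + rng.uniform(low, splay))


# Ten hosts on a 30-minute schedule with a 5-minute splay.
for host in range(10):
    print(f"host-{host}: next run in {splayed_delay(1800, 300):.0f}s")
```

Without the splay, all ten hosts would contact the configuration source at the same moment; with it, their start times spread across the window.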
A common event-based trigger is the receipt of a Git pull request, which starts workflows that run various tests against the submitted code; these must pass before the code can be merged. A merge could then trigger workflows to apply the new configuration to the infrastructure immediately.
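The relationship between events and workflows can be sketched as a simple registry. The event names, the TriggerRegistry class, and the two workflow functions are hypothetical; in practice the events would arrive from the repository host, for example as webhooks.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], str]


class TriggerRegistry:
    """Maps event names to the workflows they should start."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def fire(self, event: str, payload: dict) -> List[str]:
        """Run every workflow registered for the event, in registration order."""
        return [handler(payload) for handler in self._handlers.get(event, [])]


def run_tests(payload: dict) -> str:
    # Stand-in for a workflow that tests the submitted code.
    return f"tests started for {payload['branch']}"


def apply_configuration(payload: dict) -> str:
    # Stand-in for a workflow that pushes the merged configuration.
    return f"applying commit {payload['commit']}"


registry = TriggerRegistry()
registry.on("pull_request_opened", run_tests)
registry.on("merged_to_main", apply_configuration)

print(registry.fire("pull_request_opened", {"branch": "update-ntp"}))
print(registry.fire("merged_to_main", {"commit": "abc123"}))
```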
Using triggers to automate the execution of configuration changes is easy in concept but difficult in practice. It requires knowledge of the changes themselves and of any potential impact they may have. Additionally, thorough code-based validation and rollback plans need to be in place.
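A very reduced sketch of code-based validation with a rollback path might look like this. The apply_with_rollback helper, the in-memory "device", and the validation rule are all assumptions made for illustration; real rollback plans also have to deal with partial failures and with changes that cannot simply be re-applied.

```python
from typing import Callable, Dict

Config = Dict[str, object]


def apply_with_rollback(
    current: Config,
    proposed: Config,
    apply: Callable[[Config], None],
    validate: Callable[[], bool],
) -> bool:
    """Apply `proposed`; if applying or validating fails, re-apply `current`.

    Returns True if the new configuration was kept, False if rolled back.
    """
    try:
        apply(proposed)
        if validate():
            return True
    except Exception:
        pass  # treat an apply/validate error like a failed validation
    apply(current)  # rollback: restore the last known-good configuration
    return False


# Stand-in "device" so the sketch runs without real infrastructure.
device: Config = {"ntp_server": "192.0.2.10"}


def apply(cfg: Config) -> None:
    device.clear()
    device.update(cfg)


def ntp_is_set() -> bool:
    return bool(device.get("ntp_server"))


known_good = dict(device)
kept = apply_with_rollback(known_good, {"ntp_server": ""}, apply, ntp_is_set)
print("kept" if kept else "rolled back", device)
```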
In my next article, I’ll look at immutable infrastructure use cases.