Ansible versus Terraform Demystified

Consider also checking out the article Ansible vs. Terraform, clarified on RedHat.com

Ansible and Terraform are two powerful but distinct open source IT tools that are often compared in competitive discussions. Many of these comparisons are done purely at the "spec sheet" level. That kind of comparison, while an interesting read, doesn't account for how the products are used at scale, or for whether a binary, all-or-nothing choice between them is even realistic. We at Red Hat have been helping enterprises for over 20 years and have a good idea of how most IT administrators use these two tools in production. Although both tools can generally do most things, we typically see each one used for its biggest strengths rather than organizations choosing one or the other.

Spoiler:  The two tools are better together and can work in harmony to create a better experience for developers and operations teams.

Both Ansible and Terraform are open source tools with huge user bases, which often leads to cult followings because of the classic "hammer" approach. That is, if my only tool is a hammer, every problem starts to resemble a nail. I end up trying to solve new problems the only way I know how, rather than trying another tool that might be more effective. It is never a great idea to understand only one tool and its approach and philosophy. Instead, you should open your mind to understanding why different tools and platforms exist, and why successful organizations may be using both. In this blog we will go over the differences and similarities between Ansible and Terraform, the open source projects and their downstream enterprise products. Keep in mind that this is a blog, so check the publication date for relevancy, as products and projects are constantly changing and evolving.

Terraform

Terraform is an open source project that is sponsored by the company HashiCorp. Terraform is one of several open source projects that have been productized by HashiCorp; other projects include Vagrant, Packer, Consul and Vault.  HashiCorp specifically has a design philosophy called the Tao of HashiCorp where they want their projects and products to be simple, modular, and composable. In this case, each project and product pairing has well defined scopes and for larger workflows you would combine multiple projects and products.  They define Terraform with the following purpose:

Terraform is the infrastructure as code offering from HashiCorp. It is a tool for building, changing, and managing infrastructure in a safe, repeatable way. Operators and Infrastructure teams can use Terraform to manage environments with a configuration language called the HashiCorp Configuration Language (HCL) for human-readable, automated deployments. source

Terraform is mainly command-line only, but is well integrated with a set of popular public clouds. Terraform is great at provisioning fixed sets of cloud infrastructure and tearing them down afterwards. HashiCorp offers customers two productized versions of Terraform: they can self-manage a custom deployment with Terraform Enterprise, or they can use the managed service, Terraform Cloud. The business tier provides drift detection, SSO, audit logs, self-hosted agents and customized concurrency.

Ansible

Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates.  Most people are familiar with community Ansible, which is the command-line tool for running Ansible Playbooks.  Like Terraform, Ansible focuses on simplicity and ease-of-use.  Ansible uses YAML syntax for Ansible playbooks.  We use YAML because it is easier for humans to read and write than other common data formats like XML or JSON.

Red Hat Ansible Automation Platform is the product that is offered to customers.  It is built on the foundations of Ansible with numerous enterprise features, combining more than a dozen upstream projects into an integrated, streamlined product. Each product component also has a specific purpose with a well defined scope similar to HashiCorp's design philosophy.  For example, the automation controller is the Web UI and API for Ansible automation, which is based on the upstream project AWX.  This component is bundled into the platform to manage automation. Ansible Automation Platform is available to be run on-premises and charged by node (rather than by user) or you can use the managed service offering on Microsoft Azure.

To summarize, both Ansible and Terraform have open source command-line only versions. They both have products available with enterprise features such as a Web UI or SSO. The primary difference between their community versions is that Ansible is a multi-purpose automation tool, whereas Terraform is an infrastructure as code tool. The confusion starts because there are numerous use cases that could potentially be solved by either tool, and both Ansible and Terraform have plugins to call each other. For example, many Ansible experts simply provision AWS resources with an Ansible Playbook and might not understand why others use an entirely different tool. Similarly, Terraform experts might create and destroy entire instances for even the smallest configuration change (see the next section about immutability).

Immutable Infrastructure: The Killer App?

Terraform takes an immutable approach to infrastructure.  If you are unfamiliar with immutable infrastructure, it is defined as instances that do not change over time or are unable to be changed. To greatly simplify, an IT operator can create a declarative file (a Terraform HCL file) that represents in structured data what they want their end-state cloud footprint to look like and deploy this with Terraform.  One of the advantages of this approach is that it creates a single source of truth (that HCL file) that can be deployed over and over again without having to understand how it gets to the end-state.  This approach can be simple and elegant for individuals getting started quickly but depending on the size of infrastructure can become complex and hard to manage. Another advantage of an immutable approach is that it is just as easy to tear down (de-provision) your cloud resources. This allows developers to quickly spin up resources, test something, then tear them down.

Ansible, by design, takes an imperative approach to automation. You simply have a task list that iterates through each resource. You would tell it to provision this VPC, this subnet, then this VM. The advantage of this approach is that it is very simple to understand: there is no hidden magic, which makes it easy to troubleshoot. The disadvantage is that teardown and de-provisioning are usually more cumbersome, because you need to know the correct order. I have to delete the instance, then the security group, and so on. However, Ansible has support for calling both AWS CloudFormation (another immutable and declarative approach for AWS) and Terraform. In fact, Ansible Automation Platform does this for all major public clouds, and encourages people to use their preferred tool for provisioning and de-provisioning. This is a great example of how Terraform and Ansible are better together.
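The ordered task list described above can be sketched as a short playbook. This is a minimal illustration, not a recommended design: it assumes the amazon.aws Collection is installed and AWS credentials are available in the environment, and the names, CIDR ranges, region, and AMI ID are all placeholders.

```yaml
---
# Sketch only: names, CIDR ranges, region, and AMI ID are placeholders.
- name: Provision a VPC, a subnet, and a VM in order
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    aws_region: us-east-1
  tasks:
    - name: Create the VPC
      amazon.aws.ec2_vpc_net:
        name: demo-vpc
        cidr_block: 10.0.0.0/16
        region: "{{ aws_region }}"
        state: present
      register: vpc_result

    - name: Create a subnet inside that VPC
      amazon.aws.ec2_vpc_subnet:
        vpc_id: "{{ vpc_result.vpc.id }}"
        cidr: 10.0.1.0/24
        region: "{{ aws_region }}"
        state: present
      register: subnet_result

    - name: Launch a VM in the subnet
      amazon.aws.ec2_instance:
        name: demo-vm
        instance_type: t3.micro
        image_id: ami-0123456789abcdef0   # placeholder AMI ID
        vpc_subnet_id: "{{ subnet_result.subnet.id }}"
        region: "{{ aws_region }}"
        state: running
```

Each task depends on a value registered by the one before it, which is why a teardown playbook would need to run the same resources in reverse order with an absent or terminated state.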

Important: Although Ansible is not universally immutable, depending on how you implement your individual tasks, some Ansible tasks can be immutable.

Here's an example: You can have an Ansible Playbook that provisions a Linux virtual machine into a public cloud using a CloudFormation Template, and then subsequently installs an application via the dnf Ansible module. This activity would be entirely immutable by Ansible. Most Ansible modules are designed to be idempotent so that they only make changes when they need to. However, Ansible is extremely flexible, and it's just as easy to automate shell commands that are not idempotent and report a change every time the playbook is run. This showcases how Ansible shines as a multi-purpose automation tool versus a discrete infrastructure as code tool.
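A rough sketch of that example follows, assuming the amazon.aws Collection is installed and that the new VM already appears in an inventory group (for instance through a dynamic inventory plugin). The stack name, template path, group name, package, and log path are hypothetical.

```yaml
---
- name: Provision a VM through a CloudFormation template
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create or update the stack
      amazon.aws.cloudformation:
        stack_name: demo-web-stack      # hypothetical stack name
        state: present
        region: us-east-1
        template: files/web-vm.yml      # hypothetical template path

- name: Install the application on the new VM
  hosts: web                            # hypothetical inventory group
  become: true
  tasks:
    - name: Install the package (idempotent, changes only when missing)
      ansible.builtin.dnf:
        name: httpd
        state: present

    - name: Append to a log (not idempotent, reports changed on every run)
      ansible.builtin.shell: echo "deployed at $(date)" >> /var/log/deploy.log
```

Running the playbook twice shows the difference: the dnf task reports no change the second time, while the shell task reports a change on every run.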

Use Cases Compared

If you read all the articles about Terraform, you will find they are public-cloud focused. This is where immutable infrastructure works well and Terraform is great at provisioning cloud resources and applications for AWS, Azure, Docker, GCP, and OCI.  However, there is more to IT operations than automated infrastructure provisioning and this is why Ansible is extremely popular as well.  This is not a knock on Terraform, it is a specific tool with a specific purpose and ethos designed purposely to do infrastructure as code.  However, this infrastructure as code wholly depends on how you define your infrastructure.  Is my critical Cisco IOS network switch not infrastructure?  IT Infrastructure can mean a lot of different things to different IT administrators depending on if they are a network engineer, cloud operations engineer, system administrator or have another title or role.

Ansible focuses on automation with a variety of use cases that are typically divided up into domains, due to their legacy silos:

  • Infrastructure automation - includes automation of Linux and Microsoft Windows, as well as storage vendors like NetApp, PureStorage, and HPE.
  • Network automation - includes physical switches, routers, load balancers, and SDN controllers from popular vendors such as Arista, Cisco, F5 and Juniper.
  • Security automation - integrates SIEM, IDPS, and firewalls from vendors like IBM and Check Point, as well as ITSM tools like ServiceNow.
  • Edge and hybrid cloud footprints.

Moving to an Event-Driven IT Strategy

As opposed to Terraform, Ansible is more focused on the entire IT workflow. For example, consider the following workflow:

  1. Deploy a Web Application to AWS.
  2. Update your ServiceNow ITSM with Web Application Information.
  3. Run a schedule to check every hour that the Web Application is responding on the correct ports or use event streams to monitor ports and the application for further automation.
  4. Update or create a ServiceNow ticket if the Web Application stops responding, and attempt automated remediation.

In the above example, it is not enough to simply provision a web application into a public cloud.  There are other steps that need to take place in this automation workflow.  We need the automation to sync with the customer's ITSM tool, and include event-driven checks for the web application to ensure it is operating correctly (we call this continuous IT compliance).  Stateful automation can even guarantee this service is kept running while human operators make changes out of band from your automation.
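Steps 3 and 4 of the workflow above might look something like the following playbook, which a scheduler such as automation controller could run every hour. It is a sketch under several assumptions: the servicenow.itsm Collection is installed, ServiceNow connection details are supplied through the Collection's environment variables, and the URL, caller, host name, and service name are placeholders.

```yaml
---
- name: Check the web application and open a ticket on failure
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    app_url: https://webapp.example.com/    # placeholder URL
  tasks:
    - name: Health check with a ticket-and-remediate fallback
      block:
        - name: Confirm the application answers with HTTP 200
          ansible.builtin.uri:
            url: "{{ app_url }}"
            status_code: 200
      rescue:
        - name: Open a ServiceNow incident
          servicenow.itsm.incident:
            state: new
            caller: automation.user         # hypothetical ServiceNow user
            short_description: Web application is not responding
            description: "Health check against {{ app_url }} failed."
            impact: medium
            urgency: medium

        - name: Attempt remediation by restarting the web service
          ansible.builtin.service:
            name: httpd                     # hypothetical service name
            state: restarted
          delegate_to: web01.example.com    # hypothetical host
          become: true
```

The block and rescue pairing keeps the check and its failure handling in one place, so the ticket is only created when the health check actually fails.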

Better: Ansible Orchestrating Terraform

Terraform is an excellent cloud provisioning and de-provisioning tool for infrastructure as code.  Ansible is a great all-purpose, cross-domain automation solution.  Both have amazing open source communities and well-supported downstream paid products.  What we see with the community, customers and even our own IT workflows is that you can combine these tools and solutions to create even more amazing IT workflows.  If you are already invested in Terraform, Ansible simply allows you to wrap those HCL templates into more holistic automation workflows. Ansible further extends your automation, allowing you to add tasks like configuration management and application deployment to the Terraform IaC deployment.
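As one possible illustration of wrapping an existing Terraform project, the playbook below uses the terraform module from the community.general Collection. It assumes that Collection and the terraform binary are installed on the control node; the project path is hypothetical.

```yaml
---
- name: Wrap an existing Terraform project in an Ansible workflow
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Apply the Terraform configuration
      community.general.terraform:
        project_path: terraform/web-stack   # hypothetical project directory
        state: present
        force_init: true
      register: tf_result

    - name: Show the Terraform outputs for use by later tasks
      ansible.builtin.debug:
        var: tf_result.outputs
```

Later plays could use those outputs to drive configuration management or application deployment, and setting state to absent in the same task would tear the infrastructure down again.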

How are people using Ansible?

We've noticed that many IT administrators refer to the specific "cloud deployment and retirement" use case rather than looking at other cloud operations use cases, such as Day 2 operations.  To help spark some ideas, let's highlight some Ansible cloud automation use cases today outside of just provisioning and de-provisioning cloud resources.

  • Infrastructure visibility - This is simply using Ansible to retrieve information from your public clouds to understand your cloud footprint.  This is very helpful for brownfield environments where there are numerous IT administrators configuring resources out-of-band from each other.  When there isn't a forced IT process, it is a great starter use case because it is read-only and requires no production changes.
  • Compliance - We need to not only treat cloud infrastructure as code, but also the cloud as code.  For example, we can enforce IAM policies and make sure there is a common experience across public clouds.  Another example would be to enforce a tag policy across your instances for billing and auditing, and to shut down instances that are out of compliance.  What's great about Ansible is that it can operate and enforce these policies on mutable and immutable infrastructure.
  • Business continuity - Ansible can help keep the lights on.  Move and copy resources off cloud, create and manage policies for backups and build automation to manage disruptions and failures.
  • Cloud operations - Ansible can automate Day 2 activities.  This includes application deployments and CI/CD pipelines, lifecycle management and enforcement as well as OS patching and maintenance.
  • Cloud migration - Ansible can help move workloads to where you need them.  For example, adopting automation for your on-premises infrastructure can help operators adopt public cloud.  Making sure your source of truth is automation versus the on-box configurations is the first step for cloud migration.  Ansible automation can also reduce friction for migration to cloud native, allowing developers to migrate off legacy infrastructure.  By using Ansible automation, an IT group can help unify automation architecture across legacy and cloud-native.
  • Infrastructure optimization - Adopting clouds can help IT operators save time and money, but initially it's hard to predict costs and understand how your billing requirements change.  Having an automation strategy can help you keep costs under control by turning off unused resources and rightsizing cloud resources. Combined with use cases like infrastructure visibility, it lets you easily recover orphaned resources and make sure there are no surprise costs.
  • Infrastructure orchestration - We talked about this previously, but how are you integrating everything that's not in your public cloud? Orchestration is simply how we break down silos and integrate with infrastructure outside the cloud.  This allows IT operators to orchestrate business outcomes versus tech silos and apply consistent compliance across all infrastructure.
  • Automated troubleshooting - As your IT team gains confidence with automation, it can move towards an event-driven architecture.  This allows IT teams to respond faster to incidents, shorten mean time to resolution and integrate with an organization's ITSM solution.
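To make the infrastructure visibility and compliance use cases above more concrete, here is a small sketch that reads the running EC2 instances in one region and stops any that lack a required billing tag. It assumes the amazon.aws Collection and AWS credentials are available; the region and the tag name are placeholders.

```yaml
---
- name: Report on EC2 instances and stop those missing a billing tag
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    aws_region: us-east-1       # placeholder region
    required_tag: CostCenter    # hypothetical tag name
  tasks:
    - name: Gather information about running instances (read-only)
      amazon.aws.ec2_instance_info:
        region: "{{ aws_region }}"
        filters:
          instance-state-name: running
      register: ec2_info

    - name: Build the list of instance IDs without the required tag
      ansible.builtin.set_fact:
        untagged_ids: >-
          {{ ec2_info.instances
             | rejectattr('tags.' ~ required_tag, 'defined')
             | map(attribute='instance_id')
             | list }}

    - name: Stop the non-compliant instances
      amazon.aws.ec2_instance:
        region: "{{ aws_region }}"
        instance_ids: "{{ untagged_ids }}"
        state: stopped
      when: untagged_ids | length > 0
```

Dropping the last task leaves a purely read-only report, which matches the low-risk starting point described under infrastructure visibility.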

Are people succeeding?

The quick answer? Yes! Even automating Terraform with Ansible! But holistic automation goes beyond doing one thing well in the cloud. Ansible can automate and orchestrate physical, virtual and cloud resources. It can automate provisioning, configuration management, and Day 2 operations for network devices, Windows servers, storage and, of course, Linux. But regardless of what people decide to use to solve a problem, we've found that the real issues aren't with "what" or "how" a problem is solved from a technology perspective, but more about standardizing across technology domains while growing up and out to scale across the entire IT organization.

One of the most impressive and recent success stories using Ansible Automation Platform in the cloud was by Asian Development Bank. The published case study details how they modernized their infrastructure while at the same time modernizing their workforce, allowing them more time to focus on more important things, like innovative projects and new service offerings. They standardized on Terraform for Day 0 while standardizing on Ansible Automation Platform for Day 1 and Day 2 operations. Check out their story in the embedded video!

Final Thoughts

The confusion between Ansible and Terraform has existed for some time, either through inaccurate (or outdated) source material or through inexperience in using either/both technologies. This blog post (while somewhat biased) should help to at least start the conversation around the deeper connections between Ansible and Terraform. Every situation, use case, and person implementing the solution can be different, but because of these factors we believe Ansible is the best solution for automation.




The anatomy of automation execution environments

Red Hat Ansible Automation Platform 2 introduced major architectural changes, like automation mesh and automation execution environments, that help extend Ansible automation across your organization in a flexible manner, providing a single solution for all your organizational and hybrid cloud automation needs.

Automation execution environments are container images that act as Ansible runtimes for automation controller jobs. Ansible Automation Platform also includes a command-line tool called ansible-builder (execution environment builder) that lets you create automation execution environments by specifying Ansible Content Collections and Python dependencies.

In general, an automation execution environment includes:

  • A version of Python.
  • A version of ansible-core.
  • Python modules/dependencies.
  • Ansible Content Collections (optional).

diagram of an execution environment

In this blog, I will take you through the inner workings of ansible-builder and how all the above requirements are packaged inside automation execution environments and delivered as part of Ansible Automation Platform.
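As a rough illustration of the optional Collections item in the list above, Collections are typically listed in a requirements.yml file in the standard ansible-galaxy format. The Collection names and the version constraint below are examples only.

```yaml
---
# requirements.yml: Collections to install into the execution environment
collections:
  - name: amazon.aws
  - name: community.general
    version: ">=5.0.0"    # example constraint, adjust to your needs
```

In a version 1 definition file, this file would be referenced under the galaxy key of the dependencies section.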

A tale of two ansible-builder packages

Like all projects at Red Hat, ansible-builder follows an open development model and an upstream-first approach. The upstream project for ansible-builder is distributed as a Python package, and then packaged into an RPM for Ansible Automation Platform downstream. This also means that the upstream package and the downstream ansible-builder are installed in different ways.

NOTE: To get the downstream packages, you must subscribe to Ansible Automation Platform repos from Red Hat.

Upstream:

pip3 install ansible-builder

Downstream: 

dnf install ansible-builder

This has sometimes led to confusion among users, as customers of Ansible Automation Platform can also install the Python package for free. There are minor differences between the upstream and downstream packages that you should understand before diving deeper into building automation execution environments.

As mentioned earlier, automation execution environments are container images that act as Ansible runtimes and ansible-builder is quite similar to generally available container engines such as Podman and Docker. So like any other container engine, the concept of building an image starts with a base image; that is where the upstream and downstream packages for ansible-builder differ. The base images used in upstream ansible-builder (Python package) as predefined constants are as follows:

EE_BASE_IMAGE='quay.io/ansible/ansible-runner:latest'
EE_BUILDER_IMAGE='quay.io/ansible/ansible-builder:latest'

Base images in the downstream package are as follows:

EE_BASE_IMAGE='registry.redhat.io/ansible-automation-platform-22/ee-minimal-rhel8:latest'
EE_BUILDER_IMAGE='registry.redhat.io/ansible-automation-platform-22/ansible-builder-rhel8:latest'

Upstream base images are available through Red Hat Quay.io, while the downstream ones come from Red Hat Ecosystem Catalog (registry.redhat.io), which requires authentication with a Red Hat account. The other difference is that the upstream images use a CentOS image as the base image, while the downstream ones use Red Hat Universal Base Image (UBI). UBI offers greater reliability, security, and performance for official Red Hat container images compared with CentOS images.

One commonality for the upstream and downstream packages is that they both allow image configuration through an automation execution environment specification file called execution-environment.yml.

Whether you are an Ansible Automation Platform customer or a community user of ansible-builder, you can base your automation execution environments on either the UBI images or the CentOS images: accept the defaults of the package you installed, or pass a different set of base images in your automation execution environment specification file.

Why does the ansible-builder package have two base images?

Continuing from the previous section, which introduced the upstream and downstream base images for ansible-builder, there are two arguments that specify which images to use:

  • The EE_BASE_IMAGE build argument specifies the parent image for the automation execution environment.
  • The EE_BUILDER_IMAGE build argument specifies the image used for compile-type tasks.

For most container images, you generally only need one base image on top of which you add different instructions, also known as build steps, to create your final container image.

However, the base automation execution environment (ee-minimal) is built using the multi-stage build concept of containers. The EE_BUILDER_IMAGE build argument serves as the intermediary step to install Collections and build dependencies to keep the base image size as low as possible.

Let's take an example: Suppose your Ansible Content Collection depends on a Python package that needs to be compiled using python-dev package (e.g. NumPy). Because python-dev is a compile time dependency, you don't necessarily need it in the final package (you just need the NumPy package). You wouldn't want to include python-dev in the final image to keep the image size as low as possible. For this purpose, the EE_BUILDER_IMAGE is used to build dependencies and then copy over only the package wheels needed for the final automation execution environment.

Does this matter if I want to build a custom automation execution environment?

In most cases it doesn't matter. When you build your automation execution environment using ansible-builder, you only need to set EE_BASE_IMAGE, not EE_BUILDER_IMAGE. However, you should understand how a compile-time binary dependency is declared in bindep.txt, the file that lists system packages for the execution environment. For the above example, if you need to install the NumPy Python package as a dependency for your Collection on UBI8, you specify bindep.txt and requirements.txt as follows:

# bindep.txt
python38-devel [compile platform:rhel-8] #compile time dependency
# requirements.txt
NumPy
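For completeness, the two files above would be wired into the definition file roughly as follows. This is a minimal sketch using the version 1 format, with both file names assumed to sit next to execution-environment.yml.

```yaml
---
# execution-environment.yml (version 1 format)
version: 1
dependencies:
  python: requirements.txt   # Python packages, such as NumPy
  system: bindep.txt         # system packages, including compile-time ones
```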

There will be instances where the configuration in the automation execution environment specification isn't reflected or errors occur when you're building the automation execution environment. In these instances, it's important to understand the role for the EE_BUILDER_IMAGE. The next section explains this in more detail.

Automation execution environment design

diagram picture of automation execution environment design

The above diagram outlines how automation execution environments are designed. I have mentioned the upstream image name and the downstream counterparts in the same boxes.

For reference, CentOS 8 (upstream) and UBI8 (downstream) serve as the base images for the python-base container image. This image is used for running Python-based projects, so it bundles a version of Python that is supported by the ansible-core package (Python 3.8 for reference).

This python-base image serves as the base image for both the python-builder image as well as the ansible-runner (ee-minimal downstream) image. To summarize the purpose of python-builder and ansible-builder images, they build Python projects such as ansible-core and any Collections that are dependent on Python. For instance, if your Collection relies on Python dependencies for which wheels need to be built on the machine itself, they are built on the python-builder image.

Finally, the ansible-runner (ee-minimal downstream) image includes a version of the ansible-core package. The ansible-builder image works in conjunction with this image to build Python wheels, so that the final automation execution environment size is minimal by only keeping things that are necessary to run your required automation. custom-ee1 and custom-ee2 in the diagram represent any custom automation execution environments that can be created using ansible-runner (ee-minimal downstream) and the ansible-builder image.

Verifying your base images

To start building your custom automation execution environments, you should first verify which EE_BASE_IMAGE and EE_BUILDER_IMAGE are used in ansible-builder by default. To verify, first create an empty automation execution environment definition file called execution-environment.yml:

touch execution-environment.yml

Then create a build context from the empty definition file by running this command in the same directory where you created the empty definition file:

ansible-builder create

This will create a context directory in your working directory which includes a Containerfile. Opening the Containerfile shows which images are set as the BASE and BUILDER images and tells you which ansible-builder you are using, the upstream or the downstream one. For instance, if you open the Containerfile created through the above process and a pip install of ansible-builder, you see the following content:

ARG EE_BASE_IMAGE=quay.io/ansible/ansible-runner:latest
ARG EE_BUILDER_IMAGE=quay.io/ansible/ansible-builder:latest

FROM $EE_BASE_IMAGE as galaxy
ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
USER root


FROM $EE_BUILDER_IMAGE as builder

FROM $EE_BASE_IMAGE
USER root
COPY --from=builder /output/ /output/
RUN /output/install-from-bindep && rm -rf /output/wheels

In the first two lines you can observe that the images point to the upstream images. If you do the same process on the downstream install of ansible-builder, you find the downstream images in a similar Containerfile.

Using the ansible-builder context

The context building is an important aspect of ansible-builder. You can use the context to change the Containerfile and customize your automation execution environments to your needs. You can use this context and the knowledge of multi-stage builds using BUILDER and BASE images to build an automation execution environment in a disconnected environment. The following shows an execution-environment definition that pulls the BUILDER and BASE images from a private automation hub instance:

# cat execution-environment.yml
---
version: 1
build_arg_defaults:
  EE_BASE_IMAGE: 'automation-hub.demolab.local/ansible-automation-platform-22/ee-minimal-rhel8:latest'
  EE_BUILDER_IMAGE: 'automation-hub.demolab.local/ansible-automation-platform-22/ansible-builder-rhel8:latest'

dependencies:
  python: requirements.txt

And the contents of the requirements.txt file are as follows:

# cat requirements.txt
dnspython==1.15.0

Let's create a context for the above definition file, execution-environment.yml:

# ansible-builder create
Complete! The build context can be found at: /root/disconnected_ee/context

The following issues may arise when building an automation execution environment in a disconnected environment (this example takes into account the building of a downstream image):

  • Cannot reach the external yum repositories.
  • Cannot pull Python dependencies from an external PyPI server, so an internal PyPI proxy must be used when building an automation execution environment.
  • (Optional) SSL certificate issues when pulling from internal PyPI mirror.

First, create a pip.conf that points to the local mirror:

# cat context/pip.conf
[global]
index-url = https://nexus-nexus.apps.celeron.demolab.local/repository/pypi-proxy/simple/

Place the above pip.conf file and the CA certificate in the context folder of the automation execution environment you are building, so that both files can be added to your custom execution environment during the build.

Using your knowledge of multi-stage builds and context editing, edit the Containerfile. Note the sections delimited by the ####### comment lines. These are the changes needed to build an automation execution environment in a disconnected environment.

# cat Containerfile
ARG EE_BASE_IMAGE=automation-hub.demolab.local/ansible-automation-platform-22/ee-minimal-rhel8:latest
ARG EE_BUILDER_IMAGE=automation-hub.demolab.local/ansible-automation-platform-22/ansible-builder-rhel8:latest

FROM $EE_BASE_IMAGE as galaxy
ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
USER root

ADD _build /build
WORKDIR /build


FROM $EE_BUILDER_IMAGE as builder
ADD _build/requirements.txt requirements.txt
RUN ansible-builder introspect --sanitize --user-pip=requirements.txt --write-bindep=/tmp/src/bindep.txt --write-pip=/tmp/src/requirements.txt

####### Changes to create EE in a disconnected environment
# Remove ubi repo which tries to reach external links
RUN rm -f /etc/yum.repos.d/ubi.repo
# Add pip.conf for internal pypi proxy
ADD pip.conf /etc/pip.conf
# Add CA certificate and update trust
ADD demolab-ca.crt /etc/pki/ca-trust/source/anchors/demolab-ca.crt
RUN update-ca-trust
####### This marks the end of edits for the builder stage

RUN assemble

FROM $EE_BASE_IMAGE
USER root
COPY --from=builder /output/ /output/

####### Changes to create EE in a disconnected environment
# Remove ubi repo which tries to reach external links
RUN rm -f /etc/yum.repos.d/ubi.repo
# Add pip.conf for internal pypi proxy
ADD pip.conf /etc/pip.conf
# Add CA certificate and update trust
ADD demolab-ca.crt /etc/pki/ca-trust/source/anchors/demolab-ca.crt
RUN update-ca-trust
####### This marks the end of edits for the main image

RUN /output/install-from-bindep && rm -rf /output/wheels

If you look closely in the above Containerfile, you can notice the additions that fix all the issues previously mentioned in both the BUILDER and the BASE image stages because both images use this information to pull and build Python dependencies.

Understanding what happens in each stage helps you understand where to edit your Containerfile, and at which stage, allowing you to make endless customizations to your custom automation execution environments.

Finally, let's build the above execution environment with the following command:

podman build -f context/Containerfile -t disconnected_ee:1.0 context

When the build succeeds, you should see a message like this:

--> 2316db485a1
Successfully tagged localhost/disconnected_ee:1.0
2316db485a1c4e7be4a687c682d0fc90335372d7e5564774f1ff6451840ac35f

Looking forward

Our ultimate goal is to make the developer experience as seamless as possible for customers. Ansible engineering teams are working on enhancements to the automation execution environment building experience, with several improvements already in the planning stage. Until those enhancements are available, this blog should help you tackle any challenges around the process of building automation execution environments. Following the upstream first model means you can also participate in community discussions and provide your thoughts and feedback through IRC. Please follow the link here to join us. One of the main enhancements to the automation execution environment experience is being discussed in this GitHub pull request, so you can participate in the GitHub discussions as well.