Ansible Server Configuration
-
Analyzing and Fixing Failed to Template String Errors in Ansible Playbooks
I’ve lost count of the 3 a.m. pages where a playbook just stops dead with “Failed to template string.” The first time it happened, I stared at a wall of Jinja2 tracebacks convinced I’d somehow broken Python itself. Since then, I’ve learned to stop guessing and start peeling back the layers, fast. This post walks through exactly how I troubleshoot…
Read More » -
Correcting Variable Precedence Conflicts in Complex Nested Ansible Inventories
Several months ago, I made a simple playbook change to update a firewall rule on our application servers. The playbook ran successfully, and nothing showed up on the monitoring dashboard; the following morning, however, around 50% of customer traffic could not reach us. After frantically digging through the configurations, I traced the problem to a host-specific port override in…
Read More » -
Debugging Python Interpreter Discovery Failures on Legacy Managed Nodes
At 7 PM on a Friday night I was running a simple patch playbook against several CentOS 6 boxes that had been running continuously since 2014. Instead of a wall of green “ok” responses, I got a hard red failure on every node. The error mentioned a “Python interpreter,” and I kept staring at the…
Read More » -
Fixing Ansible Become Sudo Password Errors and Permission Denied Failures
I’ll never forget that 11 PM page after a routine patching playbook stalled across 300 servers. It just sat there without throwing errors or timing out, nothing but a blinking cursor staring back at me. So I killed it and re-ran it with verbose logging, only to find that all of the tasks were missing sudo passwords. The…
Read More » -
Resolving Unreachable Host Errors and SSH Connection Refused in Ansible Inventories
After a kernel patch was installed, my deployment pipeline started failing at the Ansible stage. Every playbook returned the same unreachable error when run against a host: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Connection refused", "unreachable": true}. Even though I was able to manually connect to the host…
Read More » -
Configuring Persistent System Logging and Log Rotation via Ansible Playbooks
About a month ago, I woke up on a freezing cold Monday morning to a production crash: our app was offline, and our monitoring dashboards gave no useful information about the outage. It took me far too long to find the cause; /var/log/MYSERVER had completely filled the disk because log rotation…
Read More » -
Designing Secure Credential Storage with Ansible Vault for Production Environments
Several years ago, while deploying at 2 a.m., I mistakenly exposed an unencrypted production database password in a shared Git repository. Panic ensued as I frantically rotated passwords, combed through logs, and consumed copious amounts of caffeine. After that experience, I started using Ansible Vault properly. Once you start to grow your use of…
Read More » -
Standardizing Package Management Across Hybrid Linux Environments Using Ansible Modules
Two years ago I inherited a mixed fleet of Ubuntu 18.04, CentOS 7, and several Amazon Linux 2 boxes. Before we standardized, each platform had its own custom way of patching (so-called “bug fixing”). This was typically done through multiple shell scripts that included an if [ -f /etc/debian_version ] conditional, and nobody…
Read More » -
Implementing SSH Key-Based Authentication for Secure Ansible Managed Node Connectivity
The first time an Ansible playbook froze on me, I was aghast when I finally traced the root cause to a password prompt. I did not want an automation tool that required me to hand out passwords, and it was late, and I was getting “Host key verification failed” as a result of…
Read More » -
Automating Multi-Node Web Server Provisioning with Ansible Roles and Handlers
I still remember the first time I had to roll out a basic Nginx config to a handful of VPS instances. I SSH’d into each node, ran the same apt install, copied over a vhost file with scp, and reloaded the service. It worked. The next month, when the devs asked for a new vhost on all six nodes, I did…
Read More »