Several months ago, I made a simple playbook change to update a firewall rule on our application servers. The playbook ran successfully, with no errors on the monitoring dashboard; the following morning, however, around 50% of customer traffic could not reach us. After frantically digging through the configuration, I found the cause: a host-specific port override in host_vars had never been applied, and every run had used the value of the same-named variable in group_vars/all instead. I spent an inordinate amount of time fixing that problem.
That was the moment I recognized the importance of understanding Ansible variable precedence and began auditing my inventories like a detective. I will explain in detail how to identify overlaps and conflicts in Ansible variable precedence and how to structure your inventory so that they do not come back.
Quick Summary
- Ansible has a strict 22-level hierarchy for variable precedence; where a variable is defined determines which value wins.
- If host_vars is misplaced or misnamed so that Ansible never loads it, a group_vars/all variable can appear to override a host_vars variable.
- Use ansible-inventory and the debug module to trace the origin of a variable.
- Setting hash_behaviour=merge will resolve the dictionary overwrite problem; however, it applies globally and can introduce bugs everywhere.
- Organize your directories and file naming conventions in a way that prevents future overrides from causing problems.
Understanding the Inventory Hierarchy and Precedence Order
Ansible does not treat all variable sources equally. It stacks them layer on top of layer, like a wedding cake, and the top layer always wins. Knowing the order of the layers is the first step towards winning.
Prerequisites and Environmental Assumptions
For this example, I assume that your inventory is split per environment, such as inventories/production/, and that it contains the usual variable directories, for example inventories/production/group_vars/all, inventories/production/group_vars/webservers, and inventories/production/host_vars/hostname. The pain of these conflicts typically appears only when a host belongs to several groups and a global variable collides with a host-specific override.
The Default 22-Level Precedence Order
The official Ansible variable precedence documentation lists twenty-two levels at which a variable can be set. You do not need to memorize the entire list, but two anchors are essential: extra-vars passed on the command line (the -e option) sit at the very top, and role defaults sit effectively at the bottom. Inventory variables live between the two. Within the inventory, host_vars take precedence over group_vars, and group_vars/all ranks below the group_vars of any more specific group. The problem arises when the directory layout flips our expectations.
How Inventory Layout Dictates Variable Merging
Ansible merges group variables in a fixed order: parent groups first, then child groups, and groups at the same level in alphabetical (ASCII) order of their names, with later values overwriting earlier ones. The all group is the parent of every other group, so with groups named all, eu_cluster, and webservers, group_vars/all loads before the other two. Therefore, a variable in group_vars/webservers takes precedence over the same variable in group_vars/all, which matches intuition. However, there is a “gotcha” here: group ordering only decides which group wins among groups. It never lifts a group variable above host_vars, which are applied after every group. If a group_vars value appears to beat host_vars, either the host_vars file was never loaded or something above the inventory is setting the variable.
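To make the layering concrete, here is a minimal sketch of three variable files that all define app_port. The file contents are shown as separate YAML documents in one block; the value 8081 and the host name web02 mentioned afterwards are assumptions made up for illustration.

```yaml
---
# inventories/production/group_vars/all.yml
# Lowest inventory layer shown here: applies to every host.
app_port: 8080
---
# inventories/production/group_vars/webservers.yml
# Overrides "all" for every member of the webservers group.
app_port: 8081   # hypothetical value
---
# inventories/production/host_vars/web01.yml
# Applied after every group, so it wins for web01 only.
app_port: 8443
```

Assuming web01 and a hypothetical web02 are both members of webservers, web01 should resolve app_port to 8443 and web02 to 8081, while a host outside webservers falls back to 8080.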
Identifying Conflicts: Debugging Variables in the Wild
You must detect a precedence error before you can fix it. The first evidence is usually configuration drift: what is deployed differs from what you wrote.
Spotting Unexpected Configuration Drift
The typical symptom is a task that fails because a port is wrong, or a template that renders an incorrect IP address. The playbook output shows the value Ansible actually used, and it does not match what host_vars specifies. Example:
TASK [app : Open firewall port] **********************************************
fatal: [web01]: FAILED! => {"changed": false, "msg": "Firewall rule for port 8080 already exists but does not match expected 8443. Variable 'app_port' appears to be 8080."}
The app_port is 8080 instead of the expected value of 8443. This indicates that an incorrect variable was provided from an alternate source.
Utilizing the Ansible Debug Module Effectively
When I need to see what Ansible thinks a variable value is, I will use a debugging task to print its contents and type. Additionally, I will often do a dump of all facts and use grep to search for the variable. The debug module is the best option for examining variable types. Here is a simple example:
- name: Dump the offending variable
  ansible.builtin.debug:
    var: app_port
    verbosity: 1

- name: Show complete host facts filtered
  ansible.builtin.debug:
    var: hostvars[inventory_hostname]
  when: "'app_port' in hostvars[inventory_hostname]"
The first task prints the resolved value, but only when the playbook runs with -v or higher, because of verbosity: 1. In the output a quoted value is a string and an unquoted one is a number, which tells you whether “8443” was ever converted into an integer. The second task dumps the host’s entire variable bag, which makes it easier to spot duplicates coming from different groups.
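If the type is what you are chasing, a small additional task can print it explicitly. This is a sketch that assumes app_port is defined for the host; it uses the built-in type_debug filter, which reports the name of the underlying Python type.

```yaml
- name: Show the resolved value together with its type
  ansible.builtin.debug:
    # type_debug returns the Python type name of the value it receives
    msg: "app_port={{ app_port }} (type: {{ app_port | type_debug }})"
```

The exact type name printed can differ between Ansible versions, so treat it as a hint rather than something to match on.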
How to Resolve Ansible Variable Precedence Conflicts
You’ve pinpointed the conflicting variable. Now you have two options to resolve it: a quick fix, or a permanent change to your architecture.
Auditing Variable Origins with Ansible-Inventory
The fastest way to see what a host actually ends up with is the --host flag of the ansible-inventory command. It prints the merged variable dictionary for that host, exactly as Ansible resolved it from the inventory, before any play-level variables are applied.
$ ansible-inventory --host web01 --yaml
...
app_port: 8080
...
The merged output shows only the winning value, not the file it came from. In this inventory, group_vars/all.yml defines app_port as 8080, while host_vars/web01.yml sets it to 8443, yet the merged result is 8080.
host_vars should always override group_vars. So what’s going on? Why is 8080 winning out? If the inventory itself reports 8080, the host_vars file was never loaded: typically the host_vars directory is not positioned next to the inventory (or playbook) in use, or the file name does not match the inventory hostname. If the inventory reports 8443 but the play still sees 8080, the variable is being set again at a higher-precedence level, such as play vars, role vars, include_vars, or set_fact.
To find every definition, search the inventory tree for the variable name, or run ansible-inventory --graph --vars to see which group or host each value is attached to. A late-sorting group such as group_vars/zz_fallback can override the host's other groups, but it can never override host_vars.
Once you know where each definition lives, you can reorganise the inventory so that the load order works to your advantage.
Applying the extra-vars Override as a Temporary Hotfix
If you require an immediate fix, and cannot reorganise your inventory at that time, you can force the value using an extra-vars override.
$ ansible-playbook site.yml -e "app_port=8443"
This override sits at the top of the precedence ladder and guarantees that the current run uses port 8443 regardless of what is set in the inventory. Values passed as key=value are always strings; if the role expects an integer, pass JSON instead, for example -e '{"app_port": 8443}'.
That said, record how and when you used this override so that, once everything works again, you move the value to its correct location. If you keep fixing problems with extra-vars, you will eventually forget that a run depends on a magic command-line flag.
Restructuring Variable Placement for Permanent Fixes
To fix the issue permanently, define each value at the level where it belongs. The host-specific override goes in that host's host_vars file, stored next to the inventory and named after the host, and the global value should not reuse the host-level name at all. The real problem here is that the host's variable and the group's variable share a name. Therefore, a better pattern is to use different variable names, such as changing the global variable to default_app_port and referencing it only if the host does not have an app_port defined.
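A minimal sketch of that renaming pattern follows. The file contents are shown as separate YAML documents in one block, and the role path and the debug task are assumptions for illustration; in a real role the expression would go wherever the port is consumed.

```yaml
---
# group_vars/all.yml
# Global fallback under its own name, so it never collides with app_port.
default_app_port: 8080
---
# host_vars/web01.yml
# Host-specific value; hosts without this file simply leave app_port undefined.
app_port: 8443
---
# roles/app/tasks/main.yml (hypothetical location)
- name: Show the port the role will use
  ansible.builtin.debug:
    # default() is only consulted when app_port is undefined for the host
    msg: "Effective port: {{ app_port | default(default_app_port) }}"
```

Because the two names differ, the fallback is an explicit expression in the role rather than a side effect of load order.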
The Deep Merge Trap: Hash Behaviour and Play Scoped Vars
Flat variables are straightforward to work with, but dictionary variables are where things become messy.
Why Dictionary Variables Overwrite Instead of Merging
Ansible’s hash_behaviour defaults to replace. Therefore, if I define a dictionary in group_vars/all and then define a dictionary with the same name in host_vars just to add one key, the group-level dictionary is discarded and only the host-level dictionary remains. This means that if I have a global firewall rules dictionary, I lose all of my base rules as soon as I add a specific key at the host level. This is rarely what you want to happen.
Enabling hash_behaviour to Deep Merge Dictionaries
You can change the default for all of your playbooks by setting hash_behaviour in your ansible.cfg file. Be careful, because this changes how dictionaries are combined everywhere. The default is replace; if you change it to merge, dictionaries with the same name are combined recursively. Lists and scalar values are not affected and are still replaced. Here is the configuration:
[defaults]
hash_behaviour = merge
I use this setting only in a few legacy environments where many roles already depend on dictionaries being merged. I do not recommend enabling merge as the default for every Ansible environment, because it complicates debugging: you can no longer tell at a glance whether a key will be overwritten or merged. A safer alternative is the combine filter, which merges dictionaries explicitly at the point of use, so the merge is visible in the role or playbook instead of hidden in global configuration. It can also replace a chain of set_fact calls that build up a dictionary piece by piece.
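As a sketch of the combine approach, the example below keeps the base rules and the host additions under two different names and merges them in a task. The variable names base_firewall_rules, host_firewall_rules, and firewall_rules, and the rule layout, are assumptions for illustration.

```yaml
---
# group_vars/all.yml
base_firewall_rules:
  ssh:
    port: 22
    proto: tcp
  http:
    port: 80
    proto: tcp
---
# host_vars/web01.yml
host_firewall_rules:
  http:
    port: 8443
---
# In a role or play: merge explicitly, with hash_behaviour left at replace.
- name: Build the effective firewall rules for this host
  ansible.builtin.set_fact:
    # recursive=True merges nested dictionaries key by key;
    # default({}) covers hosts that define no host_firewall_rules
    firewall_rules: "{{ base_firewall_rules | combine(host_firewall_rules | default({}), recursive=True) }}"
```

For web01 the ssh rule is kept unchanged and the http rule keeps proto tcp while its port becomes 8443, because only the keys present in the host dictionary are overridden.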
Mitigating Play Scoped Vars Contamination
Variables defined at the play level with the vars keyword have higher precedence than inventory variables. Therefore, when I define a variable at the play level, I risk overwriting (clobbering) an inventory value that is set at the group or host level. For this reason, I limit the number of play-level variables I define, and I never reuse a name that already exists in an inventory group or host file. For values computed during a run, I use set_fact with distinct, prefixed names. Bear in mind that set_fact values are per host, persist for the rest of the playbook run, and outrank both play vars and inventory variables, so they should never reuse an inventory name either.
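The clobbering is easy to reproduce. In this sketch, which assumes host_vars for the targeted hosts set app_port to 8443, the play-level value is the one the tasks see; the values 9090 and the name deploy_timeout_seconds are made up for illustration.

```yaml
- name: Play-level vars beat inventory values
  hosts: webservers
  gather_facts: false
  vars:
    app_port: 9090               # same name as the inventory variable: clobbers it
    deploy_timeout_seconds: 30   # distinct name: cannot clash with the inventory
  tasks:
    - name: Show which value the play sees
      ansible.builtin.debug:
        var: app_port            # reports 9090, not the host_vars value
```

Renaming the play-level variable, or removing it, is enough to let the inventory value through again.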
Structuring group_vars vs host_vars to Prevent Future Clashes
The main thing to consider when laying out the variable files of an Ansible project is that a cluttered directory makes conflicts hard to track down later. A well-defined hierarchy is the best protection against unintentionally overwriting group or host variables.
Designing Non-Overlapping Group Hierarchies
I try to keep my top-level group directories small. The more files there are under the same directory, the easier it is to define the same variable name twice by accident. I prefer one flat file per group and avoid deeply nested group files. It looks like this:
inventories/production/
├── group_vars/
│   ├── all.yml
│   ├── webservers.yml
│   └── databases.yml
├── host_vars/
│   ├── web01.yml
│   └── db01.yml
└── hosts.yml
Every group file should contain only group-specific values. If I see the same variable in all.yml and in a host file, I know I am in trouble. I use a naming convention to keep my globals and my local overrides distinct.
Establishing Naming Conventions for Global and Local Scopes
I use a default_ prefix for all global variables, or I place them in a dictionary called defaults. default_app_port is defined in group_vars/all.yml, and the role falls back to default_app_port only when app_port is not defined. app_port itself is set only per host, so the two names cannot collide and the precedence between them becomes irrelevant. This practice has eliminated around 90% of the conflicts in my inventory.
Frequently Asked Questions
Why does my role default variable override my inventory group variable?
It should not. Role defaults sit at the lowest point of the variable precedence hierarchy, below group variables, so any group variable takes precedence over a role default. If you see otherwise, check that the value is not being set in a higher-precedence location, such as play-level vars or an include_vars task, both of which outrank inventory group variables. Also check that your role default really lives in defaults/main.yml and not in the role's vars directory, because role vars have higher precedence than both role defaults and inventory variables.
How can I force a specific group_vars file to take highest priority?
You cannot give a group_vars file greater precedence than host_vars or extra-vars. Among groups, however, you can create a group that loads last in alphabetical order by naming it z_override or similar. Because Ansible merges groups at the same level in alphabetical order, group_vars/z_custom is applied last for every host that is a member of z_custom, overwriting values from that host's other groups at the same level. This isn’t a very nice solution, but it does work in a pinch. It is usually better to define the variable within the proper host scope.
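An alternative to renaming groups is the ansible_group_priority variable, which changes the merge order among groups at the same level. The sketch below is a hypothetical hosts.yml; the group memberships and port values are assumptions. Note that ansible_group_priority has to be set in the inventory source itself, not in group_vars, because it is used while group_vars are being loaded.

```yaml
# inventories/production/hosts.yml (hypothetical layout)
all:
  children:
    eu_cluster:
      hosts:
        web01:
      vars:
        app_port: 9000
        # Default priority is 1; a higher number is merged later and wins.
        ansible_group_priority: 10
    webservers:
      hosts:
        web01:
      vars:
        app_port: 8080
```

Without the priority, webservers would win for web01 because it sorts after eu_cluster; with it, eu_cluster's value is used. A host_vars entry for web01 would still override both.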
What is the difference between include_vars and inventory variables?
include_vars is a task that loads variable values from a YAML or JSON file while your playbook runs. By contrast, inventory variables (from group_vars and host_vars) are loaded during the initial inventory parsing phase, before any play starts. Variables loaded by include_vars rank above inventory variables, play-level vars, and role vars, and below set_fact values and extra-vars. Because they arrive mid-run with high precedence, they can overwrite inventory variables. That makes them a double-edged sword: useful for deliberate overrides, and an easy place for a forgotten debugging value to hide.
In terms of Ansible variable resolution, you should consider variable precedence a rite of passage. A hidden variable override can lead to a world of pain, and once you have experienced it, you will never view your inventory the same way. Working through a problem with ansible-inventory --host, the debug module, and the documentation on variable precedence is the first step in resolving variable conflicts. The best fix is a naming convention and an organizational structure for your inventory and variable files that make the winning value obvious before any play starts.
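One way to keep include_vars from silently overwriting inventory values is to load the file into its own dictionary with the name option. This is a sketch; the file path vars/overrides.yml and the dictionary name overrides are assumptions for illustration.

```yaml
- name: Load overrides into their own namespace
  ansible.builtin.include_vars:
    file: vars/overrides.yml   # hypothetical file
    name: overrides            # variables land under "overrides", not at top level

- name: Prefer an override only when one was loaded
  ansible.builtin.debug:
    # Falls back to the inventory value when the file does not set app_port
    msg: "Effective port: {{ overrides.app_port | default(app_port) }}"
```

With this layout the inventory's app_port is never replaced; the override is consulted explicitly where it is needed.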