Terraform Infrastructure Automation

Automating Proxmox Virtual Machine Deployment Using Terraform and Cloud-Init Templates

The Problem: I had been creating Ubuntu virtual machines in my Proxmox home lab by hand: clicking through the UI, uploading the cloud image, and re-typing the same SSH keys over and over. The process was time-consuming, prone to human error, and impossible to replicate exactly. Each virtual machine was a “15-minute job”.

The Constraints: I wanted a workflow based entirely on Infrastructure as Code, with no dependency on PXE boot or heavyweight orchestration tools. It had to run against a single Proxmox node, and each VM had to be fully configured (hostname, user, SSH key, and packages) on first boot, without my ever logging into the console.

The Solution: Terraform with the Telmate Proxmox provider, plus cloud-init templates. Once the template exists, I can create a fully functioning VM in under a minute. This guide covers the exact configuration I use, the mistakes that cost me many hours, and the checks I now run so they do not happen again.

Quick Summary

  • Provision Ubuntu 22.04 Cloud-Init VMs on Proxmox VE v8.2.
  • Use a dedicated API token with minimum privileges, not root.
  • Attach the cloud-init ISO to ide2 rather than a SCSI slot; otherwise the clone succeeds but the cloud-init data is silently ignored.
  • Pin your providers to specific versions and use variables so you can spawn many VMs from a single root module.

Tested On: Proxmox VE v8.2 (kernel 6.8.12-2-pve), Terraform v1.8.4, Ubuntu 22.04 LTS cloud image. Everything outlined in this document was tested in my single-node homelab using local disk storage.

Prerequisites & Planning

Before Terraform can manage virtual machines on Proxmox, you need four things in place: a clean cloud-init template, a Proxmox API token with restricted permissions, a decision about where state is stored and tracked, and pinned versions for Terraform and the provider.

Prepare the Proxmox Template Clone

To create the Ubuntu cloud image template, download the Ubuntu cloud image from the official site and import it into a new, small virtual machine. The QEMU guest agent must be installed in that VM so that Terraform can read the IP address the agent reports and finish the apply. Do not modify an existing template or image configuration without creating a backup first!

# If you already have a VM you’re turning into a template, clone it first
qm clone 9000 9999 --name backup-before-template --full

You may want to keep a second shell session open on your Proxmox host while executing commands for backup and recovery.

  1. Download the image and import it into a new virtual machine.
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
qm create 9000 --name ubuntu-2204-cloudinit-template --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
qm importdisk 9000 jammy-server-cloudimg-amd64.img local-lvm
# Attach the imported disk as scsi0 on a virtio-scsi-single controller
qm set 9000 --scsihw virtio-scsi-single --scsi0 local-lvm:vm-9000-disk-0
# Add the cloud-init drive on ide2
qm set 9000 --ide2 local-lvm:cloudinit
qm set 9000 --boot c --bootdisk scsi0
qm set 9000 --serial0 socket --vga serial0
qm set 9000 --agent enabled=1
  2. Boot the virtual machine once, install the agent, and power off.
qm start 9000
# Inside the VM:
sudo apt update && sudo apt install -y qemu-guest-agent
sudo systemctl enable --now qemu-guest-agent
sudo shutdown -h now
  3. Convert the virtual machine to a template.
qm template 9000
  4. Verify the virtual machine is set as a template.
qm config 9000 | grep template
# Output: template: 1

If you want to replicate exactly what I have done here, the Proxmox VE Administration Guide covers these procedures and their configuration flags in its sections on VM templates and clones.

API Token and Permissions for the Proxmox Terraform Provider

Never give Terraform your root password. Create an API token that is as restricted as possible: in the Proxmox web interface, go to Datacenter > Permissions > API Tokens and add a token for a user that holds the PVEVMAdmin role on the relevant path (e.g. /nodes/pve). Also grant Sys.Audit on /; otherwise the provider cannot list resources and you will receive a 401 error. The token ID is case-sensitive and must match the user@realm!tokenname form exactly.

Save the token ID and secret; you will pass them to Terraform either as environment variables or in the provider block.
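The provider block later in this guide reads var.proxmox_token_id and var.proxmox_token_secret, so both need a declaration somewhere in the root module. A minimal sketch follows; the file name, the descriptions, and the example token ID in the comments are my own assumptions, not values from the original setup.

```hcl
# variables.tf (sketch): declarations for the two values the provider block reads.
variable "proxmox_token_id" {
  description = "API token ID in user@realm!tokenname form"
  type        = string
}

variable "proxmox_token_secret" {
  description = "Secret displayed once when the token is created"
  type        = string
  sensitive   = true # keeps the value out of plan and apply output
}

# Supply the values without committing them, for example from the shell:
#   export TF_VAR_proxmox_token_id='terraform@pve!provisioner'   # hypothetical token ID
#   export TF_VAR_proxmox_token_secret='<secret from the UI>'
```

Terraform maps any environment variable named TF_VAR_<name> onto the input variable <name>, which keeps the secret out of version control.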

Version Pinning and Remote State Planning

I keep state files locally on my laptop for my homelab, but if you share this configuration with anyone else, consider an S3-compatible backend for state. Also pin the provider to a specific version from the start, as the Telmate provider has had breaking changes between 2.x releases.
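For readers who do go the shared-state route, here is a rough sketch of an S3-compatible backend block as I understand the Terraform 1.6+ syntax. The bucket, key, and endpoint are hypothetical placeholders, and which skip_* flags you need depends on the storage service, so treat this as a starting point rather than a tested configuration.

```hcl
terraform {
  backend "s3" {
    bucket = "tfstate-homelab"           # hypothetical bucket name
    key    = "proxmox/terraform.tfstate" # hypothetical object key
    region = "us-east-1"                 # placeholder; many S3-compatible stores ignore it

    endpoints = {
      s3 = "https://s3.example.internal" # hypothetical endpoint URL
    }

    # Often required for non-AWS object stores
    skip_credentials_validation = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
    use_path_style              = true
  }
}
```

Credentials are normally supplied through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables rather than written into the block. After adding or changing a backend, run terraform init again so Terraform can migrate the existing state.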

Core Setup: Terraform Proxmox VM Provisioning Cloud-Init

Bootstrapping the Proxmox Terraform Provider

You need to create a provider.tf that looks similar to this:

terraform {
  required_version = "~> 1.8"
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "2.9.14"
    }
  }
}

provider "proxmox" {
  pm_api_url          = "https://pve.example.com:8006/api2/json"
  pm_api_token_id     = var.proxmox_token_id
  pm_api_token_secret = var.proxmox_token_secret
  pm_tls_insecure     = true   # only for homelabs with self‑signed certs
}

This declares the provider and pins it to version 2.9.14. If you later move to a different provider version, check the Terraform Registry page for the Proxmox provider first, since arguments can change between releases.

Initialize the provider and verify connectivity with the following commands:

terraform init
terraform plan

Crafting the Terraform Proxmox QEMU Resource with Cloud-Init Drive

Now let’s look at the VM resource. This is where the cloud-init drive mapping either works or does not. The following goes in main.tf:

resource "proxmox_vm_qemu" "cloudinit_vm" {
  name        = "lab-ubuntu-01"
  target_node = "pve"
  clone       = "ubuntu-2204-cloudinit-template"
  full_clone  = false

  cores   = 2
  memory  = 2048
  sockets = 1

  scsihw = "virtio-scsi-single"

  disk {
    type    = "scsi"
    storage = "local-lvm"
    size    = "20G"
  }

  network {
    model  = "virtio"
    bridge = var.vm_network_bridge
  }

  # Cloud‑init drive MUST be on ide2
  cloudinit_cdrom_storage = "local-lvm"
  os_type                 = "cloud-init"

  sshkeys = var.ssh_public_key
  ciuser  = var.vm_user

  ipconfig0 = "ip=dhcp"

  # Enable the guest agent on the clone so the provider can read the VM's IP
  agent = 1
}

The cloudinit_cdrom_storage line tells the provider to create a cloud-init ISO on the target storage and attach it to ide2. If you omit this line, or try to put the cloud-init drive on scsi0, Terraform will still clone your VM; however, the cloud-init data will be silently ignored. I go into more detail on this in the Pitfalls section.

The cloud-init NoCloud datasource reference describes how the drive is detected. NoCloud looks for a volume labelled cidata, and Proxmox applies that label for you when it generates the drive.

Run a plan to see exactly what Terraform is going to create, including the cloud-init drive:

$ terraform plan

  + resource "proxmox_vm_qemu" "cloudinit_vm" {
      + name                         = "lab-ubuntu-01"
      + target_node                  = "pve"
      + clone                        = "ubuntu-2204-cloudinit-template"
      + scsihw                       = "virtio-scsi-single"
      + cloudinit_cdrom_storage      = "local-lvm"              <-- cloud‑init ISO created here
      + ipconfig0                    = "ip=dhcp"
      + sshkeys                      = (sensitive value)
      ...
    }

Every attribute prefixed with + is something Terraform will create when you apply this plan.

Running and Verifying the Automated Deployment

Back up the Proxmox node configuration before applying. At minimum, dump your running VM list with qm list so you have a record to revert to if the state gets tangled. Also keep a second root shell open on the Proxmox host.

To apply the configuration:

$ terraform apply --auto-approve

proxmox_vm_qemu.cloudinit_vm: Creating...
proxmox_vm_qemu.cloudinit_vm: Still creating... [10s elapsed]
proxmox_vm_qemu.cloudinit_vm: Creation complete after 15s [id=qemu/100]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:
vm_ip = 192.168.10.143 <-- that’s your new VM ready for SSH
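
The vm_ip value in the apply log only appears if an output with that name is declared. A minimal sketch, assuming the provider's read-only default_ipv4_address attribute, which is only populated when the guest agent is enabled on the resource (agent = 1) and running inside the VM:

```hcl
# outputs.tf (sketch): expose the address reported by the guest agent.
output "vm_ip" {
  description = "IPv4 address reported by the QEMU guest agent"
  value       = proxmox_vm_qemu.cloudinit_vm.default_ipv4_address
}
```

If the agent is missing or disabled, expect this output to come back empty rather than fail.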

Once the stack has been applied, you will want to verify that cloud-init has finished running on the VM:

ssh ubuntu@192.168.10.143 cloud-init status --wait
# Output: status: done

You can also verify that the guest agent is still working:

ssh ubuntu@192.168.10.143 systemctl is-active qemu-guest-agent
# Output: active

If cloud-init did not run successfully, the status is reported as status: not run and you will not be able to log in with your SSH key. That is the time to work through the troubleshooting section below.

Optimization and Best Practices

Using Terraform Proxmox Variables for Reusable Stacks

Instead of hard-coding tunable values such as VM specifications, template names, network bridges, SSH keys, and users in your root module, declare them as variables and keep one variable file per environment in a dedicated folder. Example declarations:

variable "vm_count" { default = 1 }
variable "vm_name_prefix" { default = "lab-ubuntu" }
variable "template_name" { default = "ubuntu-2204-cloudinit-template" }
variable "vm_network_bridge" { default = "vmbr0" }
variable "vm_user" { default = "ubuntu" }
variable "ssh_public_key" { sensitive = true }

Reference these variables from the resource block instead of literal values. A single change to template_name in a variable file then updates every VM in that environment.
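As a sketch of how the declarations above could drive the resource, the block below swaps the literals from main.tf for variables and uses count to stamp out several VMs. The naming scheme built with format() is my own choice, not something the original configuration prescribes.

```hcl
resource "proxmox_vm_qemu" "cloudinit_vm" {
  count = var.vm_count

  # Yields lab-ubuntu-01, lab-ubuntu-02, ... with the default prefix
  name        = format("%s-%02d", var.vm_name_prefix, count.index + 1)
  target_node = "pve"
  clone       = var.template_name

  cores  = 2
  memory = 2048
  scsihw = "virtio-scsi-single"

  disk {
    type    = "scsi"
    storage = "local-lvm"
    size    = "20G"
  }

  network {
    model  = "virtio"
    bridge = var.vm_network_bridge
  }

  cloudinit_cdrom_storage = "local-lvm"
  os_type                 = "cloud-init"
  ciuser                  = var.vm_user
  sshkeys                 = var.ssh_public_key
  ipconfig0               = "ip=dhcp"
}
```

Once count is set, references to the resource become indexed, for example proxmox_vm_qemu.cloudinit_vm[0]. Per-environment values can then be passed with terraform apply -var-file=<file>, where the file holds plain name = value assignments.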

Pin Provider Versions and Avoid Drift

Running terraform init records the provider's checksums in .terraform.lock.hcl; commit that file. When you upgrade, always run terraform init -upgrade in a branch, then plan, and diff the lock file. State locking matters even for local state; if you ever move to a remote backend, enable DynamoDB-style locking immediately.

Real‑World Pitfalls and Debugging

What Didn’t Work For Me

My first terraform apply timed out with “vm is locked (clone)… no cloud-initialization disk found.” The clone itself completed, but the VM never received its user-data: my SSH key was not accepted, and there was no default password to fall back on because I had never set one. Inside the VM, /var/lib/cloud/instance was completely empty.

I grabbed the configuration for the VM on the Hypervisor:

qm config 100 | grep ide
# Only showed an empty ide2 entry — no ISO attached.

I had set disk { type = "scsi" storage = "local-lvm" size = "10G" } for the root disk, but then I misread the schema and added a second disk block, also type = "scsi", for cloud-init. The provider created a raw SCSI disk, whereas cloud-init needs to see a CD-ROM device on ide2. The fix was to delete the extra disk block and set cloudinit_cdrom_storage = "local-lvm" instead. After that, terraform taint and apply recreated the VM, and the drive was visible immediately. Running qm config 100 | grep ide2 confirmed that ide2: local-lvm:vm-100-cloudinit,media=cdrom was now attached. That is when “the cloud-init ISO goes on ide2” got tattooed into my memory.

Debugging a Stalled Cloud-Init: NoCloud Datasource Mismatch

Even when the ISO attaches correctly, cloud-init sometimes ignores it completely, and the cause was the datasource configuration inside the template. Ubuntu cloud images typically include DataSourceNone as a fallback, but if a pre-seeded template restricts datasource_list to OpenStack only, the NoCloud drive is never consulted.

You can check this from inside the VM:

grep -r datasource_list /etc/cloud/cloud.cfg*

If the only value listed is [ OpenStack ], cloud-init will not read the CD-ROM drive at all. There are two fixes: edit the template to remove the restriction before converting it, or add a drop-in configuration file to the image under /etc/cloud/cloud.cfg.d/ that overrides the list:

#cloud-config
datasource_list: [ NoCloud, None ]

With the template updated so that cloud-init consults the NoCloud drive on ide2, all deployments worked on the first attempt.

Common Mistakes and Edge Cases

Building the Cloud-Init Template With the Wrong SCSI Controller

If the original template used the stock LSI SCSI controller instead of virtio-scsi-single, cloud-init may be unable to mount the cloud-init volume when the kernel does not recognize the controller. Always build the template with virtio-scsi-single, and set the same scsihw value in the Terraform resource. A mismatch between the template and the cloned VM is likely to cause an erratic boot or a boot hang.

Home Lab Infrastructure as Code: Network Bridge Mismatch in Terraform Variables

My Terraform configuration referenced vmbr0, but my Proxmox host used vmbr1 as its lab network. As a result, the VM booted without an IP address, and I spent about 10 minutes blaming DHCP. To find your actual bridge name, open the Proxmox web UI, go to Datacenter > pve > System > Network, and check the Name column. Put that bridge name in a variable rather than hardcoding it. If you move your physical NICs around in the future, you can then update the bridge name in one place instead of editing your whole Terraform configuration.
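One way to make a mistyped bridge name fail at plan time is a validation block on the variable. This is a sketch that would replace the one-line vm_network_bridge declaration shown earlier, and it assumes the conventional vmbrN naming. It cannot tell whether the bridge exists on the host or is the right one for your lab network, so it would not have caught my vmbr0 versus vmbr1 mix-up.

```hcl
variable "vm_network_bridge" {
  description = "Name of the Proxmox bridge the VM NIC attaches to"
  type        = string
  default     = "vmbr0"

  validation {
    # Assumes conventional vmbrN names; relax the pattern if your bridges are named differently.
    condition     = can(regex("^vmbr[0-9]+$", var.vm_network_bridge))
    error_message = "The vm_network_bridge value must look like vmbr0, vmbr1, and so on."
  }
}
```

Dropping the default entirely is another option: Terraform then insists on an explicit value for every environment, which forces you to look the bridge up once.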

Terraform Proxmox Cloud‑Init Drive Not Attached After Clone

If the template's disk image is raw and you clone it to storage that only accepts qcow2, the disk clone will succeed, but creation of the cloud-init ISO may fail silently because the provider cannot determine how much space is available for it. Always use raw as the template's disk format, and use a storage backend that accepts that format. If in doubt, run TF_LOG=DEBUG terraform apply and look in the provider logs for an error such as “unable to create cloudinit drive”.

Frequently Asked Questions

Why does the Terraform Proxmox provider return a “401 permission denied” error when I use the correct API token?

The token ID is case-sensitive and must match user@realm!tokenid exactly. The token also needs Sys.Audit on / so that the provider can enumerate the nodes. Once that permission is granted, the error goes away.

How can I automatically increase the size of the cloud-init disk after cloning my template to create a VM?

The cloud-init ISO itself cannot be resized, and it does not need to be. To grow the root disk, set a larger size in the disk block; Ubuntu cloud images normally expand the root partition and filesystem on first boot through cloud-init's growpart and resize_rootfs modules. If your image does not, run growpart and resize2fs from bootcmd or runcmd in your cloud-init user-data, or from a remote-exec provisioner. Provisioner blocks should be a last resort and are not recommended for any heavy configuration.

Can I use one Terraform configuration to operate on multiple Proxmox nodes in a home lab?

Yes. For separate standalone hosts, configure a provider alias for each host and select it per resource with the provider meta-argument; in Terraform 1.8 provider blocks cannot be generated in a loop, so each alias is written out by hand. For nodes in a single cluster, one provider configuration is enough, and for_each over a map of node names with target_node works. Be mindful of the API's concurrency limits: keep parallelism low (for example, -parallelism=2) so you do not overwhelm your host with too many requests at once, and validate the token against each node before applying.
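A minimal sketch of the alias approach for two standalone hosts. The host names, the alias, the second pair of token variables, and the VM name are all hypothetical; the sketch assumes the proxmox_token_id and proxmox_token_secret variables are declared elsewhere and that the template exists on each host.

```hcl
variable "pve2_token_id" {
  type = string # hypothetical variable for the second host's token
}

variable "pve2_token_secret" {
  type      = string # hypothetical variable for the second host's secret
  sensitive = true
}

# Default provider configuration: first host
provider "proxmox" {
  pm_api_url          = "https://pve1.example.com:8006/api2/json" # hypothetical host
  pm_api_token_id     = var.proxmox_token_id
  pm_api_token_secret = var.proxmox_token_secret
}

# Aliased provider configuration: second host
provider "proxmox" {
  alias               = "pve2"
  pm_api_url          = "https://pve2.example.com:8006/api2/json" # hypothetical host
  pm_api_token_id     = var.pve2_token_id
  pm_api_token_secret = var.pve2_token_secret
}

resource "proxmox_vm_qemu" "on_pve2" {
  provider    = proxmox.pve2 # selects the aliased configuration
  name        = "lab-ubuntu-pve2-01"
  target_node = "pve2"
  clone       = "ubuntu-2204-cloudinit-template" # must exist on this host

  cloudinit_cdrom_storage = "local-lvm"
  os_type                 = "cloud-init"
  ipconfig0               = "ip=dhcp"
}
```

Resources without a provider argument keep using the default (unaliased) configuration, so existing VMs on the first host are unaffected.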
