The problem: staging is expensive

Every team shipping software to production needs a staging environment: a place to test deploys, validate database migrations, and simulate failure scenarios before they reach end users. The problem is that maintaining a staging environment in the cloud costs real money.

An application server, a database, a Redis instance, maybe a background worker — replicating the production stack in the cloud can easily cost between $50 and $200 per month, depending on the application size. For a startup or small consultancy, that cost is hard to justify when the environment sits idle most of the time.

The most common alternative is having no staging at all. Deploy straight to production and hope for the best. It works until it doesn’t.


The decision: two datacenters, two purposes

The architecture we adopted splits infrastructure into two environments with distinct purposes:

  • Proxmox on a homelab — acts as a partial staging datacenter. Runs the application, database, and auxiliary services in VMs and LXC containers provisioned locally. Additional monthly cost: zero (the hardware already exists).
  • Vultr VPC — production datacenter. Servers in New York with a private network (VPC), automated backups, and public IPs. Predictable and scalable cost.

The word partial matters. The homelab staging doesn’t replicate 100% of the production environment. It has no SSL, no production-scale data, no equivalent network latency. But it replicates what matters for most tests: the deploy process, service configuration, database migrations, and application behavior with real (anonymized) data.


The architecture

graph TB
  subgraph homelab["Proxmox homelab — Staging (192.168.15.0/24)"]
    A1[app-staging — VM]
    A2[wireguard — LXC]
    A3[monitoring — Docker]
  end

  subgraph vultr["Vultr VPC NY — Production (10.1.96.0/20)"]
    B1[app — VM]
    B2[backup — VM]
    B3[vpn / WireGuard — VM]
  end

  A2 <-->|"WireGuard tunnel (10.200.200.0/24)"| B3

The two environments live on completely separate networks. The homelab uses the residential network (192.168.15.0/24), Vultr uses a private VPC (10.1.96.0/20). The connection between them is a WireGuard tunnel that creates a virtual subnet (10.200.200.0/24) accessible from both sides.

In practice, this means a service running on the homelab can reach resources on Vultr (and vice-versa) as if they were on the same network, albeit with normal internet latency.


The glue: WireGuard

WireGuard is what makes this architecture viable. It’s a modern, lightweight, and fast VPN protocol that runs in the Linux kernel. Compared with heavier solutions like OpenVPN or IPsec, its configuration is minimal and its performance is excellent.

The WireGuard server runs on Vultr (public IP, fixed port). The client runs in an LXC container on Proxmox — no need for an entire VM. Proxmox 9+ already has the WireGuard kernel module, so an unprivileged LXC container is enough.
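Creating that container on the Proxmox host is a one-liner. This is a sketch; the container ID, template filename, and network values are examples for a typical setup, not taken from the original:

```shell
# On the Proxmox host: create an unprivileged LXC for the WireGuard client
# (container ID, template name, and IPs are placeholders)
pct create 110 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname wireguard --unprivileged 1 \
  --net0 name=eth0,bridge=vmbr0,ip=192.168.15.10/24,gw=192.168.15.1 \
  --cores 1 --memory 256
```

Because WireGuard lives in the host kernel, the container itself only needs the userspace tools (`wireguard-tools`) installed.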

The peer configuration on the homelab side includes PersistentKeepalive, which is required when the client is behind residential NAT. Without it, the tunnel drops after a few minutes of inactivity.
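A minimal client config on the homelab side might look like this — keys, endpoint, and addresses are placeholders, not values from the original setup:

```ini
# /etc/wireguard/wg0.conf — homelab client (sketch; keys and endpoint are placeholders)
[Interface]
PrivateKey = <homelab-private-key>
Address = 10.200.200.2/24

[Peer]
PublicKey = <vultr-server-public-key>
Endpoint = <vultr-public-ip>:51820
# Route the tunnel subnet and the Vultr VPC through the tunnel
AllowedIPs = 10.200.200.0/24, 10.1.96.0/20
# Keeps the NAT mapping alive from behind the residential router
PersistentKeepalive = 25
```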

For traffic to flow between networks (not just between the two tunnel hosts), the VPN server needs:

  1. ip_forward enabled
  2. iptables rules with MASQUERADE on the VPC interface
  3. Static routes on other VPC hosts pointing to the VPN server’s IP

This setup allows any machine on the Vultr VPC to reach machines on the homelab, and vice-versa. Useful for centralized monitoring, accessing staging databases from production tools, or simply SSH between environments.
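The three steps above translate to commands roughly like these on the VPN server (the interface name and VPC IP are hypothetical — check yours with `ip addr`):

```shell
# 1. Enable IPv4 forwarding persistently
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-wireguard.conf
sysctl --system

# 2. Masquerade tunnel traffic leaving through the VPC interface
#    ("enp8s0" is a placeholder interface name)
iptables -t nat -A POSTROUTING -s 10.200.200.0/24 -o enp8s0 -j MASQUERADE

# 3. On every other VPC host, route the remote subnets via the VPN
#    server's VPC address (10.1.96.4 here is hypothetical)
ip route add 10.200.200.0/24 via 10.1.96.4
ip route add 192.168.15.0/24 via 10.1.96.4
```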


Automation layers

Connecting the two environments is only half the problem. The other half is making sure the infrastructure is reproducible. Nobody wants to depend on a server that was manually configured six months ago and nobody remembers how to recreate.

Terraform: what exists

Terraform defines what exists in the infrastructure. On Proxmox, it creates VMs from Cloud-Init templates and LXC containers. On Vultr, it manages servers, VPC networks, and firewall rules.
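On the Vultr side, the definitions are compact. A sketch using the official `vultr` provider — the plan, `os_id`, and labels are illustrative values, not the original configuration:

```hcl
# Sketch with the vultr/vultr provider; plan and os_id values are examples.
resource "vultr_vpc" "main" {
  description    = "production"
  region         = "ewr" # New York (NJ)
  v4_subnet      = "10.1.96.0"
  v4_subnet_mask = 20
}

resource "vultr_instance" "app" {
  label   = "app"
  region  = "ewr"
  plan    = "vc2-1c-2gb"
  os_id   = 1743 # Ubuntu 22.04 x64 — IDs come from the Vultr API
  vpc_ids = [vultr_vpc.main.id]
  backups = "enabled"
}
```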

An important point: if the production infrastructure already existed before Terraform (common case), you can import it without destroying anything. The workflow is:

  1. Write approximate resource definitions for existing resources
  2. Run terraform import for each resource
  3. Verify that terraform plan shows no destructions
  4. Use lifecycle { ignore_changes } for attributes that force recreation (like os_id or hostname on Vultr)

This allows you to codify existing infrastructure incrementally, without the risk of taking down production.
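In commands, the import workflow looks roughly like this (the resource address and server UUID are placeholders):

```shell
# Step 2: after writing an approximate vultr_instance "app" block,
# import the real server by its UUID (placeholder shown)
terraform import vultr_instance.app 00000000-0000-0000-0000-000000000000

# Step 3: the plan must report nothing to destroy before you trust it
terraform plan

# Step 4: if the plan still wants to replace the server because of an
# immutable attribute, add to the resource block:
#   lifecycle { ignore_changes = [os_id, hostname] }
```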

Ansible: how it’s configured

Ansible defines how each machine is configured. It installs Docker, configures the firewall (UFW), manages SSH keys, installs and configures WireGuard. While Terraform handles provisioning (creating/destroying resources), Ansible handles configuration (what runs inside each resource).
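A condensed play covering those tasks could look like the following — host groups, package names, and ports are illustrative assumptions:

```yaml
# Sketch of a play for the tasks above; group and port values are examples.
- hosts: all
  become: true
  tasks:
    - name: Install base packages
      ansible.builtin.apt:
        name: [docker.io, wireguard, ufw]
        state: present
        update_cache: true

    - name: Allow SSH and the WireGuard port
      community.general.ufw:
        rule: allow
        port: "{{ item }}"
      loop: ["22", "51820"]

    - name: Enable the firewall, deny other inbound traffic
      community.general.ufw:
        state: enabled
        policy: deny
        direction: incoming
```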

The separation matters. Terraform doesn’t know how to install packages. Ansible doesn’t know how to create VMs. Each tool does what it does well, and the result is a provisioning pipeline that goes from “nothing” to “working environment” in an automated way:

Terraform (creates VM/LXC) → Ansible (configures OS, Docker, VPN) → Deploy (application)

What works and what doesn’t

Works well

  • Identical deploy process: staging and production use the same deploy flow. If the deploy works in staging, it works in production.
  • Database migrations: testing migrations with real (anonymized) data before running them in production avoids surprises.
  • Zero staging cost: the homelab hardware is already there. No additional monthly cost.
  • Fast iteration: VMs on Proxmox are created and destroyed in seconds. Wrong configuration? Destroy and recreate.

Doesn’t work as well

  • Network latency: the homelab sits on a residential network, so latency between staging and any external service differs from what production sees from a datacenter.
  • Availability: if your home power goes out, staging goes down with it. For staging, that’s acceptable. For production, it wouldn’t be.
  • SSL and domains: staging has no SSL certificate or public domain. Tests that depend on HTTPS or external webhooks need workarounds.
  • Scale: the homelab has limited resources. If the application needs realistic load testing, the homelab won’t reproduce production behavior.

None of these are deal-breakers for staging. The goal isn’t to replicate production with 100% fidelity — it’s to validate what breaks most often: deploys, migrations, and service configuration.


When this approach makes sense

This architecture works well in specific scenarios:

  • Small teams (1-5 people) where the cost of cloud staging is proportionally high
  • Monolithic applications or those with few services, where replicating the stack is feasible on limited hardware
  • Teams with infrastructure knowledge, capable of maintaining Terraform, Ansible, and Proxmox
  • Early-stage projects, where the investment in cloud staging isn’t justified yet

It doesn’t make sense when:

  • The application depends on managed cloud services (RDS, SQS, Lambda) that have no local equivalent
  • The team lacks infrastructure knowledge and the learning curve would be a bottleneck
  • The staging environment needs to be available 24/7 for automated CI/CD (though a UPS and a static IP solve part of that)

Final thoughts

Using a homelab as staging isn’t the ideal solution for everyone. But for small teams that need a test environment without increasing monthly infrastructure costs, it’s a pragmatic approach that works in practice.

The central point isn’t Proxmox, Terraform, or Ansible specifically. It’s the idea that staging doesn’t need to be a perfect copy of production to be useful. It needs to be good enough to validate what breaks most often — and cheap enough to not be cut in the next budget review.
