I really like Proxmox snapshots. Before a risky OS upgrade, a tricky config change, or a dist-upgrade on a VM I do not want to rebuild from scratch, taking a quick snapshot is a ten-second insurance policy. If something goes wrong, qm rollback puts the VM back exactly where it was. That is genuinely useful, and I rely on it often.
The problem starts when that same snapshot feature gets treated as the primary backup strategy. Snapshots and backups solve different problems, and the gap between them only shows up at the worst possible time.
What a snapshot actually is
A Proxmox snapshot does not copy the VM's disk. It freezes the current state and tracks every later change as a delta. The original data stays exactly where it was, on the same physical storage, on the same host node.
This is why snapshots are so fast to create and do not immediately consume much extra space. It is also why they will not help you if that storage or the host node itself suffers a hardware failure.
# Take a snapshot before a risky change
qm snapshot 101 pre-upgrade
# If it goes wrong, roll back instantly
qm rollback 101 pre-upgrade
If the upgrade only breaks something inside the VM, this is perfect. Thirty seconds later, you are back to a known-good state. But if the host's disk fails, or the motherboard dies and the host refuses to boot, the snapshot is gone along with everything else. It was never copied anywhere else.
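The snapshot-then-rollback routine above can be wrapped in a small helper. This is only a sketch under stated assumptions: with_snapshot and the QM variable are names made up for illustration, and the risky command is whatever you pass in. It uses only the qm snapshot and qm rollback subcommands already shown.

```shell
# Sketch: take a snapshot, run a risky command, roll back if it fails.
# QM is a hypothetical override so the logic can be dry-run off a Proxmox host.
QM="${QM:-qm}"

with_snapshot() {
    local vmid="$1" snap="$2"
    shift 2
    # Take the snapshot first; give up if that fails.
    "$QM" snapshot "$vmid" "$snap" || return 1
    # Run the risky command given as the remaining arguments.
    if "$@"; then
        echo "change succeeded; snapshot '$snap' is still there for manual cleanup"
    else
        echo "change failed; rolling VM $vmid back to '$snap'" >&2
        "$QM" rollback "$vmid" "$snap"
        return 1
    fi
}
```

A call might look like with_snapshot 101 pre-upgrade ./run-upgrade.sh, where run-upgrade.sh is a placeholder for your own script. The helper does nothing about the failure modes described above: the snapshot still lives on the same storage as the VM.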
What an actual backup needs
An actual backup requires the data to physically exist somewhere else. A problem with the VM, the host node, or the local storage should never be able to take out the backup as well.
Proxmox's built-in vzdump tool does this by writing a complete, independent copy of the VM to a separate target:
vzdump 101 --storage backup-nfs --mode snapshot --compress zstd
The --mode snapshot flag here is not the same thing as the snapshot you take with qm snapshot and roll back to. It tells vzdump to back up the VM while it keeps running, using snapshot-like semantics so the VM does not have to be stopped, but the output is a complete, standalone archive written to backup-nfs. That .vma.zst archive does not care whether the original disk still exists.
For anything beyond a single isolated host, I recommend running Proxmox Backup Server. It adds deduplication and fast incremental backups on top of this process. That means daily backups of a 500 GB VM do not have to send the full 500 GB over the network every night, yet each backup is still fully restorable and lives independently of the source node.
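One quick sanity check on a backup target is whether its directory sits on the same filesystem as the VM storage. The sketch below is an assumption-laden illustration: same_filesystem is a made-up helper, and the paths in the usage note are examples, not values read from your storage config. It only compares what df reports, so "different" is necessary but not sufficient: two filesystems can still share one physical disk.

```shell
# Sketch: report whether two paths live on the same mounted filesystem.
# Prints "same" or "different" based on the source and mount point from df -P.
same_filesystem() {
    local a b
    # Line 2 of df -P output describes the filesystem holding the path:
    # first field is the source device or export, last field is the mount point.
    a=$(df -P "$1" | awk 'NR==2 {print $1, $NF}')
    b=$(df -P "$2" | awk 'NR==2 {print $1, $NF}')
    if [ "$a" = "$b" ]; then
        echo same
    else
        echo different
    fi
}
```

For example, same_filesystem /var/lib/vz /mnt/backup-nfs/dump should print different if the dump directory really is an NFS mount. If it prints same, the "backup" is sitting next to the data it is supposed to protect.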
The test that actually matters
Just like with databases, the only way to know a Proxmox backup is good is to restore it. Ideally, you restore it onto different physical storage or a different Proxmox host than the original:
qmrestore /mnt/backup-nfs/dump/vzdump-qemu-101-*.vma.zst 201 --storage local-lvm
If that restore produces a VM that boots and looks right, the backup is real. If it fails, that is something you want to find out on a quiet Tuesday morning, not during a frantic 3 AM disaster recovery.
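One practical wrinkle with the restore command: the wildcard can match more than one archive once several backups have accumulated, and qmrestore expects a single archive. A minimal sketch of picking the newest one first follows. It assumes the usual vzdump naming with an embedded timestamp (for example vzdump-qemu-101-2024_03_05-02_00_00.vma.zst); latest_archive, restore_test, and the QMRESTORE override are hypothetical names used only for illustration.

```shell
# QMRESTORE is a hypothetical override so the logic can be dry-run off a Proxmox host.
QMRESTORE="${QMRESTORE:-qmrestore}"

# Print the newest vzdump archive for one VMID in a dump directory.
# Glob matches come back sorted by name, and the embedded timestamp sorts
# chronologically, so the last match is the newest archive.
latest_archive() {
    local dir="$1" vmid="$2" f newest=""
    for f in "$dir"/vzdump-qemu-"$vmid"-*.vma.zst; do
        [ -e "$f" ] && newest="$f"
    done
    # Fail (non-zero status) when nothing matched.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Restore the newest archive of a source VMID to a scratch VMID on a given storage.
restore_test() {
    local archive
    archive=$(latest_archive "$1" "$2") || { echo "no archive found" >&2; return 1; }
    "$QMRESTORE" "$archive" "$3" --storage "$4"
}
```

With the paths from the example above, restore_test /mnt/backup-nfs/dump 101 201 local-lvm restores the most recent backup of VM 101 as VM 201. Booting the restored VM and checking it by hand is still up to you.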
The split I use
I use snapshots exclusively for the "I am about to do something I might need to undo in the next ten minutes" case: risky OS upgrades, destructive software tests, or large config changes. They are cheap, fast, and exactly the right tool for that job.
I use vzdump writing to separate NFS storage, or a dedicated Proxmox Backup Server, for the "this VM must survive even if this physical server burns down" case. Those backups are scheduled, stored off the host, and tested with an actual restore now and then.
Both have a place. The mistake is letting the convenience of the first quietly take over the job of the second. It works fine right up until the day a hardware failure makes the difference obvious and expensive.