Troubleshooting and Performance
What to check when VMs or containers are slow, fail to boot, lose network access, or behave strangely under load.
Published May 19, 2024
When a guest feels wrong, pick the layer before you pick the fix. Throwing more cores at a storage problem is how people lose weekends.
If The Guest Feels Slow
Start with four questions:
- Is CPU actually busy?
- Is memory tight or swapping?
- Is disk I/O the real bottleneck?
- Is the problem really network latency in disguise?
Use host graphs and guest-side tools together. One without the other usually tells a flattering lie.
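One way to put numbers on the first three questions is Linux pressure stall information (PSI), which reports how long tasks were stalled waiting for CPU, memory, or I/O. The sketch below is a minimal example, assuming a Linux host or guest with PSI enabled (kernel 4.20 or newer). The function names and the 10 percent threshold are illustrative choices, not recommendations. PSI does not cover the fourth question, so network latency still has to be checked separately.

```python
from pathlib import Path


def parse_pressure(text):
    """Parse PSI text such as 'some avg10=1.50 avg60=0.80 avg300=0.20 total=123'.

    Returns a dict like {"some": {"avg10": 1.5, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def busiest_resource(samples, threshold=10.0):
    """Return the resource with the highest 'some avg10' stall percentage.

    `samples` maps a resource name to raw PSI text. Returns None when nothing
    reaches `threshold` (an arbitrary cutoff chosen for this sketch).
    """
    scores = {
        name: parse_pressure(text).get("some", {}).get("avg10", 0.0)
        for name, text in samples.items()
    }
    if not scores:
        return None
    name = max(scores, key=scores.get)
    return name if scores[name] >= threshold else None


def read_pressure(root="/proc/pressure"):
    """Read the cpu, memory, and io PSI files from a live system."""
    return {name: Path(root, name).read_text() for name in ("cpu", "memory", "io")}
```

Calling `busiest_resource(read_pressure())` on the host and again inside the guest gives two views of the same moment, which is the comparison the paragraph above asks for.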
If A VM Will Not Boot Cleanly
Check the boring causes first:
- storage unavailable or full
- wrong controller or firmware choice for the guest
- imported image expecting different hardware
- guest agent or boot order assumptions that changed during edits
This is also where old compatibility choices come back to collect rent.
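The first item on that list is the easiest to rule out with a script. The following is a minimal sketch, assuming the guest disks live under ordinary mounted paths on the host. The 90 percent limit and the function name are placeholders for illustration. Block or network backends need their own tooling.

```python
import shutil


def storage_problems(paths, limit=0.90, usage=shutil.disk_usage):
    """Return a human-readable note for each path that is missing or nearly full.

    `limit` is the used fraction that counts as "full enough to worry about".
    `usage` is injectable so the check can be exercised without real mounts.
    """
    problems = []
    for path in paths:
        try:
            stats = usage(path)
        except OSError as exc:
            # An unmounted or unreachable path is reported, not raised.
            problems.append(f"{path}: unavailable ({exc.strerror or exc})")
            continue
        fraction = stats.used / stats.total if stats.total else 1.0
        if fraction >= limit:
            problems.append(f"{path}: {fraction:.0%} used")
    return problems
```

An empty list only means the mounted paths look healthy. It says nothing about controller, firmware, or boot order, which still need a look at the VM configuration itself.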
If Networking Is Weird
Do not debug the network path from memory. Use the existing networking pages.
Start with:
They already cover the actual path better than a duplicate summary here would.
If A Container Refuses To Start
Containers fail for different reasons than VMs.
Common culprits are:
- mount or bind path problems
- permission issues with unprivileged mappings
- features that expect kernel behavior the host is not offering
- older distributions that do not play nicely with newer cgroup environments
Sometimes the right answer is not "keep fighting the container." Sometimes the workload simply belongs in a VM.
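For the cgroup item, it helps to know which layout the host actually offers before blaming the container. A minimal sketch, assuming a Linux host with cgroups mounted at the usual `/sys/fs/cgroup` location: on a unified (v2) host the file `cgroup.controllers` sits at the root of that mount, and on a hybrid host it sits under `unified/`. The function name and return labels are made up for this example.

```python
from pathlib import Path


def cgroup_layout(root="/sys/fs/cgroup"):
    """Classify the cgroup layout found under `root`.

    Returns "v2" for a unified hierarchy, "hybrid" for v1 with a v2 mount
    under unified/, "v1" when the directory exists without either marker,
    and "unknown" when the directory is missing.
    """
    root = Path(root)
    if (root / "cgroup.controllers").is_file():
        return "v2"
    if (root / "unified" / "cgroup.controllers").is_file():
        return "hybrid"
    if root.is_dir():
        return "v1"
    return "unknown"
```

A result of "v2" on the host combined with an older distribution in the container is a reasonable hint that the last item on the list applies, though it is not proof.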
If Snapshots Or Backups Behave Poorly
Check whether the guest agent is installed, enabled, and actually running.
On Windows, the guest agent freezes filesystems through VSS, so VSS-aware applications such as database servers take part in the freeze, and that interaction can get complicated. The guest agent is still useful, but test how the freeze behaves before enabling it on important database workloads.
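Inside a Linux guest, "enabled" and "actually running" are two separate questions. The sketch below assumes a systemd-based guest where the agent unit is named `qemu-guest-agent`, which is common but not universal. The function name and the `run` parameter are illustrative, and the exact strings systemctl reports vary by distribution.

```python
import subprocess


def agent_state(unit="qemu-guest-agent", run=subprocess.run):
    """Ask systemctl whether `unit` is enabled and whether it is active.

    Returns a dict such as {"enabled": "enabled", "active": "inactive"}.
    The unit name is an assumption; adjust it for the guest in question.
    """
    def ask(verb):
        try:
            done = run(["systemctl", verb, unit], capture_output=True, text=True)
        except FileNotFoundError:
            # No systemctl binary: not a systemd guest.
            return "systemctl-missing"
        return done.stdout.strip() or "unknown"

    return {"enabled": ask("is-enabled"), "active": ask("is-active")}
```

A guest that reports the unit as active can still be unreachable if the agent option is switched off on the VM side, so check both ends.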
If Disk Usage Never Seems To Shrink
Look for trim and discard gaps.
The guest can delete files all day long, but if discard is disabled or trim never runs, the backend may still treat the old blocks as occupied.
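A quick way to see the first half of that gap from inside a Linux guest is to ask sysfs whether each disk advertises discard at all. In this minimal sketch, a `discard_max_bytes` value of 0 means the device does not accept discard requests. The function name is an assumption for illustration, and the check says nothing about whether trim is ever scheduled to run.

```python
from pathlib import Path


def discard_support(sys_block="/sys/block"):
    """Map each block device name to True if it advertises discard support.

    Reads <sys_block>/<dev>/queue/discard_max_bytes; devices without that
    file are skipped.
    """
    report = {}
    for dev in sorted(Path(sys_block).iterdir()):
        limit = dev / "queue" / "discard_max_bytes"
        if limit.is_file():
            report[dev.name] = int(limit.read_text().strip()) > 0
    return report
```

If a disk shows `False`, look at the discard setting on the virtual disk first. If it shows `True` and space still never comes back, the likelier culprit is that nothing in the guest runs trim, for example through `fstrim`.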
Performance Knobs Worth Touching
These usually matter:
- VirtIO devices instead of legacy emulation
- VirtIO SCSI single plus IO Thread for busy disks
- CPU type chosen with migration requirements in mind
- multiqueue on VirtIO NICs only when the guest truly handles high packet rates
These are the ones to treat carefully:
- aggressive CPU pinning
- passthrough-heavy designs that quietly kill migration options
- complicated storage layouts introduced before you needed them
The Habit That Saves Time
When a guest misbehaves, change one thing, test it, and leave notes.
The lab gets easier the moment you stop trying to debug three layers at once.