GPU Passthrough On Proxmox

This is the platform-specific side of the GPU story.

The question here is no longer whether a dedicated GPU is worth having. The question is how to make Proxmox behave properly once the GPU exists: host driver, device exposure, and the specific container mechanics that make AI workloads feel native instead of awkward.

If you want the hardware and workload reasoning first, start with GPUs For Local AI.

Scope

This guide assumes:

a Proxmox host already exists
the machine includes at least one NVIDIA GPU intended for AI workloads
you want to expose that GPU into one or more LXC containers

If your homelab does not include a dedicated GPU, skip this page entirely.

If you are doing this during a PVE 9.2 upgrade, read PVE 9.2 Upgrade Runbook first. The generic examples below use the 580.x branch, but the PVE 9.2/kernel 7.0 host in this lab required NVIDIA 595.71.05 and also ran into the failure documented in Kernel 7.0 Boot Hang RCA.

Host Preparation

SSH into Proxmox and install the dependencies you need for the generic NVIDIA installer. NVIDIA's installer needs matching kernel headers plus a working build toolchain so it can build the kernel interface for the running kernel, and Proxmox ships those matching headers as pve-headers-$(uname -r).¹²

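ssh root@192.168.1.100  # Replace with your Proxmox host's address
 
# Update system (Proxmox recommends full-upgrade/dist-upgrade rather than plain upgrade)
apt update && apt full-upgrade -y
 
# Required for the NVIDIA .run installer (toolchain, make, DKMS),
# plus wget for the download and pciutils for lspci used later
apt install -y \
  build-essential \
  make \
  dkms \
  wget \
  pciutils
 
# Optional: CUDA sample / graphics build libraries and monitoring tools.
# The driver installer itself does not need these.
apt install -y \
  g++ \
  freeglut3-dev \
  libx11-dev \
  libxmu-dev \
  libxi-dev \
  libglu1-mesa-dev \
  libfreeimage-dev \
  libglfw3-dev \
  cmake \
  curl \
  libcurl4-openssl-dev \
  git \
  nano \
  htop \
  btop \
  nvtop \
  glances
 
# Install Proxmox-specific kernel headers for the running kernel
apt install -y pve-headers-$(uname -r)
 
# Rebuild the initramfs for the current kernel
update-initramfs -u
 
# Reboot
reboot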
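The installer builds against the running kernel, so it is worth confirming that a build tree exists for exactly that release before you run it. The following is a minimal sketch. It assumes the headers package exposes its tree at /lib/modules/<release>/build, which is the conventional location header packages symlink to. The function names and the optional root-prefix argument are hypothetical helpers for illustration.

```shell
# Hypothetical helpers: check for a kernel build tree matching a kernel release.
# Assumes header packages link their tree at /lib/modules/<release>/build.

# Print the expected build-tree path.
#   $1 = kernel release (defaults to the running kernel)
#   $2 = optional root prefix, only useful for testing
kernel_build_dir() {
  local release="${1:-$(uname -r)}"
  printf '%s/lib/modules/%s/build\n' "${2:-}" "$release"
}

# Succeed (and print the path) only if that build tree exists.
headers_ready() {
  local dir
  dir="$(kernel_build_dir "$@")"
  if [ -d "$dir" ]; then
    echo "ok: $dir"
    return 0
  fi
  echo "missing: $dir (install the matching pve-headers package)" >&2
  return 1
}

# Usage on the host, after installing pve-headers-$(uname -r):
#   headers_ready || echo "do not run the NVIDIA installer yet"
```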

Download And Install The NVIDIA Driver

Replace 580.126.09 below with the driver version you intend to standardize on. NVIDIA supports both distribution packages and the generic .run installer; this page uses the generic installer path so the same driver branch can be staged into containers later.²³

# Example version; replace with the branch you intend to standardize on
cd /tmp
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/580.126.09/NVIDIA-Linux-x86_64-580.126.09.run
 
chmod +x NVIDIA-Linux-x86_64-580.126.09.run
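The version string is repeated in every command on this page, so it can help to define it once. The following is a minimal sketch. NVIDIA_VERSION and the two helper functions are hypothetical names, and the URL layout simply mirrors the download link above.

```shell
# Hypothetical helpers: derive the installer filename and URL from one version string.
# The URL layout copies the us.download.nvidia.com link used on this page.
NVIDIA_VERSION="${NVIDIA_VERSION:-580.126.09}"

nvidia_run_name() {
  printf 'NVIDIA-Linux-x86_64-%s.run\n' "$1"
}

nvidia_run_url() {
  printf 'https://us.download.nvidia.com/XFree86/Linux-x86_64/%s/%s\n' \
    "$1" "$(nvidia_run_name "$1")"
}

# Usage:
#   cd /tmp
#   wget "$(nvidia_run_url "$NVIDIA_VERSION")"
#   chmod +x "$(nvidia_run_name "$NVIDIA_VERSION")"
```

With this in place, switching branches means changing one variable. You still need to confirm that the chosen version is actually published at that path.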

Install on the host:

# Install driver with DKMS support
./NVIDIA-Linux-x86_64-580.126.09.run --dkms
 
# Installation prompts:
# - Register DKMS module: YES
# - Install 32-bit libraries: NO (optional)
# - Modify /etc/X11/xorg.conf: NO (not needed for server)
# - Update initramfs: YES
 
# Verify installation
nvidia-smi
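A working nvidia-smi proves the driver loads, but not that it is the version you meant to standardize on. The following minimal sketch compares each GPU's reported version against the intended one. The function name is hypothetical, and it expects the output of nvidia-smi's driver_version query on stdin.

```shell
# Hypothetical helper: succeed only if every line on stdin equals the expected
# driver version. Intended input (one line per GPU):
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
driver_version_matches() {
  local expected="$1" line seen=0
  while IFS= read -r line; do
    # Strip whitespace, including a trailing carriage return
    line="$(printf '%s' "$line" | tr -d '[:space:]')"
    if [ -z "$line" ]; then
      continue
    fi
    if [ "$line" != "$expected" ]; then
      echo "mismatch: got $line, expected $expected" >&2
      return 1
    fi
    seen=1
  done
  # No output at all (for example, nvidia-smi failed) counts as failure
  [ "$seen" -eq 1 ]
}
```

Usage: `nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_version_matches 580.126.09`. It is useful to record this on the host now, because any container userspace install later has to match this exact version.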

If You Hit The `.run` Versus Debian Package Conflict

If you already installed Debian-packaged NVIDIA components and now want to use the generic installer, remove the packaged stack first and reboot cleanly before retrying so one packaging path owns the installed files.²³

# 1. Remove all Debian-packaged NVIDIA drivers
apt purge -y 'nvidia-*' 'libnvidia-*'
apt autoremove -y
 
# 2. Verify nothing is left
dpkg -l | grep -i nvidia
# Should return no results
 
# 3. Reboot to clear any loaded modules
reboot
 
# 4. After reboot, install with the .run file
#    (/tmp may be emptied on reboot; download and chmod +x the .run file again if it is gone)
cd /tmp
./NVIDIA-Linux-x86_64-580.126.09.run --dkms
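If the grep in step 2 does return lines, the first column tells you what is left. `ii` means the package is still installed. `rc` means it was removed but its configuration files remain, which `apt purge` clears. The following minimal sketch filters `dpkg -l` output down to NVIDIA-named packages and their state. The function name is hypothetical, and matching on "nvidia" in the package name is an assumption.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print "<state> <package>"
# for every package whose name contains "nvidia" (case-insensitive).
# State codes are two letters: desired (u/i/r/p/h) + status (n/i/c/U/F/H/W/t).
# "un" (unknown / not installed) entries are skipped; header lines never match.
leftover_nvidia_pkgs() {
  awk '$1 ~ /^[uirph][nicUFHWt]$/ && $1 != "un" && tolower($2) ~ /nvidia/ { print $1, $2 }'
}

# Usage:
#   dpkg -l | leftover_nvidia_pkgs
```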

Host Verification

At this point nvidia-smi should work on the Proxmox host and the device nodes should exist. NVIDIA documents the Linux device nodes as /dev/nvidia[minor number] for each GPU.⁴

# Verify the driver is healthy
nvidia-smi
 
# Verify the device files exist
ls -la /dev/nvidia*
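A long `ls` listing makes it easy to overlook a node that is absent. The following minimal sketch reports only the expected nodes that are missing. The node list in the usage comment is an assumption for a single-GPU host, and NODE_ROOT is a hypothetical test hook. Note that /dev/nvidia-uvm can be absent until the nvidia-uvm module is loaded.

```shell
# Hypothetical helper: print "missing: <node>" for each expected node that does
# not exist, and return 1 if any were missing. NODE_ROOT (optional) prefixes
# every path, which is only useful for testing.
missing_nvidia_nodes() {
  local root="${NODE_ROOT:-}" node missing=0
  for node in "$@"; do
    if [ ! -e "$root$node" ]; then
      echo "missing: $node"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the host (single-GPU example list):
#   missing_nvidia_nodes /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools
```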

Persistence mode, power limits, and boot-time reapplication are covered separately in GPU Power Management On Proxmox.

GPU Passthrough To LXC Containers

LXC containers do not inherit direct GPU access automatically. Proxmox containers use the host kernel directly, but host resources still have to be exposed intentionally. For device nodes, Proxmox documents dev[n] as the native config key for passing a host device into a container.⁵

That is the whole point of the passthrough step below.

Push The Driver Into The Container

pct push 100 /tmp/NVIDIA-Linux-x86_64-580.126.09.run /tmp/NVIDIA-Linux-x86_64-580.126.09.run

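Inside the container, install the userspace driver. It must be the exact same version as the host's kernel module; a mismatch typically shows up as "Failed to initialize NVML: Driver/library version mismatch" when you run nvidia-smi.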

# Enter the console of the container the driver was pushed to (ID 100 here)
pct enter 100
 
cd /tmp
 
chmod +x NVIDIA-Linux-x86_64-580.126.09.run
 
# Install only the userspace libraries and tools; the kernel module comes from the host
./NVIDIA-Linux-x86_64-580.126.09.run --no-kernel-module

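Alternative invocation that also skips the OpenGL and GLVND client libraries, which a headless compute container does not need. Use it if the installer's warnings about those components are noisy: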

./NVIDIA-Linux-x86_64-580.126.09.run \
  --no-kernel-module \
  --no-opengl-files \
  --no-glvnd-egl-client \
  --no-glvnd-glx-client

Identify GPU Devices On The Host

# List GPU device files
ls -la /dev/nvidia*

Add Devices To The Container Config

Use the dev[n]: /path/to/device entries in /etc/pve/lxc/<CTID>.conf to map the NVIDIA device nodes you actually need into the container.⁵

# Edit container config directly
nano /etc/pve/lxc/100.conf
 
# Add these lines at the end (replace 100 with your container ID):
 
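# GPU device passthrough (Proxmox native syntax)
# Exposes individual GPU device files to the container. List only nodes that
# exist on the host (see ls -la /dev/nvidia* above); a missing path can
# prevent the container from starting.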
dev0: /dev/nvidia0
dev1: /dev/nvidiactl
dev2: /dev/nvidia-uvm
dev3: /dev/nvidia-uvm-tools
dev4: /dev/nvidia-caps/nvidia-cap1
dev5: /dev/nvidia-caps/nvidia-cap2
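Instead of typing dev[n] entries by hand, you can generate them from the nodes that actually exist on the host. The following is a minimal sketch. nvidia_dev_lines and DEV_ROOT are hypothetical names. The starting index assumes the container has no other dev[n] entries, so start higher if it does.

```shell
# Hypothetical helper: print "devN: <path>" lines for every listed node that
# exists, numbering from $1. Missing nodes are reported on stderr and skipped.
# DEV_ROOT (optional) prefixes paths for the existence check only (testing).
nvidia_dev_lines() {
  local start="$1" n root="${DEV_ROOT:-}" node
  shift
  n="$start"
  for node in "$@"; do
    if [ -e "$root$node" ]; then
      printf 'dev%s: %s\n' "$n" "$node"
      n=$((n + 1))
    else
      echo "skipping $node (not present on host)" >&2
    fi
  done
}

# Usage on the host for GPU 0:
#   nvidia_dev_lines 0 /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm \
#     /dev/nvidia-uvm-tools /dev/nvidia-caps/nvidia-cap1 /dev/nvidia-caps/nvidia-cap2
```

Appending the output with `>> /etc/pve/lxc/100.conf` adds the lines to the config, but review the file afterwards. For a container that should own a different card, pass /dev/nvidia1 in place of /dev/nvidia0.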

Restart the container and verify:

# Reboot the container so the new dev[n] entries are applied
pct reboot 100  # Replace 100 with your container ID

# Enter the container console
pct enter 100
 
# Inside the container: if the userspace driver is not installed yet, install
# the same version as the host, without kernel modules (the host provides them)
cd /tmp
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/580.126.09/NVIDIA-Linux-x86_64-580.126.09.run
chmod +x NVIDIA-Linux-x86_64-580.126.09.run
./NVIDIA-Linux-x86_64-580.126.09.run --no-kernel-module
 
# Verify the GPU is accessible
nvidia-smi
 
# Exit the container
exit

Troubleshooting GPU Passthrough

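# If GPU not detected in container:
 
# 1. Verify GPU still works on host
nvidia-smi  # On Proxmox host
 
# 2. Check the container config contains dev[n] entries for the NVIDIA nodes
grep nvidia /etc/pve/lxc/100.conf
 
# 3. Verify device files exist in the container
#    (quote the glob so it expands inside the container, not on the host)
pct exec 100 -- sh -c 'ls -la /dev/nvidia*'
 
# 4. Check kernel messages for NVIDIA errors
#    (containers share the host kernel, so run this on the host)
dmesg | grep -i nvidia | tail -20
 
# 5. Fully stop and start the container (not just a reboot from inside it)
pct stop 100
pct start 100
 
# 6. List the GPUs the driver can see from inside the container
pct exec 100 -- nvidia-smi -L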

Dual GPU Setup

If a second identical GPU is added, Proxmox and the NVIDIA stack will treat it as a second independent device. No special multi-GPU mode is required just to make both cards visible.

Verify Both GPUs

# List all NVIDIA devices on the PCI bus
lspci | grep -i nvidia
 
# Confirm driver sees both GPUs
nvidia-smi
 
# List GPU device files — you should now see nvidia0 and nvidia1
ls -la /dev/nvidia*
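`lspci | grep -i nvidia` usually prints two lines per consumer card, because the card's HDMI audio function appears as a separate NVIDIA "Audio device". The following minimal sketch counts only the VGA or 3D controller functions and compares that with the GPUs `nvidia-smi -L` enumerates. The function names are hypothetical.

```shell
# Hypothetical helpers for comparing PCI-visible GPUs with driver-visible GPUs.

# Read `lspci` output on stdin; count NVIDIA VGA/3D controller functions only.
count_pci_gpus() {
  grep -i 'nvidia' | grep -cE 'VGA compatible controller|3D controller' || true
}

# Read `nvidia-smi -L` output on stdin; count "GPU N:" lines.
count_driver_gpus() {
  grep -cE '^GPU [0-9]+:' || true
}

# Usage on the host:
#   pci=$(lspci | count_pci_gpus); drv=$(nvidia-smi -L | count_driver_gpus)
#   [ "$pci" = "$drv" ] || echo "lspci sees $pci GPUs but the driver sees $drv"
```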

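Verify IOMMU Groups

IOMMU groups only matter if you later hand a whole card to a VM through VFIO. LXC device-node passthrough uses the host driver and does not depend on them, but the loop below is still a quick way to see how each card is grouped.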

# Check IOMMU groups for all PCI devices
find /sys/kernel/iommu_groups -type l | sort -V | while read f; do
  device=$(basename "$f")
  group=$(echo "$f" | grep -oP 'iommu_groups/\K[0-9]+')
  echo "Group $group: $(lspci -nns $device 2>/dev/null)"
done | grep -i nvidia

Set Runtime Power Policy Before Splitting Workloads

Once both GPUs are visible to the host, decide how you want them to behave under sustained load before you start assigning them to services.

For persistence mode, boot-time reapplication, and the 250 W dual-GPU tradeoffs on a 1200 W PSU, continue to GPU Power Management On Proxmox.

Strategy A: One GPU Per Container

Container 100:

# Edit container 100 config
nano /etc/pve/lxc/100.conf
 
# Add GPU 0 plus the shared control, UVM, and caps nodes:
dev0: /dev/nvidia0
dev1: /dev/nvidiactl
dev2: /dev/nvidia-uvm
dev3: /dev/nvidia-uvm-tools
dev4: /dev/nvidia-caps/nvidia-cap1
dev5: /dev/nvidia-caps/nvidia-cap2

Container 101:

# Edit container 101 config
nano /etc/pve/lxc/101.conf
 
# Add GPU 1 (nvidia1) plus the shared control, UVM, and caps nodes:
dev0: /dev/nvidia1
dev1: /dev/nvidiactl
dev2: /dev/nvidia-uvm
dev3: /dev/nvidia-uvm-tools
dev4: /dev/nvidia-caps/nvidia-cap1
dev5: /dev/nvidia-caps/nvidia-cap2

Verify:

# Container 100
pct exec 100 nvidia-smi
 
# Container 101
pct exec 101 nvidia-smi

Strategy B: Both GPUs In One Container

# Edit container 100 config
nano /etc/pve/lxc/100.conf
 
# Pass both GPUs to the same container:
dev0: /dev/nvidia0
dev1: /dev/nvidia1
dev2: /dev/nvidiactl
dev3: /dev/nvidia-uvm
dev4: /dev/nvidia-uvm-tools
dev5: /dev/nvidia-caps/nvidia-cap1
dev6: /dev/nvidia-caps/nvidia-cap2

# Inside container 100
nvidia-smi

Restrict Processes With `CUDA_VISIBLE_DEVICES`

Once both GPUs are visible inside one container, NVIDIA documents CUDA_VISIBLE_DEVICES as the standard way to restrict which GPU indices or UUIDs a CUDA application can see. If you need stable targeting across reboots, prefer UUIDs over indices.⁶⁷

# Pin a process to GPU 0 only
CUDA_VISIBLE_DEVICES=0 ollama serve
 
# Pin a process to GPU 1 only
CUDA_VISIBLE_DEVICES=1 ./llama-server -m model.gguf
 
# Use both GPUs explicitly (same effect as leaving the variable unset)
CUDA_VISIBLE_DEVICES=0,1 python train.py
 
# Check which GPU a running process is using
nvidia-smi pmon -s u  # Real-time per-GPU process monitor
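Pinning by UUID avoids depending on index order. This matters because CUDA's default enumeration order (CUDA_DEVICE_ORDER=FASTEST_FIRST) is not guaranteed to match nvidia-smi's PCI-ordered indices; setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes them agree. The following minimal sketch resolves an nvidia-smi index to its UUID. gpu_uuid_for_index is a hypothetical helper.

```shell
# Hypothetical helper: print the UUID for nvidia-smi index $1.
# Reads the output of:
#   nvidia-smi --query-gpu=index,uuid --format=csv,noheader
# Exits 1 if the index is not present.
gpu_uuid_for_index() {
  awk -F', *' -v want="$1" '$1 == want { print $2; found = 1 } END { exit !found }'
}

# Usage inside the container:
#   uuid=$(nvidia-smi --query-gpu=index,uuid --format=csv,noheader | gpu_uuid_for_index 1)
#   CUDA_VISIBLE_DEVICES="$uuid" ./llama-server -m model.gguf
```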

Common Recovery Paths

Secure Boot Blocking The Driver

Re-running the installer without changing Secure Boot state will not fix an unsigned-module failure. On systems that enforce signed kernel modules, either enroll a trusted key and sign the NVIDIA module or disable Secure Boot before reinstalling.⁸

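# Remove a failed .run installation, if an earlier attempt left nvidia-uninstall behind
command -v nvidia-uninstall >/dev/null && nvidia-uninstall
 
# Remove any Debian-packaged NVIDIA driver (quote the globs so the shell does not expand them)
apt remove --purge 'nvidia-driver-*' 'nvidia-dkms-*' 2>/dev/null
apt autoremove -y
apt clean
 
# Update system
apt update && apt full-upgrade -y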
 
# Reinstall kernel headers
apt install -y pve-headers-$(uname -r)
 
# Download and install fresh driver
cd /tmp
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/580.126.09/NVIDIA-Linux-x86_64-580.126.09.run
chmod +x NVIDIA-Linux-x86_64-580.126.09.run
 
# Install with DKMS after Secure Boot is handled
./NVIDIA-Linux-x86_64-580.126.09.run --dkms
 
# Verify successful installation
nvidia-smi
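Before re-running the installer, it is worth confirming whether Secure Boot is actually on. The following minimal sketch reads the UEFI SecureBoot variable directly. It assumes a UEFI host with efivarfs mounted at /sys/firmware/efi/efivars, where each file starts with 4 attribute bytes followed by the value. The function name is hypothetical; `mokutil --sb-state` reports the same state if mokutil is installed.

```shell
# Hypothetical helper: print "enabled", "disabled", or "unknown".
# $1 = optional path to the SecureBoot efivars file (defaults to the real one).
# Byte layout assumed: 4 bytes of attributes, then 1 value byte (1 = enabled).
SB_VAR_DEFAULT=/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-e98032b8c39c

secure_boot_state() {
  local var="${1:-$SB_VAR_DEFAULT}" byte
  if [ ! -r "$var" ]; then
    echo unknown   # legacy BIOS boot, or efivarfs not mounted
    return 0
  fi
  # Skip the 4 attribute bytes and read the value byte as an unsigned integer
  byte=$(od -An -t u1 -j 4 -N 1 "$var" | tr -d ' \n')
  case "$byte" in
    1) echo enabled ;;
    0) echo disabled ;;
    *) echo unknown ;;
  esac
}
```

If this prints `enabled`, sign the module and enroll the key (MOK), or disable Secure Boot, before running the installer again.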

DKMS Mismatch After A Kernel Update

NVIDIA notes that DKMS usually smooths over minor kernel updates, but a major kernel jump can still require a newer driver branch if the kernel APIs changed.⁹

# Install headers for the NEW kernel
apt install -y pve-headers-$(uname -r)
 
# Rebuild NVIDIA module for new kernel (replace version if different)
dkms install nvidia/580.126.09 -k $(uname -r)
 
# Verify it installed
dkms status
 
# Load the modules
modprobe nvidia
modprobe nvidia-uvm
modprobe nvidia-modeset
 
# Verify devices appeared
ls -la /dev/nvidia*
 
# Restore persistence settings (script from GPU Power Management On Proxmox)
/usr/local/bin/nvidia-persist.sh
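`dkms status` lists every registered module, so on a host with several kernels its output is easy to misread. The following minimal sketch succeeds only if the NVIDIA module is installed for one specific kernel. The function name is hypothetical. The parser assumes the `nvidia/<version>, <kernel>, <arch>: installed` line format of current DKMS releases. Older releases used a comma in place of the slash, and the parser accepts that form too.

```shell
# Hypothetical helper: read `dkms status` on stdin and exit 0 only if
# nvidia/<version> is installed for <kernel>.
#   $1 = driver version, $2 = kernel release
dkms_nvidia_installed() {
  local version="$1" kernel="$2"
  awk -v v="$version" -v k="$kernel" '
    {
      line = $0
      gsub(/\//, ", ", line)        # normalize "nvidia/VER" to "nvidia, VER"
      split(line, f, /, */)         # f[1]=module f[2]=version f[3]=kernel
      if (f[1] == "nvidia" && f[2] == v && f[3] == k && line ~ /: installed/) ok = 1
    }
    END { exit !ok }'
}

# Usage on the host after the rebuild:
#   dkms status | dkms_nvidia_installed 580.126.09 "$(uname -r)" && echo "module ready"
```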

GPU Running Too Hot

nvidia-smi -pl changes the software power limit within the min/max range the GPU reports.¹⁰

# Check current temperature
nvidia-smi --query-gpu=temperature.gpu --format=csv
 
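# Check the allowed range first (values in watts)
nvidia-smi --query-gpu=power.limit,power.min_limit,power.max_limit --format=csv
 
# Reduce the power limit on GPU 0 to lower heat (example value; it must fall
# within the reported min/max, requires root, and does not persist across reboots)
nvidia-smi -i 0 -pl 200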
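`nvidia-smi -pl` only accepts values inside the range the card reports, so a small guard can clamp a requested cap before applying it. The following is a minimal sketch. clamp_power_limit is hypothetical, and it reads one line of the power.min_limit,power.max_limit query with units stripped.

```shell
# Hypothetical helper: clamp requested watts ($1) into the card's range.
# Reads one line of:
#   nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit --format=csv,noheader,nounits
# Exits 1 if the card does not report a usable maximum (e.g. "[N/A]").
clamp_power_limit() {
  awk -F', *' -v want="$1" '
    NR == 1 {
      min = int($1); if (min < $1 + 0) min++   # round the minimum up
      max = int($2)                            # round the maximum down
      if (max <= 0) exit 1
      v = want + 0
      if (v < min) v = min
      if (v > max) v = max
      print v
    }'
}

# Usage on the host:
#   limit=$(nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit \
#     --format=csv,noheader,nounits | clamp_power_limit 200) && nvidia-smi -i 0 -pl "$limit"
```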

GPUs For Local AI — the conceptual side of model sizing, VRAM, and workload tradeoffs.
GPU Power Management On Proxmox — persistence mode, boot-time reapplication, and the power-cap tradeoffs once the host can see the cards.
Open WebUI And Ollama On Proxmox — the quickest concrete GPU-backed workload that sits on top of this host configuration.
llama.cpp Inference On Proxmox — the lower-level single-model inference stack once you want more control than Ollama offers.
llama.cpp Router Mode On Proxmox — the multi-model serving path that leans hardest on clean GPU ownership and VRAM discipline.
Update And Maintenance — how GPU containers and driver versions fit into the broader maintenance window.
Monitoring And Alerts — GPU telemetry and alerting once the host is running real workloads.

Notes

1. NVIDIA's installer compiles a kernel interface specifically for the running kernel and requires the matching kernel source or headers plus a linker to do so: Installing the NVIDIA Driver.
2. NVIDIA's current driver installation guide explicitly frames the choice between distribution-specific packages and the generic NVIDIA installer as a deployment decision, and calls out kernel headers, module support, and verification as part of that path: NVIDIA Driver Installation Guide.
3. NVIDIA's generic installer is the .run package, which extracts and launches nvidia-installer; that installer also offers DKMS registration when DKMS is present on the system: Installing the NVIDIA Driver.
4. NVIDIA documents nvidia-smi as the verification tool for driver state, documents GPU UUID reporting, and documents the Linux device node naming as /dev/nvidia[minor number]: NVIDIA System Management Interface.
5. Proxmox documents that containers use the host kernel directly, and pct.conf documents dev[n] as the container config key used to pass specific device nodes through to a container: Proxmox Container Toolkit, pct.conf - Proxmox VE Container Configuration.
6. NVIDIA documents CUDA_VISIBLE_DEVICES as the environment variable that controls which GPU indices or UUIDs are visible to a CUDA application: CUDA_VISIBLE_DEVICES.
7. NVIDIA recommends UUID or PCI bus ID over numeric indices when you need stable targeting, because enumeration order is not guaranteed to remain consistent across reboots: NVIDIA System Management Interface.
8. NVIDIA documents that Secure Boot systems may require signed kernel modules, and describes module signing plus MOK enrollment as the supported recovery path when unsigned modules cannot be loaded: Installing the NVIDIA Driver.
9. NVIDIA documents that DKMS typically rebuilds registered modules for minor kernel updates, but warns that a major kernel update may require a newer NVIDIA driver because of kernel API compatibility changes: Installing the NVIDIA Driver.
10. NVIDIA documents nvidia-smi -pl as the software power-limit control and reports the default, minimum, and maximum supported power limits for supported GPUs: NVIDIA System Management Interface.
