GPU & AI

This section is the home for GPU computing and AI infrastructure topics that are broader than any one platform.

That separation matters. Some guides will be specifically about running AI workloads on Proxmox, but the underlying ideas around GPUs, model serving, inference stacks, and local AI systems deserve a home of their own.

Planned Focus

GPU and AI compute fundamentals
Local model-serving patterns
Inference and tooling architecture
Performance, monitoring, and resource tradeoffs
Cross-platform notes that should not be trapped inside one hypervisor section

In This Section

GPUs For Local AI - why dedicated GPUs change what local AI feels like, how VRAM shapes model choice, and when single- versus dual-GPU design is actually justified.

Running This On Proxmox

When a guide is specifically about doing one of these things on Proxmox, it should live under Proxmox rather than being forced into this section just because GPUs are involved.

That means the conceptual side stays here, while the host-specific execution path can sit under Proxmox Workloads.

Useful entry points:

GPU Passthrough On Proxmox — the host-side NVIDIA setup and LXC device exposure layer.
Open WebUI And Ollama On Proxmox — the quickest way to get a usable local chat stack online.
Open WebUI Standalone Frontend On Proxmox — the browser-first split once inference already lives in dedicated Ollama or llama.cpp guests.
llama.cpp Inference On Proxmox — the single-model GGUF-native path when you want tighter runtime control.
llama.cpp Router Mode On Proxmox — the multi-model serving path once one model no longer feels like enough.
