GPU & AI
Foundations and practical notes for GPU computing, local AI infrastructure, and model-serving workflows.
Published October 23, 2024
This section is the home for GPU computing and AI infrastructure topics that are broader than any one platform.
That separation matters. Some guides are specifically about running AI workloads on Proxmox, but the underlying ideas around GPUs, model serving, inference stacks, and local AI systems deserve a clear home of their own.
Planned Focus
- GPU and AI compute fundamentals
- local model-serving patterns
- inference and tooling architecture
- performance, monitoring, and resource tradeoffs
- cross-platform notes that should not be trapped inside one hypervisor section
In This Section
- GPUs For Local AI - why dedicated GPUs change what local AI feels like, how VRAM shapes model choice, and when single- versus dual-GPU design is actually justified.
Running This On Proxmox
When a guide is specifically about doing one of these things on Proxmox, it should live under Proxmox rather than being forced into this section just because GPUs are involved.
That means the conceptual side stays here, while the host-specific execution path can sit under Proxmox Workloads.
Useful entry points:
- GPU Passthrough On Proxmox - the host-side NVIDIA setup and LXC device exposure layer.
- Open WebUI And Ollama On Proxmox - the quickest way to get a usable local chat stack online.
- Open WebUI Standalone Frontend On Proxmox - the browser-first split once inference already lives in dedicated Ollama or llama.cpp guests.
- llama.cpp Inference On Proxmox - the single-model GGUF-native path when you want tighter runtime control.
- llama.cpp Router Mode On Proxmox - the multi-model serving path once one model no longer feels like enough.