Forty checks across six domains, written from real deployments and real
incident reviews. It covers the stack this community actually runs —
Ollama, llama.cpp, vLLM, Open WebUI, Chroma, Qdrant — not an
abstract "enterprise AI platform."
No email gate. No PDF funnel. Print it (Cmd/Ctrl+P renders
a clean black-on-white copy), share it, or open a PR if I got something wrong.
If your team clears all forty, you're ahead of most production deployments I've reviewed.
Most local-AI exposure starts here: an inference API bound to all interfaces with no authentication, because that's what the quickstart did.
A-01
A-02
A-03
A-04
A-05
A-06
A-07
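As an illustration of the exposure described above, here is a minimal sketch that classifies a bind string as loopback-only or reachable from elsewhere. The function name `is_exposed` is hypothetical. The input is assumed to be a `host[:port]` value such as the one you pass in Ollama's `OLLAMA_HOST` or a server's `--host` flag. An empty host or an unresolved hostname is treated as exposed, which is a deliberately conservative assumption.

```python
import ipaddress


def is_exposed(bind: str) -> bool:
    """Return True if a host[:port] bind string listens beyond loopback.

    Hypothetical helper: it parses the string only and does not
    inspect live sockets.
    """
    host = bind.strip()
    # Drop an optional scheme such as http://
    if "://" in host:
        host = host.split("://", 1)[1]
    if host.startswith("["):
        # Bracketed IPv6 with port, e.g. [::1]:11434
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # IPv4 or hostname with port, e.g. 127.0.0.1:11434
        host = host.split(":", 1)[0]
    if host == "localhost":
        return False
    if host == "":
        # Assumption: a bare ":port" means all interfaces.
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Assumption: an unresolved hostname is treated as exposed.
        return True
    return not addr.is_loopback


if __name__ == "__main__":
    for value in ("127.0.0.1:11434", "0.0.0.0:11434", "[::1]:8000", "::"):
        print(value, "exposed" if is_exposed(value) else "loopback only")
```

A loopback bind is not a substitute for authentication; it only limits who can reach the port in the first place. Anything this flags as exposed is a candidate for an authenticating reverse proxy or a firewall rule.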
§B — RAG & Vector Stores
Your corpus is the crown jewels, embedded.
RAG quietly concentrates your most sensitive documents into one queryable store — then answers anyone who can reach it.
B-01
B-02
B-03
B-04
B-05
B-06
B-07
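One way to keep a RAG store from answering anyone who can reach it is to filter retrieved chunks against the caller's permissions before they enter the prompt. The sketch below assumes each chunk carries an `allowed_groups` label copied from the source document at ingest time. The `Chunk` type and `authorized_chunks` function are hypothetical names, not part of Chroma, Qdrant, or any other library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage plus the groups allowed to read its source."""
    text: str
    allowed_groups: frozenset[str]


def authorized_chunks(hits: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only the chunks that share at least one group with the caller."""
    return [c for c in hits if c.allowed_groups & user_groups]


if __name__ == "__main__":
    hits = [
        Chunk("Q3 payroll summary", frozenset({"hr"})),
        Chunk("VPN setup guide", frozenset({"hr", "engineering"})),
    ]
    for chunk in authorized_chunks(hits, {"engineering"}):
        print(chunk.text)
```

Filtering after retrieval is the simplest form to show. Where the vector store supports metadata filters at query time, applying the same rule there avoids pulling unauthorized text into application memory at all.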
§C — Model Artifacts
Weights are executables with better PR.
You would never run a random binary from the internet as root. A model file can be the same thing wearing a lab coat.
C-01
C-02
C-03
C-04
C-05
C-06
C-07
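To make the "executable" point concrete: pickle-based model files can import and call arbitrary Python when loaded. The sketch below walks a pickle stream with the standard library's `pickletools.genops`, without loading it, and reports opcodes that import or call objects. The `risky_opcodes` name and the opcode list are my own choices for illustration. Legitimate checkpoints also use these opcodes to rebuild tensors, so a hit means "review the imports", not "malicious".

```python
import pickle
import pickletools

# Opcodes that import a global or invoke a callable during unpickling.
RISKY = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def risky_opcodes(data: bytes) -> set[str]:
    """Return the names of risky opcodes found in a pickle byte stream.

    The stream is only parsed, never executed.
    """
    return {op.name for op, _arg, _pos in pickletools.genops(data) if op.name in RISKY}


if __name__ == "__main__":
    plain = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
    print(sorted(risky_opcodes(plain)))
```

A tensor-only format that carries no code path, such as safetensors, removes this class of problem instead of trying to detect it.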
§D — Host & Network
The GPU box is a high-value target.
It holds your models, your corpus, and often your credentials — and it was probably set up in a hurry.
D-01
D-02
D-03
D-04
D-05
D-06
§E — Access & Usage
Know who asked, and what for.
Private AI removes the vendor's abuse controls. Whatever governance you want now has to be yours.
E-01
E-02
E-03
E-04
E-05
E-06
§F — Evidence & Response
If you can't prove it, it didn't happen.
The auditor and the attacker are both coming eventually. These checks decide whether you have answers or apologies.
F-01
F-02
F-03
F-04
F-05
F-06
F-07
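Proving what happened is easier when the log itself resists quiet edits. Here is a minimal sketch of a hash-chained audit log, where each entry's hash covers the previous entry's hash. The entry layout, the `GENESIS` constant, and the function names are assumptions for illustration. A real deployment would also ship entries off the host, since an attacker with write access could rebuild the whole chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event body."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Recompute every link; an edited or removed entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log: list[dict] = []
    append_entry(log, {"user": "alice", "model": "example-model", "action": "chat"})
    append_entry(log, {"user": "bob", "model": "example-model", "action": "embed"})
    print(verify(log))
```

Storing a hash of each prompt rather than the prompt text is one option when the log must not become a second copy of sensitive data.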
Colophon
Written by Joey Victorino, co-founder of Qompute AI. This document is free
to share and reproduce with attribution. Corrections and additions are
welcome: the stack moves fast and so should this list.
If you're deploying at a scale where a mistake is expensive, that's the work I do.