Building a Hybrid LLM Platform on EKS
Across this blog we keep referring to a hybrid LLM platform — frontier models for the hard reasoning, self-hosted open-source models for the high-volume work, all on Kubernetes. This series builds it from an empty AWS account to a working inference service, one layer at a time, as reproducible AWS CDK infrastructure you can deploy and tear down yourself.
The Target Architecture
                      ┌─────────────────────────┐
 client requests ───► │  ALB (public subnets)   │
                      └────────────┬────────────┘
                                   │
                      ┌────────────▼────────────┐
                      │ hybrid router / gateway │  ← cloud vs. local
                      │     (CPU node pool)     │
                      └──────┬────────────┬─────┘
                             │            │
             frontier API    │            │  local inference
             (egress via     │            ▼
              NAT)           │       ┌──────────────────────┐
                             ▼       │  vLLM model servers  │
                       ┌──────────┐  │   (GPU node pool)    │
                       │ Claude / │  └──────────────────────┘
                       │   GPT    │
                       └──────────┘

all of it on EKS, in private subnets, observed + autoscaled
The Parts
Each part deploys cleanly on its own. Published parts link to the full walkthrough; the rest are on the way.
Serving Local Models with vLLM
Coming soon. Deploying the self-hosted inference layer: vLLM model servers, weight loading, and request-based autoscaling so GPU capacity follows demand.
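Until that part is published, request-based autoscaling can be pictured with a minimal sketch. It assumes the scaling signal is the number of in-flight requests and that each replica has a target concurrency; the function name, parameters, and defaults are hypothetical and are not taken from the series.

```python
import math


def desired_replicas(in_flight_requests: int,
                     target_per_replica: int,
                     min_replicas: int = 1,
                     max_replicas: int = 8) -> int:
    """Return how many model-server replicas to run for the current load.

    All names and default values here are illustrative assumptions.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Enough replicas that each handles at most target_per_replica requests.
    needed = math.ceil(in_flight_requests / target_per_replica)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, needed))
```

The floor keeps one replica running when traffic is idle, and the ceiling caps GPU spend during a spike. Whether the floor should be zero depends on how much cold-start latency the workload can tolerate.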
The Hybrid Router
Coming soon. The gateway that makes it hybrid: it routes each request either to a frontier model for hard reasoning or to a local model for high-volume execution work.
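As a rough illustration of the routing decision, here is a minimal sketch. It assumes the caller can flag requests that need hard reasoning and that very long prompts are also sent to the frontier model; the `Request` type, the flag, and the length threshold are hypothetical, and the real router may use a different policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    prompt: str
    needs_reasoning: bool = False  # hypothetical flag set by the caller


# Hypothetical threshold: longer prompts are sent to the frontier model.
MAX_LOCAL_PROMPT_CHARS = 4000


def choose_backend(request: Request) -> str:
    """Return "frontier" or "local" for a single request."""
    if request.needs_reasoning:
        return "frontier"
    if len(request.prompt) > MAX_LOCAL_PROMPT_CHARS:
        return "frontier"
    # Default: high-volume work stays on the self-hosted models.
    return "local"
```

Defaulting to the local backend reflects the split described above, where the self-hosted models absorb the bulk of traffic and the frontier API is the exception.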
Observability & Cost Telemetry
Coming soon. Wiring observability into the platform: OpenTelemetry traces through the router, Prometheus and Grafana for GPU and vLLM metrics, and Langfuse for per-request token and cost telemetry.
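Per-request cost telemetry comes down to multiplying token counts by a price per token. The sketch below shows that arithmetic only. The prices are placeholders rather than real vendor or GPU costs, and the function is hypothetical, not part of Langfuse or any other tool named above.

```python
from decimal import Decimal

# Placeholder prices in USD per million tokens; NOT real pricing.
PRICES = {
    "frontier": {"input": Decimal("3.00"), "output": Decimal("15.00")},
    "local": {"input": Decimal("0.20"), "output": Decimal("0.20")},
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(backend: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request on the given backend."""
    price = PRICES[backend]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / TOKENS_PER_MILLION
```

Using `Decimal` instead of floats avoids rounding drift when many small per-request costs are summed into a daily or monthly total.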
Testing, Load & Examples
Coming soon. Validating the platform end to end: load testing the inference layer, running sample workloads, and proving the routing economics under real traffic.
Prefer the high-level version? The companion Hybrid AI Playbook and Self-Hosting LLMs on Kubernetes cover the why behind this build.
Want This Built for Your Team?
We build hybrid LLM platforms like this one for clients — reproducible, cost-aware, and documented so your team can own it. Book a free call and we'll map the fastest path.