← All posts

hybrid

2 posts tagged “hybrid”

The Cost-Efficient AI Stack: Ship AI Features Without the Runaway Bill

Most teams overpay for AI by routing every request to a frontier model. This is the architecture we build instead — hybrid cloud+local routing, self-hosted inference, agent orchestration, and cost-per-request observability — and the single principle that ties it together: send each unit of work to the cheapest model that can do it well.

The Agent Control Plane: Frontier Models Plan, Your Kubernetes Fleet Executes

How to orchestrate a fleet of AI agents using a shared task queue — frontier models like Claude handle planning and decomposition, while a local Kubernetes worker pool runs the high-volume execution tasks. Covers the task ledger, dynamic task creation, lane-based routing, and KEDA autoscaling.