ai

7 posts tagged “ai”

May 29, 2026

Securing Self-Hosted LLMs and AI Agents on Kubernetes

Harden self-hosted vLLM and AI agents on Kubernetes: an auth/rate-limit gateway, gVisor tool sandboxing, prompt-injection guardrails, scoped secrets, and signed model weights — mapped to the OWASP LLM Top 10.

security ai agents kubernetes llm prompt-injection supply-chain

May 24, 2026

The Cost-Efficient AI Stack: Ship AI Features Without the Runaway Bill

Most teams overpay for AI by routing every request to a frontier model. This is the architecture we build instead — hybrid cloud+local routing, self-hosted inference, agent orchestration, and cost-per-request observability — and the single principle that ties it together: send each unit of work to the cheapest model that can do it well.

ai llm cost-optimization hybrid infrastructure finops

May 23, 2026

Build a Personal AI Dev Environment: Hybrid Models, Local Inference, and a Workflow That Costs Almost Nothing

The production patterns we deploy for teams — hybrid cloud/local routing, self-hosted models, agent orchestration — scaled down to a single developer's workstation. A practical guide to building a personal AI dev environment with Ollama, Claude Code, and a local router that keeps your token bill near zero.

ai llm local-models ollama claude-code developer-tools

May 22, 2026

The Agent Control Plane: Frontier Models Plan, Your Kubernetes Fleet Executes

How to orchestrate a fleet of AI agents using a shared task queue — frontier models like Claude handle planning and decomposition, while a local Kubernetes worker pool runs the high-volume execution tasks. Covers the task ledger, dynamic task creation, lane-based routing, and KEDA autoscaling.

ai agents orchestration kubernetes llm hybrid

May 14, 2026

The Hybrid AI Playbook: Cloud Models for Thinking, Local Models for Doing

How to cut your AI costs by 60-80% using a hybrid approach — Claude or GPT for planning and complex reasoning, local models like Llama and Qwen for execution tasks like code generation, summarization, and data extraction.

ai llm cost-optimization local-models ollama

February 7, 2025

Using AI to Monitor Kubernetes Clusters and Make Dynamic Scaling Decisions

How to move beyond static thresholds and use AI-driven observability to detect anomalies, predict traffic patterns, and automate scaling decisions across your Kubernetes infrastructure.

kubernetes ai monitoring autoscaling observability

February 6, 2025

A Practical Guide to AI for Small and Mid-Size Businesses

No hype, no jargon — a straightforward guide for business owners evaluating where AI actually makes sense and how to adopt it without wasting money.

ai small-business strategy automation