← All posts

ai-infrastructure

7 posts tagged “ai-infrastructure”

Building a Hybrid LLM Platform on EKS, Part 4: Platform Add-ons, the Load Balancer Controller, and Karpenter

Part 4 of our hands-on EKS series. We install the two add-ons every production EKS cluster needs: the AWS Load Balancer Controller so Kubernetes Ingress objects provision real ALBs, and Karpenter for cost-aware autoscaling — including the GPU NodePool that scales to zero between inference workloads.

Building a Hybrid LLM Platform on EKS, Part 3: Node Groups, GPU AMIs, and the NVIDIA Device Plugin

Part 3 of our hands-on EKS series. We add worker nodes to the empty cluster from Part 2: a CPU system pool for add-ons and the hybrid router, a GPU pool for vLLM model servers, the NVIDIA device plugin DaemonSet, and the taints and labels that make scheduling predictable.

Building a Hybrid LLM Platform on EKS, Part 2: The Control Plane, IAM, and IRSA

Part 2 of our hands-on EKS series. We provision the EKS cluster into the VPC from Part 1, wire up OIDC federation and IRSA so pods authenticate without static credentials, and end with a working kubectl connection to a real cluster.

Building a Hybrid LLM Platform on EKS, Part 1: Architecture and the Network Foundation

Part 1 of a hands-on series building the EKS-based hybrid LLM platform referenced throughout this blog. We map out the full architecture, then provision the VPC, subnets, NAT, and VPC endpoints with AWS CDK — the network foundation every later part builds on.

Observability for LLM Applications on Kubernetes: Tokens, Traces, and Cost per Request

How to instrument self-hosted and hybrid LLM workloads with OpenTelemetry, Prometheus, and Langfuse — tracking time-to-first-token, tokens per second, GPU utilization, and unit economics down to the individual request.

Self-Hosting LLMs on Kubernetes: A Practical Guide

How to deploy, serve, and autoscale open-source large language models on Kubernetes with vLLM — from GPU node pools and deployment manifests to KEDA-based autoscaling and production guardrails.

FinOps for AI Infrastructure: Beyond Cloud Cost Tags

Traditional FinOps practices fall short for AI workloads. Here's how to build a cost management strategy that accounts for GPU economics.