ai-infrastructure

7 posts tagged “ai-infrastructure”

June 6, 2026

Building a Hybrid LLM Platform on EKS, Part 4: Platform Add-ons, the Load Balancer Controller, and Karpenter

Part 4 of our hands-on EKS series. We install the two add-ons every production EKS cluster needs: the AWS Load Balancer Controller so Kubernetes Ingress objects provision real ALBs, and Karpenter for cost-aware autoscaling — including the GPU NodePool that scales to zero between inference workloads.

eks kubernetes aws-cdk karpenter load-balancer-controller autoscaling irsa ai-infrastructure typescript

June 6, 2026

Building a Hybrid LLM Platform on EKS, Part 3: Node Groups, GPU AMIs, and the NVIDIA Device Plugin

Part 3 of our hands-on EKS series. We add worker nodes to the empty cluster from Part 2: a CPU system pool for add-ons and the hybrid router, a GPU pool for vLLM model servers, the NVIDIA device plugin DaemonSet, and the taints and labels that make scheduling predictable.

eks kubernetes aws-cdk gpu nvidia node-groups ai-infrastructure typescript

May 30, 2026

Building a Hybrid LLM Platform on EKS, Part 2: The Control Plane, IAM, and IRSA

Part 2 of our hands-on EKS series. We provision the EKS cluster into the VPC from Part 1, wire up OIDC federation and IRSA so pods authenticate without static credentials, and end with a working kubectl connection to a real cluster.

eks kubernetes aws-cdk iam irsa oidc ai-infrastructure typescript

May 24, 2026

Building a Hybrid LLM Platform on EKS, Part 1: Architecture and the Network Foundation

Part 1 of a hands-on series building the EKS-based hybrid LLM platform referenced throughout this blog. We map out the full architecture, then provision the VPC, subnets, NAT, and VPC endpoints with AWS CDK — the network foundation every later part builds on.

eks kubernetes aws-cdk llm ai-infrastructure hybrid-ai vpc typescript

May 21, 2026

Observability for LLM Applications on Kubernetes: Tokens, Traces, and Cost per Request

How to instrument self-hosted and hybrid LLM workloads with OpenTelemetry, Prometheus, and Langfuse — tracking time-to-first-token, tokens per second, GPU utilization, and unit economics down to the individual request.

kubernetes llm observability opentelemetry finops ai-infrastructure

April 3, 2026

Self-Hosting LLMs on Kubernetes: A Practical Guide

How to deploy, serve, and autoscale open-source large language models on Kubernetes with vLLM — from GPU node pools and deployment manifests to KEDA-based autoscaling and production guardrails.

kubernetes llm gpu ai-infrastructure self-hosting

January 1, 2025

FinOps for AI Infrastructure: Beyond Cloud Cost Tags

Traditional FinOps practices fall short for AI workloads. Here's how to build a cost management strategy that accounts for GPU economics.

finops cost-management ai-infrastructure cloud