Deploying LiteLLM on AKS with azd and Bicep for a Secure, Scalable LLM Gateway
Deploying LiteLLM on AKS with azd and Bicep for a Secure, Scalable LLM Gateway
Date: 2026-06-13
Discover how to self-host LiteLLM on Azure Kubernetes Service using azd and Bicep — complete with private networking, Redis caching, spend tracking, and automatic TLS.
Tags: ["Azure", "AKS", "LiteLLM", "Bicep", "azd", "Kubernetes", "OpenAI", "Redis", "PostgreSQL"]
Self-hosting large language model (LLM) proxies can quickly become complex when you factor in operational security, cost management, and scalability. APIs from multiple LLM providers introduce demanding requirements for authentication, caching, and usage tracking — all while maintaining high availability.
LiteLLM is an open-source LLM proxy that simplifies this by routing requests through a single OpenAI-compatible endpoint to a wide range of supported backends, including Azure OpenAI, Anthropic, and over 100 providers. Having one unified gateway dramatically improves control over API keys, usage budgets, and caching.
In this post, we'll explore how Luke Murray successfully deployed LiteLLM on Azure Kubernetes Service (AKS) using Azure Developer CLI (azd) coupled with Bicep Infrastructure-as-Code. This approach provides a production-ready, self-hosted LLM gateway featuring private networking, Redis caching for cross-pod scalability, PostgreSQL-based spend tracking, and automated TLS certificate management — all wrapped in a single command deployment experience.
We'll cover the overall architecture, the detailed network design, key configuration points, how multi-replica caching behaves, and important AKS production best practices derived from the implementation.
Architecture Overview
┌────────────────────────────────────────────┐
│Architecture │
├────────────────────────────────────────────┤
│• Enterprise data sources │
│• Foundry platform │
│• AI applications │
└────────────────────────────────────────────┘
Key Technical Observations
-
Comprehensive Private Networking: Every critical data service—including PostgreSQL, Redis, ACR, and Key Vault—is isolated via Azure Private Endpoint within dedicated subnets. This design eliminates public IP exposure and strengthens security posture.
-
Azure CNI Overlay with Pod-Level Network Policy: The AKS cluster uses Azure CNI Overlay networking and granular Azure Network Policies to segment pods, achieving isolation and compliance at network layers.
-
Split-DNS CoreDNS Patch: CoreDNS is patched post-provisioning to resolve Azure private DNS zones for private endpoints internally while forwarding all other DNS queries to a public resolver (8.8.8.8). This hybrid DNS ensures cert-manager's Let's Encrypt HTTP-01 challenges function reliably alongside private DNS resolution.
-
Redis-Backed Distributed Cache Across Replicas: Azure Managed Redis supports caching in a multi-replica environment. Cache misses populate Redis from one pod, and subsequent requests served by any other pod get the cached response, achieving cross-pod cache consistency.
-
Production Best Practices from LiteLLM Documentation: Configurations such as batching spend writes every 60 seconds, controlling DB connection pool to 10 per pod, and serving requests during database unavailability improve operational stability and reduce PostgreSQL load.
-
Robust Rolling Updates with Graceful Pod Shutdown: Using Kubernetes deployment strategies with
maxUnavailable: 0and long termination grace periods ensures zero downtime during upgrades, respecting in-flight request timeouts.
How It Works
Infrastructure Provisioning
The deployment leverages an azd lifecycle which automates the entire provisioning and deployment process:
-
Preprovision Hooks: Generate random secrets for PostgreSQL credentials, LiteLLM master keys, and salt keys, ensuring strong security defaults. Also installs necessary tooling like
kustomizeif missing. -
Provisioning Bicep Template: Deploys all resources in a clean resource group, including AKS cluster with system and user node pools, Azure Container Registry, PostgreSQL Flexible Server, Managed Redis, Key Vault, Azure OpenAI, NAT Gateway, and private DNS zones linked to the VNet.
-
Postprovision Configuration: Retrieves AKS cluster credentials, applies a CoreDNS split-DNS patch to handle mixed public and private DNS resolution, and deploys Kubernetes manifests with
kustomize, including the LiteLLM proxy and ingress controller. -
Postdeploy Operations: Refreshes Kubernetes secrets with current connection strings, triggers proxy rollout, and synchronizes DNS A records for the public ingress IP.
This pipeline enables a "single-command" deployment experience with azd up, taking around 10-15 minutes for the full environment.
Network Design Details
The virtual network is partitioned into:
snet-aks(10.30.0.0/23) for AKS nodes (both system and user pools).snet-pe(10.30.2.0/24) hosting all private endpoints for PostgreSQL, Redis, ACR, and Key Vault.snet-ingress(10.30.3.0/24) reserved for the NGINX ingress controller.
Outbound connectivity uses a NAT Gateway attached to the AKS subnet with a dedicated public IP to avoid SNAT exhaustion, critical at scale.
DNS zones like privatelink.postgres.database.azure.com, privatelink.redis.azure.net, privatelink.azurecr.io, and privatelink.vaultcore.azure.net are linked to the VNet, enabling pods to resolve private endpoint IPs transparently.
LiteLLM Proxy Configuration
The proxy config (managed via Kubernetes ConfigMap) specifies:
- The list of routed models (Azure OpenAI GPT-4o, OpenCode Zen & Go models).
- Authentication via a master key with virtual API keys scoped to individual consumers.
- Redis caching parameters with TLS-enabled connections.
- Spend tracking backed by PostgreSQL with batching to limit DB write load.
- Enabling traffic through even if DB connection is temporarily unavailable.
model_list:
- model_name: azure-gpt-4o
litellm_params:
model: azure/gpt-4o
api_base: os.environ/AZURE_OPENAI_ENDPOINT
api_key: os.environ/AZURE_OPENAI_KEY
api_version: "2024-10-21"
general_settings:
master_key: os.environ/LITELLM_MASTER_KEY
database_url: os.environ/DATABASE_URL
database_connection_pool_limit: 10
proxy_batch_write_at: 60
allow_requests_on_db_unavailable: true
litellm_settings:
cache: true
cache_params:
type: redis
host: os.environ/REDIS_HOST
port: os.environ/REDIS_PORT
password: os.environ/REDIS_PASSWORD
ssl: true
Adding non-Azure models, like OpenCode Zen and Go, is straightforward, exposing their OpenAI-compatible endpoints under the same proxy umbrella — centralized authentication, routing, and caching.
Multi-Replica and Caching Behavior
Running multiple LiteLLM pods behind the NGINX ingress controller enables high availability and load balancing.
The Redis cache is shared across pods, ensuring that if one pod misses a cache entry, it is written to Redis so that subsequent requests handled by any pod get a cache hit.
A basic test showed:
Pod A cache miss: 1.040s
Pod B cache hit: 0.636s
This behavior confirms correct cross-pod cache behavior using Azure Managed Redis.
AKS Pod Configuration for Production
-
Rolling Updates:
maxUnavailable: 0andmaxSurge: 1ensure pods never drop below the desired count during updates. -
Readiness & Liveness Probes: Readiness probes delay 30 seconds to allow Prisma migrations before accepting traffic, and liveness probes restart pods if unhealthy after three failures.
-
Graceful Shutdown: A 620-second termination grace period with a 5-second preStop delay allows in-flight requests to finish cleanly before pod termination.
-
Security Hardening: Containers run with
readOnlyRootFilesystem: true, drop all capabilities, and run as non-root. Writable directories are mounted viaemptyDirto satisfy Prisma and UI requirements.
Quick Tips & Tricks
-
Use Azure CNI Overlay with Network Policies — This enables pod-level network segmentation and private IP assignments conforming to enterprise security standards.
-
Patch CoreDNS for Split DNS Resolution — Prevent failures in challenge validations by routing private zone queries internally and public zone queries to external DNS like 8.8.8.8.
-
Batch PostgreSQL Writes to Reduce Load — Group spend tracking updates into fixed intervals (
proxy_batch_write_at) to avoid excessive frequent writes. -
Leverage Managed Redis for Cache Consistency Across Replicas — A centralized Redis cache backing multiple pods guarantees cache hits regardless of request routing.
-
Set
allow_requests_on_db_unavailable: truein Production — Enables high availability by allowing LiteLLM to continue serving requests even if the DB spikes or briefly disconnects. -
Use Virtual Keys for Fine-Grained Access Control — Instead of exposing multiple upstream API keys, distribute scoped virtual keys via LiteLLM to enforce budgets and permitted models per user/team.
Conclusion
Deploying LiteLLM on AKS with azd and Bicep delivers a powerful, self-hosted, and production-grade LLM gateway. This approach tightly integrates Azure-managed infrastructure with Kubernetes best practices to meet requirements for security, scalability, cost control, and operational visibility.
Private endpoints ensure zero public exposure to backend data services while the NGINX ingress with cert-manager automates TLS certificates for client-facing access. Redis caching combined with multi-pod replicas enables responsiveness and high availability. Spend tracking via PostgreSQL adds crucial cost management and governance.
This fully automated, IaC-driven deployment path lowers the barrier to adopting LiteLLM within enterprises or teams needing centralized control of heterogeneous LLM providers. As LiteLLM continues evolving with features like MCP Gateway integration, this foundation opens paths for tight integrations with Microsoft Learn and GitHub servers — turning the proxy into a comprehensive AI operations hub.
References
- Running LiteLLM on AKS with azd and Bicep | luke.geek.nz — Primary source blog post by Luke Murray
- LiteLLM Documentation — Official docs covering configuration and best practices
- LiteLLM Production Best Practices — Performance and operational configuration insights
- Azure Developer CLI — Azure CLI tooling used for deployment
- AKS Network Concepts and Security — Deep dive into AKS networking and policies
- HTTPS Ingress on AKS with cert-manager — Setup of TLS with ingress controllers
- Azure Private Link and Private Endpoints — Overview of private connectivity
- Azure Cache for Redis with Private Endpoints — Securing managed Redis in VNet environments
- PostgreSQL Flexible Server Networking — Private connectivity patterns for PostgreSQL

Request flow diagram courtesy of Luke Murray

LiteLLM UI configuration demo

Demonstration of cross-pod Redis cache hit

Deployment rollout with azd and HPA scaling

Virtual key lifecycle operations

Cache miss and hit timing demonstration

Health check and model catalogue retrieval

Article by Luke Murray from luke.geek.nz
azd down --purge
Command to tear down and clean up the entire deployment