May 13, 2026

Fixing AKS NAP Scheduling for ZRS Disk Persistent Volumes Using a Mutating Admission Webhook

Discover how to resolve AKS NAP pod scheduling issues with ZRS Azure Disk Persistent Volumes using a Kubernetes mutating admission webhook as a temporary workaround.

Tags: ["AKS", "Kubernetes", "Node Auto Provisioning", "ZRS Disk", "Admission Webhook"]

If you’re running Azure Kubernetes Service (AKS) with Node Auto Provisioning (NAP, which is built on Karpenter) and using Azure Disk Zone-Redundant Storage (ZRS) Persistent Volumes (PVs) with volumeBindingMode: Immediate, you may hit a frustrating problem: pods remain stuck in the Pending state indefinitely.

This issue arises from how the Azure Disk CSI driver sets PV node affinity for ZRS disks, combined with Karpenter’s node provisioning logic. In this post, we explore a temporary workaround: a Kubernetes mutating admission webhook that merges the node affinity zones on ZRS Azure Disk PVs as they are admitted. This patch eliminates the scheduling conflict and lets pods run in the desired zones.

You’ll learn the problem context, the implementation details of the mutating webhook written in Go, how to deploy it with Terraform on AKS, and how to verify that the fix works. Although this is a temporary proof of concept, it can unblock teams facing this issue until the upstream Karpenter fix is released.

Architecture Overview

┌───────────────────────────────┐
│      Azure Kubernetes Service │
│  Node Auto Provisioning (NAP) │
└──────────────┬────────────────┘
               │
               │ Pod Scheduling requests with PVs using ZRS Azure Disks
               ↓
┌───────────────────────────────┐
│      Azure Disk CSI Driver    │
│  Creates PVs with multiple    │
│  NodeSelectorTerms for zones  │
└──────────────┬────────────────┘
               │
               │ PV Admission Requests intercepted by
               │ Mutating Admission Webhook
               ↓
┌───────────────────────────────┐
│ Kubernetes Mutating Admission │
│      Webhook Service          │
│  Merges multiple zone terms   │
│ into a single NodeSelectorTerm│
└──────────────┬────────────────┘
               │
               │ Mutated PV with merged zone affinity
               ↓
┌───────────────────────────────┐
│        Karpenter/NAP          │
│   Node Provisioning honors    │
│      correct zone affinity    │
└───────────────────────────────┘

This architecture shows the data flow: AKS NAP requests nodes based on PV affinity, but the original PV has multiple zone selectors that confuse Karpenter. The webhook merges zones into one selector, enabling proper provisioning.

Key Technical Observations

  • ZRS Azure Disk NodeAffinity Representation: The Azure Disk CSI driver writes nodeAffinity for ZRS disks as multiple NodeSelectorTerms, each naming a single availability zone. Kubernetes ORs these terms, so any listed zone is valid, but Karpenter evaluates only the first term.

  • Karpenter’s First-Term-Only Zone Evaluation: Karpenter relies only on the first NodeSelectorTerm when deciding which availability zone to provision nodes in. If the pod requires a zone that the first term does not list, no suitable node is provisioned and the pod stays Pending.

  • Mutating Webhook Approach: By intercepting PV create and update admission requests, the webhook merges all zone terms into a single NodeSelectorTerm whose In expression lists every zone. This fixes the affinity mismatch as the PV is admitted.

  • Minimal and Secure Design: The webhook is a compact Go HTTP server running in a distroless container with a read-only root filesystem, deployed with least-privilege Kubernetes RBAC. cert-manager manages its TLS certificates.

  • Terraform for AKS+Webhook Setup: The full infrastructure, including AKS with Node Auto Provisioning, Container Registry, cert-manager, and the webhook deployment, is provisioned with Terraform, which makes cluster setup repeatable and automated.

  • Fail-Open Webhook Configuration: The webhook’s failurePolicy is set to Ignore, so PV creation is not blocked even if the webhook is unavailable.

How It Works

1. Root Cause: Multi-Term NodeAffinity Confuses Karpenter

When provisioning a ZRS disk with volumeBindingMode: Immediate, the Azure Disk CSI driver writes the PV nodeAffinity with multiple NodeSelectorTerms, one for each availability zone:

nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: topology.disk.csi.azure.com/zone
        operator: In
        values: [eastus2-1]    # term 0 – only zone 1
    - matchExpressions:
      - key: topology.disk.csi.azure.com/zone
        operator: In
        values: [eastus2-2]    # term 1 – only zone 2
    - matchExpressions:
      - key: topology.disk.csi.azure.com/zone
        operator: In
        values: [eastus2-3]    # term 2 – only zone 3

Karpenter evaluates only the first term (eastus2-1) and ignores the others. If your pod requests zone 2 while the first term names zone 1, the affinity check fails and the pod stays Pending.
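
To make the difference concrete, here is a minimal Go sketch that contrasts the Kubernetes rule (terms are ORed) with a first-term-only check. It is a simplified model for illustration, not Karpenter's actual code: the types Requirement and Term and the functions matchesAnyTerm, matchesFirstTermOnly, and zrsTerms are hypothetical names introduced here.

```go
package main

import "fmt"

const zoneKey = "topology.disk.csi.azure.com/zone"

// Requirement models one matchExpression that uses the In operator.
type Requirement struct {
	Key    string
	Values []string
}

// Term models one nodeSelectorTerm: every requirement in it must match (AND).
type Term struct {
	Requirements []Requirement
}

// termMatches reports whether the node labels satisfy all requirements of t.
// An empty term matches nothing, mirroring Kubernetes semantics.
func termMatches(t Term, labels map[string]string) bool {
	if len(t.Requirements) == 0 {
		return false
	}
	for _, r := range t.Requirements {
		found := false
		for _, v := range r.Values {
			if labels[r.Key] == v {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// matchesAnyTerm applies the Kubernetes rule: one matching term is enough (OR).
func matchesAnyTerm(terms []Term, labels map[string]string) bool {
	for _, t := range terms {
		if termMatches(t, labels) {
			return true
		}
	}
	return false
}

// matchesFirstTermOnly is a hypothetical model of the first-term-only
// evaluation described in this post.
func matchesFirstTermOnly(terms []Term, labels map[string]string) bool {
	return len(terms) > 0 && termMatches(terms[0], labels)
}

// zrsTerms reproduces the three single-zone terms from the YAML above.
func zrsTerms() []Term {
	var terms []Term
	for _, z := range []string{"eastus2-1", "eastus2-2", "eastus2-3"} {
		terms = append(terms, Term{Requirements: []Requirement{{Key: zoneKey, Values: []string{z}}}})
	}
	return terms
}

func main() {
	node := map[string]string{zoneKey: "eastus2-2"}
	fmt.Println(matchesAnyTerm(zrsTerms(), node))       // true
	fmt.Println(matchesFirstTermOnly(zrsTerms(), node)) // false
}
```

A node in zone 2 satisfies the PV under OR semantics but fails the first-term-only check, which is the mismatch the webhook removes by collapsing the terms into one.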

2. Webhook Mutation Logic: Merge Zones into One Term

The webhook intercepts the admission request of PV objects and performs these steps:

  • Deserialize the incoming PV JSON.
  • Detect if the PV is a ZRS Azure Disk by checking if the skuName in the CSI volume attributes contains "ZRS".
  • Extract all zones listed across all NodeSelectorTerms.
  • Merge all identified zones into a single NodeSelectorTerm with all zones in the values array, for example:
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.disk.csi.azure.com/zone
            operator: In
            values: [eastus2-1, eastus2-2, eastus2-3]   # merged – any zone OK
  • Construct a JSON Patch to replace the PV's nodeAffinity.required.nodeSelectorTerms with the merged version.
  • Return the patched admission response to the API server.

This ensures Karpenter sees all zones in one term and provisions nodes matching the pod’s zone preference correctly.
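
The patch-construction step can be sketched with nothing but the standard library. The example below is an assumption about how such a patch might be built, not the webhook's actual source: the types patchOp, nodeSelectorTerm, and matchExpression and the function buildPatch are hypothetical, and the JSON Pointer path assumes nodeAffinity sits under spec, as it does on a PersistentVolume.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const zoneKey = "topology.disk.csi.azure.com/zone"

// matchExpression and nodeSelectorTerm mirror the JSON shape of the PV fields.
type matchExpression struct {
	Key      string   `json:"key"`
	Operator string   `json:"operator"`
	Values   []string `json:"values"`
}

type nodeSelectorTerm struct {
	MatchExpressions []matchExpression `json:"matchExpressions"`
}

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string             `json:"op"`
	Path  string             `json:"path"`
	Value []nodeSelectorTerm `json:"value"`
}

// buildPatch returns a JSON Patch that replaces all existing terms with one
// term listing every zone. The caller supplies the already merged zone list.
func buildPatch(zones []string) (string, error) {
	ops := []patchOp{{
		Op:   "replace",
		Path: "/spec/nodeAffinity/required/nodeSelectorTerms",
		Value: []nodeSelectorTerm{{
			MatchExpressions: []matchExpression{{Key: zoneKey, Operator: "In", Values: zones}},
		}},
	}}
	b, err := json.Marshal(ops)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	patch, err := buildPatch([]string{"eastus2-1", "eastus2-2", "eastus2-3"})
	if err != nil {
		panic(err)
	}
	fmt.Println(patch)
}
```

A replace operation is a reasonable choice here because the path is known to exist on any PV that reaches this step; an add operation would also work but signals intent less clearly.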

3. Webhook Implementation Highlights

The webhook is a simple Go web server exposing two endpoints:

  • /mutate handles admission review webhook requests.
  • /healthz serves readiness and liveness probes.

The core mutation logic resides in internal/handler/handler.go and provides:

  • Robust JSON unmarshaling of admission requests.
  • Logic that acts on PV resources only and ignores other types.
  • A dry-run mode controlled by the DRY_RUN environment variable.
  • Logging that records each mutation decision.
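
As a rough illustration of that shape, the sketch below wires up the two endpoints with only the standard library. It is not the code from handler.go. Several details are assumptions made for this example: the minimal AdmissionReview structs (a small subset of the admission.k8s.io/v1 format), the mutator type, the placeholder patchFor, treating DRY_RUN=true as "dry-run on", port 8443, and the TLS_CERT_FILE and TLS_KEY_FILE environment variables.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"
)

// Minimal subset of the admission.k8s.io/v1 AdmissionReview wire format.
type groupVersionKind struct {
	Kind string `json:"kind"`
}

type admissionRequest struct {
	UID    string           `json:"uid"`
	Kind   groupVersionKind `json:"kind"`
	Object json.RawMessage  `json:"object"`
}

type admissionResponse struct {
	UID       string  `json:"uid"`
	Allowed   bool    `json:"allowed"`
	PatchType *string `json:"patchType,omitempty"`
	Patch     []byte  `json:"patch,omitempty"` // encoding/json emits base64
}

type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

// mutator returns a JSON Patch for a PV, or nil when nothing should change.
type mutator func(pv []byte) []byte

// patchFor is a hypothetical placeholder for the zone-merging logic.
var patchFor mutator = func(pv []byte) []byte { return nil }

// review always allows the request. It attaches a patch only for
// PersistentVolumes, and only when dry-run is off.
func review(in admissionReview, dryRun bool, m mutator) admissionReview {
	resp := &admissionResponse{Allowed: true}
	out := admissionReview{APIVersion: in.APIVersion, Kind: in.Kind, Response: resp}
	if in.Request == nil {
		return out
	}
	resp.UID = in.Request.UID
	if in.Request.Kind.Kind != "PersistentVolume" {
		return out
	}
	patch := m(in.Request.Object)
	if patch == nil {
		return out
	}
	if dryRun {
		log.Printf("dry-run: would patch PV with %s", patch)
		return out
	}
	patchType := "JSONPatch"
	resp.PatchType, resp.Patch = &patchType, patch
	return out
}

func mutate(w http.ResponseWriter, r *http.Request) {
	var in admissionReview
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
		http.Error(w, "invalid AdmissionReview", http.StatusBadRequest)
		return
	}
	out := review(in, os.Getenv("DRY_RUN") == "true", patchFor)
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(out); err != nil {
		log.Printf("encode response: %v", err)
	}
}

func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/mutate", mutate)
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

func main() {
	// The environment variable names and the port are assumptions for this sketch.
	cert, key := os.Getenv("TLS_CERT_FILE"), os.Getenv("TLS_KEY_FILE")
	if cert == "" || key == "" {
		log.Println("TLS_CERT_FILE and TLS_KEY_FILE are not set; not starting the server")
		return
	}
	log.Fatal(http.ListenAndServeTLS(":8443", cert, key, newMux()))
}
```

Keeping the decision logic in review, separate from the HTTP plumbing, makes the PV-only filter and the dry-run branch easy to unit test without a running server.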

Example snippet from the mutation function:

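// Only ZRS disks need the fix; detect them from the CSI skuName attribute.
isZRS := false
if pv.Spec.CSI != nil {
    sku := pv.Spec.CSI.VolumeAttributes["skuName"]
    if strings.Contains(sku, "ZRS") {
        isZRS = true
    }
}
...
// Collect zones from all terms. The map acts as a set, so duplicates collapse.
// topologyZoneKey is presumably the topology.disk.csi.azure.com/zone key shown in the YAML above.
zones := map[string]struct{}{}
for _, term := range pv.Spec.NodeAffinity.Required.NodeSelectorTerms {
    for _, expr := range term.MatchExpressions {
        if expr.Key == topologyZoneKey {
            for _, v := range expr.Values {
                zones[v] = struct{}{}
            }
        }
    }
}
// Merge zones into a sorted slice so the resulting patch is deterministic.
merged := make([]string, 0, len(zones))
for z := range zones {
    merged = append(merged, z)
}
sort.Strings(merged)
...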
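
For readers who want to run the merge logic in isolation, here is a self-contained variant that uses small stand-in types instead of the Kubernetes API structs. The types pvSpec, term, and expr and the function mergedZones are hypothetical. Skipping PVs that already have fewer than two terms is an assumption added here so that update requests on an already patched PV become a no-op.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const topologyZoneKey = "topology.disk.csi.azure.com/zone"

type expr struct {
	Key    string
	Values []string
}

type term struct {
	MatchExpressions []expr
}

// pvSpec is a stand-in for the fields the webhook reads from a PersistentVolume.
type pvSpec struct {
	SkuName string // stands in for pv.Spec.CSI.VolumeAttributes["skuName"]
	Terms   []term // stands in for the required nodeSelectorTerms
}

// mergedZones returns the sorted, de-duplicated zones across all terms.
// The boolean is false when the PV should be left untouched.
func mergedZones(pv pvSpec) ([]string, bool) {
	if !strings.Contains(pv.SkuName, "ZRS") || len(pv.Terms) < 2 {
		return nil, false
	}
	seen := map[string]struct{}{}
	for _, t := range pv.Terms {
		for _, e := range t.MatchExpressions {
			if e.Key != topologyZoneKey {
				continue
			}
			for _, v := range e.Values {
				seen[v] = struct{}{}
			}
		}
	}
	if len(seen) == 0 {
		return nil, false
	}
	merged := make([]string, 0, len(seen))
	for z := range seen {
		merged = append(merged, z)
	}
	sort.Strings(merged)
	return merged, true
}

func main() {
	pv := pvSpec{SkuName: "Premium_ZRS", Terms: []term{
		{MatchExpressions: []expr{{Key: topologyZoneKey, Values: []string{"eastus2-3"}}}},
		{MatchExpressions: []expr{{Key: topologyZoneKey, Values: []string{"eastus2-1"}}}},
		{MatchExpressions: []expr{{Key: topologyZoneKey, Values: []string{"eastus2-2"}}}},
	}}
	fmt.Println(mergedZones(pv)) // [eastus2-1 eastus2-2 eastus2-3] true
}
```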

4. Deployment Workflow

  • Provision the infrastructure with Terraform, which sets up:
      • An AKS cluster with Node Auto Provisioning enabled.
      • An Azure Container Registry (ACR) for hosting the webhook image.
      • cert-manager for TLS certificate management inside Kubernetes.
  • Build the webhook binary with Go 1.23 and containerize it with a multi-stage Dockerfile that uses a distroless base image.
  • Push the container image to Azure Container Registry.
  • Deploy the webhook manifests to the cluster: namespace, RBAC, the deployment with the webhook container, service, cert-manager issuers, and the MutatingWebhookConfiguration.
  • Monitor the webhook logs to confirm that zones are merged correctly in PV admission patches.

5. Testing the Fix

  • Create a StorageClass for ZRS disks with volumeBindingMode: Immediate.
  • Deploy a pod pinned explicitly to a zone (e.g., eastus2-2).
  • Without the webhook, the pod remains Pending with scheduling errors that reference the zone affinity mismatch.
  • With the webhook, the pod schedules and runs successfully in the targeted zone.

Quick Tips & Tricks

  1. Enable cert-manager Before Deploying the Webhook
    Use Terraform or Helm to install cert-manager at v1.15+ to automate TLS certificate issuance for secured webhook communication.

  2. Use failurePolicy: Ignore for Admission Webhook
    This prevents PV creation failure if your webhook experiences downtime, maintaining cluster resilience.

  3. Keep the Webhook Container Small and Secure
    Use distroless images and run as a non-root user with read-only root filesystem to minimize attack surface.

  4. Leverage Terraform Outputs for Easier Management
    The provided Terraform scripts output useful commands and resource info, e.g., kubeconfig commands and ACR login servers.

  5. Monitor Logs for Unexpected PV Mutations
    Tail webhook logs (kubectl logs -n pv-zone-fix-webhook -l app.kubernetes.io/name=pv-zone-fix-webhook -f) to catch mutation decisions and potential errors early.

  6. Plan to Remove the Webhook Once Upstream Fix Rolls Out
    Track the upstream issue kubernetes-sigs/karpenter#2743 and migrate away from this workaround once native support lands.

Conclusion

This mutating admission webhook is a pragmatic stopgap for the AKS NAP scheduling issue that affects ZRS Azure Disk Persistent Volumes with immediate volume binding. By merging the per-zone nodeSelectorTerms into a single term, it resolves the affinity mismatch that keeps pods from running. The compact Go implementation, combined with Terraform automation for cluster setup, makes deployment straightforward.

While this is a temporary workaround pending an upstream Karpenter fix, it lets teams unblock zone-redundant workloads on AKS Node Auto Provisioning today. Keeping the setup in infrastructure as code makes the webhook easy to maintain and, later, to remove. Track Karpenter’s progress and plan a migration path off the webhook once the issue is addressed natively.

References

  1. Original blog post by Carlos Mendible — Detailed walkthrough and source code.
  2. Kubernetes SIG Karpenter PR #2743 — Tracking upstream root cause and fix.
  3. Azure AKS Node Auto Provisioning documentation — Official Microsoft guidance on NAP.
  4. karpenter-provider-azure storage e2e tests — Relevant storage test suite.
  5. Sample webhook code on GitHub — Full source for the mutation webhook.

Image: AKS logo, courtesy of Carlos Mendible