May 15, 2026

Streamlining AI Local Inference: Using Azure Local Foundry CLI with PowerShell

Cut inference costs and keep AI workloads private by running models locally with Microsoft Foundry Local, and automate it all with PowerShell and REST.

Tags: ["Azure", "AI Foundry", "PowerShell", "Local AI", "Automation"]

The rapid evolution of generative AI has brought incredible capabilities to developers, but it’s also introduced growing operational costs and privacy challenges. As cloud providers like Anthropic, OpenAI, and Microsoft tighten token quotas, running every AI call in the cloud is increasingly expensive—and sometimes, simply unnecessary.

Many tasks don’t require state-of-the-art frontier models; spending costly cloud tokens on a quick summarization or code-formatting job is wasteful. Additionally, for sensitive or regulated data, especially under GDPR and emerging EU Cloud Act constraints, sending data outside your trusted perimeter may not be an option.

This is where local AI models shine: running inference on your own hardware mitigates both cost and privacy concerns. Microsoft's Foundry Local addresses this by enabling local AI inference, accelerated by dedicated hardware such as NPUs. In this post, we explore how to use the Foundry Local CLI on Windows and macOS, focusing on automating interactions with local models through PowerShell scripts that call the Foundry REST API.

We’ll cover how to install and manage models via CLI, interact programmatically using PowerShell functions, and build chat completions compatible with OpenAI’s API format. Whether you want to test local models interactively or integrate them into your automation workflows, this guide has you covered.

Architecture Overview

┌──────────────────────────────────────────────┐
│                Local Machine                 │
├──────────────────────────────────────────────┤
│ • Azure Local Foundry CLI & SDK              │
│ • AI Accelerator Hardware (NPU)              │
│ • PowerShell Automation & REST API Client    │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│          Local Foundry AI Inference          │
├──────────────────────────────────────────────┤
│ • Model Cache & Management                   │
│ • REST API Endpoint (OpenAI compatible)      │
│ • Model Execution via AI Accelerator         │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│          Developer Automation Layer          │
├──────────────────────────────────────────────┤
│ • PowerShell Scripts invoking REST API       │
│ • Structured Chat Completion Requests        │
│ • Quick Integration without SDK Binding      │
└──────────────────────────────────────────────┘

The local machine runs the Foundry Local CLI and its service, which manage models and expose an OpenAI-compatible REST API for inference. PowerShell scripts call this REST API to automate tasks, while the service uses the AI accelerator for local processing.

Key Technical Observations

  • Local AI Inference Offloads Cloud Usage: By running models locally on AI accelerators like NPUs, organizations reduce dependence on cloud inference, cutting costs and lowering latency.
  • OpenAI-Compatible REST API Interface: Foundry Local exposes REST endpoints that mirror OpenAI’s Chat Completion API, easing the transition and enabling use of familiar tooling and prompt formats.
  • No Native PowerShell SDK Binding Yet: While SDKs exist for Python, C#, Rust, and JavaScript, PowerShell accesses local models exclusively via REST calls, which requires custom scripting for service management and requests.
  • Model Management via CLI Commands: The Foundry CLI controls models with commands like list, download, load, and run, allowing flexible caching and deployment strategies.
  • Role-Based Chat Completion Protocol: Inputs to the chat API are structured role/content pairs (system, user, assistant), which set context and carry user prompts in line with OpenAI messaging patterns.
  • Service Lifecycle Handling Needed: Scripts must ensure the local service is running and the expected model is loaded before issuing requests, an extra operational step shown in the PowerShell example.

How It Works: Automating the Local Foundry CLI with PowerShell

Installing the Azure Local Foundry CLI

Microsoft provides easy one-liner installs for major platforms:

# Windows
winget install Microsoft.FoundryLocal

# macOS (using Homebrew)
brew install microsoft/foundrylocal/foundrylocal

This installs the CLI together with the local service and runtime, readying the machine to run local models.
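
After installing, it can help to confirm that the foundry command is reachable from the current session before running anything else. The following is a minimal sketch; Test-FoundryCliInstalled is a hypothetical helper name, and the only assumption is that the installer places a command called foundry on PATH.

```powershell
# Hypothetical helper: returns $true when the named command can be resolved.
function Test-FoundryCliInstalled {
    param([string]$CommandName = 'foundry')

    # Get-Command returns nothing (instead of throwing) when the lookup fails.
    return [bool](Get-Command -Name $CommandName -ErrorAction SilentlyContinue)
}

if (Test-FoundryCliInstalled) {
    Write-Output "Foundry CLI found at: $((Get-Command foundry).Source)"
} else {
    Write-Warning "The 'foundry' command was not found; open a new terminal or re-run the installer."
}
```

If the warning appears right after installation, the PATH change has often not reached the current session yet, so opening a new terminal is a reasonable first step.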

Managing Models from the CLI

To interact with models:

foundry model list              # Lists available models
foundry model download <model>  # Downloads a model to the local cache
foundry model load <model>      # Loads the model into the running AI service
foundry model run <model>       # Downloads the model if needed, loads it, and starts an interactive chat session

This manual control allows you to prepare and manage models efficiently before invoking inference.
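
For scripted setups, the download and load steps can be wrapped so that a failure stops the script early. This is a sketch under one stated assumption: that the foundry CLI signals failure with a non-zero exit code, which the source does not confirm. Initialize-FoundryModel and its -Cli parameter are hypothetical names introduced for illustration.

```powershell
# Hypothetical helper: downloads a model to the cache, then loads it into the service.
# ASSUMPTION: the CLI returns a non-zero exit code when a step fails.
function Initialize-FoundryModel {
    param(
        [Parameter(Mandatory)]
        [string]$Model,
        # Name of the CLI executable; overridable so the function can be exercised with a stub.
        [string]$Cli = 'foundry'
    )

    foreach ($action in 'download', 'load') {
        & $Cli model $action $Model | Out-Null
        if ($LASTEXITCODE -ne 0) {
            throw "'$Cli model $action $Model' failed with exit code $LASTEXITCODE."
        }
    }
}

# Example (not executed here):
# Initialize-FoundryModel -Model 'phi-3-mini-128k'
```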

Starting and Managing the Foundry Service in PowerShell

The local Foundry service must be running and a model loaded before we can send inference requests. The following PowerShell snippet checks the service status, starts it if stopped, loads a model, and extracts the REST API URI:

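# Return the output of `foundry service status` as a single string.
function Get-FoundryServiceStatus {
    return (& foundry service status | Out-String)
}

$serviceStatus = Get-FoundryServiceStatus

# Start the service if the status text reports that it is not running.
if ($serviceStatus -like "*service is not running*") {
    & foundry service start | Out-Null
    $serviceStatus = Get-FoundryServiceStatus
}

# Load a model so the inference endpoints have something to serve
& foundry model load phi-3-mini-128k | Out-Null

# Take the first URL in the status text and strip the status path
# to get the base URI for REST calls.
$pattern = 'https?://[^\s"]+'
$uri = [regex]::Match($serviceStatus, $pattern).Value
$uri = $uri -replace '/openai/status$',''
$uri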

This snippet encapsulates lifecycle management, ensuring the environment is ready.
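
The URI extraction step can also be isolated into a small function, which makes it easy to check without a running service. The sketch below uses the same regular expression as the snippet above; Get-FoundryBaseUri is a hypothetical name, and the sample status line is made up for illustration rather than copied from real CLI output.

```powershell
# Hypothetical helper: pulls the base URI out of `foundry service status` text.
function Get-FoundryBaseUri {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$StatusText
    )

    $match = [regex]::Match($StatusText, 'https?://[^\s"]+')
    if (-not $match.Success) { return $null }

    # Drop the trailing status path so only scheme, host, and port remain.
    return ($match.Value -replace '/openai/status$', '')
}

# Illustrative status line (an assumption, not captured CLI output).
$sample = 'Model management service is running on http://127.0.0.1:52236/openai/status'
Get-FoundryBaseUri -StatusText $sample   # http://127.0.0.1:52236
```

Returning $null when no URL is present lets the caller decide whether to start the service or stop with an error.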

Abstracting REST API Calls for Reuse

We create a helper function to invoke REST methods against the Foundry API with appropriate headers and JSON encoding:

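# Thin wrapper around Invoke-RestMethod for calls to the local Foundry REST API.
function Invoke-FoundryRequest {
    param(
        [Parameter(Mandatory)]
        [string]$Method,
        [Parameter(Mandatory)]
        [string]$FoundryBaseUrl,
        [Parameter(Mandatory)]
        [string]$Path,
        [hashtable]$Headers,
        $Body
    )

    # Trim a trailing slash from the base URL so the joined URI has no "//".
    $uri = "$($FoundryBaseUrl.TrimEnd('/'))$Path"

    $params = @{
        Method  = $Method
        Uri     = $uri
    }

    if ($Headers) { $params.Headers = $Headers }

    # Serialize the body to JSON; -Depth 10 keeps nested structures intact.
    if ($Body) {
        $params.Body = ($Body | ConvertTo-Json -Depth 10)
        $params.ContentType = "application/json"
    }

    return Invoke-RestMethod @params
}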

This modular approach helps maintain clear boundaries between networking and business logic.
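
The same Invoke-RestMethod pattern can be used to wait until the service answers before sending real requests. The sketch below polls the /openai/status path that appears in the status URI earlier in this post; that this path returns a success response once the service is ready is an assumption, and Wait-FoundryService is a hypothetical name.

```powershell
# Hypothetical helper: polls the service until it responds or the timeout expires.
# ASSUMPTION: GET <base>/openai/status succeeds once the service is ready.
function Wait-FoundryService {
    param(
        [Parameter(Mandatory)]
        [string]$FoundryBaseUrl,
        [int]$TimeoutSeconds = 30,
        [int]$PollIntervalMs = 500
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        try {
            Invoke-RestMethod -Method GET -Uri "$FoundryBaseUrl/openai/status" -TimeoutSec 5 | Out-Null
            return $true
        } catch {
            # Connection refused or an HTTP error: wait briefly and try again.
            Start-Sleep -Milliseconds $PollIntervalMs
        }
    } while ((Get-Date) -lt $deadline)

    return $false
}

# Example (not executed here):
# if (-not (Wait-FoundryService -FoundryBaseUrl $uri)) { throw 'Foundry service did not become ready.' }
```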

Sending Chat Completion Requests

Since the Foundry API implements OpenAI-compatible chat completions, we define a function encapsulating the request format:

function New-FoundryChatCompletion {
    [CmdletBinding()]
    param(
        # Default model ID; adjust it to the variant loaded on your hardware.
        [string]
        $Model = "phi-3-mini-128k-instruct-qnn-npu:3",
        [Parameter(Mandatory)]
        [array]
        $Messages,
        [Parameter(Mandatory)]
        [string]
        $FoundryBaseUrl,
        [double]
        $Temperature,
        [double]
        $TopP,
        # Upper bound on generated tokens (previously declared but ignored).
        [int]$MaxTokens = 2048
    )

    $body = @{
        model    = $Model
        messages = $Messages
        max_tokens = $MaxTokens
        max_completion_tokens = $MaxTokens
    }

    # Only send sampling parameters the caller set explicitly.
    if ($PSBoundParameters.ContainsKey('Temperature')) { $body.temperature = $Temperature }
    if ($PSBoundParameters.ContainsKey('TopP'))        { $body.top_p       = $TopP }

    Invoke-FoundryRequest -Method POST -Path "/v1/chat/completions" -Body $body -FoundryBaseUrl $FoundryBaseUrl
}
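
To see what actually goes over the wire, it can be useful to build the same kind of body by hand and print the JSON. This is an illustrative sketch only: the message texts are made up, and [ordered] is used purely so the keys print in a predictable order.

```powershell
# Build a chat completion body like the one the function above sends.
$body = [ordered]@{
    model       = 'phi-3-mini-128k-instruct-qnn-npu:3'
    messages    = @(
        @{ role = 'system'; content = 'You are a helpful assistant.' },
        @{ role = 'user';   content = 'Say hello.' }
    )
    max_tokens  = 2048
    temperature = 0.2
}

# -Depth 10 ensures the nested message objects are serialized in full.
$json = $body | ConvertTo-Json -Depth 10
$json
```

Inspecting the serialized payload this way is a quick check that messages is an array of role/content objects before any request is sent.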

Example Usage

Here’s how to prepare messages and call the chat completion via PowerShell:

$messages = @(
    @{ role = "system"; content = "You are a PowerShell coding assistant. Only give code if requested." },
    @{ role = "user"; content = "Give me a PowerShell script to list all files in a directory." }
)

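# $uri is the base URI extracted from `foundry service status` earlier; a hardcoded port such as 52236 may not match your machine.
$chat = New-FoundryChatCompletion -FoundryBaseUrl $uri -Temperature 0.2 -Messages $messages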

# Retrieve generated assistant response content
$chat.choices[0].message.content

This interactive model invocation pattern makes local AI integration intuitive and accessible in PowerShell automation scenarios.
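
Because the API works on role/content pairs, a multi-turn conversation is just a growing message array: each assistant reply is appended before the next user prompt is sent. The sketch below shows one way to manage that history; Add-ChatMessage is a hypothetical helper, not part of any Foundry tooling.

```powershell
# Hypothetical helper: returns a new message array with one entry appended.
function Add-ChatMessage {
    param(
        [array]$Messages = @(),
        [Parameter(Mandatory)]
        [ValidateSet('system', 'user', 'assistant')]
        [string]$Role,
        [Parameter(Mandatory)]
        [string]$Content
    )

    $result = @($Messages)
    $result += @{ role = $Role; content = $Content }

    # The unary comma keeps a one-element array from being unrolled on return.
    return ,$result
}

$history = Add-ChatMessage -Role system -Content 'You are a PowerShell coding assistant.'
$history = Add-ChatMessage -Messages $history -Role user -Content 'List all files in a directory.'

# After a completion call, the reply would be appended before the next question, e.g.:
# $history = Add-ChatMessage -Messages $history -Role assistant -Content $chat.choices[0].message.content
$history.Count   # 2
```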

Quick Tips & Tricks

  1. Always Check Service Status Before Requests: Your script should verify that the Foundry service is running and start it if not, avoiding connection errors.
  2. Load Models Explicitly for Faster Inference: Run foundry model load ahead of time so the first request does not pay the model-loading cost.
  3. Use OpenAI-Compatible Payloads: Stick to the role-based message format (system, user, assistant) to maintain compatibility and predictable assistant behavior.
  4. Adjust Creativity with Temperature: Set the temperature parameter (0–2) to control response creativity; lower values give more deterministic answers, higher values more varied ones.
  5. Leverage PowerShell’s JSON Conversion: ConvertTo-Json serializes only two levels of nesting by default, so pass -Depth 10 to keep complex payloads intact without manual serialization.
  6. Consider SDKs for Richer Integration: The REST API covers inference well, but the language SDKs listed above (Python, C#, Rust, JavaScript) may offer additional features, and the C# SDK is the most natural route to tighter PowerShell integration.
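
The JSON depth tip is easy to verify locally. The following sketch serializes a deliberately deep, made-up hashtable twice: once with the default depth and once with -Depth 10. With the default, the innermost level is replaced by its type name instead of its contents.

```powershell
# A nested structure that goes deeper than ConvertTo-Json's default depth of 2.
$payload = @{ level1 = @{ level2 = @{ level3 = @{ value = 'deep' } } } }

# Default depth: the innermost hashtable is flattened to a type-name string.
# (PowerShell 7 also emits a truncation warning, silenced here.)
$shallow = $payload | ConvertTo-Json -WarningAction SilentlyContinue

# Explicit depth: the full structure survives serialization.
$full = $payload | ConvertTo-Json -Depth 10

$shallow -match 'deep'   # False
$full -match 'deep'      # True
```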

Conclusion

Running AI models locally with Microsoft’s Foundry Local CLI empowers developers to slash inference costs and control sensitive data boundaries with ease. By pairing the Foundry service with PowerShell automation over its REST API, you get a flexible and scriptable way to harness local AI acceleration without cloud overhead.

The OpenAI-compatible API lowers the barrier to entry, so existing prompt engineering skills and tooling carry over. The REST interface is a solid starting point, and tighter integration through the C# SDK mentioned earlier may make these workflows easier still.

As token quotas tighten and privacy regulations grow, local AI inference solutions like Foundry Local are likely to become an important part of the AI ecosystem, combining local accelerator hardware with familiar developer tooling.

References

  1. Using Azure Local Foundry CLI with PowerShell - DEV Community — Original article with detailed PowerShell examples
  2. Azure Foundry Local REST API Reference — Official Microsoft documentation for the Foundry REST API
  3. Microsoft Foundry SDK GitHub Repository — SDKs for multiple programming languages
  4. OpenAI Chat Completion API Spec — For understanding the chat completion message format
  5. PowerShell Invoke-RestMethod Documentation — Cmdlet details for REST API calls
  6. Winget Package Manager — Installation of Windows tools like Foundry Local CLI