Streamlining AI Local Inference: Using Azure Local Foundry CLI with PowerShell
Date: 2026-05-15
Cut inference costs and keep AI workloads private by running Microsoft Local Foundry’s AI models locally—and automate it all with PowerShell and REST.
Tags: ["Azure", "AI Foundry", "PowerShell", "Local AI", "Automation"]
The rapid evolution of generative AI has brought incredible capabilities to developers, but it’s also introduced growing operational costs and privacy challenges. As cloud providers like Anthropic, OpenAI, and Microsoft tighten token quotas, running every AI call in the cloud is increasingly expensive—and sometimes, simply unnecessary.
Many tasks don’t require state-of-the-art frontier models; spending costly cloud tokens on a quick summarization or code-formatting job is wasteful. And for sensitive or regulated data, especially under GDPR and emerging EU Cloud Act constraints, sending data outside your trusted perimeter isn’t an option.
This is where local AI models shine: running inference close to your hardware mitigates cost and privacy concerns. Microsoft's Local Foundry platform addresses this by enabling local AI inference accelerated by dedicated hardware such as NPUs. In this post, we explore how to use the Azure Local Foundry CLI on Windows and macOS, specifically focusing on automating interactions with local AI models through PowerShell scripts leveraging the Foundry REST API.
We’ll cover how to install and manage models via CLI, interact programmatically using PowerShell functions, and build chat completions compatible with OpenAI’s API format. Whether you want to test local models interactively or integrate them into your automation workflows, this guide has you covered.
Architecture Overview
┌─────────────────────────────────────────────┐
│                Local Machine                │
├─────────────────────────────────────────────┤
│  • Azure Local Foundry CLI & SDK            │
│  • AI Accelerator Hardware (NPU)            │
│  • PowerShell Automation & REST API Client  │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│         Local Foundry AI Inference          │
├─────────────────────────────────────────────┤
│  • Model Cache & Management                 │
│  • REST API Endpoint (OpenAI compatible)    │
│  • Model Execution via AI Accelerator       │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│         Developer Automation Layer          │
├─────────────────────────────────────────────┤
│  • PowerShell Scripts invoking REST API     │
│  • Structured Chat Completion Requests      │
│  • Quick Integration without SDK Binding    │
└─────────────────────────────────────────────┘
The local machine runs the Azure Local Foundry CLI, which manages models and controls a local service that exposes an OpenAI-compatible REST API for inference. PowerShell scripts call this REST API to automate tasks, while the AI accelerator handles the actual model execution.

Key Technical Observations
- Local AI Inference Offloads Cloud Usage: By running models locally on AI accelerators like NPUs, organizations reduce dependence on cloud inference, cutting costs and lowering latency.
- OpenAI-Compatible REST API Interface: Local Foundry exposes REST endpoints mimicking OpenAI’s Chat Completion API, easing the transition and enabling use of familiar tooling and prompt formats.
- No Native PowerShell SDK Binding Yet: While SDKs exist for Python, C#, Rust, and JavaScript, PowerShell accesses local models exclusively via REST calls, requiring custom scripting for session management and calls.
- Model Management via CLI Commands: Foundry CLI provides granular control over models with commands like list, download, load, and run, allowing flexible caching and deployment strategies.
- Role-Based Chat Completion Protocol: Inputs to the chat API use structured role/content pairs (system, user, assistant), supporting contextual behavior setting and user prompts in line with OpenAI messaging patterns.
- Service Lifecycle Handling Needed: Scripts must ensure the local service is running and expected models are loaded before issuing requests, an extra operational step demonstrated in the PowerShell example.
How It Works: Automating the Local Foundry CLI with PowerShell
Installing the Azure Local Foundry CLI
Microsoft provides easy one-liner installs for major platforms:
# Windows
winget install Microsoft.FoundryLocal
# macOS (using Homebrew)
brew install microsoft/foundrylocal/foundrylocal
This installs both the CLI and the underlying SDK and runtime, readying the machine to run local models.
Managing Models from the CLI
To interact with models:
foundry model list              # Lists available models
foundry model download <model>  # Downloads a model to the local cache
foundry model load <model>      # Loads the model into the running AI service
foundry model run <model>       # Downloads the model if needed, loads it, and starts an interactive chat
This manual control allows you to prepare and manage models efficiently before invoking inference.
Starting and Managing the Foundry Service in PowerShell
The local Foundry service must be running and a model loaded before we can send inference requests. The following PowerShell snippet checks the service status, starts it if stopped, loads a model, and extracts the REST API URI:
function Get-FoundryServiceStatus {
    # Return the raw text printed by the CLI
    return & foundry service status
}

$getServiceStatus = Get-FoundryServiceStatus
if ($getServiceStatus -like "*service is not running*") {
    # Start the service, then re-read the status to pick up the endpoint URL
    & foundry service start | Out-Null
    $getServiceStatus = Get-FoundryServiceStatus
}

# Load a model so the service can answer inference requests
& foundry model load phi-3-mini-128k | Out-Null

# Extract the first URL from the status text and drop the status path
$pattern = 'https?://[^\s"]+'
$uri = [regex]::Match($getServiceStatus, $pattern).Value
$uri = $uri -replace '/openai/status$', ''
$uri
This snippet encapsulates lifecycle management, ensuring the environment is ready.
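The URL extraction step can also be isolated into a small function so it can be checked without a running service. The following is a minimal sketch: the function name Get-FoundryBaseUrl is hypothetical, and the sample status line is an assumed format used only for illustration, not captured CLI output.

```powershell
# Sketch: pull the REST base URL out of the text printed by `foundry service status`.
# Get-FoundryBaseUrl is a hypothetical helper, not part of the Foundry CLI or SDK.
function Get-FoundryBaseUrl {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$StatusText
    )

    $match = [regex]::Match($StatusText, 'https?://[^\s"]+')
    if (-not $match.Success) {
        throw "No URL found in the service status output."
    }

    # Drop the status path and any trailing slash so API paths can be appended.
    return ($match.Value -replace '/openai/status$', '').TrimEnd('/')
}

# Assumed sample text; the real status output may be worded differently.
$sample = 'Model management service is running on http://127.0.0.1:52236/openai/status'
$baseUrl = Get-FoundryBaseUrl -StatusText $sample
$baseUrl
```

Failing loudly when no URL is present keeps a stopped service from turning into an empty base URL that only surfaces later as a confusing connection error.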
Abstracting REST API Calls for Reuse
We create a helper function to invoke REST methods against the Foundry API with appropriate headers and JSON encoding:
function Invoke-FoundryRequest {
    param(
        [Parameter(Mandatory)]
        [string]$Method,
        [Parameter(Mandatory)]
        [string]$FoundryBaseUrl,
        [Parameter(Mandatory)]
        [string]$Path,
        [hashtable]$Headers,
        $Body
    )
    # Build the full request URI from the base URL and the API path
    $uri = "$FoundryBaseUrl$Path"
    $params = @{
        Method = $Method
        Uri    = $uri
    }
    if ($Headers) { $params.Headers = $Headers }
    if ($null -ne $Body) {
        # Serialize the payload as JSON; -Depth 10 keeps nested structures intact
        $params.Body = ($Body | ConvertTo-Json -Depth 10)
        $params.ContentType = "application/json"
    }
    return Invoke-RestMethod @params
}
This modular approach helps maintain clear boundaries between networking and business logic.
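One detail worth guarding against: the helper concatenates "$FoundryBaseUrl$Path" directly, so a base URL with a trailing slash produces a double slash, and a path without a leading slash produces a malformed URI. A small normalizing function is one way to handle this. Join-FoundryUri below is a hypothetical name used for illustration.

```powershell
# Sketch: join a base URL and an API path with exactly one slash between them.
# Join-FoundryUri is a hypothetical helper, not part of any Foundry tooling.
function Join-FoundryUri {
    param(
        [Parameter(Mandatory)]
        [string]$BaseUrl,
        [Parameter(Mandatory)]
        [string]$Path
    )

    # Trim slashes on both sides of the seam, then add a single separator.
    return '{0}/{1}' -f $BaseUrl.TrimEnd('/'), $Path.TrimStart('/')
}

$joinedA = Join-FoundryUri -BaseUrl 'http://127.0.0.1:52236/' -Path '/v1/chat/completions'
$joinedB = Join-FoundryUri -BaseUrl 'http://127.0.0.1:52236' -Path 'v1/chat/completions'
$joinedA
$joinedB
```

Both calls yield the same URI, so callers do not have to remember which side carries the slash.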
Sending Chat Completion Requests
Since the Foundry API implements OpenAI-compatible chat completions, we define a function encapsulating the request format:
function New-FoundryChatCompletion {
    [CmdletBinding()]
    param(
        # Model ID to query; the default is an NPU variant, so change it to match your hardware
        [string]$Model = "phi-3-mini-128k-instruct-qnn-npu:3",

        [Parameter(Mandatory)]
        [array]$Messages,

        [Parameter(Mandatory)]
        [string]$FoundryBaseUrl,

        [double]$Temperature,
        [double]$TopP,
        [int]$MaxTokens = 2048
    )

    $body = @{
        model                 = $Model
        messages              = $Messages
        max_tokens            = $MaxTokens
        max_completion_tokens = $MaxTokens
    }

    # Only send sampling parameters the caller explicitly supplied
    if ($PSBoundParameters.ContainsKey('Temperature')) { $body.temperature = $Temperature }
    if ($PSBoundParameters.ContainsKey('TopP')) { $body.top_p = $TopP }

    Invoke-FoundryRequest -Method POST -Path "/v1/chat/completions" -Body $body -FoundryBaseUrl $FoundryBaseUrl
}
Example Usage
Here’s how to prepare messages and call the chat completion via PowerShell:
$messages = @(
    @{ role = "system"; content = "You are a PowerShell coding assistant. Only give code if requested." },
    @{ role = "user"; content = "Give me a PowerShell script to list all files in a directory." }
)
# The port varies between machines and runs; use the base URL reported by `foundry service status`
$chat = New-FoundryChatCompletion -FoundryBaseUrl "http://127.0.0.1:52236" -Temperature 0.2 -Messages $messages
# Retrieve generated assistant response content
$chat.choices[0].message.content
This interactive model invocation pattern makes local AI integration intuitive and accessible in PowerShell automation scenarios.
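Because the chat API is stateless, a follow-up question only has context if the earlier turns are sent again. One way to keep that history is sketched below. Add-ChatMessage is a hypothetical helper, and the assistant reply is a placeholder string standing in for the value of $chat.choices[0].message.content from a real call.

```powershell
# Sketch: maintain a multi-turn conversation as an array of role/content pairs.
# Add-ChatMessage is a hypothetical helper, not part of the Foundry SDK.
function Add-ChatMessage {
    param(
        [AllowEmptyCollection()]
        [array]$Messages = @(),

        [Parameter(Mandatory)]
        [ValidateSet('system', 'user', 'assistant')]
        [string]$Role,

        [Parameter(Mandatory)]
        [string]$Content
    )

    # Return a new array; the leading comma stops PowerShell from unrolling it.
    return , @($Messages + @{ role = $Role; content = $Content })
}

$history = Add-ChatMessage -Role system -Content 'You are a PowerShell coding assistant.'
$history = Add-ChatMessage -Messages $history -Role user -Content 'List all files in a directory.'

# Placeholder reply; in a real session this comes from the chat completion response.
$reply = 'Get-ChildItem -File'
$history = Add-ChatMessage -Messages $history -Role assistant -Content $reply
$history = Add-ChatMessage -Messages $history -Role user -Content 'Now include subfolders.'

$history.Count
```

The resulting $history array can be passed as -Messages to New-FoundryChatCompletion, so the model sees the whole exchange on each request.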
Quick Tips & Tricks
- Always Check Service Status Before Requests: Your script should verify that the Foundry service is running and start it if not, avoiding connection errors.
- Load Models Explicitly for Faster Inference: Use foundry model load to persistently load models, reducing latency on the first request.
- Use OpenAI-Compatible Payloads: Stick to the role-based message format (system, user, assistant) to maintain compatibility and predictable assistant behavior.
- Adjust Creativity with Temperature: Set the temperature parameter (0–2) to control response creativity, balancing accuracy and novelty.
- Leverage PowerShell’s JSON Conversion: Using ConvertTo-Json -Depth 10 handles complex payloads without manual serialization hassles.
- Consider SDKs for Richer Integration: The REST API is powerful but limited; the upcoming .NET SDK unlocks additional features and tighter PowerShell integration.
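The -Depth tip matters because ConvertTo-Json stops expanding nested objects after its default depth of 2. The sketch below uses a generic nested hashtable (the key names are placeholders, not Foundry API fields) to show the difference between the default and -Depth 10.

```powershell
# Sketch: compare default-depth serialization with -Depth 10 on a nested payload.
# The keys below are illustrative placeholders, not Foundry request fields.
$payload = @{
    level1 = @{
        level2 = @{
            level3 = @{ value = 'deep-value' }
        }
    }
}

# Default depth (2): the innermost table is not expanded.
# PowerShell 7 warns about the truncation, so the warning is silenced here.
$shallow = $payload | ConvertTo-Json -Compress -WarningAction SilentlyContinue

# Explicit depth: the whole structure is serialized.
$deep = $payload | ConvertTo-Json -Depth 10 -Compress

$shallow
$deep
```

With the default depth, the innermost table is typically replaced by its type name instead of its contents, and the request body is then wrong without any error being raised. Setting -Depth explicitly avoids that for nested message payloads.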

PowerShell session running Foundry CLI commands (image credit: DEV Community)
Conclusion
Running AI models locally with Microsoft’s Foundry Local CLI empowers developers to slash inference costs and control sensitive data boundaries with ease. By pairing the Foundry service with PowerShell automation over its REST API, you get a flexible and scriptable way to harness local AI acceleration without cloud overhead.
The seamless OpenAI API compatibility lowers the barrier to entry, enabling existing prompt engineering skills to be reused efficiently. While the REST API interface is a robust starting point, richer integrations through the forthcoming .NET SDK promise even easier and more powerful workflows.
As token quotas tighten and privacy regulations grow, local AI inference solutions like Azure Local Foundry will become a vital part of the AI ecosystem, blending edge hardware with familiar developer tooling for effective, scalable AI deployments.
References
- Using Azure Local Foundry CLI with PowerShell - DEV Community — Original article with detailed PowerShell examples
- Azure Foundry Local REST API Reference — Official Microsoft documentation for the Foundry REST API
- Microsoft Foundry SDK GitHub Repository — SDKs for multiple programming languages
- OpenAI Chat Completion API Spec — For understanding the chat completion message format
- PowerShell Invoke-RestMethod Documentation — Cmdlet details for REST API calls
- Winget Package Manager — Installation of Windows tools like Foundry Local CLI