Scaling Dynamic AI Agents with Microsoft Foundry and Azure AI Services

Date: 2026-06-03

Discover how to design and orchestrate scalable, dynamic AI agent systems using Microsoft Foundry, Azure AI Search, and Azure OpenAI to handle complex, multiagent conversational workloads.

Tags: ["Azure", "AI Foundry", "OpenAI", "AI Agents", "Orchestration"]

In building intelligent conversational systems, one common challenge is how to handle the sheer diversity of user requests without hardcoding fixed workflows. When you want to scale beyond just a few assistants or tools, selecting and coordinating the right AI agents dynamically is critical. Static orchestration quickly becomes brittle, and naive approaches don't scale well when hundreds of agents could be invoked in unpredictable combinations.

Microsoft’s Dynamic AI Agents at Scale pattern offers a blueprint for solving these problems. This solution uses Microsoft Foundry, Azure AI Search, and Azure OpenAI models to dynamically select relevant agents from a large pool during conversations. It provides a flexible and scalable orchestration framework to support evolving AI ecosystems—where new agents can be onboarded seamlessly and invoked only when appropriate.

In this post, we will explore the architecture behind this pattern, review key technical insights, and break down how agent selection and orchestration work under the hood. Along the way, you’ll gain practical guidance for implementing multiagent AI systems that balance performance, cost, and adaptability.

Architecture Overview

┌────────────────────────────────────────────┐
│Architecture                                │
├────────────────────────────────────────────┤
│• Enterprise data sources                   │
│• Foundry platform                          │
│• AI applications                           │
└────────────────────────────────────────────┘

Key Technical Observations

Semantic Cache for Efficient Agent Selection: Azure AI Search stores vector embeddings of sample utterances representing agent capabilities. This semantic cache filters candidate agents by vector similarity before passing a shortlist to the LLM. This dramatically reduces token consumption and latency compared with having the LLM evaluate every agent on every request.
Adaptive Orchestration with Confidence Threshold: The system uses a confidence threshold (e.g., 85%) to decide if a single agent can be invoked directly, bypassing orchestrator LLM steps. This adaptive pattern delivers low-latency responses for unambiguous intents while reserving costly orchestration for complex ones.
Multiturn Context Management with Azure Managed Redis: Conversation history is cached in Redis with configurable TTL to balance context retention against storage cost. This enables coherent multiagent multiturn conversations without stale or excessive memory.
Declarative vs. In-Code Agent Implementation: Agents can be defined programmatically (using frameworks like Microsoft Agent Framework or LangChain) or declaratively with YAML templates. Declarative definitions facilitate rapid onboarding and empower non-developers to modify agent behavior without redeployment.
Built-in Observability and Evaluation Framework: The solution incorporates OpenTelemetry and Azure Monitor to track agent invocation accuracy, response relevance, and system health. Foundry’s evaluation pipeline rigorously tests agents both individually and within orchestration, helping ensure ongoing quality as the agent pool evolves.
Cost Optimization via Tiered Models and Telemetry Sampling: Lower-cost models handle agent routing/selecting while higher-capability models produce agent responses. Intelligent telemetry sampling controls observability overhead to keep infrastructure costs in check at scale.
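To make the token savings from shortlisting concrete, here is a rough back-of-the-envelope sketch. Every number in it (pool size, tokens per agent description, query length, shortlist size) is a hypothetical assumption chosen for illustration, not a figure from the pattern, and the helper `selection_prompt_tokens` is an invented name.

```python
def selection_prompt_tokens(num_agents, tokens_per_agent, query_tokens):
    """Approximate prompt size when the LLM must read every listed agent description."""
    return query_tokens + num_agents * tokens_per_agent


# Hypothetical figures: 300 registered agents, ~120 tokens per description,
# a 50-token user query, and a shortlist of 5 agents from the semantic cache.
full_pool = selection_prompt_tokens(num_agents=300, tokens_per_agent=120, query_tokens=50)
shortlist = selection_prompt_tokens(num_agents=5, tokens_per_agent=120, query_tokens=50)

print(full_pool)   # 36050
print(shortlist)   # 650
```

Under these assumed numbers the selection prompt shrinks by roughly a factor of 55, which is the kind of saving the semantic cache is meant to provide.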

How It Works

1. User Query and Orchestration

A user sends a query through a client application. The AI agent service receives this and passes the request to its orchestrator component.

2. Agent Selection via Semantic Cache

The orchestrator delegates to the Agent Selector, which performs dynamic filtering:

Alias Mapping: Resolves agent references or aliases for consistency.
Semantic Search: Sends a normalized query to Azure AI Search’s semantic cache containing vector embeddings of agent utterances.
Scoring and Filtering: Assigns cosine similarity scores, retains candidates above a threshold, and removes duplicates.
Confidence Check: If one agent’s score exceeds a confidence threshold, it is selected directly; otherwise, a supervisor agent or LLM evaluates multiple candidates.

User Query → Alias Mapping → Semantic Search → Scoring & Filtering → Orchestrator

Figure: Flow diagram showing how the system uses a semantic cache to select the most relevant agents for a user query. Agent selection workflow courtesy of Microsoft Learn.
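The scoring, filtering, and confidence-check steps can be sketched in plain Python. This is a minimal illustration, not the pattern's implementation: the embeddings are toy three-dimensional vectors, the in-memory `INDEX` stands in for Azure AI Search, and the names `Utterance`, `select_agents`, and `min_score` are hypothetical. Alias mapping and query embedding are left out, and "direct" selection is interpreted here as exactly one agent clearing the confidence threshold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    agent: str      # agent that this sample utterance belongs to
    vector: tuple   # embedding of the utterance (toy 3-d values here)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_agents(query_vec, index, min_score=0.5, confidence=0.85):
    """Return ('direct', [agent]) or ('orchestrate', shortlist ordered by score)."""
    best = {}
    for utt in index:
        score = cosine(query_vec, utt.vector)
        # Filter by threshold and deduplicate: keep only each agent's best score.
        if score >= min_score and score > best.get(utt.agent, -1.0):
            best[utt.agent] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    confident = [agent for agent, score in ranked if score >= confidence]
    if len(confident) == 1:
        return "direct", confident
    return "orchestrate", [agent for agent, _ in ranked]


# Hypothetical stand-in for the semantic cache.
INDEX = [
    Utterance("billing", (1.0, 0.0, 0.0)),
    Utterance("billing", (0.9, 0.1, 0.0)),
    Utterance("travel", (0.0, 1.0, 0.0)),
    Utterance("hr", (0.0, 0.0, 1.0)),
]

print(select_agents((1.0, 0.05, 0.0), INDEX))  # ('direct', ['billing'])
print(select_agents((1.0, 1.0, 0.0), INDEX))   # ('orchestrate', ['billing', 'travel'])
```

In the second call no agent clears the confidence threshold, so the shortlist would be handed to a supervisor agent or LLM for the final decision.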

3. Agent Instantiation

The shortlisted or selected agent is instantiated by the Agent Factory, which loads the agent’s implementation — be it code modules or YAML templates — to produce a ready-to-use agent instance.
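One way to picture a factory that handles both in-code and declarative agents is sketched below. The class and method names (`AgentFactory`, `register_code`, `register_declarative`) and the `reply_template` field are hypothetical, and the declarative spec is a plain dictionary standing in for a parsed YAML template. A real agent would call a model rather than format a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]   # takes the user query, returns a reply


class AgentFactory:
    """Builds agents from in-code builders or declarative specs (illustrative only)."""

    def __init__(self):
        self._builders: Dict[str, Callable[[], Agent]] = {}

    def register_code(self, name, builder):
        # In-code agent: the builder is an ordinary Python callable.
        self._builders[name] = builder

    def register_declarative(self, spec):
        # Declarative agent: `spec` stands in for a parsed YAML template.
        name, template = spec["name"], spec["reply_template"]
        self._builders[name] = lambda: Agent(name, lambda q: template.format(query=q))

    def create(self, name):
        # Instantiate a fresh agent on demand; unknown names are an error.
        try:
            return self._builders[name]()
        except KeyError:
            raise ValueError(f"unknown agent: {name}") from None


factory = AgentFactory()
factory.register_code("echo", lambda: Agent("echo", lambda q: q.upper()))
factory.register_declarative({"name": "faq", "reply_template": "FAQ lookup for: {query}"})

print(factory.create("echo").handle("hi"))      # HI
print(factory.create("faq").handle("refunds"))  # FAQ lookup for: refunds
```

Registering builders rather than instances means an agent is constructed only when the selector actually picks it, which fits the idea of invoking agents only when appropriate.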

4. Agent Processing with Azure OpenAI

The instantiated agent processes the request, using Azure OpenAI models hosted in Microsoft Foundry. Agents may call external APIs or services (via Azure NAT Gateway) during execution as needed.

5. Response and Memory Management

Agent responses flow back through the orchestration layer and return to the user. Conversation state and context are stored in Azure Managed Redis with TTL policies to facilitate multiturn dialogs without stale data buildup.
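The TTL behavior can be illustrated with a small in-memory stand-in. This sketch does not talk to Azure Managed Redis. The `ConversationStore` class is hypothetical and mimics a sliding expiry, where each write refreshes the session's time to live, similar in spirit to issuing the Redis `EXPIRE` command after an `RPUSH`. Whether the pattern uses a sliding or fixed TTL is an assumption here.

```python
import time


class ConversationStore:
    """In-memory stand-in for a Redis-backed history cache with a sliding TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._data = {}       # session_id -> (expires_at, [turns])

    def append(self, session_id, role, text):
        turns = self.history(session_id)
        turns.append((role, text))
        # Each write refreshes the expiry for the whole session.
        self._data[session_id] = (self._clock() + self._ttl, turns)

    def history(self, session_id):
        entry = self._data.get(session_id)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(session_id, None)   # drop expired sessions lazily
            return []
        return list(entry[1])


# Usage with a fake clock so the expiry is easy to follow.
now = [0.0]
store = ConversationStore(ttl_seconds=60, clock=lambda: now[0])
store.append("s1", "user", "hi")
now[0] = 30.0
store.append("s1", "assistant", "hello")   # expiry moves to t=90
now[0] = 80.0
print(len(store.history("s1")))            # 2
now[0] = 91.0
print(store.history("s1"))                 # []
```

A shorter TTL frees memory sooner but forgets idle conversations faster, which is the trade-off the TTL policy is meant to tune.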

6. Monitoring and Evaluation

Throughout the process, telemetry data and execution logs are captured by Application Insights and Azure Monitor for observability. Foundry’s Evaluation Framework is used during agent onboarding and ongoing operation to validate agent behavior and orchestration performance.

Quick Tips & Tricks

Optimize Semantic Cache with Diverse Utterances
Include at least five representative utterances per agent capability when building your semantic cache in Azure AI Search to ensure robust candidate retrieval under diverse query formulations.
Set Confidence Thresholds Thoughtfully
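Calibrate your agent confidence threshold (commonly around 85%) to balance accuracy and cost. Because an agent is invoked directly only when its score exceeds the threshold, a threshold set too high pushes more requests through the orchestrator LLM and raises cost and latency, while one set too low sends ambiguous requests straight to a single agent and risks misrouting.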
Use TTL Values to Tune Conversation Memory
Configure Redis TTL based on typical session lengths and conversation turnover to balance context availability against storage costs and cache freshness.
Prefer Declarative Agent Definitions for Scalability
When onboarding many agents or enabling rapid iteration by non-developers, choose YAML-based declarative definitions to avoid frequent code deployments.
Sample Telemetry Intelligently
Apply adaptive sampling on OpenTelemetry data streams to limit ingestion costs while preserving metrics essential for anomaly detection and performance tuning.
Apply Tiered Model Usage
Route agent selection through lower-cost, faster LLMs, reserving premium Azure OpenAI models for generating final user-facing responses to optimize overall expenditure.
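As a rough illustration of the telemetry sampling tip, the sketch below makes a deterministic keep-or-drop decision per trace. It is a simplification: it uses a fixed ratio rather than a truly adaptive rate, it always keeps error traces, and the function name `keep_trace` is hypothetical. In a real deployment this decision would normally be delegated to the sampler configured in the telemetry SDK rather than hand-written.

```python
import hashlib


def keep_trace(trace_id: str, ratio: float, is_error: bool = False) -> bool:
    """Decide whether to export a trace; error traces are always kept."""
    if is_error:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes to [0, 1) so the decision is stable per trace ID.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < ratio


# Roughly 10% of ordinary traces are kept; every error trace is kept.
kept = sum(keep_trace(str(i), 0.1) for i in range(10_000))
print(kept)
print(keep_trace("any-id", 0.0, is_error=True))  # True
```

Hashing the trace ID means every component reaches the same decision for a given trace, so sampled traces stay complete across agents.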

Conclusion

Designing dynamic AI agent systems that scale to hundreds of collaborators requires careful orchestration, intelligent agent selection, and robust observability. Microsoft’s Dynamic AI Agents at Scale pattern leverages Azure AI Search for semantic filtering, Microsoft Foundry for model hosting and evaluation, and adaptive orchestration strategies to achieve this at enterprise scale.

By filtering large agent pools via vector similarity and confidence thresholds, this pattern limits costly LLM calls and reduces latency. Declarative and code-based agent definitions provide flexibility in onboarding and maintenance. Careful state management with Azure Managed Redis enables multiturn conversations while keeping storage costs in check. Finally, integrated evaluation and monitoring help ensure ongoing system health and response quality.

As conversational AI environments grow more complex, these principles will remain foundational. Future advancements in AI orchestration and evaluation promise even smarter, more cost-effective multiagent ecosystems that can seamlessly evolve with your business needs.

References

Dynamic AI agents at scale pattern - Azure Architecture Center — Original solution idea from Microsoft Learn
Microsoft Foundry Documentation — Foundry platform overview and usage
Azure AI Search Documentation — AI-powered vector and semantic search service
Azure OpenAI Service — Managed OpenAI models in Microsoft Foundry
AI Agent Orchestration Patterns — Patterns and strategies for multiagent AI systems
Microsoft Agent Framework — Framework for in-code AI agent development
LangChain — Popular toolkit for building agentic systems in code
Dynamic AI Agents at Scale Evaluation Framework (GitHub) — Evaluation pipeline and tooling for agentic systems