May 28, 2026

Building an Agentic Operations Lakehouse with Drasi and Microsoft Agent Framework


Discover how to combine Drasi, Microsoft Agent Framework, and Microsoft Fabric to build a safe, auditable AI-driven operational risk system for healthcare and beyond.

Tags: ["Azure", "Microsoft Agent Framework", "Drasi", "Agentic Lakehouse", "Operational Risk"]

Hospital operations run on a constant web of interdependent signals: theatre lists evolving throughout the day, PACU bays filling and emptying, discharge blockers triggering bed shortages. No single signal defines a risk, but combinations of them do, often in less than an hour. Traditionally, coordinators manage these risks with spreadsheets and phone calls, working from incomplete information. AI assistants routed through chat interfaces struggle in this environment because operational workflows demand auditability, clear human approval boundaries, and deterministic detection logic, not black-box LLM outputs alone.

In this post, we explore a proof of technology that embraces a different approach: an Agentic Operations Lakehouse pattern where AI agents produce evidence-backed recommendations grounded in historical data, high-impact actions always require human approval, and every decision is auditable and replayable. This system leverages three core Azure and open-source technologies:

  • Drasi for live, continuous, deterministic risk detection
  • Microsoft Agent Framework (MAF) for governed agent reasoning workflows
  • Microsoft Fabric as the operational memory layer

Using a synthetic healthcare scenario, we’ll break down how these pieces fit together and cover the architectural choices, safety boundaries, and reusable design patterns that extend beyond healthcare into manufacturing and logistics.


Architecture Overview

┌────────────────────────────────────────────┐
│Architecture                                │
├────────────────────────────────────────────┤
│• Enterprise data sources                   │
│• Foundry platform                          │
│• AI applications                           │
└────────────────────────────────────────────┘

Key Technical Observations

  • Separation of Detection and Reasoning: Risk detection firmly resides in Drasi via continuous, testable Cypher queries over live data streams. This prevents the typical AI anti-pattern of burying domain logic in LLM prompts, enabling deterministic, auditable detection.
  • 14-Stage Agent Workflow Splitting LLM and Deterministic Logic: The Microsoft Agent Framework workflow segments into five LLM-backed agent stages handling classification and routing, and nine deterministic stages ensuring schema validation, state queries, action routing, and safety guardrails. This explicit boundary preserves predictability and safety.
  • Role-Aware Recommendation Context Enrichment: Recommendations incorporate operator roles (e.g., Bed Manager, Theatre Coordinator) and context injected directly into LLM prompts, improving relevance and domain vocabulary precision without code redeployments—prompts are versioned and loaded from Azure App Configuration.
  • Fabric as Bidirectional Operational Memory: Microsoft Fabric’s Eventhouse stores all state transitions, risks, and approvals, feeding back historical context into agent prompts for grounded recommendations. This memory layer enables longitudinal visibility and auditability from risk to outcome.
  • Multi-Layer Safety Enforcement Model: Actions pass through a three-layer evaluation—deterministic routing classification, LLM-based contextual safety evaluation, and a final deterministic gate post-human approval—that blocks unsafe or unknown actions and ensures risk freshness before execution.
  • Incremental Retry Logic for Infrastructure Robustness: Transient failures from Microsoft Foundry agents are retried inside the agent invocation method, which insulates the higher-level workflow from unnecessary resets and preserves user experience and state consistency.

How It Works: Under the Hood of the Agentic Operations Lakehouse

Step 1: Continuous Risk Detection with Drasi

At the core, Drasi monitors multiple live operational data sources, such as Azure PostgreSQL tables (surgical_cases, ward_bed_forecasts) and Azure Event Hub streams. It runs continuous queries defined in Cypher, for example:

apiVersion: v1
kind: ContinuousQuery
name: healthcare-bed-capacity-risk
spec:
  mode: query
  queryLanguage: Cypher
  sources:
    subscriptions:
      - id: aol-operational-postgres
        nodes:
          - sourceLabel: surgical_cases
          - sourceLabel: ward_bed_forecasts
  query: >
    MATCH (c:surgical_cases)
    MATCH (w:ward_bed_forecasts)
    WHERE
      c.ScenarioRunId = w.ScenarioRunId
      AND c.CorrelationId = w.CorrelationId
      AND w.StateValue = 'blocked'
    RETURN
      c.Id AS workItemId,
      'bed-capacity-risk' AS riskType,
      'high' AS riskLevel,
      'Post-op bed forecast indicates blocked capacity' AS observedFact,
      'human-approval-required' AS approvalRequirement

This declarative detection logic emits structured events whenever risk criteria match. Its testability and determinism contrast with ad hoc AI detection buried inside prompts. Operators can trace why a risk was flagged by analyzing structured observedFact outputs.
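
One way to exercise a rule like this before deploying it is to restate its join in a plain unit test. The sketch below is not Drasi code and calls no Drasi API: the record types and the `RiskRules.Detect` helper are hypothetical names that mirror the query's MATCH/WHERE/RETURN in C# over in-memory rows, assuming only the column names shown in the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Synthetic rows: only case-1 pairs with a forecast whose state is "blocked".
var cases = new List<SurgicalCase>
{
    new("case-1", "run-1", "corr-1"),
    new("case-2", "run-1", "corr-2"),
};
var forecasts = new List<WardBedForecast>
{
    new("run-1", "corr-1", "blocked"),
    new("run-1", "corr-2", "available"),
};

foreach (RiskEvent risk in RiskRules.Detect(cases, forecasts))
{
    // prints: case-1: bed-capacity-risk (high)
    Console.WriteLine($"{risk.WorkItemId}: {risk.RiskType} ({risk.RiskLevel})");
}

// Hypothetical row shapes; only the columns used by the query are modelled.
public record SurgicalCase(string Id, string ScenarioRunId, string CorrelationId);
public record WardBedForecast(string ScenarioRunId, string CorrelationId, string StateValue);
public record RiskEvent(string WorkItemId, string RiskType, string RiskLevel, string ApprovalRequirement);

public static class RiskRules
{
    // Same predicate as the continuous query, expressed as a LINQ join.
    public static IEnumerable<RiskEvent> Detect(
        IEnumerable<SurgicalCase> cases,
        IEnumerable<WardBedForecast> forecasts) =>
        from c in cases
        join w in forecasts
            on (c.ScenarioRunId, c.CorrelationId) equals (w.ScenarioRunId, w.CorrelationId)
        where w.StateValue == "blocked"
        select new RiskEvent(c.Id, "bed-capacity-risk", "high", "human-approval-required");
}
```

Because the rule is a pure function of its input rows, the same inputs always yield the same risk events, which is the property that makes the detection replayable.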

Drasi continuous queries running on AKS
Figure: Drasi continuous query containers deployed and running on Azure Kubernetes Service. Source: Luke Murray.


Step 2: Reasoning & Recommendation via Microsoft Agent Framework

Once Drasi emits a risk event, the Microsoft Agent Framework kicks off a 14-stage workflow blending deterministic and LLM-powered reasoning stages. For example:

  • LLM stages classify the risk, determine routing, and generate recommendations tailored to the operator role.
  • Deterministic stages validate inputs, query operational memory, enforce routing policies, and maintain workflow state checkpoints.
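
The split between the two kinds of stage can be sketched as an ordered list of tagged steps. This is a simplified illustration and does not use the Microsoft Agent Framework API: `Stage`, `StageKind`, `WorkflowState`, `WorkflowRunner`, and the stage names are all hypothetical, and the LLM call is replaced by a stub delegate.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in for an LLM-backed classifier; a real stage would call a model here.
Func<string, Task<string>> fakeClassifier = riskType => Task.FromResult("capacity");

var stages = new List<Stage>
{
    new("validate-input", StageKind.Deterministic, state =>
    {
        if (string.IsNullOrWhiteSpace(state.RiskType))
            throw new InvalidOperationException("riskType is required");
        return Task.FromResult(state);
    }),
    new("classify-risk", StageKind.Llm, async state =>
        state with { Classification = await fakeClassifier(state.RiskType) }),
    new("route-action", StageKind.Deterministic, state =>
        Task.FromResult(state with
        {
            Route = state.Classification == "capacity" ? "bed-manager" : "operations"
        })),
};

var (finalState, checkpoints) =
    await WorkflowRunner.RunAsync(stages, new WorkflowState("bed-capacity-risk"));

// prints: route=bed-manager, checkpoints=validate-input[Deterministic] > classify-risk[Llm] > route-action[Deterministic]
Console.WriteLine($"route={finalState.Route}, checkpoints={string.Join(" > ", checkpoints)}");

public enum StageKind { Deterministic, Llm }
public record WorkflowState(string RiskType, string? Classification = null, string? Route = null);
public record Stage(string Name, StageKind Kind, Func<WorkflowState, Task<WorkflowState>> Run);

public static class WorkflowRunner
{
    // Runs stages in order and records a checkpoint label after each one completes.
    public static async Task<(WorkflowState State, IReadOnlyList<string> Checkpoints)> RunAsync(
        IEnumerable<Stage> stages, WorkflowState initial)
    {
        var checkpoints = new List<string>();
        WorkflowState state = initial;
        foreach (Stage stage in stages)
        {
            state = await stage.Run(state);
            checkpoints.Add($"{stage.Name}[{stage.Kind}]");
        }
        return (state, checkpoints);
    }
}
```

Tagging each stage with its kind makes the boundary visible in the checkpoint trail, so an auditor can see which steps involved a model and which were plain code.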

Here’s a snippet demonstrating how roles are mapped to risk types to contextualize recommendations:

private static (string Role, string Context) MapRoleFromRiskType(string? riskType) => riskType switch {
    "bed-capacity-risk" or "post-op-discharge-coordination-risk" =>
        ("Bed Manager",
         "Responsible for ward capacity and patient discharge flow..."),
    "pacu-throughput-risk" or "theatre-turnover-risk" =>
        ("Theatre Coordinator",
         "Responsible for theatre list execution and perioperative flow..."),
    _ => ("Operational Manager", "...")
};

LLM prompts are not hardcoded but version-controlled artifacts stored in Azure App Configuration, dynamically loaded and cached to avoid workflow latency or unnecessary redeploys.
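
A minimal sketch of the load-and-cache idea follows. It deliberately avoids the Azure App Configuration SDK: the loader is a plain delegate standing in for the store lookup, and `PromptCache`, `PromptRenderer`, the `{{role}}`-style placeholders, and the key `prompts/recommendation/v3` are assumptions made for illustration rather than names from the project.

```csharp
using System;
using System.Collections.Generic;

int loads = 0;
// Stand-in loader; a real one would read the named key from the configuration store.
var cache = new PromptCache(
    key => { loads++; return "You are assisting the {{role}}. {{context}} Risk: {{riskType}}."; },
    TimeSpan.FromMinutes(5));

string prompt = PromptRenderer.Render(
    cache.Get("prompts/recommendation/v3"),
    role: "Bed Manager",
    context: "Responsible for ward capacity and patient discharge flow...",
    riskType: "bed-capacity-risk");
cache.Get("prompts/recommendation/v3"); // within the TTL, so served from cache

Console.WriteLine(prompt);
Console.WriteLine($"loads={loads}"); // prints: loads=1

public sealed class PromptCache
{
    private readonly Func<string, string> _load;
    private readonly TimeSpan _ttl;
    // Not thread-safe; a shared cache would need locking or a concurrent collection.
    private readonly Dictionary<string, (string Text, DateTimeOffset LoadedAt)> _entries = new();

    public PromptCache(Func<string, string> load, TimeSpan ttl)
    {
        _load = load;
        _ttl = ttl;
    }

    // Returns the cached text while it is fresh, otherwise reloads and stores it.
    public string Get(string key)
    {
        DateTimeOffset now = DateTimeOffset.UtcNow;
        if (_entries.TryGetValue(key, out var entry) && now - entry.LoadedAt < _ttl)
            return entry.Text;
        string text = _load(key);
        _entries[key] = (text, now);
        return text;
    }
}

public static class PromptRenderer
{
    // Substitutes the role-specific values into the template placeholders.
    public static string Render(string template, string role, string context, string riskType) =>
        template.Replace("{{role}}", role)
                .Replace("{{context}}", context)
                .Replace("{{riskType}}", riskType);
}
```

Putting the version in the key means a prompt change is a configuration change: publishing a new key and pointing the workflow at it does not require rebuilding the service.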

Transient failures from Microsoft Foundry agents are handled by a retry loop inside the agent invocation, so a single failed run does not restart the entire 14-stage flow:

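const int MaxAttempts = 2;
for (int attempt = 1; attempt <= MaxAttempts; attempt++) {
    // Each attempt gets a fresh thread, so a failed run leaves no state behind.
    PersistentAgentThread thread = await agentsClient.Threads.CreateThreadAsync(...);
    try {
        // The steps that start the run and produce `run` and `result` are omitted from this excerpt.
        if (run.Status != RunStatus.Completed) {
            bool isIncomplete = string.Equals(run.Status.ToString(), "incomplete", ...);
            // Retry only the "incomplete" status, and only while attempts remain.
            if (isIncomplete && attempt < MaxAttempts) { continue; }
            throw new InvalidOperationException(...);
        }
        return result;
    }
    finally {
        // Runs on return, throw, and continue, so the thread is always deleted.
        await agentsClient.Threads.DeleteThreadAsync(thread.Id, ...);
    }
}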
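
The excerpt above depends on the agents client, so it cannot be run in isolation. The same retry shape can be tried out with a fake in place of the agent call. In this sketch `AgentInvoker`, `InvokeWithRetryAsync`, and the status strings are hypothetical stand-ins, not part of any SDK.

```csharp
using System;
using System.Threading.Tasks;

int calls = 0;
// Fake agent call: the first invocation reports "incomplete", the second completes.
Func<Task<(string Status, string Output)>> fakeRun = () =>
{
    calls++;
    return Task.FromResult(calls == 1
        ? ("incomplete", "")
        : ("completed", "recommendation text"));
};

string output = await AgentInvoker.InvokeWithRetryAsync(fakeRun, maxAttempts: 2);
Console.WriteLine($"{output} after {calls} calls"); // prints: recommendation text after 2 calls

public static class AgentInvoker
{
    // Retries only the "incomplete" status; any other failure surfaces immediately.
    public static async Task<string> InvokeWithRetryAsync(
        Func<Task<(string Status, string Output)>> runOnce, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            (string status, string output) = await runOnce();
            if (status == "completed") return output;

            bool retryable = status == "incomplete" && attempt < maxAttempts;
            if (!retryable)
                throw new InvalidOperationException(
                    $"Agent run ended with status '{status}' on attempt {attempt}.");
        }
        // Only reachable when maxAttempts is less than 1.
        throw new InvalidOperationException("maxAttempts must be at least 1.");
    }
}
```

Keeping the loop inside the invocation helper means the calling stage sees either a result or one exception, never a partial retry state.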

14-stage RiskDecisionWorkflow - agentic and deterministic stages annotated
Figure: Annotated execution stages of the mixed LLM and deterministic workflow in MAF. Source: Luke Murray.


Step 3: Operational Memory & Auditing in Microsoft Fabric

Microsoft Fabric acts as the operational memory layer. The workflow writes every risk event, recommendation, safety evaluation, and approval decision to Fabric’s Eventhouse. Subsequent workflow runs read from this memory to ground recommendations in historical context — how often risks occurred, typical escalation latencies, and effective past actions.
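
To make the grounding step concrete, the sketch below reduces past records to a short context string that could be appended to a prompt. It works on an in-memory list rather than querying Eventhouse, and the `RiskRecord` shape, the `HistoryGrounding.Summarise` helper, and the sample values are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Synthetic history; a real workflow would read these rows from the memory layer.
var history = new List<RiskRecord>
{
    new("bed-capacity-risk", "alternate-ward-placement", Approved: true, MinutesToDecision: 12),
    new("bed-capacity-risk", "alternate-ward-placement", Approved: true, MinutesToDecision: 18),
    new("bed-capacity-risk", "overtime-approval", Approved: false, MinutesToDecision: 30),
    new("pacu-throughput-risk", "send-role-notification", Approved: true, MinutesToDecision: 2),
};

// prints: 3 prior bed-capacity-risk events; average decision time 20 min; most often approved action: alternate-ward-placement.
Console.WriteLine(HistoryGrounding.Summarise(history, "bed-capacity-risk"));

public record RiskRecord(string RiskType, string Action, bool Approved, int MinutesToDecision);

public static class HistoryGrounding
{
    // Builds a one-line summary of frequency, decision latency, and the most approved action.
    public static string Summarise(IEnumerable<RiskRecord> history, string riskType)
    {
        List<RiskRecord> matches = history.Where(r => r.RiskType == riskType).ToList();
        if (matches.Count == 0) return $"No prior {riskType} events on record.";

        double avgMinutes = matches.Average(r => r.MinutesToDecision);
        string topAction = matches.Where(r => r.Approved)
            .GroupBy(r => r.Action)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .FirstOrDefault() ?? "none";

        return $"{matches.Count} prior {riskType} events; " +
               $"average decision time {avgMinutes:F0} min; " +
               $"most often approved action: {topAction}.";
    }
}
```

Computing the summary in code and passing only the result to the model keeps the numbers deterministic, so the LLM is asked to phrase a recommendation around facts rather than derive them.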

KQL queries expose risk lifecycle summaries and audit trails, such as:

Risk event lifecycle data summarised over 7 days in Fabric Eventhouse
Figure: Seven-day summary of risk event lifecycles queried via KQL. Source: Luke Murray.

and a query listing full decision chains for recommendations:

Recommendation records query - top 100 results in Eventhouse
Figure: Top 100 recommendation records showing end-to-end audit trail. Source: Luke Murray.

Fabric’s Eventhouse data feeds the operator portal UI, which surfaces real-time insights into the health of the memory layer and workflow throughput.

System dashboard showing Fabric insights view within the operator portal
Figure: Fabric insights dashboard embedded in operator portal UI. Source: Luke Murray.


Step 4: Three Layers of Safety and Approval

Actions recommended by agents pass through a comprehensive safety model before reaching operators or execution:

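┌───────────────────┬──────────────────────────────┬──────────────────────────────────┐
│ Class             │ Actions                      │ Behaviour                        │
├───────────────────┼──────────────────────────────┼──────────────────────────────────┤
│ Safe-automated    │ create-risk-board-entry,     │ Recorded immediately             │
│                   │ send-role-notification,      │                                  │
│                   │ pacu-throughput-coordination │                                  │
├───────────────────┼──────────────────────────────┼──────────────────────────────────┤
│ Approval-required │ theatre-case-resequencing,   │ Approval request to Duty Manager │
│                   │ alternate-ward-placement,    │                                  │
│                   │ overtime-approval,           │                                  │
│                   │ duty-manager-escalation      │                                  │
├───────────────────┼──────────────────────────────┼──────────────────────────────────┤
│ Blocked           │ surgery-cancellation,        │ Blocked before LLM evaluation    │
│                   │ clinical-prioritisation,     │                                  │
│                   │ any unknown action           │                                  │
└───────────────────┴──────────────────────────────┴──────────────────────────────────┘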

Safety Model - Three-Layer Action Classification
Figure: Three-layer classification and enforcement model for recommended actions. Source: Luke Murray.

The three layers are:

  1. Deterministic routing table that classifies actions into safe, approval-required, or blocked sets. Unknown or blocked actions never proceed.
  2. LLM-powered safety policy evaluation which provides contextual rationale but cannot override routing enforcement.
  3. Deterministic post-approval gate that re-evaluates action freshness (e.g., risk still active) before execution.
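
The two deterministic layers can be sketched as a lookup table with a default-deny fallback plus a gate that runs after approval. This is an illustrative reading of the description, not the project's code: `ActionRouter`, `ActionClass`, and `MayExecute` are hypothetical names, the action strings come from the classification table in this section, and the LLM evaluation layer is left out because it supplies rationale and cannot change the routing result.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(ActionRouter.Classify("send-role-notification")); // prints: SafeAutomated
Console.WriteLine(ActionRouter.Classify("overtime-approval"));      // prints: ApprovalRequired
Console.WriteLine(ActionRouter.Classify("not-a-listed-action"));    // prints: Blocked

// An approved action is still refused when the risk is no longer active.
Console.WriteLine(ActionRouter.MayExecute("overtime-approval", approved: true, riskStillActive: false)); // prints: False

public enum ActionClass { SafeAutomated, ApprovalRequired, Blocked }

public static class ActionRouter
{
    // Layer 1: explicit routing table. Anything not listed is treated as blocked.
    private static readonly Dictionary<string, ActionClass> Table = new(StringComparer.Ordinal)
    {
        ["create-risk-board-entry"] = ActionClass.SafeAutomated,
        ["send-role-notification"] = ActionClass.SafeAutomated,
        ["pacu-throughput-coordination"] = ActionClass.SafeAutomated,
        ["theatre-case-resequencing"] = ActionClass.ApprovalRequired,
        ["alternate-ward-placement"] = ActionClass.ApprovalRequired,
        ["overtime-approval"] = ActionClass.ApprovalRequired,
        ["duty-manager-escalation"] = ActionClass.ApprovalRequired,
        ["surgery-cancellation"] = ActionClass.Blocked,
        ["clinical-prioritisation"] = ActionClass.Blocked,
    };

    public static ActionClass Classify(string action) =>
        Table.TryGetValue(action, out ActionClass cls) ? cls : ActionClass.Blocked;

    // Layer 3: final gate. Approval-required actions need both approval and a live risk.
    public static bool MayExecute(string action, bool approved, bool riskStillActive) =>
        Classify(action) switch
        {
            ActionClass.SafeAutomated => true,
            ActionClass.ApprovalRequired => approved && riskStillActive,
            _ => false,
        };
}
```

The default-deny fallback is the important design choice: an action name the table has never seen, including one invented by a model, is classified as blocked rather than passed through.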

This layered design ensures robust safety, auditability, and human oversight for all impactful AI-driven operational actions.


Quick Tips & Tricks

  1. Keep Detection Logic Declarative and Testable
    Use tools like Drasi for live continuous queries exposing structured, verifiable event streams, rather than embedding detection inside LLM prompts.

  2. Separate LLM Reasoning from Deterministic Logic
    Design workflows that clearly delineate contextual natural language reasoning from validations, routing, and safety checks to maintain repeatability and auditability.

  3. Inject Role Context into Prompts Dynamically
    Use configuration storage (e.g., Azure App Configuration) to version and update prompts; include operator roles and context strings to tailor AI outputs without service redeployment.

  4. Implement Multi-Layer Safety Enforcement
    Mixing deterministic routing tables with LLM contextual evaluation and a final deterministic gate prevents automation errors and ensures human approval on high-impact actions.

  5. Use Fabric (or equivalent) as Operational Memory
    Store full audit trails, decision chains, and historical context in a robust event store with query support for transparency and workflow grounding.

  6. Gracefully Handle Agent Infra Transient Failures
    Use internal retry loops at the agent communication layer to avoid restarting entire workflows and improve operator experience during infrastructure blips.


Conclusion

The Agentic Operations Lakehouse pattern demonstrated with Drasi, Microsoft Agent Framework, and Microsoft Fabric offers a compelling, principled approach to embedding AI agents safely within live operational workflows. By clearly separating deterministic detection from LLM reasoning, maintaining rich operational memory, and enforcing layered safety boundaries, this architecture addresses common pitfalls of AI in mission-critical contexts.

While the example explored here is a synthetic hospital operations scenario, the pattern generalizes across industries like manufacturing and logistics, where real-time signals define complex risks and require human-in-the-loop approval. The open-source implementation invites extension and shows that the pattern is a practical, reusable blueprint rather than only a theory.

As AI agents grow more capable and integrated, patterns like this that balance autonomy with human oversight and auditability will be critical for deploying trusted operational AI at scale.


References

  1. Agentic Operations Lakehouse on GitHub — Full MIT licensed repo with scenario packs, microservices, and UI.
  2. Drasi: open-source continuous query engine — Continuous query platform for real-time risk detection.
  3. Microsoft Agent Framework documentation — Official guide to the agent execution framework.
  4. Microsoft Fabric Eventhouse documentation — Details on Fabric event streaming and operational memory.
  5. Microsoft Foundry — Underlying platform powering MAF LLM agents.
  6. Architecture overview — Documentation with detailed architecture diagrams and flow explanations.

Image source: luke.geek.nz by Luke Murray