Expanding Document Translation Across Formats with Azure Translator at Microsoft Build 2026

Date: 2026-06-03

Discover how the latest Azure Translator enhancements break format barriers by translating images, PDFs, and Office files and by adopting LLM-powered workflows, delivering richer, high-fidelity enterprise translations.

Tags: ["Azure", "Document Translation", "AI", "Microsoft Foundry"]

Advancing multilingual communication in enterprises demands more than plain text translation — it requires seamless support across complex document types and embedded content. At Microsoft Build 2026, the Azure Translator team unveiled significant Document Translation enhancements in Foundry Tools that dramatically expand support for image files, PDFs, structured XML formats, and embedded images within Office documents. These innovations empower organizations to translate content as it exists naturally, preserving structure, layout, and semantic fidelity.

Whether you need real-time image translation for customer support or batch processing for large document archives, these new capabilities meet diverse enterprise needs. Additionally, a forthcoming large language model (LLM) powered translation option promises to further elevate contextual accuracy for long-form and nuanced documents.

This post breaks down the latest features, explores the architecture behind them, and highlights practical tips for developers to integrate and leverage these advances in Azure Translator Document Translation.

Architecture Overview

┌─────────────────────────────────────────────┐
│            Enterprise Document Sources       │
├─────────────────────────────────────────────┤
│ • Image files (JPEG, PNG, etc.)              │
│ • PDFs with complex formatting                │
│ • Office Documents (Word, PowerPoint)         │
│ • Structured content (XML DITA, XLIFF 2.0)   │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│              Azure Document AI Layer         │
├─────────────────────────────────────────────┤
│ • Azure AI Vision for text extraction        │
│ • Azure AI Document Intelligence for layout  │
│   and embedded content detection              │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│               Azure Translator Core          │
├─────────────────────────────────────────────┤
│ • Neural Machine Translation (NMT) models    │
│ • Large Language Model (LLM) pipeline (soon)│
│ • Synchronous and asynchronous APIs          │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│                 Output Destinations          │
├─────────────────────────────────────────────┤
│ • Translated image documents                  │
│ • Localized PDFs maintaining layout           │
│ • XML/DITA/XLIFF structured outputs           │
│ • Office files with translated embedded images│
└─────────────────────────────────────────────┘

These capabilities combine Azure AI Vision for text extraction with Azure AI Document Intelligence for layout and structure analysis. Both feed Azure Translator's translation models, which are exposed through synchronous and asynchronous batch APIs to suit different enterprise workflows. This layering is what lets the service preserve fidelity across many input types while still scaling.
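
The layering in the diagram can be read as a routing table. As a rough sketch, assuming file extensions are a good enough proxy for format (the service itself may inspect content instead), the hypothetical helper `extraction_stage` maps an input file to the stage this post describes for it. The extension sets and stage labels are illustrative, not service identifiers.

```python
from pathlib import PurePath

# Illustrative extension sets; the formats the service accepts may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
LAYOUT_EXTS = {".pdf", ".docx", ".pptx"}
STRUCTURED_EXTS = {".dita", ".xlf", ".xliff"}


def extraction_stage(filename: str) -> str:
    """Return the pre-translation stage this post associates with a format."""
    ext = PurePath(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        return "Azure AI Vision (OCR)"
    if ext in LAYOUT_EXTS:
        return "Azure AI Document Intelligence (layout)"
    if ext in STRUCTURED_EXTS:
        return "none (markup handled natively)"
    raise ValueError(f"unrecognized format: {ext or filename}")


for name in ("scan.png", "report.pdf", "messages.xlf"):
    print(name, "->", extraction_stage(name))
```

Rejecting unknown extensions up front, rather than guessing, keeps unsupported files out of a batch before any storage or translation cost is incurred.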

Key Technical Observations

Integrated Text Extraction and Translation Pipeline — The synchronous image translation feature combines Azure AI Vision's OCR capabilities with Azure Translator in a streamlined call, delivering translated images while preserving original layout in real time.
Batch Processing at Enterprise Scale — The ability to submit large repositories of image files or PDFs via Azure Blob Storage containers and process asynchronously addresses common enterprise needs for scalable localization pipelines.
Semantic Preservation in Structured Content — Native support for XML DITA and XLIFF 2.0 enables direct translation of content without losing semantic markup, which is essential for clean reintegration with content management systems and CAT tools.
Enhanced PDF Layout Fidelity Using Azure Document Intelligence — Improvements in handling multi-column text, tables, and footnotes ensure that translated PDFs retain professional formatting, not just plain text swaps.
Image Text Translation Embedded Inside Office Files — Leveraging Document Intelligence to detect and translate text in images embedded within Word and PowerPoint elevates localizations beyond typical text-only workflows.
Forthcoming LLM-Powered Document Translation — The planned integration of GPT-5.x models introduces context-aware translations for complex, long-form documents—allowing toggling between traditional neural approaches and LLM-driven workflows for quality vs. throughput optimization.

How It Works

Synchronous Image Translation

Azure Translator accepts an image file directly over a synchronous API endpoint. Azure AI Vision first detects and extracts text regions using OCR. The extracted text passes to Azure Translator’s neural models for instant translation. Finally, the service renders the translated text back into the image, preserving layout and styling. This all occurs within a single request-response cycle, perfect for interactive scenarios like real-time support.

from pathlib import Path

import requests

endpoint = "DOCUMENT_TRANSLATION_ENDPOINT"
key = "DOCUMENT_TRANSLATION_KEY"
imagefile_path = Path("./image_file.jpg")

# Open the image in a context manager so the handle is closed after upload.
with imagefile_path.open("rb") as image_file:
    response = requests.post(
        f"{endpoint}/translator/document:translate",
        params={
            "api-version": "2026-03-01",
            "sourceLanguage": "en",
            "targetLanguage": "de",
        },
        headers={"Ocp-Apim-Subscription-Key": key},
        files={
            "document": (
                imagefile_path.name,
                image_file,
                "image/jpeg",  # content type must match the uploaded file
            )
        },
        timeout=120,
    )
response.raise_for_status()

# The response body is the translated image.
Path("./image_file_de.jpg").write_bytes(response.content)

Batch Image and PDF Translation

For bulk operations, files are staged in Azure Blob Storage containers. Developers submit a JSON payload specifying source and target container URLs plus language pairs. The service processes inputs asynchronously: it extracts text via AI Vision or Document Intelligence, translates it with Azure Translator, and writes the translated files to the specified target container. Access to the containers is granted through Azure Blob Storage mechanisms such as SAS tokens or managed identities.

import requests

endpoint = "DOCUMENT_TRANSLATION_ENDPOINT"
key = "DOCUMENT_TRANSLATION_KEY"

payload = {
    "inputs": [
        {
            "source": {
                "sourceUrl": "SOURCE_CONTAINER_SAS_URL_OR_MANAGED_IDENTITY",
                "storageSource": "AzureBlob",
                "language": "en",
            },
            "targets": [
                {
                    "targetUrl": "TARGET_CONTAINER_SAS_URL_OR_MANAGED_IDENTITY",
                    "storageSource": "AzureBlob",
                    "language": "fr",
                }
            ],
        }
    ]
}

response = requests.post(
    f"{endpoint}/translator/document/batches",
    params={"api-version": "2026-03-01"},
    headers={
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
response.raise_for_status()

print("Batch operation:", response.headers["Operation-Location"])
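
The `Operation-Location` header printed by the batch sample is a status URL that can be polled until the job finishes. A minimal polling sketch follows, assuming the status response is JSON with a `status` field whose terminal values include `Succeeded`, `Failed`, `Cancelled`, and `ValidationFailed`, as in earlier Document Translation API versions; check the API reference for the version you target. `fetch_status` and `wait_for_batch` are hypothetical helper names, and the sketch uses only the standard library so it runs without extra dependencies.

```python
import json
import time
import urllib.request
from typing import Callable

# Terminal states assumed from earlier Document Translation API versions.
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled", "ValidationFailed"}


def fetch_status(operation_url: str, key: str) -> dict:
    """GET the operation status document from the Operation-Location URL."""
    request = urllib.request.Request(
        operation_url,
        headers={"Ocp-Apim-Subscription-Key": key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_batch(
    get_status: Callable[[], dict],
    interval_s: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_status until a terminal state is seen or max_polls is reached."""
    for _ in range(max_polls):
        status = get_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"batch still running after {max_polls} polls")
```

A call would look like `wait_for_batch(lambda: fetch_status(operation_url, key))`, where `operation_url` is the header value from the batch submission. Passing the status getter and the sleep function as arguments keeps the loop testable without network access.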

Structured XML Content Translation

Document Translation now natively processes XML DITA and XLIFF 2.0 formats without requiring preprocessing. The service preserves the markup integrity, translating only text nodes while maintaining document structure. This minimizes manual cleanup and integration friction with publishing and localization pipelines.

from pathlib import Path

import requests

endpoint = "DOCUMENT_TRANSLATION_ENDPOINT"
key = "DOCUMENT_TRANSLATION_KEY"
xliff_path = Path("./messages.xlf")

# Open the XLIFF file in a context manager so the handle is closed after upload.
with xliff_path.open("rb") as xliff_file:
    response = requests.post(
        f"{endpoint}/translator/document:translate",
        params={
            "api-version": "2026-03-01",
            "sourceLanguage": "en",
            "targetLanguage": "de",
        },
        headers={"Ocp-Apim-Subscription-Key": key},
        files={
            "document": (
                xliff_path.name,
                xliff_file,
                "application/xliff+xml",
            )
        },
        timeout=120,
    )
response.raise_for_status()

# The response body is the translated XLIFF document.
Path("./messages.de.xlf").write_bytes(response.content)
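
To make "translating only text nodes" concrete, here is a local illustration using Python's standard `xml.etree.ElementTree`. It is not how the service is implemented: `translate_text_nodes` is a hypothetical helper, and `fake_translate` stands in for a real translation call. Tags and attributes pass through untouched while element text and tail text are rewritten.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def translate_text_nodes(xml_text: str, translate: Callable[[str], str]) -> str:
    """Apply translate() to text and tail nodes; leave tags and attributes alone."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # element.text is the text before the first child element.
        if element.text and element.text.strip():
            element.text = translate(element.text)
        # element.tail is the text after the element's closing tag.
        if element.tail and element.tail.strip():
            element.tail = translate(element.tail)
    return ET.tostring(root, encoding="unicode")


def fake_translate(text: str) -> str:
    """Stand-in for a real translation call: uppercases the input."""
    return text.upper()


sample = '<p id="intro">Press <b>Start</b> to begin.</p>'
print(translate_text_nodes(sample, fake_translate))
# prints: <p id="intro">PRESS <b>START</b> TO BEGIN.</p>
```

A production pipeline would more likely translate whole segments with inline tags treated as placeholders, since fragment-by-fragment translation loses sentence context. The sketch only demonstrates that markup and attributes survive the round trip.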

Embedded Image Text Translation within Office Files

Using the parameter "translateTextWithinImage": True in batch translation requests, the service leverages Azure Document Intelligence to identify and translate text embedded inside images within Word and PowerPoint files. The translated text is then re-embedded, producing fully localized documents useful for visually rich presentations and reports.

import requests

endpoint = "DOCUMENT_TRANSLATION_ENDPOINT"
key = "DOCUMENT_TRANSLATION_KEY"

payload = {
    "inputs": [
        {
            "source": {
                "sourceUrl": "OFFICE_SOURCE_CONTAINER_SAS_URL_OR_MANAGED_IDENTITY",
                "storageSource": "AzureBlob",
                "language": "en",
            },
            "targets": [
                {
                    "targetUrl": "OFFICE_TARGET_CONTAINER_SAS_URL_OR_MANAGED_IDENTITY",
                    "storageSource": "AzureBlob",
                    "language": "ja",
                }
            ],
        }
    ],
    "options": {
        "translateTextWithinImage": True,
    },
}

response = requests.post(
    f"{endpoint}/translator/document/batches",
    params={"api-version": "2026-03-01"},
    headers={
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
response.raise_for_status()

print("Office batch operation:", response.headers["Operation-Location"])
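
The batch payloads in this post differ only in URLs, languages, and the image option, so a small builder can keep them consistent. The sketch below mirrors the field names used in the samples here; `build_batch_payload` is a hypothetical helper, and placing `translateTextWithinImage` under `options` follows this post's sample rather than a verified schema.

```python
from typing import Mapping


def build_batch_payload(
    source_url: str,
    source_language: str,
    targets: Mapping[str, str],
    translate_text_within_image: bool = False,
) -> dict:
    """Build a batch request body; targets maps language code -> container URL."""
    if not targets:
        raise ValueError("at least one target language is required")
    payload = {
        "inputs": [
            {
                "source": {
                    "sourceUrl": source_url,
                    "storageSource": "AzureBlob",
                    "language": source_language,
                },
                "targets": [
                    {
                        "targetUrl": url,
                        "storageSource": "AzureBlob",
                        "language": language,
                    }
                    for language, url in targets.items()
                ],
            }
        ]
    }
    # Only send the option when it is enabled, as in the Office sample.
    if translate_text_within_image:
        payload["options"] = {"translateTextWithinImage": True}
    return payload
```

The returned dictionary can be passed as `json=` to the same `requests.post` call used for batch submission.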

LLM-Powered Document Translation (Upcoming)

Soon, developers will be able to switch on LLM-powered translation backed by GPT-5.x models for better contextual understanding of long documents. The option offers a choice between the throughput and cost efficiency of classic neural models and the added nuance of LLMs, so enterprises can tune for their own localization quality and speed requirements.
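
Until the option ships, its request shape is unknown, so the sketch below models only the decision, not the API call. `DocumentProfile`, `choose_engine`, the thresholds, and the `"nmt"`/`"llm"` labels are all hypothetical and would need tuning against real quality and cost measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentProfile:
    word_count: int
    context_sensitive: bool  # e.g. legal, marketing, or narrative text


def choose_engine(
    doc: DocumentProfile,
    batch_size: int,
    long_form_words: int = 2_000,
    large_batch: int = 1_000,
) -> str:
    """Pick a translation engine label from document and batch traits."""
    # For very large batches, throughput and cost dominate.
    if batch_size >= large_batch:
        return "nmt"
    # Long or nuanced documents benefit most from added context.
    if doc.context_sensitive or doc.word_count >= long_form_words:
        return "llm"
    return "nmt"


print(choose_engine(DocumentProfile(5_000, False), batch_size=10))
print(choose_engine(DocumentProfile(300, False), batch_size=10))
```

Keeping the policy in one function means the thresholds can be revisited once pricing and quality data for the LLM option are available.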

Source: Microsoft Foundry Blog

Quick Tips & Tricks

Leverage Synchronous Translation for Real-Time Use Cases
For interactive workflows like customer support chat or in-app previews, use synchronous image translation to reduce latency and maintain user experience.
Batch Translation for Scaled Archives
Use asynchronous batch APIs with Azure Blob Storage containers when dealing with thousands of documents to harness parallel processing and failover resilience.
Preserve Structure Using XML Native Support
Avoid costly preprocessing by submitting XML DITA and XLIFF 2.0 files directly. The service keeps semantic tags intact, which makes integration with existing localization pipelines smoother.
Activate Image Text Translation Within Office Files
Remember to set "translateTextWithinImage": True in batch requests to fully localize images embedded in Word and PowerPoint files, delivering more comprehensive multilingual assets.
Plan for LLM Translations Based on Quality vs Cost
When LLM-powered translation becomes available, weigh each document's complexity: use LLMs for nuanced, context-rich documents, and fall back to NMT for high-volume, lower-cost needs.
Manage Costs by Reviewing Azure Translator Pricing
Image and LLM translations incur additional charges, so monitor resource usage against the pricing page. Separately, configure storage access with managed identities where possible to avoid distributing SAS tokens.

Conclusion

Microsoft’s latest Document Translation enhancements in Foundry Tools represent a major leap toward truly universal content localization. By seamlessly integrating Azure AI Vision and Azure Document Intelligence, and expanding support for diverse formats including images and structured XML, enterprises can translate more content effortlessly and accurately. The upcoming LLM-powered translation will further elevate contextual fidelity for business-critical documents.

These innovations reduce friction in multi-format localization workflows while maintaining enterprise-grade performance and scalability. For developers, integrating these features using Azure Translator's versatile synchronous and asynchronous APIs provides powerful options tailored to varied use cases—from real-time support to large-scale batch processing.

As language technologies continue evolving rapidly, Azure Translator’s roadmap signals a commitment to blending AI advancements and developer-friendly platforms, empowering organizations to connect across languages with precision and ease.

References

Expanding the Reach of Document Translation – New Capabilities Announced at Microsoft Build | Microsoft Foundry Blog — Official announcement with detailed feature overview and code samples.
Azure Translator Pricing — Overview of billing and costs associated with translation.
Azure AI Vision Documentation — Details on OCR and image analysis capabilities used in translation.
Azure AI Document Intelligence Overview — Explanation of document layout and content extraction services.
Microsoft Foundry Overview — Insights into the Foundry platform and tools enabling advanced AI deployments.
Document Translation API Reference — Technical documentation for REST API usage and examples.