Foundry Local Integration for Microsoft Agent Framework

1.0.0b260409 verified Thu Apr 16 auth: no python

agent-framework-foundry-local is a Python package that provides a Microsoft Agent Framework client for running large language models (LLMs) locally through Foundry Local. It lets developers build and orchestrate AI agents on local inference, with no cloud API keys required. The package is still in beta, while the broader Microsoft Agent Framework recently reached version 1.0.0, so expect frequent releases across the ecosystem.

pip install agent-framework-foundry-local --pre
error agent_framework.exceptions.ServiceResponseException: <class 'agent_framework_foundry_local._foundry_local_client.FoundryLocalClient'> service failed to complete the prompt: Error code: 500 - { 'type': 'https://tools.ietf.org/html/rfc9110#section-15.6.1', 'title': 'Failed to handle openAI completion', 'status': 500, 'detail': "JsonTypeInfo metadata for type 'System.Collections.Generic.List`1[Betalgo.Ranul.OpenAI.ObjectModels.RequestModels.MessageContent]' was not provided by TypeInfoResolver... Path: $.ContentCalculated." }
cause The Foundry Local service, which is .NET-based, may not correctly handle the multi-modal message content array format (e.g., `Message(contents=[...])`) sent by the Python Agent Framework, which results in a JSON serialization error on the server side.
fix
This is a known issue that likely stems from a mismatch between how the Python client and the .NET server serialize messages. Update both the agent-framework and agent-framework-foundry-local packages to their latest beta or stable releases. If the error persists, try simplifying the message content, or report the issue on the Microsoft Agent Framework GitHub repository.
error Service connection errors (Request to local service failed. Uri:http://127.0.0.1:0/foundry/list)
cause The Foundry Local service is either not running, or there are port binding issues preventing the Python client from connecting.
fix
Ensure the Foundry Local service is started by running `foundry service start` in your terminal. If it is already running, try `foundry service restart` to resolve potential port conflicts or service accessibility problems.
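The URI in the error above ends in port 0, which is not a usable destination port. One plausible reading is that the client never learned which port the service is listening on. The following is a minimal sketch, using only the Python standard library, that pulls the port out of such a message so a script can flag this case. The names `port_from_error` and `endpoint_unresolved` are hypothetical helpers, not part of any package.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def port_from_error(message: str) -> Optional[int]:
    """Return the port of the first http(s) URI found in an error message, or None."""
    match = re.search(r"https?://[^\s)]+", message)
    if match is None:
        return None
    return urlparse(match.group(0)).port


def endpoint_unresolved(message: str) -> bool:
    """True when the URI in the message carries the placeholder port 0."""
    return port_from_error(message) == 0


error = "Request to local service failed. Uri:http://127.0.0.1:0/foundry/list"
if endpoint_unresolved(error):
    print("Service endpoint looks unresolved; start or restart the Foundry Local service.")
```

When the check fires, the `foundry service start` and `foundry service restart` steps described above are the natural next thing to try.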
error Slow inference or unexpected CPU-only model usage despite GPU availability.
cause The selected local model might be a CPU-only variant, or Foundry Local is not correctly detecting/utilizing the available GPU hardware. Large models can also naturally be slow on less powerful hardware.
fix
Use `foundry model list` to check for GPU-optimized model variants (e.g., `phi-4-mini-instruct-cuda-gpu`). Ensure GPU drivers are up to date. Stop other AI inference sessions (e.g., AI Toolkit for VS Code) that might be occupying GPU resources. Consider more aggressively quantized model variants (e.g., INT8) for better performance.
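As a rough illustration of picking a GPU variant, the sketch below filters a list of model IDs by name. It assumes that the device is encoded as a trailing `-gpu` in the ID, as the `phi-4-mini-instruct-cuda-gpu` example suggests. That naming convention is an assumption, and both the `gpu_variants` helper and the sample IDs are hypothetical. Take real IDs from `foundry model list`.

```python
from typing import Iterable, List


def gpu_variants(model_ids: Iterable[str]) -> List[str]:
    """Keep only IDs whose name ends in '-gpu' (assumed naming convention)."""
    return [m for m in model_ids if m.lower().endswith("-gpu")]


# Hypothetical IDs for illustration only; copy real ones from `foundry model list`.
candidates = [
    "phi-4-mini-instruct-cuda-gpu",
    "phi-4-mini-instruct-generic-cpu",
]
print(gpu_variants(candidates))
```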
breaking The broader Microsoft Agent Framework (which `agent-framework-foundry-local` integrates with) underwent significant architectural changes in its 1.0.0 release. Specifically, the `Message` constructor's `text` parameter was removed in favor of `contents=[...]`, and provider designs shifted. Existing code relying on older patterns of the `agent-framework` with `FoundryLocalClient` may break and require migration.
fix Update `agent-framework` to 1.0.0 or later and revise message construction to use `Message(contents=[...])`. Consult the official migration guide for Agent Framework 1.0.0.
gotcha This package requires the separate installation and active running of the Foundry Local CLI and its runtime components. `pip install` only installs the Python client library, not the local LLM server.
fix Before running Python code, install Foundry Local via `winget install Microsoft.FoundryLocal` (Windows) or `brew tap microsoft/foundrylocal && brew install foundrylocal` (macOS), then start the service with `foundry service start`.
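Because `pip install` does not provide the runtime, a script may want to check for the CLI before constructing a client. The following is a minimal sketch using only the standard library. The helper names `find_foundry_cli` and `start_service_if_installed` are hypothetical, and the only CLI command used is `foundry service start` from the fix above.

```python
import shutil
import subprocess
from typing import Callable, Optional


def find_foundry_cli(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path to the `foundry` executable, or None if it is not on PATH."""
    return which("foundry")


def start_service_if_installed() -> bool:
    """Run `foundry service start` when the CLI exists; True if it exited with code 0."""
    cli = find_foundry_cli()
    if cli is None:
        return False
    completed = subprocess.run([cli, "service", "start"], check=False)
    return completed.returncode == 0


if find_foundry_cli() is None:
    print("Foundry Local CLI not found on PATH; install it before using the Python client.")
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use, call `start_service_if_installed()` once at startup and stop early if it returns `False`.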
gotcha The first time a model is used with Foundry Local, it needs to be downloaded, which can take a significant amount of time and consume considerable disk space depending on the model size.
fix Plan for initial model download time. Use `foundry model list` to see available models and `foundry model run <model_name>` or `foundry model download <model_name>` to pre-download models if desired.
gotcha Function/tool calling and structured output capabilities of the agent are dependent on the specific local model chosen and its support within Foundry Local. Not all local models support these advanced features.
fix Consult Foundry Local documentation for specific models' capabilities. The `FoundryLocalClient.manager` helper can be used to inspect the local catalog and supported features.
gotcha Foundry Local is designed for on-device inference and local development. It is not intended for distributed, containerized, or multi-machine production deployments.
fix For production deployments requiring scalability and managed services, consider using `agent-framework-foundry` with Azure AI Foundry's hosted services rather than `agent-framework-foundry-local`.
winget install Microsoft.FoundryLocal (Windows) OR brew tap microsoft/foundrylocal && brew install foundrylocal (macOS)

This quickstart demonstrates how to create and run a simple AI agent using FoundryLocalClient to connect to a locally hosted LLM via the Foundry Local runtime. Ensure the Foundry Local CLI is installed, the service is running (`foundry service start`), and the specified model (e.g., `phi-4-mini`) is available locally.

import asyncio
from agent_framework import Agent
from agent_framework.foundry import FoundryLocalClient

async def main():
    # Prerequisites: the Foundry Local CLI is installed and 'foundry service start' has been run.
    # The model (e.g., phi-4-mini, qwen2.5) can be passed explicitly, as below, or set through
    # the FOUNDRY_LOCAL_MODEL environment variable.

    # 'phi-4-mini' must be available locally (downloaded via the Foundry Local CLI).
    client = FoundryLocalClient(model='phi-4-mini')
    agent = Agent(
        client=client,
        name='LocalAssistant',
        instructions='You are a helpful local assistant that runs entirely on my machine.'
    )

    print("Local agent created. Asking a question...")
    result = await agent.run("Tell me a fun fact about Python.")
    print(f"Agent response: {result}")

if __name__ == '__main__':
    asyncio.run(main())
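
The quickstart comments mention the `FOUNDRY_LOCAL_MODEL` environment variable as an alternative to a hard-coded model name. The following is a minimal sketch that reads the variable explicitly and falls back to a default. The `resolve_model` helper is hypothetical, and the sketch makes no assumption about whether the client also reads the variable on its own.

```python
import os


def resolve_model(default: str = "phi-4-mini") -> str:
    """Prefer FOUNDRY_LOCAL_MODEL when set and non-empty, else fall back to default."""
    return os.environ.get("FOUNDRY_LOCAL_MODEL", "").strip() or default


model_name = resolve_model()
print(f"Using local model: {model_name}")
```

The resulting string could then be passed as `model=model_name` where the quickstart passes `model='phi-4-mini'`.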