Building AI Teams: How Docker Sandboxes and Docker Agent Transform Development https://www.docker.com/blog/building-ai-teams-docker-sandboxes-agent/ Wed, 11 Mar 2026 13:00:00 +0000 It’s 11 PM. You’ve got a JIRA ticket open, an IDE with three unsaved files, a browser tab on Stack Overflow, and another on documentation. You’re context-switching between designing UI, writing backend APIs, fixing bugs, and running tests. You’re wearing all the hats: product manager, designer, engineer, QA specialist. It’s exhausting.

What if instead of doing it all yourself, you could describe the goal and have a team of specialized AI agents handle it for you?

One agent breaks down requirements, another designs the interface, a third builds the backend, a fourth tests it, and a fifth fixes any issues. Each agent focuses on what it does best, working together autonomously while you sip your coffee. That’s not sci-fi; it’s what Docker Agent + Docker Sandboxes delivers today.


What is Docker Agent?

Docker Agent is an open source tool for building teams of specialized AI agents. Instead of prompting one general-purpose model to do everything, you define agents with specific roles that collaborate to solve complex problems.

Here’s a typical dev-team configuration:

agents:
 root:
   model: openai/gpt-5
   description: Product Manager - Leads the development team and coordinates iterations
   instruction: |
     Break user requirements into small iterations. Coordinate designer → frontend → QA.
     - Define feature and acceptance criteria
     - Ensure iterations deliver complete, testable features
     - Prioritize based on value and dependencies
   sub_agents: [designer, awesome_engineer, qa, fixer_engineer]
   toolsets:
     - type: filesystem
     - type: think
     - type: todo
     - type: memory
       path: dev_memory.db
​
 designer:
   model: openai/gpt-5
   description: UI/UX Designer - Creates user interface designs and wireframes
   instruction: |
     Create wireframes and mockups for features. Ensure responsive, accessible designs.
     - Use consistent patterns and modern principles
     - Specify colors, fonts, interactions, and mobile layout
   toolsets:
     - type: filesystem
     - type: think
     - type: memory
       path: dev_memory.db
       
 qa:
   model: openai/gpt-5
   description: QA Specialist - Analyzes errors, stack traces, and code to identify bugs
   instruction: |
     Analyze error logs, stack traces, and code to find bugs. Explain what's wrong and why it's happening.
     - Review test results, error messages, and stack traces
   .......
​
 awesome_engineer:
   model: openai/gpt-5
   description: Awesome Engineer - Implements user interfaces based on designs
   instruction: |
     Implement responsive, accessible UI from designs. Build backend APIs and integrate.
   ..........

 fixer_engineer:
   model: openai/gpt-5
   description: Test Integration Engineer - Fixes test failures and integration issues
   instruction: |
     Fix test failures and integration issues reported by QA.
     - Review bug reports from QA

The root agent acts as product manager, coordinating the team. When a user requests a feature, root delegates to designer for wireframes, then awesome_engineer for implementation, qa for testing, and fixer_engineer for bug fixes. Each agent uses its own model, has its own context, and accesses tools like filesystem, shell, memory, and MCP servers.

Agent Configuration

Each agent is defined with five key attributes:

  • model: The AI model to use (e.g., openai/gpt-5, anthropic/claude-sonnet-4-5). Different agents can use different models optimized for their tasks.
  • description: A concise summary of the agent’s role. This helps Docker Agent understand when to delegate tasks to this agent.
  • instruction: Detailed guidance on what the agent should do. Includes workflows, constraints, and domain-specific knowledge.
  • sub_agents: A list of agents this agent can delegate work to. This creates the team hierarchy.
  • toolsets: The tools available to the agent. Built-in options include filesystem (read/write files), shell (run commands), think (reasoning), todo (task tracking), memory (persistent storage), and mcp (external tool connections).

This configuration system gives you fine-grained control over each agent’s capabilities and how they coordinate with each other.
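To make those five attributes concrete, here’s a minimal, self-contained leaf agent. The agent name, wording, and comments are illustrative additions, not part of the team configuration above:

```yaml
agents:
  release_notes_writer:            # hypothetical agent name
    model: openai/gpt-5            # any provider/model pair works here
    description: Technical Writer - Summarizes shipped changes into release notes
    instruction: |
      Read the git log and changed files, then draft concise release notes.
      - Group changes by feature, fix, and chore
      - Keep entries to one sentence each
    sub_agents: []                 # leaf agent; delegates to no one
    toolsets:
      - type: filesystem           # read the repo
      - type: shell                # run git commands
      - type: think                # scratch-pad reasoning
```

Because it lists no sub_agents, this agent only does its own work; adding it to another agent’s sub_agents list is what wires it into the hierarchy.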

Why Agent Teams Matter

One agent handling complex work means constant context-switching. Split the work across focused agents instead, so each handles what it’s best at. Docker Agent manages the coordination.

The benefits are clear:

  • Specialization: Each agent is optimized for its role (design vs. coding vs. debugging)
  • Parallel execution: Multiple agents can work on different aspects simultaneously
  • Better outcomes: Focused agents produce higher quality work in their domain
  • Maintainability: Clear separation of concerns makes teams easier to debug and iterate

The Problem: Running AI Agents Safely

Agent teams are powerful, but they come with a serious security concern. These agents need to:

  • Read and write files on your system
  • Execute shell commands (npm install, git commit, etc.)
  • Access external APIs and tools
  • Run potentially untrusted code

Giving AI agents full access to your development machine is risky. A misconfigured agent could delete files, leak secrets, or run malicious commands. You need isolation: agents should be powerful but contained.

Traditional virtual machines are too heavy. Chroot jails are fragile. You need something that provides:

  • Strong isolation from your host machine
  • Workspace access so agents can read your project files
  • Familiar experience with the same paths and tools
  • Easy setup without complex networking or configuration

Docker Sandboxes: The Secure Foundation

Docker Sandboxes solves this by providing isolated environments for running AI agents. As of Docker Desktop 4.60+, sandboxes run inside dedicated microVMs, providing a hard security boundary beyond traditional container isolation. When you run docker sandbox run <agent>, Docker creates an isolated microVM workspace that:

  • Mounts your project directory at the same absolute path (on Linux and macOS)
  • Preserves your Git configuration for proper commit attribution
  • Does not inherit environment variables from your current shell session
  • Gives agents full autonomy without compromising your host
  • Provides network isolation with configurable allow/deny lists

Docker Sandboxes now natively supports six agent types: Claude Code, Gemini, Codex, Copilot, Agent, and Kiro (all experimental). Agent can be launched directly as a sandbox agent:

# Run Agent natively in a sandbox
docker sandbox create agent ~/path/to/workspace
docker sandbox run agent ~/path/to/workspace

Or, for more control, use a detached sandbox:

# Create a sandbox
docker sandbox run -d --name my-agent-sandbox claude

# Copy agent into the sandbox
docker cp /usr/bin/agent <container-id>:/usr/bin/agent

# Run your agent team
docker exec -it <container-id> bash -c "cd /path/to/workspace && agent run dev-team.yaml"

Your workspace /Users/alice/projects/myapp on the host is also /Users/alice/projects/myapp inside the microVM. Error messages, scripts with hard-coded paths, and relative imports all work as expected. But the agent is contained in its own microVM: it can’t access files outside the mounted workspace, and any damage it causes is limited to the sandbox.


Why Docker Sandboxes Matter

The combination of agents and Docker Sandboxes gives you something powerful:

  • Full agent autonomy: Agents can install packages, run tests, make commits, and use tools without constant human oversight
  • Complete safety: Even if an agent makes a mistake, it’s contained within the microVM sandbox
  • Hard security boundary: MicroVM isolation goes beyond containers; each sandbox runs in its own virtual machine
  • Network control: Allow/deny lists let you restrict which external services agents can access
  • Familiar experience: Same paths, same tools, same workflow as working directly on your machine
  • Workspace persistence: Changes sync between host and microVM, so your work is always available

Here’s how the workflow looks in practice:

  1. User requests a feature to the root agent: “Create a bank app with Gradio”
  2. Root creates a todo list and delegates to the designer
  3. Designer generates wireframes and UI specifications
  4. Awesome_engineer implements the code, running pip install gradio and python app/main.py
  5. QA runs tests, finds bugs, and reports them
  6. Fixer_engineer resolves the issues
  7. Root confirms all tests pass and marks the feature complete

All of this happens autonomously inside a sandboxed environment. The agents can install dependencies, modify files, and execute commands, but they’re isolated from your host machine.


Try It Yourself

Let’s walk through setting up a simple agent team in a Docker Sandbox.

Prerequisites

  • Docker Desktop 4.60+ with sandbox support (microVM-based isolation)
  • agent (included in Docker Desktop 4.49+)
  • API key for your model provider (Anthropic, OpenAI, or Google)

Step 1: Create Your Agent Team

Save this configuration as dev-team.yaml:

models:
 openai:
   provider: openai
   model: gpt-5
​
agents:
 root:
   model: openai
   description: Product Manager - Leads the development team
   instruction: |
     Break user requirements into small iterations. Coordinate designer → frontend → QA.
   sub_agents: [designer, awesome_engineer, qa]
   toolsets:
     - type: filesystem
     - type: think
     - type: todo
​
 designer:
   model: openai
   description: UI/UX Designer - Creates designs and wireframes
   instruction: |
     Create wireframes and mockups for features. Ensure responsive designs.
   toolsets:
     - type: filesystem
     - type: think
​
 awesome_engineer:
   model: openai
   description: Developer - Implements features
   instruction: |
     Build features based on designs. Write clean, tested code.
   toolsets:
     - type: filesystem
     - type: shell
     - type: think
​
 qa:
   model: openai
   description: QA Specialist - Tests and identifies bugs
   instruction: |
     Test features and identify bugs. Report issues to fixer.
   toolsets:
     - type: filesystem
     - type: think

Step 2: Create a Docker Sandbox

The simplest approach is to use agent as a native sandbox agent:

# Run agent directly in a sandbox (experimental)
docker sandbox run agent ~/path/to/your/workspace

Alternatively, use a detached Claude sandbox for more control:

# Start a detached sandbox
docker sandbox run -d --name my-dev-sandbox claude
​
# Copy agent into the sandbox
which agent  # Find the path on your host
docker cp $(which agent) $(docker sandbox ls --filter name=my-dev-sandbox -q):/usr/bin/agent

Step 3: Set Environment Variables

# Run agent with your API key (passed inline since export doesn't persist across exec calls)
docker exec -it -e OPENAI_API_KEY=your_key_here my-dev-sandbox bash
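The inline -e flag matters because each docker exec, like any separate shell invocation, starts a fresh process, so an export from one call never reaches the next. A Docker-free sketch of the same behavior, using a made-up variable name:

```shell
# Each `bash -c` below is its own process, just like each `docker exec`.
bash -c 'export MY_FAKE_KEY=abc123'       # the variable dies with this shell
bash -c 'echo "${MY_FAKE_KEY:-unset}"'    # a fresh shell never saw it; prints: unset
```

Passing the key with -e on every exec (or baking it into the sandbox at creation time) avoids this trap.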

Step 4: Run Your Agent Team

# Mount your workspace and run agent
docker exec -it my-dev-sandbox bash -c "cd /path/to/your/workspace && agent run dev-team.yaml"

Now you can describe what you want to build, and your agent team will handle the rest:

User: Create a bank application using Python. The bank app should have basic functionality like account savings, show balance, withdraw, add money, etc. Build the UI using Gradio. Create a directory called app, and inside of it, create all of the files needed by the project
​
Agent (root): I'll break this down into iterations and coordinate with the team...

Watch as the designer creates wireframes, the engineer builds the Gradio app, and QA tests it, all autonomously in a secure sandbox.

Final result from a one-shot prompt:


Step 5: Clean Up

When you’re done:

# Remove the sandbox
docker sandbox rm my-dev-sandbox

Docker enforces one sandbox per workspace. Running docker sandbox run in the same directory reuses the existing container. To change configuration, remove and recreate the sandbox.

Current Limitations

Docker Sandboxes and Docker Agent are evolving rapidly. Here are a few things to know:

  • Docker Sandboxes now supports six agent types natively: Claude Code, Gemini, Codex, Copilot, Agent, and Kiro. All are experimental, and breaking changes may occur between Docker Desktop versions.
  • A Custom Shell option is also available that doesn’t include a pre-installed agent binary; instead, it provides a clean environment where you can install and configure any agent or tool
  • MicroVM sandboxes require macOS or Windows. Linux users can use legacy container-based sandboxes with Docker Desktop 4.57+
  • API keys may still need manual configuration depending on the agent type
  • Sandbox templates are optimized for certain workflows; custom setups may require additional configuration

Why This Matters Now

AI agents are becoming more capable, but they need infrastructure to run safely and effectively. The combination of Docker Agent and Docker Sandboxes addresses this:

| Feature | Traditional Approach | With Docker Agent + Docker Sandboxes |
|---|---|---|
| Autonomy | Limited – requires constant oversight | High – agents work independently |
| Security | Risky – agents have host access | Isolated – agents run in microVMs |
| Specialization | One model does everything | Multiple agents with focused roles |
| Reproducibility | Inconsistent across machines | MicroVM-isolated, version-controlled |
| Scalability | Manual coordination | Automated team orchestration |

This isn’t just about convenience; it’s about enabling AI agents to do real work in production environments, with the safety guarantees that developers expect.

Conclusion

We’re moving from “prompting AI to write code” to “orchestrating AI teams to build software.” Docker Agent gives you the team structure; Docker Sandboxes provides the secure foundation.

The days of wearing every hat as a solo developer are numbered. With specialized AI agents working in isolated sandboxes, you can focus on what matters: designing great software. Your AI team handles the implementation, testing, and iteration.

Try it out. Build your own agent team. Run it in a Docker Sandbox. See what happens when you have a development team at your fingertips, ready to ship features while you grab lunch.

What’s Holding Back AI Agents? It’s Still Security https://www.docker.com/blog/whats-holding-back-ai-agents-its-still-security/ Tue, 10 Mar 2026 12:59:28 +0000 It’s hard to find a team today that isn’t talking about agents. For most organizations, this isn’t a “someday” project anymore. Building agents is a strategic priority for 95% of the 800+ developers and decision makers we surveyed across the globe in our latest State of Agentic AI research. The shift is happening fast: agent adoption has moved beyond experiments and demos into something closer to early operational maturity. 60% of organizations already report having AI agents in production, though a third of those remain in early stages.

Agent adoption today is driven by a pragmatic focus on productivity, efficiency, and operational transformation, not revenue growth or cost reduction. Early adoption is concentrated in internal, productivity-focused use cases, especially across software, infrastructure, and operations. The feedback loops are fast, and the risks are easier to control. 


So what’s holding back agent scaling? Friction shows up everywhere, and nearly all roads lead to the same place: AI agent security.

AI agent security isn’t one issue; it’s the constraint

When teams talk about what’s holding them back, AI agent security rises to the top. In the same survey, 40% of respondents cite security as their top blocker when building agents. The reason it hits so hard is that it’s not confined to a single layer of the stack. It shows up everywhere, and it compounds as deployments grow.

Start with infrastructure: as organizations expand agent deployments, teams emphasize the need for secure sandboxing and runtime isolation, even for internal agents.

At the operations layer, complexity becomes a security problem. Once you have more tools, more integrations, and more orchestration logic, it gets harder to see what’s happening end-to-end and harder to control it. Our latest research data reflects that sprawl: over a third of respondents report challenges coordinating multiple tools, and a comparable share say integrations introduce security or compliance risk. That’s a classic pattern: operational complexity creates blind spots, and blind spots become exposure.


And at the governance layer, enterprises want something simple: consistency. They want guardrails, policy enforcement, and auditability that work across teams and workflows. But current tooling isn’t meeting that bar yet. In fact, 45% of organizations say the biggest challenge is ensuring tools are secure, trusted, and enterprise-ready. That’s not a minor complaint: it’s the difference between “we can try this” and “we can scale this.”

MCP is popular but not ready for enterprise

Many teams are adopting Model Context Protocol (MCP) because it gives agents a standardized way to connect to tools, data, and external systems, making agents more useful and customized. Among respondents further along in their agent journey, 85% say they’re familiar with MCP and two-thirds say they actively use it across personal and professional projects.

Research data suggests that most teams are operating in what could be described as “leap-of-faith mode” when it comes to MCP, adopting the protocol without the security guarantees and operational controls they would demand from mature enterprise infrastructure.

But the security story hasn’t caught up yet. Teams adopt MCP because it works, not because it has earned enterprise-grade trust. Among teams earlier in their agentic journey, 46% identify security and compliance as the top challenge with MCP.

Organizations are increasingly watching for threats like prompt injection and tool poisoning, along with the more foundational issues of access control, credentials, and authentication. The immaturity and security challenges of current MCP tooling make for a fragile foundation at this stage of agentic adoption.

Conclusion and recommendations

AI agent security is what sets the speed limit for agentic AI in the enterprise. Organizations aren’t lacking interest; they’re lacking confidence that today’s tooling is enterprise-ready, that access controls can be enforced reliably, and that agents can be kept safely isolated from sensitive systems.

The path forward is clear. Unlocking agents’ full potential will require new platforms built for enterprise scale, with secure-by-default foundations, strong governance, and policy enforcement that’s integrated, not bolted on.

Download the full Agentic AI report for more insights and recommendations on how to scale agents for enterprise. 

Join us on March 25, 2026, for a webinar where we’ll walk through the key findings and the strategies that can help you prioritize what comes next.

Celebrating Women in AI: 3 Questions with Cecilia Liu on Leading Docker’s MCP Strategy https://www.docker.com/blog/women-in-ai-cecilia-liu-docker-mcp-strategy/ Fri, 06 Mar 2026 12:59:30 +0000 To celebrate International Women’s Day, we sat down with Cecilia Liu, Senior Product Manager at Docker, for three questions about the vision and strategy behind Docker’s MCP solutions. From shaping product direction to driving AI innovation, Cecilia plays a key role in defining how Docker enables secure, scalable AI tooling.


Cecilia leads product management for Docker’s MCP Catalog and Toolkit, our solution for running MCP servers securely and at scale through containerization. She drives Docker’s AI strategy across both enterprise and developer ecosystems, helping organizations deploy MCP infrastructure with confidence while empowering individual developers to seamlessly discover, integrate, and use MCP in their workflows. With a technical background in AI frameworks and an MBA from NYU Stern, Cecilia bridges the worlds of AI infrastructure and developer tools, turning complex challenges into practical, developer-first solutions.

What products are you responsible for?

I own Docker’s MCP solution. At its core, it’s about solving the problems that anyone working with MCP runs into: how do you find the right MCP servers, how do you actually use them without a steep learning curve, and how do you deploy and manage them reliably across a team or organization.

How does Docker’s MCP solution benefit developers and enterprise customers?

Dev productivity is where my heart is. I want to build something that meaningfully helps developers at every stage of their cycle — and that’s exactly how I think about Docker’s MCP solution.

For end-user developers and vibe coders, the goal is simple: you shouldn’t need to understand the underlying infrastructure to get value from MCP. As long as you’re working with AI, we make it easy to discover, configure, and start using MCP servers without any of the usual setup headaches. One thing I kept hearing in user feedback was that people couldn’t even tell if their setup was actually working. That pushed us to ship in-product setup instructions that walk you through not just configuration, but how to verify everything is running correctly. It sounds small, but it made a real difference.

For developers building MCP servers and integrating them into agents, I’m focused on giving them the right creation and testing tools so they can ship faster and with more confidence. That’s a big part of where we’re headed.

And for security and enterprise admins, we’re solving real deployment pain, making it faster and cheaper to roll out and manage MCP across an entire organization. Custom catalogs, role-based access controls, audit logging, policy enforcement. The goal is to give teams the visibility and control they need to adopt AI tooling confidently at scale.

Customers love us for all of the above, and there’s one more thing that ties it together: the security that comes built-in with Docker. That trust doesn’t happen overnight, and it’s something we take seriously across everything we ship.

What are you excited about when it comes to the future of MCP?

What excites me most is honestly the pace of change itself. The AI landscape is shifting constantly, and with every new tool that makes AI more powerful, there’s a whole new set of developers who need a way to actually use it productively. That’s a massive opportunity.

MCP is where that’s happening right now, and the adoption we’re seeing tells me the need is real. But what gets me out of bed is knowing the problems we’re solving: discoverability, usability, deployment. They are all going to matter just as much for whatever comes next. We’re not just building for today’s tools. We’re building the foundation that developers will reach for every time something new emerges.

Cecilia is speaking about scaling MCP for enterprises at the MCP Dev Summit in NYC on April 3, 2026. If you’re attending, be sure to stop by Docker’s booth (D/P9).

Announcing Docker Hardened System Packages https://www.docker.com/blog/announcing-docker-hardened-system-packages/ Tue, 03 Mar 2026 20:30:00 +0000 Your Package Manager, Now with a Security Upgrade

Last December, we made Docker Hardened Images (DHI) free because we believe secure, minimal, production-ready images should be the default. Every developer deserves strong security at no cost. It should not be complicated or locked behind a paywall.

From the start, flexibility mattered just as much as security. Unlike opaque, proprietary hardened alternatives, DHI is built on trusted open source foundations like Alpine and Debian. That gives teams true multi-distro flexibility without forcing change. If you run Alpine, stay on Alpine. If Debian is your standard, keep it. DHI strengthens what you already use. It does not require you to replace it.

Today, we are extending that philosophy beyond images.

With Docker Hardened System Packages, we’re driving security deeper into the stack. Every package is built on the same secure supply chain foundation: source-built and patched by Docker, cryptographically attested, and backed by an SLA.

The best part? Multi-distro support by design.

The result is consistent, end-to-end hardening across environments with the production-grade reliability teams expect.

Since introducing DHI Community (our OSS tier), interest has surged. The DHI catalog has expanded from more than 1,000 to over 2,000 hardened container images. Its openness and ability to meet teams where they are have accelerated adoption across the ecosystem. Companies of all sizes, along with a growing number of open source projects, are making DHI their standard for secure containers.

Just consider this short selection of examples:
  • n8n.io has moved its production infrastructure to DHI; they share why and how in this recent webinar
  • Medplum, an open-source electronic health records platform managing data for 20+ million patients, has standardized on DHI
  • Adobe uses DHI because it aligns closely with the company’s security posture and is compatible with its developer tooling
  • Attentive co-authored an e-book with Docker on helping teams move from POC to production with DHI

Docker Hardened System Packages: Going deeper into the container

From day one, Docker has built and secured the most critical operating system packages to deliver on our CVE remediation commitments. That’s how we continuously maintain near-zero CVEs in DHI images. At the same time, we recognize that many teams extend our minimal base images with additional upstream packages to meet their specific requirements. To support that reality, we are expanding our catalog with more than 8,000 hardened Alpine packages, with Debian coverage coming soon.

This expansion gives teams greater flexibility without weakening their security posture. You can start with a DHI base image and tailor it to your needs while maintaining the same hardened supply chain guarantees. There is no need to switch distros to get continuous patching, verified builds through a SLSA Build Level 3 pipeline, and enterprise-grade assurances. Your teams can continue working with the Alpine and Debian environments they know, now backed by Docker’s secure build system from base image to system package.
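In practice, starting from a DHI base image and layering on additional packages looks like any other Dockerfile. The image reference below is a hypothetical placeholder, not a real DHI repository path, and the sketch assumes a build/dev image variant that still ships a package manager and root user:

```dockerfile
# Hypothetical image path; real references come from your DHI catalog.
FROM example.org/dhi/alpine:3-dev

# Add upstream Alpine packages on top of the hardened base. With
# Hardened System Packages, these names resolve to Docker's
# source-built, attested builds rather than stock upstream binaries.
RUN apk add --no-cache curl ca-certificates
```

The point is that the workflow doesn’t change: same apk, same Dockerfile, but the supply chain guarantees now extend to the packages you add.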

Why this matters for your security posture:

Complete provenance chain. Every package is built from source by Docker, attested, and cryptographically signed. From base image to final container, your provenance stays intact.

Faster vulnerability remediation. When a vulnerability is identified, we patch it at the package level and publish it to the catalog. Not image by image. That means fixes move faster and remediation scales across your entire container fleet.

Extending the near-zero CVE guarantee. DHI images maintain near-zero CVEs. Hardened System Packages extend that guarantee more broadly across the software ecosystem, covering packages you add during customization.

Use hardened packages with your containers. DHI Enterprise customers get access to the secure packages repository, making it possible to use Hardened System Packages beyond DHI images. Integrate them into your own pipelines and across Alpine and Debian workloads throughout your environment.

The work we’re doing on our users’ behalf: Maintaining thousands of packages is continuous work. We monitor upstream projects, backport patches, test compatibility, rebuild when dependencies change, and generate attestations for every release. Alpine alone accounts for more than 8,000 packages today, soon approaching 10,000, with Debian next.

Making enterprise-grade security even more accessible

We’re also simplifying how teams access DHI. The full catalog of thousands of open-source images under Apache 2.0 now has a new name: DHI Community. There are no licensing changes; this is just a rename, so all of that free goodness now has an easy name to refer to.

For teams that need SLA-backed CVE remediation and customization capabilities at a more accessible price point, we’re announcing a new pricing tier today: DHI Select. This tier brings enterprise-grade security at a price of $5,000 per repo.

For organizations with more demanding requirements, including unlimited customizations, access to the Hardened System Packages repo, and extended lifecycle coverage for up to five years after upstream EOL, DHI Enterprise and the DHI Extended Lifecycle Support add-on remain available.

More options mean more teams can adopt the right level of security for where they are today.

Build with the standard that’s redefining container security

Docker’s momentum in securing the software supply chain is accelerating. We’re bringing security to more layers of the stack, making it easier for teams to build securely by default, for open source-based containers as well as your company’s internally-developed software. We’re also pushing toward a one-day (or shorter) timeline for critical CVE fixes. Each step builds on the last, moving us closer to end-to-end supply chain security for all of your critical applications.

Get started:

  • Join the n8n webinar to see how they’re running production workloads on DHI
  • Start your free trial and get access to the full DHI catalog, now with Docker Hardened System Packages

Docker Model Runner Brings vLLM to macOS with Apple Silicon https://www.docker.com/blog/docker-model-runner-vllm-metal-macos/ Thu, 26 Feb 2026 14:42:57 +0000 vLLM has quickly become the go-to inference engine for developers who need high-throughput LLM serving. We brought vLLM to Docker Model Runner for NVIDIA GPUs on Linux, then extended it to Windows via WSL2. macOS, however, has been the missing piece.

That changes today. Docker Model Runner now supports vllm-metal, a new backend that brings vLLM inference to macOS using Apple Silicon’s Metal GPU. If you have a Mac with an M-series chip, you can now run MLX models through vLLM with the same OpenAI-compatible API, the same Anthropic-compatible API for tools like Claude Code, and the same Docker workflow.

What is vllm-metal?

vllm-metal is a plugin for vLLM that brings high-performance LLM inference to Apple Silicon. Developed in collaboration between Docker and the vLLM project, it unifies MLX, Apple’s machine learning framework, and PyTorch under a single compute pathway, plugging directly into vLLM’s existing engine, scheduler, and OpenAI-compatible API server.

The architecture is layered: vLLM’s core (engine, scheduler, tokenizer, API) stays unchanged on top. A plugin layer consisting of MetalPlatform, MetalWorker, and MetalModelRunner handles the Apple Silicon specifics. Underneath, MLX drives the actual inference while PyTorch handles model loading and weight conversion. The whole stack runs on Metal, Apple’s GPU framework.

+-------------------------------------------------------------+
|                          vLLM Core                          |
|        Engine | Scheduler | API | Tokenizers                |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                   vllm_metal Plugin Layer                   |
|   +-----------+  +-----------+  +------------------------+  |
|   | Platform  |  | Worker    |  | ModelRunner            |  |
|   +-----------+  +-----------+  +------------------------+  |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                   Unified Compute Backend                   |
|   +------------------+    +----------------------------+    |
|   | MLX (Primary)    |    | PyTorch (Interop)          |    |
|   | - SDPA           |    | - HF Loading               |    |
|   | - RMSNorm        |    | - Weight Conversion        |    |
|   | - RoPE           |    | - Tensor Bridge            |    |
|   | - Cache Ops      |    |                            |    |
|   +------------------+    +----------------------------+    |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                       Metal GPU Layer                       |
|           Apple Silicon Unified Memory Architecture         |
+-------------------------------------------------------------+

Figure 1: High-level architecture diagram of vllm-metal. Credit: vllm-metal

What makes this particularly effective on Apple Silicon is unified memory. Unlike discrete GPUs where data must be copied between CPU and GPU memory, Apple Silicon shares a single memory pool. vllm-metal exploits this with zero-copy tensor operations. Combined with paged attention for efficient KV cache management and Grouped-Query Attention support, this means you can serve longer sequences with less memory waste.

vllm-metal runs MLX models published by the mlx-community on Hugging Face. These models are built specifically for the MLX framework and take full advantage of Metal GPU acceleration. Docker Model Runner automatically routes MLX models to vllm-metal when the backend is installed, falling back to the built-in MLX backend otherwise.
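The routing rule described above can be sketched as follows. This is a simplified illustration of the behavior, not Model Runner’s actual code; the function and format names are hypothetical:

```python
def choose_backend(model_format: str, vllm_metal_installed: bool) -> str:
    """Route a model to an inference backend, mirroring the behavior
    described in this post: MLX models go to vllm-metal when that backend
    is installed, otherwise to the built-in MLX backend; other formats
    (e.g. GGUF) keep using llama.cpp."""
    if model_format == "mlx":
        return "vllm-metal" if vllm_metal_installed else "mlx"
    return "llama.cpp"

print(choose_backend("mlx", True))   # vllm-metal
print(choose_backend("gguf", True))  # llama.cpp
```

The point of the fallback is that pulling an MLX model always works on a Mac, whether or not the vllm-metal backend has been installed.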

How vllm-metal works

vllm-metal runs natively on the host rather than in a container. This is necessary because Metal requires direct hardware access, and there is no GPU passthrough for Metal in containers.

When you install the backend, Docker Model Runner:

  1. Pulls a Docker image from Hub that contains a self-contained Python 3.12 environment with vllm-metal and all dependencies pre-packaged.
  2. Extracts it to `~/.docker/model-runner/vllm-metal/`.
  3. Verifies the installation by importing the `vllm_metal` module.

When a request comes in for a compatible model, the Docker Model Runner’s scheduler starts a vllm-metal server process that communicates over TCP, serving the standard OpenAI API. The model is loaded from Docker’s shared model store, which contains all the models you pull with `docker model pull`.
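Because the server speaks the standard OpenAI API, any OpenAI-compatible client can talk to it. As a rough sketch, a chat request looks like this (the endpoint path and model tag are assumptions for illustration; adjust them to your setup):

```python
import json

# Assumed default Docker Model Runner endpoint on the host
ENDPOINT = "http://localhost:12434/engines/v1/chat/completions"

def chat_payload(model: str, prompt: str) -> str:
    """Build a standard OpenAI-style chat completion request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

body = chat_payload("mlx-community/llama-3.2-1b-instruct-4bit", "Hello!")
```

Sending `body` to `ENDPOINT` with curl or any HTTP library returns the usual OpenAI-style completion response.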

Which models work with vllm-metal?

vllm-metal works with safetensors models in MLX format. The mlx-community on Hugging Face maintains a large collection of quantized models optimized for Apple Silicon, such as mlx-community/llama-3.2-1b-instruct-4bit, which is used in the benchmarks later in this post.

vLLM everywhere with Docker Model Runner

With vllm-metal, Docker Model Runner now supports vLLM across the three major platforms:

Platform         Backend      GPU
Linux            vllm         NVIDIA (CUDA)
Windows (WSL2)   vllm         NVIDIA (CUDA)
macOS            vllm-metal   Apple Silicon (Metal)

The same docker model commands work regardless of platform. Pull a model, run it. Docker Model Runner picks the right backend for your platform.

Get started

Update to Docker Desktop 4.62 or later for Mac, and install the backend:

docker model install-runner --backend vllm-metal

Check out the Docker Model Runner documentation to learn more. For contributions, feedback, and bug reports, visit the docker/model-runner repository on GitHub.

Giving Back: vllm-metal is Now Open Source

At Docker, we believe that the best way to accelerate AI development is to build in the open. That is why we are proud to announce that Docker has contributed the vllm-metal project to the vLLM community. Originally developed by Docker engineers to power Model Runner on macOS, this project now lives under the vLLM GitHub organization. This ensures that every developer in the ecosystem can benefit from and contribute to high-performance inference on Apple Silicon. The project has also received significant contributions from Lik Xun Yuan, Ricky Chen, and Ranran Haoran Zhang.

The $599 AI Development Rig

For a long time, high-throughput vLLM development was gated behind a significant GPU cost. To get started, you typically needed a dedicated Linux box with an RTX 4090 ($1,700+) or enterprise-grade A100/H100 cards ($10,000+).

vllm-metal changes the math

Now, a base $599 Mac Mini with an M4 chip becomes a viable vLLM development environment. Because Apple Silicon uses Unified Memory, that 16GB (or upgraded 32GB/64GB) of RAM is directly accessible by the GPU. This allows you to:

  • Develop & Test Locally: Build your vLLM-based applications on the same machine you use for coding.
  • Production-Mirroring: Use the exact same OpenAI-compatible API on your Mac Mini as you would on an H100 cluster in production.
  • Energy Efficiency: Run inference at a fraction of the power consumption (and heat) of a discrete GPU rig.

How does vllm-metal compare to llama.cpp?

We benchmarked both backends using Llama 3.2 1B Instruct with comparable 4-bit quantization, served through Docker Model Runner on Apple Silicon.

             llama.cpp                                  vLLM-Metal
Model        unsloth/Llama-3.2-1B-Instruct-GGUF:Q4_0    mlx-community/llama-3.2-1b-instruct-4bit
Format       GGUF (Q4_0)                                Safetensors (MLX 4-bit)

Throughput (tokens/sec, wall-clock)

max_tokens   llama.cpp   vLLM-Metal   speedup
128          333.3       251.5        1.3x
512          345.1       279.0        1.3x
1024         338.5       275.4        1.2x
2048         339.1       279.5        1.2x

Each configuration was run 3 times across 3 different prompts (9 total requests per data point).

Throughput is measured as completion_tokens / wall_clock_time, applied consistently to both backends.
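The metric comes down to simple arithmetic; a minimal sketch of the two calculations (illustrative only, not the benchmark script itself):

```python
def throughput(completion_tokens: int, wall_clock_seconds: float) -> float:
    """Tokens per second over the whole request, as used in the table above."""
    return completion_tokens / wall_clock_seconds

def speedup(baseline_tps: float, other_tps: float) -> float:
    """How many times faster one backend is than the other."""
    return baseline_tps / other_tps

print(throughput(1000, 4.0))              # 250.0 tok/s
print(round(speedup(333.3, 251.5), 2))    # 1.33 (the max_tokens=128 row)
```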

Key observations:

  • llama.cpp is consistently about 1.2-1.3x faster than vLLM-Metal across all output lengths.
  • llama.cpp throughput is remarkably stable (~333-345 tok/s regardless of max_tokens), while vLLM-Metal shows more variance between individual runs (134-343 tok/s).
  • Both backends scale well. Neither backend shows significant degradation as output length increases.
  • Quantization methods differ (GGUF Q4_0 vs MLX 4-bit), so this benchmarks the full stack, engine + quantization, rather than the engine alone.

The benchmark script used for these results is available as a GitHub Gist.

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. To get involved:

  • Star the repository: Show your support by starring the Docker Model Runner repo.
  • Contribute your ideas: Create an issue or submit a pull request. We’re excited to see what ideas you have!
  • Spread the word: Tell your friends and colleagues who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn More

]]>
Open WebUI + Docker Model Runner: Self-Hosted Models, Zero Configuration https://www.docker.com/blog/openwebui-docker-model-runner/ Wed, 25 Feb 2026 14:37:33 +0000 https://www.docker.com/?p=85364 We’re excited to share a seamless new integration between Docker Model Runner (DMR) and Open WebUI, bringing together two open source projects to make working with self-hosted models easier than ever.

With this update, Open WebUI automatically detects and connects to Docker Model Runner running at localhost:12434. If Docker Model Runner is enabled, Open WebUI uses it out of the box, no additional configuration required.

The result: a fully Docker-managed, self-hosted model experience running in minutes.

Note for Docker Desktop users:
If you are running Docker Model Runner via Docker Desktop, make sure TCP access is enabled. Open WebUI connects to Docker Model Runner over HTTP, which requires the TCP port to be exposed:

docker desktop enable model-runner --tcp

Better Together: Docker Model Runner and Open WebUI

Docker Model Runner and Open WebUI come from the same open source mindset. They’re built for developers who want control over where their models run and how their systems are put together, whether that’s on a laptop for quick experimentation or on a dedicated GPU host with more horsepower behind it.

Docker Model Runner focuses on the runtime layer: a Docker-native way to run and manage self-hosted models using the tooling developers already rely on. Open WebUI focuses on the experience: a clean, extensible interface that makes those models accessible and useful.

Now, the two connect automatically.

No manual endpoint configuration. No extra flags.

That’s the kind of integration open source does best: separate projects evolving independently, yet designed well enough to fit together naturally.

Zero-Config Setup

If Docker Model Runner is enabled, getting started with Open WebUI is as simple as:

docker run -p 3000:8080 openwebui/open-webui

That’s it.

Open WebUI will automatically connect to Docker Model Runner and begin using your self-hosted models, no environment variables, no manual endpoint configuration, no extra flags.
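Under the hood, “automatic detection” is essentially a reachability check against the default Model Runner address. A minimal sketch of the idea (an illustration, not Open WebUI’s actual implementation):

```python
import socket

def model_runner_reachable(host: str = "localhost", port: int = 12434,
                           timeout: float = 0.5) -> bool:
    """Return True if something (e.g. Docker Model Runner) is listening
    on host:port; a UI can then use that endpoint, or fall back to its
    other configured providers."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This is also why the Docker Desktop TCP flag mentioned earlier matters: if nothing is listening on port 12434, there is nothing to detect.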

Visit: http://localhost:3000 and create your account:

OpenWebUI blog image 1

And you’re ready to interact with your models through a modern web interface:

OpenWebUI blog image 2

Open by design

One of the nice things about this integration is that it didn’t require special coordination or proprietary hooks. Docker Model Runner and Open WebUI are both open source projects with clear boundaries and well-defined interfaces. They were built independently, and they still fit together cleanly.

Docker Model Runner focuses on running and managing models in a way that feels natural to anyone already using Docker.

Open WebUI focuses on making those models usable. It provides the interface layer, conversation management, and extensibility you’d expect from a modern web UI.

Because both projects are open, there’s no hidden contract between them. You can see how the connection works. You can modify it if you need to. You can deploy the pieces separately or together. The integration isn’t a black box, it’s just software speaking a clear interface.

Works with Your Setup

One of the practical benefits of this approach is flexibility.

Docker Model Runner doesn’t dictate where your models run. They might live on your laptop during development, on a more powerful remote machine, or inside a controlled internal environment. As long as Docker Model Runner is reachable, Open WebUI can connect to it.

That separation between runtime and interface is intentional. The UI doesn’t need to know how the model is provisioned. The runtime doesn’t need to know how the UI is presented. Each layer does its job.

With this integration, that boundary becomes almost invisible. Start the container, open your browser, and everything lines up.

You decide where the models run. Open WebUI simply meets them there.

Summary

Open WebUI and Docker Model Runner make self-hosted AI simple, flexible and fully under your control. Docker powers the runtime. Open WebUI delivers a modern interface on top. 

With automatic detection and zero configuration, you can go from enabling Docker Model Runner to interacting with your models in minutes.

Both projects are open source and built with clear boundaries, so you can run models wherever you choose and deploy the pieces together or separately. We can’t wait to see what you build next! 

How You Can Get Involved

The strength of Docker Model Runner lies in its community and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

  • Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.
  • Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!
  • Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn more

]]>
From the Captain’s Chair: Kristiyan Velkov https://www.docker.com/blog/from-the-captains-chair-kristiyan-velkov/ Tue, 24 Feb 2026 14:00:00 +0000 https://www.docker.com/?p=85484 Docker Captains are leaders from the developer community that are both experts in their field and are passionate about sharing their Docker knowledge with others. “From the Captain’s Chair” is a blog series where we get a closer look at one Captain to learn more about them and their experiences.

Today we are interviewing Kristiyan Velkov, a Docker Captain and Front-end Tech Lead with over a decade of hands-on experience in web development and DevOps.

Kristiyan builds applications with React, Next.js, Angular, and Vue.js, and designs modern front-end architectures. Over the years, Docker has become a core part of his daily work — used as a practical tool for building, testing, and deploying front-end applications in a predictable way. 

He focuses on production-ready Docker setups for front-end teams, including clean Dockerfiles, multi-stage builds, and CI/CD pipelines that work consistently across environments. His work is grounded in real projects and long-term maintenance, not theoretical examples.

Kristiyan is the author of four technical books, one of which is “Docker for Front-end Developers”. He actively contributes to open-source projects and is the person behind several official Docker guides, including guides for React.js, Node.js, Angular, Vue.js, and related front-end technologies.

Through writing, open source, speaking, and mentoring, he helps developers understand Docker better, explaining not just how things work, but why they are done a certain way.

As a Docker Captain, his goal is to help bridge the gap between front-end developers and DevOps teams.

image5

Can you share how you first got involved with Docker?

I first started using Docker because I was tired of making the excuse “it works on my machine”. We didn’t have many DevOps people, and the ones we had didn’t really know the front-end or how the application was supposed to behave. At the same time, I didn’t know Docker. That made communication difficult and problems hard to debug.

As a front-end developer, I initially thought Docker wasn’t something I needed to care about. It felt like a DevOps concern. But setting up projects and making sure they worked the same everywhere kept causing issues. Docker solved that problem and completely changed the way I work.

At first, Docker wasn’t easy to understand. But the more I used it, the more I saw how much simpler things became. My projects started running the same across environments, and that consistency saved time and reduced stress.

Over time, my curiosity grew and I went deeper — learning how to design well-structured, production-ready Dockerfiles, optimize build performance, and integrate Docker into CI/CD pipelines following clear, proven best practices, not just setups that work, but ones that are reliable and maintainable long term.

For me, Docker has never been about trends. I started using it to reduce friction between teams and avoid recurring problems, and it has since become a core part of my daily work.

What inspired you to become a Docker Captain?

What inspired me to become a Docker Captain was the desire to share the real struggles I faced as a front-end developer. When I first started using Docker, I wasn’t looking for recognition or titles; I was just trying to fix the problems that were slowing me down, and it was hard to explain to some DevOps engineers what should work and why when I didn’t know the DevOps terminology.

I clearly remember how exhausting it was to set up projects and how much time I wasted dealing with environment issues instead of real front-end work. Docker slowly changed the way I approached development and gave me a more reliable way to build and ship applications.

At some point, I realized I wasn’t the only one in this situation. Many front-end developers were avoiding Docker because they believed it was only meant for back-end or DevOps engineers. I wanted to change that perspective and show that Docker can be practical and approachable for front-end developers as well.

That’s also why I wrote the book Docker for Front-end Developers, where I explain Docker from a front-end perspective, using a real React.js application and walking through how to containerize and deploy it to AWS, with practical code examples and clear diagrams. The goal was to make Docker understandable and useful for people who build user-facing applications every day.

I also contributed official Docker guides for React.js, Angular, and Vue.js — not because I had all the answers, but because I remembered how difficult it felt when there was no clear guidance.

For me, becoming a Docker Captain was never about a title. It has always been about sharing what I’ve learned, building a bridge between front-end developers and containerization, and hopefully making someone else’s journey a little easier than mine.

What are some of your personal goals for the next year?

Over the next year, I want to continue writing books. Writing helps me structure my own knowledge, go deeper into the topics I work with, and hopefully make things clearer for other developers as well.

I also want to push myself to speak at more conferences. Public speaking doesn’t come naturally to me, but it’s a good way to grow, to share real, hands-on experience with a broader audience, and to meet amazing people.

I plan to keep contributing to open-source projects and maintaining the official Docker guides I’ve written for Angular, Vue.js, and React.js. People actively use these guides, so keeping them accurate and up to date is important to me. Alongside that, I’ll continue writing on my blog and newsletter, sharing practical insights from day-to-day work.

image2

If you weren’t working in tech, what would you be doing instead?

If I weren’t working in tech, I’d probably be a lawyer — I’m a law graduate. Studying law gave me a strong sense of discipline and a structured approach to problem-solving, which I still rely on today. Over time, though, I realized that technology gives me a different kind of fulfillment. It allows me to build things, create practical solutions, and share knowledge in a way that has a direct and visible impact on people. I don’t think anything else would give me the same satisfaction. In tech, I get to solve problems every day, write code, contribute to open-source projects, write books, and share what I’ve learned with the community. That mix of challenge, creativity, and real impact is hard to replace. Law could have been my profession, but technology is where I truly feel at home.

Can you share a memorable story from collaborating with the Docker community?

One of my most memorable experiences with the Docker community was publishing my open-source project frontend-prod-dockerfiles, which provides production-ready Dockerfiles for most of the popular front-end applications. I originally created it to solve a gap I kept seeing: front-end developers didn’t have a clear, reliable reference for well-structured and optimized Dockerfiles.

The response from the community was better than I expected. Developers from all over the world started using it, sharing feedback and suggesting ideas I hadn’t even considered.

That experience was a strong reminder of what makes the Docker community special — openness, collaboration, and a genuine willingness to help each other grow.

The Docker Captains Conference in Turkey (2025) was amazing. It was well organized, inspiring, and full of great energy. I met great people who share the same passion for Docker.

image8 1

What’s your favorite Docker product or feature right now, and why?

Right now, my favorite Docker features are Docker Offload and Docker Model Runner.

Offload is a game-changer because it lets me move heavy builds and GPU workloads to secure cloud resources directly from the same Docker CLI/Desktop flow I already use. I don’t have to change the way I work locally, but I get cloud-scale speed whenever I need it.

Model Runner lets me run open models locally in just minutes. And when I need more power, I can pair it with Offload to scale out to GPUs.

Can you walk us through a tricky technical challenge you solved recently?

A recent challenge I dealt with was reviewing Dockerfiles that had been generated with AI. A lot of developers were starting to use AI in our company, but I noticed some serious problems right away, images that were too large, broken caching, hardcoded environment variables, and containers running as root. It was a good reminder that while AI can help, we still need to carefully review and apply best practices when it comes to security and performance.

What’s one Docker tip you wish every developer knew?

One tip I wish every developer knew is that Docker is for everyone, not just DevOps or back-end developers. Front-end developers can benefit just as much by using Docker to create consistent environments, ship production-ready builds, and collaborate more smoothly with their teams. It’s not just infrastructure; it’s a productivity boost for the whole stack. I’ve also seen a rising number of tech jobs requiring this kind of basic knowledge, which is positive overall.

If you could containerize any non-technical object in real life, what would it be and why?

If I could containerize any non-technical object, it would be a happy day. I’d package a perfectly joyful day and redeploy it whenever I needed it: no wasted hours, no broken routines, just a consistent, repeatable “build” of happiness.

Where can people find you online?

On LinkedIn, x.com and also my website. I regularly write technical articles on Medium and share insights in my newsletter Front-end World. My open-source projects, including production-ready Dockerfiles for front-end frameworks, are available on GitHub.

Rapid Fire Questions

Cats or Dogs?

Both, I love animals.

Morning person or night owl?

Morning person for study, night owl for work.

Favorite comfort food?

Pasta.

One word friends would use to describe you?

Persistent

A hobby you picked up recently?

Hiking, I love nature

image3
]]>
Gordon (Beta): Docker’s AI Agent Just Got an Update https://www.docker.com/blog/gordon-dockers-ai-agent-just-got-an-update/ Mon, 23 Feb 2026 14:13:00 +0000 https://www.docker.com/?p=85321

AI agents are moving from demos to daily workflows. They write code, run commands, and complete multi-step tasks without constant hand-holding. But general-purpose agents don’t know Docker. They don’t understand your containers, your images, or your specific setup.

Gordon does. Just run docker ai in your terminal or try it in Docker Desktop.

Available today in Docker Desktop 4.61, still in beta, Gordon is an AI agent purpose-built for Docker. It has access to your shell, the Docker CLI, and your filesystem, plus deep knowledge of Docker best practices. Point it at a problem, approve its actions, and watch it work.

gordon

Figure 1: docker ai command launching Gordon in terminal interface

gordon 2

Figure 2: Gordon in Docker Desktop sidebar

Why Docker Needs Its Own Agent

When your container exits with code 137, Claude or ChatGPT will explain what OOM means. Gordon checks your container’s memory limit, inspects the logs, identifies the memory-hungry process, and proposes a fix. One approval, and it’s done.

When you need to containerize a Next.js app, Copilot might suggest a Dockerfile. Gordon examines your project structure, detects your dependencies, generates a production-ready Dockerfile with multi-stage builds, creates docker-compose.yml with the right services, and sets up your environment configs.

The difference is context and execution. Gordon knows what’s running on your machine. It can read your Docker state, access your filesystem, and take action. It’s not guessing – it’s working with your actual environment.

What Gordon Does

Debug and fix – Container won’t start. Service is unhealthy. Something is consuming all the memory. Gordon inspects logs, checks container status, identifies root cause, and proposes fixes. You approve, it executes.

Build and containerize – Take this application and make it run in Docker. Gordon examines your project, generates production-ready Dockerfiles with multi-stage builds, creates docker-compose.yml with the right services, handles environment configs and dependencies.

Execute and manage – Clean up disk space. Stop all containers. Pull and run specific images. Routine Docker operations should be conversational, not a trip to the docs.

Develop and optimize – Add health checks. Implement multi-stage builds. Apply security best practices. Reduce image sizes. Make existing Docker setups production-ready.

Gordon handles all of it.

gordon 3

Figure 3: Split screen showing Gordon debugging a mongodb container

How Gordon Works

Gordon is built on cagent, Docker’s agent framework included with Docker Desktop, and runs locally within Docker Desktop. It has access to:

  • Your shell – Can execute commands after approval
  • Your filesystem – Reads project structure, configs, logs
  • Docker CLI – Full access to Docker operations
  • Docker knowledge base – Documentation, best practices, common patterns

You can configure Gordon’s working directory to point to a specific codebase. This gives Gordon full context on your project structure, dependencies, and existing Docker setup.

The permission model is straightforward: Gordon shows you what it wants to do, you approve or reject, then it executes. Every command. Every file update. Every Docker operation. You’re not watching passively – you’re directing an agent that knows Docker inside and out.

gordon 4

Figure 4: Permissions request

Where to Find Gordon

Docker Desktop: Look for the Gordon icon in the left sidebar

CLI: Run docker ai from your terminal

Get started today

  1. Download Docker Desktop 4.61+
  2. Log in with your Docker account
  3. Click the Gordon icon, select a project directory, and ask “Optimize my Dockerfile”
  4. Explore the full documentation in Docker Docs

Gordon is available now in Docker Desktop 4.61 and later

]]>
Run OpenClaw Securely in Docker Sandboxes https://www.docker.com/blog/run-openclaw-securely-in-docker-sandboxes/ Mon, 23 Feb 2026 14:00:00 +0000 https://www.docker.com/?p=85410 Docker Sandboxes is a new primitive in Docker’s ecosystem that lets you run AI agents, or any other workloads, in isolated micro VMs. It provides strong isolation, a convenient developer experience, and a strong security boundary: the network proxy can be configured to deny agents connections to arbitrary internet hosts. The proxy can also inject API keys, like your ANTHROPIC_API_KEY or OPENAI_API_KEY, on the way out, so the agent never has access to them and cannot leak them.

In a previous article I showed how Docker Sandboxes lets you install any tools an AI agent might need, like a JDK for Java projects or some custom CLIs, into a container that’s isolated from the host. Today we’re going a step further: we’ll run OpenClaw, an open-source AI coding agent, on a local model via Docker Model Runner.

No API keys, no cloud costs, fully private. And you can do it in 2-ish commands.

Quick Start

Make sure you have Docker Desktop and that Docker Model Runner is enabled (Settings → Docker Model Runner → Enable), then pull a model:

docker model pull ai/gpt-oss:20B-UD-Q4_K_XL

Now create and run the sandbox:

docker sandbox create --name openclaw -t olegselajev241/openclaw-dmr:latest shell .
docker sandbox network proxy openclaw --allow-host localhost
docker sandbox run openclaw

Inside the sandbox:

~/start-openclaw.sh
Running OpenClaw inside a Docker Sandbox

And that’s it. You’re in OpenClaw’s terminal UI, talking to a local gpt-oss model on your machine. The model runs in Docker Model Runner on your host, and OpenClaw runs completely isolated in the sandbox: it can only read and write files in the workspace you give it, and there’s a network proxy to deny connections to unwanted hosts. 

Cloud models work too

The sandbox proxy will automatically inject API keys from your host environment. If you have ANTHROPIC_API_KEY or OPENAI_API_KEY set, OpenClaw can run cloud models; just select them in OpenClaw’s settings. The proxy takes care of credential injection, so your keys are never exposed inside the sandbox.

This means you can use free local models for experimentation, then switch to cloud models for serious work, all in the same sandbox. With cloud models you don’t even need to allow the proxy to reach the host’s localhost, so you can skip docker sandbox network proxy openclaw --allow-host localhost.

Choose Your Model

The startup script automatically discovers models available in your Docker Model Runner. List them:

~/start-openclaw.sh list

Use a specific model:

~/start-openclaw.sh ai/qwen2.5:7B-Q4_K_M

Any model you’ve pulled with docker model pull is available.

How it works (a bit technical)

The pre-built image (olegselajev241/openclaw-dmr:latest) is based on the shell sandbox template with three additions: Node.js 22, OpenClaw, and a tiny networking bridge.

The bridge is needed because Docker Model Runner runs on your host and binds to localhost:12434. But localhost inside the sandbox means the sandbox itself, not your host. The sandbox does have an HTTP proxy, at host.docker.internal:3128, that can reach host services, and we allow it to reach localhost with docker sandbox network proxy --allow-host localhost.

The problem is OpenClaw is Node.js, and Node.js doesn’t respect HTTP_PROXY environment variables. So we wrote a ~20-line bridge script that OpenClaw connects to at 127.0.0.1:54321, which explicitly forwards requests through the proxy to reach Docker Model Runner on the host:

OpenClaw → bridge (localhost:54321) → proxy (host.docker.internal:3128) → Model Runner (host localhost:12434)

The start-openclaw.sh script starts the bridge, starts OpenClaw’s gateway (with proxy vars cleared so it hits the bridge directly), and runs the TUI.

Build Your Own

Want to customize the image or just see how it works? Here’s the full build process. 

1. Create a base sandbox and install OpenClaw

docker sandbox create --name my-openclaw shell .
docker sandbox network proxy my-openclaw --allow-host localhost
docker sandbox run my-openclaw

Now let’s install OpenClaw in the sandbox:

# Install Node 22 (OpenClaw requires it)
npm install -g n && n 22
hash -r

# Install OpenClaw
npm install -g openclaw@latest

# Run initial setup
openclaw setup

2. Create the Model Runner bridge

This is the magic piece — a tiny Node.js server that forwards requests through the sandbox proxy to Docker Model Runner on your host:

cat > ~/model-runner-bridge.js << 'EOF'
const http = require("http");
const { URL } = require("url");

// The sandbox's HTTP proxy, and the Model Runner address on the host.
const PROXY = new URL(process.env.HTTP_PROXY || "http://host.docker.internal:3128");
const TARGET = "localhost:12434";

http.createServer((req, res) => {
  // Forward each request to the proxy, using an absolute URI so the
  // proxy knows to deliver it to Model Runner on the host.
  const proxyReq = http.request({
    hostname: PROXY.hostname,
    port: PROXY.port,
    path: "http://" + TARGET + req.url,
    method: req.method,
    headers: { ...req.headers, host: TARGET }
  }, proxyRes => {
    // Stream the response straight back to the caller.
    res.writeHead(proxyRes.statusCode, proxyRes.headers);
    proxyRes.pipe(res);
  });
  proxyReq.on("error", e => { res.writeHead(502); res.end(e.message); });
  req.pipe(proxyReq);
}).listen(54321, "127.0.0.1"); // OpenClaw talks to this local port
EOF

3. Configure OpenClaw to use Docker Model Runner

Now merge the Docker Model Runner provider into OpenClaw’s config:

python3 -c "
import json
p = '$HOME/.openclaw/openclaw.json'
with open(p) as f: cfg = json.load(f)
cfg['models'] = cfg.get('models', {})
cfg['models']['mode'] = 'merge'
cfg['models']['providers'] = cfg['models'].get('providers', {})
cfg['models']['providers']['docker-model-runner'] = {
    'baseUrl': 'http://127.0.0.1:54321/engines/llama.cpp/v1',
    'apiKey': 'not-needed',
    'api': 'openai-completions',
    'models': [{
        'id': 'ai/qwen2.5:7B-Q4_K_M',
        'name': 'Qwen 2.5 7B (Docker Model Runner)',
        'reasoning': False, 'input': ['text'],
        'cost': {'input': 0, 'output': 0, 'cacheRead': 0, 'cacheWrite': 0},
        'contextWindow': 32768, 'maxTokens': 8192
    }]
}
cfg['agents'] = cfg.get('agents', {})
cfg['agents']['defaults'] = cfg['agents'].get('defaults', {})
cfg['agents']['defaults']['model'] = {'primary': 'docker-model-runner/ai/qwen2.5:7B-Q4_K_M'}
cfg['gateway'] = {'mode': 'local'}
with open(p, 'w') as f: json.dump(cfg, f, indent=2)
"

4. Save and share

Exit the sandbox and save it as a reusable image:

docker sandbox save my-openclaw my-openclaw-image:latest

Push it to a registry so anyone can use it:

docker tag my-openclaw-image:latest yourname/my-openclaw:latest
docker push yourname/my-openclaw:latest

Anyone with Docker Desktop (a recent version that includes Docker Sandboxes) can spin up the same environment with:

docker sandbox create --name openclaw -t yourname/my-openclaw:latest shell .

What’s next

Docker Sandboxes make it easy to run any AI coding agent in an isolated, reproducible environment. With Docker Model Runner, you get a fully local AI coding setup: no cloud dependencies, no API costs, and complete privacy.

Try it out and let us know what you think.

]]>
State of Agentic AI Report: Key Findings https://www.docker.com/blog/state-of-agentic-ai-key-findings/ Fri, 20 Feb 2026 17:18:29 +0000 https://www.docker.com/?p=85400 Based on Docker’s State of Agentic AI report, a global survey of more than 800 developers, platform engineers, and technology decision-makers, this blog summarizes key findings about what’s really happening as agentic AI scales within organizations. Drawing on insights from decision-makers and purchase influencers worldwide, we’ll give you a preview not only of where teams are seeing early wins but also of what’s still missing to move from experimentation to enterprise-grade adoption.

Rapid adoption, early maturity

60% of organizations already have AI agents in production, and 94% view building agents as a strategic priority, but most deployments remain internal and focused on productivity and operational efficiency.

Security and complexity are the top barriers

40% of respondents cite security as the #1 challenge in scaling agentic AI, with 45% struggling to ensure tools are secure and enterprise-ready. Technical complexity compounds the challenge. One in three organizations (33%) report orchestration difficulties as multi-model and multi-cloud environments proliferate (79% of organizations run agents across two or more environments).

MCP shows promise but isn’t enterprise-ready

85% of teams are familiar with the Model Context Protocol (MCP), yet most report significant security, configuration, and manageability issues that prevent production-scale deployment.

Want the full picture? Download the latest State of Agentic AI report to explore deeper insights and practical recommendations for scaling agentic AI in your organization.

Fear of vendor lock-in is real

Enterprises worry about dependencies in core agent and agentic infrastructure layers such as model hosting, LLM providers, and even cloud platforms. 76% of global respondents report active concerns about vendor lock-in, rising to 88% in France, 83% in Japan, and 82% in the UK.

Containerization remains foundational

94% use containers for agent development or production, and 98% follow the same cloud-native workflows as traditional software, establishing containers as the proven substrate for agentic AI infrastructure.

Long-term outlook

Rather than a “year of the agents,” the data points to a decade-long transformation. Organizations are laying the governance and trust foundations now for scalable, enterprise-grade agent ecosystems.


The path forward

The path forward doesn’t require reinvention so much as consolidation around a trust layer: access to trusted content and components that can be safely discovered and reused; secure-by-default runtimes; standardized orchestration and policy; and portable, auditable packaging.

Agentic AI’s near-term value is already real in internal workflows; unlocking the next wave depends on standardizing how we secure, orchestrate, and ship agents. Teams that invest now in this trust layer, on top of the container foundations they already know, will be first to scale agents from local productivity to durable, enterprise-wide outcomes.

Download the full State of Agentic AI report for more insights and recommendations on how to scale agents for the enterprise.

Learn more:

]]>