Products – Docker

Building AI Teams: How Docker Sandboxes and Docker Agent Transform Development

Jennifer Kohl — Wed, 11 Mar 2026 13:00:00 +0000

It’s 11 PM. You’ve got a JIRA ticket open, an IDE with three unsaved files, a browser tab on Stack Overflow, and another on documentation. You’re context-switching between designing UI, writing backend APIs, fixing bugs, and running tests. You’re wearing all the hats, product manager, designer, engineer, QA specialist, and it’s exhausting.

What if instead of doing it all yourself, you could describe the goal and have a team of specialized AI agents handle it for you?

One agent breaks down requirements, another designs the interface, a third builds the backend, a fourth tests it, and a fifth fixes any issues. Each agent focuses on what it does best, working together autonomously while you sip your coffee.That’s not sci-fi, it’s what Agent + Docker Sandboxes delivers today.

What is Docker Agent?

Docker Agent is an open source tool for building teams of specialized AI agents. Instead of prompting one general-purpose model to do everything, you define agents with specific roles that collaborate to solve complex problems.

Here’s a typical dev-team configuration:

agents:
root:
model: openai/gpt-5
description: Product Manager - Leads the development team and coordinates iterations
instruction: |
Break user requirements into small iterations. Coordinate designer → frontend → QA.
- Define feature and acceptance criteria
- Ensure iterations deliver complete, testable features
- Prioritize based on value and dependencies
sub_agents: [designer, awesome_engineer, qa, fixer_engineer]
toolsets:
- type: filesystem
- type: think
- type: todo
- type: memory
path: dev_memory.db

designer:
model: openai/gpt-5
description: UI/UX Designer - Creates user interface designs and wireframes
instruction: |
Create wireframes and mockups for features. Ensure responsive, accessible designs.
- Use consistent patterns and modern principles
- Specify colors, fonts, interactions, and mobile layout
toolsets:
- type: filesystem
- type: think
- type: memory
path: dev_memory.db

qa:
model: openai
description: QA Specialist - Analyzes errors, stack traces, and code to identify bugs
instruction: |
Analyze error logs, stack traces, and code to find bugs. Explain what's wrong and why it's happening.
- Review test results, error messages, and stack traces
.......

awesome_engineer:
model: openai
description: Awesome Engineer - Implements user interfaces based on designs
instruction: |
Implement responsive, accessible UI from designs. Build backend APIs and integrate.
..........

fixer_engineer:
model: openai
description: Test Integration Engineer - Fixes test failures and integration issues
instruction: |
Fix test failures and integration issues reported by QA.
- Review bug reports from QA

The root agent acts as product manager, coordinating the team. When a user requests a feature, root delegates to designer for wireframes, then awesome_engineer for implementation, qa for testing, and fixer_engineer for bug fixes. Each agent uses its own model, has its own context, and accesses tools like filesystem, shell, memory, and MCP servers.

Agent Configuration

Each agent is defined with five key attributes:

model: The AI model to use (e.g., openai/gpt-5, anthropic/claude-sonnet-4-5). Different agents can use different models optimized for their tasks.
description: A concise summary of the agent’s role. This helps Docker Agent understand when to delegate tasks to this agent.
instruction: Detailed guidance on what the agent should do. Includes workflows, constraints, and domain-specific knowledge.
sub_agents: A list of agents this agent can delegate work to. This creates the team hierarchy.
toolsets: The tools available to the agent. Built-in options include filesystem (read/write files), shell (run commands), think (reasoning), todo (task tracking), memory (persistent storage), and mcp (external tool connections).

This configuration system gives you fine-grained control over each agent’s capabilities and how they coordinate with each other.

Why Agent Teams Matter

One agent handling complex work means constant context-switching. Split the work across focused agents instead, each handles what it’s best at. Docker Agent manages the coordination.

The benefits are clear:

Specialization: Each agent is optimized for its role (design vs. coding vs. debugging)
Parallel execution: Multiple agents can work on different aspects simultaneously
Better outcomes: Focused agents produce higher quality work in their domain
Maintainability: Clear separation of concerns makes teams easier to debug and iterate

The Problem: Running AI Agents Safely

Agent teams are powerful, but they come with a serious security concern. These agents need to:

Read and write files on your system
Execute shell commands (npm install, git commit, etc.)
Access external APIs and tools
Run potentially untrusted code

Giving AI agents full access to your development machine is risky. A misconfigured agent could delete files, leak secrets, or run malicious commands. You need isolation, agents should be powerful but contained.

Traditional virtual machines are too heavy. Chroot jails are fragile. You need something that provides:

Strong isolation from your host machine
Workspace access so agents can read your project files
Familiar experience with the same paths and tools
Easy setup without complex networking or configuration

Docker Sandboxes: The Secure Foundation

Docker Sandboxes solves this by providing isolated environments for running AI agents. As of Docker Desktop 4.60+, sandboxes run inside dedicated microVMs, providing a hard security boundary beyond traditional container isolation. When you run docker sandbox run , Docker creates an isolated microVM workspace that:

Mounts your project directory at the same absolute path (on Linux and macOS)
Preserves your Git configuration for proper commit attribution
Does not inherit environment variables from your current shell session
Gives agents full autonomy without compromising your host
Provides network isolation with configurable allow/deny lists

Docker Sandboxes now natively supports six agent types: Claude Code, Gemini, Codex, Copilot, Agent, and Kiro (all experimental). Agent can be launched directly as a sandbox agent:

# Run Agent natively in a sandbox
docker sandbox create agent ~/path/to/workspace
docker sandbox run agent ~/path/to/workspace

Or, for more control, use a detached sandbox:

# Create a sandbox
docker sandbox run -d --name my-agent-sandbox claude

# Copy agent into the sandbox
docker cp /usr/bin/agent :/usr/bin/agent

# Run your agent team
docker exec -it  bash -c "cd /path/to/workspace && agent run dev-team.yaml"

Your workspace /Users/alice/projects/myapp on the host is also /Users/alice/projects/myapp inside the microVM. Error messages, scripts with hard-coded paths, and relative imports all work as expected. But the agent is contained in its own microVM, it can’t access files outside the mounted workspace, and any damage it causes is limited to the sandbox.

Why Docker Sandboxes Matter

The combination of agents and Docker Sandboxes gives you something powerful:

Full agent autonomy: Agents can install packages, run tests, make commits, and use tools without constant human oversight
Complete safety: Even if an agent makes a mistake, it’s contained within the microVM sandbox
Hard security boundary: MicroVM isolation goes beyond containers, each sandbox runs in its own virtual machine
Network control: Allow/deny lists let you restrict which external services agents can access
Familiar experience: Same paths, same tools, same workflow as working directly on your machine
Workspace persistence: Changes sync between host and microVM, so your work is always available

Here’s how the workflow looks in practice:

User requests a feature to the root agent: “Create a bank app with Gradio”
Root creates a todo list and delegates to the designer
Designer generates wireframes and UI specifications
Awesome_engineer implements the code, running pip install gradio and python app/main.py
QA runs tests, finds bugs, and reports them
Fixer_engineer resolves the issues
Root confirms all tests pass and marks the feature complete

All of this happens autonomously inside a sandboxed environment. The agents can install dependencies, modify files, and execute commands, but they’re isolated from your host machine.

Try It Yourself

Let’s walk through setting up a simple agent team in a Docker Sandbox.

Prerequisites

Docker Desktop 4.60+ with sandbox support (microVM-based isolation)
agent (included in Docker Desktop 4.49+)
API key for your model provider (Anthropic, OpenAI, or Google)

Step 1: Create Your Agent Team

Save this configuration as dev-team.yaml:

models:
 openai:
   provider: openai
   model: gpt-5

agents:
 root:
   model: openai
   description: Product Manager - Leads the development team
   instruction: |
     Break user requirements into small iterations. Coordinate designer → frontend → QA.
   sub_agents: [designer, awesome_engineer, qa]
   toolsets:
     - type: filesystem
     - type: think
     - type: todo

 designer:
   model: openai
   description: UI/UX Designer - Creates designs and wireframes
   instruction: |
     Create wireframes and mockups for features. Ensure responsive designs.
   toolsets:
     - type: filesystem
     - type: think

 awesome_engineer:
   model: openai
   description: Developer - Implements features
   instruction: |
     Build features based on designs. Write clean, tested code.
   toolsets:
     - type: filesystem
     - type: shell
     - type: think

 qa:
   model: openai
   description: QA Specialist - Tests and identifies bugs
   instruction: |
     Test features and identify bugs. Report issues to fixer.
   toolsets:
     - type: filesystem
     - type: think

Step 2: Create a Docker Sandbox

The simplest approach is to use agent as a native sandbox agent:

# Run agent directly in a sandbox (experimental)
docker sandbox run agent ~/path/to/your/workspace

Alternatively, use a detached Claude sandbox for more control:

# Start a detached sandbox
docker sandbox run -d --name my-dev-sandbox claude

# Copy agent into the sandbox
which agent  # Find the path on your host
docker cp $(which agent) $(docker sandbox ls --filter name=my-dev-sandbox -q):/usr/bin/agent

Step 3: Set Environment Variables

# Run agent with your API key (passed inline since export doesn't persist across exec calls)
docker exec -it -e OPENAI_API_KEY=your_key_here my-dev-sandbox bash

Step 4: Run Your Agent Team

# Mount your workspace and run agent
docker exec -it my-dev-sandbox bash -c "cd /path/to/your/workspace && agent run dev-team.yaml"

Now you can describe what you want to build, and your agent team will handle the rest:

User: Create a bank application using Python. The bank app should have basic functionality like account savings, show balance, withdraw, add money, etc. Build the UI using Gradio. Create a directory called app, and inside of it, create all of the files needed by the project

Agent (root): I'll break this down into iterations and coordinate with the team...

Watch as the designer creates wireframes, the engineer builds the Gradio app, and QA tests it, all autonomously in a secure sandbox.

Final result from a one shot prompt

Step 5: Clean Up

When you’re done:

# Remove the sandbox
docker sandbox rm my-dev-sandbox

Docker enforces one sandbox per workspace. Running docker sandbox run in the same directory reuses the existing container. To change configuration, remove and recreate the sandbox.

Current Limitations

Docker Sandboxes and Docker Agent are evolving rapidly. Here are a few things to know:

Docker Sandboxes now supports six agent types natively: Claude Code, Gemini, Codex, Copilot, agent, and Kiro. All are experimental and breaking changes may occur between Docker Desktop versions.
Custom Shell that doesn’t include a pre-installed agent binary. Instead, it provides a clean environment where you can install and configure any agent or tool
MicroVM sandboxes require macOS or Windows. Linux users can use legacy container-based sandboxes with Docker Desktop 4.57+
API keys may still need manual configuration depending on the agent type
Sandbox templates are optimized for certain workflows; custom setups may require additional configuration

Why This Matters Now

AI agents are becoming more capable, but they need infrastructure to run safely and effectively. The combination of agent and Docker Sandboxes addresses this by:

Feature	Traditional Approach	With agent + Docker Sandboxes
Autonomy	Limited – requires constant oversight	High – agents work independently
Security	Risky – agents have host access	Isolated – agents run in microVMs
Specialization	One model does everything	Multiple agents with focused roles
Reproducibility	Inconsistent across machines	MicroVM-isolated, version-controlled
Scalability	Manual coordination	Automated team orchestration

This isn’t just about convenience, it’s about enabling AI agents to do real work in production environments, with the safety guarantees that developers expect.

What’s Next

Explore the Docker Agent documentation to build your own agent teams
Check out Docker Sandboxes for advanced configurations
Browse example agent configurations in the agent repository
Integrate agent with your editor or use agents as tools in MCP clients

Conclusion

We’re moving from “prompting AI to write code” to “orchestrating AI teams to build software.” agent gives you the team structure; Docker Sandboxes provides the secure foundation.

The days of wearing every hat as a solo developer are numbered. With specialized AI agents working in isolated containers, you can focus on what matters, designing great software, while your AI team handles the implementation, testing, and iteration.

Try it out. Build your own agent team. Run it in a Docker Sandbox. See what happens when you have a development team at your fingertips, ready to ship features while you grab lunch.

Celebrating Women in AI: 3 Questions with Cecilia Liu on Leading Docker’s MCP Strategy

Yiwen Xu — Fri, 06 Mar 2026 12:59:30 +0000

To celebrate International Women’s Day, we sat down with Cecilia Liu, Senior Product Manager at Docker, for three questions about the vision and strategy behind Docker’s MCP solutions. From shaping product direction to driving AI innovation, Cecilia plays a key role in defining how Docker enables secure, scalable AI tooling.

Cecilia leads product management for Docker’s MCP Catalog and Toolkit, our solution for running MCP servers securely and at scale through containerization. She drives Docker’s AI strategy across both enterprise and developer ecosystems, helping organizations deploy MCP infrastructure with confidence while empowering individual developers to seamlessly discover, integrate, and use MCP in their workflows. With a technical background in AI frameworks and an MBA from NYU Stern, Cecilia bridges the worlds of AI infrastructure and developer tools, turning complex challenges into practical, developer-first solutions.

What products are you responsible for?

I own Docker’s MCP solution. At its core, it’s about solving the problems that anyone working with MCP runs into: how do you find the right MCP servers, how do you actually use them without a steep learning curve, and how do you deploy and manage them reliably across a team or organization.

How does Docker’s MCP solution benefit developers and enterprise customers?

Dev productivity is where my heart is. I want to build something that meaningfully helps developers at every stage of their cycle — and that’s exactly how I think about Docker’s MCP solution.

For end-user developers and vibe coders, the goal is simple: you shouldn’t need to understand the underlying infrastructure to get value from MCP. As long as you’re working with AI, we make it easy to discover, configure, and start using MCP servers without any of the usual setup headaches. One thing I kept hearing in user feedback was that people couldn’t even tell if their setup was actually working. That pushed us to ship in-product setup instructions that walk you through not just configuration, but how to verify everything is running correctly. It sounds small, but it made a real difference.

For developers building MCP servers and integrating them into agents, I’m focused on giving them the right creation and testing tools so they can ship faster and with more confidence. That’s a big part of where we’re headed.

And for security and enterprise admins, we’re solving real deployment pain, making it faster and cheaper to roll out and manage MCP across an entire organization. Custom catalogs, role-based access controls, audit logging, policy enforcement. The goal is to give teams the visibility and control they need to adopt AI tooling confidently at scale.

Customers love us for all of the above, and there’s one more thing that ties it together: the security that comes built-in with Docker. That trust doesn’t happen overnight, and it’s something we take seriously across everything we ship.

What are you excited about when it comes to the future of MCP?

What excites me most is honestly the pace of change itself. The AI landscape is shifting constantly, and with every new tool that makes AI more powerful, there’s a whole new set of developers who need a way to actually use it productively. That’s a massive opportunity.

MCP is where that’s happening right now, and the adoption we’re seeing tells me the need is real. But what gets me out of bed is knowing the problems we’re solving: discoverability, usability, deployment. They are all going to matter just as much for whatever comes next. We’re not just building for today’s tools. We’re building the foundation that developers will reach for every time something new emerges.

Cecilia is speaking about scaling MCP for enterprises at the MCP Dev Summit in NYC on 3rd of April, 2026. If you’re attending, be sure to stop by Docker’s booth (D/P9).

Learn more

Explore Docker’s MCP Catalog and Toolkit on our website.
Dive into our documentation to get started quickly.
Ready to go hands-on? Open Docker Desktop or the CLI and start using MCP to streamline and automate your development workflows.

Announcing Docker Hardened System Packages

Vishrut Iyengar — Tue, 03 Mar 2026 20:30:00 +0000

Your Package Manager, Now with a Security Upgrade

Last December, we made Docker Hardened Images (DHI) free because we believe secure, minimal, production-ready images should be the default. Every developer deserves strong security at no cost. It should not be complicated or locked behind a paywall.

From the start, flexibility mattered just as much as security. Unlike opaque, proprietary hardened alternatives, DHI is built on trusted open source foundations like Alpine and Debian. That gives teams true multi-distro flexibility without forcing change. If you run Alpine, stay on Alpine. If Debian is your standard, keep it. DHI strengthens what you already use. It does not require you to replace it.

Today, we are extending that philosophy beyond images.

With Docker Hardened System Packages, we’re driving security deeper into the stack. Every package is built on the same secure supply chain foundation: source-built and patched by Docker, cryptographically attested, and backed by an SLA.

The best part? Multi-distro support by design.

The result is consistent, end-to-end hardening across environments with the production-grade reliability teams expect.

Since introducing DHI Community (our OSS tier), interest has surged. The DHI catalog has expanded from more than 1,000 to over 2,000 hardened container images. Its openness and ability to meet teams where they are have accelerated adoption across the ecosystem. Companies of all sizes, along with a growing number of open source projects, are making DHI their standard for secure containers.

Just consider this short selection of examples:

n8n.io has moved its production infrastructure to DHI, they share why and how in this recent webinar
Medplum, an open-source electronic health records platform (managing data of 20+ million patients) has now standardized to DHI
Adobe uses DHI because of great alignment with its security posture and developer tooling compatibility
Attentive co-authored this e-book with Docker on helping others move from POC to production with DHI

Docker Hardened System Packages: Going deeper into the container

From day one, Docker has built and secured the most critical operating system packages to deliver on our CVE remediation commitments. That’s how we continuously maintain near-zero CVEs in DHI images. At the same time, we recognize that many teams extend our minimal base images with additional upstream packages to meet their specific requirements. To support that reality, we are expanding our catalog with more than 8,000 hardened Alpine packages, with Debian coverage coming soon.

This expansion gives teams greater flexibility without weakening their security posture. You can start with a DHI base image and tailor it to your needs while maintaining the same hardened supply chain guarantees. There is no need to switch distros to get continuous patching, verified builds through a SLSA Build Level 3 pipeline, and enterprise-grade assurances. Your teams can continue working with the Alpine and Debian environments they know, now backed by Docker’s secure build system from base image to system package.

Why this matters for your security posture:

Complete provenance chain. Every package is built from source by Docker, attested, and cryptographically signed. From base image to final container, your provenance stays intact.

Faster vulnerability remediation. When a vulnerability is identified, we patch it at the package level and publish it to the catalog. Not image by image. That means fixes move faster and remediation scales across your entire container fleet.

Extending the near-zero CVE guarantee. DHI images maintain near-zero. Hardened System Packages extend that guarantee more broadly across the software ecosystem, covering packages you add during customization.

Use hardened packages with your containers. DHI Enterprise customers get access to the secure packages repository, making it possible to use Hardened System Packages beyond DHI images. Integrate them into your own pipelines and across Alpine and Debian workloads throughout your environment.

The work we’re doing on our users’ behalf: Maintaining thousands of packages is continuous work. We monitor upstream projects, backport patches, test compatibility, rebuild when dependencies change, and generate attestations for every release. Alpine alone accounts for more than 8,000 packages today, soon approaching 10,000, with Debian next.

Making enterprise-grade security even more accessible

We’re also simplifying how teams access DHI. The full catalog of thousands of open-source images under Apache 2.0 now has a new name: DHI Community. There are no licensing changes, this is just a name change, so all of that free goodness has an easy name to refer to.

For teams that need SLA-backed CVE remediation and customization capabilities at a more accessible price point, we’re announcing a new pricing tier today, DHI Select. This new tier brings enterprise-grade security at a price of $5,000 per repo.

For organizations with more demanding requirements, including unlimited customizations, access to the Hardened System Packages repo, and extended lifecycle coverage for up to five years after upstream EOL, DHI Enterprise and the DHI Extended Lifecycle Support add-on remain available.

More options means more teams can adopt the right level of security for where they are today.

Build with the standard that’s redefining container security

Docker’s momentum in securing the software supply chain is accelerating. We’re bringing security to more layers of the stack, making it easier for teams to build securely by default, for open source-based containers as well as your company’s internally-developed software. We’re also pushing toward a one-day (or shorter) timeline for critical CVE fixes. Each step builds on the last, moving us closer to end-to-end supply chain security for all of your critical applications.

Get started:

Join the n8n webinar to see how they’re running production workloads on DHI
Start your free trial and get access to the full DHI catalog, now with Docker Hardened System Packages

Docker Model Runner Brings vLLM to macOS with Apple Silicon

Yiwen Xu — Thu, 26 Feb 2026 14:42:57 +0000

vLLM has quickly become the go-to inference engine for developers who need high-throughput LLM serving. We brought vLLM to Docker Model Runner for NVIDIA GPUs on Linux, then extended it to Windows via WSL2.

That changes today. Docker Model Runner now supports vllm-metal, a new backend that brings vLLM inference to macOS using Apple Silicon’s Metal GPU. If you have a Mac with an M-series chip, you can now run MLX models through vLLM with the same OpenAI-compatible API, same Anthropic-compatible API for tools like Claude Code, and all in one, the same Docker workflow.

What is vllm-metal?

vllm-metal is a plugin for vLLM that brings high-performance LLM inference to Apple Silicon. Developed in collaboration between Docker and the vLLM project, it unifies MLX, the Apple’s machine learning framework, and PyTorch under a single compute pathway, plugging directly into vLLM’s existing engine, scheduler, and OpenAI-compatible API server.

The architecture is layered: vLLM’s core (engine, scheduler, tokenizer, API) stays unchanged on top. A plugin layer consisting of MetalPlatform, MetalWorker, and MetalModelRunner handles the Apple Silicon specifics. Underneath, MLX drives the actual inference while PyTorch handles model loading and weight conversion. The whole stack runs on Metal, Apple’s GPU framework.

+-------------------------------------------------------------+
|                          vLLM Core                          |
|        Engine | Scheduler | API | Tokenizers                |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                   vllm_metal Plugin Layer                   |
|   +-----------+  +-----------+  +------------------------+  |
|   | Platform  |  | Worker    |  | ModelRunner            |  |
|   +-----------+  +-----------+  +------------------------+  |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                   Unified Compute Backend                   |
|   +------------------+    +----------------------------+    |
|   | MLX (Primary)    |    | PyTorch (Interop)          |    |
|   | - SDPA           |    | - HF Loading               |    |
|   | - RMSNorm        |    | - Weight Conversion        |    |
|   | - RoPE           |    | - Tensor Bridge            |    |
|   | - Cache Ops      |    |                            |    |
|   +------------------+    +----------------------------+    |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                       Metal GPU Layer                       |
|           Apple Silicon Unified Memory Architecture         |
+-------------------------------------------------------------+

Figure 1: High-level architecture diagram of vllm-metal. Credit: vllm-metal

What makes this particularly effective on Apple Silicon is unified memory. Unlike discrete GPUs where data must be copied between CPU and GPU memory, Apple Silicon shares a single memory pool. vllm-metal exploits this with zero-copy tensor operations. Combined with paged attention for efficient KV cache management and Grouped-Query Attention support, this means you can serve longer sequences with less memory waste.

vllm-metal runs MLX models published by the mlx-community on Hugging Face. These models are built specifically for the MLX framework and take full advantage of Metal GPU acceleration. Docker Model Runner automatically routes MLX models to vllm-metal when the backend is installed, falling back to the built-in MLX backend otherwise.

How vllm-metal works

vllm-metal runs natively on the host. This is necessary because Metal GPU access requires direct hardware access and there is no GPU passthrough for Metal in containers.

When you install the backend, Docker Model Runner:

Pulls a Docker image from Hub that contains a self-contained Python 3.12 environment with vllm-metal and all dependencies pre-packaged.
Extracts it to `~/.docker/model-runner/vllm-metal/`.
Verifies the installation by importing the `vllm_metal` module.

When a request comes in for a compatible model, the Docker Model Runner’s scheduler starts a vllm-metal server process that communicates over TCP, serving the standard OpenAI API. The model is loaded from Docker’s shared model store, which contains all the models you pull with `docker model pull`.

Which models work with vllm-metal?

vllm-metal works with safetensors models in MLX format. The mlx-community on Hugging Face maintains a large collection of quantized models optimized for Apple Silicon. Some examples you can try:

vLLM everywhere with Docker Model Runner

With vllm-metal, Docker Model Runner now supports vLLM across the three major platforms:

Platform	Backend	GPU
Linux	vllm	NVIDIA (CUDA)
Windows (WSL2)	vllm	NVIDIA (CUDA)
macOS	vllm-metal	Apple Silicon (Metal)

The same docker model commands work regardless of platform. Pull a model, run it. Docker Model Runner picks the right backend for your platform.

Get started

Update to Docker Desktop 4.62 or later for Mac, and install the backend:

docker model install-runner --backend vllm-metal

Check out the Docker Model Runner documentation to learn more. For contributions, feedback, and bug reports, visit the docker/model-runner repository on GitHub.

Giving Back: vllm-metal is Now Open Source

At Docker, we believe that the best way to accelerate AI development is to build in the open. That is why we are proud to announce that Docker has contributed the vllm-metal project to the vLLM community. Originally developed by Docker engineers to power Model Runner on macOS, this project now lives under the vLLM GitHub organization. This ensures that every developer in the ecosystem can benefit from and contribute to high-performance inference on Apple Silicon. The project also has had significant contributions by Lik Xun Yuan, Ricky Chen and Ranran Haoran Zhang.

The $599 AI Development Rig

For a long time, high-throughput vLLM development was gated behind a significant GPU cost. To get started, you typically need a dedicated Linux box with an RTX 4090 ($1,700+) or enterprise-grade A100/H100 cards ($10,000+).

vllm-metal changes the math

Now, a base $599 Mac Mini with an M4 chip becomes a viable vLLM development environment. Because Apple Silicon uses Unified Memory, that 16GB (or upgraded 32GB/64GB) of RAM is directly accessible by the GPU. This allows you to:

Develop & Test Locally: Build your vLLM-based applications on the same machine you use for coding.
Production-Mirroring: Use the exact same OpenAI-compatible API on your Mac Mini as you would on an H100 cluster in production.
Energy Efficiency: Run inference at a fraction of the power consumption (and heat) of a discrete GPU rig.

How does vllm-metal compare to llama.cpp?

We benchmarked both backends using Llama 3.2 1B Instruct with comparable 4-bit quantization, served through Docker Model Runner on Apple Silicon.

	llama.cpp	vLLM-Metal
Model	unsloth/Llama-3.2-1B-Instruct-GGUF:Q4_0	mlx-community/llama-3.2-1b-instruct-4bit
Format	GGUF (Q4_0)	Safetensors (MLX 4-bit)

Throughput (tokens/sec, wall-clock)

max_tokens	llama.cpp	vLLM-Metal	speedup
128	333.3	251.5	1.3x
512	345.1	279.0	1.3x
1024	338.5	275.4	1.2x
2048	339.1	279.5	1.2x

Each configuration was run 3 times across 3 different prompts (9 total requests per data point).

Throughput is measured as completion_tokens / wall_clock_time, applied consistently to both backends.

Key observations:

llama.cpp is consistently ~1.2x faster than vLLM-Metal across all output lengths.
llama.cpp throughput is remarkably stable (~333-345 tok/s regardless of max_tokens), while vLLM-Metal shows more variance between individual runs (134-343 tok/s).
Both backends scale well. Neither backend shows significant degradation as output length increases.
Quantization methods differ (GGUF Q4_0 vs MLX 4-bit), so this benchmarks the full stack, engine + quantization, rather than the engine alone.

The benchmark script used for these results is available as a GitHub Gist.

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. To get involved:

Star the repository: Show your support by starring the Docker Model Runner repo.
Contribute your ideas: Create an issue or submit a pull request. We’re excited to see what ideas you have!
Spread the word: Tell your friends and colleagues who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn More

Read the companion post: OpenCode with Docker Model Runner for Private AI Coding
Check out the Docker Model Runner General Availability announcement
Visit our Model Runner GitHub repo
Get started with a simple hello GenAI application

Open WebUI + Docker Model Runner: Self-Hosted Models, Zero Configuration

Yiwen Xu — Wed, 25 Feb 2026 14:37:33 +0000

We’re excited to share a seamless new integration between Docker Model Runner (DMR) and Open WebUI, bringing together two open source projects to make working with self-hosted models easier than ever.

With this update, Open WebUI automatically detects and connects to Docker Model Runner running at localhost:12434. If Docker Model Runner is enabled, Open WebUI uses it out of the box, no additional configuration required.

The result: a fully Docker-managed, self-hosted model experience running in minutes.

Note for Docker Desktop users:
If you are running Docker Model Runner via Docker Desktop, make sure TCP access is enabled. Open WebUI connects to Docker Model Runner over HTTP, which requires the TCP port to be exposed:

docker desktop enable model-runner --tcp

Better Together: Docker Model Runner and Open WebUI

Docker Model Runner and Open WebUI come from the same open source mindset. They’re built for developers who want control over where their models run and how their systems are put together, whether that’s on a laptop for quick experimentation or on a dedicated GPU host with more horsepower behind it.

Docker Model Runner focuses on the runtime layer: a Docker-native way to run and manage self-hosted models using the tooling developers already rely on. Open WebUI focuses on the experience: a clean, extensible interface that makes those models accessible and useful.

Now, the two connect automatically.

No manual endpoint configuration. No extra flags.

That’s the kind of integration open source does best, separate projects evolving independently, but designed well enough to fit together naturally.

Zero-Config Setup

If Docker Model Runner is enabled, getting started with Open WebUI is as simple as:

docker run -p 3000:8080 openwebui/open-webui

That’s it.

Open WebUI will automatically connect to Docker Model Runner and begin using your self-hosted models, no environment variables, no manual endpoint configuration, no extra flags.

Visit: http://localhost:3000 and create your account:

And you’re ready to interact with your models through a modern web interface:

Open by design

One of the nice things about this integration is that it didn’t require special coordination or proprietary hooks. Docker Model Runner and Open WebUI are both open source projects with clear boundaries and well-defined interfaces. They were built independently, and they still fit together cleanly.

Docker Model Runner focuses on running and managing models in a way that feels natural to anyone already using Docker.

Open WebUI focuses on making those models usable. It provides the interface layer, conversation management, and extensibility you’d expect from a modern web UI.

Because both projects are open, there’s no hidden contract between them. You can see how the connection works. You can modify it if you need to. You can deploy the pieces separately or together. The integration isn’t a black box, it’s just software speaking a clear interface.

Works with Your Setup

One of the practical benefits of this approach is flexibility.

Docker Model Runner doesn’t dictate where your models run. They might live on your laptop during development, on a more powerful remote machine, or inside a controlled internal environment. As long as Docker Model Runner is reachable, Open WebUI can connect to it.

That separation between runtime and interface is intentional. The UI doesn’t need to know how the model is provisioned. The runtime doesn’t need to know how the UI is presented. Each layer does its job.

With this integration, that boundary becomes almost invisible. Start the container, open your browser, and everything lines up.

You decide where the models run. Open WebUI simply meets them there.

Summary

Open WebUI and Docker Model Runner make self-hosted AI simple, flexible and fully under your control. Docker powers the runtime. Open WebUI delivers a modern interface on top.

With automatic detection and zero configuration, you can go from enabling Docker Model Runner to interact with your models in minutes.

Both projects are open source and built with clear boundaries, so you can run models wherever you choose and deploy the pieces together or separately. We can’t wait to see what you build next!

How You Can Get Involved

The strength of Docker Model Runner lies in its community and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.
Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!
Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn more

Check out the Docker Model Runner General Availability announcement
Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!
Get started with Docker Model Runner with a simple hello GenAI application

Gordon (Beta): Docker’s AI Agent Just Got an Update

Srini Sekaran — Mon, 23 Feb 2026 14:13:00 +0000

AI agents are moving from demos to daily workflows. They write code, run commands, and complete multi-step tasks without constant hand-holding. But general-purpose agents don’t know Docker. They don’t understand your containers, your images, or your specific setup.

Gordon does. Just run docker ai in your terminal or try it in Docker Desktop.

Available today in Docker Desktop 4.61, still in beta, Gordon is an AI agent purpose-built for Docker. It has shell access, Docker CLI access, your filesystem, and deep knowledge of Docker best practices. Point it at a problem, approve its actions, and watch it work.

Figure 1: docker ai command launching Gordon in terminal interface

Figure 2: Gordon in Docker Desktop sidebar

Why Docker Needs Its Own Agent

When your container exits with code 137, Claude or ChatGPT will explain what OOM means. Gordon checks your container’s memory limit, inspects the logs, identifies the memory-hungry process, and proposes a fix. One approval, and it’s done.

When you need to containerize a Next.js app, Copilot might suggest a Dockerfile. Gordon examines your project structure, detects your dependencies, generates a production-ready Dockerfile with multi-stage builds, creates docker-compose.yml with the right services, and sets up your environment configs.

The difference is context and execution. Gordon knows what’s running on your machine. It can read your Docker state, access your filesystem, and take action. It’s not guessing – it’s working with your actual environment.

What Gordon Does

Debug and fix – Container won’t start. Service is unhealthy. Something is consuming all the memory. Gordon inspects logs, checks container status, identifies root cause, and proposes fixes. You approve, it executes.

Build and containerize – Take this application and make it run in Docker. Gordon examines your project, generates production-ready Dockerfiles with multi-stage builds, creates docker-compose.yml with the right services, handles environment configs and dependencies.

Execute and manage – Clean up disk space. Stop all containers. Pull and run specific images. Routine Docker operations should be conversational, not a trip to the docs.

Develop and optimize – Add health checks. Implement multi-stage builds. Apply security best practices. Reduce image sizes. Make existing Docker setups production-ready.

Gordon handles all of it.

Figure 3: Split screen showing Gordon debugging a mongodb container

How Gordon Works

Gordon is built on cagent, Docker’s agent framework included with Docker Desktop, and runs locally within Docker Desktop. It has access to:

Your shell – Can execute commands after approval
Your filesystem – Reads project structure, configs, logs

Docker CLI – Full access to Docker operations
Docker knowledge base – Documentation, best practices, common patterns

You can configure Gordon’s working directory to point to a specific codebase. This gives Gordon full context on your project structure, dependencies, and existing Docker setup.

The permission model is straightforward: Gordon shows you what it wants to do, you approve or reject, then it executes. Every command. Every file update. Every Docker operation. You’re not watching passively – you’re directing an agent that knows Docker inside and out.

Figure 4: Permissions request

Where to Find Gordon

Docker Desktop: Look for the Gordon icon in the left sidebar

CLI: Run docker ai from your terminal

Get started today

Download Docker Desktop 4.61+
Log in with your Docker account
Click the Gordon icon, select a project directory, and ask “Optimize my Dockerfile”
Explore the full documentation in Docker Docs

Gordon is available now in Docker Desktop 4.61 and later

Run OpenClaw Securely in Docker Sandboxes

Jennifer Kohl — Mon, 23 Feb 2026 14:00:00 +0000

Docker Sandboxes is a new primitive in the Docker’s ecosystem that allows you to run AI agents or any other workloads in isolated micro VMs. It provides strong isolation, convenient developer experience and a strong security boundary with a network proxy configurable to deny agents connecting to arbitrary internet hosts. The network proxy will also conveniently inject the API keys, like your ANTHROPIC_API_KEY, or OPENAI_API_KEY in the network proxy so the agent doesn’t have access to them at all and cannot leak them.

In a previous article I showed how Docker Sandboxes lets you install any tools an AI agent might need, like a JDK for Java projects or some custom CLIs, into a container that’s isolated from the host. Today we’re going a step further: we’ll run OpenClaw, an open-source AI coding agent, on a local model via Docker Model Runner.

No API keys, no cloud costs, fully private. And you can do it in 2-ish commands.

Quick Start

Make sure you have Docker Desktop and that Docker Model Runner is enabled (Settings → Docker Model Runner → Enable), then pull a model:

docker model pull ai/gpt-oss:20B-UD-Q4_K_XL

Now create and run the sandbox:

docker sandbox create --name openclaw -t olegselajev241/openclaw-dmr:latest shell .
docker sandbox network proxy openclaw --allow-host localhost
docker sandbox run openclaw

Inside the sandbox:

~/start-openclaw.sh

And that’s it. You’re in OpenClaw’s terminal UI, talking to a local gpt-oss model on your machine. The model runs in Docker Model Runner on your host, and OpenClaw runs completely isolated in the sandbox: it can only read and write files in the workspace you give it, and there’s a network proxy to deny connections to unwanted hosts.

Cloud models work too

The sandbox proxy will automatically inject API keys from your host environment. If you have ANTHROPIC_API_KEY or OPENAI_API_KEY set, OpenClaw can run cloud models, just specify them in OpenClaw settings. The proxy takes care of credential injection, so your keys will never be exposed inside the sandbox.

This means you can use free local models for experimentation, then switch to cloud models for serious work all in the same sandbox. With cloud models you don’t even need to allow to proxy to host’s localhost, so don’t run docker sandbox network proxy openclaw --allow-host localhost.

Choose Your Model

The startup script automatically discovers models available in your Docker Model Runner. List them:

~/start-openclaw.sh list

Use a specific model:

~/start-openclaw.sh ai/qwen2.5:7B-Q4_K_M

Any model you’ve pulled with docker model pull is available.

How it works (a bit technical)

The pre-built image (olegselajev241/openclaw-dmr:latest) is based on the shell sandbox template with three additions: Node.js 22, OpenClaw, and a tiny networking bridge.

The bridge is needed because Docker Model Runner runs on your host and binds to localhost:12434. But localhost inside the sandbox means the sandbox itself, not your host. The sandbox does have an HTTP proxy, at host.docker.internal:3128, that can reach host services, and we allow it to reach localhost with docker sandbox network proxy --allow-host localhost.

The problem is OpenClaw is Node.js, and Node.js doesn’t respect HTTP_PROXY environment variables. So we wrote a ~20-line bridge script that OpenClaw connects to at 127.0.0.1:54321, which explicitly forwards requests through the proxy to reach Docker Model Runner on the host:

OpenClaw → bridge (localhost:54321) → proxy (host.docker.internal:3128) → Model Runner (host localhost:12434)

The start-openclaw.sh script starts the bridge, starts OpenClaw’s gateway (with proxy vars cleared so it hits the bridge directly), and runs the TUI.

Build Your Own

Want to customize the image or just see how it works? Here’s the full build process.

1. Create a base sandbox and install OpenClaw

docker sandbox create --name my-openclaw shell .
docker sandbox network proxy my-openclaw --allow-host localhost
docker sandbox run my-openclaw

Now let’s install OpenClaw in the sandbox:

# Install Node 22 (OpenClaw requires it)
npm install -g n && n 22
hash -r

# Install OpenClaw
npm install -g openclaw@latest

# Run initial setup
openclaw setup

2. Create the Model Runner bridge

This is the magic piece — a tiny Node.js server that forwards requests through the sandbox proxy to Docker Model Runner on your host:

cat > ~/model-runner-bridge.js << 'EOF'
const http = require("http");
const { URL } = require("url");

const PROXY = new URL(process.env.HTTP_PROXY || "http://host.docker.internal:3128");
const TARGET = "localhost:12434";

http.createServer((req, res) => {
  const proxyReq = http.request({
    hostname: PROXY.hostname,
    port: PROXY.port,
    path: "http://" + TARGET + req.url,
    method: req.method,
    headers: { ...req.headers, host: TARGET }
  }, proxyRes => {
    res.writeHead(proxyRes.statusCode, proxyRes.headers);
    proxyRes.pipe(res);
  });
  proxyReq.on("error", e => { res.writeHead(502); res.end(e.message); });
  req.pipe(proxyReq);
}).listen(54321, "127.0.0.1");
EOF

3. Configure OpenClaw to use Docker Model Runner

Now merge the Docker Model Runner provider into OpenClaw’s config:

python3 -c "
import json
p = '$HOME/.openclaw/openclaw.json'
with open(p) as f: cfg = json.load(f)
cfg['models'] = cfg.get('models', {})
cfg['models']['mode'] = 'merge'
cfg['models']['providers'] = cfg['models'].get('providers', {})
cfg['models']['providers']['docker-model-runner'] = {
    'baseUrl': 'http://127.0.0.1:54321/engines/llama.cpp/v1',
    'apiKey': 'not-needed',
    'api': 'openai-completions',
    'models': [{
        'id': 'ai/qwen2.5:7B-Q4_K_M',
        'name': 'Qwen 2.5 7B (Docker Model Runner)',
        'reasoning': False, 'input': ['text'],
        'cost': {'input': 0, 'output': 0, 'cacheRead': 0, 'cacheWrite': 0},
        'contextWindow': 32768, 'maxTokens': 8192
    }]
}
cfg['agents'] = cfg.get('agents', {})
cfg['agents']['defaults'] = cfg['agents'].get('defaults', {})
cfg['agents']['defaults']['model'] = {'primary': 'docker-model-runner/ai/qwen2.5:7B-Q4_K_M'}
cfg['gateway'] = {'mode': 'local'}
with open(p, 'w') as f: json.dump(cfg, f, indent=2)
"

4. Save and share

Exit the sandbox and save it as a reusable image:

docker sandbox save my-openclaw my-openclaw-image:latest

Push it to a registry so anyone can use it:

docker tag my-openclaw-image:latest yourname/my-openclaw:latest
docker push yourname/my-openclaw:latest

Anyone with Docker Desktop (with the modern sandboxes includes) can spin up the same environment with:

docker sandbox create --name openclaw -t yourname/my-openclaw:latest shell .

What’s next

Docker Sandboxes make it easy to run any AI coding agent in an isolated, reproducible environment. With Docker Model Runner, you get a fully local AI coding setup: no cloud dependencies, no API costs, and complete privacy.

Try it out and let us know what you think.

The Multi-Model Database for AI Agents: Deploy SurrealDB with Docker Extension

Jennifer Kohl — Tue, 17 Feb 2026 14:00:00 +0000

When it comes to building dynamic and real-work solutions, developers need to stitch multiple databases (relational, document, graph, vector, time-series, search) together and build complex API layers to integrate them. This generates significant complexity, cost, and operational risk, and reduces speed of innovation. More often than not, developers end up focusing on building glue code and managing infrastructure rather than building application logic. For AI use cases, using multiple databases means AI Agents have fragmented data, context and memory, producing bad outputs at high latency.

Enter SurrealDB.

SurrealDB is a multi-model database built in Rust that unifies document, graph, relational, time-series, geospatial, key-value, and vector data into a single engine. Its SQL-like query language, SurrealQL, lets you traverse graphs, perform vector search, and query structured data – all in one statement.

Designed for data-intensive workloads like AI agent memory, knowledge graphs, real-time applications, and edge deployments, SurrealDB runs as a single binary anywhere: embedded in your app, in the browser via WebAssembly, at the edge, or as a distributed cluster.

What problem does SurrealDB solve?

Modern AI systems place very different demands on data infrastructure than traditional applications. SurrealDB addresses these pressures directly:

Single runtime for multiple data models – AI systems frequently combine vector search, graph traversal, document storage, real-time state, and relational data in the same request path. SurrealDB supports these models natively in one engine, avoiding brittle cross-database APIs, ETL pipelines, and consistency gaps.
Low-latency access to changing context – Voice agents, interactive assistants, and stateful agents are sensitive to both latency and data freshness. SurrealDB’s query model and real-time features serve up-to-date context without polling or background sync jobs.
Reduced system complexity – Replacing multiple specialized databases with a single multi-model store reduces services, APIs, and failure modes. This simplifies deployment, debugging, and long-term maintenance.
Faster iteration on data-heavy features – Opt in schemas definitions and expressive queries let teams evolve data models alongside AI features without large migrations. This is particularly useful when experimenting with embeddings, relationships, or agent memory structures.
Built-in primitives for common AI patterns – Native support for vectors, graphs, and transactional consistency enables RAG, graph-augmented retrieval, recommendation pipelines, and agent state management – without external systems or custom glue code.

In this article, you’ll see how to build a WhatsApp RAG chatbot using SurrealDB Docker Extension. You’ll learn how SurrealDB Docker Extension powers an intelligent WhatsApp chatbot that turns your chat history into searchable, AI-enhanced conversations with vector embeddings and precise source citations.

Understanding SurrealDB Architecture

SurrealDB’s architecture unifies multiple data models within a single database engine, eliminating the need for separate systems and synchronization logic (figure below).

Caption: SurrealDB Architecture diagram

Caption: Architecture diagram of SurrealDB showing a unified multi-model database with real-time capabilities. (more information at https://surrealdb.com/docs/surrealdb/introduction/architecture)

With SurrealDB, you can:

Model complex relationships using graph traversal syntax (e.g., ->bought_together->product)
Store flexible documents alongside structured relational tables
Subscribe to real-time changes with LIVE SELECT queries that push updates instantly
Ensure data consistency with ACID-compliant transactions across all models

Learn more about SurrealDB’s architecture and key features on the official documentation.

How does Surreal work?

SurrealDB separates storage from compute, enabling you to scale these independently without the need to manually shard your data.

The query layer (otherwise known as the compute layer) handles queries from the client, analyzing which records need to be selected, created, updated, or deleted.

The storage layer handles the storage of the data for the query layer. By scaling storage nodes, you are able to increase the amount of supported data for each deployment.

SurrealDB supports all the way from single-node to highly scalable fault-tolerant deployments with large amounts of data.

For more information, see https://surrealdb.com/docs/surrealdb/introduction/architecture.

Why should you run SurrealDB as a Docker Extension

For developers already using Docker Desktop, running SurrealDB as an extension eliminates friction. There’s no separate installation, no dependency management, no configuration files – just a single click from the Extensions Marketplace.

Docker provides the ideal environment to bundle and run SurrealDB in a lightweight, isolated container. This encapsulation ensures consistent behavior across macOS, Windows, and Linux, so what works on your laptop works identically in staging.

The Docker Desktop Extension includes:

Visual query editor with SurrealQL syntax highlighting
Real-time data explorer showing live updates as records change
Schema visualization for tables and relationships
Connection management to switch between local and remote instances
Built-in backup/restore for easy data export and import

With Docker Desktop as the only prerequisite, you can go from zero to a running SurrealDB instance in under a minute.

Getting Started

To begin, download and install Docker Desktop on your machine. Then follow these steps:

Open Docker Desktop and select Extensions in the left sidebar
Switch to the Browse tab
In the Filters dropdown, select the Database category
Find SurrealDB and click Install

Caption: Installing the SurrealDB Extension from Docker Desktop’s Extensions Marketplace.

Real-World Example

Smart Team Communication Assistant

Imagine searching through months of team WhatsApp conversations to answer the question: “What did we decide about the marketing campaign budget?”

Traditional keyword search fails, but RAG with SurrealDB and LangChain solves this by combining semantic vector search with relationship graphs.

This architecture analyzes group chats (WhatsApp, Instagram, Slack) by storing conversations as vector embeddings while simultaneously building a knowledge graph linking conversations through extracted keywords like “budget,” “marketing,” and “decision.” When queried, the system retrieves relevant context using both similarity matching and graph traversal, delivering accurate answers about past discussions, decisions, and action items even when phrased differently than the original conversation.

This project is inspired by Multi-model RAG with LangChain | GitHub Example

1. Clone the repository:

git clone https://github.com/Raveendiran-RR/surrealdb-rag-demo

2. Enable Docker Model Runner by visiting Docker Desktop > Settings > AI

Caption: Enable Docker Model Runner in Docker Desktop > settings > AI

3. Pull llama3.2 model from Docker Hub

Search for llama 3.2 under Models > Docker Hub and pull the right model.

Caption: Pull the Docker model llama3.2

4. Download the embeddinggemma model from Docker Hub

Caption: Click on Models > Search for embeddinggemma > download the model

5. Run this command to connect to the persistent surrealDB container

Browse to the directory where you have cloned the repository
Create directory “mydata”

mkdir -p mydata

6. Run this command:

docker run -d --name demo_data \
  -p 8002:8000 \
  -v "$(pwd)/mydata:/mydata" \
  surrealdb/surrealdb:latest \
  start --log debug --user root --pass root \
  rocksdb://mydata

Note: use the path based on the operating system.

For windows , use rocksdb://mydata
For linux and macOS, use rocksdb:/mydata

7. Open SurrealDB Docker Extension and connect with SurrealDB.

Caption: Connecting to SurrealDB through Docker Desktop Extension

Connection name: RAGBot

Remote address: http://localhost:8002

Username: root | password: root

Click on Create Connection

8. Run the setup instructions

9. Upload the whatsapp chat

Start the UI for the RAG bot (http://localhost:8080)

Caption: Create connection to the SurrealDB Docker container

10. Start chatting with the RAG bot and have fun

11. We can verify the correctness data in SurrealDB list

Ensure that you connect to the right namespace (whatsapp) and database (chats)

python3 load_whatsapp.py
python3 rag_chat_ui.py

Caption: connect to the “whatsapp” namespace and “chats” database

Caption: Data stored as vectors in SurrealDB

Caption: Interact with the RAG bot UI where it gives you the answer and exact reference for it

Using this chat bot, now you can get information about the chat.txt file that was ingested. You can also verify the information in the query editor as shown below when you can run custom queries to validate the results from the chat bot. You can ingest new messages through the load_whatsapp.py file, please ensure that the message format is same as in the sample whatsChatExport.txt file.

Learn more about SurrealQL here.

Caption: SurrealDB Query editor in the Docker Desktop Extension

Conclusion

The SurrealDB Docker Extension offers an accessible and powerful solution for developers building data-intensive applications – especially those working with AI agents, knowledge graphs, and real-time systems. Its multi-model architecture eliminates the need to stitch together separate databases, letting you store documents, traverse graphs, query vectors, and subscribe to live updates from a single engine.

With Docker Desktop integration, getting started takes seconds rather than hours. No configuration files, no dependency management – just install the extension and start building. The visual query editor and real-time data explorer make it easy to prototype schemas, test queries, and inspect data as it changes.

Whether you’re building agent memory systems, real-time recommendation engines, or simply looking to consolidate a sprawling database stack, SurrealDB’s Docker Extension provides an intuitive path forward. Install it today and see how a unified data layer can simplify your architecture.

If you have questions or want to connect with other SurrealDB users, join the SurrealDB community on Discord.

Learn More

Install the SurrealDB Docker Extension
Get the latest release of Docker Desktop
SurrealDB documentation
Vote on what’s next! Check out our public roadmap
Have questions? The Docker community is here to help
New to Docker? Get started

Running NanoClaw in a Docker Shell Sandbox

Jennifer Kohl — Mon, 16 Feb 2026 14:00:00 +0000

Ever wanted to run a personal AI assistant that monitors your WhatsApp messages 24/7, but worried about giving it access to your entire system? Docker Sandboxes’ new shell sandbox type is the perfect solution. In this post, I’ll show you how to run NanoClaw, a lightweight Claude-powered WhatsApp assistant, inside a secure, isolated Docker sandbox.

What is the Shell Sandbox?

Docker Sandboxes provides pre-configured environments for running AI coding agents like Claude Code, Gemini CLI, and others. But what if you want to run a different agent or tool that isn’t built-in?
That’s where the shell sandbox comes in. It’s a minimal sandbox that drops you into an interactive bash shell inside an isolated microVM. No pre-installed agent, no opinions — just a clean Ubuntu environment with Node.js, Python, git, and common dev tools. You install whatever you need.

Why Run NanoClaw in a Sandbox?

NanoClaw already runs its agents in containers, so it’s security-conscious by design. But running the entire NanoClaw process inside a Docker sandbox adds another layer:

Filesystem isolation – NanoClaw can only see the workspace directory you mount, not your home directory
Credential management – API keys are injected via Docker’s proxy, never stored inside the sandbox
Clean environment – No conflicts with your host’s Node.js version or global packages
Disposability – Nuke it and start fresh anytime with docker sandbox rm

Prerequisites

Docker Desktop installed and running
Docker Sandboxes CLI (docker sandbox command available) (v.0.12.0 available in the nightly build as of Feb 13)
An Anthropic API key in an env variable

Setting It Up

Create the sandbox

Pick a directory on your host that will be mounted as the workspace inside the sandbox. This is the only part of your filesystem the sandbox can see:

mkdir -p ~/nanoclaw-workspace
docker sandbox create --name nanoclaw shell ~/nanoclaw-workspace

Connect to it

docker sandbox run nanoclaw

You’re now inside the sandbox – an Ubuntu shell running in an isolated VM. Everything from here on happens inside the sandbox.

Install Claude Code

The shell sandbox comes with Node.js 20 pre-installed, so we can install Claude Code directly via npm:

npm install -g @anthropic-ai/claude-code

Configure the API key

This is the one extra step needed in a shell sandbox. The built-in claude sandbox type does this automatically, but since we’re in a plain shell, we need to tell Claude Code to get its API key from Docker’s credential proxy:

mkdir -p ~/.claude && cat > ~/.claude/settings.json << 'EOF'
{
  "apiKeyHelper": "echo proxy-managed",
  "defaultMode": "bypassPermissions",
  "bypassPermissionsModeAccepted": true
}
EOF

What this does: apiKeyHelper tells Claude Code to run echo proxy-managed to get its API key. The sandbox’s network proxy intercepts outgoing API calls and swaps this sentinel value for your real Anthropic key, so the actual key never exists inside the sandbox.

Clone NanoClaw and install dependencies

cd ~/workspace
git clone https://github.com/qwibitai/nanoclaw
cd nanoclaw
npm install

Run Claude and set up NanoClaw

NanoClaw uses Claude Code for its initial setup – configuring WhatsApp authentication, the database, and the container runtime:

claude

Once Claude starts, run /setup and follow the prompts. Claude will walk you through scanning a WhatsApp QR code and configuring everything else.

Start NanoClaw

After setup completes, start the assistant:

npm start

NanoClaw is now running and listening for WhatsApp messages inside the sandbox.

Managing the Sandbox

# List all sandboxes
docker sandbox ls

# Stop the sandbox (stops NanoClaw too)
docker sandbox stop nanoclaw

# Start it again
docker sandbox start nanoclaw

# Remove it entirely
docker sandbox rm nanoclaw

What Else Could You Run?

The shell sandbox isn’t specific to NanoClaw. Anything that runs on Linux and talks to AI APIs is a good fit:

Custom agents built with the Claude Agent SDK or any other AI agent: Claude code, Codex, Github Copilot, OpenCode, Kiro, and more.
AI-powered bots and automation scripts
Experimental tools you don’t want running on your host

The pattern is always the same: create a sandbox, install what you need, configure credentials via the proxy, and run it.

docker sandbox create --name my-shell shell ~/my-workspace
docker sandbox run my-shell

How to solve the context size issues with context packing with Docker Model Runner and Agentic Compose

Yiwen Xu — Fri, 13 Feb 2026 13:57:36 +0000

If you’ve worked with local language models, you’ve probably run into the context window limit, especially when using smaller models on less powerful machines. While it’s an unavoidable constraint, techniques like context packing make it surprisingly manageable.

Hello, I’m Philippe, and I am a Principal Solutions Architect helping customers with their usage of Docker. In my previous blog post, I wrote about how to make a very small model useful by using RAG. I had limited the message history to 2 to keep the context length short.

But in some cases, you’ll need to keep more messages in your history. For example, a long conversation to generate code:

- generate an http server server in golang
- add a human structure and a list of humans
- add a handler to add a human to the list
- add a handler to list all humans
- add a handler to get a human by id
- etc...

Let’s imagine we have a conversation for which we want to keep 10 messages in the history. Moreover, we’re using a very verbose model (which a lot of tokens), so we’ll quickly encounter this type of error:

error: {
    code: 400,
    message: 'request (8860 tokens) exceeds the available context size (8192 tokens), try increasing it',
    type: 'exceed_context_size_error',
    n_prompt_tokens: 8860,
    n_ctx: 8192
  },
  code: 400,
  param: undefined,
  type: 'exceed_context_size_error'
}

What happened?

Understanding context windows and their limits in local LLMs

Our LLM has a context window, which has a limited size. This means that if the conversation becomes too long… It will bug out.

This window is the total number of tokens the model can process at once, like a short-term working memory. Read this IBM article for a deep dive on context window

In our example in the code snippet above, this size was set to 8192 tokens for LLM engines that power local LLM, like Docker Model Runner, Ollama, Llamacpp, …

This window includes everything: system prompt, user message, history, injected documents, and the generated response. Refer to this Redis post for more info.

Example: if the model has 32k context, the sum (input + history + generated output) must remain ≤ 32k tokens. Learn more here.

It’s possible to change the default context size (up or down) in the compose.yml file:

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m
    # Increased context size for better handling of larger inputs
    context_size: 16384

You can also do this with Docker with the following command: docker model configure –context-size 8192 ai/qwen2.5-coder `

And so we solve the problem, but only part of the problem. Indeed, it’s not guaranteed that your model supports a larger context size (like 16384), and even if it does, it can very quickly degrade the model’s performance.

Thus, with hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m, when the number of tokens in the context approaches 16384 tokens, generation can become (much) slower (at least on my machine). Again, this will depend on the model’s capacity (read its documentation). And remember, the smaller the model, the harder it will be to handle a large context and stay focused.

Tips: always provide an option (a /clear command for example) in your application to empty the message list, or to reduce it. Automatic or manual. Keep the initial system instructions though.

So we’re at an impasse. How can we go further with our small models?

Well, there is still a solution, which is called context packing.

Using context packing to fit more information into limited context windows

We can’t indefinitely increase the context size. To still manage to fit more information in the context, we can use a technique called “context packing”, which consists of having the model itself summarize previous messages (or entrust the task to another model), and replace the history with this summary and thus free up space in the context.

So we decide that from a certain token limit, we’ll have the history of previous messages summarized, and replace this history with the generated summary.

I’ve therefore modified my example to add a context packing step. For the exercise, I decided to use another model to do the summarization.

Modification of the compose.yml file

I added a new model in the compose.yml file: ai/qwen2.5:1.5B-F16

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

  context-packing-model:
    model: ai/qwen2.5:1.5B-F16

Then:

I added the model in the models section of the service that runs our program.
I increased the number of messages in the history to 10 (instead of 2 previously).
I set a token limit at 5120 before triggering context compression.
And finally, I defined instructions for the “context packing” model, asking it to summarize previous messages.

excerpt from the service:

golang-expert-v3:
build:
    context: .
    dockerfile: Dockerfile
environment:

    HISTORY_MESSAGES: 10
    TOKEN_LIMIT: 5120
    # ...
   
configs:
    - source: system.instructions.md
    target: /app/system.instructions.md
    - source: context-packing.instructions.md
    target: /app/context-packing.instructions.md

models:
    chat-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_CHAT

    context-packing-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_CONTEXT_PACKING

    embedding-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_EMBEDDING

You’ll find the complete version of the file here: compose.yml

System instructions for the context packing model

Still in the compose.yml file, I added a new system instruction for the “context packing” model, in a context-packing.instructions.md file:

context-packing.instructions.md:
content: |\
    You are a context packing assistant.
    Your task is to condense and summarize provided content to fit within token limits while preserving essential information.
    Always:
    - Retain key facts, figures, and concepts
    - Remove redundant or less important details
    - Ensure clarity and coherence in the condensed output
    - Aim to reduce the token count significantly without losing critical information

    The goal is to help fit more relevant information into a limited context window for downstream processing.

All that’s left is to implement the context packing logic in the assistant’s code.

Applying context packing to the assistant’s code

First, I define the connection with the context packing model in the Setup part of my assistant:

const contextPackingModel = new ChatOpenAI({
  model: process.env.MODEL_RUNNER_LLM_CONTEXT_PACKING || `ai/qwen2.5:1.5B-F16`,
  apiKey: "",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
  },
  temperature: 0.0,
  top_p: 0.9,
  presencePenalty: 2.2,
});

I also retrieve the system instructions I defined for this model, as well as the token limit:

let contextPackingInstructions = fs.readFileSync('/app/context-packing.instructions.md', 'utf8');

let tokenLimit = parseInt(process.env.TOKEN_LIMIT) || 7168

Once in the conversation loop, I’ll estimate the number of tokens consumed by previous messages, and if this number exceeds the defined limit, I’ll call the context packing model to summarize the history of previous messages and replace this history with the generated summary (the assistant-type message: [“assistant”, summary]). Then I continue generating the response using the main model.

excerpt from the conversation loop:

 let estimatedTokenCount = messages.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);
  console.log(` Estimated token count for messages: ${estimatedTokenCount} tokens`);

  if (estimatedTokenCount >= tokenLimit) {
    console.log(` Warning: Estimated token count (${estimatedTokenCount}) exceeds the model's context limit (${tokenLimit}). Compressing conversation history...`);

    // Calculate original history size
    const originalHistorySize = history.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);

    // Prepare messages for context packing
    const contextPackingMessages = [
      ["system", contextPackingInstructions],
      ...history,
      ["user", "Please summarize the above conversation history to reduce its size while retaining important information."]
    ];

    // Generate summary using context packing model
    console.log(" Generating summary with context packing model...");
    let summary = '';
    const summaryStream = await contextPackingModel.stream(contextPackingMessages);
    for await (const chunk of summaryStream) {
      summary += chunk.content;
      process.stdout.write('\x1b[32m' + chunk.content + '\x1b[0m');
    }
    console.log();

    // Calculate compressed size
    const compressedSize = Math.ceil(summary.length / 4);
    const reductionPercentage = ((originalHistorySize - compressedSize) / originalHistorySize * 100).toFixed(2);

    console.log(` History compressed: ${originalHistorySize} tokens → ${compressedSize} tokens (${reductionPercentage}% reduction)`);

    // Replace all history with the summary
    conversationMemory.set("default-session-id", [["assistant", summary]]);

    estimatedTokenCount = compressedSize

    // Rebuild messages with compressed history
    messages = [
      ["assistant", summary],
      ["system", systemInstructions],
      ["system", knowledgeBase],
      ["user", userMessage]
    ];
  }

You’ll find the complete version of the code here: index.js

All that’s left is to test our assistant and have it hold a long conversation, to see context packing in action.

docker compose up --build -d
docker compose exec golang-expert-v3 node index.js

And after a while in the conversation, you should see the warning message about the token limit, followed by the summary generated by the context packing model, and finally, the reduction in the number of tokens in the history:

Estimated token count for messages: 5984 tokens
Warning: Estimated token count (5984) exceeds the model's context limit (5120). Compressing conversation history...
Generating summary with context packing model...
Sure, here's a summary of the conversation:

1. The user asked for an example in Go of creating an HTTP server.
2. The assistant provided a simple example in Go that creates an HTTP server and handles GET requests to display "Hello, World!".
3. The user requested an equivalent example in Java.
4. The assistant presented a Java implementation that uses the `java.net.http` package to create an HTTP server and handle incoming requests.

The conversation focused on providing examples of creating HTTP servers in both Go and Java, with the goal of reducing the token count while retaining essential information.
History compressed: 4886 tokens → 153 tokens (96.87% reduction)

This way, we ensure that our assistant can handle a long conversation while maintaining good generation performance.

Summary

The context window is an unavoidable constraint when working with local language models, particularly with small models and on machines with limited resources. However, by using techniques like context packing, you can easily work around this limitation. Using Docker Model Runner and Agentic Compose, you can implement this pattern to support long, verbose conversations without overwhelming your model.

All the source code is available on Codeberg: context-packing. Give it a try!