What’s Holding Back AI Agents? It’s Still Security https://www.docker.com/blog/whats-holding-back-ai-agents-its-still-security/ Tue, 10 Mar 2026 12:59:28 +0000 https://www.docker.com/?p=85891 It’s hard to find a team today that isn’t talking about agents. For most organizations, this isn’t a “someday” project anymore. Building agents is a strategic priority for 95% of the 800+ developers and decision makers we surveyed around the globe in our latest State of Agentic AI research. The shift is happening fast: agent adoption has moved beyond experiments and demos into something closer to early operational maturity. 60% of organizations already report having AI agents in production, though a third of those remain in early stages.

Agent adoption today is driven by a pragmatic focus on productivity, efficiency, and operational transformation, not revenue growth or cost reduction. Early adoption is concentrated in internal, productivity-focused use cases, especially across software, infrastructure, and operations. The feedback loops are fast, and the risks are easier to control. 


So what’s holding back agent scaling? Friction shows up at every layer, and nearly all roads lead to the same place: AI agent security. 

AI agent security isn’t one issue; it’s the constraint

When teams talk about what’s holding them back, AI agent security rises to the top. In the same survey, 40% of respondents cite security as their top blocker when building agents. The reason it hits so hard is that it’s not confined to a single layer of the stack. It shows up everywhere, and it compounds as deployments grow.

Start with infrastructure: as organizations expand agent deployments, teams emphasize the need for secure sandboxing and runtime isolation, even for internal agents.

At the operations layer, complexity becomes a security problem. Once you have more tools, more integrations, and more orchestration logic, it gets harder to see what’s happening end-to-end and harder to control it. Our latest research data reflects that sprawl: over a third of respondents report challenges coordinating multiple tools, and a comparable share say integrations introduce security or compliance risk. That’s a classic pattern: operational complexity creates blind spots, and blind spots become exposure.

45% of organizations say the biggest challenge is ensuring tools are secure, trusted, and enterprise-ready.

And at the governance layer, enterprises want something simple: consistency. They want guardrails, policy enforcement, and auditability that work across teams and workflows. But current tooling isn’t meeting that bar yet. In fact, 45% of organizations say the biggest challenge is ensuring tools are secure, trusted, and enterprise-ready. That’s not a minor complaint: it’s the difference between “we can try this” and “we can scale this.”

MCP is popular but not ready for enterprise

Many teams are adopting Model Context Protocol (MCP) because it gives agents a standardized way to connect to tools, data, and external systems, making agents more useful and easier to customize. Among respondents further along in their agent journey, 85% say they’re familiar with MCP and two-thirds say they actively use it across personal and professional projects.

Research data suggests that most teams are operating in what could be described as “leap-of-faith mode” when it comes to MCP, adopting the protocol without security guarantees and operational controls they would demand from mature enterprise infrastructure.

But the security story hasn’t caught up yet. Teams adopt MCP because it works, but they do so without the security guarantees and operational controls they would expect from mature enterprise infrastructure. Among teams earlier in their agentic journey, 46% identify security and compliance as the top challenge with MCP.

Organizations are increasingly watching for threats like prompt injection and tool poisoning, along with the more foundational issues of access control, credentials, and authentication. The immaturity and security challenges of current MCP tooling make for a fragile foundation at this stage of agentic adoption.

Conclusion and recommendations

AI agent security is what sets the speed limit for agentic AI in the enterprise. Organizations aren’t lacking interest; they’re lacking confidence that today’s tooling is enterprise-ready, that access controls can be enforced reliably, and that agents can be kept safely isolated from sensitive systems.

The path forward is clear. Unlocking agents’ full potential will require new platforms built for enterprise scale, with secure-by-default foundations, strong governance, and policy enforcement that’s integrated, not bolted on.

Download the full Agentic AI report for more insights and recommendations on how to scale agents for enterprise. 

Join us on March 25, 2026, for a webinar where we’ll walk through the key findings and the strategies that can help you prioritize what comes next.

Learn more:

Celebrating Women in AI: 3 Questions with Cecilia Liu on Leading Docker’s MCP Strategy https://www.docker.com/blog/women-in-ai-cecilia-liu-docker-mcp-strategy/ Fri, 06 Mar 2026 12:59:30 +0000 https://www.docker.com/?p=85765 To celebrate International Women’s Day, we sat down with Cecilia Liu, Senior Product Manager at Docker, for three questions about the vision and strategy behind Docker’s MCP solutions. From shaping product direction to driving AI innovation, Cecilia plays a key role in defining how Docker enables secure, scalable AI tooling.


Cecilia leads product management for Docker’s MCP Catalog and Toolkit, our solution for running MCP servers securely and at scale through containerization. She drives Docker’s AI strategy across both enterprise and developer ecosystems, helping organizations deploy MCP infrastructure with confidence while empowering individual developers to seamlessly discover, integrate, and use MCP in their workflows. With a technical background in AI frameworks and an MBA from NYU Stern, Cecilia bridges the worlds of AI infrastructure and developer tools, turning complex challenges into practical, developer-first solutions.

What products are you responsible for?

I own Docker’s MCP solution. At its core, it’s about solving the problems that anyone working with MCP runs into: how do you find the right MCP servers, how do you actually use them without a steep learning curve, and how do you deploy and manage them reliably across a team or organization.

How does Docker’s MCP solution benefit developers and enterprise customers?

Dev productivity is where my heart is. I want to build something that meaningfully helps developers at every stage of their development cycle — and that’s exactly how I think about Docker’s MCP solution.

For end-user developers and vibe coders, the goal is simple: you shouldn’t need to understand the underlying infrastructure to get value from MCP. As long as you’re working with AI, we make it easy to discover, configure, and start using MCP servers without any of the usual setup headaches. One thing I kept hearing in user feedback was that people couldn’t even tell if their setup was actually working. That pushed us to ship in-product setup instructions that walk you through not just configuration, but how to verify everything is running correctly. It sounds small, but it made a real difference.

For developers building MCP servers and integrating them into agents, I’m focused on giving them the right creation and testing tools so they can ship faster and with more confidence. That’s a big part of where we’re headed.

And for security and enterprise admins, we’re solving real deployment pain, making it faster and cheaper to roll out and manage MCP across an entire organization. Custom catalogs, role-based access controls, audit logging, policy enforcement. The goal is to give teams the visibility and control they need to adopt AI tooling confidently at scale.

Customers love us for all of the above, and there’s one more thing that ties it together: the security that comes built-in with Docker. That trust doesn’t happen overnight, and it’s something we take seriously across everything we ship.

What are you excited about when it comes to the future of MCP?

What excites me most is honestly the pace of change itself. The AI landscape is shifting constantly, and with every new tool that makes AI more powerful, there’s a whole new set of developers who need a way to actually use it productively. That’s a massive opportunity.

MCP is where that’s happening right now, and the adoption we’re seeing tells me the need is real. But what gets me out of bed is knowing the problems we’re solving: discoverability, usability, deployment. They are all going to matter just as much for whatever comes next. We’re not just building for today’s tools. We’re building the foundation that developers will reach for every time something new emerges.

Cecilia is speaking about scaling MCP for enterprises at the MCP Dev Summit in NYC on April 3, 2026. If you’re attending, be sure to stop by Docker’s booth (D/P9).

Learn more

Docker Model Runner Brings vLLM to macOS with Apple Silicon https://www.docker.com/blog/docker-model-runner-vllm-metal-macos/ Thu, 26 Feb 2026 14:42:57 +0000 https://www.docker.com/?p=85452 vLLM has quickly become the go-to inference engine for developers who need high-throughput LLM serving. We brought vLLM to Docker Model Runner for NVIDIA GPUs on Linux, then extended it to Windows via WSL2. macOS, however, has been the missing piece.

That changes today. Docker Model Runner now supports vllm-metal, a new backend that brings vLLM inference to macOS using Apple Silicon’s Metal GPU. If you have a Mac with an M-series chip, you can now run MLX models through vLLM with the same OpenAI-compatible API, the same Anthropic-compatible API for tools like Claude Code, and the same Docker workflow.

What is vllm-metal?

vllm-metal is a plugin for vLLM that brings high-performance LLM inference to Apple Silicon. Developed in collaboration between Docker and the vLLM project, it unifies MLX, Apple’s machine learning framework, and PyTorch under a single compute pathway, plugging directly into vLLM’s existing engine, scheduler, and OpenAI-compatible API server.

The architecture is layered: vLLM’s core (engine, scheduler, tokenizer, API) stays unchanged on top. A plugin layer consisting of MetalPlatform, MetalWorker, and MetalModelRunner handles the Apple Silicon specifics. Underneath, MLX drives the actual inference while PyTorch handles model loading and weight conversion. The whole stack runs on Metal, Apple’s GPU framework.

+-------------------------------------------------------------+
|                          vLLM Core                          |
|        Engine | Scheduler | API | Tokenizers                |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                   vllm_metal Plugin Layer                   |
|   +-----------+  +-----------+  +------------------------+  |
|   | Platform  |  | Worker    |  | ModelRunner            |  |
|   +-----------+  +-----------+  +------------------------+  |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                   Unified Compute Backend                   |
|   +------------------+    +----------------------------+    |
|   | MLX (Primary)    |    | PyTorch (Interop)          |    |
|   | - SDPA           |    | - HF Loading               |    |
|   | - RMSNorm        |    | - Weight Conversion        |    |
|   | - RoPE           |    | - Tensor Bridge            |    |
|   | - Cache Ops      |    |                            |    |
|   +------------------+    +----------------------------+    |
+-------------------------------------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|                       Metal GPU Layer                       |
|           Apple Silicon Unified Memory Architecture         |
+-------------------------------------------------------------+

Figure 1: High-level architecture diagram of vllm-metal. Credit: vllm-metal

What makes this particularly effective on Apple Silicon is unified memory. Unlike discrete GPUs where data must be copied between CPU and GPU memory, Apple Silicon shares a single memory pool. vllm-metal exploits this with zero-copy tensor operations. Combined with paged attention for efficient KV cache management and Grouped-Query Attention support, this means you can serve longer sequences with less memory waste.

vllm-metal runs MLX models published by the mlx-community on Hugging Face. These models are built specifically for the MLX framework and take full advantage of Metal GPU acceleration. Docker Model Runner automatically routes MLX models to vllm-metal when the backend is installed, falling back to the built-in MLX backend otherwise.

How vllm-metal works

vllm-metal runs natively on the host. This is necessary because Metal GPU access requires direct hardware access and there is no GPU passthrough for Metal in containers.

When you install the backend, Docker Model Runner:

  1. Pulls a Docker image from Hub that contains a self-contained Python 3.12 environment with vllm-metal and all dependencies pre-packaged.
  2. Extracts it to `~/.docker/model-runner/vllm-metal/`.
  3. Verifies the installation by importing the `vllm_metal` module.

When a request comes in for a compatible model, the Docker Model Runner’s scheduler starts a vllm-metal server process that communicates over TCP, serving the standard OpenAI API. The model is loaded from Docker’s shared model store, which contains all the models you pull with `docker model pull`.
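Because the server speaks the standard OpenAI API, any OpenAI-compatible client can talk to it once the model is pulled. Here is a minimal sketch using only the Python standard library; port 12434 and the `/engines/v1` path are Docker Model Runner’s usual defaults, and the model name is only an example — adjust all three to your setup:

```python
import json
import urllib.request

# Docker Model Runner's default OpenAI-compatible endpoint (adjust if needed).
BASE_URL = "http://localhost:12434/engines/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a standard OpenAI chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str) -> str:
    """Send a chat completion request and return the assistant's reply."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running Model Runner and a pulled model):
#   reply = chat("mlx-community/llama-3.2-1b-instruct-4bit", "Hello!")
```

The same snippet works unchanged against the vllm backend on Linux or Windows, since every backend exposes the same API surface.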

Which models work with vllm-metal?

vllm-metal works with safetensors models in MLX format. The mlx-community on Hugging Face maintains a large collection of quantized models optimized for Apple Silicon, such as mlx-community/llama-3.2-1b-instruct-4bit.

vLLM everywhere with Docker Model Runner

With vllm-metal, Docker Model Runner now supports vLLM across the three major platforms:

Platform         Backend       GPU
Linux            vllm          NVIDIA (CUDA)
Windows (WSL2)   vllm          NVIDIA (CUDA)
macOS            vllm-metal    Apple Silicon (Metal)

The same docker model commands work regardless of platform. Pull a model, run it. Docker Model Runner picks the right backend for your platform.

Get started

Update to Docker Desktop 4.62 or later for Mac, and install the backend:

docker model install-runner --backend vllm-metal

Check out the Docker Model Runner documentation to learn more. For contributions, feedback, and bug reports, visit the docker/model-runner repository on GitHub.

Giving Back: vllm-metal is Now Open Source

At Docker, we believe that the best way to accelerate AI development is to build in the open. That is why we are proud to announce that Docker has contributed the vllm-metal project to the vLLM community. Originally developed by Docker engineers to power Model Runner on macOS, this project now lives under the vLLM GitHub organization. This ensures that every developer in the ecosystem can benefit from and contribute to high-performance inference on Apple Silicon. The project has also had significant contributions from Lik Xun Yuan, Ricky Chen, and Ranran Haoran Zhang.

The $599 AI Development Rig

For a long time, high-throughput vLLM development was gated behind significant GPU cost. To get started, you typically needed a dedicated Linux box with an RTX 4090 ($1,700+) or enterprise-grade A100/H100 cards ($10,000+).

vllm-metal changes the math

Now, a base $599 Mac Mini with an M4 chip becomes a viable vLLM development environment. Because Apple Silicon uses Unified Memory, that 16GB (or upgraded 32GB/64GB) of RAM is directly accessible by the GPU. This allows you to:

  • Develop & Test Locally: Build your vLLM-based applications on the same machine you use for coding.
  • Production-Mirroring: Use the exact same OpenAI-compatible API on your Mac Mini as you would on an H100 cluster in production.
  • Energy Efficiency: Run inference at a fraction of the power consumption (and heat) of a discrete GPU rig.

How does vllm-metal compare to llama.cpp?

We benchmarked both backends using Llama 3.2 1B Instruct with comparable 4-bit quantization, served through Docker Model Runner on Apple Silicon.

Backend setup:

                llama.cpp                                  vLLM-Metal
Model           unsloth/Llama-3.2-1B-Instruct-GGUF:Q4_0    mlx-community/llama-3.2-1b-instruct-4bit
Format          GGUF (Q4_0)                                Safetensors (MLX 4-bit)

Throughput (tokens/sec, wall-clock):

max_tokens    llama.cpp    vLLM-Metal    speedup
128           333.3        251.5         1.3x
512           345.1        279.0         1.3x
1024          338.5        275.4         1.2x
2048          339.1        279.5         1.2x

Each configuration was run 3 times across 3 different prompts (9 total requests per data point).

Throughput is measured as completion_tokens / wall_clock_time, applied consistently to both backends.
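As a worked example of that definition (the 1.484 s here is back-computed from the table to illustrate the arithmetic, not a measured value):

```python
def throughput(completion_tokens: int, wall_clock_seconds: float) -> float:
    """Throughput as defined above: completion tokens per second of wall-clock time."""
    return completion_tokens / wall_clock_seconds

# 512 completion tokens generated in ~1.484 s of wall-clock time
# corresponds to ~345 tok/s, matching the llama.cpp row for max_tokens=512.
rate = throughput(512, 1.484)
```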

Key observations:

  • llama.cpp is consistently ~1.2-1.3x faster than vLLM-Metal across all output lengths.
  • llama.cpp throughput is remarkably stable (~333-345 tok/s regardless of max_tokens), while vLLM-Metal shows more variance between individual runs (134-343 tok/s).
  • Both backends scale well. Neither backend shows significant degradation as output length increases.
  • Quantization methods differ (GGUF Q4_0 vs MLX 4-bit), so this benchmarks the full stack, engine + quantization, rather than the engine alone.

The benchmark script used for these results is available as a GitHub Gist.

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. To get involved:

  • Star the repository: Show your support by starring the Docker Model Runner repo.
  • Contribute your ideas: Create an issue or submit a pull request. We’re excited to see what ideas you have!
  • Spread the word: Tell your friends and colleagues who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn More

Open WebUI + Docker Model Runner: Self-Hosted Models, Zero Configuration https://www.docker.com/blog/openwebui-docker-model-runner/ Wed, 25 Feb 2026 14:37:33 +0000 https://www.docker.com/?p=85364 We’re excited to share a seamless new integration between Docker Model Runner (DMR) and Open WebUI, bringing together two open source projects to make working with self-hosted models easier than ever.

With this update, Open WebUI automatically detects and connects to Docker Model Runner running at localhost:12434. If Docker Model Runner is enabled, Open WebUI uses it out of the box, no additional configuration required.

The result: a fully Docker-managed, self-hosted model experience running in minutes.

Note for Docker Desktop users:
If you are running Docker Model Runner via Docker Desktop, make sure TCP access is enabled. Open WebUI connects to Docker Model Runner over HTTP, which requires the TCP port to be exposed:

docker desktop enable model-runner --tcp
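Open WebUI’s auto-detection amounts to probing that endpoint. If you want to check reachability yourself, here is a minimal sketch; the `/engines/v1/models` path is assumed to be the OpenAI-style model listing Docker Model Runner exposes, so adjust it if your version differs:

```python
import urllib.error
import urllib.request

def dmr_available(base_url: str = "http://localhost:12434", timeout: float = 2.0) -> bool:
    """Return True if a Model Runner answers on its OpenAI-style models endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/engines/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: nothing is listening there.
        return False
```

If this returns False, revisit the `docker desktop enable model-runner --tcp` step above.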

Better Together: Docker Model Runner and Open WebUI

Docker Model Runner and Open WebUI come from the same open source mindset. They’re built for developers who want control over where their models run and how their systems are put together, whether that’s on a laptop for quick experimentation or on a dedicated GPU host with more horsepower behind it.

Docker Model Runner focuses on the runtime layer: a Docker-native way to run and manage self-hosted models using the tooling developers already rely on. Open WebUI focuses on the experience: a clean, extensible interface that makes those models accessible and useful.

Now, the two connect automatically.

No manual endpoint configuration. No extra flags.

That’s the kind of integration open source does best: separate projects evolving independently, yet designed well enough to fit together naturally.

Zero-Config Setup

If Docker Model Runner is enabled, getting started with Open WebUI is as simple as:

docker run -p 3000:8080 openwebui/open-webui

That’s it.

Open WebUI will automatically connect to Docker Model Runner and begin using your self-hosted models: no environment variables, no manual endpoint configuration, no extra flags.

Visit http://localhost:3000, create your account, and you’re ready to interact with your models through a modern web interface.

Open by design

One of the nice things about this integration is that it didn’t require special coordination or proprietary hooks. Docker Model Runner and Open WebUI are both open source projects with clear boundaries and well-defined interfaces. They were built independently, and they still fit together cleanly.

Docker Model Runner focuses on running and managing models in a way that feels natural to anyone already using Docker.

Open WebUI focuses on making those models usable. It provides the interface layer, conversation management, and extensibility you’d expect from a modern web UI.

Because both projects are open, there’s no hidden contract between them. You can see how the connection works. You can modify it if you need to. You can deploy the pieces separately or together. The integration isn’t a black box; it’s just software speaking a clear interface.

Works with Your Setup

One of the practical benefits of this approach is flexibility.

Docker Model Runner doesn’t dictate where your models run. They might live on your laptop during development, on a more powerful remote machine, or inside a controlled internal environment. As long as Docker Model Runner is reachable, Open WebUI can connect to it.

That separation between runtime and interface is intentional. The UI doesn’t need to know how the model is provisioned. The runtime doesn’t need to know how the UI is presented. Each layer does its job.

With this integration, that boundary becomes almost invisible. Start the container, open your browser, and everything lines up.

You decide where the models run. Open WebUI simply meets them there.

Summary

Open WebUI and Docker Model Runner make self-hosted AI simple, flexible and fully under your control. Docker powers the runtime. Open WebUI delivers a modern interface on top. 

With automatic detection and zero configuration, you can go from enabling Docker Model Runner to interacting with your models in minutes. 

Both projects are open source and built with clear boundaries, so you can run models wherever you choose and deploy the pieces together or separately. We can’t wait to see what you build next! 

How You Can Get Involved

The strength of Docker Model Runner lies in its community and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

  • Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.
  • Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!
  • Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn more

Gordon (Beta): Docker’s AI Agent Just Got an Update https://www.docker.com/blog/gordon-dockers-ai-agent-just-got-an-update/ Mon, 23 Feb 2026 14:13:00 +0000 https://www.docker.com/?p=85321

AI agents are moving from demos to daily workflows. They write code, run commands, and complete multi-step tasks without constant hand-holding. But general-purpose agents don’t know Docker. They don’t understand your containers, your images, or your specific setup.

Gordon does. Just run docker ai in your terminal or try it in Docker Desktop.

Available today in Docker Desktop 4.61 and still in beta, Gordon is an AI agent purpose-built for Docker. It has access to your shell, the Docker CLI, and your filesystem, plus deep knowledge of Docker best practices. Point it at a problem, approve its actions, and watch it work.

Figure 1: docker ai command launching Gordon in terminal interface

Figure 2: Gordon in Docker Desktop sidebar

Why Docker Needs Its Own Agent

When your container exits with code 137, Claude or ChatGPT will explain what OOM means. Gordon checks your container’s memory limit, inspects the logs, identifies the memory-hungry process, and proposes a fix. One approval, and it’s done.

When you need to containerize a Next.js app, Copilot might suggest a Dockerfile. Gordon examines your project structure, detects your dependencies, generates a production-ready Dockerfile with multi-stage builds, creates docker-compose.yml with the right services, and sets up your environment configs.

The difference is context and execution. Gordon knows what’s running on your machine. It can read your Docker state, access your filesystem, and take action. It’s not guessing – it’s working with your actual environment.

What Gordon Does

Debug and fix – Container won’t start. Service is unhealthy. Something is consuming all the memory. Gordon inspects logs, checks container status, identifies root cause, and proposes fixes. You approve, it executes.

Build and containerize – Take this application and make it run in Docker. Gordon examines your project, generates production-ready Dockerfiles with multi-stage builds, creates docker-compose.yml with the right services, handles environment configs and dependencies.

Execute and manage – Clean up disk space. Stop all containers. Pull and run specific images. Routine Docker operations should be conversational, not a trip to the docs.

Develop and optimize – Add health checks. Implement multi-stage builds. Apply security best practices. Reduce image sizes. Make existing Docker setups production-ready.

Gordon handles all of it.

Figure 3: Split screen showing Gordon debugging a mongodb container

How Gordon Works

Gordon is built on cagent, Docker’s agent framework included with Docker Desktop, and runs locally within Docker Desktop. It has access to:

  • Your shell – Can execute commands after approval
  • Your filesystem – Reads project structure, configs, logs
  • Docker CLI – Full access to Docker operations
  • Docker knowledge base – Documentation, best practices, common patterns

You can configure Gordon’s working directory to point to a specific codebase. This gives Gordon full context on your project structure, dependencies, and existing Docker setup.

The permission model is straightforward: Gordon shows you what it wants to do, you approve or reject, then it executes. Every command. Every file update. Every Docker operation. You’re not watching passively – you’re directing an agent that knows Docker inside and out.

Figure 4: Permissions request

Where to Find Gordon

Docker Desktop: Look for the Gordon icon in the left sidebar

CLI: Run docker ai from your terminal

Get started today

  1. Download Docker Desktop 4.61+
  2. Log in with your Docker account
  3. Click the Gordon icon, select a project directory, and ask “Optimize my Dockerfile”
  4. Explore the full documentation in Docker Docs

Gordon is available now in Docker Desktop 4.61 and later.

State of Agentic AI Report: Key Findings https://www.docker.com/blog/state-of-agentic-ai-key-findings/ Fri, 20 Feb 2026 17:18:29 +0000 https://www.docker.com/?p=85400 Based on Docker’s State of Agentic AI report, a global survey of more than 800 developers, platform engineers, and technology decision-makers, this blog summarizes what’s really happening as agentic AI scales within organizations. Drawing on insights from decision-makers and purchase influencers worldwide, we’ll give you a preview of not only where teams are seeing early wins but also what’s still missing to move from experimentation to enterprise-grade adoption.

Rapid adoption, early maturity

60% of organizations already have AI agents in production, and 94% view building agents as a strategic priority, but most deployments remain internal and focused on productivity and operational efficiency.

Security and complexity are the top barriers

40% of respondents cite security as the #1 challenge in scaling agentic AI, with 45% struggling to ensure tools are secure and enterprise-ready. Technical complexity compounds the challenge. One in three organizations (33%) report orchestration difficulties as multi-model and multi-cloud environments proliferate (79% of organizations run agents across two or more environments).

MCP shows promise but isn’t enterprise-ready

85% of teams are familiar with the Model Context Protocol (MCP), yet most report significant security, configuration, and manageability issues that prevent production-scale deployment.

Want the full picture? Download the latest State of Agentic AI report to explore deeper insights and practical recommendations for scaling agentic AI in your organization.

Fear of vendor lock-in is real

Enterprises worry about dependencies in core agent and agentic infrastructure layers such as model hosting, LLM providers, and even cloud platforms. Seventy-six percent of global respondents report active concerns about vendor lock-in, rising to 88% in France, 83% in Japan, and 82% in the UK.

Containerization remains foundational

94% use containers for agent development or production, and 98% follow the same cloud-native workflows as traditional software, establishing containers as the proven substrate for agentic AI infrastructure.

Long-term outlook

Rather than a “year of the agents,” the data points to a decade-long transformation. Organizations are laying the governance and trust foundations now for scalable, enterprise-grade agent ecosystems.


The path forward

The path forward doesn’t require reinvention so much as consolidation around a trust layer: access to trusted content and components that can be safely discovered and reused; secure-by-default runtimes; standardized orchestration and policy; and portable, auditable packaging.

Agentic AI’s near-term value is already real in internal workflows; unlocking the next wave depends on standardizing how we secure, orchestrate, and ship agents. Teams that invest now in this trust layer, on top of the container foundations they already know, will be first to scale agents from local productivity to durable, enterprise-wide outcomes.

Download the full Agentic AI report for more insights and recommendations on how to scale agents for enterprise.  

Learn more:

How to solve the context size issues with context packing with Docker Model Runner and Agentic Compose https://www.docker.com/blog/context-packing-context-window/ Fri, 13 Feb 2026 13:57:36 +0000 https://www.docker.com/?p=85187 If you’ve worked with local language models, you’ve probably run into the context window limit, especially when using smaller models on less powerful machines. While it’s an unavoidable constraint, techniques like context packing make it surprisingly manageable.

Hello, I’m Philippe, a Principal Solutions Architect helping customers with their use of Docker. In my previous blog post, I wrote about how to make a very small model useful by using RAG. I had limited the message history to 2 messages to keep the context length short.

But in some cases, you’ll need to keep more messages in your history. For example, a long conversation to generate code:

- generate an http server in golang
- add a human structure and a list of humans
- add a handler to add a human to the list
- add a handler to list all humans
- add a handler to get a human by id
- etc...

Let’s imagine we have a conversation for which we want to keep 10 messages in the history. Moreover, we’re using a very verbose model (which uses a lot of tokens), so we’ll quickly encounter this type of error:

error: {
    code: 400,
    message: 'request (8860 tokens) exceeds the available context size (8192 tokens), try increasing it',
    type: 'exceed_context_size_error',
    n_prompt_tokens: 8860,
    n_ctx: 8192
  },
  code: 400,
  param: undefined,
  type: 'exceed_context_size_error'
}


What happened?

Understanding context windows and their limits in local LLMs

Our LLM has a context window of limited size. If the conversation grows too long, requests start failing with errors like the one above.

This window is the total number of tokens the model can process at once, like a short-term working memory. Read this IBM article for a deep dive on context windows.

In the code snippet above, the context size was set to 8192 tokens, a common default in the engines that power local LLMs, like Docker Model Runner, Ollama, and llama.cpp.

This window includes everything: system prompt, user message, history, injected documents, and the generated response. Refer to this Redis post for more info. 

Example: if the model has 32k context, the sum (input + history + generated output) must remain ≤ 32k tokens. Learn more here.  
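This budget can be checked up front with the same rough 4-characters-per-token heuristic used later in this post. A minimal sketch (the ratio is an approximation, not a real tokenizer, and the function name is illustrative):

```javascript
// Rough token estimate: ~4 characters per token (approximation, not a real tokenizer)
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Check that input + history + expected output stays within the context window
function fitsInContext(messages, maxOutputTokens, contextSize) {
  const inputTokens = messages.reduce(
    (acc, [, content]) => acc + estimateTokens(content),
    0
  );
  return inputTokens + maxOutputTokens <= contextSize;
}

const messages = [
  ["system", "You are a Golang expert."],
  ["user", "Generate an http server in golang."],
];
console.log(fitsInContext(messages, 2048, 8192)); // → true
```

If this check fails before the request is even sent, you know the history needs trimming or summarizing.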

It’s possible to change the default context size (up or down) in the compose.yml file:

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m
    # Increased context size for better handling of larger inputs
    context_size: 16384

You can also do this with Docker with the following command: docker model configure --context-size 8192 ai/qwen2.5-coder

And so we solve the problem, but only part of it. It’s not guaranteed that your model supports a larger context size (like 16384), and even if it does, a larger context can quickly degrade the model’s performance.

Thus, with hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m, when the number of tokens in the context approaches 16384 tokens, generation can become (much) slower (at least on my machine). Again, this will depend on the model’s capacity (read its documentation). And remember, the smaller the model, the harder it will be to handle a large context and stay focused.

Tip: always provide an option (a /clear command, for example) in your application to empty or reduce the message list, whether automatic or manual. Keep the initial system instructions, though.
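Such a command can be as simple as resetting the stored history while keeping the system message. A minimal sketch (the conversationMemory map and session id mirror the names used later in this post, but the handler itself is illustrative):

```javascript
// Illustrative /clear handler: reset the history but keep the system instructions.
// conversationMemory and the session id are example names, not a fixed API.
const conversationMemory = new Map();

function handleCommand(input, sessionId, systemInstructions) {
  if (input.trim() === "/clear") {
    // Replace the whole history with just the initial system message
    conversationMemory.set(sessionId, [["system", systemInstructions]]);
    return true; // command handled, skip the model call
  }
  return false; // not a command, process as a normal user message
}

handleCommand("/clear", "default-session-id", "You are a Golang expert.");
console.log(conversationMemory.get("default-session-id").length); // → 1
```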

So we’re at an impasse. How can we go further with our small models?

Well, there is still a solution, which is called context packing.

Using context packing to fit more information into limited context windows

We can’t indefinitely increase the context size. To fit more information into the context anyway, we can use a technique called “context packing”: the model itself (or another model entrusted with the task) summarizes the previous messages, and the history is replaced with this summary, freeing up space in the context.

So we decide that from a certain token limit, we’ll have the history of previous messages summarized, and replace this history with the generated summary.

I’ve therefore modified my example to add a context packing step. For the exercise, I decided to use another model to do the summarization.

Modification of the compose.yml file

I added a new model in the compose.yml file: ai/qwen2.5:1.5B-F16

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

  context-packing-model:
    model: ai/qwen2.5:1.5B-F16

Then:

  • I added the model to the models section of the service that runs our program.
  • I increased the number of messages kept in the history to 10 (instead of 2 previously).
  • I set a token limit of 5120 before triggering context compression.
  • And finally, I defined instructions for the “context packing” model, asking it to summarize previous messages.

excerpt from the service:

golang-expert-v3:
  build:
    context: .
    dockerfile: Dockerfile
  environment:
    HISTORY_MESSAGES: 10
    TOKEN_LIMIT: 5120
    # ...

  configs:
    - source: system.instructions.md
      target: /app/system.instructions.md
    - source: context-packing.instructions.md
      target: /app/context-packing.instructions.md

  models:
    chat-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_CHAT

    context-packing-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_CONTEXT_PACKING

    embedding-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_EMBEDDING

You’ll find the complete version of the file here: compose.yml

System instructions for the context packing model

Still in the compose.yml file, I added a new system instruction for the “context packing” model, in a context-packing.instructions.md file:

context-packing.instructions.md:
  content: |
    You are a context packing assistant.
    Your task is to condense and summarize provided content to fit within token limits while preserving essential information.
    Always:
    - Retain key facts, figures, and concepts
    - Remove redundant or less important details
    - Ensure clarity and coherence in the condensed output
    - Aim to reduce the token count significantly without losing critical information

    The goal is to help fit more relevant information into a limited context window for downstream processing.

All that’s left is to implement the context packing logic in the assistant’s code.

Applying context packing to the assistant’s code

First, I define the connection with the context packing model in the Setup part of my assistant:

const contextPackingModel = new ChatOpenAI({
  model: process.env.MODEL_RUNNER_LLM_CONTEXT_PACKING || `ai/qwen2.5:1.5B-F16`,
  apiKey: "",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
  },
  temperature: 0.0,
  top_p: 0.9,
  presencePenalty: 2.2,
});

I also retrieve the system instructions I defined for this model, as well as the token limit:

let contextPackingInstructions = fs.readFileSync('/app/context-packing.instructions.md', 'utf8');

let tokenLimit = parseInt(process.env.TOKEN_LIMIT) || 7168

Once in the conversation loop, I’ll estimate the number of tokens consumed by previous messages, and if this number exceeds the defined limit, I’ll call the context packing model to summarize the history of previous messages and replace this history with the generated summary (the assistant-type message: [“assistant”, summary]). Then I continue generating the response using the main model.

excerpt from the conversation loop:

 let estimatedTokenCount = messages.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);
  console.log(` Estimated token count for messages: ${estimatedTokenCount} tokens`);

  if (estimatedTokenCount >= tokenLimit) {
    console.log(` Warning: Estimated token count (${estimatedTokenCount}) exceeds the model's context limit (${tokenLimit}). Compressing conversation history...`);

    // Calculate original history size
    const originalHistorySize = history.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);

    // Prepare messages for context packing
    const contextPackingMessages = [
      ["system", contextPackingInstructions],
      ...history,
      ["user", "Please summarize the above conversation history to reduce its size while retaining important information."]
    ];

    // Generate summary using context packing model
    console.log(" Generating summary with context packing model...");
    let summary = '';
    const summaryStream = await contextPackingModel.stream(contextPackingMessages);
    for await (const chunk of summaryStream) {
      summary += chunk.content;
      process.stdout.write('\x1b[32m' + chunk.content + '\x1b[0m');
    }
    console.log();

    // Calculate compressed size
    const compressedSize = Math.ceil(summary.length / 4);
    const reductionPercentage = ((originalHistorySize - compressedSize) / originalHistorySize * 100).toFixed(2);

    console.log(` History compressed: ${originalHistorySize} tokens → ${compressedSize} tokens (${reductionPercentage}% reduction)`);

    // Replace all history with the summary
    conversationMemory.set("default-session-id", [["assistant", summary]]);

    estimatedTokenCount = compressedSize

    // Rebuild messages with compressed history
    messages = [
      ["assistant", summary],
      ["system", systemInstructions],
      ["system", knowledgeBase],
      ["user", userMessage]
    ];
  }

You’ll find the complete version of the code here: index.js

All that’s left is to test our assistant and have it hold a long conversation, to see context packing in action.

docker compose up --build -d
docker compose exec golang-expert-v3 node index.js

And after a while in the conversation, you should see the warning message about the token limit, followed by the summary generated by the context packing model, and finally, the reduction in the number of tokens in the history:

Estimated token count for messages: 5984 tokens
Warning: Estimated token count (5984) exceeds the model's context limit (5120). Compressing conversation history...
Generating summary with context packing model...
Sure, here's a summary of the conversation:

1. The user asked for an example in Go of creating an HTTP server.
2. The assistant provided a simple example in Go that creates an HTTP server and handles GET requests to display "Hello, World!".
3. The user requested an equivalent example in Java.
4. The assistant presented a Java implementation that uses the `java.net.http` package to create an HTTP server and handle incoming requests.

The conversation focused on providing examples of creating HTTP servers in both Go and Java, with the goal of reducing the token count while retaining essential information.
History compressed: 4886 tokens → 153 tokens (96.87% reduction)

This way, we ensure that our assistant can handle a long conversation while maintaining good generation performance.

Summary

The context window is an unavoidable constraint when working with local language models, particularly with small models and on machines with limited resources. However, by using techniques like context packing, you can easily work around this limitation. Using Docker Model Runner and Agentic Compose, you can implement this pattern to support long, verbose conversations without overwhelming your model.

All the source code is available on Codeberg: context-packing. Give it a try! 

]]>
Get Started with the Atlassian Rovo MCP Server Using Docker https://www.docker.com/blog/atlassian-remote-mcp-server-getting-started-with-docker/ Wed, 04 Feb 2026 13:52:53 +0000 https://www.docker.com/?p=85051 We’re excited to announce that the remote Atlassian Rovo MCP server is now available in Docker’s MCP Catalog and Toolkit, making it easier than ever to connect AI assistants to Jira and Confluence. With just a few clicks, technical teams can use their favorite AI agents to create and update Jira issues, epics, and Confluence pages without complex setup or manual integrations.

In this post, we’ll show you how to get started with the Atlassian remote MCP server in minutes and how to use it to automate everyday workflows for product and engineering teams.


Figure 1: Discover 300+ MCP servers, including the remote Atlassian MCP server, in the Docker MCP Catalog.

What is the Atlassian Rovo MCP Server?

Like many teams, we rely heavily on Atlassian tools, especially Jira to plan, track, and ship product and engineering work. The Atlassian Rovo MCP server enables AI assistants and agents to interact directly with Jira and Confluence, closing the gap between where work happens and how teams want to use AI.

With the Atlassian Rovo MCP server, you can:

  • Create and update Jira issues and epics
  • Generate and edit Confluence pages
  • Use your preferred AI assistant or agent to automate everyday workflows

Traditionally, setting up and configuring MCP servers can be time-consuming and complex. Docker removes that friction, making it easy to get up and running securely in minutes.

Enable the Atlassian Rovo MCP Server with One Click

Docker’s MCP Catalog is a curated collection of 300+ MCP servers, including both local and remote options. It provides a reliable starting point for developers building with MCP so you don’t have to wire everything together yourself.

Getting Started

To get started with the Atlassian remote MCP server:

  1. Open Docker Desktop and click on the MCP Toolkit tab.
  2. Navigate to the Docker MCP Catalog.
  3. Search for the Atlassian Rovo MCP server.
  4. Select the remote version with the cloud icon.
  5. Enable it with a single click.

That’s it. No manual installs. No dependency wrangling.

Why use the Atlassian Rovo MCP server with Docker

Demo by Cecilia Liu: Set up the Atlassian Rovo MCP server with Docker with just a few clicks and use it to generate Jira epics with Claude Desktop

Seamless Authentication with Built-in OAuth

The Atlassian Rovo MCP server uses Docker’s built-in OAuth, so authorization is seamless. Docker securely manages your credentials and allows you to reuse them across multiple MCP clients. You authenticate once, and you’re good to go.

Behind the scenes, this frictionless experience is powered by the MCP Toolkit, which handles environment setup and dependency management for you.

Works with Your Favorite AI Agent

Once the Atlassian Rovo MCP server is enabled, you can connect it to any MCP-compatible client.

For popular clients like Claude Desktop, Claude Code, Codex, or Gemini CLI, connecting takes one click: click Connect, restart Claude Desktop, and you’re ready to go.

From there, we can ask Claude to:

  • Write a short PRD about MCP
  • Turn that PRD into Jira epics and stories
  • Review the generated epics and confirm they’re correct

And just like that, Jira is updated.

One Setup, Any MCP Client

Sometimes AI assistants have hiccups. Maybe you hit a daily usage limit in one tool. That’s not a blocker here.

Because the Atlassian Rovo MCP server is connected through the Docker MCP Toolkit, the setup is completely client-agnostic. Switching to another assistant like Gemini CLI or Cursor is as simple as clicking Connect. No need for reconfiguration or additional setup!

Now we can ask any connected AI assistant such as Gemini CLI to, for example, check all new unassigned Jira tickets. It just works.

Coming Soon: Share Atlassian-Based Workflows Across Teams

We’re working on new enhancements that will make Atlassian-powered workflows even more powerful and easy to share. Soon, you’ll be able to package complete workflows that combine MCP servers, clients, and configurations. Imagine a workflow that turns customer feedback into Jira tickets using Atlassian and Confluence, then shares that entire setup instantly with your team or across projects. That’s where we’re headed.

Frequently Asked Questions (FAQ)

What is the Atlassian Rovo MCP server?

The Atlassian Rovo MCP server enables AI assistants and agents to securely interact with Jira and Confluence. It allows AI tools to create and update Jira issues and epics, generate and edit Confluence pages, and automate everyday workflows for product and engineering teams.

How do I use the Atlassian Rovo MCP server with Docker? 

You can enable the Atlassian Rovo MCP server directly from Docker Desktop or CLI. Simply open the MCP Toolkit tab, search for the Atlassian MCP server, select the remote version, and enable it with one click. Connect to any MCP-compatible client. For popular tools like Claude Code, Codex, and Gemini, setup is even easier with one-click integration. 

Why use Docker to run the Atlassian Rovo MCP server?

Using Docker to run the Atlassian Rovo MCP server removes the complexity of setup, authentication, and client integration. Docker provides one-click enablement through the MCP Catalog, built-in OAuth for secure credential management, and a client-agnostic MCP Toolkit that lets teams connect any AI assistant or agent without reconfiguration so you can focus on automating Jira and Confluence workflows instead of managing infrastructure.

Less Setup. Less Context Switching. More Work Shipped.

That’s how easy it is to set up and use the Atlassian Rovo MCP server with Docker. By combining the MCP Catalog and Toolkit, Docker removes the friction from connecting AI agents to the tools teams already rely on.

Learn more

]]>
The 3Cs: A Framework for AI Agent Security https://www.docker.com/blog/the-3cs-a-framework-for-ai-agent-security/ Wed, 04 Feb 2026 02:02:19 +0000 https://www.docker.com/?p=85069 Every time execution models change, security frameworks need to change with them. Agents force the next shift.

The Unattended Laptop Problem

No developer would leave their laptop unattended and unlocked. The risk is obvious. A developer laptop has root-level access to production systems, repositories, databases, credentials, and APIs. If someone sat down and started using it, they could review pull requests, modify files, commit code, and access anything the developer can access.

Yet this is how many teams are deploying agents today. Autonomous systems are given credentials, tools, and live access to sensitive environments with minimal structure. Work executes in parallel and continuously, at a pace no human could follow. Code is generated faster than developers can realistically review it, and they cannot monitor everything operating on their behalf.

Once execution is parallel and continuous, the potential for mistakes or cascading failures scales quickly. Teams will continue to adopt agents because the gains are real. What remains unresolved is how to make this model safe enough to operate without requiring manual approval for every action. Manual approval slows execution back down to human speed and eliminates the value of agents entirely. And consent fatigue is real.

Why AI Agents Break Existing Governance

Traditional security controls were designed around a human operator. A person sits at the keyboard, initiates actions deliberately, and operates within organizational and social constraints. Reviews worked because there was time between intent and execution. Perimeter security protected the network boundary, while automated systems operated within narrow execution limits.

But traditional security assumes something deeper: that a human is operating the machine.  Firewalls trust the laptop because an employee is using it. VPNs trust the connection because an engineer authenticated. Secrets managers grant access because a person requested it. The model depends on someone who can be held accountable and who operates at human speed.

Agents break this assumption. They act directly, reading repositories, calling APIs, modifying files, using credentials. They have root-level privileges and execute actions at machine speed.  

Legacy controls were never intended for this. The default response has been more visibility and approvals, adding alerts, prompts, and confirmations for every action. This does not scale and generates “consent fatigue”, annoying developers and undermining the very security it seeks to enforce. When agents execute hundreds of actions in parallel, humans cannot review them meaningfully. Warnings become noise.

AI Governance and the Execution Layer: The Three Cs Framework

Each major shift in computing has moved security closer to execution. Agents follow the same trajectory. If agents execute, security must operate at the agentic execution layer.

That shift maps governance to three structural requirements: the 3Cs.

Contain: Bound the Blast Radius

Every execution model relies on isolation. Processes required memory protection. Virtual machines required hypervisors. Containers required namespaces. Agents require an equivalent boundary. Containment limits failure so mistakes made by an agent don’t have permanent consequences for your data, workflows, and business. Unlocking full agent autonomy requires the confidence that experimentation won’t be reckless. Without it, autonomous execution fails.

Curate: Define the Agent’s Environment

What an agent can do is determined by what exists in its environment. The tools it can invoke, the code it can see, the credentials it can use, the context it operates within. All of this shapes execution before the agent acts.

Curation isn’t approval. It is construction. You are not reviewing what the agent wants to do. You are defining the world it operates in. Agents do not reason about your entire system. They act within the environment they are given. If that environment is deliberate, execution becomes predictable. If it is not, you have autonomy without structure, which is just risk.

Control: Enforce Boundaries in Real Time

Governance that exists only on paper has no effect on autonomous systems. Rules must apply as actions occur. File access, network calls, tool invocation, and credential use require runtime enforcement. This is where alert-based security breaks down. Logging and warnings explain what happened or ask permission after execution is already underway. 

Control determines what can happen, when, where, and who has the privilege to make it happen. Properly executed control does not remove autonomy. It defines its limits and removes the need for humans to approve every action under pressure. If this sounds like a policy engine, you aren’t wrong. But this must be dynamic and adaptable, able to keep pace with an agentic workforce.
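As a hedged sketch of the idea, runtime control amounts to a policy check evaluated before each agent action executes, not an alert fired afterward. The policy shape and action fields below are illustrative assumptions, not a real Docker API:

```javascript
// Illustrative runtime policy check: evaluate each agent action before it runs.
// The policy structure and action field names are assumptions for this sketch.
const policy = {
  allowedTools: ["read_file", "run_tests"],          // tools the agent may invoke
  deniedPaths: [/^\/etc\//, /\.env$/],               // files it must never touch
  networkAllowList: ["api.internal.example.com"],    // hosts it may reach
};

function isAllowed(action) {
  if (!policy.allowedTools.includes(action.tool)) return false;
  if (action.path && policy.deniedPaths.some((re) => re.test(action.path))) return false;
  if (action.host && !policy.networkAllowList.includes(action.host)) return false;
  return true;
}

console.log(isAllowed({ tool: "read_file", path: "/app/main.go" })); // → true
console.log(isAllowed({ tool: "read_file", path: "/app/.env" }));    // → false
```

The point is where the check runs: the gate sits in the execution path, so a denied action simply never happens, and no human has to triage an alert about it.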

Putting the 3Cs Into Practice

The three Cs reinforce one another. Containment limits the cost of failure. Curation narrows what agents can attempt and makes them more useful to developers by applying semantic knowledge to craft tools and context to suit the specific environment and task. Control at the runtime layer replaces reactive approval with structural enforcement.

In practice, this work falls to platform teams. It means standardized execution environments with isolation by default, curated tool and credential surfaces aligned to specific use cases, and policy enforcement that operates before actions complete rather than notifying humans afterward. Teams that build with these principles can use agents effectively without burning out developers or drowning them in alerts. Teams that do not will discover that human attention is not a scalable control plane.

]]>
Docker Sandboxes: Run Claude Code and Other Coding Agents Unsupervised (but Safely) https://www.docker.com/blog/docker-sandboxes-run-claude-code-and-other-coding-agents-unsupervised-but-safely/ Fri, 30 Jan 2026 23:39:54 +0000 https://www.docker.com/?p=84895 We introduced Docker Sandboxes in experimental preview a few months ago. Today, we’re launching the next evolution with microVM isolation, available now for macOS and Windows. 

We started Docker Sandboxes to answer the question:

How do I run Claude Code or Gemini CLI safely?

Sandboxes provide disposable, isolated environments purpose-built for coding agents. Each agent runs in an isolated version of your development environment, so when it installs packages, modifies configurations, deletes files, or runs Docker containers, your host machine remains untouched.

This isolation lets you run agents like Claude Code, Codex CLI, Copilot CLI, Gemini CLI, and Kiro with autonomy. Since they can’t harm your computer, let them run free.

Since our first preview, Docker Sandboxes have evolved. They’re now more secure, easier to use, and more powerful.

Level 4 Coding Agent Autonomy

Claude Code and other coding agents fundamentally change how developers write and maintain code. But a practical question remains: how do you let an agent run unattended (without constant permission prompts), while still protecting your machine and data? 

Most developers quickly run into the same set of problems trying to solve this:

  • OS-level sandboxing interrupts workflows and isn’t consistent across platforms
  • Containers seem like the obvious answer, until the agent needs to run Docker itself
  • Full VMs work, but are slow, manual, and hard to reuse across projects

We started building Docker Sandboxes specifically to fill this gap.

Docker Sandboxes: MicroVM-Based Isolation for Coding Agents

Defense-in-depth, isolation by default

  • Each agent runs inside a dedicated microVM
  • Only your project workspace is mounted into the sandbox
  • Hypervisor-based isolation significantly reduces host risk

A real development environment

  • Agents can install system packages, run services, and modify files
  • Workflows run unattended, without constant permission approvals

Safe Docker access for coding agents

  • Coding agents can build and run Docker containers inside the microVM
  • They have no access to the host Docker daemon

One sandbox, many coding agents

  • Use the same sandbox experience with Claude Code, Copilot CLI, Codex CLI, Gemini CLI, and Kiro
  • More to come (and we’re taking requests!)

Fast reset, no cleanup

  • If an agent goes off the rails, delete the sandbox and spin up a fresh one in seconds

What’s New Since the Preview and What’s Next

The experimental preview validated the core idea: coding agents need an execution environment with clear isolation boundaries, not a stream of permission prompts. The early focus was developer experience, making it easy to spin up an environment that felt natural and productive for real workflows.

As Matt Pocock put it, “Docker Sandboxes have the best DX of any local AI coding sandbox I’ve tried.”

With this release, we’re making Sandboxes more powerful and secure with no compromise on developer experience.

What’s New

  • MicroVM-based isolation
    Sandboxes now run on dedicated microVMs, adding a hard security boundary.
  • Network isolation with allow and deny lists
    Control over coding agent network access.
  • Secure Docker execution for agents
    Docker Sandboxes are the only sandboxing solution we’re aware of that allows coding agents to build and run Docker containers while remaining isolated from the host system.

What’s Next

We’re continuing to expand Docker Sandboxes based on developer feedback:

  • Linux support
  • MCP Gateway support
  • Ability to expose ports to the host device and access host-exposed services
  • Support for additional coding agents

Docker Sandboxes were made for developers who want to run coding agents unattended, experiment freely, and recover instantly when something goes wrong. They extend the familiar isolation principles of containers, but with harder boundaries.

If you’ve been holding back on using agents because of permission prompts, system risk, or Docker-in-Docker limitations, Docker Sandboxes are built to remove those constraints.

We’re iterating quickly, and feedback from real-world usage will directly shape what comes next.

]]>