Community – Docker

Building AI Teams: How Docker Sandboxes and Docker Agent Transform Development

Jennifer Kohl — Wed, 11 Mar 2026 13:00:00 +0000

It’s 11 PM. You’ve got a JIRA ticket open, an IDE with three unsaved files, a browser tab on Stack Overflow, and another on documentation. You’re context-switching between designing UI, writing backend APIs, fixing bugs, and running tests. You’re wearing all the hats, product manager, designer, engineer, QA specialist, and it’s exhausting.

What if instead of doing it all yourself, you could describe the goal and have a team of specialized AI agents handle it for you?

One agent breaks down requirements, another designs the interface, a third builds the backend, a fourth tests it, and a fifth fixes any issues. Each agent focuses on what it does best, working together autonomously while you sip your coffee.That’s not sci-fi, it’s what Agent + Docker Sandboxes delivers today.

What is Docker Agent?

Docker Agent is an open source tool for building teams of specialized AI agents. Instead of prompting one general-purpose model to do everything, you define agents with specific roles that collaborate to solve complex problems.

Here’s a typical dev-team configuration:

agents:
root:
model: openai/gpt-5
description: Product Manager - Leads the development team and coordinates iterations
instruction: |
Break user requirements into small iterations. Coordinate designer → frontend → QA.
- Define feature and acceptance criteria
- Ensure iterations deliver complete, testable features
- Prioritize based on value and dependencies
sub_agents: [designer, awesome_engineer, qa, fixer_engineer]
toolsets:
- type: filesystem
- type: think
- type: todo
- type: memory
path: dev_memory.db

designer:
model: openai/gpt-5
description: UI/UX Designer - Creates user interface designs and wireframes
instruction: |
Create wireframes and mockups for features. Ensure responsive, accessible designs.
- Use consistent patterns and modern principles
- Specify colors, fonts, interactions, and mobile layout
toolsets:
- type: filesystem
- type: think
- type: memory
path: dev_memory.db

qa:
model: openai
description: QA Specialist - Analyzes errors, stack traces, and code to identify bugs
instruction: |
Analyze error logs, stack traces, and code to find bugs. Explain what's wrong and why it's happening.
- Review test results, error messages, and stack traces
.......

awesome_engineer:
model: openai
description: Awesome Engineer - Implements user interfaces based on designs
instruction: |
Implement responsive, accessible UI from designs. Build backend APIs and integrate.
..........

fixer_engineer:
model: openai
description: Test Integration Engineer - Fixes test failures and integration issues
instruction: |
Fix test failures and integration issues reported by QA.
- Review bug reports from QA

The root agent acts as product manager, coordinating the team. When a user requests a feature, root delegates to designer for wireframes, then awesome_engineer for implementation, qa for testing, and fixer_engineer for bug fixes. Each agent uses its own model, has its own context, and accesses tools like filesystem, shell, memory, and MCP servers.

Agent Configuration

Each agent is defined with five key attributes:

model: The AI model to use (e.g., openai/gpt-5, anthropic/claude-sonnet-4-5). Different agents can use different models optimized for their tasks.
description: A concise summary of the agent’s role. This helps Docker Agent understand when to delegate tasks to this agent.
instruction: Detailed guidance on what the agent should do. Includes workflows, constraints, and domain-specific knowledge.
sub_agents: A list of agents this agent can delegate work to. This creates the team hierarchy.
toolsets: The tools available to the agent. Built-in options include filesystem (read/write files), shell (run commands), think (reasoning), todo (task tracking), memory (persistent storage), and mcp (external tool connections).

This configuration system gives you fine-grained control over each agent’s capabilities and how they coordinate with each other.

Why Agent Teams Matter

One agent handling complex work means constant context-switching. Split the work across focused agents instead, each handles what it’s best at. Docker Agent manages the coordination.

The benefits are clear:

Specialization: Each agent is optimized for its role (design vs. coding vs. debugging)
Parallel execution: Multiple agents can work on different aspects simultaneously
Better outcomes: Focused agents produce higher quality work in their domain
Maintainability: Clear separation of concerns makes teams easier to debug and iterate

The Problem: Running AI Agents Safely

Agent teams are powerful, but they come with a serious security concern. These agents need to:

Read and write files on your system
Execute shell commands (npm install, git commit, etc.)
Access external APIs and tools
Run potentially untrusted code

Giving AI agents full access to your development machine is risky. A misconfigured agent could delete files, leak secrets, or run malicious commands. You need isolation, agents should be powerful but contained.

Traditional virtual machines are too heavy. Chroot jails are fragile. You need something that provides:

Strong isolation from your host machine
Workspace access so agents can read your project files
Familiar experience with the same paths and tools
Easy setup without complex networking or configuration

Docker Sandboxes: The Secure Foundation

Docker Sandboxes solves this by providing isolated environments for running AI agents. As of Docker Desktop 4.60+, sandboxes run inside dedicated microVMs, providing a hard security boundary beyond traditional container isolation. When you run docker sandbox run , Docker creates an isolated microVM workspace that:

Mounts your project directory at the same absolute path (on Linux and macOS)
Preserves your Git configuration for proper commit attribution
Does not inherit environment variables from your current shell session
Gives agents full autonomy without compromising your host
Provides network isolation with configurable allow/deny lists

Docker Sandboxes now natively supports six agent types: Claude Code, Gemini, Codex, Copilot, Agent, and Kiro (all experimental). Agent can be launched directly as a sandbox agent:

# Run Agent natively in a sandbox
docker sandbox create agent ~/path/to/workspace
docker sandbox run agent ~/path/to/workspace

Or, for more control, use a detached sandbox:

# Create a sandbox
docker sandbox run -d --name my-agent-sandbox claude

# Copy agent into the sandbox
docker cp /usr/bin/agent :/usr/bin/agent

# Run your agent team
docker exec -it  bash -c "cd /path/to/workspace && agent run dev-team.yaml"

Your workspace /Users/alice/projects/myapp on the host is also /Users/alice/projects/myapp inside the microVM. Error messages, scripts with hard-coded paths, and relative imports all work as expected. But the agent is contained in its own microVM, it can’t access files outside the mounted workspace, and any damage it causes is limited to the sandbox.

Why Docker Sandboxes Matter

The combination of agents and Docker Sandboxes gives you something powerful:

Full agent autonomy: Agents can install packages, run tests, make commits, and use tools without constant human oversight
Complete safety: Even if an agent makes a mistake, it’s contained within the microVM sandbox
Hard security boundary: MicroVM isolation goes beyond containers, each sandbox runs in its own virtual machine
Network control: Allow/deny lists let you restrict which external services agents can access
Familiar experience: Same paths, same tools, same workflow as working directly on your machine
Workspace persistence: Changes sync between host and microVM, so your work is always available

Here’s how the workflow looks in practice:

User requests a feature to the root agent: “Create a bank app with Gradio”
Root creates a todo list and delegates to the designer
Designer generates wireframes and UI specifications
Awesome_engineer implements the code, running pip install gradio and python app/main.py
QA runs tests, finds bugs, and reports them
Fixer_engineer resolves the issues
Root confirms all tests pass and marks the feature complete

All of this happens autonomously inside a sandboxed environment. The agents can install dependencies, modify files, and execute commands, but they’re isolated from your host machine.

Try It Yourself

Let’s walk through setting up a simple agent team in a Docker Sandbox.

Prerequisites

Docker Desktop 4.60+ with sandbox support (microVM-based isolation)
agent (included in Docker Desktop 4.49+)
API key for your model provider (Anthropic, OpenAI, or Google)

Step 1: Create Your Agent Team

Save this configuration as dev-team.yaml:

models:
 openai:
   provider: openai
   model: gpt-5

agents:
 root:
   model: openai
   description: Product Manager - Leads the development team
   instruction: |
     Break user requirements into small iterations. Coordinate designer → frontend → QA.
   sub_agents: [designer, awesome_engineer, qa]
   toolsets:
     - type: filesystem
     - type: think
     - type: todo

 designer:
   model: openai
   description: UI/UX Designer - Creates designs and wireframes
   instruction: |
     Create wireframes and mockups for features. Ensure responsive designs.
   toolsets:
     - type: filesystem
     - type: think

 awesome_engineer:
   model: openai
   description: Developer - Implements features
   instruction: |
     Build features based on designs. Write clean, tested code.
   toolsets:
     - type: filesystem
     - type: shell
     - type: think

 qa:
   model: openai
   description: QA Specialist - Tests and identifies bugs
   instruction: |
     Test features and identify bugs. Report issues to fixer.
   toolsets:
     - type: filesystem
     - type: think

Step 2: Create a Docker Sandbox

The simplest approach is to use agent as a native sandbox agent:

# Run agent directly in a sandbox (experimental)
docker sandbox run agent ~/path/to/your/workspace

Alternatively, use a detached Claude sandbox for more control:

# Start a detached sandbox
docker sandbox run -d --name my-dev-sandbox claude

# Copy agent into the sandbox
which agent  # Find the path on your host
docker cp $(which agent) $(docker sandbox ls --filter name=my-dev-sandbox -q):/usr/bin/agent

Step 3: Set Environment Variables

# Run agent with your API key (passed inline since export doesn't persist across exec calls)
docker exec -it -e OPENAI_API_KEY=your_key_here my-dev-sandbox bash

Step 4: Run Your Agent Team

# Mount your workspace and run agent
docker exec -it my-dev-sandbox bash -c "cd /path/to/your/workspace && agent run dev-team.yaml"

Now you can describe what you want to build, and your agent team will handle the rest:

User: Create a bank application using Python. The bank app should have basic functionality like account savings, show balance, withdraw, add money, etc. Build the UI using Gradio. Create a directory called app, and inside of it, create all of the files needed by the project

Agent (root): I'll break this down into iterations and coordinate with the team...

Watch as the designer creates wireframes, the engineer builds the Gradio app, and QA tests it, all autonomously in a secure sandbox.

Final result from a one shot prompt

Step 5: Clean Up

When you’re done:

# Remove the sandbox
docker sandbox rm my-dev-sandbox

Docker enforces one sandbox per workspace. Running docker sandbox run in the same directory reuses the existing container. To change configuration, remove and recreate the sandbox.

Current Limitations

Docker Sandboxes and Docker Agent are evolving rapidly. Here are a few things to know:

Docker Sandboxes now supports six agent types natively: Claude Code, Gemini, Codex, Copilot, agent, and Kiro. All are experimental and breaking changes may occur between Docker Desktop versions.
Custom Shell that doesn’t include a pre-installed agent binary. Instead, it provides a clean environment where you can install and configure any agent or tool
MicroVM sandboxes require macOS or Windows. Linux users can use legacy container-based sandboxes with Docker Desktop 4.57+
API keys may still need manual configuration depending on the agent type
Sandbox templates are optimized for certain workflows; custom setups may require additional configuration

Why This Matters Now

AI agents are becoming more capable, but they need infrastructure to run safely and effectively. The combination of agent and Docker Sandboxes addresses this by:

Feature	Traditional Approach	With agent + Docker Sandboxes
Autonomy	Limited – requires constant oversight	High – agents work independently
Security	Risky – agents have host access	Isolated – agents run in microVMs
Specialization	One model does everything	Multiple agents with focused roles
Reproducibility	Inconsistent across machines	MicroVM-isolated, version-controlled
Scalability	Manual coordination	Automated team orchestration

This isn’t just about convenience, it’s about enabling AI agents to do real work in production environments, with the safety guarantees that developers expect.

What’s Next

Explore the Docker Agent documentation to build your own agent teams
Check out Docker Sandboxes for advanced configurations
Browse example agent configurations in the agent repository
Integrate agent with your editor or use agents as tools in MCP clients

Conclusion

We’re moving from “prompting AI to write code” to “orchestrating AI teams to build software.” agent gives you the team structure; Docker Sandboxes provides the secure foundation.

The days of wearing every hat as a solo developer are numbered. With specialized AI agents working in isolated containers, you can focus on what matters, designing great software, while your AI team handles the implementation, testing, and iteration.

Try it out. Build your own agent team. Run it in a Docker Sandbox. See what happens when you have a development team at your fingertips, ready to ship features while you grab lunch.

From the Captain’s Chair: Kristiyan Velkov

Jennifer Kohl — Tue, 24 Feb 2026 14:00:00 +0000

Docker Captains are leaders from the developer community that are both experts in their field and are passionate about sharing their Docker knowledge with others. “From the Captain’s Chair” is a blog series where we get a closer look at one Captain to learn more about them and their experiences.

Today we are interviewing Kristiyan Velkov, a Docker Captain and Front-end Tech Lead with over a decade of hands-on experience in web development and DevOps.

Kristiyan builds applications with React, Next.js, Angular, and Vue.js, and designs modern front-end architectures. Over the years, Docker has become a core part of his daily work — used as a practical tool for building, testing, and deploying front-end applications in a predictable way.

He focuses on production-ready Docker setups for front-end teams, including clean Dockerfiles, multi-stage builds, and CI/CD pipelines that work consistently across environments. His work is grounded in real projects and long-term maintenance, not theoretical examples.

Kristiyan is the author of four technical books, one of which is “Docker for Front-end Developers”. He actively contributes to open-source projects and is the person behind several official Docker guides, including guides for React.js, Node.js, Angular, Vue.js, and related front-end technologies.

Through writing, open source,speaking and mentoring, he helps developers understand Docker better — explaining not just how things work, but why they are done a certain way.

As a Docker Captain, his goal is to help bridge the gap between front-end developers and DevOps teams.

Can you share how you first got involved with Docker?

I first started using Docker because I was tired of making the excuse “it works on my machine”. We didn’t have many DevOps people, and the ones we had didn’t really know the front-end or how the application was supposed to behave. At the same time, I didn’t know Docker. That made communication difficult and problems hard to debug.

As a front-end developer, I initially thought Docker wasn’t something I needed to care about. It felt like a DevOps concern. But setting up projects and making sure they worked the same everywhere kept causing issues. Docker solved that problem and completely changed the way I work.

At first, Docker wasn’t easy to understand. But the more I used it, the more I saw how much simpler things became. My projects started running the same across environments, and that consistency saved time and reduced stress.

Over time, my curiosity grew and I went deeper — learning how to design well-structured, production-ready Dockerfiles, optimize build performance, and integrate Docker into CI/CD pipelines following clear, proven best practices, not just setups that work, but ones that are reliable and maintainable long term.

For me, Docker has never been about trends. I started using it to reduce friction between teams and avoid recurring problems, and it has since become a core part of my daily work.

What inspired you to become a Docker Captain?

What inspired me to become a Docker Captain was the desire to share the real struggles I faced as a front-end developer. When I first started using Docker, I wasn’t looking for recognition or titles — I was just trying to fix the problems that were slowing me down and it was hard to explain to some DevOps developers what and why this should work like that without knowing the DevOps terms.

I clearly remember how exhausting it was to set up projects and how much time I wasted dealing with environment issues instead of real front-end work. Docker slowly changed the way I approached development and gave me a more reliable way to build and ship applications.

At some point, I realized I wasn’t the only one in this situation. Many front-end developers were avoiding Docker because they believed it was only meant for back-end or DevOps engineers. I wanted to change that perspective and show that Docker can be practical and approachable for front-end developers as well.

That’s also why I wrote the book Docker for Front-end Developers, where I explain Docker from a front-end perspective, using a real React.js application and walking through how to containerize and deploy it to AWS, with practical code examples and clear diagrams. The goal was to make Docker understandable and useful for people who build user-facing applications every day.

I also contributed official Docker guides for React.js, Angular, and Vue.js — not because I had all the answers, but because I remembered how difficult it felt when there was no clear guidance.

For me, becoming a Docker Captain was never about a title. It has always been about sharing what I’ve learned, building a bridge between front-end developers and containerization, and hopefully making someone else’s journey a little easier than mine.

What are some of your personal goals for the next year?

Over the next year, I want to continue writing books. Writing helps me structure my own knowledge, go deeper into the topics I work with, and hopefully make things clearer for other developers as well. I also want to push myself to speak at more conferences. Public speaking doesn’t come naturally to me, but it’s a good way to grow and to share real, hands-on experience with a broader audience and meet amazing people. I plan to keep contributing to open-source projects and maintaining the official Docker guides I’ve written for Angular, Vue.js, and React.js. People actively use these guides, so keeping them accurate and up to date is important to me. Alongside that, I’ll continue writing on my blog and newsletter, sharing practical insights from day-to-day work.

If you weren’t working in tech, what would you be doing instead?

If I weren’t working in tech, I’d probably be a lawyer — I’m a law graduate. Studying law gave me a strong sense of discipline and a structured approach to problem-solving, which I still rely on today. Over time, though, I realized that technology gives me a different kind of fulfillment. It allows me to build things, create practical solutions, and share knowledge in a way that has a direct and visible impact on people. I don’t think anything else would give me the same satisfaction. In tech, I get to solve problems every day, write code, contribute to open-source projects, write books, and share what I’ve learned with the community. That mix of challenge, creativity, and real impact is hard to replace. Law could have been my profession, but technology is where I truly feel at home.

Can you share a memorable story from collaborating with the Docker community?

One of my most memorable experiences with the Docker community was publishing my open-source project frontend-prod-dockerfiles, which provides production-ready Dockerfiles for most of the popular front-end applications. I originally created it to solve a gap I kept seeing: front-end developers didn’t have a clear, reliable reference for well-structured and optimized Dockerfiles.

The response from the community was better than I expected. Developers from all over the world started using it, sharing feedback and suggesting ideas I hadn’t even considered.

That experience was a strong reminder of what makes the Docker community special — openness, collaboration, and a genuine willingness to help each other grow.

The Docker Captains Conference in Turkey (2025) was amazing. It was well organized, inspiring, and full of great energy. I met great people who share the same passion for Docker.

What’s your favorite Docker product or feature right now, and why?

Right now, my favorite Docker features are Docker Offload and Docker Model Runner.

Offload is a game-changer because it lets me move heavy builds and GPU workloads to secure cloud resources directly from the same Docker CLI/Desktop flow I already use. I don’t have to change the way I work locally, but I get cloud-scale speed whenever I need it.

Model Runner lets me run open models locally in just minutes. And when I need more power, I can pair it with Offload to scale out to GPUs.

Can you walk us through a tricky technical challenge you solved recently?

A recent challenge I dealt with was reviewing Dockerfiles that had been generated with AI. A lot of developers were starting to use AI in our company, but I noticed some serious problems right away, images that were too large, broken caching, hardcoded environment variables, and containers running as root. It was a good reminder that while AI can help, we still need to carefully review and apply best practices when it comes to security and performance.

What’s one Docker tip you wish every developer knew?

One tip I wish every developer knew is that Docker is for everyone, not just DevOps or back-end developers. Front-end developers can benefit just as much by using Docker to create consistent environments, ship production-ready builds, and collaborate more smoothly with their teams. It’s not just infrastructure , it’s a productivity boost for the whole stack. I saw a racing number of tech jobs required to have such kind of basic knowledge which overall is positive.

If you could containerize any non-technical object in real life, what would it be and why?

If I could containerize any non-technical object, it would be a happy day. I’d package a perfectly joyful day and redeploy it whenever I needed , no wasted hours, no broken routines, just a consistent, repeatable “build” of happiness.

Where can people find you online?

On LinkedIn, x.com and also my website. I regularly write technical articles on Medium and share insights in my newsletter Front-end World. My open-source projects, including production-ready Dockerfiles for front-end frameworks, are available on GitHub.

Rapid Fire Questions

Cats or Dogs?

Both, I love animals.

Morning person or night owl?

Morning person for study, night owl for work.

Favorite comfort food?

Pasta.

One word friends would use to describe you?

Persistent

A hobby you picked up recently?

Hiking, I love nature

How Medplum Secured Their Healthcare Platform with Docker Hardened Images (DHI)

Jennifer Kohl — Thu, 19 Feb 2026 14:00:00 +0000

Special thanks to Cody Ebberson and the Medplum team for their open-source contribution and for sharing their migration experience with the community. A real-world example of migrating a HIPAA-compliant EHR platform to DHI with minimal code changes.

Healthcare software runs on trust. When patient data is at stake, security isn’t just a feature but a fundamental requirement. For healthcare platform providers, proving that trust to enterprise customers is an ongoing challenge that requires continuous investment in security posture, compliance certifications, and vulnerability management.

That’s why we’re excited to share how Medplum, an open-source healthcare platform serving over 20 million patients, recently migrated to Docker Hardened Images (DHI). This migration demonstrates exactly what we designed DHI to deliver: enterprise-grade security with minimal friction. Medplum’s team made the switch with just 54 lines of changes across 5 files – a near net-zero code change that dramatically improved their security posture.

Medplum is a headless EHR; the platform handles patient data, clinical workflows, and compliance so developers can focus on building healthcare apps. Built by and for healthcare developers, the platform provides:

HIPAA and SOC2 compliance out of the box
FHIR R4 API for healthcare data interoperability
Self-hosted or managed deployment options
Support for 20+ million patients across hundreds of practices

With over 500,000 pulls on Docker Hub for their medplum-server image, Medplum has become a trusted foundation for healthcare developers worldwide. As an open-source project licensed under Apache 2.0, their entire codebase, including Docker configurations, is publicly available onGitHub. This transparency made their DHI migration a perfect case study for the community.

Caption: Medplum is a headless EHR; the platform handles patient data, clinical workflows, and compliance so developers can focus on building healthcare apps.

Medplum is developer-first. It’s not a plug-and-play low-code tool, it’s designed for engineering teams that want a strong FHIR-based foundation with full control over the codebase.

The Challenge: Vulnerability Noise and Security Toil

Healthcare software development comes with unique challenges. Integration with existing EHR systems, compliance with regulations like HIPAA, and the need for robust security all add complexity and cost to development cycles.

“The Medplum team found themselves facing a challenge common to many high-growth platforms: “Vulnerability Noise.” Even with lean base images, standard distributions often include non-essential packages that trigger security flags during enterprise audits. For a company helping others achieve HIPAA compliance, every “Low” or “Medium” CVE (Common Vulnerability and Exposure) requires investigation and documentation, creating significant “security toil” for their engineering team.”

Reshma Khilnani

CEO, Medplum

Medplum addresses this by providing a compliant foundation. But even with that foundation, their team found themselves facing another challenge common to high-growth platforms: “Vulnerability Noise.”

Healthcare is one of the most security-conscious industries. Medplum’s enterprise customers, including Series C and D funded digital health companies, don’t just ask about security; they actively verify it. These customers routinely scan Medplum’s Docker images as part of their security due diligence.

Even with lean base images, standard distributions often include non-essential packages that trigger security flags during enterprise audits. For a company helping others achieve HIPAA compliance, every “Low” or “Medium” CVE requires investigation and documentation. This creates significant “security toil” for their engineering team.

The First Attempt: Distroless

This wasn’t Medplum’s first attempt at solving the problem. Back in November 2024, the team investigated Google’s distroless images as a potential solution.

The motivations were similar to what DHI would later deliver:

Less surface area in production images, and therefore less CVE noise
Smaller images for faster deployments
Simpler build process without manual hardening scripts

The idea was sound. Distroless images strip away everything except the application runtime: no shell, no package manager, minimal attack surface. On paper, it was exactly what Medplum needed.

But the results were mixed. Image sizes actually increased. Build times went up. There were concerns about multi-architecture support for native dependencies. The PR was closed without merging.

The core problem remained: many CVEs in standard images simply aren’t actionable. Often there isn’t a fix available, so all you can do is document and explain why it doesn’t apply to your use case. And often the vulnerability is in a corner of the image you’re not even using, like Perl, which comes preinstalled on Debian but serves no purpose in a Node.js application.

Fully removing these unused components is the only real answer. The team knew they needed hardened images. They just hadn’t found the right solution yet.

The Solution: Docker Hardened Images

When Docker made Hardened Images freely available under Apache 2.0, Medplum’s team saw an opportunity to simplify their security posture while maintaining compatibility with their existing workflows.

By switching to Docker Hardened Images, Medplum was able to offload the repetitive work of OS-level hardening – like configuring non-root users and stripping out unnecessary binaries – to Docker. This allowed them to provide their users with a “Secure-by-Default” image that meets enterprise requirements without adding complexity to their open-source codebase.

This shift is particularly significant for an open-source project. Rather than maintaining custom hardening scripts that contributors need to understand and maintain, Medplum can now rely on Docker’s expertise and continuous maintenance. The security posture improves automatically with each DHI update, without requiring changes to Medplum’s Dockerfiles.

“By switching to Docker Hardened Images, Medplum was able to offload the repetitive work of OS-level hardening—like configuring non-root users and stripping out unnecessary binaries—to Docker. This allowed their users to provide their users with a “Secure-by-Default” image that meets enterprise requirements without adding complexity to their open-source codebase.”

Cody Ebberson

CTO, Medplum

The Migration: Real Code Changes

The migration was remarkably clean. Previously, Medplum’s Dockerfile required manual steps to ensure security best practices. By moving to DHI, they could simplify their configuration significantly.

Let’s look at what actually changed. Here’s the complete server Dockerfile after the migration:

# Medplum production Dockerfile
# Uses Docker "Hardened Images":
# https://hub.docker.com/hardened-images/catalog/dhi/node/guides

# Supported architectures: linux/amd64, linux/arm64

# Stage 1: Build the application and install production dependencies
FROM dhi.io/node:24-dev AS build-stage
ENV NODE_ENV=production
WORKDIR /usr/src/medplum
ADD ./medplum-server-metadata.tar.gz ./
RUN npm ci --omit=dev && \
  rm package-lock.json

# Stage 2: Create the runtime image
FROM dhi.io/node:24 AS runtime-stage
ENV NODE_ENV=production
WORKDIR /usr/src/medplum
COPY --from=build-stage /usr/src/medplum/ ./
ADD ./medplum-server-runtime.tar.gz ./

EXPOSE 5000 8103

ENTRYPOINT [ "node", "--require", "./packages/server/dist/otel/instrumentation.js", "packages/server/dist/index.js" ]

Notice what’s not there:

No groupadd or useradd commands: DHI runs as non-root by default
No chown commands: permissions are already correct
No USER directive: the default user is already non-privileged

Before vs. After: Server Dockerfile

Before (node:24-slim):

FROM node:24-slim
ENV NODE_ENV=production
WORKDIR /usr/src/medplum

ADD ./medplum-server.tar.gz ./

# Install dependencies, create non-root user, and set permissions
RUN npm ci && \
  rm package-lock.json && \
  groupadd -r medplum && \
  useradd -r -g medplum medplum && \
  chown -R medplum:medplum /usr/src/medplum

EXPOSE 5000 8103

# Switch to the non-root user
USER medplum

ENTRYPOINT [ "node", "--require", "./packages/server/dist/otel/instrumentation.js", "packages/server/dist/index.js" ]

After (dhi.io/node:24):

FROM dhi.io/node:24-dev AS build-stage
ENV NODE_ENV=production
WORKDIR /usr/src/medplum
ADD ./medplum-server-metadata.tar.gz ./
RUN npm ci --omit=dev && rm package-lock.json

FROM dhi.io/node:24 AS runtime-stage
ENV NODE_ENV=production
WORKDIR /usr/src/medplum
COPY --from=build-stage /usr/src/medplum/ ./
ADD ./medplum-server-runtime.tar.gz ./

EXPOSE 5000 8103

ENTRYPOINT [ "node", "--require", "./packages/server/dist/otel/instrumentation.js", "packages/server/dist/index.js" ]

The migration also introduced a cleaner multi-stage build pattern, separating metadata (package.json files) from runtime artifacts.

Before vs. After: App Dockerfile (Nginx)

The web app migration was even more dramatic:

Before (nginx-unprivileged:alpine):

FROM nginxinc/nginx-unprivileged:alpine

# Start as root for permissions
USER root

COPY <

After (dhi.io/nginx:1):

FROM dhi.io/nginx:1

COPY <

Results: Improved Security Posture

After merging the changes, Medplum’s team shared their improved security scan results. The migration to DHI resulted in:

Dramatically reduced CVE count – DHI’s minimal base means fewer packages to patch
Non-root by default – No manual user configuration required
No shell access in production – Reduced attack surface for container escape attempts
Continuous patching – All DHI images are rebuilt when upstream security updates are available

For organizations that require stronger guarantees, Docker Hardened Images Enterprise adds SLA-backed remediation timelines, image customizations, and FIPS/STIG variants.

Most importantly, all of this was achieved with zero functional changes to the application. The same tests passed, the same workflows worked, and the same deployment process applied.

CI/CD Integration

Medplum also updated their GitHub Actions workflow to authenticate with the DHI registry:

- name: Login to Docker Hub
  uses: docker/login-action@v2.2.0
  with:
    username: ${{ secrets.DOCKERHUB_USERNAME }}
    password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Login to Docker Hub Hardened Images
  uses: docker/login-action@v2.2.0
  with:
    registry: dhi.io
    username: ${{ secrets.DOCKERHUB_USERNAME }}
    password: ${{ secrets.DOCKERHUB_TOKEN }}

This allows their CI/CD pipeline to pull hardened base images during builds. The same Docker Hub credentials work for both standard and hardened image registries.

The Multi-Stage Pattern for DHI

One pattern worth highlighting from Medplum’s migration is the use of multi-stage builds with DHI variants:

Build stage: Use dhi.io/node:24-dev which includes npm/yarn for installing dependencies
Runtime stage: Use dhi.io/node:24 which is minimal and doesn’t include package managers

This pattern ensures that build tools never make it into the production image, further reducing the attack surface. It’s a best practice for any containerized Node.js application, and DHI makes it straightforward by providing purpose-built variants for each stage.

Medplum’s Production Architecture

Medplum’s hosted offering runs on AWS using containerized workloads. Their medplum/medplum-server image, built on DHI base images, now deploys to production.

Here’s how the build-to-deploy flow works:

Build time: GitHub Actions pulls dhi.io/node:24-dev and dhi.io/node:24 as base images
Push: The resulting hardened image is pushed to medplum/medplum-server on Docker Hub
Deploy: AWS Fargate pulls medplum/medplum-server:latest and runs the hardened container

The deployed containers inherit all DHI security properties (non-root execution, minimal attack surface, no shell) because they’re built on DHI base images. This demonstrates that DHI works seamlessly with production-grade infrastructure including:

AWS Fargate/ECS for container orchestration
Elastic Load Balancing for high availability
Aurora PostgreSQL for managed database
ElastiCache for Redis caching
CloudFront for CDN and static assets

No infrastructure changes were required. The same deployment pipeline, the same Fargate configuration, just a more secure base image.

Why This Matters for Healthcare

For healthcare organizations evaluating container security, Medplum’s migration offers several lessons:

1. Eliminating “Vulnerability Noise”

The biggest win from DHI isn’t just security, it’s reducing the operational burden of security. Fewer packages means fewer CVEs to investigate, document, and explain to customers. For teams without dedicated security staff, this reclaimed time is invaluable.

2. Compliance-Friendly Defaults

HIPAA requires covered entities to implement technical safeguards including access controls and audit controls. DHI’s non-root default and minimal attack surface align with these requirements out of the box. For companies pursuing SOC 2 Type 2 certification, which Medplum implemented from Day 1, or HITRUST certification, DHI provides a stronger foundation for the technical controls auditors evaluate.

3. Reduced Audit Surface

When security teams audit container configurations, DHI provides a cleaner story. Instead of explaining custom hardening scripts or why certain CVEs don’t apply, teams can point to Docker’s documented hardening methodology, SLSA Level 3 provenance, and independent security validation by SRLabs. This is particularly valuable during enterprise sales cycles where customers scan vendor images as part of due diligence.

4. Practicing What You Preach

For platforms like Medplum that help customers achieve compliance, using hardened images isn’t just good security, it’s good business. When you’re helping healthcare organizations meet regulatory requirements, your own infrastructure needs to set the example.

5. Faster Security Response

With DHI Enterprise, critical CVEs are patched within 7 days. For healthcare organizations where security incidents can have regulatory implications, this SLA provides meaningful risk reduction and a concrete commitment to share with customers.

Conclusion

Medplum’s migration to Docker Hardened Images demonstrates that improving container security doesn’t have to be painful. With minimal code changes (54 additions and 52 deletions) they achieved:

Secure-by-Default images that meet enterprise requirements
Automatic non-root execution
Dramatically reduced CVE surface
Simplified Dockerfiles with no manual hardening scripts
Less “security toil” for their engineering team
A stronger compliance story for enterprise customers

By offloading OS-level hardening to Docker, Medplum can focus on what they do best: building healthcare infrastructure while their security posture improves automatically with each DHI update.

For a platform with 500,000+ Docker Hub pulls serving healthcare organizations worldwide, this migration shows that DHI is ready for production workloads at scale. More importantly, it shows that security improvements can actually reduce operational burden rather than add to it.

For platforms helping others achieve compliance, practicing what you preach matters. With Docker Hardened Images, that just got a lot easier.

Ready to harden your containers? Explore the Docker Hardened Images documentation or browse the free DHI catalog to find hardened versions of your favorite base images.

Resources

The Multi-Model Database for AI Agents: Deploy SurrealDB with Docker Extension

Jennifer Kohl — Tue, 17 Feb 2026 14:00:00 +0000

When it comes to building dynamic and real-work solutions, developers need to stitch multiple databases (relational, document, graph, vector, time-series, search) together and build complex API layers to integrate them. This generates significant complexity, cost, and operational risk, and reduces speed of innovation. More often than not, developers end up focusing on building glue code and managing infrastructure rather than building application logic. For AI use cases, using multiple databases means AI Agents have fragmented data, context and memory, producing bad outputs at high latency.

Enter SurrealDB.

SurrealDB is a multi-model database built in Rust that unifies document, graph, relational, time-series, geospatial, key-value, and vector data into a single engine. Its SQL-like query language, SurrealQL, lets you traverse graphs, perform vector search, and query structured data – all in one statement.

Designed for data-intensive workloads like AI agent memory, knowledge graphs, real-time applications, and edge deployments, SurrealDB runs as a single binary anywhere: embedded in your app, in the browser via WebAssembly, at the edge, or as a distributed cluster.

What problem does SurrealDB solve?

Modern AI systems place very different demands on data infrastructure than traditional applications. SurrealDB addresses these pressures directly:

Single runtime for multiple data models – AI systems frequently combine vector search, graph traversal, document storage, real-time state, and relational data in the same request path. SurrealDB supports these models natively in one engine, avoiding brittle cross-database APIs, ETL pipelines, and consistency gaps.
Low-latency access to changing context – Voice agents, interactive assistants, and stateful agents are sensitive to both latency and data freshness. SurrealDB’s query model and real-time features serve up-to-date context without polling or background sync jobs.
Reduced system complexity – Replacing multiple specialized databases with a single multi-model store reduces services, APIs, and failure modes. This simplifies deployment, debugging, and long-term maintenance.
Faster iteration on data-heavy features – Opt in schemas definitions and expressive queries let teams evolve data models alongside AI features without large migrations. This is particularly useful when experimenting with embeddings, relationships, or agent memory structures.
Built-in primitives for common AI patterns – Native support for vectors, graphs, and transactional consistency enables RAG, graph-augmented retrieval, recommendation pipelines, and agent state management – without external systems or custom glue code.

In this article, you’ll see how to build a WhatsApp RAG chatbot using SurrealDB Docker Extension. You’ll learn how SurrealDB Docker Extension powers an intelligent WhatsApp chatbot that turns your chat history into searchable, AI-enhanced conversations with vector embeddings and precise source citations.

Understanding SurrealDB Architecture

SurrealDB’s architecture unifies multiple data models within a single database engine, eliminating the need for separate systems and synchronization logic (figure below).

Caption: SurrealDB Architecture diagram

Caption: Architecture diagram of SurrealDB showing a unified multi-model database with real-time capabilities. (more information at https://surrealdb.com/docs/surrealdb/introduction/architecture)

With SurrealDB, you can:

Model complex relationships using graph traversal syntax (e.g., ->bought_together->product)
Store flexible documents alongside structured relational tables
Subscribe to real-time changes with LIVE SELECT queries that push updates instantly
Ensure data consistency with ACID-compliant transactions across all models

Learn more about SurrealDB’s architecture and key features on the official documentation.

How does Surreal work?

SurrealDB separates storage from compute, enabling you to scale these independently without the need to manually shard your data.

The query layer (otherwise known as the compute layer) handles queries from the client, analyzing which records need to be selected, created, updated, or deleted.

The storage layer handles the storage of the data for the query layer. By scaling storage nodes, you are able to increase the amount of supported data for each deployment.

SurrealDB supports all the way from single-node to highly scalable fault-tolerant deployments with large amounts of data.

For more information, see https://surrealdb.com/docs/surrealdb/introduction/architecture.

Why should you run SurrealDB as a Docker Extension

For developers already using Docker Desktop, running SurrealDB as an extension eliminates friction. There’s no separate installation, no dependency management, no configuration files – just a single click from the Extensions Marketplace.

Docker provides the ideal environment to bundle and run SurrealDB in a lightweight, isolated container. This encapsulation ensures consistent behavior across macOS, Windows, and Linux, so what works on your laptop works identically in staging.

The Docker Desktop Extension includes:

Visual query editor with SurrealQL syntax highlighting
Real-time data explorer showing live updates as records change
Schema visualization for tables and relationships
Connection management to switch between local and remote instances
Built-in backup/restore for easy data export and import

With Docker Desktop as the only prerequisite, you can go from zero to a running SurrealDB instance in under a minute.

Getting Started

To begin, download and install Docker Desktop on your machine. Then follow these steps:

Open Docker Desktop and select Extensions in the left sidebar
Switch to the Browse tab
In the Filters dropdown, select the Database category
Find SurrealDB and click Install

Caption: Installing the SurrealDB Extension from Docker Desktop’s Extensions Marketplace.

Real-World Example

Smart Team Communication Assistant

Imagine searching through months of team WhatsApp conversations to answer the question: “What did we decide about the marketing campaign budget?”

Traditional keyword search fails, but RAG with SurrealDB and LangChain solves this by combining semantic vector search with relationship graphs.

This architecture analyzes group chats (WhatsApp, Instagram, Slack) by storing conversations as vector embeddings while simultaneously building a knowledge graph linking conversations through extracted keywords like “budget,” “marketing,” and “decision.” When queried, the system retrieves relevant context using both similarity matching and graph traversal, delivering accurate answers about past discussions, decisions, and action items even when phrased differently than the original conversation.

This project is inspired by Multi-model RAG with LangChain | GitHub Example

1. Clone the repository:

git clone https://github.com/Raveendiran-RR/surrealdb-rag-demo

2. Enable Docker Model Runner by visiting Docker Desktop > Settings > AI

Caption: Enable Docker Model Runner in Docker Desktop > settings > AI

3. Pull llama3.2 model from Docker Hub

Search for llama 3.2 under Models > Docker Hub and pull the right model.

Caption: Pull the Docker model llama3.2

4. Download the embeddinggemma model from Docker Hub

Caption: Click on Models > Search for embeddinggemma > download the model

5. Run this command to connect to the persistent surrealDB container

Browse to the directory where you have cloned the repository
Create directory “mydata”

mkdir -p mydata

6. Run this command:

docker run -d --name demo_data \
  -p 8002:8000 \
  -v "$(pwd)/mydata:/mydata" \
  surrealdb/surrealdb:latest \
  start --log debug --user root --pass root \
  rocksdb://mydata

Note: use the path based on the operating system.

For windows , use rocksdb://mydata
For linux and macOS, use rocksdb:/mydata

7. Open SurrealDB Docker Extension and connect with SurrealDB.

Caption: Connecting to SurrealDB through Docker Desktop Extension

Connection name: RAGBot

Remote address: http://localhost:8002

Username: root | password: root

Click on Create Connection

8. Run the setup instructions

9. Upload the whatsapp chat

Start the UI for the RAG bot (http://localhost:8080)

Caption: Create connection to the SurrealDB Docker container

10. Start chatting with the RAG bot and have fun

11. We can verify the correctness data in SurrealDB list

Ensure that you connect to the right namespace (whatsapp) and database (chats)

python3 load_whatsapp.py
python3 rag_chat_ui.py

Caption: connect to the “whatsapp” namespace and “chats” database

Caption: Data stored as vectors in SurrealDB

Caption: Interact with the RAG bot UI where it gives you the answer and exact reference for it

Using this chat bot, now you can get information about the chat.txt file that was ingested. You can also verify the information in the query editor as shown below when you can run custom queries to validate the results from the chat bot. You can ingest new messages through the load_whatsapp.py file, please ensure that the message format is same as in the sample whatsChatExport.txt file.

Learn more about SurrealQL here.

Caption: SurrealDB Query editor in the Docker Desktop Extension

Conclusion

The SurrealDB Docker Extension offers an accessible and powerful solution for developers building data-intensive applications – especially those working with AI agents, knowledge graphs, and real-time systems. Its multi-model architecture eliminates the need to stitch together separate databases, letting you store documents, traverse graphs, query vectors, and subscribe to live updates from a single engine.

With Docker Desktop integration, getting started takes seconds rather than hours. No configuration files, no dependency management – just install the extension and start building. The visual query editor and real-time data explorer make it easy to prototype schemas, test queries, and inspect data as it changes.

Whether you’re building agent memory systems, real-time recommendation engines, or simply looking to consolidate a sprawling database stack, SurrealDB’s Docker Extension provides an intuitive path forward. Install it today and see how a unified data layer can simplify your architecture.

If you have questions or want to connect with other SurrealDB users, join the SurrealDB community on Discord.

Learn More

Install the SurrealDB Docker Extension
Get the latest release of Docker Desktop
SurrealDB documentation
Vote on what’s next! Check out our public roadmap
Have questions? The Docker community is here to help
New to Docker? Get started

Permission-Aware RAG: End-to-End Testing with the SpiceDB Testcontainer

Jennifer Kohl — Thu, 15 Jan 2026 14:00:00 +0000

We use GenAI in every facet of technology now – internal knowledge bases, customer support systems, and code review bots, to name just a few use cases. And in nearly every one of these, someone eventually asks:

What stops the model from returning something the user shouldn’t see?”

This is a roadblock that companies building RAG features or AI Agents eventually hit – the moment where an LLM returns data from a document that the user was not authorized to access, introducing potential legal, financial, and reputational risk to all parties. Unfortunately, traditional methods of authorization are not suited for the hierarchical, dynamic nature of access control in RAG. This is exactly where modern authorization permissioning systems such as SpiceDB shine: in building fine-grained authorization for filtering content in your AI-powered applications.

In fact, OpenAI uses SpiceDB to secure 37 Billion documents for 5 Million users who use ChatGPT Connectors – a feature where you bring your data from different sources such as Google Drive, Dropbox, GitHub etc. into ChatGPT.

This blog post shows how you can pair SpiceDB with Testcontainers to give you the ability to test your permission logic inside your RAG pipeline, end-to-end, automatically, with zero infrastructure dependencies.The example repo can be found here.

Quick Primer on Authorization

Before diving into implementation, let’s clarify two foundational concepts: Authentication (verifying who a user is) and Authorization (deciding what they can access).

Authorization is commonly implemented via techniques such as:

Access Control Lists (ACLs)
Role-Based Access Control (RBAC)
Attribute-Based Access Control (ABAC)

However, for complex, dynamic, and context-rich applications like RAG pipelines, traditional methods such as RBAC or ABAC fall short. The new kid on the block – ReBAC (Relationship-Based Access Control) is ideal as it models access as a graph of relationships rather than fixed rules, providing the necessary flexibility and scalability required.

ReBAC was popularized in Google Zanzibar, the internal authorization system Google built to manage permissions across all its products (e.g., Google Docs, Drive). Zanzibar systems are optimized for low-latency, high-throughput authorization checks, and global consistency – requirements that are well-suited for RAG systems.

SpiceDB is the most scalable open-source implementation of Google’s Zanzibar authorization model. It stores access as a relationship graph, where the fundamental check reduces to:

Is this actor allowed to perform this action on this resource?

For a Google Docs-style example:

definition user {}
definition document {
  relation reader: user
  relation writer: user

  permission read = reader + writer
  permission write = writer
}

This schema defines object types (user and document), explicit Relations between the objects (reader, writer), and derived Permissions (read, write). SpiceDB evaluates the relationship graph in microseconds, enabling real-time authorization checks at massive scale.

Access Control for RAG

RAG (Retrieval-Augmented Generation) is an architectural pattern that enhances Large Language Models (LLMs) by letting them consult an external knowledge base, typically involving a Retriever component finding document chunks and the LLM generating an informed response.

This pattern is now used by businesses and enterprises for apps like chatbots that query sensitive data such as customer playbooks or PII – all stored in a vector database for performance. However, the fundamental risk in this flow is data leakage: the Retriever component ignores permissions, and the LLM will happily summarize unauthorized data. In fact, OWASP has a Top 10 Risks for Large Language Model Applications list which includes Sensitive Information Disclosure, Excessive Agency & Vector and Embedding Weaknesses. The consequences of this leakage can be severe, ranging from loss of customer trust to massive financial and reputational damage from compliance violations.

This setup desperately needs fine-grained authorization, and that’s where SpiceDB comes in. SpiceDB can post-filter retrieved documents by performing real-time authorization checks, ensuring the model only uses data the querying user is permitted to see. The only requirement is that the documents have metadata that indicates where the information came from.But testing this critical permission logic without mocks, manual Docker setup, or flaky Continuous Integration (CI) environments is tricky. Testcontainers provides the perfect solution, allowing you to spin up a real, production-grade, and disposable SpiceDB instance inside your unit tests to deterministically verify that your RAG pipeline respects permissions end-to-end.

Spin Up Real Authorization for Every Test

Instead of mocking your authorization system or manually running it on your workstation, you can add this line of code in your test:

container, _ := spicedbcontainer.Run(ctx, "authzed/spicedb:v1.47.1")

And Testcontainers will:

Pull the real SpiceDB image
Start it in a clean, isolated environment
Assign it dynamic ports
Wait for it to be ready
Hand you the gRPC endpoint
Clean up afterwards

Because Testcontainers handles the full lifecycle – from pulling the container, exposing dynamic ports, and tearing it down automatically, you eliminate manual processes such as running Docker commands, and writing cleanup scripts. This isolation ensures that every single test runs with a fresh, clean authorization graph, preventing data conflicts, and making your permission tests completely reproducible in your IDE and across parallel Continuous Integration (CI) builds.

Suddenly you have a real, production-grade, Zanzibar-style permissions engine inside your unit test.

Using SpiceDB & Testcontainers

Here’s a walkthrough of how you can achieve end-to-end permissions testing using SpiceDB and Testcontainers. The source code for this tutorial can be found here.

1. Testing Our RAG

For the sake of simplicity, we have a minimal RAG and the retrieval mechanism is trivial too.

We’re going to test three documents which have doc_ids (doc1 doc2 ..) that act as metadata.

doc1: Internal roadmap
doc2: Customer playbook
doc3: Public FAQ

And three users:

Emilia owns doc1
Beatrice can view doc2
Charlie (or anyone) can view doc3

This SpiceDB schema defines a user and a document object type. A user has read permission on a document if they are the direct viewer or the owner of the document.

definition user {}

definition document {
  relation owner: user
  relation viewer: user | owner
  permission read = owner + viewer
}

2. Starting the Testcontainer

Here’s how a line of code can start a test to launch the disposable SpiceDB instance:

container, err := spicedbcontainer.Run(ctx, "authzed/spicedb:v1.47.1")
require.NoError(t, err)

Next, we connect to the running containerized service:

host, _ := container.Host(ctx)
port, _ := container.MappedPort(ctx, "50051/tcp")
endpoint := fmt.Sprintf("%s:%s", host, port.Port())

client, err := authzed.NewClient(
    endpoint,
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpcutil.WithInsecureBearerToken("somepresharedkey"),
)

This is now a fully-functional SpiceDB instance running inside your test runner.

3. Load the Schema + Test Data

The test seeds data the same way your application would:

_, err := client.WriteSchema(ctx, &apiv1.WriteSchemaRequest{Schema: schema})
require.NoError(t, err)

Then:

rel("document", "doc1", "owner", "user", "emilia")
rel("document", "doc2", "viewer", "user", "beatrice")
rel("document", "doc3", "viewer", "user", "emilia")
rel("document", "doc3", "viewer", "user", "beatrice")
rel("document", "doc3", "viewer", "user", "charlie")

We now have a predictable, reproducible authorization graph for every test run.

4. Post-Filtering With SpiceDB

Before the LLM sees anything, we check permissions with SpiceDB which acts as the source of truth of the permissions in the documents.

resp, err := r.spiceClient.CheckPermission(ctx, &apiv1.CheckPermissionRequest{
    Resource:   docObject,
    Permission: "read",
    Subject:    userSubject,
})

If SpiceDB says no, the doc is never fed into the LLM, thereby ensuring the user gets an answer to their query only based on what they have permissions to read.

This avoids:

Accidental data leakage
Overly permissive vector search
Compliance problems

Traditional access controls break down when data becomes embeddings hence having guardrails prevents this from happening.

End-to-End Permission Checks in a Single Test

Here’s what the full test asserts:

Emilia queries “roadmap” → gets doc1
Because they’re the owner.

Beatrice queries “playbook” → gets doc2
Because she’s a viewer.

Charlie queries “public” → gets doc3
Because it’s the only doc he can read, as it’s a public doc

If there is a single failing permission rule, the end-to-end test will immediately fail, which is critical given the constant changes in RAG pipelines (such as new retrieval modes, embeddings, document types, or permission rules).

What If Your RAG Pipeline Isn’t in Go?

First, a shoutout to Guillermo Mariscal for his original contribution to the SpiceDB Go Testcontainers module.

What if your RAG pipeline is written in a different language such as Python? Not to worry, there’s also a community Testcontainers module written in Python that you can use similarly. The module can be found here.

Typically, you would integrate it in your integration tests like this:

 # Your RAG pipeline test
  def test_rag_pipeline_respects_permissions():
      with SpiceDBContainer() as spicedb:
          # Set up permissions schema
          client = create_spicedb_client(
              spicedb.get_endpoint(),
              spicedb.get_secret_key()
          )

          # Load your permissions model
          client.WriteSchema(your_document_permission_schema)

          # Write test relationships
          # User A can access Doc 1
          # User B can access Doc 2

          # Test RAG pipeline with User A
          results = rag_pipeline.search(query="...", user="A")
          assert "Doc 1" in results
          assert "Doc 2" not in results  # Should be filtered out!

Similar to the Go module, this container gives you a clean, isolated SpiceDB instance for every test run.

Why This Approach Matters

Authorization testing in RAG pipelines can be tricky, given the scale and latency requirement and it can get trickier in systems handling sensitive data. By integrating the flexibility and scale of SpiceDB with the automated, isolated environments of Testcontainers, you shift to a completely reliable, deterministic approach to authorization.

Every time your code ships, a fresh, production-grade authorization engine is spun up, loaded with test data, and torn down cleanly, guaranteeing zero drift between your development machine and CI. This pattern can ensure that your RAG system is safe, correct, and permission-aware as it scales from three documents to millions.

Try It Yourself

The complete working example in Go along with a sample RAG pipeline is here:
https://github.com/sohanmaheshwar/spicedb-testcontainer-rag
Clone it.
Run go test -v.
Watch it spin up a fresh SpiceDB instance, load permissions, and assert RAG behavior.
Also, find the community modules for the SpiceDB testcontainer in Go and Python.

From the Captain’s Chair: Igor Aleksandrov

Jennifer Kohl — Fri, 19 Dec 2025 14:00:00 +0000

Today we are interviewing Igor Aleksandrov. Igor is the CTO and co-founder of JetRockets, a Ruby on Rails development agency based in NYC, bringing over 20 years of software engineering experience and a deep commitment to the Rails ecosystem since 2008. He’s an open-source contributor to projects like the Crystal programming language and Kamal, a regular conference speaker sharing expertise on different topics from container orchestration to migration from React to Hotwire.

Can you share how you first got involved with Docker? What inspired you to become a Docker Captain?

Looking back at my journey to becoming a Docker Captain, it all started with a very practical problem that many Rails teams face: dependency hell.

By 2018, JetRockets had been building Ruby on Rails applications for years. I’d been working with Rails since version 2.2 back in 2009, and we had established solid development practices. But as our team grew and our projects became more complex, we kept running into the same frustrating issues:

“It works on my machine” became an all-too-common phrase during deployments
Setting up new developer environments was a time-consuming process fraught with version mismatches
Our staging and production environments occasionally behaved differently despite our best efforts
Managing system-level dependencies across different projects was becoming increasingly complex

We needed a unified way to manage application dependencies that would work consistently across development, staging, and production environments.

Unlike many teams that start with Docker locally and gradually move to production, we decided to implement Docker in production and staging first. This might sound risky, but it aligned perfectly with our goal of achieving true environment parity.

We chose our first Rails application to containerize and started writing our first Dockerfile. Those early Dockerfiles were much simpler than the highly optimized ones we create today, but they solved our core problem: every environment now ran the same container with the same dependencies.

Even though AWS Beanstalk has never been a developer friendly solution, the goal was reached – we had achieved true environment consistency, and the mental overhead of managing different configurations across environments had virtually disappeared.

That initial Docker adoption in 2018 sparked a journey that would eventually lead to me becoming a Docker Captain. What began with a simple need for dependency management evolved into deep expertise in container optimization, advanced deployment strategies with tools like Kamal, and ultimately contributing back to the Docker community.

Today, I write extensively about Rails containerization best practices, from image slimming techniques to sophisticated CI/CD pipelines. But it all traces back to that moment in 2019 when we decided to solve our dependency challenges with Docker.

What are some of your personal goals for the next year?

I want to speak at more conferences and meetups, sharing the expertise I’ve built over the years. Living in the Atlanta area, I would like to become more integrated into the local tech community. Atlanta has such a vibrant IT scene, and I think there’s a real opportunity to contribute here. Whether that’s organizing Docker meetups, participating in Rails groups, or just connecting with other CTOs and technical leaders who are facing similar challenges.

If you weren’t working in tech, what would you be doing instead?

If I weren’t working in tech, I think I’d be doing woodworking. There’s something deeply satisfying about creating things with your hands, and woodworking offers that same creative problem-solving that draws me to programming – except you’re working with natural materials and traditional tools instead of code.

I truly enjoy working with my hands and seeing tangible results from my efforts. In many ways, building software and building furniture aren’t that different – you’re taking raw materials, applying craftsmanship and attention to detail, and creating something functional and beautiful.

If not woodworking, I’d probably pursue diving. I’m already a PADI certified rescue diver, and I truly like the ocean. There’s something about the underwater world that’s entirely different from our digital lives – it’s peaceful, challenging, and always surprising. Getting my diving instructor certification and helping others discover that underwater world would be incredibly rewarding.

Can you share a memorable story from collaborating with the Docker community?

One of the most rewarding aspects of being a Docker Captain is our regular Captains meetings, and honestly, I enjoy each one of them. These aren’t just typical corporate meetings – they’re genuine collaborations with some of the most passionate and knowledgeable people in the containerization space.

What makes these meetings special is the diversity of perspectives. You have Captains from completely different backgrounds – some focused on enterprise Kubernetes deployments, others working on AI, developers like me optimizing Rails applications, and people solving problems I’ve never even thought about.

What’s your favorite Docker product or feature right now, and why?

Currently, I’m really excited about the Build Debugging feature that was recently integrated into VS Code. As someone who spends a lot of time optimizing Rails Dockerfiles and writing about containerization best practices, this feature has been a game-changer for my development workflow.

When you’re crafting complex multi-stage builds for Rails applications – especially when you’re trying to optimize image size, manage build caches, and handle dependencies like Node.js and Ruby gems – debugging build failures used to be a real pain.

Can you walk us through a tricky technical challenge you solved recently?

Recently, I was facing a really frustrating development workflow issue that I think many Rails developers can relate to. We had a large database dump file, about 150GB, that we needed to use as a template for local development. The problem was that restoring this SQL dump into PostgreSQL was taking up to an hour every time we needed to reset our development database to a clean state.

For a development team, this was killing our productivity. Every time someone had to test a migration rollback, debug data-specific issues, or just start fresh, they’d have to wait an hour for the database restore. That’s completely unacceptable.

Initially, we were doing what most teams do: running pg_restore against the SQL dump file directly. But with a 150GB database, this involves PostgreSQL parsing the entire dump, executing thousands of INSERT statements, rebuilding indexes, and updating table statistics. It’s inherently slow because the database engine has to do real work.

I realized the bottleneck wasn’t the data itself – it was the database restoration process. So I wrote a Bash script that takes an entirely different approach:

Create a template volume: Start with a fresh Docker volume and spin up a PostgreSQL container
One-time restoration: Restore the SQL dump into this template database (this still takes an hour, but only once)
Volume snapshot: Use a BusyBox container to copy the entire database volume at the filesystem level
Instant resets: When developers need a fresh database, just copy the template volume to a new working volume

The magic is in step 4. Instead of restoring from SQL, we’re essentially copying files at the Docker volume level. This takes seconds instead of an hour because we’re just copying the already-processed PostgreSQL data files.

Docker volumes are just filesystem directories under the hood. PostgreSQL stores its data in a very specific directory structure with data files, indexes, and metadata. By copying the entire volume, we’re getting a perfect snapshot of the database in its “ready to use” state.

The script handles all the orchestration – creating volumes, managing container lifecycles, and ensuring the copied database starts up cleanly. What used to be a one-hour reset cycle is now literally 5-10 seconds. Developers can experiment freely, test destructive operations, and reset their environment without hesitation. It’s transformed how our team approaches database-dependent development.

What’s one Docker tip you wish every developer knew?

If something looks weird in your Dockerfile, you are doing it wrong. This is the single most important lesson I’ve learned from years of optimizing Rails Dockerfiles. I see this constantly when reviewing other developers’ container setups – there’s some convoluted RUN command, a bizarre COPY pattern, or a workaround that just feels off.

Your Dockerfile should read like clean, logical instructions. If you find yourself writing something like:

RUN apt-get update && apt-get install -y wget && \
    wget some-random-script.sh && chmod +x some-random-script.sh && \
    ./some-random-script.sh && rm some-random-script.sh

…you’re probably doing it wrong.

The best Dockerfiles are almost boring in their simplicity and clarity. Every line should have a clear purpose, and the overall flow should make sense to anyone reading it. If you’re adding odd hacks, unusual file permissions, or complex shell gymnastics, step back and ask why.

This principle has saved me countless hours of debugging. Instead of trying to make unusual things work, I’ve learned to redesign the approach. Usually, there’s a cleaner, more standard way to achieve what you’re trying to do.

If you could containerize any non-technical object in real life, what would it be and why?

If I could containerize any non-technical object, it would definitely be knowledge itself. Imagine being able to package up skills, experiences, and expertise into portable containers that you could load and unload from your mind as needed. As someone who’s constantly learning new technologies and teaching others, I’m fascinated by how we acquire and transfer knowledge. Currently, if I want to dive deep into a new programming language like I did with Crystal, or master a deployment tool like Kamal, it takes months of dedicated study and practice.

But what if knowledge worked like Docker containers? You could have a “Ruby 3.3 expertise” container, a “Advanced Kubernetes” container, or even a “Woodworking joinery techniques” container. Need to debug a complex Rails application? Load the container. Working on a diving certification course? Swap in the marine biology knowledge base.

The real power would be in the consistency and portability – just like how Docker containers ensure your application runs the same way everywhere, knowledge containers would give you the same depth of understanding regardless of context. No more forgetting syntax, no more struggling to recall that one debugging technique you learned years ago.

Plus, imagine the collaborative possibilities. Experienced developers could literally package their hard-earned expertise and share it with the community. It would democratize learning in the same way Docker democratized deployment.

Of course, the human experience of learning and growing would be lost, but from a pure efficiency standpoint? That would be incredible.

Where can people find you online? (talks, blog posts, or open source projects, etc)

I am always active in X (@igor_alexandrov) and on LinkedIn. I try to give at least 2-3 talks at tech conferences and meetups each year, and besides this, I have my personal blog.

Rapid Fire Questions

Cats or Dogs?

Dogs

Morning person or night owl?

Both

Favorite comfort food?

Dumplings

One word friends would use to describe you?

Perfectionist

A hobby you picked up recently?

Cycling

Develop and deploy voice AI apps using Docker

Jennifer Kohl — Tue, 16 Dec 2025 14:00:00 +0000

Voice is the next frontier of conversational AI. It is the most natural modality for people to chat and interact with another intelligent being. However, the voice AI software stack is complex, with many moving parts. Docker has emerged as one of the most useful tools for AI agent deployment.

In this article, we’ll explore how to use open-source technologies and Docker to create voice AI agents that utilize your custom knowledge base, voice style, actions, fine-tuned AI models, and run on your own computer. It is based on a talk I recently gave at the Docker Captains Summit in Istanbul.

Docker and AI

Most developers consider Docker the “container store” for software. The Docker container provides a reliable and reproducible environment for developing software locally on your own machine and then shipping it to the cloud. It also provides a safe sandbox to isolate, run, and scale user-submitted software in the cloud. For complex AI applications, Docker provides a suite of tools that makes it easy for developers and platform engineers to build and deploy.

The Docker container is a great tool for running software components and functions in an AI agent system. It can run web servers, API servers, workflow orchestrators, LLM actions or tool calls, code interpreters, simulated web browsers, search engines, and vector databases.
With the NVIDIA Container Toolkit, you can access the host machine’s GPU from inside Docker containers, enabling you to run inference applications such as LlamaEdge that serve open-source AI models inside the container.
The Docker Model Runner runs OpenAI-compatible API servers for open-source LLMs locally on your own computer.
The Docker MCP Toolkit provides an easy way to run MCP servers in containers and make them available to AI agents.

The EchoKit platform provides a set of Docker images and utilizes Docker tools to simplify the deployment of complex AI workflows.

EchoKit

The EchoKit consists of a server and a client. The client could be an ESP32-based hardware device that can listen for user voices using a microphone, stream the voice data to the server, receive and play the server’s voice response through a speaker. EchoKit provides the device hardware specifications and firmware under open-source licenses. To see it in action, check out the following video demos.

You can check out the GitHub repo for EchoKit.

The AI agent orchestrator

The EchoKit server is an open-source AI service orchestrator focused on real-time voice use cases. It starts up a WebSocket server that listens for streaming audio input and returns streaming audio responses. It ties together multiple AI models, including voice activity detection (VAD), automatic speech recognition (ASR), large language models (LLM), and text-to-speech (TTS), using one model’s output as the input for the next model.

You can start an EchoKit server on your local computer and configure the EchoKit device to access it over the local WiFi network. The “edge server” setup reduces network latency, which is crucial for voice AI applications.

The EchoKit team publishes a multi-platform Docker image that you can use directly to start an EchoKit server. The following command starts the EchoKit server with your own config.toml file and runs in the background.

docker run --rm \
  -p 8080:8080 \
  -v $(pwd)/config.toml:/app/config.toml \
  secondstate/echokit:latest-server &

The config.toml file is mapped into the container to configure how the EchoKit server utilizes various AI services in its voice response workflow. The following is an example of config.toml. It starts the WebSocket server on port 8080. That’s why in the Docker command, we map the container’s port 8080 to the same port on the host. That allows the EchoKit server to be accessible through the host computer’s IP address. The rest of the config.toml specifies how to access the ASR, LLM, and TTS models to generate a voice response for the input voice data.

addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_XYZ"
model = "whisper-large-v3"
lang = "en"
prompt = "Hello\n你好\n(noise)\n(bgm)\n(silence)\n"

[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_XYZ"
model = "openai/gpt-oss-20b"
history = 20

[tts]
platform = "elevenlabs"
url = "wss://api.elevenlabs.io/v1/text-to-speech/"
token = "sk_xyz"
voice = "VOICE-ID-ABCD"

[[llm.sys_prompts]]
role = "system"
content = """
You are a comedian. Engage in lighthearted and humorous conversation with the user. Tell jokes when appropriate.

"""

The AI services configured for the above EchoKit server are as follows.

It utilizes Groq for ASR (voice-to-text) and LLM tasks. You will need to fill in your own Groq API key.
It utilizes ElevenLabs for streaming TTS (text-to-speech). You will need to fill in your own ElevenLabs API key.

Then, in the EchoKit device setup, you just need to point your device to the local EchoKit server.

ws://local-network-ip.address:8080/ws/

For more options on the EchoKit server configuration, please refer to our documentation!

The VAD server

The voice-to-text ASR is not sufficient by itself. It could hallucinate and generate nonsensical text if the input voice is not human speech (e.g., background noise, street noise, or music). It also would not know when the user has finished speaking, and the EchoKit server needs to ask the LLM to start generating a response.

A VAD model is used to detect human voice and conversation turns in the voice stream. The EchoKit team has a multi-platform Docker image that incorporates the open-source Silero VAD model. The image is much larger than the plain EchoKit server, and it requires more CPU resources to run. But it delivers substantially better voice recognition results. Here is the Docker command to start the EchoKit server with VAD in the background.

docker run --rm \
  -p 8080:8080 \
  -v $(pwd)/config.toml:/app/config.toml \
  secondstate/echokit:latest-server-vad &

The config.toml file for this Docker container also needs an additional line in the ASR section, so that the EchoKit server knows to stream incoming audio data to the local VAD service and act on the VAD signals. The Docker container runs the Silero VAD model as a WebSocket service inside the container on port 8000. There is no need to expose the container port 8000 to the host.

addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_XYZ"
model = "whisper-large-v3"
lang = "en"
prompt = "Hello\n你好\n(noise)\n(bgm)\n(silence)\n"
vad_url = "http://localhost:9093/v1/audio/vad"

[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_XYZ"
model = "openai/gpt-oss-20b"
history = 20

[tts]
platform = "elevenlabs"
url = "wss://api.elevenlabs.io/v1/text-to-speech/"
token = "sk_xyz"
voice = "VOICE-ID-ABCD"

[[llm.sys_prompts]]
role = "system"
content = """
You are a comedian. Engage in lighthearted and humorous conversation with the user. Tell jokes when appropriate.

"""

We recommend using the VAD enabled EchoKit server whenever possible.

MCP services

A key feature of AI agents is to perform actions, such as making web-based API calls, on behalf of LLMs. For example, the “US civics test prep” example for EchoKit requires the agent to get exam questions from a database, and then generate responses that guide the user toward the official answer.

The MCP protocol is the industry standard for providing tools (function calls) to LLM agents. For example, the DuckDuckGo MCP server provides a search tool for LLMs to search the internet if the user asks for current information that is not available in the LLM’s pre-training data. The Docker MCP Toolkit provides a set of tools that make it easy to run MCP servers that can be utilized by EchoKit.

The command below starts a Docker MCP gateway server. The MCP protocol defines several ways for agents or LLMs to access MCP tools. Our gateway server is accessible through the streaming HTTP protocol at port 8011.

docker mcp gateway run --port 8011 --transport streaming

Next, you can add the DuckDuckGo MCP server to the gateway. The search tool provided by the DuckDuckGo MCP server is now available on HTTP port 8011.

docker mcp server enable duckduckgo

You can simply configure the EchoKit server to use the DuckDuckGo MCP tools in the config.toml file.

[[llm.mcp_server]]
server = "http://localhost:8011/mcp"
type = "http_streamable"
call_mcp_message = "Please hold on a few seconds while I am searching for an answer!"

Now, when you ask EchoKit a current event question, such as “What is the latest Tesla stock price?”, it will first call the DuckDuckGo MCP’s search tool to retrieve this information and then respond to the user.

The call_mcp_message field is a message the EchoKit device will read aloud when the server calls the MCP tool. It is needed since the MCP tool call could introduce significant latency in the response.

Docker Model Runner

The EchoKit server orchestrates multiple AI services. In the examples in this article so far, the EchoKit server is configured to use cloud-based AI services, such as Groq and ElevenLabs. However, many applications—especially in the voice AI area—require the AI models to run locally or on-premises for security, cost, and performance reasons.

Docker Model Runner is Docker’s solution to run LLMs locally. For example, the following command downloads and starts OpenAI’s open-source gpt-oss-20b model on your computer.

docker model run ai/gpt-oss

The Docker Model Runner starts an OpenAI-compatible API server at port 12434. It could be directly utilized by the EchoKit server via config.toml.

[llm]
platform = "openai_chat"
url = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"
model = "ai/gpt-oss"
history = 20

At the time of this writing, the Docker Model Runner only supports LLMs. The EchoKit server still relies on cloud services, or local AI solutions such as LlamaEdge, for other types of AI services.

Conclusion

The complexity of the AI agent software stack has created new challenges in software deployment and security. Docker is a proven and extremely reliable tool for delivering software to production. Docker images are repeatable and cross-platform deployment packages. The Docker container isolates software execution to eliminate large categories of security issues.

With new AI tools, such as the Docker Model Runner and MCP Toolkit, Docker continues to address emerging challenges in AI portability, discoverability, and security.

The easiest, most reliable, and most secure way to set up your own EchoKit servers is to use Docker.

Learn more

Check out the GitHub repo for EchoKit
Explore the MCP Catalog: Discover containerized, security-hardened MCP servers.
Get started with the MCP Toolkit: Run MCP servers easily and securely.
Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Highlights from AWS re:Invent: Supercharging Kiro with Docker Sandboxes and MCP Catalog

Jin Kim — Fri, 12 Dec 2025 20:22:05 +0000

At the recent AWS re:Invent, Docker focused on a very real developer problem: how to run AI agents locally without giving them access to your machine, credentials, or filesystem.

With AWS introducing Kiro, Docker demonstrated how Docker Sandboxes and MCP Toolkit allow developers to run agents inside isolated containers, keeping host environments and secrets out of reach. The result is a practical setup where agents can write code, run tests, and use tools safely, while you stay focused on building, not cleaning up accidental damage.

Local AI Agents, Isolation, and Docker at AWS re:Invent

Two weeks ago, a Reddit user posted how their filesystem was accidentally deleted by Google Antigravity. And the top comment?

Alright no more antigravity outside of a container

And another user’s home directory was recently wiped using Claude Code this past week. And yet another top comment:

That’s exactly why Claude code should be used only inside an isolated container or vm

We agree that this should never happen and that containers provide the proper isolation and segmentation.

At AWS re:Invent 2025, we were able to show off this vision using Kiro running in our new Docker sandboxes, using MCP servers provided by the Docker MCP Toolkit.

If you weren’t able to attend or visit us at the booth, fear not! I’ll share the demo with you.

Jim Clark, one of Docker’s Principal Engineers, providing a demo of running an secured AI development environment using Docker’s sandboxes and MCP Toolkit

Giving Kiro safety guardrails

Docker Sandboxes provide the ability to run an agent inside an isolated environment using containers. In this environment, the agent has no access to credentials stored on the host and can only access the files of the specified project directory.

As an example, I have some demo AWS credentials on my machine:

> cat ~/.aws/credentials
[default]
aws_access_key_id=demo_access_key
aws_secret_access_key=demo_secret_key

Now, I’m going to clone the Catalog Service demo project and start a sandbox using Kiro:

git clone https://github.com/dockersamples/catalog-service-node.git
cd catalog-service-node
docker sandbox run --mount-docker-socket kiro

The --mount-docker-socket flag is added to give the sandbox the Docker socket, which will allow the agent to run my integration tests that use Testcontainers.

On the first launch, I will be required to authenticate. After that’s done, I will ask Kiro to tell me about the AWS credentials it has access to:

     ⢀⣴⣶⣶⣦⡀⠀⠀⠀⢀⣴⣶⣦⣄⡀⠀⠀⢀⣴⣶⣶⣦⡀⠀⠀⢀⣴⣶⣶⣶⣶⣶⣶⣶⣶⣶⣦⣄⡀⠀⠀⠀⠀⠀⠀⢀⣠⣴⣶⣶⣶⣶⣶⣦⣄⡀⠀⠀⠀
    ⢰⣿⠋⠁⠈⠙⣿⡆⠀⢀⣾⡿⠁⠀⠈⢻⡆⢰⣿⠋⠁⠈⠙⣿⡆⢰⣿⠋⠁⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⠻⣦⠀⠀⠀⠀⣴⡿⠟⠋⠁⠀⠀⠀⠈⠙⠻⢿⣦⠀⠀
    ⢸⣿⠀⠀⠀⠀⣿⣇⣴⡿⠋⠀⠀⠀⢀⣼⠇⢸⣿⠀⠀⠀⠀⣿⡇⢸⣿⠀⠀⠀⢠⣤⣤⣤⣤⣄⠀⠀⠀⠀⣿⡆⠀⠀⣼⡟⠀⠀⠀⠀⣀⣀⣀⠀⠀⠀⠀⢻⣧⠀
    ⢸⣿⠀⠀⠀⠀⣿⡿⠋⠀⠀⠀⢀⣾⡿⠁⠀⢸⣿⠀⠀⠀⠀⣿⡇⢸⣿⠀⠀⠀⢸⣿⠉⠉⠉⣿⡇⠀⠀⠀⣿⡇⠀⣼⡟⠀⠀⠀⣰⡿⠟⠛⠻⢿⣆⠀⠀⠀⢻⣧
    ⢸⣿⠀⠀⠀⠀⠙⠁⠀⠀⢀⣼⡟⠁⠀⠀⠀⢸⣿⠀⠀⠀⠀⣿⡇⢸⣿⠀⠀⠀⢸⣿⣶⣶⡶⠋⠀⠀⠀⠀⣿⠇⢰⣿⠀⠀⠀⢰⣿⠀⠀⠀⠀⠀⣿⡆⠀⠀⠀⣿⡆
    ⢸⣿⠀⠀⠀⠀⠀⠀⠀⠀⠹⣷⡀⠀⠀⠀⠀⢸⣿⠀⠀⠀⠀⣿⡇⢸⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⣼⠟⠀⢸⣿⠀⠀⠀⢸⣿⠀⠀⠀⠀⠀⣿⡇⠀⠀⠀⣿⡇
    ⢸⣿⠀⠀⠀⠀⠀⣠⡀⠀⠀⠹⣷⡄⠀⠀⠀⢸⣿⠀⠀⠀⠀⣿⡇⢸⣿⠀⠀⠀⠀⣤⣄⠀⠀⠀⠀⠹⣿⡅⠀⠀⠸⣿⠀⠀⠀⠸⣿⠀⠀⠀⠀⠀⣿⠇⠀⠀⠀⣿⠇
    ⢸⣿⠀⠀⠀⠀⣾⡟⣷⡀⠀⠀⠘⣿⣆⠀⠀⢸⣿⠀⠀⠀⠀⣿⡇⢸⣿⠀⠀⠀⠀⣿⡟⣷⡀⠀⠀⠀⠘⣿⣆⠀⠀⢻⣧⠀⠀⠀⠹⣷⣦⣤⣤⣾⠏⠀⠀⠀⣼⡟
    ⢸⣿⠀⠀⠀⠀⣿⡇⠹⣷⡀⠀⠀⠈⢻⡇⠀⢸⣿⠀⠀⠀⠀⣿⡇⢸⣿⠀⠀⠀⠀⣿⡇⠹⣷⡀⠀⠀⠀⠈⢻⡇⠀⠀⢻⣧⠀⠀⠀⠀⠉⠉⠉⠀⠀⠀⠀⣼⡟
    ⠸⣿⣄⡀⢀⣠⣿⠇⠀⠙⣷⡀⠀⢀⣼⠇⠀⠸⣿⣄⡀⢀⣠⣿⠇⠸⣿⣄⡀⢀⣠⣿⠇⠀⠙⣷⡀⠀⠀⢀⣼⠇⠀⠀⠀⠻⣷⣦⣄⡀⠀⠀⠀⢀⣠⣴⣾⠟
    ⠀⠈⠻⠿⠿⠟⠁⠀⠀⠀⠈⠻⠿⠿⠟⠁⠀⠀⠈⠻⠿⠿⠟⠁⠀⠀⠈⠻⠿⠿⠟⠁⠀⠀⠀⠈⠻⠿⠿⠟⠁⠀⠀⠀⠀⠀⠈⠙⠻⠿⠿⠿⠿⠟⠋⠁
Model: Auto (/model to change) | Plan: KIRO FREE (/usage for more detail)

!> Tell me about the AWS credentials you have access to

From here, Kiro will search the typical places AWS credentials are configured. But, finally, it reaches the following conclusion:

Currently, there are no AWS credentials configured on your system

And why is this? The credentials on the host are not accessible inside the sandbox environment. The agent is in the isolated environment and only has access to the current project directory.

Giving Kiro secure tools with the MCP Toolkit

If we take a step back and think about it, the only credential an agent should have access to is to authenticate with the model provider. All other credentials belong to the tools (or MCP servers) around the agent.

And that’s where the MCP Toolkit comes in!

Sandboxes don’t yet have an automatic way to connect to the MCP Toolkit (it’s coming soon!). Until that’s available I will start a MCP Gateway with the following command:

docker mcp gateway run --transport=streaming

There are a variety of ways to configure Kiro with MCP servers, but the project-level configuration provides an easy way that also works with sandboxes.

In the project, I will create a .kiro/settings/mcp.json file with the following contents:

{
  "mcpServers": {
    "docker-mcp-toolkit": {
      "type": "http",
      "url": "http://host.docker.internal:8811/"
    }
  }
}

After restarting Kiro, I can ask it about the available tools:

/tools

The output then shows the following tools coming from the MCP Gateway:

docker-mcp-toolkit (MCP)
- code-mode             trusted
- mcp-add               trusted
- mcp-config-set        trusted
- mcp-create-profile    trusted
- mcp-exec              trusted
- mcp-find              trusted
- mcp-remove            trusted

These tools highlight the ability to dynamically add and remove MCP servers through the Gateway.

By using an AGENTS.md file, I can provide to the agent on which MCP servers to use and give an additional nudge to remove those servers when they’re no longer needed (which will reduce my context size and costs). This is what my file looks like:

# Special instructions

- When info is needed from GitHub, use the `github-official` MCP server. This will ensure proper auth tokens are used.
- When adding MCP servers, be sure to activate all tools.
- When you have the required info from any MCP server, remove the MCP server to reduce the number of tools in the context.

Before Kiro can use the GitHub MCP server, I need to ensure I’ve configured the MCP server with my OAuth credentials.

Screenshot of the Docker Desktop dashboard showing the GitHub Official MCP server listing within the MCP Toolkit. Specifically, it shows the OAuth configuration has been completed.

Putting it all together

With all of the setup completed, now I’m ready to have Kiro do some work for me. I’ll simply ask it to complete one of the issues on GitHub. I can simply ask:

> Can you help me complete the work for issue #64?

Watching the output, I’ll first see Kiro add the github-official MCP server by using the mcp-add tool.

Running tool mcp-add with the param (from mcp server: docker-mcp-toolkit)
 ⋮  {
 ⋮    "name": "github-official",
 ⋮    "activate": true
 ⋮  }
 - Completed in 0.609s

From there, I’ll see that Kiro will check the git remotes to determine the GitHub organization and repository being used:

> I need to determine the repository owner and name. Let me check the current directory for git information:
I will run the following command: git remote -v (using tool: shell)
Purpose: Get git remote information to identify the repository

origin  https://github.com/dockersamples/catalog-service-node.git (fetch)
origin  https://github.com/dockersamples/catalog-service-node.git (push)
 - Completed in 0.6s

Then, it will use the GitHub MCP server to get the issue details:

> Now I can get the issue details:
Running tool issue_read with the param (from mcp server: docker-mcp-toolkit)
 ⋮  {
 ⋮    "method": "get",
 ⋮    "owner": "dockersamples",
 ⋮    "repo": "catalog-service-node",
 ⋮    "issue_number": 64
 ⋮  }
 - Completed in 0.613s

> Perfect! Issue #64 is about adding product metadata to S3 objects.

From here, Kiro will go and make the changes to my source code, update the integration tests, and even run the integration tests to ensure it all works. After it’s completed, I have the following output:

I've successfully completed the work for issue #64. Here's what was implemented:

Changes Made:

1. StorageService.js: 
   - Added Tagging parameter to S3 PutObjectCommand with product_id=${id}
   - Added Metadata parameter with product_name: productName
   - Added new AWS SDK imports for GetObjectTaggingCommand and HeadObjectCommand
   - Created helper functions getObjectTags() and getObjectMetadata()

2. ProductService.js:
   - Modified uploadProductImage() to fetch product name from database and pass it to uploadFile()
   - Exposed new functions getProductImageTags() and getProductImageMetadata()

3. Integration Test:
   - Added validation to verify S3 objects have correct tags (product_id with the product ID)
   - Added validation to verify S3 objects have correct metadata (product_name with the product name)

Test Results:
All integration tests pass, confirming that:
- S3 objects are tagged with the product identifier
- S3 objects include metadata with the product name
- The existing functionality remains intact

And notably, right before that, I see that Kiro automatically removed the GitHub MCP server to remove its tools from my context:

Let me clean up by removing the GitHub MCP server since we're done with it:
Running tool mcp-remove with the param (from mcp server: docker-mcp-toolkit)
 ⋮  {
 ⋮    "name": "github-official"
 ⋮  }
 - Completed in 0.2s

With those tools removed from the context, the model has less to tokenize and process which means faster responses and less cost.

Highlighting what’s important

Taking a step back at what we ran, we have the following:

An agent in an isolated environment. With the agent running in a container, it’s unable to access and leak credentials stored on my host machine. And rogue requests to delete my filesystem are limited to the containerized environment where it’s running as a non-root user.
Isolated and containerized MCP servers. Each MCP server runs in its isolated container, preventing host access. In addition, I don’t have to spend any time worrying about runtime environments or configuration. With a container, “it just works!”
API credentials only where they’re needed. The only component that needs access to my GitHub credential is the GitHub MCP server, where it is securely injected. This approach further prevents potential leaks and exposures.

In other words, we have a microserviced architecture where each component runs in its own container and follows least privilege by having access to only the things it needs access to.

Looking forward

Here at Docker, we’re quite excited about this architecture and there’s still a lot to do. Two items I’m excited about include:

A network boundary for agentic workloads. This boundary would limit network access to only authorized hostnames. Then, if a prompt injection tries to send sensitive information to evildomain.com, that request is blocked.
Governance and control for organizations. With this, your organization can authorize the MCP servers that are used and even create its own custom catalogs and rule sets.

If you want to try out Sandboxes, you can do so by enabling the Experimental Feature in Docker Desktop 4.50+. We’d love to hear your feedback and thoughts!

Learn more

Docker Sandboxes: Simplifies running AI agents securely on your local machine
Explore the MCP Catalog: Discover containerized, security-hardened MCP servers.
Get started with the MCP Toolkit: Run MCP servers easily and securely.

Breaking Free From AI Vendor Lock-in: Integrating GitHub Models with Docker cagent

Jennifer Kohl — Thu, 11 Dec 2025 14:00:00 +0000

The landscape of AI development is rapidly evolving, and one of the most exciting developments in 2025 from Docker is the release of Docker cagent. cagent is Docker’s open-source multi-agent runtime that orchestrates AI agents through declarative YAML configuration. Rather than managing Python environments, SDK versions, and orchestration logic, developers define agent behavior in a single configuration file and execute it with “cagent run”.

In this article, we’ll explore how cagent’s integration with GitHub Models delivers true vendor independence, demonstrate building a real-world podcast generation agent that leverages multiple specialized sub-agents, and show you how to package and distribute your AI agents through Docker Hub. By the end, you’ll understand how to break free from vendor lock-in and build AI agent systems that remain flexible, cost-effective, and production-ready throughout their entire lifecycle.

What is Docker cagent?

cagent is Docker’s open-source multi-agent runtime that orchestrates AI agents through declarative YAML configuration. Rather than managing Python environments, SDK versions, and orchestration logic, developers define agent behavior in a single configuration file and execute it with “cagent run”.

Some of the key features of Docker cagents:

Declarative YAML Configuration: single-file agent definitions with model configuration, clear instructions, tool access, and delegation rules to interact and coordinate with sub-agents
Multi-Provider Support: OpenAI, Anthropic, Google Gemini, and Docker Model Runner (DMR) for local inference.
MCP Integration support: Leverage MCP (Stdio, HTTP, SSE) for connecting external tools and services
Secured Registry Distribution: Package and share agents securely via Docker Hub using standard container registry infrastructure.
Built-In Reasoning Tools: “think’, “todo” and “memory” capabilities for complex problem solving workflows.

The core value proposition is simple: declare what your agent should do, and cagent handles your execution. Each agent operates with isolated context, specialized tools via the Model Context Protocol (MCP), and configurable models. Agents can delegate tasks to sub-agents, creating hierarchical teams that mirror human organizational structures.

What are GitHub Models?

GitHub Models is a suite of developer tools that take you from AI idea to deployment, including a model catalog, prompt management, and quantitative evaluations.GitHub Models provides rate-limited free access to production-grade language models from OpenAI (GPT-4o, GPT-5, o1-preview), Meta (Llama 3.1, Llama 3.2), Microsoft (Phi-3.5), and DeepSeek models.The advantage with GitHub Models are you need to Authenticate only once via GitHub Personal Access Tokens and you can plug and play any models of your choice supported by GitHub Models.

You can browse to GitHub Marketplace at https://github.com/marketplace to see the list of all models supported. Currently GitHub supports all the popular models and the list continues to grow. Recently, Anthropic Claude models were also added.

Figure 1.1: GitHub Marketplace displaying list of all models available on the platform

GitHub has designed its platform, including GitHub Models and GitHub Copilot agents, to support production-level agentic AI workflows, offering the necessary infrastructure, governance, and integration points.GitHub Models employs a number of content filters. These filters cannot be turned off as part of the GitHub Models experience. If you decide to employ models through Azure AI or a paid service, please configure your content filters to meet your requirements.

To get started with GitHub Models, visit https://docs.github.com/en/github-models/quickstart which contains detailed quick start guides.

Configuring cagent with GitHub Models

GitHub Models OpenAI-compatible API allows straightforward integration with cagent by treating it as a custom OpenAI provider with modified base URL and authentication.

In this article, we will create and deploy a PodCast Generator agent using Github models and show you how easy it is to deploy and share AI agents by deploying it to Docker Hub registry. It is necessary to create a fine-grained personal access token by navigating to this url: https://github.com/settings/personal-access-tokens/new

Figure 1.2: Generating a new personal access token (PAT) from GitHub developer settings.

Prerequisites

Docker Desktop 4.49+ with MCP Toolkit enabled
GitHub Personal Access Token with models scope
Download cagent binary from https://github.com/docker/cagent repository. Place it inside the folder C:\Dockercagent. Run .\cagent-exe –help to see more options.

Define your agent

I will showcase a simple podcast generator agent, which I created months ago during my testing of Docker cagent. This Agent’s purpose is to generate podcasts by sharing blogs/articles/youtube videos.

Below Podcastgenerator yaml file describes a sophisticated multi-agent workflow for automated podcast production, leveraging GitHub Models and MCP tools (DuckDuckGo) for external data access. The DuckDuckGo MCP server runs in an isolated Docker container managed by the MCP gateway. To learn more about docker MCP server and MCP Gateway refer to official product documentation at https://docs.docker.com/ai/mcp-catalog-and-toolkit/mcp-gateway/.
The root agent uses sub_agents: [“researcher”, “scriptwriter”] to create a hierarchical structure where specialized agents handle domain-specific tasks.

sunnynagavo55_podcastgenerator.yaml

#!/usr/bin/env cagent run

agents:
  root:
    description: "Podcast Director - Orchestrates the entire podcast creation workflow and generates text file"
    instruction: |
     You are the Podcast Director responsible for coordinating the entire podcast creation process.

      Your workflow:
      1. Analyze input requirements (topic, length, style, target audience)
      2. Delegate research to the research agent which can open duck duck go browser for researching
      3. Pass the researched information to the scriptwriter for script creation
      4. Output is generated as a text file which can be saved to file or printed out
      5. Ensure quality control throughout the process

      Always maintain a professional, engaging tone and ensure the final podcast meets broadcast standards.
    model: github-model
    toolsets:
      - type: mcp
        command: docker
        args: ["mcp", "gateway", "run", "--servers=duckduckgo"]
    sub_agents: ["researcher", "scriptwriter"]
  researcher:
    model: github-model
    description: "Podcast Researcher - Gathers comprehensive information for podcast content"
    instruction: |
      You are an expert podcast researcher who gathers comprehensive, accurate, and engaging information.

      Your responsibilities:
      - Research the given topic thoroughly using web search
      - Find current news, trends, and expert opinions
      - Gather supporting statistics, quotes, and examples
      - Identify interesting angles and story hooks
      - Create detailed research briefs with sources
      - Fact-check information for accuracy

      Always provide well-sourced, current, and engaging research that will make for compelling podcast content.
    toolsets:
        - type: mcp
          command: docker
          args: ["mcp", "gateway", "run", "--servers=duckduckgo"]
  scriptwriter:
    model: github-model
    description: "Podcast Scriptwriter - Creates engaging, professional podcast scripts"
    instruction: |
      You are a professional podcast scriptwriter who creates compelling, conversational content.

      Your expertise:
      - Transform research into engaging conversational scripts
      - Create natural dialogue and smooth transitions
      - Add hooks, sound bite moments, and calls-to-action
      - Structure content with clear intro, body, and outro
      - Include timing cues and production notes
      - Adapt tone for target audience and podcast style
      - Create multiple format options (interview, solo, panel discussion)

      Write scripts that sound natural when spoken and keep listeners engaged throughout.
    toolsets:
      - type: mcp
        command: docker
        args: ["mcp", "gateway", "run", "--servers=filesystem"]
models:
  github-model:
    provider: openai
    model: openai/gpt-5
    base_url: https://models.github.ai/inference
    env:
      OPENAI_API_KEY: ${GITHUB_TOKEN}

Note: Since we are using DuckDuckGo MCP server, make sure to add and install this MCP server from MCP catalog on your docker desktop

Running your Agent on Local Machine

Make sure to update your GitHub PAT token and Run the below command to run your agent from the root folder where your cagent binaries reside.

cagent run ./sunnynagavo55_podcastgenerator.yaml

Pushing your Agent as Docker Image

Run the below command to push your agent as a docker image to your favorite registry to share it with your team.

cagent push Sunnynagavo55/Podcastgenerator

You can see your published images inside your repositories as shown below.

Congratulations! Now we have our first AI Agent created using cagent and deployed to Docker Hub.

Pulling your Agent as Docker Image on a different machine

Run the below command to pull your docker image agent, created by your teammate, which gets the agent yaml file and saves it in the current directory.

cagent pull Sunnynagavo55/Podcastgenerator

Alternatively, you can run the same agent directly without pulling the image by using the below command.

cagent run Sunnynagavo55/Podcastgenerator

Note: Above Podcastgenerator example agent has been added to Docker/cagent GitHub repository under examples folder. Give it a try and share your experience. https://github.com/docker/cagent/blob/main/examples/podcastgenerator_githubmodel.yaml

Conclusion

The traditional AI development workflow locks you into specific providers, requiring separate API keys, managing multiple billing accounts, and navigating vendor-specific SDKs. cagent with GitHub Models fundamentally changes this equation by combining Docker’s declarative agent framework with GitHub’s unified model marketplace. This integration grants you true vendor independence—a single GitHub Personal Access Token provides access to models from OpenAI, Meta, Microsoft, Anthropic, and DeepSeek, eliminating the friction of managing multiple credentials and authentication schemes.

The future of AI development isn’t about choosing a vendor and committing to their ecosystem. Instead, it’s about building systems flexible enough to adapt as the landscape evolves, new models emerge, and your business requirements change. cagent and GitHub Models make that architectural freedom possible today.

What are you waiting for? Start building now with the power of cagent and GitHub Models and share your story with us.

Resources

To learn more about docker cagent, read the product documentation from https://docs.docker.com/ai/cagent/

For more information about cagent, see the GitHub repository. Give this repository a star and let us know what you build.

You Want Microservices, But Do You Really Need Them?

Simeon Ratliff — Fri, 28 Nov 2025 14:00:00 +0000

Do you know who managed to cut costs by a staggering 90% by abandoning microservices for a monolith in May 2023? Not a cash-strapped startup or an indie project—Amazon itself, for its Prime Video service. The same AWS that earns billions every year by selling microservices infrastructure admitted that, sometimes, a good old monolith wins.

This reversal from the company that practically wrote the playbook on distributed systems sent shockwaves through the cloud-native community. Amazon later removed the original blog post, but the internet never forgets, as you’ll see later.

I’ve been speaking up against unnecessary or premature use of microservices architecture for five, six years now. After Amazon Prime Video went back to a monolith, I came across several eminent architects who are also speaking against microservices as default.

And yet in most tech circles, microservices are still viewed as the only way to build modern software. They dominate conferences, blogs, and job listings. Teams adopt them not because their requirements justify it, but because it feels like the obvious (and résumé-boosting) choice. “Cloud-native” has become synonymous with “microservices-by-default”, as if other approaches are as obsolete as floppy disks.

Microservices do solve real problems, but at a massive scale. Most teams don’t actually operate at that scale.

With this article, I urge you to reflect on the question the industry has mostly stopped asking: Should microservices be the default choice for building at scale? We’ll look at reversal stories and insights from seasoned architects, and weigh the trade-offs and alternatives. After considering all of this, you can decide whether your problem really needs a constellation of microservices.

Microservices: The Agility-Complexity Trade-Off

On paper, microservices look impressive. Instead of one big monolith, you split your application into many small services. Each one can be written in any language, owned by a small team, and deployed on its own schedule. If you need more capacity, you can scale only the part that’s under load. The promise is elegant: independent deployability, autonomous teams, multi-language stacks, and elastic scaling.

But the catch is that every split creates a seam, and every seam is a potential failure point. Inside a monolith, function calls are instant and predictable. Across services, those same calls become network requests: slower, failure-prone, sometimes returning inconsistent data. With dozens (or hundreds) of services, you need version management, schema evolution, distributed transactions, tracing, centralized logging, and heavy-duty CI/CD pipelines just to keep things running.

This Gartner diagram captures the trade-off perfectly: microservices exchange the simplicity of one codebase for the complexity of many.

At a massive scale (think Netflix), that trade-off may be worth it. But when operational benefits don’t outweigh the costs, teams end up paying a steep price in debugging, coordination, and glue code just to hold their product together.

Microservices make sense in very specific scenarios where distinct business capabilities need independent scaling and deployment. For example, payment processing (security-critical, rarely updated) differs fundamentally from recommendation engine (memory-intensive, constantly A/B tested). These components have different scaling patterns, deployment cycles, and risk profiles, which justify separate services.

The success of microservices hinges on clear business domain boundaries that match your team structure, as Conway’s Law predicts. If your organization naturally splits into autonomous teams that own distinct capabilities, microservices might work. (So, most “one-and-a-half pizza” startups don’t qualify, do they?)

That’s why microservices work effectively for companies like Amazon and Uber—although not always.

In fact, most organizations lack the prerequisites: dedicated service ownership, mature CI/CD, robust monitoring, and crucially, scale that justifies the operational overhead. Startups that adapt microservices prematurely often regret their decision.

So ask yourself:

Are you using microservices to solve an independent scaling problem, or are you inviting more complexity than your solution needs?

The Great Microservices Reversal

Ironically, even though tech giants are the ones that are most likely to benefit from microservices, many of these very same companies are walking back their microservices architectures, and the results are eye-opening.

Amazon Prime Video: 90% Cost Reduction with a Monolith

In May 2023, Amazon engineers admitted the unthinkable: Prime Video had abandoned microservices for a monolith. Their Video Quality Analysis (VQA) team had built what looked like a textbook distributed system: AWS Step Functions and Lambda monitored thousands of video streams through independent, scalable components. On paper, it was serverless perfection.

In practice, it was a disaster. “We realized that distributed approach wasn’t bringing a lot of benefits in our specific use case,” said Marcin Kolny in the now-archived Prime Video Engineering blog. Their “infinitely scalable” system crumbled at just 5% of expected load due to orchestration overhead.

The fix was embarrassingly simple: collapse everything into a single process. It resulted in 90% lower costs and faster performance.

Twilio Segment: From 140 Services to One Fast Monolith

Back in 2018, Twilio Segment, a customer data platform, documented a similar reversal in their brutally honest post “Goodbye Microservices”.

Their system had sprawled into 140+ services, creating operational chaos. At one point, three full-time engineers spent most of their time firefighting instead of building. As they admitted, “Instead of enabling us to move faster, the small team found themselves mired in exploding complexity. Essential benefits of this architecture became burdens. As our velocity plummeted, our defect rate exploded.”

Their solution was radical but effective: collapse all 140+ services into a single monolith. The impact was immediate. Test suites that once took an hour now finished in milliseconds. Developer productivity soared: they shipped 46 improvements to shared libraries in a year, up from 32 in the microservices era.

Shopify: Sanity over Hype

Shopify runs one of the largest Ruby on Rails codebases in the world (2.8M+ lines). Instead of chasing microservices, they deliberately chose a modular monolith: a single codebase with clear component boundaries.

Shopify’s engineers concluded that “microservices would bring their own set of challenges”, so they chose modularity without the operational overhead.

All these examples beg the question:

If even the pioneers of microservices are retreating, why are we still treating it as gospel?

Expert Voices against Microservices Mania

Some of the most respected voices in software architecture—people behind many of the systems we all admire—are also cautioning against microservices and repeating mistakes they’ve seen play out at scale. (After all, cheerleaders don’t play the game; cloud DevRels rarely build at scale.)

Rails Creator: Simplicity over Sophistication

David Heinemeier Hansson (DHH), the creator of Ruby on Rails, has long advocated simplicity over architectural trends. His analysis of the Amazon Prime Video reversal puts it bluntly:

“The real-world results of all this theory are finally in, and it’s clear that in practice, microservices pose perhaps the biggest siren song for needlessly complicating your system.”

DHH’s image of a siren song is apt: microservices promise elegance but leave teams wrecked on the rocks of complexity.

Microservices: Mistake of The Decade?

Jason Warner, former CTO of GitHub, doesn’t mince words while commenting on microservices:

“I’m convinced that one of the biggest architectural mistakes of the past decade was going full microservice.”

Warner understands scale: GitHub runs at internet scale, and he’s led engineering at Heroku and Canonical. His critique cuts deeper because it’s lived experience, beyond theoretical advice:

“90% of all companies in the world could probably just be a monolith running against a primary db cluster with db backups, some caches and proxies and be done with it.”

GraphQL Co-Creator: “Don’t”

Then there’s Nick Schrock, co-creator of GraphQL. If anyone had a reason to cheer for distributed systems, it’d be him. Instead, he says:

“Microservices are such a fundamentally and catastrophically bad idea that there are going to be an entire cohort of multi-billion companies built that do nothing but contain the damage that they have wrought.”

He goes on to describe microservices as organizational gambles:

“[Y]ou end up with these services that you have to maintain forever that match the org structure and the product requirements from five years ago. Today, they don’t make a lot of sense.”

The person who literally built tools to fix distributed system pain says don’t distribute unless you must, maybe it’s time to listen.

The Hidden Costs of Microservices

Microservices demand such caution because of these hidden costs that teams often underestimate.

Operational Costs

A monolith is simple: in-process function calls.

Microservices replace that with networks. Every request now travels across machines, through load balancers, service meshes, and authentication layers, creating more failure points and infrastructure needs. You suddenly need service discovery (how services find each other), distributed tracing (tracking requests across services), centralized logging (aggregating logs from multiple services), and monitoring systems that understand service topology.

Each of these is necessary, but together they’re complex and expensive. Duplicated data requires extra storage. Constant service-to-service calls rack up network egress fees. Cloud costs scale faster than the apps they host. Prime Video’s workflow spent more on orchestrating S3 data transfers between services than on actual processing.

Developer Productivity Drain

In microservices, the hard part isn’t writing code; it’s navigating distributed system interactions.

In “The macro problem with microservices“, Stack Overflow identifies a critical productivity drain: distributed state forces developers to write defensive code that constantly checks for partial failures.

In a monolith, a developer can follow a code path end-to-end within one repo. In microservices, one feature might span four or five repos with different dependencies and deploy cycles. Adding a single field triggers weeks of coordination: you need to update one service, then wait for consumers to adopt, version your APIs, manage rollouts, and so on. Different teams will also typically maintain different microservices using different tech stacks, so there’s a risk that they unintentionally break something as well. Breaking changes that a compiler would catch in a monolith now surface as runtime errors in production.

Testing and Deployment Complexity

Monolith integration and end-to-end tests are faster because they run locally, in memory. Distributed systems don’t allow that luxury: real confidence requires integration and end-to-end tests across numerous service boundaries. So these tests are slower, more brittle, and require staging environments that resemble production, all of which effectively double infrastructure costs and slow feedback loops.

Many teams discover this only after their test suite becomes a bottleneck. Deployment orchestration adds another layer. Rolling updates across interdependent services require careful sequencing to avoid breaking contracts. Version incompatibility disturbs frequently: Service A works with Service B v2.1 but breaks with v2.2.

Failed deployments leave systems partially updated and difficult to recover.

Data Management and Consistency

The most underestimated complexity of microservices lies in data consistency across service boundaries.

Monoliths benefit from ACID transactions: operations complete entirely or fail entirely. Microservices split that across services, forcing you to build distributed saga (multi-step workflows with rollback logic), live with eventual consistency (data only becomes correct after a delay), or write compensation logic (extra code to undo partial failures). What was once a single database transaction now spans network hops, retries, and partial failures. Debugging inconsistent orders or payments gets much harder when state is duplicated across services.

As research confirms, data duplication, correctness challenges, and transactional complexity are the top pain points in microservice systems.

The Compounding Effect

These complexities multiply. Operational overhead makes debugging harder, which slows testing, which makes deployments riskier, which creates more incidents. Microservices don’t just shift complexity from code to operations; they tax every part of your engineering process.

Unless your scale demands it, that tax often outweighs the benefits.

Think about it:

If every network hop adds complexity and cost, does your use case really justify the price?

Beyond Microservices: Smarter Architectural Alternatives

Before defaulting to microservices, it’s worth considering how simpler, well-structured architectures can deliver comparable scalability without the distributed complexity tax. Two noteworthy alternatives are modular monoliths and service-oriented architectures.

Modular Monoliths: Structure without Distribution

Unlike traditional monoliths that become tangled messes, modular monoliths enforce strict internal boundaries through clear module APIs and disciplined separation. Each module exposes well-defined interfaces, enabling teams to work independently while deploying a single, coherent system.

As Kent Beck explains in “Monolith -> Services: Theory & Practice”, modular monoliths manage coupling through organizational discipline rather than distributed networks. The key difference: modules still communicate via explicit contracts like microservices, but they use fast, reliable function calls instead of HTTP requests that are vulnerable to network latency and partial failures.

Why does it work?

Simpler operations: microservices-level organization with monolithic simplicity
Stronger consistency: full ACID transactions
Easier debugging: one traceable system, no hunting for bugs in the ELK haystack
Better performance: function calls beat network hops

Here’s some real-world proof: Shopify’s 2.8 million-line codebase handles 30TB per minute with separate teams owning distinct modules, yet everything deploys together. Facebook runs similarly. (And principal architect Keith Adams jokes that if you want to be talked out of microservices, he’s your guy.)

With recent developments in frameworks like Spring Modulith, Django, Laravel, and Rails (as seen at scale with Shopify), modular monoliths are poised to gain wider traction in the years ahead.

Service-Oriented Architecture: The Middle Ground

Service-oriented architecture (SOA) sits between monoliths and microservices, favoring larger, domain-driven services instead of dozens or hundreds of tiny ones. These services often communicate via an enterprise service bus (ESB), which reduces orchestration overhead while preserving separation of concerns.

Instead of splitting authentication, user preferences, and notifications into separate microservices, SOA might combine them into a single “User Service”, simplifying coordination while preserving autonomy and targeted scaling. SOA provides enterprise-grade modularity without ultra-fine-grained distribution overhead.

Here’s why it works:

Right-sized boundaries: fewer, domain-aligned services instead of sprawl
Targeted scalability: scale services tied to real business domains
Pragmatic complexity: avoids ultra-fine-grained overhead while retaining modular reasoning

SOA has also been proven to work at scale. Norwegian Air Shuttle, Europe’s 9th-largest airline, used SOA to boost agility across complex flight operations. Credit Suisse’s SOA rollout powered millions of service calls per day back in the early 2000s.

Choosing Wisely: Fit over Hype

The problem you’re solving should justify your architecture.

I often use this analogy in consulting: You don’t need a sword to cut a lemon—a knife suffices. And as timeless wisdom reminds us, simplicity is the ultimate sophistication.

In all likelihood, you’re not Google (you don’t need Google-level fault tolerance), or Amazon (you don’t need massive write availability), or LinkedIn (you don’t handle billions of events a day). Most applications don’t operate at that scale, demanding fundamentally different solutions than ultra-distributed architectures.

For most systems, well-structured modular monoliths (for most common applications, including startups) or SOA (enterprises) deliver comparable scalability and resilience as microservices, without the distributed complexity tax. Alternatively, you may also consider well-sized services (macroservices, or what Gartner proposed as miniservices) instead of tons of microservices.

It’s worth asking:

If simpler architectures can deliver comparable scalability, why are you choosing the complexity of microservices?

Docker: Built for Any Architecture

Docker isn’t just for microservices—it works great across all kinds of architectures like monoliths, SOA, APIs, and event-driven systems. The real benefit is that Docker gives you consistent performance, easier deployment, and flexibility to scale up your apps no matter what architectural approach you’re using.

Docker packages applications cleanly, keeps environments consistent from laptop to production, simplifies dependency management, and isolates applications from the host system. A Dockerized monolith offers all these benefits, minus the orchestration overhead of microservices.

Microsoft’s guidance on containerizing monoliths clarifies that scaling containers is “far faster and easier than deploying additional VMs”, whether you run one service or fifty. Twilio Segment observed that containerized monoliths can “horizontally scale your environment easily by spinning up more containers and shutting them down when demand subsides.” For many applications, scaling the whole app is exactly what’s needed.

As for DevOps, a monolith in Docker is lighter to operate than a full-blown microservices setup. Logging aggregation becomes simpler when you’re collecting from identical containers rather than disparate services with different formats. Monitoring and debugging remain centralized, and troubleshooting avoids tracing requests across service boundaries.

So, it’s definitely worth considering:

Even without the complexity of microservices, Docker gives you the same advantages — clean deployments, easy scaling, and consistent environments. So why not keep it?

Wrapping Up

A few years ago, my then-8-year-old wanted a bicycle. He’d mostly ride around our apartment complex, maybe venture into the nearby lane. He didn’t need 21 gears, but those shiny shifters had him smitten—imagine riding faster by changing those gears! He absolutely wanted that mechanically complex beauty. (It’s hard to argue with a starry-eyed kid… or a founder :P).

Once he started riding the new bike, the gears slipped, the chain jammed, and the bicycle spent more time broken than on the road. Eventually, we had to dump it.

I wasn’t able to convince him back then that a simpler bicycle could’ve served him better, but maybe this article will convince a few grown-ups making architectural decisions.

We techies love indulging in complex systems. (Check: were you already thinking, What’s complex about bicycles with gears??) But the more moving parts you add, the more often they break. Complexity often creates more problems than it solves.

The point I’m making isn’t to dump microservices entirely—it’s to pick an architecture that fits your actual needs, not what the cloud giant is pushing (while quietly rolling back their own commit). Most likely, modular monoliths or well-designed SOA will serve your needs better and make your team more productive.

So here’s the million-dollar question:

Will you design for cloud-native hype or for your own business requirements?

Do you really need microservices?