AI Agent – Docker https://www.docker.com Thu, 12 Mar 2026 12:58:17 +0000 en-US hourly 1 https://wordpress.org/?v=6.9 https://www.docker.com/app/uploads/2024/02/cropped-docker-logo-favicon-32x32.png AI Agent – Docker https://www.docker.com 32 32 Building AI Teams: How Docker Sandboxes and Docker Agent Transform Development https://www.docker.com/blog/building-ai-teams-docker-sandboxes-agent/ Wed, 11 Mar 2026 13:00:00 +0000 https://www.docker.com/?p=86679 It’s 11 PM. You’ve got a JIRA ticket open, an IDE with three unsaved files, a browser tab on Stack Overflow, and another on documentation. You’re context-switching between designing UI, writing backend APIs, fixing bugs, and running tests. You’re wearing all the hats, product manager, designer, engineer, QA specialist, and it’s exhausting.

What if instead of doing it all yourself, you could describe the goal and have a team of specialized AI agents handle it for you?

One agent breaks down requirements, another designs the interface, a third builds the backend, a fourth tests it, and a fifth fixes any issues. Each agent focuses on what it does best, working together autonomously while you sip your coffee.That’s not sci-fi, it’s what Agent + Docker Sandboxes delivers today.

image5

What is Docker Agent?

Docker Agent is an open source tool for building teams of specialized AI agents. Instead of prompting one general-purpose model to do everything, you define agents with specific roles that collaborate to solve complex problems.

Here’s a typical dev-team configuration:

agents:
 root:
   model: openai/gpt-5
   description: Product Manager - Leads the development team and coordinates iterations
   instruction: |
     Break user requirements into small iterations. Coordinate designer → frontend → QA.
     - Define feature and acceptance criteria
     - Ensure iterations deliver complete, testable features
     - Prioritize based on value and dependencies
   sub_agents: [designer, awesome_engineer, qa, fixer_engineer]
   toolsets:
     - type: filesystem
     - type: think
     - type: todo
     - type: memory
       path: dev_memory.db
​
 designer:
   model: openai/gpt-5
   description: UI/UX Designer - Creates user interface designs and wireframes
   instruction: |
     Create wireframes and mockups for features. Ensure responsive, accessible designs.
     - Use consistent patterns and modern principles
     - Specify colors, fonts, interactions, and mobile layout
   toolsets:
     - type: filesystem
     - type: think
     - type: memory
       path: dev_memory.db
       
 qa:
   model: openai
   description: QA Specialist - Analyzes errors, stack traces, and code to identify bugs
   instruction: |
     Analyze error logs, stack traces, and code to find bugs. Explain what's wrong and why it's happening.
     - Review test results, error messages, and stack traces
   .......
​
 awesome_engineer:
   model: openai
   description: Awesome Engineer - Implements user interfaces based on designs
   instruction: |
     Implement responsive, accessible UI from designs. Build backend APIs and integrate.
   ..........

 fixer_engineer:
   model: openai
   description: Test Integration Engineer - Fixes test failures and integration issues
   instruction: |
     Fix test failures and integration issues reported by QA.
     - Review bug reports from QA

The root agent acts as product manager, coordinating the team. When a user requests a feature, root delegates to designer for wireframes, then awesome_engineer for implementation, qa for testing, and fixer_engineer for bug fixes. Each agent uses its own model, has its own context, and accesses tools like filesystem, shell, memory, and MCP servers.

Agent Configuration

Each agent is defined with five key attributes:

  • model: The AI model to use (e.g., openai/gpt-5, anthropic/claude-sonnet-4-5). Different agents can use different models optimized for their tasks.
  • description: A concise summary of the agent’s role. This helps Docker Agent understand when to delegate tasks to this agent.
  • instruction: Detailed guidance on what the agent should do. Includes workflows, constraints, and domain-specific knowledge.
  • sub_agents: A list of agents this agent can delegate work to. This creates the team hierarchy.
  • toolsets: The tools available to the agent. Built-in options include filesystem (read/write files), shell (run commands), think (reasoning), todo (task tracking), memory (persistent storage), and mcp (external tool connections).

This configuration system gives you fine-grained control over each agent’s capabilities and how they coordinate with each other.

Why Agent Teams Matter

One agent handling complex work means constant context-switching. Split the work across focused agents instead, each handles what it’s best at. Docker Agent manages the coordination.

The benefits are clear:

  • Specialization: Each agent is optimized for its role (design vs. coding vs. debugging)
  • Parallel execution: Multiple agents can work on different aspects simultaneously
  • Better outcomes: Focused agents produce higher quality work in their domain
  • Maintainability: Clear separation of concerns makes teams easier to debug and iterate
image3

The Problem: Running AI Agents Safely

Agent teams are powerful, but they come with a serious security concern. These agents need to:

  • Read and write files on your system
  • Execute shell commands (npm install, git commit, etc.)
  • Access external APIs and tools
  • Run potentially untrusted code

Giving AI agents full access to your development machine is risky. A misconfigured agent could delete files, leak secrets, or run malicious commands. You need isolation, agents should be powerful but contained.

Traditional virtual machines are too heavy. Chroot jails are fragile. You need something that provides:

  • Strong isolation from your host machine
  • Workspace access so agents can read your project files
  • Familiar experience with the same paths and tools
  • Easy setup without complex networking or configuration

Docker Sandboxes: The Secure Foundation

Docker Sandboxes solves this by providing isolated environments for running AI agents. As of Docker Desktop 4.60+, sandboxes run inside dedicated microVMs, providing a hard security boundary beyond traditional container isolation. When you run docker sandbox run <agent>, Docker creates an isolated microVM workspace that:

  • Mounts your project directory at the same absolute path (on Linux and macOS)
  • Preserves your Git configuration for proper commit attribution
  • Does not inherit environment variables from your current shell session
  • Gives agents full autonomy without compromising your host
  • Provides network isolation with configurable allow/deny lists

Docker Sandboxes now natively supports six agent types: Claude Code, Gemini, Codex, Copilot, Agent, and Kiro (all experimental). Agent can be launched directly as a sandbox agent:

# Run Agent natively in a sandbox
docker sandbox create agent ~/path/to/workspace
docker sandbox run agent ~/path/to/workspace

Or, for more control, use a detached sandbox:

# Create a sandbox
docker sandbox run -d --name my-agent-sandbox claude
​
# Copy agent into the sandbox
docker cp /usr/bin/agent <container-id>:/usr/bin/agent
​
# Run your agent team
docker exec -it <container-id> bash -c "cd /path/to/workspace && agent run dev-team.yaml"

Your workspace /Users/alice/projects/myapp on the host is also /Users/alice/projects/myapp inside the microVM. Error messages, scripts with hard-coded paths, and relative imports all work as expected. But the agent is contained in its own microVM, it can’t access files outside the mounted workspace, and any damage it causes is limited to the sandbox.

image1

Why Docker Sandboxes Matter

The combination of agents and Docker Sandboxes gives you something powerful:

  • Full agent autonomy: Agents can install packages, run tests, make commits, and use tools without constant human oversight
  • Complete safety: Even if an agent makes a mistake, it’s contained within the microVM sandbox
  • Hard security boundary: MicroVM isolation goes beyond containers, each sandbox runs in its own virtual machine
  • Network control: Allow/deny lists let you restrict which external services agents can access
  • Familiar experience: Same paths, same tools, same workflow as working directly on your machine
  • Workspace persistence: Changes sync between host and microVM, so your work is always available

Here’s how the workflow looks in practice:

  1. User requests a feature to the root agent: “Create a bank app with Gradio”
  2. Root creates a todo list and delegates to the designer
  3. Designer generates wireframes and UI specifications
  4. Awesome_engineer implements the code, running pip install gradio and python app/main.py
  5. QA runs tests, finds bugs, and reports them
  6. Fixer_engineer resolves the issues
  7. Root confirms all tests pass and marks the feature complete

All of this happens autonomously inside a sandboxed environment. The agents can install dependencies, modify files, and execute commands, but they’re isolated from your host machine.

image2 1

Try It Yourself

Let’s walk through setting up a simple agent team in a Docker Sandbox.

Prerequisites

  • Docker Desktop 4.60+ with sandbox support (microVM-based isolation)
  • agent (included in Docker Desktop 4.49+)
  • API key for your model provider (Anthropic, OpenAI, or Google)

Step 1: Create Your Agent Team

Save this configuration as dev-team.yaml:

models:
 openai:
   provider: openai
   model: gpt-5
​
agents:
 root:
   model: openai
   description: Product Manager - Leads the development team
   instruction: |
     Break user requirements into small iterations. Coordinate designer → frontend → QA.
   sub_agents: [designer, awesome_engineer, qa]
   toolsets:
     - type: filesystem
     - type: think
     - type: todo
​
 designer:
   model: openai
   description: UI/UX Designer - Creates designs and wireframes
   instruction: |
     Create wireframes and mockups for features. Ensure responsive designs.
   toolsets:
     - type: filesystem
     - type: think
​
 awesome_engineer:
   model: openai
   description: Developer - Implements features
   instruction: |
     Build features based on designs. Write clean, tested code.
   toolsets:
     - type: filesystem
     - type: shell
     - type: think
​
 qa:
   model: openai
   description: QA Specialist - Tests and identifies bugs
   instruction: |
     Test features and identify bugs. Report issues to fixer.
   toolsets:
     - type: filesystem
     - type: think

Step 2: Create a Docker Sandbox

The simplest approach is to use agent as a native sandbox agent:

# Run agent directly in a sandbox (experimental)
docker sandbox run agent ~/path/to/your/workspace

Alternatively, use a detached Claude sandbox for more control:

# Start a detached sandbox
docker sandbox run -d --name my-dev-sandbox claude
​
# Copy agent into the sandbox
which agent  # Find the path on your host
docker cp $(which agent) $(docker sandbox ls --filter name=my-dev-sandbox -q):/usr/bin/agent

Step 3: Set Environment Variables

# Run agent with your API key (passed inline since export doesn't persist across exec calls)
docker exec -it -e OPENAI_API_KEY=your_key_here my-dev-sandbox bash

Step 4: Run Your Agent Team

# Mount your workspace and run agent
docker exec -it my-dev-sandbox bash -c "cd /path/to/your/workspace && agent run dev-team.yaml"

Now you can describe what you want to build, and your agent team will handle the rest:

User: Create a bank application using Python. The bank app should have basic functionality like account savings, show balance, withdraw, add money, etc. Build the UI using Gradio. Create a directory called app, and inside of it, create all of the files needed by the project
​
Agent (root): I'll break this down into iterations and coordinate with the team...

Watch as the designer creates wireframes, the engineer builds the Gradio app, and QA tests it, all autonomously in a secure sandbox.

Final result from a one shot prompt

image2

Step 5: Clean Up

When you’re done:

# Remove the sandbox
docker sandbox rm my-dev-sandbox

Docker enforces one sandbox per workspace. Running docker sandbox run in the same directory reuses the existing container. To change configuration, remove and recreate the sandbox.

Current Limitations

Docker Sandboxes and Docker Agent are evolving rapidly. Here are a few things to know:

  • Docker Sandboxes now supports six agent types natively: Claude Code, Gemini, Codex, Copilot, agent, and Kiro.  All are experimental and breaking changes may occur between Docker Desktop versions.
  • Custom Shell that doesn’t include a pre-installed agent binary. Instead, it provides a clean environment where you can install and configure any agent or tool
  • MicroVM sandboxes require macOS or Windows. Linux users can use legacy container-based sandboxes with Docker Desktop 4.57+
  • API keys may still need manual configuration depending on the agent type
  • Sandbox templates are optimized for certain workflows; custom setups may require additional configuration

Why This Matters Now

AI agents are becoming more capable, but they need infrastructure to run safely and effectively. The combination of agent and Docker Sandboxes addresses this by:

Feature

Traditional Approach

With agent + Docker Sandboxes

Autonomy

Limited – requires constant oversight

High – agents work independently

Security

Risky – agents have host access

Isolated – agents run in microVMs

Specialization

One model does everything

Multiple agents with focused roles

Reproducibility

Inconsistent across machines

MicroVM-isolated, version-controlled

Scalability

Manual coordination

Automated team orchestration

This isn’t just about convenience, it’s about enabling AI agents to do real work in production environments, with the safety guarantees that developers expect.

What’s Next

Conclusion

We’re moving from “prompting AI to write code” to “orchestrating AI teams to build software.” agent gives you the team structure; Docker Sandboxes provides the secure foundation.

The days of wearing every hat as a solo developer are numbered. With specialized AI agents working in isolated containers, you can focus on what matters, designing great software, while your AI team handles the implementation, testing, and iteration.

Try it out. Build your own agent team. Run it in a Docker Sandbox. See what happens when you have a development team at your fingertips, ready to ship features while you grab lunch.

]]>
Run OpenClaw Securely in Docker Sandboxes https://www.docker.com/blog/run-openclaw-securely-in-docker-sandboxes/ Mon, 23 Feb 2026 14:00:00 +0000 https://www.docker.com/?p=85410 Docker Sandboxes is a new primitive in the Docker’s ecosystem that allows you to run AI agents or any other workloads in isolated micro VMs. It provides strong isolation, convenient developer experience and a strong security boundary with a network proxy configurable to deny agents connecting to arbitrary internet hosts. The network proxy will also conveniently inject the API keys, like your ANTHROPIC_API_KEY, or OPENAI_API_KEY in the network proxy so the agent doesn’t have access to them at all and cannot leak them. 

In a previous article I showed how Docker Sandboxes lets you install any tools an AI agent might need, like a JDK for Java projects or some custom CLIs, into a container that’s isolated from the host. Today we’re going a step further: we’ll run OpenClaw, an open-source AI coding agent, on a local model via Docker Model Runner.

No API keys, no cloud costs, fully private. And you can do it in 2-ish commands.

Quick Start

Make sure you have Docker Desktop and that Docker Model Runner is enabled (Settings → Docker Model Runner → Enable), then pull a model:

docker model pull ai/gpt-oss:20B-UD-Q4_K_XL

Now create and run the sandbox:

docker sandbox create --name openclaw -t olegselajev241/openclaw-dmr:latest shell .
docker sandbox network proxy openclaw --allow-host localhost
docker sandbox run openclaw

Inside the sandbox:

~/start-openclaw.sh
Running OpenClaw inside a Docker Sandbox

And that’s it. You’re in OpenClaw’s terminal UI, talking to a local gpt-oss model on your machine. The model runs in Docker Model Runner on your host, and OpenClaw runs completely isolated in the sandbox: it can only read and write files in the workspace you give it, and there’s a network proxy to deny connections to unwanted hosts. 

Cloud models work too

The sandbox proxy will automatically inject API keys from your host environment. If you have ANTHROPIC_API_KEY or OPENAI_API_KEY set, OpenClaw can run cloud models,  just specify them in OpenClaw settings. The proxy takes care of credential injection, so your keys will never be exposed inside the sandbox.

This means you can use free local models for experimentation, then switch to cloud models for serious work all in the same sandbox. With cloud models you don’t even need to allow to proxy to host’s localhost, so don’t run docker sandbox network proxy openclaw --allow-host localhost.

Choose Your Model

The startup script automatically discovers models available in your Docker Model Runner. List them:

~/start-openclaw.sh list

Use a specific model:

~/start-openclaw.sh ai/qwen2.5:7B-Q4_K_M

Any model you’ve pulled with docker model pull is available.

How it works (a bit technical)

The pre-built image (olegselajev241/openclaw-dmr:latest) is based on the shell sandbox template with three additions: Node.js 22, OpenClaw, and a tiny networking bridge.

The bridge is needed because Docker Model Runner runs on your host and binds to localhost:12434. But localhost inside the sandbox means the sandbox itself, not your host. The sandbox does have an HTTP proxy, at host.docker.internal:3128, that can reach host services, and we allow it to reach localhost with docker sandbox network proxy --allow-host localhost.

The problem is OpenClaw is Node.js, and Node.js doesn’t respect HTTP_PROXY environment variables. So we wrote a ~20-line bridge script that OpenClaw connects to at 127.0.0.1:54321, which explicitly forwards requests through the proxy to reach Docker Model Runner on the host:

OpenClaw → bridge (localhost:54321) → proxy (host.docker.internal:3128) → Model Runner (host localhost:12434)

The start-openclaw.sh script starts the bridge, starts OpenClaw’s gateway (with proxy vars cleared so it hits the bridge directly), and runs the TUI.

Build Your Own

Want to customize the image or just see how it works? Here’s the full build process. 

1. Create a base sandbox and install OpenClaw

docker sandbox create --name my-openclaw shell .
docker sandbox network proxy my-openclaw --allow-host localhost
docker sandbox run my-openclaw

Now let’s install OpenClaw in the sandbox:

# Install Node 22 (OpenClaw requires it)
npm install -g n && n 22
hash -r

# Install OpenClaw
npm install -g openclaw@latest

# Run initial setup
openclaw setup

2. Create the Model Runner bridge

This is the magic piece — a tiny Node.js server that forwards requests through the sandbox proxy to Docker Model Runner on your host:

cat > ~/model-runner-bridge.js << 'EOF'
const http = require("http");
const { URL } = require("url");

const PROXY = new URL(process.env.HTTP_PROXY || "http://host.docker.internal:3128");
const TARGET = "localhost:12434";

http.createServer((req, res) => {
  const proxyReq = http.request({
    hostname: PROXY.hostname,
    port: PROXY.port,
    path: "http://" + TARGET + req.url,
    method: req.method,
    headers: { ...req.headers, host: TARGET }
  }, proxyRes => {
    res.writeHead(proxyRes.statusCode, proxyRes.headers);
    proxyRes.pipe(res);
  });
  proxyReq.on("error", e => { res.writeHead(502); res.end(e.message); });
  req.pipe(proxyReq);
}).listen(54321, "127.0.0.1");
EOF

3. Configure OpenClaw to use Docker Model Runner

Now merge the Docker Model Runner provider into OpenClaw’s config:

python3 -c "
import json
p = '$HOME/.openclaw/openclaw.json'
with open(p) as f: cfg = json.load(f)
cfg['models'] = cfg.get('models', {})
cfg['models']['mode'] = 'merge'
cfg['models']['providers'] = cfg['models'].get('providers', {})
cfg['models']['providers']['docker-model-runner'] = {
    'baseUrl': 'http://127.0.0.1:54321/engines/llama.cpp/v1',
    'apiKey': 'not-needed',
    'api': 'openai-completions',
    'models': [{
        'id': 'ai/qwen2.5:7B-Q4_K_M',
        'name': 'Qwen 2.5 7B (Docker Model Runner)',
        'reasoning': False, 'input': ['text'],
        'cost': {'input': 0, 'output': 0, 'cacheRead': 0, 'cacheWrite': 0},
        'contextWindow': 32768, 'maxTokens': 8192
    }]
}
cfg['agents'] = cfg.get('agents', {})
cfg['agents']['defaults'] = cfg['agents'].get('defaults', {})
cfg['agents']['defaults']['model'] = {'primary': 'docker-model-runner/ai/qwen2.5:7B-Q4_K_M'}
cfg['gateway'] = {'mode': 'local'}
with open(p, 'w') as f: json.dump(cfg, f, indent=2)
"

4. Save and share

Exit the sandbox and save it as a reusable image:

docker sandbox save my-openclaw my-openclaw-image:latest

Push it to a registry so anyone can use it:

docker tag my-openclaw-image:latest yourname/my-openclaw:latest
docker push yourname/my-openclaw:latest

Anyone with Docker Desktop (with the modern sandboxes includes) can spin up the same environment with:

docker sandbox create --name openclaw -t yourname/my-openclaw:latest shell .

What’s next

Docker Sandboxes make it easy to run any AI coding agent in an isolated, reproducible environment. With Docker Model Runner, you get a fully local AI coding setup: no cloud dependencies, no API costs, and complete privacy.

Try it out and let us know what you think.

]]>
The Multi-Model Database for AI Agents: Deploy SurrealDB with Docker Extension https://www.docker.com/blog/deploy-surrealdb-docker-desktop-extension/ Tue, 17 Feb 2026 14:00:00 +0000 https://www.docker.com/?p=85256 When it comes to building dynamic and real-work solutions, developers need to stitch multiple databases (relational, document, graph, vector, time-series, search) together and build complex API layers to integrate them. This generates significant complexity, cost, and operational risk, and reduces speed of innovation. More often than not, developers end up focusing on building glue code and managing infrastructure rather than building application logic. For AI use cases, using multiple databases means AI Agents have fragmented data, context and memory, producing bad outputs at high latency.

Enter SurrealDB.

SurrealDB is a multi-model database built in Rust that unifies document, graph, relational, time-series, geospatial, key-value, and vector data into a single engine. Its SQL-like query language, SurrealQL, lets you traverse graphs, perform vector search, and query structured data – all in one statement.

Designed for data-intensive workloads like AI agent memory, knowledge graphs, real-time applications, and edge deployments, SurrealDB runs as a single binary anywhere: embedded in your app, in the browser via WebAssembly, at the edge, or as a distributed cluster.

What problem does SurrealDB solve?

Modern AI systems place very different demands on data infrastructure than traditional applications. SurrealDB addresses these pressures directly:

  • Single runtime for multiple data models – AI systems frequently combine vector search, graph traversal, document storage, real-time state, and relational data in the same request path. SurrealDB supports these models natively in one engine, avoiding brittle cross-database APIs, ETL pipelines, and consistency gaps.
  • Low-latency access to changing context – Voice agents, interactive assistants, and stateful agents are sensitive to both latency and data freshness. SurrealDB’s query model and real-time features serve up-to-date context without polling or background sync jobs.
  • Reduced system complexity – Replacing multiple specialized databases with a single multi-model store reduces services, APIs, and failure modes. This simplifies deployment, debugging, and long-term maintenance.
  • Faster iteration on data-heavy features – Opt in schemas definitions and expressive queries let teams evolve data models alongside AI features without large migrations. This is particularly useful when experimenting with embeddings, relationships, or agent memory structures.
  • Built-in primitives for common AI patterns – Native support for vectors, graphs, and transactional consistency enables RAG, graph-augmented retrieval, recommendation pipelines, and agent state management – without external systems or custom glue code.

In this article, you’ll see how to build a WhatsApp RAG chatbot using SurrealDB Docker Extension. You’ll learn how SurrealDB Docker Extension powers an intelligent WhatsApp chatbot that turns your chat history into searchable, AI-enhanced conversations with vector embeddings and precise source citations.

Understanding SurrealDB Architecture

SurrealDB’s architecture unifies multiple data models within a single database engine, eliminating the need for separate systems and synchronization logic (figure below).

SurrealDB Architecture diagram

Caption: SurrealDB Architecture diagram

Architecture diagram of SurrealDB showing a unified multi-model database with real-time capabilities

Caption: Architecture diagram of SurrealDB showing a unified multi-model database with real-time capabilities. (more information at https://surrealdb.com/docs/surrealdb/introduction/architecture)

With SurrealDB, you can:

  • Model complex relationships using graph traversal syntax (e.g., ->bought_together->product)
  • Store flexible documents alongside structured relational tables
  • Subscribe to real-time changes with LIVE SELECT queries that push updates instantly
  • Ensure data consistency with ACID-compliant transactions across all models

Learn more about SurrealDB’s architecture and key features on the official documentation.

How does Surreal work?

SurrealDB diagram

SurrealDB separates storage from compute, enabling you to scale these independently without the need to manually shard your data.

The query layer (otherwise known as the compute layer) handles queries from the client, analyzing which records need to be selected, created, updated, or deleted.

The storage layer handles the storage of the data for the query layer. By scaling storage nodes, you are able to increase the amount of supported data for each deployment.

SurrealDB supports all the way from single-node to highly scalable fault-tolerant deployments with large amounts of data.

For more information, see https://surrealdb.com/docs/surrealdb/introduction/architecture

Why should you run SurrealDB as a Docker Extension

For developers already using Docker Desktop, running SurrealDB as an extension eliminates friction. There’s no separate installation, no dependency management, no configuration files – just a single click from the Extensions Marketplace.

Docker provides the ideal environment to bundle and run SurrealDB in a lightweight, isolated container. This encapsulation ensures consistent behavior across macOS, Windows, and Linux, so what works on your laptop works identically in staging.

The Docker Desktop Extension includes:

  • Visual query editor with SurrealQL syntax highlighting
  • Real-time data explorer showing live updates as records change
  • Schema visualization for tables and relationships
  • Connection management to switch between local and remote instances
  • Built-in backup/restore for easy data export and import

With Docker Desktop as the only prerequisite, you can go from zero to a running SurrealDB instance in under a minute.

Getting Started

To begin, download and install Docker Desktop on your machine. Then follow these steps:

  1. Open Docker Desktop and select Extensions in the left sidebar
  2. Switch to the Browse tab
  3. In the Filters dropdown, select the Database category
  4. Find SurrealDB and click Install
Docker Desktop browse extensions
SurrealDB in Docker Desktop extensions
Installing SurrealDB in Docker Desktop extensions

Caption: Installing the SurrealDB Extension from Docker Desktop’s Extensions Marketplace.

SurrealDB Docker Extension
SurrealDB Docker Extension manager
SurrealDB Docker Extension help

Real-World Example

Smart Team Communication Assistant

Imagine searching through months of team WhatsApp conversations to answer the question: “What did we decide about the marketing campaign budget?”

Traditional keyword search fails, but RAG with SurrealDB and LangChain solves this by combining semantic vector search with relationship graphs.

This architecture analyzes group chats (WhatsApp, Instagram, Slack) by storing conversations as vector embeddings while simultaneously building a knowledge graph linking conversations through extracted keywords like “budget,” “marketing,” and “decision.” When queried, the system retrieves relevant context using both similarity matching and graph traversal, delivering accurate answers about past discussions, decisions, and action items even when phrased differently than the original conversation.

This project is inspired by Multi-model RAG with LangChain | GitHub Example

1. Clone the repository:

git clone https://github.com/Raveendiran-RR/surrealdb-rag-demo 

2. Enable Docker Model Runner by visiting  Docker Desktop  > Settings > AI

Enable Docker Model Runner

Caption: Enable Docker Model Runner in Docker Desktop > settings > AI

3. Pull llama3.2 model from Docker Hub

Search for llama 3.2 under Models > Docker Hub and pull the right model.

llama3.2 in Docker Hub

Caption:  Pull the Docker model llama3.2

llama3.2 in Docker Hub

4. Download the embeddinggemma model from Docker Hub

embeddinggemma model in DMR

Caption: Click on Models > Search for embeddinggemma > download the model

5. Run this command to connect to the persistent surrealDB container

  • Browse to the directory where you have cloned the repository
  • Create directory “mydata”
mkdir -p mydata

6. Run this command:

docker run -d --name demo_data \
  -p 8002:8000 \
  -v "$(pwd)/mydata:/mydata" \
  surrealdb/surrealdb:latest \
  start --log debug --user root --pass root \
  rocksdb://mydata

Note: use the path based on the operating system. 

  • For windows , use rocksdb://mydata
  • For linux and macOS, use rocksdb:/mydata

7. Open SurrealDB Docker Extension and connect with SurrealDB.

Connecting to SurrealDB through Docker Desktop Extension

Caption: Connecting to SurrealDB through Docker Desktop Extension

  • Connection name: RAGBot
  • Remote address: http://localhost:8002
  • Username: root | password: root
  • Click on Create Connection

8. Run the setup instructions 

9. Upload the whatsapp chat

Create connection to the SurrealDB Docker container

Caption: Create connection to the SurrealDB Docker container

10. Start chatting with the RAG bot and have fun 

11. We can verify the correctness data in SurrealDB list 

  • Ensure that you connect to the right namespace (whatsapp) and database (chats)
python3 load_whatsapp.py
python3 rag_chat_ui.py
SurrealDB namespace query

Caption: connect to the “whatsapp” namespace and “chats” database

SurrealDB namespace query
Data stored as vectors in SurrealDB

Caption: Data stored as vectors in SurrealDB

Interact with the RAG bot UI

Caption: Interact with the RAG bot UI where it gives you the answer and exact reference for it 

Using this chat bot, now you can get information about the chat.txt file that was ingested. You can also verify the information in the query editor as shown below when you can run custom queries to validate the results from the chat bot. You can ingest new messages through the load_whatsapp.py file, please ensure that the message format is same as in the sample whatsChatExport.txt file.

Learn more about SurrealQL here.

SurrealDB Query editor in the Docker Desktop Extension

Caption: SurrealDB Query editor in the Docker Desktop Extension

Conclusion

The SurrealDB Docker Extension offers an accessible and powerful solution for developers building data-intensive applications – especially those working with AI agents, knowledge graphs, and real-time systems. Its multi-model architecture eliminates the need to stitch together separate databases, letting you store documents, traverse graphs, query vectors, and subscribe to live updates from a single engine.

With Docker Desktop integration, getting started takes seconds rather than hours. No configuration files, no dependency management – just install the extension and start building. The visual query editor and real-time data explorer make it easy to prototype schemas, test queries, and inspect data as it changes.

Whether you’re building agent memory systems, real-time recommendation engines, or simply looking to consolidate a sprawling database stack, SurrealDB’s Docker Extension provides an intuitive path forward. Install it today and see how a unified data layer can simplify your architecture.

If you have questions or want to connect with other SurrealDB users, join the SurrealDB community on Discord.

Learn More

]]>
Running NanoClaw in a Docker Shell Sandbox https://www.docker.com/blog/run-nanoclaw-in-docker-shell-sandboxes/ Mon, 16 Feb 2026 14:00:00 +0000 https://www.docker.com/?p=85239 Ever wanted to run a personal AI assistant that monitors your WhatsApp messages 24/7, but worried about giving it access to your entire system? Docker Sandboxes’ new shell sandbox type is the perfect solution. In this post, I’ll show you how to run NanoClaw, a lightweight Claude-powered WhatsApp assistant, inside a secure, isolated Docker sandbox.

What is the Shell Sandbox?

Docker Sandboxes provides pre-configured environments for running AI coding agents like Claude Code, Gemini CLI, and others. But what if you want to run a different agent or tool that isn’t built-in?
That’s where the shell sandbox comes in. It’s a minimal sandbox that drops you into an interactive bash shell inside an isolated microVM. No pre-installed agent, no opinions — just a clean Ubuntu environment with Node.js, Python, git, and common dev tools. You install whatever you need.

Why Run NanoClaw in a Sandbox?

NanoClaw already runs its agents in containers, so it’s security-conscious by design. But running the entire NanoClaw process inside a Docker sandbox adds another layer:

  1. Filesystem isolation – NanoClaw can only see the workspace directory you mount, not your home directory
  2. Credential management – API keys are injected via Docker’s proxy, never stored inside the sandbox
  3. Clean environment – No conflicts with your host’s Node.js version or global packages
  4. Disposability – Nuke it and start fresh anytime with docker sandbox rm

Prerequisites

  • Docker Desktop installed and running
  • Docker Sandboxes CLI (docker sandbox command available) (v.0.12.0 available in the nightly build as of Feb 13)
  • An Anthropic API key in an env variable

Setting It Up

Create the sandbox

Pick a directory on your host that will be mounted as the workspace inside the sandbox. This is the only part of your filesystem the sandbox can see:

mkdir -p ~/nanoclaw-workspace
docker sandbox create --name nanoclaw shell ~/nanoclaw-workspace

Connect to it

docker sandbox run nanoclaw

You’re now inside the sandbox – an Ubuntu shell running in an isolated VM. Everything from here on happens inside the sandbox.

Install Claude Code

The shell sandbox comes with Node.js 20 pre-installed, so we can install Claude Code directly via npm:

npm install -g @anthropic-ai/claude-code

Configure the API key

This is the one extra step needed in a shell sandbox. The built-in claude sandbox type does this automatically, but since we’re in a plain shell, we need to tell Claude Code to get its API key from Docker’s credential proxy:

mkdir -p ~/.claude && cat > ~/.claude/settings.json << 'EOF'
{
  "apiKeyHelper": "echo proxy-managed",
  "defaultMode": "bypassPermissions",
  "bypassPermissionsModeAccepted": true
}
EOF

What this does: apiKeyHelper tells Claude Code to run echo proxy-managed to get its API key. The sandbox’s network proxy intercepts outgoing API calls and swaps this sentinel value for your real Anthropic key, so the actual key never exists inside the sandbox.

Clone NanoClaw and install dependencies

cd ~/workspace
git clone https://github.com/qwibitai/nanoclaw
cd nanoclaw
npm install

Run Claude and set up NanoClaw

NanoClaw uses Claude Code for its initial setup – configuring WhatsApp authentication, the database, and the container runtime:

claude

Once Claude starts, run /setup and follow the prompts. Claude will walk you through scanning a WhatsApp QR code and configuring everything else.

Start NanoClaw

After setup completes, start the assistant:

npm start

NanoClaw is now running and listening for WhatsApp messages inside the sandbox.

Managing the Sandbox

# List all sandboxes
docker sandbox ls

# Stop the sandbox (stops NanoClaw too)
docker sandbox stop nanoclaw

# Start it again
docker sandbox start nanoclaw

# Remove it entirely
docker sandbox rm nanoclaw

What Else Could You Run?

The shell sandbox isn’t specific to NanoClaw. Anything that runs on Linux and talks to AI APIs is a good fit:

  • Custom agents built with the Claude Agent SDK or any other AI agent: Claude code, Codex, Github Copilot, OpenCode, Kiro, and more. 
  • AI-powered bots and automation scripts
  • Experimental tools you don’t want running on your host

The pattern is always the same: create a sandbox, install what you need, configure credentials via the proxy, and run it.

docker sandbox create --name my-shell shell ~/my-workspace
docker sandbox run my-shell

]]>
Permission-Aware RAG: End-to-End Testing with the SpiceDB Testcontainer https://www.docker.com/blog/rag-permission-testing-testcontainers-spicedb/ Thu, 15 Jan 2026 14:00:00 +0000 https://www.docker.com/?p=84621 We use GenAI in every facet of technology now – internal knowledge bases, customer support systems, and code review bots, to name just a few use cases. And in nearly every one of these, someone eventually asks:

What stops the model from returning something the user shouldn’t see?

This is a roadblock that companies building RAG features or AI Agents eventually hit – the moment where an LLM returns data from a document that the user was not authorized to access, introducing potential legal, financial, and reputational risk to all parties. Unfortunately, traditional methods of authorization are not suited for the hierarchical, dynamic nature of access control in RAG. This is exactly where modern authorization permissioning systems such as SpiceDB shine: in building fine-grained authorization for filtering content in your AI-powered applications.

In fact, OpenAI uses SpiceDB to secure 37 Billion documents for 5 Million users who use ChatGPT Connectors – a feature where you bring your data from different sources such as Google Drive, Dropbox, GitHub etc. into ChatGPT.

This blog post shows how you can pair SpiceDB with Testcontainers to give you the ability to test your permission logic inside your RAG pipeline, end-to-end, automatically, with zero infrastructure dependencies.The example repo can be found here.

Quick Primer on Authorization

Before diving into implementation, let’s clarify two foundational concepts: Authentication (verifying who a user is) and Authorization (deciding what they can access).

Authorization is commonly implemented via techniques such as:

  • Access Control Lists (ACLs)
  • Role-Based Access Control (RBAC)
  • Attribute-Based Access Control (ABAC)

However, for complex, dynamic, and context-rich applications like RAG pipelines, traditional methods such as RBAC or ABAC fall short. The new kid on the block – ReBAC (Relationship-Based Access Control) is ideal as it models access as a graph of relationships rather than fixed rules, providing the necessary flexibility and scalability required.

ReBAC was popularized in Google Zanzibar, the internal authorization system Google built to manage permissions across all its products (e.g., Google Docs, Drive). Zanzibar systems are optimized for low-latency, high-throughput authorization checks, and global consistency – requirements that are well-suited for RAG systems.

SpiceDB is the most scalable open-source implementation of Google’s Zanzibar authorization model. It stores access as a relationship graph, where the fundamental check reduces to: 

Is this actor allowed to perform this action on this resource?

For a Google Docs-style example:

definition user {}
definition document {
  relation reader: user
  relation writer: user

  permission read = reader + writer
  permission write = writer
}

This schema defines object types (user and document), explicit Relations between the objects (reader, writer), and derived Permissions (read, write). SpiceDB evaluates the relationship graph in microseconds, enabling real-time authorization checks at massive scale.

Access Control for RAG 

RAG (Retrieval-Augmented Generation) is an architectural pattern that enhances Large Language Models (LLMs) by letting them consult an external knowledge base, typically involving a Retriever component finding document chunks and the LLM generating an informed response.

This pattern is now used by businesses and enterprises for apps like chatbots that query sensitive data such as customer playbooks or PII – all stored in a vector database for performance. However, the fundamental risk in this flow is data leakage: the Retriever component ignores permissions, and the LLM will happily summarize unauthorized data. In fact, OWASP has a Top 10 Risks for Large Language Model Applications list which includes Sensitive Information Disclosure, Excessive Agency & Vector and Embedding Weaknesses. The consequences of this leakage can be severe, ranging from loss of customer trust to massive financial and reputational damage from compliance violations.

This setup desperately needs fine-grained authorization, and that’s where SpiceDB comes in. SpiceDB can post-filter retrieved documents by performing real-time authorization checks, ensuring the model only uses data the querying user is permitted to see. The only requirement is that the documents have metadata that indicates where the information came from.But testing this critical permission logic without mocks, manual Docker setup, or flaky Continuous Integration (CI) environments is tricky. Testcontainers provides the perfect solution, allowing you to spin up a real, production-grade, and disposable SpiceDB instance inside your unit tests to deterministically verify that your RAG pipeline respects permissions end-to-end.

Spin Up Real Authorization for Every Test

Instead of mocking your authorization system or manually running it on your workstation, you can add this line of code in your test:

container, _ := spicedbcontainer.Run(ctx, "authzed/spicedb:v1.47.1")

And Testcontainers will:

  • Pull the real SpiceDB image
  • Start it in a clean, isolated environment
  • Assign it dynamic ports
  • Wait for it to be ready
  • Hand you the gRPC endpoint
  • Clean up afterwards

Because Testcontainers handles the full lifecycle – from pulling the container, exposing dynamic ports, and tearing it down automatically, you eliminate manual processes such as running Docker commands, and writing cleanup scripts. This isolation ensures that every single test runs with a fresh, clean authorization graph, preventing data conflicts, and making your permission tests completely reproducible in your IDE and across parallel Continuous Integration (CI) builds.

Suddenly you have a real, production-grade, Zanzibar-style permissions engine inside your unit test. 

Using SpiceDB & Testcontainers

Here’s a walkthrough of how you can achieve end-to-end permissions testing using SpiceDB and Testcontainers. The source code for this tutorial can be found here.

1. Testing Our RAG 

For the sake of simplicity, we have a minimal RAG and the retrieval mechanism is trivial too. 

We’re going to test three documents which have doc_ids (doc1 doc2 ..) that act as metadata. 

  • doc1: Internal roadmap
  • doc2: Customer playbook
  • doc3: Public FAQ

And three users:

  • Emilia owns doc1
  • Beatrice can view doc2
  • Charlie (or anyone) can view doc3

This SpiceDB schema defines a user and a document object type. A user has read permission on a document if they are the direct viewer or the owner of the document.

definition user {}

definition document {
  relation owner: user
  relation viewer: user | owner
  permission read = owner + viewer
}

2. Starting the Testcontainer 

Here’s how a line of code can start a test to launch the disposable SpiceDB instance:

container, err := spicedbcontainer.Run(ctx, "authzed/spicedb:v1.47.1")
require.NoError(t, err)

Next, we connect to the running containerized service:

host, _ := container.Host(ctx)
port, _ := container.MappedPort(ctx, "50051/tcp")
endpoint := fmt.Sprintf("%s:%s", host, port.Port())

client, err := authzed.NewClient(
    endpoint,
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpcutil.WithInsecureBearerToken("somepresharedkey"),
)

This is now a fully-functional SpiceDB instance running inside your test runner.

3. Load the Schema + Test Data

The test seeds data the same way your application would:

_, err := client.WriteSchema(ctx, &apiv1.WriteSchemaRequest{Schema: schema})
require.NoError(t, err)

Then:

rel("document", "doc1", "owner", "user", "emilia")
rel("document", "doc2", "viewer", "user", "beatrice")
rel("document", "doc3", "viewer", "user", "emilia")
rel("document", "doc3", "viewer", "user", "beatrice")
rel("document", "doc3", "viewer", "user", "charlie")

We now have a predictable, reproducible authorization graph for every test run.

4. Post-Filtering With SpiceDB

Before the LLM sees anything, we check permissions with SpiceDB which acts as the source of truth of the permissions in the documents.

resp, err := r.spiceClient.CheckPermission(ctx, &apiv1.CheckPermissionRequest{
    Resource:   docObject,
    Permission: "read",
    Subject:    userSubject,
})

If SpiceDB says no, the doc is never fed into the LLM, thereby ensuring the user gets an answer to their query only based on what they have permissions to read.

This avoids:

  • Accidental data leakage
  • Overly permissive vector search
  • Compliance problems

Traditional access controls break down when data becomes embeddings hence having guardrails prevents this from happening. 

End-to-End Permission Checks in a Single Test

Here’s what the full test asserts:

Emilia queries “roadmap” → gets doc1
Because they’re the owner.

Beatrice queries “playbook” → gets doc2
Because she’s a viewer.

Charlie queries “public” → gets doc3
Because it’s the only doc he can read, as it’s a public doc

If there is a single failing permission rule, the end-to-end test will immediately fail, which is critical given the constant changes in RAG pipelines (such as new retrieval modes, embeddings, document types, or permission rules). 

What If Your RAG Pipeline Isn’t in Go?

First, a shoutout to Guillermo Mariscal for his original contribution to the SpiceDB Go Testcontainers module

What if your RAG pipeline is written in a different language such as Python? Not to worry, there’s also a community Testcontainers module written in Python that you can use similarly. The module can be found here.

Typically, you would integrate it in your integration tests like this:

 # Your RAG pipeline test
  def test_rag_pipeline_respects_permissions():
      with SpiceDBContainer() as spicedb:
          # Set up permissions schema
          client = create_spicedb_client(
              spicedb.get_endpoint(),
              spicedb.get_secret_key()
          )

          # Load your permissions model
          client.WriteSchema(your_document_permission_schema)

          # Write test relationships
          # User A can access Doc 1
          # User B can access Doc 2

          # Test RAG pipeline with User A
          results = rag_pipeline.search(query="...", user="A")
          assert "Doc 1" in results
          assert "Doc 2" not in results  # Should be filtered out!

Similar to the Go module, this container gives you a clean, isolated SpiceDB instance for every test run.

Why This Approach Matters

Authorization testing in RAG pipelines can be tricky, given the scale and latency requirement and it can get trickier in systems handling sensitive data. By integrating the flexibility and scale of SpiceDB with the automated, isolated environments of Testcontainers, you shift to a completely reliable, deterministic approach to authorization. 

Every time your code ships, a fresh, production-grade authorization engine is spun up, loaded with test data, and torn down cleanly, guaranteeing zero drift between your development machine and CI. This pattern can ensure that your RAG system is safe, correct, and permission-aware as it scales from three documents to millions.

Try It Yourself

The complete working example in Go along with a sample RAG pipeline is here:
https://github.com/sohanmaheshwar/spicedb-testcontainer-rag
Clone it.
Run go test -v.
Watch it spin up a fresh SpiceDB instance, load permissions, and assert RAG behavior.
Also, find the community modules for the SpiceDB testcontainer in Go and Python.

]]>
Develop and deploy voice AI apps using Docker https://www.docker.com/blog/develop-deploy-voice-ai-apps/ Tue, 16 Dec 2025 14:00:00 +0000 https://www.docker.com/?p=83888 Voice is the next frontier of conversational AI. It is the most natural modality for people to chat and interact with another intelligent being. However, the voice AI software stack is complex, with many moving parts. Docker has emerged as one of the most useful tools for AI agent deployment.

In this article, we’ll explore how to use open-source technologies and Docker to create voice AI agents that utilize your custom knowledge base, voice style, actions, fine-tuned AI models, and run on your own computer. It is based on a talk I recently gave at the Docker Captains Summit in Istanbul.

Docker and AI

Most developers consider Docker the “container store” for software. The Docker container provides a reliable and reproducible environment for developing software locally on your own machine and then shipping it to the cloud. It also provides a safe sandbox to isolate, run, and scale user-submitted software in the cloud. For complex AI applications, Docker provides a suite of tools that makes it easy for developers and platform engineers to build and deploy.

  • The Docker container is a great tool for running software components and functions in an AI agent system. It can run web servers, API servers, workflow orchestrators, LLM actions or tool calls, code interpreters, simulated web browsers, search engines, and vector databases.
  • With the NVIDIA Container Toolkit, you can access the host machine’s GPU from inside Docker containers, enabling you to run inference applications such as LlamaEdge that serve open-source AI models inside the container.
  • The Docker Model Runner runs OpenAI-compatible API servers for open-source LLMs locally on your own computer.
  • The Docker MCP Toolkit provides an easy way to run MCP servers in containers and make them available to AI agents.

The EchoKit platform provides a set of Docker images and utilizes Docker tools to simplify the deployment of complex AI workflows.

image2 3

EchoKit

The EchoKit consists of a server and a client. The client could be an ESP32-based hardware device that can listen for user voices using a microphone, stream the voice data to the server, receive and play the server’s voice response through a speaker. EchoKit provides the device hardware specifications and firmware under open-source licenses. To see it in action, check out the following video demos.

You can check out the GitHub repo for EchoKit.

The AI agent orchestrator

The EchoKit server is an open-source AI service orchestrator focused on real-time voice use cases. It starts up a WebSocket server that listens for streaming audio input and returns streaming audio responses. It ties together multiple AI models, including voice activity detection (VAD), automatic speech recognition (ASR), large language models (LLM), and text-to-speech (TTS), using one model’s output as the input for the next model.

You can start an EchoKit server on your local computer and configure the EchoKit device to access it over the local WiFi network. The “edge server” setup reduces network latency, which is crucial for voice AI applications.

The EchoKit team publishes a multi-platform Docker image that you can use directly to start an EchoKit server. The following command starts the EchoKit server with your own config.toml file and runs in the background.

docker run --rm \
  -p 8080:8080 \
  -v $(pwd)/config.toml:/app/config.toml \
  secondstate/echokit:latest-server &amp;

The config.toml file is mapped into the container to configure how the EchoKit server utilizes various AI services in its voice response workflow. The following is an example of config.toml. It starts the WebSocket server on port 8080. That’s why in the Docker command, we map the container’s port 8080 to the same port on the host. That allows the EchoKit server to be accessible through the host computer’s IP address. The rest of the config.toml specifies how to access the ASR, LLM, and TTS models to generate a voice response for the input voice data.

addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_XYZ"
model = "whisper-large-v3"
lang = "en"
prompt = "Hello\n你好\n(noise)\n(bgm)\n(silence)\n"

[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_XYZ"
model = "openai/gpt-oss-20b"
history = 20

[tts]
platform = "elevenlabs"
url = "wss://api.elevenlabs.io/v1/text-to-speech/"
token = "sk_xyz"
voice = "VOICE-ID-ABCD"

[[llm.sys_prompts]]
role = "system"
content = """
You are a comedian. Engage in lighthearted and humorous conversation with the user. Tell jokes when appropriate.

"""

The AI services configured for the above EchoKit server are as follows.

  • It utilizes Groq for ASR (voice-to-text) and LLM tasks. You will need to fill in your own Groq API key.
  • It utilizes ElevenLabs for streaming TTS (text-to-speech). You will need to fill in your own ElevenLabs API key.

Then, in the EchoKit device setup, you just need to point your device to the local EchoKit server.

ws://local-network-ip.address:8080/ws/

For more options on the EchoKit server configuration, please refer to our documentation!

The VAD server

The voice-to-text ASR is not sufficient by itself. It could hallucinate and generate nonsensical text if the input voice is not human speech (e.g., background noise, street noise, or music). It also would not know when the user has finished speaking, and the EchoKit server needs to ask the LLM to start generating a response.

A VAD model is used to detect human voice and conversation turns in the voice stream. The EchoKit team has a multi-platform Docker image that incorporates the open-source Silero VAD model. The image is much larger than the plain EchoKit server, and it requires more CPU resources to run. But it delivers substantially better voice recognition results. Here is the Docker command to start the EchoKit server with VAD in the background.

docker run --rm \
  -p 8080:8080 \
  -v $(pwd)/config.toml:/app/config.toml \
  secondstate/echokit:latest-server-vad &amp;

The config.toml file for this Docker container also needs an additional line in the ASR section, so that the EchoKit server knows to stream incoming audio data to the local VAD service and act on the VAD signals. The Docker container runs the Silero VAD model as a WebSocket service inside the container on port 8000. There is no need to expose the container port 8000 to the host.

addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_XYZ"
model = "whisper-large-v3"
lang = "en"
prompt = "Hello\n你好\n(noise)\n(bgm)\n(silence)\n"
vad_url = "http://localhost:9093/v1/audio/vad"

[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_XYZ"
model = "openai/gpt-oss-20b"
history = 20

[tts]
platform = "elevenlabs"
url = "wss://api.elevenlabs.io/v1/text-to-speech/"
token = "sk_xyz"
voice = "VOICE-ID-ABCD"

[[llm.sys_prompts]]
role = "system"
content = """
You are a comedian. Engage in lighthearted and humorous conversation with the user. Tell jokes when appropriate.

"""

We recommend using the VAD enabled EchoKit server whenever possible.

MCP services

A key feature of AI agents is to perform actions, such as making web-based API calls, on behalf of LLMs. For example, the “US civics test prep” example for EchoKit requires the agent to get exam questions from a database, and then generate responses that guide the user toward the official answer.

The MCP protocol is the industry standard for providing tools (function calls) to LLM agents. For example, the DuckDuckGo MCP server provides a search tool for LLMs to search the internet if the user asks for current information that is not available in the LLM’s pre-training data. The Docker MCP Toolkit provides a set of tools that make it easy to run MCP servers that can be utilized by EchoKit.

image1 4

The command below starts a Docker MCP gateway server. The MCP protocol defines several ways for agents or LLMs to access MCP tools. Our gateway server is accessible through the streaming HTTP protocol at port 8011.

docker mcp gateway run --port 8011 --transport streaming

Next, you can add the DuckDuckGo MCP server to the gateway. The search tool provided by the DuckDuckGo MCP server is now available on HTTP port 8011.

docker mcp server enable duckduckgo

You can simply configure the EchoKit server to use the DuckDuckGo MCP tools in the config.toml file.

[[llm.mcp_server]]
server = "http://localhost:8011/mcp"
type = "http_streamable"
call_mcp_message = "Please hold on a few seconds while I am searching for an answer!"

Now, when you ask EchoKit a current event question, such as “What is the latest Tesla stock price?”, it will first call the DuckDuckGo MCP’s search tool to retrieve this information and then respond to the user.

The call_mcp_message field is a message the EchoKit device will read aloud when the server calls the MCP tool. It is needed since the MCP tool call could introduce significant latency in the response.

Docker Model Runner

The EchoKit server orchestrates multiple AI services. In the examples in this article so far, the EchoKit server is configured to use cloud-based AI services, such as Groq and ElevenLabs. However, many applications—especially in the voice AI area—require the AI models to run locally or on-premises for security, cost, and performance reasons.

Docker Model Runner is Docker’s solution to run LLMs locally. For example, the following command downloads and starts OpenAI’s open-source gpt-oss-20b model on your computer.

docker model run ai/gpt-oss

The Docker Model Runner starts an OpenAI-compatible API server at port 12434. It could be directly utilized by the EchoKit server via config.toml.

[llm]
platform = "openai_chat"
url = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"
model = "ai/gpt-oss"
history = 20

At the time of this writing, the Docker Model Runner only supports LLMs. The EchoKit server still relies on cloud services, or local AI solutions such as LlamaEdge, for other types of AI services.

Conclusion

The complexity of the AI agent software stack has created new challenges in software deployment and security. Docker is a proven and extremely reliable tool for delivering software to production. Docker images are repeatable and cross-platform deployment packages. The Docker container isolates software execution to eliminate large categories of security issues.

With new AI tools, such as the Docker Model Runner and MCP Toolkit, Docker continues to address emerging challenges in AI portability, discoverability, and security.

The easiest, most reliable, and most secure way to set up your own EchoKit servers is to use Docker.

Learn more

]]>
How to Add MCP Servers to ChatGPT with Docker MCP Toolkit https://www.docker.com/blog/add-mcp-server-to-chatgpt/ Fri, 12 Dec 2025 13:48:14 +0000 https://www.docker.com/?p=83522 ChatGPT is great at answering questions and generating code. But here’s what it can’t do: execute that code, query your actual database, create a GitHub repo with your project, or scrape live data from websites. It’s like having a brilliant advisor who can only talk, never act.

Docker MCP Toolkit changes this completely. 

Here’s what that looks like in practice: You ask ChatGPT to check MacBook Air prices across Amazon, Walmart, and Best Buy. If competitor prices are lower than yours, it doesn’t just tell you, it acts: automatically adjusting your Stripe product price to stay competitive, logging the repricing decision to SQLite, and pushing the audit trail to GitHub. All through natural conversation. No manual coding. No copy-pasting scripts. Real execution.

“But wait,” you might say, “ChatGPT already has a shopping research feature.” True. But ChatGPT’s native shopping can only lookup prices. Only MCP can execute: creating payment links, generating invoices, storing data in your database, and pushing to your GitHub. That’s the difference between an advisor and an actor.

By the end of this guide, you’ll build exactly this: a Competitive Repricing Agent that checks competitor prices on demand, compares them to yours, and automatically adjusts your Stripe product prices when competitors are undercutting you.

Here’s how the pieces fit together:

  • ChatGPT provides the intelligence: understanding your requests and determining what needs to happen
  • Docker MCP Gateway acts as the secure bridge: routing requests to the right tools
  • MCP Servers are the hands: executing actual tasks in isolated Docker containers

The result? ChatGPT can query your SQL database, manage GitHub repositories, scrape websites, process payments, run tests, and more—all while Docker’s security model keeps everything contained and safe.

In this guide, you’ll learn how to add seven MCP servers to ChatGPT by connecting to Docker MCP Toolkit. We’ll use a handful of must-have MCP servers: Firecrawl for web scraping, SQLite for data persistence, GitHub for version control, Stripe for payment processing, Node.js Sandbox for calculations, Sequential Thinking for complex reasoning, and Context7 for documentation. Then, you’ll build the Competitive Repricing Agent shown above, all through conversation.

Caption: YouTube video where Ajeet demonstrates building the competitive repricing agent

What is Model Context Protocol (MCP)?

Before we dive into the setup, let’s clarify what MCP actually is.

Model Context Protocol (MCP) is the standardized way AI agents like ChatGPT and Claude connect to tools, APIs, and services. It’s what lets ChatGPT go beyond conversation and perform real-world actions like querying databases, deploying containers, analyzing datasets, or managing GitHub repositories.

In short: MCP is the bridge between ChatGPT’s reasoning and your developer stack. And Docker? Docker provides the guardrails that make it safe.

Why Use Docker MCP Toolkit with ChatGPT?

I’ve been working with AI tools for a while now, and this Docker MCP integration stands out for one reason: it actually makes ChatGPT productive.

Most AI integrations feel like toys: impressive demos that break in production. Docker MCP Toolkit is different. It creates a secure, containerized environment where ChatGPT can execute real tasks without touching your local machine or production systems.

Every action happens in an isolated container. Every MCP server runs in its own security boundary. When you’re done, containers are destroyed. No residue, no security debt, complete reproducibility across your entire team.

What ChatGPT Can and Can’t Do Without MCP

Let’s be clear about what changes when you add MCP.

Without MCP

You ask ChatGPT to build a system to regularly scrape product prices and store them in a database. ChatGPT responds with Python code, maybe 50 lines using BeautifulSoup and SQLite. Then you must copy the code, install dependencies, create the database schema, run the script manually, and set up a scheduler if you want it to run regularly.

Yes, ChatGPT remembers your conversation and can store memories about you. But those memories live on OpenAI’s servers—not in a database you control.

With MCP

You ask ChatGPT the same thing. Within seconds, it calls Firecrawl MCP to actually scrape the website. It calls SQLite MCP to create a database on your machine and store the data. It calls GitHub MCP to save a report to your repository. The entire workflow executes in under a minute.

Real data gets stored in a real database on your infrastructure. Real commits appear in your GitHub repository. Close ChatGPT, come back tomorrow, and ask “Show me the price trends.” ChatGPT queries your SQLite database and returns results instantly because the data lives in a database you own and control, not in ChatGPT’s conversation memory.

The data persists in your systems, ready to query anytime; no manual script execution required.

Why This Is Different from ChatGPT’s Native Shopping

ChatGPT recently released a shopping research feature that can track prices and make recommendations. Here’s what it can and cannot do:

What ChatGPT Shopping Research can do:

  • Track prices across retailers
  • Remember price history in conversation memory
  • Provide comparisons and recommendations

What ChatGPT Shopping Research cannot do:

  • Automatically update your product prices in Stripe
  • Execute repricing logic based on competitor changes
  • Store pricing data in your database (not OpenAI’s servers)
  • Push audit trails to your GitHub repository
  • Create automated competitive response workflows

With Docker MCP Toolkit, ChatGPT becomes a competitive pricing execution system. When you ask it to check prices and competitors are undercutting you, it doesn’t just inform you, it acts: updating your Stripe prices to match or beat competitors, logging decisions to your database, and pushing audit records to GitHub. The data lives in your infrastructure, not OpenAI’s servers.

Setting Up ChatGPT with Docker MCP Toolkit

Prerequisites

Before you begin, ensure you have:

  • A machine with 8 GB RAM minimal, ideally 16GB
  • Install Docker Desktop
  • A ChatGPT Plus, Pro, Business, or Enterprise Account
  • ngrok account (free tier works) – For exposing the Gateway publicly

Step 1. Enable ChatGPT developer mode

  • Head over to ChatGPT and create a new account. 
  • Click on your profile icon at the top left corner of the ChatGPT page and select “Settings”. Select “Apps and Connectors” and scroll down to the end of the page to select “Advanced Settings.”
image1

SettingsApps & ConnectorsAdvancedDeveloper Mode (ON)

image10

ChatGPT Developer Mode provides full Model Context Protocol (MCP) client support for all tools, both read and write operations. This feature was announced in the first week of September 2025, marking a significant milestone in AI-developer integration. ChatGPT can perform write actions—creating repositories, updating databases, modifying files—all with proper confirmation modals for safety.

Key capabilities:

  • Full read/write MCP tool support
  • Custom connector creation
  • OAuth and authentication support
  • Explicit confirmations for write operations
  • Available on Plus, Pro, Business, Enterprise, and Edu plans

Step 2. Create MCP Gateway

This creates and starts the MCP Gateway container that ChatGPT will connect to.

docker mcp server init --template=chatgpt-app-basic test-chatgpt-app

Successfully initialized MCP server project in test-chatgpt-app (template: chatgpt-app-basic)
Next steps:
  cd test-chatgpt-app
  docker build -t test-chatgpt-app:latest .

Step 3. List out all the project files

ls -la
total 64
drwxr-xr-x@   9 ajeetsraina  staff   288 16 Nov 16:53 .
drwxr-x---+ 311 ajeetsraina  staff  9952 16 Nov 16:54 ..
-rw-r--r--@   1 ajeetsraina  staff   165 16 Nov 16:53 catalog.yaml
-rw-r--r--@   1 ajeetsraina  staff   371 16 Nov 16:53 compose.yaml
-rw-r--r--@   1 ajeetsraina  staff   480 16 Nov 16:53 Dockerfile
-rw-r--r--@   1 ajeetsraina  staff    88 16 Nov 16:53 go.mod
-rw-r--r--@   1 ajeetsraina  staff  2576 16 Nov 16:53 main.go
-rw-r--r--@   1 ajeetsraina  staff  2254 16 Nov 16:53 README.md
-rw-r--r--@   1 ajeetsraina  staff  6234 16 Nov 16:53 ui.html

Step 4. Examine the Compose file

services:
  gateway:
    image: docker/mcp-gateway                # Official Docker MCP Gateway image
    command:
      - --servers=test-chatgpt-app           # Name of the MCP server to expose
      - --catalog=/mcp/catalog.yaml          # Path to server catalog configuration
      - --transport=streaming                # Use streaming transport for real-time responses
      - --port=8811                           # Port the gateway listens on
    environment:
      - DOCKER_MCP_IN_CONTAINER=1            # Tells gateway it's running inside a container
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # Allows gateway to spawn sibling containers
      - ./catalog.yaml:/mcp/catalog.yaml           # Mount local catalog into container
    ports:
      - "8811:8811"                           # Expose gateway port to host


Step 5. Bringing up the compose services

docker compose up -d
[+] Running 2/2
 &#x2714; Network test-chatgpt-app_default      Created                                            0.0s
 &#x2714; Container test-chatgpt-app-gateway-1  Started  

docker ps | grep test-chatgpt-app
eb22b958e09c   docker/mcp-gateway   "/docker-mcp gateway…"   21 seconds ago   Up 20 seconds   0.0.0.0:8811->8811/tcp, [::]:8811->8811/tcp   test-chatgpt-app-gateway-1

Step 6. Verify the MCP session

curl http://localhost:8811/mcp
GET requires an active session

Step 7. Expose with Ngrok

Install ngrok and expose your local gateway. You will need to sign up for an ngrok account to obtain an auth token.

brew install ngrok
ngrok config add-authtoken <your_token_id>
ngrok http 8811

Note the public URL (like https://91288b24dc98.ngrok-free.app). Keep this terminal open.

Step 8. Connect ChatGPT

In ChatGPT, go to SettingsApps & ConnectorsCreate.

image11

Step 9. Create connector:

SettingsApps & ConnectorsCreate

- Name: Test MCP Server
- Description: Testing Docker MCP Toolkit integration
- Connector URL: https://[YOUR_NGROK_URL]/mcp
- Authentication: None
- Click "Create"

Test it by asking ChatGPT to call the greet tool. If it responds, your connection works. Here’s how it looks:

image7
image6

Real-World Demo: Competitive Repricing Agent

Now that you’ve connected ChatGPT to Docker MCP Toolkit, let’s build something that showcases what only MCP can do—something ChatGPT’s native shopping feature cannot replicate.

We’ll create a Competitive Repricing Agent that checks competitor prices on demand, and when competitors are undercutting you, automatically adjusts your Stripe product prices to stay competitive, logs the repricing decision to SQLite, and pushes audit records to GitHub.

Time to build: 15 minutes  

Monthly cost: Free Stripe (test mode) + $1.50-$15 (Firecrawl API)

Infrastructure: $0 (SQLite is free)

The Challenge

E-commerce businesses face a constant dilemma:

  • Manual price checking across multiple retailers is time-consuming and error-prone
  • Comparing competitor prices and calculating optimal repricing requires multiple tools
  • Executing price changes across your payment infrastructure requires context-switching
  • Historical trend data is scattered across spreadsheets
  • Strategic insights require manual analysis and interpretation

Result: Missed opportunities, delayed reactions, and losing sales to competitors with better prices.

The Solution: On-Demand Competitive Repricing Agent

Docker MCP Toolkit transforms ChatGPT from an advisor into an autonomous agent that can actually execute. The architecture routes your requests through a secure MCP Gateway that orchestrates specialized tools: Firecrawl scrapes live prices, Stripe creates payment links and invoices, SQLite stores data on your infrastructure, and GitHub maintains your audit trail. Each tool runs in an isolated Docker container: secure, reproducible, and under your control.

The 7 MCP Servers We’ll Use

Server

Purpose

Why It Matters

Firecrawl

Web scraping

Extracts live prices from any website

SQLite

Data persistence

Stores 30+ days of price history

Stripe

Payment management

Updates your product prices to match or beat competitors

GitHub

Version control

Audit trail for all reports

Sequential Thinking

Complex reasoning

Multi-step strategic analysis

Context7

Documentation

Up-to-date library docs for code generation

Node.js Sandbox

Calculations

Statistical analysis in isolated containers

The Complete MCP Workflow (Executes in under 3 minutes)

image10 1

Step 1. Scrape and Store (30 seconds)

  • Agent scrapes live prices from Amazon, Walmart, and Best Buy 
  • Compares against your current Stripe product price

Step 2: Compare Against Your Price (15 seconds) 

  • Best Buy drops to $509.99—undercutting your $549.99
  • Agent calculates optimal repricing strategy
  • Determines new competitive price point

Step 3: Execute Repricing (30 seconds)

  • Updates your Stripe product with the new competitive price
  • Logs repricing decision to SQLite with full audit trail
  • Pushes pricing change report to GitHub

Step 4: Stay Competitive (instant)

  • Your product now priced competitively
  • Complete audit trail in your systems
  • Historical data ready for trend analysis

The Demo Setup: Enable Docker MCP Toolkit

Open Docker Desktop and enable the MCP Toolkit from the Settings menu.

To enable:

  1. Open Docker Desktop
  2. Go to SettingsBeta Features
  3. Toggle Docker MCP Toolkit ON
  4. Click Apply
image8

Click MCP Toolkit in the Docker Desktop sidebar, then select Catalog to explore available servers.

For this demonstration, we’ll use seven MCP servers:

  • SQLiteRDBMS with advanced analytics, text and vector search, geospatial capabilities, and intelligent workflow automation
  • Stripe –  Updates your product prices to match or beat competitors for automated repricing workflows
  • GitHub – Handles version control and deployment
  • Firecrawl – Web scraping and content extraction
  • Node.js Sandbox – Runs tests, installs dependencies, validates code (in isolated containers)
  • Sequential Thinking – Debugs failing tests and optimizes code
  • Context7 – Provides code documentation for LLMs and AI code editors

Let’s configure each one step by step.

1. Configure SQLite MCP Server

The SQLite MCP Server requires no external database setup. It manages database creation and queries through its 25 built-in tools.

To setup the SQLite MCP Server, follow these steps:

  1. Open Docker Desktop → access MCP Toolkit → Catalog
  2. Search “SQLite”
  3. Click + Add
  4. No configuration needed, just click Start MCP Server
docker mcp server ls
# Should show sqlite-mcp-server as enabled

That’s it. ChatGPT can now create databases, tables, and run queries through conversation.

2. Configure Stripe MCP Server

The Stripe MCP server gives ChatGPT full access to payment infrastructure—listing products, managing prices, and updating your catalog to stay competitive.

Get Stripe API Key

  1. Go to dashboard.stripe.com
  2. Navigate to Developers → API Keys
  3. Copy your Secret Key:
    • Use sk_test_... for sandbox/testing
    • Use sk_live_... for production

Configure in Docker Desktop

  1. Open Docker Desktop → MCP Toolkit → Catalog
  2. Search for “Stripe”
  3. Click + Add
  4. Go to the Configuration tab
  5. Add your API key:
    • Field: stripe.api_key
    • Value: Your Stripe secret key
  6. Click Save and Start Server

Or via CLI:

docker mcp secret set STRIPE.API_KEY="sk_test_your_key_here"
docker mcp server enable stripe

3. Configure GitHub Official MCP Server

The GitHub MCP server lets ChatGPT create repositories, manage issues, review pull requests, and more.

Option 1: OAuth Authentication (Recommended)

OAuth is the easiest and most secure method:

  1. In MCP ToolkitCatalog, search “GitHub Official”
  2. Click + Add
  3. Go to the OAuth tab in Docker Desktop
  4. Find the GitHub entry
  5. Click “Authorize”
  6. Your browser opens GitHub’s authorization page
  7. Click “Authorize Docker” on GitHub
  8. You’re redirected back to Docker Desktop
  9. Return to the Catalog tab, find GitHub Official
  10. Click Start Server

Advantage: No manual token creation. Authorization happens through GitHub’s secure OAuth flow with automatic token refresh.

Option 2: Personal Access Token

If you prefer manual control or need specific scopes:

Step 1: Create GitHub Personal Access Token

  1. Go to https://github.com and sign in
  2. Click your profile picture → Settings
  3. Scroll to “Developer settings” in the left sidebar
  4. Click “Personal access tokens”“Tokens (classic)”
  5. Click “Generate new token”“Generate new token (classic)”
  6. Name it: “Docker MCP ChatGPT”
  7. Select scopes:
    • repo (Full control of repositories)
    • workflow (Update GitHub Actions workflows)
    • read:org (Read organization data)
  8. Click “Generate token”
  9. Copy the token immediately (you won’t see it again!)

Step 2: Configure in Docker Desktop

In MCP Toolkit → Catalog, find GitHub Official:

  1. Click + Add (if not already added)
  2. Go to the Configuration tab
  3. Select “Personal Access Token” as the authentication method
  4. Paste your token
  5. Click Start Server

Or via CLI:

docker mcp secret set GITHUB.PERSONAL_ACCESS_TOKEN="github_pat_YOUR_TOKEN_HERE"

Verify GitHub Connection

docker mcp server ls

# Should show github as enabled

4. Configure Firecrawl MCP Server

The Firecrawl MCP server gives ChatGPT powerful web scraping and search capabilities.

Get Firecrawl API Key

  1. Go to https://www.firecrawl.dev
  2. Create an account (or sign in)
  3. Navigate to API Keys in the sidebar
  4. Click “Create New API Key”
  5. Copy the API key
image13

Configure in Docker Desktop

  1. Open Docker DesktopMCP ToolkitCatalog
  2. Search for “Firecrawl”
  3. Find Firecrawl in the results
  4. Click + Add
  5. Go to the Configuration tab
  6. Add your API key:
    • Field: firecrawl.api_key
    • Value: Your Firecrawl API key
  7. Leave all other entries blank
  8. Click Save and Add Server

Or via CLI:

docker mcp secret set FIRECRAWL.API_KEY="fc-your-api-key-here"
docker mcp server enable firecrawl

What You Get

6+ Firecrawl tools, including:

  • firecrawl_scrape – Scrape content from a single URL
  • firecrawl_crawl – Crawl entire websites and extract content
  • firecrawl_map – Discover all indexed URLs on a site
  • firecrawl_search – Search the web and extract content
  • firecrawl_extract – Extract structured data using LLM capabilities
  • firecrawl_check_crawl_status – Check crawl job status

5. Configure Node.js Sandbox MCP Server

The Node.js Sandbox enables ChatGPT to execute JavaScript in isolated Docker containers.

Note: This server requires special configuration because it uses Docker-out-of-Docker (DooD) to spawn containers.

Understanding the Architecture

The Node.js Sandbox implements the Docker-out-of-Docker (DooD) pattern by mounting /var/run/docker.sock. This gives the sandbox container access to the Docker daemon, allowing it to spawn ephemeral sibling containers for code execution.

When ChatGPT requests JavaScript execution:

  1. Sandbox container makes Docker API calls
  2. Creates temporary Node.js containers (with resource limits)
  3. Executes code in complete isolation
  4. Returns results
  5. Auto-removes the container

Security Note: Docker socket access is a privilege escalation vector (effectively granting root-level host access). This is acceptable for local development but requires careful consideration for production use.

Add Via Docker Desktop

  1. MCP ToolkitCatalog
  2. Search “Node.js Sandbox”
  3. Click + Add

Unfortunately, the Node.js Sandbox requires manual configuration that can’t be done entirely through the Docker Desktop UI. We’ll need to configure ChatGPT’s connector settings directly.

Prepare Output Directory

Create a directory for sandbox output:

# macOS/Linux
mkdir -p ~/Desktop/sandbox-output

# Windows
mkdir %USERPROFILE%\Desktop\sandbox-output

Configure Docker File Sharing

Ensure this directory is accessible to Docker:

  1. Docker DesktopSettingsResourcesFile Sharing
  2. Add ~/Desktop/sandbox-output (or your Windows equivalent)
  3. Click Apply & Restart

6. Configure Sequential Thinking MCP Server

The Sequential Thinking MCP server gives ChatGPT the ability for dynamic and reflective problem-solving through thought sequences. Adding the Sequential Thinking MCP server is straightforward –  it doesn’t require any API key. Just search for Sequential Thinking in the Catalog and get it to your MCP server list.

In Docker Desktop:

  1. Open Docker DesktopMCP ToolkitCatalog
  2. Search for “Sequential Thinking”
  3. Find Sequential Thinking in the results
  4. Click “Add MCP Server” to add without any configuration

The Sequential Thinking MCP server should now appear under “My Servers” in Docker MCP Toolkit.

What you get:

  • A single Sequential Thinking tool that includes:
    • sequentialthinking – A detailed tool for dynamic and reflective problem-solving through thoughts. This tool helps analyze problems through a flexible thinking process that can adapt and evolve. Each thought can build on, question, or revise previous insights as understanding deepens.
image2

7. Configure Context7 MCP Server

The Context7 MCP enables ChatGPT to access the latest and up-to-date code documentation for LLMs and AI code editors. Adding the Context7 MCP server is straightforward. It doesn’t require any API key. Just search for Context7 in the Catalog and get it added to the MCP server lists.

In Docker Desktop:

  1. Open Docker DesktopMCP ToolkitCatalog
  2. Search for “Context7”
  3. Find Context7 in the results
  4. Click “Add MCP Server” to add without any configuration
image3

The Context7 MCP server should now appear under “My Servers” in Docker MCP Toolkit

What you get:

  • 2 Context7 tools including:
    • get-library-docs – Fetches up-to-date documentation for a library.
    • resolve-library-id – Resolves a package/product name to a Context7-compatible library ID and returns a list of matching libraries. 

Verify if all the MCP servers are available and running.

docker mcp server ls

MCP Servers (7 enabled)

NAME                   OAUTH        SECRETS      CONFIG       DESCRIPTION
------------------------------------------------------------------------------------------------
context7               -            -            -            Context7 MCP Server -- Up-to-da...
fetch                  -            -            -            Fetches a URL from the internet...
firecrawl              -            ✓ done       partial    Official Firecrawl MCP Server...
github-official        ✓ done       ✓ done       -            Official GitHub MCP Server, by ...
node-code-sandbox      -            -            -            A Node.js–based Model Context P...
sequentialthinking     -            -            -            Dynamic and reflective problem-...
sqlite-mcp-server      -            -            -            The SQLite MCP Server transform...
stripe                 -            ✓ done       -            Interact with Stripe services o...

Tip: To use these servers, connect to a client (IE: claude/cursor) with docker mcp client connect <client-name>

Configuring ChatGPT App and Connector

Use the following compose file in order to let ChatGPT discover all the tools under Docker MCP Catalog:

services:
  gateway:
    image: docker/mcp-gateway
    command:
      - --catalog=/root/.docker/mcp/catalogs/docker-mcp.yaml
      - --servers=context7,firecrawl,github-official,node-code-sandbox,sequentialthinking,sqlite-mcp-server,stripe
      - --transport=streaming
      - --port=8811
    environment:
      - DOCKER_MCP_IN_CONTAINER=1
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ~/.docker/mcp:/root/.docker/mcp:ro
    ports:
      - "8811:8811"


By now, you should be able to view all the MCP tools under ChatGPT Developer Mode.

Chatgpt MCP image

Let’s Test it Out

Now we give ChatGPT its intelligence. Copy this system prompt and paste it into your ChatGPT conversation:

You are a Competitive Repricing Agent that monitors competitor prices, automatically adjusts your Stripe product prices, and provides strategic recommendations using 7 MCP servers: Firecrawl (web scraping), SQLite (database), Stripe (price management), GitHub (reports), Node.js Sandbox (calculations), Context7 (documentation), and Sequential Thinking (complex reasoning).

DATABASE SCHEMA

Products table: id (primary key), sku (unique), name, category, brand, stripe_product_id, stripe_price_id, current_price, created_at
Price_history table: id (primary key), product_id, competitor, price, original_price, discount_percent, in_stock, url, scraped_at
Price_alerts table: id (primary key), product_id, competitor, alert_type, old_price, new_price, change_percent, created_at
Repricing_log table: id, product_name, competitor_triggered, competitor_price, old_stripe_price, new_stripe_price, repricing_strategy, stripe_price_id, triggered_at, status

Indexes: idx_price_history_product on (product_id, scraped_at DESC), idx_price_history_competitor on (competitor)

WORKFLOW

On-demand check: Scrape (Firecrawl) → Store (SQLite) → Analyze (Node.js) → Report (GitHub)
Competitive repricing: Scrape (Firecrawl) → Compare to your price → Update (Stripe) → Log (SQLite) → Report (GitHub)

STRIPE REPRICING WORKFLOW

When competitor price drops below your current price:
1. list_products - Find your existing Stripe product
2. list_prices - Get current price for the product
3. create_price - Create new price to match/beat competitor (prices are immutable in Stripe)
4. update_product - Set the new price as default
5. Log the repricing decision to SQLite

Price strategies:
- "match": Set price equal to lowest competitor
- "undercut": Set price 1-2% below lowest competitor
- "margin_floor": Never go below your minimum margin threshold

Use Context7 when: Writing scripts with new libraries, creating visualizations, building custom scrapers, or needing latest API docs

Use Sequential Thinking when: Making complex pricing strategy decisions, planning repricing rules, investigating market anomalies, or creating strategic recommendations requiring deep analysis

EXTRACTION SCHEMAS

Amazon: title, price, list_price, rating, reviews, availability
Walmart: name, current_price, was_price, availability  
Best Buy: product_name, sale_price, regular_price, availability

RESPONSE FORMAT

Price Monitoring: Products scraped, competitors covered, your price vs competitors
Repricing Triggers: Which competitor triggered, price difference, strategy applied
Price Updated: New Stripe price ID, old vs new price, margin impact
Audit Trail: GitHub commit SHA, SQLite log entry, timestamp

TOOL ORCHESTRATION PATTERNS

Simple price check: Firecrawl → SQLite → Response
Trend analysis: SQLite → Node.js → Response
Strategy analysis: SQLite → Sequential Thinking → Response
Competitive repricing: Firecrawl → Compare → Stripe → SQLite → GitHub
Custom tool development: Context7 → Node.js → GitHub
Full intelligence report: Firecrawl → SQLite → Node.js → Sequential Thinking → GitHub

KEY USAGE PATTERNS

Use Stripe for: Listing products, listing prices, creating new prices, updating product default prices

Use Sequential Thinking for: Pricing strategy decisions (match, undercut, or hold), market anomaly investigations (why did competitor prices spike), multi-factor repricing recommendations

Use Context7 for: Getting documentation before coding, learning new libraries on-the-fly, ensuring code uses latest API conventions

Use Node.js for: Statistical calculations (moving averages, standard deviation, volatility), chart generation, margin calculations

BEST PRACTICES

Space web scraping requests 2 seconds apart to respect rate limits
Calculate price difference as (your_price - competitor_price)
Trigger repricing when competitor drops below your current price
Log all repricing decisions to SQLite with Stripe IDs for audit trail
Push pricing reports to GitHub for compliance
Always use Context7 before writing code with unfamiliar libraries
Respect margin floors—never reprice below minimum acceptable margin

COMMAND RECOGNITION PATTERNS

"Check X prices and stay competitive" → Full repricing pipeline: scrape → compare → if competitor lower: Stripe update + SQLite + GitHub
"Match competitor price for X" → Stripe: list_products → list_prices → create_price (matching)
"Undercut competitors on X" → Stripe: create_price (1-2% below lowest)
"Show price history" → SQLite query → format results
"Analyze pricing strategy for X" → Sequential Thinking analysis
"Why did competitor prices change" → Query data → Sequential Thinking investigation

CORE PRINCIPLE

When competitors change prices, don't just report—ACT. Update your Stripe prices to stay competitive, log decisions to SQLite, push audit records to GitHub. Transform competitor intelligence into automated repricing. All data stays on YOUR infrastructure.



Here’s the prompt:

Set up a competitive repricing agent:
1. I sell MacBook Air M3 on my store - current Stripe price is $549.99
2. Monitor competitor prices on Amazon, Walmart, Best Buy
3. When ANY competitor drops below my price:
   - Automatically update my Stripe product price to match or beat them
   - Use "undercut" strategy (price 1% below lowest competitor)
   - Log the repricing decision to SQLite
   - Push pricing change report to GitHub

Check prices now and reprice if needed.

Here’s how ChatGPT responds – orchestrating all 7 MCP servers in a single workflow:

Competitive Repricing Agent – Execution Complete

Within 3 minutes, ChatGPT orchestrated all 7 MCP servers to analyse the market and automatically reprice your product.

Repricing Triggered and Executed:

Metrics

Before

After

Your Price

$549.99

$504.99

Price Change

-$45.00 (-8.2%)

Market Position

3rd (behind Best Buy)

#1 Lowest

Competitor Price Scan Results:

Retailer

Price

Vs. Your New Price

Your Store

$504.99

Market Leader

Best Buy

$509.99

+$5.00 (you beat by 1%)

Walmart

$669.00

+$164.01 higher

Amazon

$699.00

+$194.01 higher

What the Agent did (6 Steps):

  1. Installed SQLite3 and created database schema with 4 tables
  2. Created Stripe product (prod_TZaK0ARRJ5OJJ8) with initial $549.99 price 
  3. Scraped live competitor prices via Firecrawl from Amazon, Best Buy, and Walmart 
  4. Analysed pricing strategy with Sequential Thinking — detected Best Buy at $509.99 below your price
  5. Executed repricing — created new Stripe price at $504.99 (price_1ScRCVI9l1vmUkzn0hTnrLmW)
  6. Pushed audit report to GitHub (commit `64a488aa`)

All data stored on your infrastructure – not OpenAI’s servers. 

To check prices again, simply ask ChatGPT to ‘check MacBook Air M3 competitor prices’—it will scrape, compare, and reprice automatically. Run this check daily, weekly, or whenever you want competitive intelligence

Explore the Full Demo

View the complete repricing report and audit trail on GitHub: https://github.com/ajeetraina/competitive-repricing-agent-mcp

Want true automation? This demo shows on-demand repricing triggered by conversation. For fully automated periodic checks, you could build a simple scheduler that calls the OpenAI API every few hours to trigger the same workflow—turning this into a hands-free competitive intelligence system.Default houston Paragraph Text

Wrapping Up

You’ve just connected ChatGPT to Docker MCP Toolkit and configured multiple MCP servers. What used to require context-switching between multiple tools, manual query writing, and hours of debugging now happens through natural conversation, safely executed in Docker containers.

This is the new paradigm for AI-assisted development. ChatGPT isn’t just answering questions anymore. It’s querying your databases, managing your repositories, scraping data, and executing code—all while Docker ensures everything stays secure and contained.

Ready to try it? Open Docker Desktop and explore the MCP Catalog. Start with SQLite, add GitHub, experiment with Firecrawl. Each server unlocks new capabilities.

The future of development isn’t writing every line of code yourself. It’s having an AI partner that can execute tasks across your entire stack securely, reproducibly, and at the speed of thought.

Learn More

]]>
Building AI agents shouldn’t be hard. According to theCUBE Research, Docker makes it easy https://www.docker.com/blog/building-ai-agents-shouldnt-be-hard-according-to-thecube-research-docker-makes-it-easy/ Tue, 02 Dec 2025 15:00:00 +0000 https://www.docker.com/?p=83488 For most developers, getting started with AI is still too complicated. Different models, tools, and platforms don’t always play nicely together. But with Docker, that’s changing fast.

Docker is emerging as essential infrastructure for standardized, portable, and scalable AI environments. By bringing composability, simplicity, and GPU accessibility to the agentic era, Docker is helping developers and the enterprises they support move faster, safer, and with far less friction. 

Real results: Faster AI delivery with Docker

The platform is accelerating innovation: According to the latest report from theCUBE Research, 88% of respondents reported that Docker reduced the time-to-market for new features or products, with nearly 40% achieving efficiency gains of more than 25%. Docker is playing an increasingly vital role in AI development as well. 52% of respondents cut AI project setup time by over 50%, while 97% report increased speed for new AI product development.

Reduced AI project failures and delays

Reliability remains a key performance indicator for AI initiatives, and Docker is proving instrumental in minimizing risk. 90% of respondents indicated that Docker helped prevent at least 10% of project failures or delays, while 16% reported prevention rates exceeding 50%. Additionally, 78% significantly improved testing and validation of AI models. These results highlight how Docker’s consistency, isolation, and repeatability not only speed development but also reduce costly rework and downtime, strengthening confidence in AI project delivery.

Build, share, and run agents with Docker, easily and securely

Docker’s mission for AI is simple: make building and running AI and agentic applications as easy, secure, and shareable as any other kind of software.

Instead of wrestling with fragmented tools, developers can now rely on Docker’s trusted, container-based foundation with curated catalogs of verified models and tools, and a clean, modular way to wire them together. Whether you’re connecting an LLM to a database or linking services into a full agentic workflow, Docker makes it plug-and-play.

With Docker Model Runner, you can pull and run large language models locally with GPU acceleration. The Docker MCP Catalog and Toolkit connect agents to over 300 MCP servers from partners like Stripe, Elastic, and GitHub. And with Docker Compose, you can define the whole AI stack of models, tools, and services in a single YAML file that runs the same way locally or in the cloud. Cagent, our open-source agent builder, lets you easily build, run, and share AI agents, with behavior, tools, and persona all defined in a single YAML file. And with Docker Sandboxes, you can run coding agents like Claude Code in a secure, local environment, keeping your workflows isolated and your data protected.

Conclusion 

Docker’s vision is clear: to make AI development as simple and powerful as the workflows developers already know and love. And it’s working: theCUBE reports 52% of users cut AI project setup time by more than half, while 87% say they’ve accelerated time-to-market by at least 26%.

Learn more

theCUBE docker banner
]]>
Launch a Chat UI Agent with Docker and the Vercel AI SDK https://www.docker.com/blog/launch-chat-ui-agent-docker-vercel-ai-sdk/ Tue, 18 Nov 2025 14:00:00 +0000 https://www.docker.com/?p=82502 Running a Chat UI Agent doesn’t have to involve a complicated setup. By combining Docker with the Vercel AI SDK, it’s possible to build and launch a conversational interface in a clean, reproducible way. Docker ensures that the environment is consistent across machines, while the Vercel AI SDK provides the tools for handling streaming responses and multi-turn interactions. Using Docker Compose, the entire stack can be brought online with a single command, making it easier to experiment locally or move toward production.

The Vercel AI SDK gives you a simple yet powerful framework for building conversational UIs, handling streaming responses, and managing multi-turn interactions. Pair it with Docker, and you’ve got a portable, production-ready Chat UI Agent that runs the same way on your laptop, staging, or production.

We’ll start with the Next.js AI Chatbot template from Vercel, then containerize it using a battle-tested Dockerfile from demo repo. This way, you don’t just get a demo — you get a production-ready deployment.

One command, and your Chat UI is live.

Why this setup works

  • Next.js 15: Modern App Router, API routes, and streaming.
  • Vercel AI SDK: Simple React hooks and streaming utilities for chat UIs.
  • Docker (standalone build): Optimized for production — lean image size, fast startup, and reliable deployments.

This stack covers both developer experience and production readiness.

Step 1: Clone the template

Start with the official Vercel chatbot template:

npx create-next-app@latest chat-ui-agent -e https://vercel.com/templates/ai/nextjs-ai-chatbot

This scaffolds a full-featured chatbot using the Vercel AI SDK.

Step 2: Configure API keys

Create a .env.local file in the root:

OPENAI_API_KEY=your_openai_key_here

Swap in your provider key if you’re using Anthropic or another backend.

Step 3: Add the production Dockerfile

Instead of writing your own Dockerfile, grab the optimized version from Kristiyan Velkov’s repo:

Next.js Standalone Dockerfile

Save it as Dockerfile in your project root.

This file:

  • Uses multi-stage builds.
  • Creates a standalone Next.js build.

Keeps the image lightweight and fast for production.

Step 4: Docker Compose Setup

Here’s a simple docker-compose.yml:

services:
  chat-ui-agent:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}

This ensures your API key is passed securely into the container.

Step 5: Build and Run

Spin up your chatbot:

docker-compose up --build

Open http://localhost:3000, and your Chat UI Agent is ready to roll.

Why the standalone Dockerfile matters

Using the standalone Next.js Dockerfile instead of a basic one gives you real advantages:

  • Production-grade: Optimized builds, smaller image sizes, faster deploys.
  • Best practices baked in: No need to reinvent Docker configs.
  • Portable: Same setup runs on local dev, staging, or production servers.

This is the kind of Dockerfile you’d actually ship to production, not just test locally.

Final Thoughts

With the Next.js AI Chatbot template, the Vercel AI SDK, and a production-ready Dockerfile, spinning up a Chat UI Agent is not just quick — it’s deployment-ready from day one.

If you want to move fast without cutting corners, this setup strikes the perfect balance: modern frameworks, clean developer experience, and a solid production pipeline.

]]>
Docker at AI Engineer Paris: Build and Secure AI Agents with Docker https://www.docker.com/blog/ai-engineer-paris-build-secure-ai-agents/ Mon, 06 Oct 2025 13:00:00 +0000 https://www.docker.com/?p=78463 Last week, Docker was thrilled to be part of the inaugural AI Engineer Paris, a spectacular European debut that brought together an extraordinary lineup of speakers and companies. The conference, organized by the Koyeb team, made one thing clear: the days of simply sprinkling ‘AI dust’ on applications are over. Meaningful results demand rigorous engineering, complex data pipelines, focus on distributed systems, understanding compliance and supply chain security of AI.

image2

But the industry’s appetite for automation and effectively working with natural language and unstructured data isn’t going anywhere. It’s clear that AI Agents represent the next, inevitable wave of application development. 

At Docker, we’re dedicated to ensuring that building, sharing, and securing these new AI-powered applications is as simple and portable as containerizing microservices. That was the core message we shared at the event, showcasing how our tools simplify the entire agent lifecycle from local development to secure deployment at scale.

Keynote on democratizing AI Agents

Tushar Jain, Docker’s EVP Engineering & Product, joined a powerful line-up of Europe’s top AI engineering thought leaders including speakers from Mistral, Google DeepMind, Hugging Face, and Neo4j.

image1

Tushar’s session, “Democratizing AI Agents: Building, Sharing, and Securing Made Simple,” focused on a critical challenge: AI agent development can’t stay locked away with a few specialists. To drive real innovation and productivity across an organization, building agents must be democratized. We believe agents need standardized packaging and developers need a simple, secure way to discover and run MCP servers.

image4

Tushar spoke about how over the last decade, Docker made containers and microservices accessible to every developer. Now we see agents following the same trajectory. Just as containers standardized microservices, we need new tooling and trusted ecosystems to standardize agents. By developing standardized agent packaging and building the MCP Toolkit & Catalog for secure, discoverable tools, Docker is laying the groundwork for the next era of agent-based development.

image3

Hands-On: Building Collaborative Multi-Agent Teams

To guide attendees to understanding this in practice, we followed this vision with a hands-on workshop, Building Intelligent Multi-Agent Systems with Docker cagent: From Solo AI to Collaborative Teams. And it was a massive hit! Attendees had a perfect way to connect with the cagent team and to learn how to package and distribute agents as easily as building and pushing Docker images. 

The workshop focused on recently open-sourced cagent and how to use it for common tasks in agent development: 

  • Orchestrate specialized AI agent teams that collaborate and delegate tasks intelligently.
  • using cagent to easily package, share, and run existing multi-agent systems created by community 
  • and of course how to integrate external tools through the Model Context Protocol (MCP), ensuring agents have access to the data and can affect changes in the real world. 

If you want to try it yourself, the self-paced version of the workshop is available online: https://cagent-workshop.rumpl.dev/README.html

image5

At the end of the day during a breakout session, we followed that up with another reality-inspired message in my talk Building AI workflows: from local experiments to serving users. Whatever technologies you pick for your AI agent implementation: AI applications are distributed systems. They are a combination of the model, external tools, and your prompts. This means that if you ever aim to move from prototypes to production, you shouldn’t develop agents as simple prompts in AI assistants UI. Instead, treat them as you would any other complex architecture: containerize the individual components, factor in security and compliance, and architect for deployment complexity from the start.

Next Steps: Build and Secure Your Agents Today!

All in all, we had plenty of fantastic conversations with the AI Engineer community, which reinforced that developers are looking for tools that offer simplicity, portability, and security for this new wave of applications.

Docker is committed to simplifying agent development and securing MCP deployments at scale.

Learn More

]]>
AI Agent Archives | Docker nonadult