Docker Defaults Are an Escalation Ladder
A default Docker container runs as root, has 14+ Linux capabilities, allows privilege escalation, mounts a writable filesystem, shares a network with your database, and has unlimited resources. Every default is a rung. Six hardening layers remove them all.
The Problem With Docker Defaults
Your AI agent executes code. The user asks it to analyze a dataset. The agent writes Python. That Python runs inside a Docker container. This is standard. This is expected. And if the container is running with default settings, it is running with the privileges of a root user on a shared network with access to your database.
Docker defaults exist for developer convenience, not production security. Every default assumes the container is trusted. An AI agent's code execution container is the opposite of trusted. It runs arbitrary code generated by an LLM that accepts arbitrary input. It is, by design, the least trustworthy process in the system.
Here is the ladder a default Docker container hands an attacker who achieves code execution, rung by rung:

Root gives access to capabilities.
Capabilities enable privilege escalation.
Privilege escalation enables filesystem modification.
Filesystem modification enables persistence.
Network access enables lateral movement.
Unlimited resources enable denial of service.

Each rung builds on the one below it.
A prompt injection that achieves code execution inside a default Docker container does not start at the bottom. It starts with root. Everything above it is already available.
Docker defaults are not a neutral starting point. They are maximum privilege. Every hardening layer you skip is a privilege you grant to untrusted code.
Six Layers. Every Rung Removed.
HeartBeatAgents hardens execution containers with six layers that systematically remove every default privilege. Each layer addresses a specific escalation vector. Each layer is enforced at the kernel level. Container code cannot disable, circumvent, or modify any of them.
Layer 1: Network Isolation
Execution containers run on an isolated Docker network. The isolation is enforced by iptables rules in the host kernel. These rules block all traffic from the isolated network to any other network interface and all traffic from any other interface to the isolated network.
This is not application-level filtering. This is kernel-level packet dropping. The rules run in the host kernel, outside the container. Container code cannot modify them. Container code cannot see them. Even root inside the container (if it were running as root, which it is not) cannot modify iptables rules on the host.
The result: the execution container cannot reach the database. Cannot reach the cache. Cannot reach the application server. Cannot reach any other container in the deployment. The only service reachable from the isolated network is the egress proxy, which is dual-homed on both the isolated and main networks. All internet access goes through the proxy, which enforces its own access controls (read operations allowed, write operations denied, encrypted tunneling restricted to package repositories).
If the proxy is down, the container has zero internet access. Not degraded access. Zero. The isolated network has no gateway. Without the proxy, there is no path out.
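The isolation described above can be sketched with standard Docker and iptables primitives. This is a minimal illustration, not HeartBeatAgents' actual configuration; the network name (sandbox-net) and proxy container name (egress-proxy) are assumptions. Docker's --internal flag already creates a network with no gateway, and the DOCKER-USER chain is the documented place for host-side rules that Docker will not overwrite.

```shell
# An internal network has no gateway: containers on it have no route out
# even before any firewall rules are added.
docker network create --internal sandbox-net

# The egress proxy is dual-homed: it stays on the main network and is
# additionally attached to the isolated network.
docker network connect sandbox-net egress-proxy

# Belt-and-suspenders host-side rules: drop all traffic crossing the
# sandbox subnet boundary. These live in the host kernel; container code
# cannot see or modify them. Traffic to the proxy's sandbox-net address
# stays inside the subnet, so it is unaffected.
SANDBOX_SUBNET=$(docker network inspect sandbox-net \
  --format '{{(index .IPAM.Config 0).Subnet}}')
iptables -I DOCKER-USER -s "$SANDBOX_SUBNET" ! -d "$SANDBOX_SUBNET" -j DROP
iptables -I DOCKER-USER -d "$SANDBOX_SUBNET" ! -s "$SANDBOX_SUBNET" -j DROP
```

The real rules in any production deployment will be more nuanced, but the shape is the same: kernel-level packet dropping, configured from the host, invisible from inside the container.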
Layer 2: Non-Root Execution
The container runs as an unprivileged user. Not root. Not a user with sudo access. An ordinary, non-root user with no special permissions.
This matters because most container escape vulnerabilities require root inside the container. A container escape from a non-root user is significantly harder. The attacker must first escalate to root inside the container (blocked by Layer 4) before attempting to escape to the host.
Non-root execution also limits what code can access inside the container. System directories, configuration files, and binaries owned by root are read-only to the non-root user. The code can only write to explicitly writable locations (temporary space and mounted workspace). It cannot modify system libraries. It cannot install binaries in system paths. It cannot alter the container's own configuration.
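Non-root execution is a single runtime flag. A minimal sketch, with an illustrative image name and uid/gid (baking the user into the image with a Dockerfile USER directive is equivalent and keeps file ownership consistent):

```shell
# Run the container as an unprivileged user, not root.
docker run --rm --user 1000:1000 sandbox-image id
# Prints something like: uid=1000 gid=1000 -- no root, no supplementary groups.
```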
Layer 3: Zero Capabilities
Linux capabilities are fine-grained subdivisions of root privilege. Instead of all-or-nothing root, capabilities allow specific privileges to be granted individually. A default Docker container has 14 capabilities enabled.
HeartBeatAgents drops every capability. Zero remain. The container process has no elevated privileges of any kind.
What this prevents:
No raw sockets. Code cannot create raw network sockets. This blocks ARP spoofing, network scanning, and packet capture. Without raw sockets, the code cannot impersonate other hosts on the network or intercept traffic.
No process tracing. Code cannot attach to or inspect other processes. This blocks debugging, memory inspection, and process manipulation attacks that could target other processes in the container or (in certain configurations) on the host.
No permission bypass. Code cannot override file permission checks. Files that are not readable by the non-root user remain unreadable. Files that are not writable remain unwritable. The permission model is enforced without exceptions.
No service impersonation. Code cannot bind to privileged ports (below 1024). It cannot impersonate system services that listen on well-known ports. A piece of code that tries to start a service on port 80 or 443 will fail.
No filesystem ownership override. Code cannot bypass file ownership checks. Files owned by root stay accessible only to root. The non-root user cannot access them regardless of what the code attempts.
With zero capabilities, the code runs with the exact privileges of an ordinary user on an ordinary Linux system. No more. No less. Every capability that Docker grants by default has been explicitly removed.
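Dropping every capability is again one flag. A sketch, assuming an illustrative image that includes Python; opening a raw socket, which a default container permits, should fail with an operation-not-permitted error once CAP_NET_RAW is gone:

```shell
# --cap-drop=ALL removes every capability Docker would grant by default.
docker run --rm --cap-drop=ALL sandbox-image \
  python3 -c 'import socket; socket.socket(socket.AF_PACKET, socket.SOCK_RAW)'
# Raw socket creation requires CAP_NET_RAW, which was dropped.
```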
Layer 4: No Privilege Escalation
Even without root and without capabilities, privilege escalation is still possible in a default container. Setuid binaries are executables that run with the file owner's permissions regardless of who executes them. Standard Linux distributions include several setuid binaries that run as root, such as passwd and su.
HeartBeatAgents blocks privilege escalation at the kernel level. The kernel is instructed to prevent any process in the container from gaining privileges it did not start with. This means:
Setuid bits are ignored. A setuid binary that would normally run as root runs as the current user instead. The setuid bit has no effect.
Capability acquisition is blocked. A process cannot gain new capabilities through any mechanism. Not through setuid binaries. Not through capability inheritance. Not through any system call.
All child processes inherit the restriction. This is not a per-process setting. It applies to the initial process and every process it spawns, recursively. A child process cannot escape the restriction by spawning another child process.
This is enforced by a kernel-level flag that, once set, cannot be unset by the restricted process. The container code cannot disable it. It is set before the container's process starts and remains in effect for the lifetime of the container.
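In Docker terms this is the no-new-privileges security option, which sets the kernel's no_new_privs bit. A sketch with an illustrative image name; the bit is visible in /proc on kernels recent enough to expose the field:

```shell
# no_new_privs: once set, it cannot be cleared, and every child process
# inherits it. Setuid bits and capability grants stop working.
docker run --rm --security-opt no-new-privileges sandbox-image \
  grep NoNewPrivs /proc/self/status
# Expected on Linux 4.10+: "NoNewPrivs:  1"
```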
Layer 5: Read-Only Root Filesystem
The container's root filesystem is mounted read-only. The code cannot write to any system path. It cannot modify binaries, libraries, or configuration files. It cannot create files in system directories. It cannot install persistent backdoors.
Code still needs to write data. Analysis results, temporary files, downloaded packages. This is handled through explicitly writable locations:
Ephemeral scratch space. A size-limited temporary filesystem that is cleared on container restart. Suitable for temporary files and intermediate results. Nothing persists across restarts.
Persistent workspace. A mounted volume for package installations and working files. This persists across container restarts so that packages installed in one tool call are available in subsequent calls without reinstallation.
Shared folders. Organization-specific folder mounts for reading and writing business data. These are the folders the user has explicitly shared with the agent. Read-write or read-only, per folder, configured at the organization level.
The writable locations are explicit and bounded. The code can write to temporary space, its workspace, and the folders the organization has shared. It cannot write anywhere else. There is no path from "write to temporary space" to "modify a system binary." The two are on different filesystems with different mount options.
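The read-only root plus explicit writable mounts can be sketched as follows; the paths, volume name, and tmpfs size are illustrative, not actual configuration:

```shell
# Root filesystem read-only; writes allowed only in /tmp (size-limited
# tmpfs, cleared on restart), the persistent workspace volume, and the
# explicitly shared folder.
docker run --rm --read-only \
  --tmpfs /tmp:rw,size=256m \
  -v org-workspace:/workspace \
  -v /srv/shared/org-a:/data/shared:rw \
  sandbox-image sh -c 'touch /usr/local/bin/backdoor || echo "write blocked"'
# The touch fails with "Read-only file system"; /tmp and /workspace still work.
```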
Layer 6: Resource Limits
Memory and CPU are capped per container. These limits are enforced by the Linux kernel's cgroup mechanism. The container cannot override them. If the container exceeds its memory limit, the kernel kills the process. If the container attempts to use more CPU than allocated, the kernel throttles it.
This prevents denial of service against the host. Without resource limits, a single agent running a memory-intensive computation (or a prompt-injected infinite loop) could consume all host memory and crash every other process. With limits, the damage is contained to the container. Other agents, the database, the application server, and every other process on the host continue running normally.
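The caps are ordinary docker run flags backed by cgroups. A sketch with illustrative values and a hypothetical analyze.py; setting --memory-swap equal to --memory disables swap so the OOM killer fires at the stated cap, and --pids-limit is a related guard against fork bombs:

```shell
# Kernel-enforced resource ceilings for the execution container.
docker run --rm \
  --memory=2g --memory-swap=2g \
  --cpus=1.0 \
  --pids-limit=256 \
  sandbox-image python3 analyze.py
```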
Containers that sit idle beyond a threshold are automatically removed. This prevents resource accumulation from abandoned sessions. The cleanup runs on a schedule, checks each container's last activity time, and removes containers that have been idle too long. This is housekeeping, not security, but it reduces the platform's resource footprint and limits the window during which an idle container could be targeted.
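The cleanup loop could look like the following sketch. Both the container label (role=sandbox) and the activity marker file (/workspace/.last-used, touched by the platform on each tool call) are assumptions made for illustration:

```shell
# Cron-driven idle cleanup: remove sandbox containers whose last recorded
# activity is older than the threshold.
IDLE_LIMIT=3600  # one hour, illustrative
now=$(date +%s)
for id in $(docker ps -q --filter label=role=sandbox); do
  # mtime of the marker file approximates "last activity"; treat a missing
  # marker as idle-forever.
  last=$(docker exec "$id" stat -c %Y /workspace/.last-used 2>/dev/null || echo 0)
  if [ $((now - last)) -gt "$IDLE_LIMIT" ]; then
    docker rm -f "$id"
  fi
done
```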
Per-Organization Container Isolation
Each organization gets its own execution container. Organization A's container is a separate process with separate filesystem mounts from Organization B's container.
This is not permission-based isolation. Organization A's container does not have a "deny" rule preventing access to Organization B's folders. Organization B's folders are not mounted in Organization A's container. The paths do not exist. The files are not visible. There is no permission to misconfigure because there is no permission. There is no mount.
Code running in Organization A's container cannot access Organization B's data through any mechanism. Not by guessing paths. Not by traversing directories. Not by exploiting a permission vulnerability. The filesystem namespace of Organization A's container simply does not include Organization B's folders. They are as invisible as files on a different machine.
Each container mounts only the folders its organization has explicitly shared. Read-write folders allow the agent to write results. Read-only folders allow the agent to read reference data without modification risk. The mount configuration is per-organization, set at provisioning time, and enforced by the Docker runtime.
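Sketched as docker run flags (organization names, paths, and volume names illustrative), the isolation is simply the absence of the other organization's mounts:

```shell
# Org A's container mounts only Org A's folders. Org B's paths are not
# denied -- they are absent from this container's filesystem namespace.
docker run -d --name sandbox-org-a \
  -v workspace-org-a:/workspace \
  -v /srv/orgs/org-a/results:/data/results:rw \
  -v /srv/orgs/org-a/reference:/data/reference:ro \
  sandbox-image sleep infinity
```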
Container Lifecycle
Execution containers are not created per-request. That would add seconds of overhead to every code execution call. Instead, the lifecycle is optimized for both security and performance:
First call: container creation. When an organization's agent first needs to execute code, a new container is created with all six hardening layers applied. This takes a few seconds. It is a one-time cost per organization per session.
Subsequent calls: reuse. After the first call, code execution happens inside the existing container. This is fast, typically under 100 milliseconds. The container maintains its state: installed packages persist, workspace files persist, the environment is ready.
Idle cleanup: automatic removal. Containers that have been idle beyond a threshold are automatically removed. This recovers resources and limits the window during which an idle container exists on the host. When the organization's agent needs code execution again, a fresh container is created.
The lifecycle balances performance (reuse avoids creation overhead) with security (idle cleanup limits exposure window) and resource efficiency (unused containers are removed automatically).
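The whole lifecycle reduces to a get-or-create pattern that applies every layer at creation time. A sketch in which every name, path, and limit is illustrative:

```shell
# Reuse the organization's container if it exists; otherwise create it
# with all six hardening layers applied.
ensure_sandbox() {
  name="sandbox-$1"
  if ! docker inspect "$name" >/dev/null 2>&1; then
    docker run -d --name "$name" \
      --network sandbox-net \
      --user 1000:1000 \
      --cap-drop=ALL \
      --security-opt no-new-privileges \
      --read-only --tmpfs /tmp:rw,size=256m \
      --memory=2g --memory-swap=2g --cpus=1.0 \
      -v "workspace-$1:/workspace" \
      sandbox-image sleep infinity   # first call: a few seconds
  fi
  echo "$name"   # subsequent calls reach here immediately and exec into it
}
```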
What Remains Even in Development Mode
In development mode, network isolation is relaxed. Execution containers join the standard Docker network instead of the isolated network. This allows direct communication with the application server during development without requiring the full proxy infrastructure.
Every other hardening layer remains active in development:
Non-root execution. Active in dev. Code runs as an unprivileged user.
Zero capabilities. Active in dev. All capabilities dropped.
No privilege escalation. Active in dev. Setuid ignored.
Read-only root filesystem. Active in dev. Only writable locations are accessible.
Resource limits. Active in dev. Memory and CPU capped.
This is a deliberate decision. Network isolation requires the proxy infrastructure, which is complex to set up for local development. The other five layers have zero infrastructure requirements. They are container runtime flags. They cost nothing to apply. There is no reason to relax them, even in development.
Five of six layers active in development means developers experience the same hardening that production enforces. A skill that works in development will work in production. There are no "worked in dev, fails in prod because of security" surprises for five of the six layers.
Why Each Layer Matters Independently
Defense in depth is often cited as a principle. It is less often practiced correctly. True defense in depth means each layer provides meaningful security even if every other layer fails. Here is how each layer stands on its own:
If network isolation fails (container somehow joins the main network): non-root execution, zero capabilities, and no privilege escalation still limit what the code can do. It can reach the database, but it holds no credentials to authenticate with. It can reach the cache, but it cannot use raw sockets to intercept traffic.
If non-root execution fails (code somehow runs as root): zero capabilities still restrict what root can do. Root without capabilities cannot create raw sockets, trace processes, or override permissions. The read-only filesystem prevents root from installing persistence. Network isolation prevents root from reaching infrastructure.
If capabilities are restored (somehow capabilities are granted): no-privilege-escalation still prevents setuid exploitation. The read-only filesystem still prevents binary modification. Network isolation still limits where traffic can go.
If the filesystem becomes writable (mount option somehow changes): non-root execution and zero capabilities still prevent writing to system paths. Network isolation still prevents exfiltration. Resource limits still prevent denial of service.
No single layer is the security boundary. Each layer reduces the impact of the others failing. An attacker who defeats one layer faces five more. An attacker who defeats two faces four. The security posture degrades linearly, not catastrophically. Defeating all six simultaneously, from inside a container, against kernel-enforced restrictions, is a qualitatively different challenge than defeating any one of them.
The Standard to Demand
If your AI agent platform executes code in containers, these are the six questions to ask. Each one maps to a hardening layer. A "no" on any of them is a privilege granted to untrusted code.
"Is the execution container on an isolated network?" If it shares a network with your database and cache, code can reach them directly.
"Does the container run as a non-root user?" If it runs as root, every other hardening layer is weakened. Root inside a container is the starting point for most container escape techniques.
"Are all Linux capabilities dropped?" If any capabilities remain, the code has elevated privileges. Even seemingly harmless capabilities can be chained into escalation paths.
"Is privilege escalation blocked at the kernel level?" If setuid binaries work, a non-root user can become root. The non-root execution layer is only as strong as the privilege escalation prevention.
"Is the root filesystem read-only?" If the filesystem is writable, code can install backdoors, modify binaries, and persist across container restarts. A writable filesystem is a persistence mechanism.
"Are memory and CPU limits enforced?" If resources are unlimited, a single container can starve the host. Every other container, the database, the application server: all competing for resources against untrusted code with no cap.
Six questions. Six layers. Six kernel-enforced restrictions that container code cannot bypass. This is the minimum standard for running untrusted code in a production environment. Anything less is running untrusted code with unnecessary privilege.