SWARM: Distributed Multi-Agent System on AWS EKS

"Most agent demos are two LLM calls in a for-loop. This was not that."

https://www.loom.com/share/79f1f2384c9140c2b7a28cb615eea878

What this was

A production-grade distributed multi-agent system. Built from scratch during a 4-month internship at Boring Workflows, a 4-person telecom AI startup.

Every enterprise task (a Jira ticket, a stuck order, an email thread needing investigation) runs inside its own isolated Kubernetes pod on AWS EKS. Tasks don't share state. One failing investigation can't touch another. When the task is done, the pod terminates and disappears.

The ask wasn't "build a demo." It was "build something a real engineering team could deploy."

The problem I was solving

Enterprise agents have a common failure mode. They run in a single process, share global state, and collapse when real workloads hit them. You can get away with this in a notebook. You can't get away with it in production.

The system needed four things that most agents skip:

Task isolation. One broken investigation can't affect another.
No cold start penalty. Pods need to be warm and waiting, not spinning up when a ticket arrives.
Real system connections. Jira, email, internal queues. No mock data.
One-engineer operability. One command to bring everything up. One command to tear it all down.

How it works

The pod sandbox model

Each task gets its own Kubernetes pod. The pod is created from a template, runs the agent reasoning loop, writes its output, and terminates. Nothing leaks between tasks. This sounds simple. Getting it to work correctly, with proper lifecycle management and state cleanup, took real iteration.
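
To make the one-pod-per-task idea concrete, here is a minimal sketch of stamping out a pod manifest from a shared template. It is not the project's actual code: the template contents, the image name, the label keys, and the build_task_pod helper are all assumptions made for illustration. Only the manifest fields themselves (apiVersion, kind, metadata, spec.restartPolicy, containers, env) are standard Kubernetes Pod fields.

```python
import copy
import uuid

# Hypothetical base template. The image name and labels are placeholders,
# not values taken from the real system.
POD_TEMPLATE = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"labels": {"app": "swarm-agent"}},
    "spec": {
        # Never restart: the pod runs one task and then stays terminated.
        "restartPolicy": "Never",
        "containers": [
            {"name": "agent", "image": "example.com/swarm-agent:latest", "env": []}
        ],
    },
}


def build_task_pod(task_id: str, task_kind: str) -> dict:
    """Return a fresh pod manifest for a single task (illustrative helper)."""
    # Deep copy so no task can mutate the template or another task's manifest.
    pod = copy.deepcopy(POD_TEMPLATE)
    # Pod names must be lowercase; a random suffix keeps retries unique.
    suffix = uuid.uuid4().hex[:8]
    pod["metadata"]["name"] = f"task-{task_id.lower()}-{suffix}"
    pod["metadata"]["labels"]["task-id"] = task_id
    # The task is handed to the agent through environment variables only.
    pod["spec"]["containers"][0]["env"] = [
        {"name": "TASK_ID", "value": task_id},
        {"name": "TASK_KIND", "value": task_kind},
    ]
    return pod


if __name__ == "__main__":
    manifest = build_task_pod("JIRA-123", "jira_ticket")
    print(manifest["metadata"]["name"])
```

A manifest like this could then be submitted with the official Kubernetes Python client, for example through CoreV1Api.create_namespaced_pod, and deleted once the task's output has been written. The deep copy is the important detail: every task starts from an untouched template, which mirrors the no-shared-state rule at the manifest level.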

Warm pool