If you've deployed Phoenix applications with Cowboy, you've probably seen Plug.Cowboy.Drainer in tutorials and production configs. When we switched to Bandit, we wondered: where's the drainer?
Turns out, you don't need one. Here's why.
When your deployment platform (Fly.io, Kubernetes, etc.) wants to stop your app, it sends a SIGTERM signal (`kill -15`). The Erlang VM receives this and begins shutting down the supervision tree, stopping each supervisor's children in the reverse of their start order.
By default, a supervisor gives each worker 5 seconds to finish. If you have a request that takes 10 seconds, it gets killed mid-execution. In practice, most processes respond to the shutdown signal almost instantly: a GenServer with no cleanup work exits in microseconds.
The problem is HTTP connections. A request handler might be waiting on a database query, an external API call, or file processing. These need time to complete. And the default 5 seconds often isn't enough.
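To make the timeout concrete, here is a minimal sketch in plain OTP, with timings scaled down so it runs in well under a second. `SlowRequest` and `ShutdownTimeoutDemo` are hypothetical modules: the worker traps exits so its `terminate/2` callback runs, but its cleanup needs 300 ms while its `shutdown` budget is only 100 ms, so the supervisor kills it partway through.

```elixir
defmodule SlowRequest do
  # Hypothetical worker standing in for a request handler with in-flight work.
  use GenServer

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    # Trapping exits turns the supervisor's :shutdown signal into a
    # terminate/2 call instead of an immediate exit.
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  @impl true
  def terminate(_reason, report_to) do
    send(report_to, :cleanup_started)
    # Cleanup that takes longer than the 100 ms shutdown budget below.
    Process.sleep(300)
    send(report_to, :cleanup_finished)
  end
end

defmodule ShutdownTimeoutDemo do
  # Starts SlowRequest with a 100 ms shutdown budget (standing in for the
  # 5-second default), stops the supervisor, and returns the messages received.
  def run do
    children = [Supervisor.child_spec({SlowRequest, self()}, shutdown: 100)]
    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    Supervisor.stop(sup)
    collect([])
  end

  defp collect(acc) do
    receive do
      msg -> collect([msg | acc])
    after
      400 -> Enum.reverse(acc)
    end
  end
end
```

`ShutdownTimeoutDemo.run()` should return `[:cleanup_started]`: the worker begins cleaning up, but the supervisor kills it at the 100 ms mark, so `:cleanup_finished` is never sent. A 10-second request under the real 5-second default meets the same fate.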
Cowboy is a battle-tested HTTP server, but it wasn't designed with OTP shutdown semantics in mind. When the Erlang VM sends a shutdown signal to a Cowboy listener, it closes immediately, including any active connections.
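Plug.Cowboy.Drainer exists to fill this gap. It's a GenServer that:

- traps exits, so it gets a `terminate/2` callback when the supervisor shuts it down;
- suspends the Ranch listeners named in `refs`, so no new connections are accepted;
- waits for existing connections to finish, up to its own `shutdown` timeout.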
```elixir
# The Cowboy setup
children = [
  MyAppWeb.Endpoint,
  {Plug.Cowboy.Drainer, refs: [MyAppWeb.Endpoint.HTTP], shutdown: 30_000}
]
```
The drainer must be placed after the endpoint in the supervision tree. Shutdown happens in reverse order, so the drainer terminates first, triggering the drain before the endpoint dies.
It works well, but it's a bolt-on solution, and you have to know it exists before you can configure it.
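The drainer's job can be sketched in plain OTP. This is not `Plug.Cowboy.Drainer`'s source: the real module suspends Ranch listeners and waits on Ranch's connection count, whereas this hypothetical `Drainer` takes a `suspend` function and an `active_count` function so it runs without Cowboy. `FakeEndpoint` and `DrainerDemo` are also hypothetical, and the timings are scaled down to milliseconds.

```elixir
defmodule Drainer do
  # Hypothetical, simplified drainer; not Plug.Cowboy.Drainer's source.
  # Instead of Ranch calls, it takes two plain functions.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  @impl true
  def init(opts) do
    # Without trapping exits, terminate/2 would never run on shutdown.
    Process.flag(:trap_exit, true)
    {:ok, opts}
  end

  @impl true
  def terminate(_reason, opts) do
    # 1. Stop accepting new connections.
    opts.suspend.()
    # 2. Poll until in-flight work is done. The child's `shutdown` value is
    #    the hard cap: past it, the supervisor kills this process.
    wait_until_idle(opts.active_count, Map.get(opts, :check_interval, 10))
  end

  defp wait_until_idle(active_count, interval) do
    if active_count.() > 0 do
      Process.sleep(interval)
      wait_until_idle(active_count, interval)
    else
      :ok
    end
  end
end

defmodule FakeEndpoint do
  # Hypothetical stand-in for the endpoint; reports when it is stopped.
  use GenServer

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  @impl true
  def terminate(_reason, report_to), do: send(report_to, :endpoint_stopped)
end

defmodule DrainerDemo do
  def run do
    parent = self()
    # Number of in-flight requests; starts at one.
    {:ok, in_flight} = Agent.start_link(fn -> 1 end)

    children = [
      {FakeEndpoint, parent},
      # Listed after the endpoint, so it is stopped before it.
      Supervisor.child_spec(
        {Drainer,
         suspend: fn -> send(parent, :suspended) end,
         active_count: fn -> Agent.get(in_flight, & &1) end},
        shutdown: 1_000
      )
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

    # The in-flight request finishes 50 ms from now.
    spawn(fn ->
      Process.sleep(50)
      send(parent, :request_done)
      Agent.update(in_flight, fn _ -> 0 end)
    end)

    Supervisor.stop(sup)

    for _ <- 1..3 do
      receive do
        msg -> msg
      after
        1_000 -> :timeout
      end
    end
  end
end
```

`DrainerDemo.run()` should return `[:suspended, :request_done, :endpoint_stopped]`. Because the drainer is listed after the endpoint, it is stopped first, stops new work, and blocks in `terminate/2` until the in-flight request finishes; only then does the supervisor stop the endpoint. The drainer's own `shutdown` value is what caps the wait.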
Bandit is built on ThousandIsland, a socket server designed around OTP principles. The difference shows in shutdown behavior.
ThousandIsland's architecture:
```text
ThousandIsland.Server (Supervisor)
├── ThousandIsland.ShutdownListener
├── ThousandIsland.AcceptorPoolSupervisor
│   └── ThousandIsland.Acceptor (multiple)
└── ThousandIsland.ConnectionsSupervisor
    └── Handler processes (your requests)
```
When shutdown begins:

1. ShutdownListener closes the listening socket immediately.
2. Acceptor processes exit (no new connections).
3. ConnectionsSupervisor waits for handler processes.

Handler processes are supervised children with configurable shutdown timeouts. Standard OTP. No special drainer needed.
```elixir
# Bandit config - shutdown_timeout flows to ThousandIsland
config :my_app, MyAppWeb.Endpoint,
  adapter: Bandit.PhoenixAdapter,
  http: [thousand_island_options: [shutdown_timeout: :timer.seconds(30)]]
```
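The shutdown sequence described above can be modelled in plain OTP, with no drainer at all. This is a simplified sketch, not ThousandIsland's code: the hypothetical `Gate` stands in for the listener and acceptors, and a `DynamicSupervisor` holds hypothetical `ConnHandler` processes whose child spec carries the shutdown timeout (1 second here, scaled down).

```elixir
defmodule ConnHandler do
  # Hypothetical connection handler. `shutdown:` gives it up to 1 s to
  # finish the request it is serving once shutdown starts.
  use GenServer, restart: :temporary, shutdown: 1_000

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  @impl true
  def terminate(_reason, report_to) do
    # Simulate the remaining 50 ms of an in-flight request.
    Process.sleep(50)
    send(report_to, :request_finished)
  end
end

defmodule Gate do
  # Hypothetical stand-in for the listener/acceptors. Started last, so it is
  # stopped first; a real server would close its listen socket here.
  use GenServer

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  @impl true
  def terminate(_reason, report_to), do: send(report_to, :listener_closed)
end

defmodule MiniServer do
  def run do
    parent = self()

    children = [
      {DynamicSupervisor, name: MiniServer.Connections, strategy: :one_for_one},
      {Gate, parent}
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    {:ok, _pid} = DynamicSupervisor.start_child(MiniServer.Connections, {ConnHandler, parent})
    Supervisor.stop(sup)

    for _ <- 1..2 do
      receive do
        msg -> msg
      after
        1_000 -> :timeout
      end
    end
  end
end
```

`MiniServer.run()` should return `[:listener_closed, :request_finished]`: `Gate` is started last, so it stops first, and then the `DynamicSupervisor` waits for the handler to finish its request within its `shutdown` budget. The waiting is done by the supervisor itself, so no extra process is needed.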
Oban follows a similar pattern. Each queue runs a producer that fetches jobs and a set of worker processes that execute them.
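On shutdown:

1. Each queue's producer stops fetching new jobs.
2. Running jobs get up to `shutdown_grace_period` to finish.
3. After `shutdown_grace_period`, remaining jobs are abandoned.
4. A `[:oban, :queue, :shutdown]` telemetry event fires with the orphaned job IDs.

```elixir
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  shutdown_grace_period: :timer.seconds(30)
```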
Orphaned jobs aren't lost. They stay in the `executing` state until Oban's rescue mechanism (the Lifeline plugin, which must be enabled) returns them to the queue so they run again.
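The grace-period behaviour can be sketched in plain OTP as well. `MiniQueue` and `MiniQueueDemo` are hypothetical and far simpler than Oban (no database, no producer): each job is an `{id, duration_ms}` pair run in a `Task`, and on shutdown the queue waits up to `grace_period` milliseconds, kills what is left, and reports the orphaned IDs through a callback, roughly where Oban emits its telemetry event.

```elixir
defmodule MiniQueue do
  # Hypothetical, heavily simplified queue; not Oban's implementation.
  # Each job is {id, duration_ms} and runs in its own Task.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  @impl true
  def init(opts) do
    Process.flag(:trap_exit, true)
    running = for {id, ms} <- opts.jobs, do: {id, Task.async(fn -> Process.sleep(ms) end)}
    {:ok, %{running: running, grace_period: opts.grace_period, on_shutdown: opts.on_shutdown}}
  end

  # A job finished during normal operation: drop it from the running list.
  @impl true
  def handle_info({ref, _result}, state) when is_reference(ref) do
    Process.demonitor(ref, [:flush])
    {:noreply, %{state | running: Enum.reject(state.running, fn {_id, t} -> t.ref == ref end)}}
  end

  # Linked tasks that exit normally also send an :EXIT message, since we trap exits.
  def handle_info({:EXIT, _pid, :normal}, state), do: {:noreply, state}

  @impl true
  def terminate(_reason, state) do
    # Give running jobs up to grace_period ms to finish...
    tasks = Enum.map(state.running, fn {_id, task} -> task end)
    results = Task.yield_many(tasks, state.grace_period)

    # ...then kill whatever is still running and report those jobs as orphaned.
    orphaned =
      for {{id, task}, {task, nil}} <- Enum.zip(state.running, results) do
        Task.shutdown(task, :brutal_kill)
        id
      end

    state.on_shutdown.(orphaned)
  end
end

defmodule MiniQueueDemo do
  def run do
    parent = self()

    opts = [
      jobs: [{1, 20}, {2, 500}],
      grace_period: 100,
      on_shutdown: fn orphaned -> send(parent, {:orphaned, orphaned}) end
    ]

    # The supervisor's shutdown must exceed the grace period.
    children = [Supervisor.child_spec({MiniQueue, opts}, shutdown: 1_000)]
    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    Supervisor.stop(sup)

    receive do
      {:orphaned, ids} -> ids
    after
      1_000 -> :timeout
    end
  end
end
```

`MiniQueueDemo.run()` should return `[2]`: job 1 finishes inside the 100 ms grace period, job 2 does not. The supervisor's `shutdown` for the queue (1 second here) has to exceed the grace period; otherwise the supervisor kills the queue before it can report anything.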
For a Mix app without a web server, the equivalent is the `shutdown` option in child specs, which defaults to 5 seconds for workers (`:infinity` for supervisors).
```elixir
# In your Supervisor's child spec
children = [
  %{
    id: MyWorker,
    start: {MyWorker, :start_link, []},
    shutdown: :timer.seconds(30)
  }
]

# Or with the shorthand tuple syntax
children = [
  Supervisor.child_spec({MyWorker, []}, shutdown: :timer.seconds(30))
]
```
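One detail the child spec alone doesn't cover: a supervisor stops a worker by sending it an exit signal with reason `:shutdown`, and a process that isn't trapping exits dies on the spot, so `terminate/2` never runs and the 30-second budget goes unused. Here is a minimal sketch with a hypothetical `Flusher` worker (its cleanup is a placeholder message) that traps exits and sets the timeout through `use GenServer, shutdown: ...`, which bakes it into the module's default `child_spec/1`.

```elixir
defmodule Flusher do
  # Hypothetical worker; `shutdown:` here becomes part of Flusher.child_spec/1.
  use GenServer, shutdown: 30_000

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    # Required: without this, the :shutdown exit signal kills the process
    # immediately and terminate/2 is skipped.
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  @impl true
  def terminate(reason, report_to) do
    # Placeholder for real cleanup (flushing buffers, finishing a batch).
    send(report_to, {:flushed, reason})
  end
end

defmodule FlusherDemo do
  def run do
    {:ok, sup} = Supervisor.start_link([{Flusher, self()}], strategy: :one_for_one)
    spec = Flusher.child_spec(self())
    Supervisor.stop(sup)

    receive do
      {:flushed, reason} -> {spec.shutdown, reason}
    after
      1_000 -> :timeout
    end
  end
end
```

`FlusherDemo.run()` should return `{30_000, :shutdown}`: the default child spec carries the 30-second timeout, and `terminate/2` runs with reason `:shutdown`. Remove the `Process.flag/2` call and no `{:flushed, _}` message arrives, so the demo returns `:timeout`.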
Your app's shutdown timeout means nothing if the platform kills you first.
Fly.io sends the configured `kill_signal` (its default is SIGINT, so set SIGTERM explicitly), then waits `kill_timeout` seconds before sending SIGKILL:
```toml
# fly.toml
kill_signal = 'SIGTERM'
kill_timeout = 35
```
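Kubernetes uses `terminationGracePeriodSeconds` on the Pod spec (default: 30 seconds):

```yaml
# Pod spec (in a Deployment, this sits under spec.template.spec)
spec:
  terminationGracePeriodSeconds: 35
```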
Set these higher than your app timeout. We use 35 seconds for a 30-second app timeout.
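A sizing caveat, assuming the endpoint and Oban live in the same supervisor: the supervisor stops its children one at a time, so if in-flight requests use the full 30-second `shutdown_timeout` and running jobs then use the full 30-second `shutdown_grace_period`, the worst case is closer to 60 seconds than 30. The hypothetical `ShutdownBudget` helper just does that arithmetic; its 5-second default margin is an assumption, not a platform requirement.

```elixir
defmodule ShutdownBudget do
  # Hypothetical helper. Children of one supervisor are stopped one after
  # another, so their shutdown timeouts add up in the worst case.

  @doc "Worst-case shutdown time in ms for timeouts that run in sequence."
  def worst_case_ms(timeouts_ms), do: Enum.sum(timeouts_ms)

  @doc """
  True if a platform grace period (in seconds, like kill_timeout or
  terminationGracePeriodSeconds) covers the worst case plus a margin.
  """
  def fits?(timeouts_ms, platform_grace_s, margin_ms \\ 5_000) do
    worst_case_ms(timeouts_ms) + margin_ms <= platform_grace_s * 1_000
  end
end
```

With the numbers in this post, `ShutdownBudget.fits?([30_000], 35)` is `true`, but `ShutdownBudget.fits?([30_000, 30_000], 35)` is `false` (65,000 ms needed against 35,000 ms available). Requests rarely run that long in practice, and jobs cut off by SIGKILL are orphaned rather than lost, but it shows where the 35 seconds can run out.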
Shutdown happens in reverse child order. Given:
```elixir
children = [
  MyApp.Repo,                                    # 1st to start, last to stop
  {Oban, Application.fetch_env!(:my_app, Oban)}, # 2nd to start, 2nd-to-last to stop
  MyAppWeb.Endpoint                              # last to start, first to stop
]
```
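This ordering is correct:

- The endpoint stops first, so no new HTTP requests arrive and in-flight ones get to finish.
- Oban stops next, while the Repo is still available to its running jobs.
- The Repo stops last, once nothing depends on it.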
If you put Repo last in the children list, it would close first during shutdown, and Oban jobs would crash.
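The ordering claim is easy to verify in plain OTP. In this sketch, the hypothetical `StubChild` module stands in for the Repo, Oban, and the endpoint: it traps exits and reports its name from `terminate/2`. The hypothetical `OrderDemo` starts the three in the same order as the children list above and then stops the supervisor.

```elixir
defmodule StubChild do
  # Hypothetical stand-in for Repo, Oban, or the endpoint: it only reports
  # its name when the supervisor stops it.
  use GenServer

  # Custom child_spec so each stub gets a unique id.
  def child_spec({name, report_to}) do
    %{id: name, start: {__MODULE__, :start_link, [{name, report_to}]}}
  end

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg) do
    Process.flag(:trap_exit, true)
    {:ok, arg}
  end

  @impl true
  def terminate(_reason, {name, report_to}), do: send(report_to, {:stopped, name})
end

defmodule OrderDemo do
  def run do
    parent = self()
    # Same start order as the real children list: repo, oban, endpoint.
    children = for name <- [:repo, :oban, :endpoint], do: {StubChild, {name, parent}}
    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    Supervisor.stop(sup)

    for _ <- 1..3 do
      receive do
        {:stopped, name} -> name
      after
        1_000 -> :timeout
      end
    end
  end
end
```

`OrderDemo.run()` should return `[:endpoint, :oban, :repo]`: each child is stopped only after the child started after it has finished terminating.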
Putting it all together:

```elixir
# config/config.exs
config :my_app, MyAppWeb.Endpoint,
  adapter: Bandit.PhoenixAdapter,
  http: [thousand_island_options: [shutdown_timeout: :timer.seconds(30)]]

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  shutdown_grace_period: :timer.seconds(30)
```

```toml
# fly.toml
kill_signal = 'SIGTERM'
kill_timeout = 35
```
Three lines of config. Zero custom code. Deploys your users won't notice.