RONIN // ZERO-DOWNTIME POSTGRES MIGRATION

"Maintenance Windows" are an admission of defeat. If you have to take your site offline to upgrade the database, your architecture is flawed.

Last week, our primary Postgres cluster hit 90% IOPS saturation. We needed to move 2TB of live user data to a new NVMe storage array. We did it while processing 15,000 transactions per second. Nobody noticed.

The Problem: Connection Draining

The standard enterprise solution (AWS RDS Blue/Green) creates a replica, waits for it to sync, and then flips DNS. The problem? DNS changes reach clients slowly, because resolvers and connection pools keep using the cached old address. Clients hang. Connections time out. You get 30 seconds of "502 Bad Gateway" errors.

We refuse to accept 30 seconds of downtime. We wanted zero dropped packets.

The Strategy: Logical Replication Slots

We didn't use `pg_dump`. We tapped directly into the Postgres **Replication Protocol**.

We wrote a custom Rust binary that poses as a Postgres replica. It connects to the primary DB, creates a logical replication slot, and streams the changes decoded from the Write-Ahead Log (WAL) to the new server in real time.
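
For readers who have not used the replication protocol, here is a minimal sketch of the two commands such a client would send over a connection opened with `replication=database`. It assumes the built-in `pgoutput` plugin and an existing publication on the primary; the slot name `ronin_slot` and publication name `ronin_pub` are hypothetical and do not come from our production binary.

```rust
// Sketch only: builds the text of the replication-protocol commands.
// `ronin_slot` and `ronin_pub` are hypothetical names used for illustration.

/// Command that creates a logical slot decoded by the built-in `pgoutput` plugin.
fn create_slot_command(slot: &str) -> String {
    format!("CREATE_REPLICATION_SLOT {} LOGICAL pgoutput", slot)
}

/// Command that starts streaming decoded changes from `start_lsn`.
/// An LSN of "0/0" asks the server to start from the slot's own position.
fn start_replication_command(slot: &str, start_lsn: &str, publication: &str) -> String {
    format!(
        "START_REPLICATION SLOT {} LOGICAL {} (proto_version '1', publication_names '{}')",
        slot, start_lsn, publication
    )
}

fn main() {
    println!("{}", create_slot_command("ronin_slot"));
    println!("{}", start_replication_command("ronin_slot", "0/0", "ronin_pub"));
}
```

The slot is what makes this safe: the primary retains WAL until the client confirms it, so a slow or restarted client does not lose changes.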

The Atomic Cutover

The magic happens at the Proxy layer. Because Ronin controls the ingress, we don't rely on DNS.

When the New DB caught up to the Old DB (0 byte lag), we issued the switch command. Here is exactly what our Rust Proxy did in 4 milliseconds:

PAUSE: Stop sending query bytes to the Old DB. Buffer them in RAM.
SYNC: Wait for the final WAL LSN (Log Sequence Number) to appear on the New DB.
SWAP: Update the connection pool pointer to the New DB IP.
RESUME: Flush the RAM buffer to the New DB.
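
The SYNC step comes down to comparing two LSNs. Postgres prints an LSN as two hexadecimal numbers separated by a slash (the high and low 32 bits of a byte position in the WAL), so lag in bytes is a subtraction. A minimal sketch, assuming the primary's position comes from `pg_current_wal_lsn()` and the new server's applied position is reported by the replication client; the function names here are illustrative.

```rust
/// Parses a Postgres LSN in its textual `hi/lo` hex form (e.g. "16/B374D848")
/// into a 64-bit byte position in the WAL. Returns None for malformed input.
fn parse_lsn(text: &str) -> Option<u64> {
    let (hi, lo) = text.split_once('/')?;
    let hi = u32::from_str_radix(hi, 16).ok()?;
    let lo = u32::from_str_radix(lo, 16).ok()?;
    Some((u64::from(hi) << 32) | u64::from(lo))
}

/// Bytes the new server still has to apply; zero means it has caught up.
fn lag_bytes(primary_lsn: u64, replica_lsn: u64) -> u64 {
    primary_lsn.saturating_sub(replica_lsn)
}

fn main() {
    let primary = parse_lsn("16/B374D848").expect("valid LSN");
    let replica = parse_lsn("16/B374D800").expect("valid LSN");
    // 0xB374D848 - 0xB374D800 = 0x48, so this prints "lag: 72 bytes".
    println!("lag: {} bytes", lag_bytes(primary, replica));
}
```

Because writes are paused before the final LSN is read, the target position stops moving and the wait is bounded by whatever lag was left at that moment.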

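            // Inside the Rust Proxy (tokio-postgres).
            // `connection`, `primary`, `replica`, and `new_pool` are proxy state
            // defined elsewhere; the methods called on them belong to the proxy,
            // not to tokio-postgres.
            async fn perform_cutover() -> Result<(), Box<dyn std::error::Error>> {
                // 1. Pause ingress traffic; new query bytes accumulate in `buffer`
                let buffer = connection.pause_read();

                // 2. Wait for LSN catchup: read the Old DB's final WAL position,
                //    then block until the New DB has applied it
                let final_lsn = primary.get_current_wal_lsn().await?;
                replica.await_lsn(final_lsn).await?;

                // 3. Atomic pointer swap: later queries resolve to the new pool
                GLOBAL_DB_POOL.store(new_pool, Ordering::SeqCst);

                // 4. Replay buffer to new target
                connection.replay(buffer).await;

                Ok(())
            }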
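
To make the pause, swap, and resume ordering concrete, here is a self-contained sketch using only the standard library. It is not the production proxy: `Backend` and `Router` are hypothetical stand-ins, queries are plain strings, and the "swap" is an atomic index rather than a connection pool pointer.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for a database: records the queries it receives.
struct Backend {
    name: &'static str,
    received: Mutex<Vec<String>>,
}

impl Backend {
    fn new(name: &'static str) -> Self {
        Backend { name, received: Mutex::new(Vec::new()) }
    }
    fn execute(&self, query: &str) {
        self.received.lock().unwrap().push(query.to_string());
    }
    fn log(&self) -> Vec<String> {
        self.received.lock().unwrap().clone()
    }
}

/// Routes queries to the active backend; while paused, queries are buffered.
struct Router {
    backends: [Backend; 2],
    active: AtomicUsize,                     // index of the live backend
    buffer: Mutex<Option<VecDeque<String>>>, // Some(..) only while paused
}

impl Router {
    fn new(old: Backend, new: Backend) -> Self {
        Router { backends: [old, new], active: AtomicUsize::new(0), buffer: Mutex::new(None) }
    }
    fn send(&self, query: &str) {
        let mut guard = self.buffer.lock().unwrap();
        match guard.as_mut() {
            Some(buf) => buf.push_back(query.to_string()),
            None => self.backends[self.active.load(Ordering::SeqCst)].execute(query),
        }
    }
    /// PAUSE: start buffering instead of forwarding.
    fn pause(&self) {
        let mut guard = self.buffer.lock().unwrap();
        if guard.is_none() {
            *guard = Some(VecDeque::new());
        }
    }
    /// SWAP: point all future traffic at the new backend.
    fn swap_to_new(&self) {
        self.active.store(1, Ordering::SeqCst);
    }
    /// RESUME: flush buffered queries, in arrival order, to the active backend.
    fn resume(&self) {
        let mut guard = self.buffer.lock().unwrap();
        if let Some(buf) = guard.take() {
            let target = &self.backends[self.active.load(Ordering::SeqCst)];
            for query in buf {
                target.execute(&query);
            }
        }
    }
}

fn main() {
    let router = Router::new(Backend::new("old"), Backend::new("new"));
    router.send("q1");
    router.pause();
    router.send("q2");
    router.send("q3");
    router.swap_to_new(); // the SYNC wait would happen just before this
    router.resume();
    router.send("q4");
    for backend in &router.backends {
        println!("{}: {:?}", backend.name, backend.log());
    }
}
```

In this sketch `q1` lands on the old backend and `q2` through `q4` land on the new one, in order, because `resume` holds the buffer lock while it flushes. A real proxy has more to handle than this shows: transactions already in flight on the Old DB must finish before the final LSN is read, and session state such as prepared statements does not follow a connection to the new server on its own.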

The Result

The client application didn't even know the database changed. It saw a 4ms latency spike (imperceptible), and the next query response came from the new NVMe drive.

Why This Matters

Cloud providers sell you "Zero Downtime" as a premium feature that costs $5,000/month. It isn't magic. It's just understanding TCP buffers and the Postgres Replication Protocol.

We built Ronin so you don't have to write this Rust code yourself. It's built into the platform.

DEPLOY ON A PLATFORM THAT UNDERSTANDS DATA

Your database is your business. Don't trust it to a black box.


SURGERY ON A BEATING HEART
