Why two AI agents double-book the same slot — and the storage-level fix
Two agents, the same Tuesday 4pm, both return "booked." Read-then-write checks, optimistic locks, and idempotency keys don't fix it. Bucket-row atomicity does, and in our benchmark it produced 0 double-bookings at 500 concurrent attempts.
It's Tuesday. A human has two AI scheduling agents running for them: a sales bot booking demos and a recruiting bot booking interviews. Both agents independently decide the human is free at 4pm. Both write the booking. Both return success to their users. By the time 4pm arrives there are two meetings stacked on the same slot, two confirmation emails out the door, and one human who is now the load-balancer.
This is the single most-searched failure mode for anyone shipping a scheduling agent: how do you stop two AI agents from double-booking the same time slot? The honest answer is that you can't fix it inside either agent. It's not a prompt problem and it's not a retry problem. It's a coordination problem, and it has to be solved one layer below where most people are looking.
Why the naive fixes all fail
Every team hits the same four dead ends, in roughly this order.
1. Read-then-write. The obvious instinct: before booking, re-read the calendar; if the slot is still free, write it. This works for one agent and breaks the instant there are two:
t=0.000 sales-bot GET availability → 16:00 free
t=0.012 recruit-bot GET availability → 16:00 free ← both read "free"
t=0.080 sales-bot POST booking → written
t=0.086 recruit-bot POST booking → ALSO written ← double-booked
The check and the write are separated by the network. That gap, tens of milliseconds in the trace above, is exactly long enough for a second agent to slip between your read and your write. The underlying calendar API (Google, Microsoft 365) happily accepts both writes, because a calendar has no concept of "another agent was about to write here."
2. Optimistic locking. Add a version number, compare-and-swap on write. Better — but consumer calendar APIs don't expose a CAS primitive on a time slot. You can version a single event row; you cannot atomically assert "nothing else claims 16:00 through 16:30." The unit of contention (a span of time) isn't the unit the store locks (a row).
3. Application-level checks. Move the conflict logic into your service: a lock table, a mutex, a "pending bookings" set you consult before writing. Now the race just moves up a level — two requests read the pending set, both see it empty, both insert. Unless the check and the write are the same atomic operation against the same store, you've rebuilt dead end #1 with more code. And an in-process lock doesn't survive a deploy or a pod restart mid-commit.
4. Idempotency keys. These are essential — but they solve a different problem. An idempotency key dedupes the same agent retrying the same request. It does nothing when two different agents make two different, both-valid requests for the same slot. Idempotency prevents accidental duplicates; it does not arbitrate genuine contention.
The thread connecting all four: at the moment of decision, no party in the system knows the full set of pending writes against the slot. You cannot patch that from inside an agent.
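Dead ends 1 and 4 can be reproduced in a few lines. The sketch below is illustrative only: `NaiveCalendar` and `IdempotentCalendar` are hypothetical in-memory stand-ins for a calendar API that accepts any write, and the interleaving is hard-coded to match the trace above rather than produced by real concurrency.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveCalendar:
    """Hypothetical stand-in for a calendar API that accepts any write."""
    events: list = field(default_factory=list)

    def is_free(self, slot: str) -> bool:
        return all(e["slot"] != slot for e in self.events)

    def insert(self, slot: str, agent: str) -> None:
        self.events.append({"slot": slot, "agent": agent})


# Dead end 1: both agents read before either one writes.
cal = NaiveCalendar()
sales_sees_free = cal.is_free("16:00")        # t=0.000
recruit_sees_free = cal.is_free("16:00")      # t=0.012
if sales_sees_free:
    cal.insert("16:00", "sales-bot")          # t=0.080
if recruit_sees_free:
    cal.insert("16:00", "recruit-bot")        # t=0.086
race_bookings = [e["agent"] for e in cal.events]


# Dead end 4: an idempotency key dedupes a retry, not a competitor.
@dataclass
class IdempotentCalendar(NaiveCalendar):
    seen: dict = field(default_factory=dict)

    def insert_once(self, key: str, slot: str, agent: str) -> dict:
        if key in self.seen:                  # same request retried
            return self.seen[key]
        self.insert(slot, agent)
        self.seen[key] = self.events[-1]
        return self.seen[key]


idem = IdempotentCalendar()
idem.insert_once("sales-req-1", "16:00", "sales-bot")
idem.insert_once("sales-req-1", "16:00", "sales-bot")        # retry: deduped
idem.insert_once("recruit-req-1", "16:00", "recruit-bot")    # new key: accepted
idem_bookings = [e["agent"] for e in idem.events]

print(race_bookings)
print(idem_bookings)
```

In both cases the slot ends up with two bookings. The stale read causes the first; the second happens because the two requests carry different keys and nothing compares them against each other.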
The fix: bucket-row atomicity
The fix is to make the check and the write the same operation, enforced at the storage layer, on the actual unit of contention — the span of time. We call this bucket-row atomicity, and it's the core of the AgentDraft coordination layer.
> Bucket-row atomicity: a booking is decomposed into one row per fixed time bucket it occupies, and all of those rows are written in a single conditional transaction. The transaction commits only if every bucket row's conflict condition holds. The commit is the check: there is no window between them for a second agent to slip into.
Concretely: a 30-minute booking at 16:00 is not one write. It's six bucket rows — one per 5-minute slot — plus one canonical booking row, all inside a single DynamoDB TransactWriteItems:
PK = USER#u_42#CAL#2026-06-09
SK = BUCKET#16:00, BUCKET#16:05, ... BUCKET#16:25 (6 rows)
+ BOOKING#bkg_3f9 (1 canonical row)
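As a rough illustration of the decomposition, the key layout above can be generated from a start time and a duration. This is a minimal sketch, not the engine's code: the function name `bucket_keys` is hypothetical, and it assumes bookings are aligned to the 5-minute bucket width and do not cross midnight.

```python
from datetime import datetime, timedelta

BUCKET_MINUTES = 5  # bucket width taken from the article


def bucket_keys(user_id: str, start: datetime, duration_minutes: int):
    """Return the partition key and the bucket sort keys a booking occupies.

    Assumes the booking stays within one calendar day.
    """
    if start.minute % BUCKET_MINUTES or duration_minutes % BUCKET_MINUTES:
        raise ValueError("start and duration must align to the bucket width")
    pk = f"USER#{user_id}#CAL#{start:%Y-%m-%d}"
    sks = []
    for i in range(duration_minutes // BUCKET_MINUTES):
        bucket_start = start + timedelta(minutes=i * BUCKET_MINUTES)
        sks.append(f"BUCKET#{bucket_start:%H:%M}")
    return pk, sks


pk, sks = bucket_keys("u_42", datetime(2026, 6, 9, 16, 0), 30)
print(pk)   # USER#u_42#CAL#2026-06-09
print(sks)  # six keys, BUCKET#16:00 through BUCKET#16:25
```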
Every bucket Put carries the identical ConditionExpression:
attribute_not_exists(booking_id)
OR (status = HOLD AND agent_priority >= :mine)
OR (status = COMMITTED AND agent_priority > :mine
AND committed_at > :bump_cutoff)
Read it as three ways a write is allowed:
- The bucket is empty (attribute_not_exists): nobody holds it.
- An existing hold is at equal-or-lower priority: you may take it.
- An existing commit is at strictly lower priority and still inside its bump window: you may bump a fresh, lower-ranked commit, but a committed slot that's "frozen" (past its bump window) is untouchable.
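The condition above is written in shorthand. In an actual DynamoDB request, `status` is a reserved word and literals such as HOLD have to be passed as expression attribute values, so a request would look roughly like the sketch below. It only builds the `TransactItems` payload as plain dictionaries and sends nothing. The function name, the table name, the epoch-seconds encoding of `committed_at`, and the unconditional canonical row are assumptions made for illustration, not details confirmed by the source.

```python
CONDITION = (
    "attribute_not_exists(booking_id) "
    "OR (#status = :hold AND agent_priority >= :mine) "
    "OR (#status = :committed AND agent_priority > :mine "
    "AND committed_at > :bump_cutoff)"
)


def build_transact_items(table_name, pk, bucket_sks, booking_id, status,
                         priority, now_epoch, bump_cutoff_epoch):
    """Build one conditional Put per bucket plus one canonical booking Put."""
    base = {
        "PK": {"S": pk},
        "booking_id": {"S": booking_id},
        "status": {"S": status},
        "agent_priority": {"N": str(priority)},
    }
    if status == "COMMITTED":
        # Assumption: committed_at is stored as epoch seconds.
        base["committed_at"] = {"N": str(now_epoch)}
    values = {
        ":hold": {"S": "HOLD"},
        ":committed": {"S": "COMMITTED"},
        ":mine": {"N": str(priority)},
        ":bump_cutoff": {"N": str(bump_cutoff_epoch)},
    }
    items = []
    for sk in bucket_sks:
        items.append({"Put": {
            "TableName": table_name,
            "Item": {**base, "SK": {"S": sk}},
            "ConditionExpression": CONDITION,
            "ExpressionAttributeNames": {"#status": "status"},
            "ExpressionAttributeValues": values,
        }})
    # Canonical booking row; the source does not state a condition for it.
    items.append({"Put": {
        "TableName": table_name,
        "Item": {**base, "SK": {"S": f"BOOKING#{booking_id}"}},
    }})
    return items


buckets = [f"BUCKET#16:{m:02d}" for m in range(0, 30, 5)]
items = build_transact_items(
    "agentdraft-calendar",  # hypothetical table name
    "USER#u_42#CAL#2026-06-09", buckets, "bkg_3f9",
    status="COMMITTED", priority=1,
    now_epoch=1781020800, bump_cutoff_epoch=1781020500,
)
print(len(items))  # 7
```

With boto3, a list like this would be passed as `client.transact_write_items(TransactItems=items)`. If any condition fails, the call raises `TransactionCanceledException` and none of the seven rows is written.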
If any one of the six bucket conditions fails, DynamoDB cancels the entire transaction. The losing agent writes nothing — no partial booking, no orphaned row — and gets a typed 409 back, not a timeout:
{
"error": "outranked",
"winning_booking_id": "bkg_3f9",
"winning_agent_priority": 1,
"your_priority": 3,
"audit_event_id": "aud_7b2"
}
That's the whole mechanism. The semantics — who wins, holds vs. commits, bumps vs. freezes — live in the storage condition, not in application code. An in-process lock dies with the process. A condition expression survives a deploy, a crash, a region failover. The only thing that can break it is the table itself going away.
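To make the all-or-nothing behavior concrete, here is a single-threaded, in-memory model of the three-clause condition and the transaction around it. It is a sketch under stated assumptions: `Row`, `may_write`, and `transact` are hypothetical names, a plain dict stands in for the table, and "lower number outranks higher" is inferred from the 409 example (winner at priority 1, loser at 3). The `audit_event_id` field is left out because the source does not say how it is generated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    booking_id: str
    status: str            # "HOLD" or "COMMITTED"
    agent_priority: int    # lower number outranks higher (inferred)
    committed_at: float


def may_write(existing: Optional[Row], mine: int, bump_cutoff: float) -> bool:
    """Mirror the three clauses of the bucket condition."""
    if existing is None:
        return True
    if existing.status == "HOLD":
        return existing.agent_priority >= mine
    if existing.status == "COMMITTED":
        return (existing.agent_priority > mine
                and existing.committed_at > bump_cutoff)
    return False


def transact(table: dict, keys: list, new_row: Row, bump_cutoff: float) -> dict:
    # Evaluate every bucket condition before writing anything.
    for key in keys:
        existing = table.get(key)
        if not may_write(existing, new_row.agent_priority, bump_cutoff):
            return {
                "status": 409,
                "error": "outranked",
                "winning_booking_id": existing.booking_id,
                "winning_agent_priority": existing.agent_priority,
                "your_priority": new_row.agent_priority,
            }
    # Every condition held, so all rows are written together.
    for key in keys:
        table[key] = new_row
    return {"status": 200, "booking_id": new_row.booking_id}


table = {}
winner_keys = [f"BUCKET#16:{m:02d}" for m in range(0, 30, 5)]
loser_keys = ["BUCKET#15:45", "BUCKET#15:50", "BUCKET#15:55",
              "BUCKET#16:00", "BUCKET#16:05", "BUCKET#16:10"]

first = transact(table, winner_keys, Row("bkg_3f9", "COMMITTED", 1, 1000.0), 900.0)
second = transact(table, loser_keys, Row("bkg_a11", "COMMITTED", 3, 1001.0), 900.0)
print(first)
print(second)
print("BUCKET#15:45" in table)  # False: the loser left no partial rows
```

The second booking overlaps the first only from 16:00 onward, yet its three free buckets before 16:00 are not written either. That is the "no partial booking, no orphaned row" property in miniature.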
Two properties make this quotable and worth naming:
- The unit of the lock equals the unit of contention. Time is bucketed, so "claim 16:00–16:30" becomes six precise row-level asserts instead of one fuzzy "is this span free?" question the store can't answer.
- Priority is a parameter, not a baked-in policy. Each agent passes its own agent_priority. The engine arbitrates against it but never hard-codes a global rule, so a salesperson and a consultant can run opposite priority schemes on the same protocol.
The benchmark
The point of a coordination layer is that it behaves the same at one collision and at ten thousand. We load-tested the engine with 500 concurrent agents all attempting to book the exact same 16:00 slot in the same window:
- Double-bookings: 0. Exactly one agent committed. The other 499 got a clean, typed 409: no retries, no thundering herd.
- p99 commit latency: ~112 ms, including the token-bucket rate check, the transactional write, and the append-only audit row.
Zero is the only acceptable number here, and it's a guarantee from the storage condition, not a statistical artifact of low contention. The engine resolves 8, 8,000, or 8 million the same way.
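The "exactly one winner" property can be illustrated locally with threads. This is not the benchmark harness and it measures no latency. `BucketStore` is a hypothetical in-memory stand-in in which a `threading.Lock` plays the role of the store's transactional isolation, and only the empty-bucket clause of the condition is modeled.

```python
import threading


class BucketStore:
    """In-memory stand-in: check and write happen under one lock."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def try_commit(self, keys, booking_id) -> bool:
        with self._lock:
            if any(k in self._rows for k in keys):
                return False          # some bucket is taken: write nothing
            for k in keys:
                self._rows[k] = booking_id
            return True


def run(n_agents: int = 500):
    store = BucketStore()
    keys = [f"BUCKET#16:{m:02d}" for m in range(0, 30, 5)]
    results = [None] * n_agents
    barrier = threading.Barrier(n_agents)   # release all agents together

    def agent(i: int) -> None:
        barrier.wait()
        results[i] = store.try_commit(keys, f"bkg_{i}")

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count(True), results.count(False)


winners, losers = run()
print(winners, losers)  # 1 499
```

Which agent wins varies from run to run, but the count does not, because the check and the write cannot be separated.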
FAQ
How do you stop two AI agents from double-booking the same slot? Put a coordination layer in front of the calendar that makes the availability check and the booking write a single atomic, conditional transaction. AgentDraft does this with bucket-row atomicity: the slot is split into fixed time buckets, all written in one TransactWriteItems, and the transaction commits only if every bucket's conflict condition holds. Exactly one agent wins; the rest get a typed 409.
Why don't idempotency keys prevent multi-agent double-booking? Idempotency keys dedupe the same agent retrying the same request. They do nothing when two different agents each make a distinct, individually valid request for the same slot — that's genuine contention, which needs arbitration, not deduplication.
Why doesn't optimistic locking solve multi-agent scheduling conflicts? Optimistic locking versions a single row. The unit of contention in scheduling is a span of time, and consumer calendar APIs expose no compare-and-swap primitive over a time span — so two agents can both pass their version checks and still collide.
What is bucket-row atomicity? Decomposing a booking into one row per fixed time bucket it occupies and writing them all in a single conditional transaction, so the commit is the conflict check. There is no window between "is it free?" and "claim it" for a second agent to exploit.
Can application-level locks prevent agents from booking the same time? Not reliably. An application lock just moves the race up a level and dies with the process on a deploy or restart. The check and the write must be the same atomic operation against the same store — enforced at the storage layer.
Try it
The protocol is documented at agentdraft.io/spec, and the engine internals are in the race-engine post. If you're shipping an agent that touches calendars, point it at the coordination layer before it writes — grab a key at the quickstart. One slot, one winner, deterministically.
— agentdraft.io · v0.3