Conflict engine — resolving 8,217 collisions
Two AI agents fighting over Tuesday at 4pm. One source of truth. p99 218 ms. Here's how the math works.
Two AI agents. One calendar. The same Tuesday at 4pm.
If you've shipped a scheduling agent in the last twelve months — Cal.com handler, OpenAI assistant, internal sales bot — you've watched this happen. Each agent makes a perfectly reasonable decision in isolation, and by midday the human's calendar is a collision of good intentions.
The market's answer has mostly been more agents: every product builds its own private view of availability and commits in isolation. We thought this should be solvable at the protocol layer. AgentDraft is the missing layer. Every agent calls one API before it books. Every booking commits through one deterministic engine. Every action is recorded in one tamper-evident log.
The race, in time
The hard problem isn't "let me see what's free" — that's a SELECT. The hard problem is two agents racing to commit the same slot in the same millisecond. Eventually-consistent reads make it almost impossible to solve in application code; the moment you check-then-write, another agent has slipped between your check and your write.
Our engine sidesteps the check entirely. The commit is the check.
t=0.000s SALES-BOT GET /availability → 16:00 free
t=0.012s RECRUIT-BOT GET /availability → 16:00 free
t=0.080s SALES-BOT POST /bookings → COMMITTED (priority 1)
t=0.086s RECRUIT-BOT POST /bookings → 409 outranked
t=0.091s RESOLVED audit + webhook
Both agents see the slot as free. Both attempt to commit. Exactly one wins — deterministically, predictably, no retries.
The atomicity primitive
A 30-minute booking at 16:00 doesn't write one row. It writes seven, in a single transaction:
- Six bucket rows: BUCKET#16:00, BUCKET#16:05, … BUCKET#16:25 on the user's CAL#2026-06-01 partition.
- One canonical booking row at BOOKING#bkg_3f9.
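To make the bucket layout concrete, here is a minimal sketch of how the keys for one booking could be derived. It assumes 5-minute buckets (implied by six rows covering 30 minutes) and a booking that stays within one calendar day; the function names are hypothetical, not part of the AgentDraft API.

```python
from datetime import datetime, timedelta

BUCKET_MINUTES = 5  # assumption: six buckets for a 30-minute booking implies 5-minute slices


def bucket_keys(start: datetime, duration_minutes: int) -> list[str]:
    """Return one BUCKET#HH:MM key per 5-minute slice the booking covers."""
    if duration_minutes <= 0 or duration_minutes % BUCKET_MINUTES:
        raise ValueError("duration must be a positive multiple of 5 minutes")
    count = duration_minutes // BUCKET_MINUTES
    return [
        "BUCKET#" + (start + timedelta(minutes=BUCKET_MINUTES * i)).strftime("%H:%M")
        for i in range(count)
    ]


def partition_key(start: datetime) -> str:
    """Return the per-day calendar partition, e.g. CAL#2026-06-01."""
    return "CAL#" + start.strftime("%Y-%m-%d")


start = datetime(2026, 6, 1, 16, 0)
keys = bucket_keys(start, 30)
print(partition_key(start), keys)
```

A 60-minute booking would produce twelve bucket keys under the same assumption, so the transaction size grows linearly with duration.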
All seven go into a DynamoDB TransactWriteItems. Each bucket Put carries the same ConditionExpression:
attribute_not_exists(booking_id)
OR (status = HOLD AND agent_priority >= :mine)
OR (status = COMMITTED AND agent_priority > :mine
AND committed_at > :bump_cutoff)
Read carefully. A bucket allows the write if one of three things is true: it's empty, or an existing hold is at the same-or-lower priority, or an existing commit is at strictly lower priority and still inside its bump window.
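The three branches are easier to check against examples when mirrored in plain code. The sketch below is a pure-Python model of the condition, not the engine itself. It assumes a lower number means higher priority (inferred from priority 1 beating priority 3 in the timeline) and that committed_at and the bump cutoff are comparable timestamps; the Bucket type is hypothetical.

```python
from typing import Optional, TypedDict


class Bucket(TypedDict):
    status: str           # "HOLD" or "COMMITTED"
    agent_priority: int   # assumption: lower number = higher priority (1 outranks 3)
    committed_at: float   # assumption: epoch seconds


def bucket_allows_write(existing: Optional[Bucket], mine: int, bump_cutoff: float) -> bool:
    """Mirror of the three-branch condition for a single bucket."""
    if existing is None:
        # attribute_not_exists(booking_id): the bucket is empty.
        return True
    if existing["status"] == "HOLD":
        # An existing hold yields to the same or a higher priority.
        return existing["agent_priority"] >= mine
    if existing["status"] == "COMMITTED":
        # An existing commit yields only to a strictly higher priority,
        # and only while it is still inside its bump window.
        return existing["agent_priority"] > mine and existing["committed_at"] > bump_cutoff
    return False


sales_commit: Bucket = {"status": "COMMITTED", "agent_priority": 1, "committed_at": 100.0}
recruit_allowed = bucket_allows_write(sales_commit, mine=3, bump_cutoff=50.0)
print(recruit_allowed)
```

With sales-bot's commit at priority 1 already in the bucket, recruit-bot at priority 3 fails the third branch, which corresponds to the 409 in the timeline.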
If any one of the six bucket conditions fails, the entire transaction is cancelled. The losing agent never partially writes anything.
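As a rough illustration, the seven-item request could be assembled like this. It is a sketch under several assumptions: the table name, the pk/sk attribute names, numeric timestamps, and placing the booking row in the same partition are all hypothetical. Because status is a DynamoDB reserved word, the expression uses the #st placeholder, and the HOLD and COMMITTED literals are passed as expression attribute values.

```python
from typing import Any

TABLE = "agentdraft-calendar"  # hypothetical table name

# Same three branches as the condition above, written with placeholders.
BUCKET_CONDITION = (
    "attribute_not_exists(booking_id)"
    " OR (#st = :hold AND agent_priority >= :mine)"
    " OR (#st = :committed AND agent_priority > :mine"
    " AND committed_at > :bump_cutoff)"
)


def build_commit(partition: str, buckets: list[str], booking_id: str,
                 priority: int, now: int, bump_cutoff: int) -> list[dict[str, Any]]:
    """Build TransactItems: one conditional Put per bucket plus the booking row."""
    values = {
        ":hold": {"S": "HOLD"},
        ":committed": {"S": "COMMITTED"},
        ":mine": {"N": str(priority)},
        ":bump_cutoff": {"N": str(bump_cutoff)},
    }
    row = {
        "booking_id": {"S": booking_id},
        "status": {"S": "COMMITTED"},
        "agent_priority": {"N": str(priority)},
        "committed_at": {"N": str(now)},
    }
    items: list[dict[str, Any]] = [
        {"Put": {
            "TableName": TABLE,
            "Item": {"pk": {"S": partition}, "sk": {"S": bucket}, **row},
            "ConditionExpression": BUCKET_CONDITION,
            "ExpressionAttributeNames": {"#st": "status"},
            "ExpressionAttributeValues": values,
        }}
        for bucket in buckets
    ]
    # Canonical booking row. The post does not describe a condition on it, so none is set.
    items.append({"Put": {
        "TableName": TABLE,
        "Item": {"pk": {"S": partition}, "sk": {"S": f"BOOKING#{booking_id}"}, **row},
    }})
    return items


buckets = ["BUCKET#16:00", "BUCKET#16:05", "BUCKET#16:10",
           "BUCKET#16:15", "BUCKET#16:20", "BUCKET#16:25"]
items = build_commit("CAL#2026-06-01", buckets, "bkg_3f9",
                     priority=1, now=1000, bump_cutoff=900)
print(len(items))
```

With boto3, a list like this would be passed as `boto3.client("dynamodb").transact_write_items(TransactItems=items)`. When any condition fails, the call raises TransactionCanceledException and none of the seven rows is written.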
That's the whole engine. The semantics — winners and losers, holds and commits, bumps and freezes — are enforced at the storage layer, not in application code. An in-process lock would not survive a deploy mid-commit. The condition-expression survives anything short of the table itself going away.
What it costs
A 30-min booking is seven writes. Each TransactWriteItems participant costs roughly 2× the equivalent PutItem in WCUs, so the commit path is ~14 WCUs. At cohort scale that's pennies per month.
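The arithmetic behind that figure, as a small sketch. It assumes 5-minute buckets and that every row fits in 1 KB, since DynamoDB bills writes in 1 KB units; the function name is hypothetical.

```python
import math


def commit_wcus(duration_minutes: int, bucket_minutes: int = 5, item_kb: float = 1.0) -> int:
    """Estimate WCUs for one transactional commit: bucket rows plus one booking row."""
    bucket_rows = duration_minutes // bucket_minutes
    rows = bucket_rows + 1                 # plus the canonical booking row
    per_row = 2 * math.ceil(item_kb)       # transactional writes cost 2 WCUs per 1 KB
    return rows * per_row


print(commit_wcus(30))  # 7 rows * 2 WCUs = 14
```

Under the same assumptions a 60-minute booking is thirteen rows, or 26 WCUs, so cost scales with booking length rather than with the number of competing agents.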
The latency budget is more interesting. We measure consistently:
- p50 commit: 84 ms
- p99 commit: 218 ms
That p99 includes a Redis token-bucket check, the engine pre-read for eviction snapshotting, the transactional write, an audit PutItem, and the webhook dispatcher's persist-first row. Everything past INFO trace logs lives off the request path.
The losing agent's life
The most underrated detail is the shape of the loss. When recruit-bot tries to commit at t=0.086s and the engine rejects it, the 409 response carries the winner's identity:
{
"error": "outranked",
"winning_booking_id": "bkg_3f9",
"winning_agent_id": "agt_sales",
"winning_agent_priority": 1,
"your_priority": 3,
"audit_event_id": "aud_7b2"
}
Recruit-bot now has everything it needs to be a polite citizen:
- It knows it didn't time out — it lost cleanly.
- It knows the winner so it can notify its user with the right context.
- It can pull the audit trail by audit_event_id if asked.
- It can stop retrying. No exponential backoff, no thundering herd.
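On the client side, the points above could translate into a handler along these lines. This is a sketch only: the Outcome type and the action names are hypothetical, and the field names are taken from the 409 body shown earlier.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Outcome:
    action: str                        # hypothetical: "booked", "notify_user", "retry_later"
    message: str
    audit_event_id: Optional[str] = None


def handle_booking_response(status_code: int, body: dict[str, Any]) -> Outcome:
    """Turn a POST /bookings response into a next step for the agent."""
    if status_code == 409 and body.get("error") == "outranked":
        # A clean, typed loss: tell the user who won and do not retry.
        return Outcome(
            action="notify_user",
            message=(
                f"Slot taken by {body['winning_agent_id']} "
                f"(priority {body['winning_agent_priority']}, "
                f"yours {body['your_priority']}); "
                f"booking {body['winning_booking_id']}."
            ),
            audit_event_id=body.get("audit_event_id"),
        )
    if 200 <= status_code < 300:
        return Outcome("booked", "Booking committed.")
    # Anything else is not a typed loss, so ordinary retry policy still applies.
    return Outcome("retry_later", f"Unexpected status {status_code}.")


loss = handle_booking_response(409, {
    "error": "outranked",
    "winning_booking_id": "bkg_3f9",
    "winning_agent_id": "agt_sales",
    "winning_agent_priority": 1,
    "your_priority": 3,
    "audit_event_id": "aud_7b2",
})
print(loss.action, loss.audit_event_id)
```

Keeping the outranked branch separate from generic errors is what lets the agent skip backoff entirely for this case.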
Most scheduling-conflict failure modes are bad UX on top of bad engineering. We fix the engineering by making the failure typed and immediate; the UX follows.
8,217 collisions, the number
In the pilot cohort, the engine has resolved 8,217 collisions since we turned the lights on. Every one is a deterministic decision with an audit row. Zero retries. Zero "the bot booked over my standup again."
That number is going to be tiny compared to where it goes once a real cohort opens. The point isn't the size — it's the consistency. A protocol that handles eight, eight thousand, and eight million the same way has different economics from a system that has to be re-thought at each scale.
What's next
This piece is part one of three. Next:
- Why conditional writes beat locks for multi-agent scheduling — a deeper dive on the failure modes we ruled out.
- The four collision patterns we found in pilot — the actual distribution: hold-vs-hold, hold-vs-commit-inside-bump, commit-vs-frozen, external-event-overlap.
Until then: the protocol is at agentdraft.io/spec and the integration story is at the quickstart. Cohort signups are open at agentdraft.io.
— agentdraft.io · v0.1