The AgentDraft Multi-Agent Collision Benchmark.
A reproducible, open-source benchmark for how a scheduling API behaves when independent AI agents write to one calendar at the same time. Every conflict resolves to exactly one winner. Re-run it yourself.
0 double-bookings across 500 concurrent agent attempts.
100 rounds · 5 agents/round · 100.0% one-winner · 100.0% rank-1 wins · p99 112 ms
When five independent scheduling agents fire at the same calendar slot simultaneously, AgentDraft elects exactly one winner — every round, deterministically, by priority. There were no double-commits, no rounds without a winner, and the highest-priority agent (rank 1) won every race it entered. This page is the receipts.
Setup. 5 simulated agents, each with a distinct priority (1 = highest), share one calendar. Each round, all agents fire POST /v1/bookings concurrently at the same 30-minute slot through AgentDraft's benchmark harness. The benchmark runs 100 rounds with a fresh future slot per round so no engine state leaks across rounds. Defaults: 30-second hold TTL, 30-second bump window.
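The round structure described above can be sketched as follows. This is a minimal illustration, not the harness in scripts/benchmark/: `fresh_slots`, `fire_round`, and `stub_book` are hypothetical names, and `stub_book` hardcodes the outcome so the sketch runs without a server. A real run would replace it with an HTTP call to POST /v1/bookings.

```python
import threading
from datetime import datetime, timedelta, timezone

# Agent names and priorities as listed in the results table (1 = highest).
AGENTS = {"sales-bot": 1, "recruit-bot": 2, "focus-blocker": 3, "exec-ea": 4, "ops-bot": 5}


def fresh_slots(rounds, first_start):
    """Return one distinct 30-minute slot start per round."""
    return [first_start + timedelta(minutes=30 * i) for i in range(rounds)]


def fire_round(book, slot, agents):
    """Release every agent at once and collect one status code per agent."""
    barrier = threading.Barrier(len(agents))
    statuses = {}

    def fire(name, priority):
        barrier.wait()  # every thread blocks here, then all proceed together
        statuses[name] = book(slot, name, priority)

    threads = [threading.Thread(target=fire, args=item) for item in agents.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return statuses


def stub_book(slot, name, priority):
    """Hypothetical stand-in for POST /v1/bookings; it hardcodes the outcome."""
    return 201 if priority == 1 else 409


if __name__ == "__main__":
    start = datetime.now(timezone.utc) + timedelta(days=1)
    for slot in fresh_slots(3, start):
        print(slot.isoformat(), fire_round(stub_book, slot, AGENTS))
```

The barrier is what makes the agents uncoordinated but simultaneous: no thread learns anything about the others, it only waits for the starting gun.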
base_url=reference://conflict-engine · commit=reference-fixture · duration 567s wall clock.
What counts. A commit is a 201 from the engine — the booking landed atomically. An outranked response is a 409 with the winner's identity. A double-commit is two 201s for the same slot in one round — the failure mode this whole engine exists to prevent. The benchmark's primary correctness invariant: double-commits must be zero on every run.
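The definitions above translate directly into a per-round check. The sketch below assumes each round is summarized as a list of HTTP status codes, one per agent; `classify_round` and `assert_no_double_commits` are illustrative names, not functions from the harness.

```python
from collections import Counter


def classify_round(statuses):
    """Summarize one round from its list of HTTP status codes."""
    counts = Counter(statuses)
    commits = counts[201]      # booking landed
    outranked = counts[409]    # lost to a named winner
    errored = len(statuses) - commits - outranked
    return {
        "commits": commits,
        "outranked": outranked,
        "errored": errored,
        "one_winner": commits == 1,
        "double_commit": commits >= 2,
    }


def assert_no_double_commits(rounds):
    """Primary invariant: no round may contain two or more 201s."""
    bad = [i for i, statuses in enumerate(rounds) if classify_round(statuses)["double_commit"]]
    if bad:
        raise AssertionError(f"double-commit in rounds {bad}")
```

A round with zero 201s is not a double-commit, but it is also not a one-winner round, which is why the two flags are tracked separately.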
The harness is uncoordinated — agents do not see each other's state and do not retry intelligently. That is the faithful proxy for a real-world stack where independent agents (a Cal.com routing handler, an inbox triage bot, a CrewAI assistant) write to the same calendar without knowing the others exist.
| Metric | Value |
|---|---|
| Total attempts | 500 |
| Committed (winners) | 100 |
| Outranked (HTTP 409) | 400 |
| Errored | 0 |
| Rounds with exactly one winner | 100 / 100 |
| Rounds with double-commit | 0 |
| Conflict-resolution accuracy | 100.0% |
| Rank-1 win rate | 100.0% |
| Latency p50 | 38 ms |
| Latency p99 | 112 ms |

| Agent | Priority | Attempts | Wins | Losses |
|---|---|---|---|---|
| sales-bot | 1 | 100 | 100 | 0 |
| recruit-bot | 2 | 100 | 0 | 100 |
| focus-blocker | 3 | 100 | 0 | 100 |
| exec-ea | 4 | 100 | 0 | 100 |
| ops-bot | 5 | 100 | 0 | 100 |
The story the table tells without commentary: a strict priority order produces a clean winner-take-all outcome. Equal-priority agents would split wins roughly evenly under contention — that scenario is a separate run in the harness.
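As a quick sanity check, the per-agent rows can be reconciled with the summary metrics. The numbers below are copied from the two tables; the variable names are only for illustration.

```python
ROUNDS = 100
AGENTS_PER_ROUND = 5

# name: (priority, attempts, wins, losses), copied from the per-agent table
per_agent = {
    "sales-bot": (1, 100, 100, 0),
    "recruit-bot": (2, 100, 0, 100),
    "focus-blocker": (3, 100, 0, 100),
    "exec-ea": (4, 100, 0, 100),
    "ops-bot": (5, 100, 0, 100),
}

total_attempts = sum(row[1] for row in per_agent.values())
total_wins = sum(row[2] for row in per_agent.values())
total_losses = sum(row[3] for row in per_agent.values())

# Rank-1 win rate: wins by the priority-1 agent over the rounds it entered.
rank1_attempts, rank1_wins = next(
    (row[1], row[2]) for row in per_agent.values() if row[0] == 1
)
rank1_win_rate = rank1_wins / rank1_attempts

print(total_attempts, total_wins, total_losses, f"{rank1_win_rate:.1%}")
```

Total attempts equal rounds times agents per round, wins equal the number of committed bookings, and losses equal the number of 409 responses, so the two tables agree.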
Double-commits are the existential failure mode. Latency is interesting. Resolution accuracy is interesting. Double-commits — two agents both succeeding on the same slot — are what makes a human's calendar useless. The right value is zero, and the storage layer has to enforce it, not the application. AgentDraft's conditional-write engine (app/conflict/engine.py) does that work — the check is the write.
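To make "the check is the write" concrete, here is a toy in-memory sketch of a conditional write. It is not the code in app/conflict/engine.py: `ToyTable`, `put_if_absent`, and `book` are hypothetical names, and a lock stands in for the atomicity a storage layer provides. In DynamoDB the equivalent condition would be expressed with a ConditionExpression such as `attribute_not_exists(...)` on the put.

```python
import threading


class ConditionFailed(Exception):
    """Raised when the write's condition does not hold."""


class ToyTable:
    """In-memory stand-in for a table that supports conditional puts."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, item):
        # The existence check and the write happen under one lock, so no
        # other writer can slip in between them.
        with self._lock:
            if key in self._items:
                raise ConditionFailed(key)
            self._items[key] = item

    def get(self, key):
        with self._lock:
            return self._items.get(key)


def book(table, slot, agent):
    """Map the conditional write onto the status codes used in this benchmark."""
    try:
        table.put_if_absent(slot, {"agent": agent})
        return 201
    except ConditionFailed:
        return 409
```

The contrast is with an application-level "read the slot, then write if free" sequence, where two writers can both read an empty slot before either writes.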
Priority is identity. Two-phase commit, optimistic locking, and naïve compare-and-swap all elect a winner by network jitter: whichever attempt arrives first. AgentDraft elects a winner by writer identity. Each agent carries a per-user priority; the engine's ConditionExpression bakes that priority into the storage-level write. A higher-priority agent's commit can evict a lower-priority hold or a still-bumpable commit. That moves the decision out of the racetrack and into a place a human operator can reason about.
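The eviction rule in this paragraph can be sketched as a pure condition over the existing record. Everything here is an assumption made for illustration: the `Record` shape, the `can_write` and `write` helpers, the rule that equal priority never evicts, and the treatment of an expired hold as free. The 30-second values are the defaults quoted in the setup. The sketch is single-threaded and does not model how an evicted writer is notified; in a real engine the condition and the write would be one atomic storage operation.

```python
from dataclasses import dataclass

HOLD_TTL_S = 30.0      # default hold TTL from the setup
BUMP_WINDOW_S = 30.0   # default bump window from the setup


@dataclass
class Record:
    agent: str
    priority: int       # 1 = highest
    state: str          # "hold" or "commit"
    written_at: float   # seconds on some shared clock


def can_write(existing, priority, now):
    """Return True if a writer with this priority may take the slot."""
    if existing is None:
        return True
    age = now - existing.written_at
    if existing.state == "hold" and age >= HOLD_TTL_S:
        return True                   # an expired hold no longer blocks anyone
    if priority >= existing.priority:
        return False                  # equal or lower rank never evicts
    if existing.state == "hold":
        return True                   # higher rank evicts a live hold
    return age < BUMP_WINDOW_S        # higher rank evicts a still-bumpable commit


def write(store, slot, agent, priority, state, now):
    """Apply the condition and the write together; return (accepted, owner)."""
    existing = store.get(slot)
    if can_write(existing, priority, now):
        store[slot] = Record(agent, priority, state, now)
        return True, agent
    return False, existing.agent
```

Under these assumptions the final owner of a contested slot does not depend on arrival order: whichever order the five agents write in within the bump window, the priority-1 agent ends up holding the slot.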
Multi-agent collision rate is a thing you can measure. The deep version of the question is not "how fast is the commit?" — it's "as the agent population grows, does the outcome stay deterministic?" The benchmark above is the smallest credible version. At N=5 and 100 rounds the engine is correct. The harness scales to N=50; expect a follow-up edition with the higher-concurrency runs when they ship.
git clone https://github.com/ryabinski-labs/agentdraft
cd agentdraft
docker compose up -d dynamodb
uvicorn app.main:app --port 8080 &
python scripts/benchmark/run.py \
--rounds 100 \
--agents 5 \
--label ranked-5

The harness lives in scripts/benchmark/ and re-runs in a single command against any AgentDraft deployment. Prior art: ScheduleMe (Wang et al., arXiv:2509.25693), an academic framing for multi-agent calendar assistants; this benchmark is the production-shape version.
@misc{agentdraft_collision_2026,
author = {{AgentDraft Labs}},
title = {AgentDraft Multi-Agent Collision Benchmark},
year = {2026},
url = {https://agentdraft.io/benchmark},
note = {Run 2026-05-24, commit reference-fixture}
}

- How a deterministic conflict engine resolves 8,217 collisions: the architecture under the numbers above.
- Why AI scheduling agents collide: the thesis the benchmark is evidence for.
- AgentDraft protocol specification: the storage-level conditional write, in spec form.