§ Benchmark · Edition № 01 · Run May 24, 2026

The AgentDraft Multi-Agent Collision Benchmark.

A reproducible, open-source benchmark for how a scheduling API behaves when independent AI agents write to one calendar at the same time. Every conflict resolves to exactly one winner. Re-run it yourself.

0 double-bookings across 500 concurrent agent attempts.


100 rounds · 5 agents/round · 100.0% one-winner · 100.0% rank-1 wins · p99 112 ms

When five independent scheduling agents fire at the same calendar slot simultaneously, AgentDraft elects exactly one winner — every round, deterministically, by priority. There were no double-commits, no rounds without a winner, and the highest-priority agent (rank 1) won every race it entered. This page is the receipts.


§ 01 Methodology

Setup. 5 simulated agents, each with a distinct priority (1 = highest), share one calendar. Each round, all agents fire POST /v1/bookings concurrently at the same 30-minute slot through AgentDraft's benchmark harness. The benchmark runs 100 rounds with a fresh future slot per round so no engine state leaks across rounds. Defaults: 30-second hold TTL, 30-second bump window.

base_url=reference://conflict-engine · commit=reference-fixture · duration 567s wall clock.
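The round structure described above can be sketched as follows. This is a minimal illustration, not the harness in scripts/benchmark/: the `AGENTS` roster, `fresh_slot`, `run_round`, and the pluggable `send` coroutine are hypothetical names, and the base date is an arbitrary future timestamp. In a real run `send` would wrap POST /v1/bookings and return the HTTP status.

```python
import asyncio
from datetime import datetime, timedelta, timezone

# Hypothetical roster: agent name -> priority (1 = highest).
AGENTS = {"sales-bot": 1, "recruit-bot": 2, "focus-blocker": 3, "exec-ea": 4, "ops-bot": 5}

def fresh_slot(round_index, base=None):
    """Return a distinct future 30-minute slot (start, end) for each round."""
    base = base or datetime(2030, 1, 1, 9, 0, tzinfo=timezone.utc)  # arbitrary future date
    start = base + timedelta(minutes=30 * round_index)
    return start.isoformat(), (start + timedelta(minutes=30)).isoformat()

async def run_round(round_index, send):
    """Fire one attempt per agent at the same slot, all at once.

    `send(agent, priority, start, end)` is an awaitable supplied by the caller
    that returns an HTTP status code.
    """
    start, end = fresh_slot(round_index)
    tasks = [send(name, prio, start, end) for name, prio in AGENTS.items()]
    statuses = await asyncio.gather(*tasks)  # no coordination between agents
    return dict(zip(AGENTS, statuses))
```

Because each round gets its own slot, a result from one round cannot be explained by state left over from another.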

What counts. A commit is a 201 from the engine — the booking landed atomically. An outranked response is a 409 with the winner's identity. A double-commit is two 201s for the same slot in one round — the failure mode this whole engine exists to prevent. The benchmark's primary correctness invariant: double-commits must be zero on every run.
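As a rough sketch of the scoring rules above, one round can be classified from the status codes its agents received. The function name `classify_round` and the returned keys are illustrative assumptions, not the harness's actual output schema.

```python
from collections import Counter

def classify_round(statuses):
    """Classify one round from the list of HTTP status codes its agents received."""
    counts = Counter(statuses)
    commits = counts[201]      # booking landed
    outranked = counts[409]    # lost to a named winner
    errored = len(statuses) - commits - outranked
    return {
        "commits": commits,
        "outranked": outranked,
        "errored": errored,
        "one_winner": commits == 1,
        "double_commit": commits >= 2,  # the invariant: this must never be True
    }
```

A healthy five-agent round is one 201 and four 409s; two or more 201s in the same round is the failure the benchmark exists to catch.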

The harness is uncoordinated — agents do not see each other's state and do not retry intelligently. That is the faithful proxy for a real-world stack where independent agents (a Cal.com routing handler, an inbox triage bot, a CrewAI assistant) write to the same calendar without knowing the others exist.


§ 02 Results

| Metric | Value |
| --- | --- |
| Total attempts | 500 |
| Committed (winners) | 100 |
| Outranked (HTTP 409) | 400 |
| Errored | 0 |
| Rounds with exactly one winner | 100 / 100 |
| Rounds with double-commit | 0 |
| Conflict-resolution accuracy | 100.0% |
| Rank-1 win rate | 100.0% |
| Latency p50 | 38 ms |
| Latency p99 | 112 ms |
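The headline numbers follow from simple counting: 100 rounds × 5 agents = 500 attempts, and 100 commits + 400 outranked + 0 errors = 500. The sketch below shows one way such a summary could be derived from per-round status lists. It assumes the one-winner percentage is the share of rounds with exactly one 201, and it uses a nearest-rank percentile; the harness may compute its metrics differently, and `summarize` and `percentile` are hypothetical names.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(rounds):
    """rounds: list of per-round status-code lists, one status per agent."""
    attempts = sum(len(r) for r in rounds)
    committed = sum(r.count(201) for r in rounds)
    outranked = sum(r.count(409) for r in rounds)
    one_winner = sum(1 for r in rounds if r.count(201) == 1)
    return {
        "attempts": attempts,
        "committed": committed,
        "outranked": outranked,
        "errored": attempts - committed - outranked,
        "double_commit_rounds": sum(1 for r in rounds if r.count(201) >= 2),
        "one_winner_pct": 100.0 * one_winner / len(rounds),
    }
```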

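§ 03 Per-agent breakdown

| Agent | Priority | Attempts | Wins | Losses |
| --- | --- | --- | --- | --- |
| sales-bot | 1 | 100 | 100 | 0 |
| recruit-bot | 2 | 100 | 0 | 100 |
| focus-blocker | 3 | 100 | 0 | 100 |
| exec-ea | 4 | 100 | 0 | 100 |
| ops-bot | 5 | 100 | 0 | 100 |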

The story the table tells without commentary: a strict priority order produces a clean winner-take-all outcome. Equal-priority agents would split wins roughly evenly under contention — that scenario is a separate run in the harness.


§ 04 What this means

Double-commits are the existential failure mode. Latency is interesting. Resolution accuracy is interesting. Double-commits — two agents both succeeding on the same slot — are what makes a human's calendar useless. The right value is zero, and the storage layer has to enforce it, not the application. AgentDraft's conditional-write engine (app/conflict/engine.py) does that work — the check is the write.
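To make "the check is the write" concrete, here is a toy in-memory stand-in for a store with conditional writes. It is not the code in app/conflict/engine.py and not DynamoDB itself: `SlotStore`, `put_if_absent`, and `race` are hypothetical names, and the lock only models the atomicity a real storage layer would provide for a conditional put.

```python
import threading

class SlotStore:
    """Toy stand-in for a storage layer that supports conditional writes."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # models the store's per-item atomicity

    def put_if_absent(self, slot, agent):
        """Write only if the slot is empty; the check and the write are one atomic step."""
        with self._lock:
            if slot in self._items:
                return 409, self._items[slot]  # outranked: report the current holder
            self._items[slot] = agent
            return 201, agent

def race(store, slot, agents):
    """Start one thread per agent against the same slot and collect (status, winner) pairs."""
    results = {}

    def attempt(agent):
        results[agent] = store.put_if_absent(slot, agent)

    threads = [threading.Thread(target=attempt, args=(a,)) for a in agents]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If the application instead read the slot, decided, and then wrote in a separate step, two agents could both see an empty slot and both write. Folding the condition into the write removes that gap.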

Priority is identity. Two-phase commit, optimistic locking, and naïve compare-and-swap do not rank writers; under contention the winner is typically whichever attempt arrives first, so network jitter decides. AgentDraft elects a winner by writer identity. Each agent carries a per-user priority, and the engine's ConditionExpression makes that priority part of the storage-level write condition. A higher-priority agent's commit can evict a lower-priority hold or a still-bumpable commit. That moves the decision out of the racetrack and into a place a human operator can reason about.
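The priority rule can be sketched as a write condition: the write succeeds if the slot is empty, or if the current holder has a lower priority and is still inside the bump window. The sketch below is a single-threaded simplification under stated assumptions. It treats holds and commits as one record type, assumes the condition is evaluated atomically by the store, and does not model how the real engine reports an evicted attempt back to its caller. `Booking`, `try_commit`, and `BUMP_WINDOW_S` are hypothetical names; only the 30-second default comes from the methodology.

```python
from dataclasses import dataclass

BUMP_WINDOW_S = 30.0  # default bump window from the methodology section

@dataclass
class Booking:
    agent: str
    priority: int        # 1 = highest
    committed_at: float  # seconds on an arbitrary clock

def try_commit(store, slot, agent, priority, now):
    """Priority-aware conditional write against a dict standing in for the store."""
    current = store.get(slot)
    if current is None:
        store[slot] = Booking(agent, priority, now)
        return 201, agent
    bumpable = (now - current.committed_at) < BUMP_WINDOW_S
    if priority < current.priority and bumpable:
        store[slot] = Booking(agent, priority, now)  # evict the lower-priority holder
        return 201, agent
    return 409, current.agent  # outranked: report who holds the slot

def final_owner(arrival_order):
    """Replay (agent, priority) attempts in the given order; return who ends up holding the slot."""
    store = {}
    for i, (agent, priority) in enumerate(arrival_order):
        try_commit(store, "slot-1", agent, priority, now=i * 0.001)
    return store["slot-1"].agent
```

Under this rule the final owner of the slot does not depend on arrival order while the bump window is open: replaying the five agents in any order leaves the rank-1 agent holding the slot.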

Multi-agent collision rate is a thing you can measure. The deep version of the question is not "how fast is the commit?" but "as the agent population grows, does the outcome stay deterministic?" The benchmark above is the smallest credible version. At N=5 and 100 rounds the engine produced exactly one winner in every round and zero double-commits. The harness scales to N=50; expect a follow-up edition with the higher-concurrency runs when they ship.


§ 05 Reproduce
git clone https://github.com/ryabinski-labs/agentdraft
cd agentdraft
docker compose up -d dynamodb
uvicorn app.main:app --port 8080 &
python scripts/benchmark/run.py \
  --rounds 100 \
  --agents 5 \
  --label ranked-5

The harness lives in scripts/benchmark/ and re-runs in a single command against any AgentDraft deployment. Prior art: ScheduleMe (Wang et al., arxiv:2509.25693) — academic framing for multi-agent calendar assistants; this benchmark is the production-shape version.


§ 06 Cite this
@misc{agentdraft_collision_2026,
  author       = {{AgentDraft Labs}},
  title        = {AgentDraft Multi-Agent Collision Benchmark},
  year         = {2026},
  url          = {https://agentdraft.io/benchmark},
  note         = {Run 2026-05-24, commit reference-fixture}
}
