Original data on agentic commerce

Firestarter sits at the execution layer, so we see what agents actually do when they buy: where they succeed, where they fail, and what it costs. We publish that data here, with methodologies fixed before collection starts.

Agent Checkout Reliability Benchmark (In progress)

How often do AI shopping agents actually complete a purchase, and where do they fail? We are measuring completion rates, wrong-item rates, and human interventions across browser-automation agents and structured commerce API execution, using a fixed task set of standardized purchase intents.

Pre-registered methodology

Fixed task set: standardized purchase intents across common product categories, written before any runs.
Two arms: agents driving merchant checkouts via browser automation, and agents executing the same intents through a structured commerce API.
Metrics: completion rate, wrong-item rate, human interventions per purchase, and time from intent to confirmed order.
Reporting: the full methodology, task list, and aggregate results are published together. No cherry-picked runs.
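
As a rough illustration of how the four metrics above could be aggregated per arm, here is a minimal sketch. It is not Firestarter's pipeline: the `Run` record shape, the arm labels, and the `summarize` helper are all hypothetical. The denominators are assumptions too, since the methodology list does not spell them out: wrong-item rate is taken over completed purchases, interventions are averaged over all attempts, and time to confirmed order is reported as a median over completed runs.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Run:
    """One attempt at one purchase intent (hypothetical record shape)."""
    arm: str                             # e.g. "browser" or "api" (assumed labels)
    task_id: str                         # which standardized intent was attempted
    completed: bool                      # True if an order was confirmed
    wrong_item: bool                     # True if the confirmed order did not match the intent
    interventions: int                   # number of times a human had to step in
    seconds_to_confirm: Optional[float]  # None when no order was confirmed


def summarize(runs: list[Run], arm: str) -> dict:
    """Aggregate the four metrics for one arm, using every run (no filtering)."""
    arm_runs = [r for r in runs if r.arm == arm]
    if not arm_runs:
        raise ValueError(f"no runs recorded for arm {arm!r}")

    completed = [r for r in arm_runs if r.completed]
    times = [r.seconds_to_confirm for r in completed
             if r.seconds_to_confirm is not None]

    return {
        "arm": arm,
        "runs": len(arm_runs),
        # Assumption: completion rate is confirmed orders over all attempts.
        "completion_rate": len(completed) / len(arm_runs),
        # Assumption: wrong-item rate is measured over completed purchases only.
        "wrong_item_rate": (sum(r.wrong_item for r in completed) / len(completed)
                            if completed else None),
        # Assumption: interventions are averaged over all attempts.
        "interventions_per_run": sum(r.interventions for r in arm_runs) / len(arm_runs),
        # Assumption: time is summarized as a median over completed runs.
        "median_seconds_to_confirm": median(times) if times else None,
    }
```

Calling `summarize(runs, "browser")` and `summarize(runs, "api")` on the same run log gives two dictionaries with identical keys, which keeps the arms directly comparable and serializes cleanly with `json.dumps` for machine-readable output.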

The method above was fixed before data collection began, so the task set and metrics cannot be adjusted after the fact to fit the results. Results will publish on this page in both human-readable and machine-readable form.
