Original data on agentic commerce

Firestarter sits at the execution layer, so we see what agents actually do when they buy: where they succeed, where they fail, and what it costs. We publish that data here, with methodologies fixed before collection starts.

Agent Checkout Reliability Benchmark (In progress)

How often do AI shopping agents actually complete a purchase, and where do they fail? We are measuring completion rates, wrong-item rates, and human interventions for two execution methods, browser-automation agents and structured commerce API calls, using a fixed task set of standardized purchase intents.
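As a rough illustration of how the three metrics could be tallied per execution method, here is a minimal Python sketch. It is not the benchmark's actual harness: the `TaskRun` record, its field names, the `summarize` function, and the choice to compute every rate over all attempted runs (rather than only completed ones) are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    """One attempt at one purchase intent (hypothetical record shape)."""
    task_id: str               # identifier of the standardized purchase intent
    method: str                # e.g. "browser" or "api" (labels are assumptions)
    completed: bool            # the agent finished checkout
    wrong_item: bool           # the purchased item did not match the intent
    human_interventions: int   # times a person had to step in


def summarize(runs):
    """Group runs by execution method and compute per-method metrics.

    Assumption: all rates use the number of attempted runs as the denominator.
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.method].append(run)

    summary = {}
    for method, items in grouped.items():
        n = len(items)
        summary[method] = {
            "runs": n,
            "completion_rate": sum(r.completed for r in items) / n,
            "wrong_item_rate": sum(r.wrong_item for r in items) / n,
            "interventions_per_run": sum(r.human_interventions for r in items) / n,
        }
    return summary
```

Calling `summarize` on a list of `TaskRun` records returns one dictionary per method, which makes the two execution paths directly comparable on the same task set. Whether wrong-item rate should instead be computed over completed purchases only is a definitional choice that a fixed methodology would need to state up front.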

Pre-registered methodology

The method described above was fixed before data collection began, so the task set and metrics cannot be adjusted after the results are in. Results will be published on this page in both human-readable and machine-readable form.
