Firestarter sits at the execution layer, so we see what agents actually do when they buy: where they succeed, where they fail, and what it costs. We publish that data here, with methodologies fixed before collection starts.
How often do AI shopping agents actually complete a purchase, and where do they fail? We are measuring completion rates, wrong-item rates, and human interventions for two execution modes, browser-automation agents and structured commerce API execution, using a fixed task set of standardized purchase intents.
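To make the three metrics concrete, here is a minimal sketch of how they could be computed from per-trial records. It is an illustration under stated assumptions, not the benchmark's actual schema or code: the `Trial` fields, the mode labels `"browser"` and `"api"`, and the choice to divide every metric by the total number of trials in a mode are all hypothetical. The sample rows are made-up placeholders, not measured results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One attempt at one purchase intent (hypothetical record shape)."""
    task_id: str        # identifier of the standardized purchase intent
    mode: str           # assumed labels: "browser" or "api"
    completed: bool     # an order was placed
    wrong_item: bool    # an order was placed, but for the wrong item
    interventions: int  # times a human had to step in


def summarize(trials):
    """Group trials by execution mode and compute per-mode rates."""
    by_mode = defaultdict(list)
    for trial in trials:
        by_mode[trial.mode].append(trial)

    summary = {}
    for mode, group in by_mode.items():
        n = len(group)
        summary[mode] = {
            "trials": n,
            # Assumption: every rate uses all trials in the mode as denominator.
            "completion_rate": sum(t.completed for t in group) / n,
            "wrong_item_rate": sum(t.wrong_item for t in group) / n,
            "interventions_per_trial": sum(t.interventions for t in group) / n,
        }
    return summary


# Placeholder rows for illustration only; these are NOT benchmark results.
sample = [
    Trial("t1", "browser", True, False, 0),
    Trial("t2", "browser", True, True, 1),
    Trial("t3", "browser", False, False, 2),
    Trial("t4", "browser", False, False, 1),
    Trial("t1", "api", True, False, 0),
    Trial("t2", "api", True, False, 0),
]

if __name__ == "__main__":
    for mode, stats in summarize(sample).items():
        print(mode, stats)
```

Because the task set is fixed, both modes run the same `task_id` values, which is what makes the per-mode rates comparable. Whether a wrong-item purchase should also count as a completion is a definitional choice; this sketch counts it in both.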
The methodology is pre-registered: the metrics and task set described above were fixed before data collection began, so the results cannot be shaped by after-the-fact choices about what to measure. Results will be published on this page in both human-readable and machine-readable form.