Every team running action-taking agents eventually asks the same question: how reliable is this thing, really? The usual answers, task success rate, eval scores, are computed from the agent’s own reporting, so they inherit its blind spots. A confident tool call that silently failed still counts as a win.
A metric grounded in reality
Verified Completion Rate (VCR) is the percentage of an agent’s claimed completions that are independently verified as correct against the system of record. It is a single number you can put on a dashboard, attribute, and improve, and crucially, it is computed from the source of truth, not from the agent’s account of itself.
Because every gap is attributed, false completion, duplicate, mismatch, policy failure, VCR doesn’t just tell you reliability dropped; it tells you which failure mode caused it. Fix the recurring patterns and the number climbs, backed by signed receipts rather than estimates.
The shift is small to describe and large in practice: stop measuring claimed success, start measuring verified completion. It is the difference between an agent that looks reliable and one you can prove is.