8 Metrics & Measurement Plan

Product success metrics for the Kav AI Platform

Published

June 22, 2026

9 Purpose

The PRD defines what we build, and the pilot framework (PRD Appendix G) defines how a pilot is judged. This document defines how we know that capabilities are working in use after they ship, measured through adoption, activation, engagement, quality, and outcomes.

It is deliberately separate from the PRD so the PRD stays stable. Metrics reference capabilities / FR IDs rather than restating requirements.

10 Principles

Leading vs lagging. Leading metrics move in days–weeks (adoption, activation, usage, errors); lagging metrics move in weeks–months (retention, expansion, outcome impact).
One owner per metric. Every metric names an accountable owner and a single measurement method.
Targets are explicit. Each metric carries a success threshold and a stretch target with a measurement window. Where no baseline exists yet, the target is marked [set baseline] and filled after first measurement.
Outcomes over outputs. Prefer “time-to-first-finding” over “reports generated.”
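
To make the "one owner, explicit targets" principles concrete, here is a minimal Python sketch of how a metric row could be represented and checked. The names `MetricSpec`, `evaluate`, and `lower_is_better` are hypothetical and used only for illustration; the latency targets are the proposed values from the metric set, and a missing target stands for [set baseline].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricSpec:
    """One metric row: a single owner, a single method, explicit targets."""
    name: str
    owner: str
    method: str
    window: str
    success: Optional[float] = None  # None stands for [set baseline]
    stretch: Optional[float] = None  # None stands for [set baseline]
    lower_is_better: bool = False    # True for latency, error rates, cycle times


def evaluate(spec: MetricSpec, observed: float) -> str:
    """Classify an observed value against the metric's targets."""
    if spec.success is None or spec.stretch is None:
        return "set baseline"

    def meets(target: float) -> bool:
        return observed <= target if spec.lower_is_better else observed >= target

    if meets(spec.stretch):
        return "stretch"
    if meets(spec.success):
        return "success"
    return "below target"


# Proposed targets from the metric set: P95 latency <= 3 s (success), <= 2 s (stretch).
latency = MetricSpec(
    name="Query response latency (P95)",
    owner="Eng",
    method="Backend logs",
    window="Continuous",
    success=3.0,
    stretch=2.0,
    lower_is_better=True,
)

# No baseline yet, so both targets stay unset.
adoption = MetricSpec(
    name="Capability adoption rate",
    owner="Product",
    method="Telemetry, per FR/capability",
    window="30 days post-ship",
)

print(evaluate(latency, 2.5))
print(evaluate(adoption, 0.4))
```

Keeping the placeholder as an explicit "no target" state, rather than a default number, means a metric without a baseline cannot be reported as passing by accident.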

11 Instrumentation

Primary source: per-workspace usage telemetry already emitted by the web app (anonymous, opt-out via configuration). Supplement with: backend query logs (latency, grounding outcomes), pilot success-criteria results (PRD Appendix G), and operator confirmations/dismissals from the triage workflow.

12 Metric set

Targets below are proposals to confirm with Product/Leadership, not decided values. Replace [set baseline] once first data is in.

12.1 Adoption (leading)

Metric	Definition	Success / Stretch	Method	Window	Owner
Capability adoption rate	% of eligible workspaces that use a capability ≥ once	`[set baseline]`	Telemetry, per FR/capability	30 days post-ship	Product
Active integrity engineers	Distinct IE users active per week	`[set baseline]`	Telemetry	Weekly	Product
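
As an illustration of the capability adoption rate definition, the sketch below counts each eligible workspace at most once, however often it uses the capability. The event shape (`workspace_id`, `capability_id` pairs), the workspace IDs, and the use of FR IDs as capability keys are assumptions made for the example; the actual telemetry schema may differ.

```python
from typing import Iterable, Set, Tuple


def adoption_rate(
    eligible_workspaces: Set[str],
    events: Iterable[Tuple[str, str]],
    capability: str,
) -> float:
    """Share of eligible workspaces with at least one event for the capability.

    `events` holds (workspace_id, capability_id) pairs that are assumed to be
    already filtered to the 30-day post-ship window.
    """
    if not eligible_workspaces:
        raise ValueError("no eligible workspaces")
    adopters = {
        workspace
        for workspace, used_capability in events
        if used_capability == capability and workspace in eligible_workspaces
    }
    return len(adopters) / len(eligible_workspaces)


eligible = {"ws-a", "ws-b", "ws-c", "ws-d"}
events = [
    ("ws-a", "FR-CAD-07"),
    ("ws-a", "FR-CAD-07"),  # repeat use still counts the workspace once
    ("ws-b", "FR-AI-02"),   # different capability
    ("ws-c", "FR-CAD-07"),
    ("ws-z", "FR-CAD-07"),  # not in the eligible set, ignored
]

rate = adoption_rate(eligible, events, "FR-CAD-07")
print(f"{rate:.0%}")  # 2 of 4 eligible workspaces -> 50%
```

The denominator is the eligible set, not the set of workspaces that emitted events, so workspaces that never touched the capability still count against the rate.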

12.2 Activation (leading)

Metric	Definition	Success / Stretch	Method	Window	Owner
Time to first finding	Time from data ingestion to first surfaced anomaly	≤ 48 h / ≤ 24 h	Backend timestamps (aligns with pilot SC-4)	Per onboarding	Product + Eng
First-week core action	% of new workspaces that complete a core action (confirm a finding / run a query / export a report) in week 1	`[set baseline]`	Telemetry	7 days	Product
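
The time-to-first-finding definition can be sketched from two backend timestamps, as below. The function names and sample timestamps are hypothetical; the 48 h and 24 h thresholds are the proposed targets from the table above.

```python
from datetime import datetime, timedelta
from typing import List, Optional

SUCCESS = timedelta(hours=48)  # proposed success threshold
STRETCH = timedelta(hours=24)  # proposed stretch target


def time_to_first_finding(
    ingested_at: datetime, anomaly_times: List[datetime]
) -> Optional[timedelta]:
    """Elapsed time from ingestion to the earliest anomaly surfaced at or after it."""
    surfaced = [t for t in anomaly_times if t >= ingested_at]
    if not surfaced:
        return None  # nothing surfaced yet for this onboarding
    return min(surfaced) - ingested_at


def classify(elapsed: Optional[timedelta]) -> str:
    """Map an elapsed time onto the success / stretch bands."""
    if elapsed is None:
        return "no finding yet"
    if elapsed <= STRETCH:
        return "stretch"
    if elapsed <= SUCCESS:
        return "success"
    return "below target"


# Hypothetical onboarding: ingestion on 1 June, first anomaly 30 hours later.
ingested = datetime(2026, 6, 1, 8, 0)
anomalies = [datetime(2026, 6, 3, 9, 0), datetime(2026, 6, 2, 14, 0)]

elapsed = time_to_first_finding(ingested, anomalies)
print(elapsed, classify(elapsed))
```

In this example the first anomaly surfaces 30 hours after ingestion, which meets the success threshold but not the stretch target.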

12.3 Engagement & usage (leading)

Metric	Definition	Success / Stretch	Method	Window	Owner
Contextual-chat usefulness	% of NL queries rated “useful” or better	≥ 70% / ≥ 85%	In-product rating (aligns with pilot SC-3)	Monthly	Product
Query response latency (P95)	P95 NL query response time	≤ 3 s / ≤ 2 s	Backend logs (PRD NFR)	Continuous	Eng
Report export usage	Reports exported per active workspace / month	`[set baseline]`	Telemetry	Monthly	Product
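
For the P95 latency metric, the percentile method should be fixed once so that numbers stay comparable between reviews. The sketch below uses the nearest-rank method, which is one common choice and an assumption here (the PRD NFR may specify another); the latency samples are invented for illustration.

```python
import math
from typing import List


def percentile_nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical NL query response times in seconds (20 samples).
latencies = [
    0.8, 1.1, 0.9, 1.4, 2.9, 1.0, 1.2, 0.7, 1.3, 3.6,
    1.1, 0.9, 1.5, 1.0, 2.2, 1.2, 0.8, 1.6, 1.1, 1.0,
]

p95 = percentile_nearest_rank(latencies, 95)
print(p95, "meets success" if p95 <= 3.0 else "misses success")
```

With 20 samples the 95th percentile is the 19th smallest value, so a single slow outlier (3.6 s here) does not move the metric, while a second one would.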

12.4 Quality & reliability (leading)

Metric	Definition	Success / Stretch	Method	Window	Owner
False-positive rate (operator-assessed)	% of flagged anomalies the IE marks false positive	≤ 20% / ≤ 10%	Triage decisions (aligns with pilot SC-2)	Monthly	Eng + IE
Confidence calibration error	Gap between stated confidence and empirical accuracy per bucket	`[set baseline]`	Calibration curve (PRD AI-safety FR-AI-02)	Per campaign	Eng
Tag-resolution success	% of CAD/operator tags auto-resolved by dual-tagging without manual mapping	`[set baseline]`	Pipeline logs (FR-CAD-07)	Per ingest	Eng
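
The calibration error definition (gap between stated confidence and empirical accuracy per bucket) can be sketched as follows. This uses equal-width confidence buckets and a count-weighted mean absolute gap, often called expected calibration error; that formulation, the bucket count, and the sample data are assumptions for illustration, and FR-AI-02 may define the calculation differently.

```python
from typing import List, Tuple


def calibration_report(
    predictions: List[Tuple[float, bool]], n_buckets: int = 10
) -> Tuple[List[Tuple[float, float, int]], float]:
    """Return per-bucket (mean confidence, accuracy, count) rows and the weighted gap.

    Each prediction is (stated confidence in [0, 1], whether it turned out correct).
    """
    if not predictions:
        raise ValueError("no predictions")
    buckets: List[List[Tuple[float, bool]]] = [[] for _ in range(n_buckets)]
    for confidence, correct in predictions:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        # Confidence 1.0 falls into the last bucket.
        index = min(int(confidence * n_buckets), n_buckets - 1)
        buckets[index].append((confidence, correct))

    rows: List[Tuple[float, float, int]] = []
    weighted_gap = 0.0
    for bucket in buckets:
        if not bucket:
            continue  # empty buckets contribute nothing
        mean_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        rows.append((mean_confidence, accuracy, len(bucket)))
        weighted_gap += abs(mean_confidence - accuracy) * len(bucket)
    return rows, weighted_gap / len(predictions)


# Hypothetical campaign: four findings stated at 0.9 confidence, four at 0.3.
sample = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.3, False), (0.3, True), (0.3, False), (0.3, False),
]

rows, gap = calibration_report(sample, n_buckets=2)
for mean_confidence, accuracy, count in rows:
    print(f"confidence {mean_confidence:.2f}  accuracy {accuracy:.2f}  n={count}")
print(f"weighted gap {gap:.2f}")
```

In the sample, the high-confidence bucket is overconfident by 0.15 and the low-confidence bucket by 0.05, giving a weighted gap of 0.10. Reporting the per-bucket rows alongside the single number shows where the miscalibration sits.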

12.5 Outcomes (lagging)

Metric	Definition	Success / Stretch	Method	Window	Owner
Triage-to-work-order cycle	Median time from detection to CMMS work order	< 4 h / < 2 h	Workflow timestamps (PRD Operations target)	Quarterly	Product
Pilot → production conversion	% of pilots that proceed to a production contract	`[set baseline]`	Commercial tracking	Per cohort	Leadership
Logo retention	% of production customers retained	`[set baseline]`	Commercial tracking	Annual	Leadership
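
The triage-to-work-order cycle is a median over detection and work-order timestamps. The sketch below shows one way to compute it; the record shape and sample data are hypothetical, and excluding findings that have no work order yet is an assumption that should be agreed with the metric owner, since it can make the median look better than it is.

```python
from datetime import datetime
from statistics import median
from typing import List, Optional, Tuple


def median_cycle_hours(
    records: List[Tuple[datetime, Optional[datetime]]]
) -> float:
    """Median hours from detection to CMMS work order, over closed records only.

    Each record is (detected_at, work_order_at); work_order_at is None while
    no work order exists. Open records are skipped (an assumption, see lead-in).
    """
    durations = [
        (work_order_at - detected_at).total_seconds() / 3600
        for detected_at, work_order_at in records
        if work_order_at is not None
    ]
    if not durations:
        raise ValueError("no completed cycles in this window")
    return median(durations)


# Hypothetical quarter: cycles of 1.5 h, 3 h, and 6 h, plus one still open.
records = [
    (datetime(2026, 7, 1, 9, 0), datetime(2026, 7, 1, 10, 30)),
    (datetime(2026, 7, 8, 13, 0), datetime(2026, 7, 8, 16, 0)),
    (datetime(2026, 8, 2, 7, 0), datetime(2026, 8, 2, 13, 0)),
    (datetime(2026, 9, 20, 11, 0), None),
]

cycle = median_cycle_hours(records)
print(cycle, "meets success" if cycle < 4 else "misses success")
```

Here the median is 3 hours, which meets the proposed success threshold of under 4 hours but not the stretch target of under 2 hours.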

13 Review cadence

Weekly: leading metrics (adoption, activation, usage, latency, false-positive rate) in the product review.
Per campaign: calibration error and tag-resolution success.
Quarterly: outcome/lagging metrics with Leadership; revisit targets for metrics that now have a baseline.

Confirm owners and replace proposed targets / [set baseline] placeholders with agreed values at the first review.