8  Metrics & Measurement Plan

Product success metrics for the Kav AI Platform

Published

June 22, 2026

9 Purpose

The PRD defines what we build and the pilot framework (PRD Appendix G) defines how a pilot is judged. This document defines how we know shipped product capabilities are working in use — adoption, engagement, quality, and outcomes — after they ship.

It is deliberately separate from the PRD so the PRD stays stable. Metrics reference capabilities / FR IDs rather than restating requirements.

10 Principles

  • Leading vs lagging. Leading metrics move in days–weeks (adoption, activation, usage, errors); lagging metrics move in weeks–months (retention, expansion, outcome impact).
  • One owner per metric. Every metric names an accountable owner and a single measurement method.
  • Targets are explicit. Each metric carries a success threshold and a stretch target with a measurement window. Where no baseline exists yet, the target is marked [set baseline] and filled after first measurement.
  • Outcomes over outputs. Prefer “time-to-first-finding” over “reports generated.”

11 Instrumentation

Primary source: per-workspace usage telemetry already emitted by the web app (anonymous, opt-out via configuration). Supplement with: backend query logs (latency, grounding outcomes), pilot success-criteria results (PRD Appendix G), and operator confirmations/dismissals from the triage workflow.

12 Metric set

Targets below are proposals to confirm with Product/Leadership, not decided values. Replace [set baseline] once first data is in.

12.1 Adoption (leading)

Metric Definition Success / Stretch Method Window Owner
Capability adoption rate % of eligible workspaces that use a capability ≥ once [set baseline] Telemetry, per FR/capability 30 days post-ship Product
Active integrity engineers Distinct IE users active per week [set baseline] Telemetry Weekly Product

12.2 Activation (leading)

Metric Definition Success / Stretch Method Window Owner
Time to first finding Time from data ingestion to first surfaced anomaly ≤ 48 h / ≤ 24 h Backend timestamps (aligns with pilot SC-4) Per onboarding Product + Eng
First-week core action % of new workspaces that complete a core action (confirm a finding / run a query / export a report) in week 1 [set baseline] Telemetry 7 days Product

12.3 Engagement & usage (leading)

Metric Definition Success / Stretch Method Window Owner
Contextual-chat usefulness % of NL queries rated “useful” or better ≥ 70% / ≥ 85% In-product rating (aligns with pilot SC-3) Monthly Product
Query response latency (P95) P95 NL query response time ≤ 3 s / ≤ 2 s Backend logs (PRD NFR) Continuous Eng
Report export usage Reports exported per active workspace / month [set baseline] Telemetry Monthly Product

12.4 Quality & reliability (leading)

Metric Definition Success / Stretch Method Window Owner
False-positive rate (operator-assessed) % of flagged anomalies the IE marks false positive ≤ 20% / ≤ 10% Triage decisions (aligns with pilot SC-2) Monthly Eng + IE
Confidence calibration error Gap between stated confidence and empirical accuracy per bucket [set baseline] Calibration curve (PRD AI-safety FR-AI-02) Per campaign Eng
Tag-resolution success % of CAD/operator tags auto-resolved by dual-tagging without manual mapping [set baseline] Pipeline logs (FR-CAD-07) Per ingest Eng

12.5 Outcomes (lagging)

Metric Definition Success / Stretch Method Window Owner
Triage-to-work-order cycle Median time from detection to CMMS work order < 4 h / < 2 h Workflow timestamps (PRD Operations target) Quarterly Product
Pilot → production conversion % of pilots that proceed to a production contract [set baseline] Commercial tracking Per cohort Leadership
Logo retention % of production customers retained [set baseline] Commercial tracking Annual Leadership

13 Review cadence

  • Weekly: leading metrics (adoption, activation, usage, latency, FPR) in the product review.
  • Per campaign: calibration error and tag-resolution success.
  • Quarterly: outcome/lagging metrics with Leadership; reset targets that have a baseline.

Confirm owners and replace proposed targets / [set baseline] placeholders with agreed values at the first review.