8 Metrics & Measurement Plan
Product success metrics for the Kav AI Platform
9 Purpose
The PRD defines what we build and the pilot framework (PRD Appendix G) defines how a pilot is judged. This document defines how we know shipped product capabilities are working in use — adoption, engagement, quality, and outcomes — after they ship.
It is deliberately separate from the PRD so the PRD stays stable. Metrics reference capabilities / FR IDs rather than restating requirements.
10 Principles
- Leading vs lagging. Leading metrics move in days–weeks (adoption, activation, usage, errors); lagging metrics move in weeks–months (retention, expansion, outcome impact).
- One owner per metric. Every metric names an accountable owner and a single measurement method.
- Targets are explicit. Each metric carries a success threshold and a stretch target with a measurement window. Where no baseline exists yet, the target is marked
[set baseline]and filled after first measurement. - Outcomes over outputs. Prefer “time-to-first-finding” over “reports generated.”
11 Instrumentation
Primary source: per-workspace usage telemetry already emitted by the web app (anonymous, opt-out via configuration). Supplement with: backend query logs (latency, grounding outcomes), pilot success-criteria results (PRD Appendix G), and operator confirmations/dismissals from the triage workflow.
12 Metric set
Targets below are proposals to confirm with Product/Leadership, not decided values. Replace
[set baseline]once first data is in.
12.1 Adoption (leading)
| Metric | Definition | Success / Stretch | Method | Window | Owner |
|---|---|---|---|---|---|
| Capability adoption rate | % of eligible workspaces that use a capability ≥ once | [set baseline] |
Telemetry, per FR/capability | 30 days post-ship | Product |
| Active integrity engineers | Distinct IE users active per week | [set baseline] |
Telemetry | Weekly | Product |
12.2 Activation (leading)
| Metric | Definition | Success / Stretch | Method | Window | Owner |
|---|---|---|---|---|---|
| Time to first finding | Time from data ingestion to first surfaced anomaly | ≤ 48 h / ≤ 24 h | Backend timestamps (aligns with pilot SC-4) | Per onboarding | Product + Eng |
| First-week core action | % of new workspaces that complete a core action (confirm a finding / run a query / export a report) in week 1 | [set baseline] |
Telemetry | 7 days | Product |
12.3 Engagement & usage (leading)
| Metric | Definition | Success / Stretch | Method | Window | Owner |
|---|---|---|---|---|---|
| Contextual-chat usefulness | % of NL queries rated “useful” or better | ≥ 70% / ≥ 85% | In-product rating (aligns with pilot SC-3) | Monthly | Product |
| Query response latency (P95) | P95 NL query response time | ≤ 3 s / ≤ 2 s | Backend logs (PRD NFR) | Continuous | Eng |
| Report export usage | Reports exported per active workspace / month | [set baseline] |
Telemetry | Monthly | Product |
12.4 Quality & reliability (leading)
| Metric | Definition | Success / Stretch | Method | Window | Owner |
|---|---|---|---|---|---|
| False-positive rate (operator-assessed) | % of flagged anomalies the IE marks false positive | ≤ 20% / ≤ 10% | Triage decisions (aligns with pilot SC-2) | Monthly | Eng + IE |
| Confidence calibration error | Gap between stated confidence and empirical accuracy per bucket | [set baseline] |
Calibration curve (PRD AI-safety FR-AI-02) | Per campaign | Eng |
| Tag-resolution success | % of CAD/operator tags auto-resolved by dual-tagging without manual mapping | [set baseline] |
Pipeline logs (FR-CAD-07) | Per ingest | Eng |
12.5 Outcomes (lagging)
| Metric | Definition | Success / Stretch | Method | Window | Owner |
|---|---|---|---|---|---|
| Triage-to-work-order cycle | Median time from detection to CMMS work order | < 4 h / < 2 h | Workflow timestamps (PRD Operations target) | Quarterly | Product |
| Pilot → production conversion | % of pilots that proceed to a production contract | [set baseline] |
Commercial tracking | Per cohort | Leadership |
| Logo retention | % of production customers retained | [set baseline] |
Commercial tracking | Annual | Leadership |
13 Review cadence
- Weekly: leading metrics (adoption, activation, usage, latency, FPR) in the product review.
- Per campaign: calibration error and tag-resolution success.
- Quarterly: outcome/lagging metrics with Leadership; reset targets that have a baseline.
Confirm owners and replace proposed targets / [set baseline] placeholders with agreed values at the first review.