Kav AI Platform · Quarterly planning

Q3 2026 objectives: from PRD requirements to sized, testable goals.

Working session — start from the PRD, make each requirement precise, agree on the questions KAP must answer by end of quarter, and commit to goals with owners and timelines.

CEO · CTO · Product · Engineering · June 11, 2026

How we'll run this session

Agenda

Requirements → clarity → questions → goals.

Where we stand

M0–M3 scoreboard and the PRD requirements as written.

Make requirements precise

Scope boundaries, non-goals, and the CAD / P&ID scoping decision.

Questions KAP must answer

A requirement is clear when it is a question KAP can demonstrably answer. The Q3 question set doubles as the acceptance suite.

Sized goals & timeline

M4 feature sets with sizes, owners, weeks 23–35, and commercial alignment.

Where we stand

Delivered to date

M3 closes this month. Q3 is the integration quarter.

M0 · DELIVERED

MVP v0.1 — Web application. Organizations, datasets, image workflows, dashboards, cloud-backed inspection data.

M1 · DELIVERED

MVP v0.2 — First agentic layer. CrewAI-backed Supabase/MCP reasoning over inspection data.

M2 · DELIVERED

MVP v0.3 — Data chat reliability. Retrieval, contextual chat, failure recovery, 3D CAD alignment with Gaussian splats, test coverage.

M3 · CLOSING JUN

Decision Impact — PRD workflow. Triage, engineer decision, audit trail, structured IDMS/CMMS handoff. Gate: quantified prioritization demo + persona sign-off + ROI calculator.

M4 · Q3 TARGET

Q3 Integration (Sep 2026) — multi-sensor automated detection, CAD/P&ID context (pulled forward), geo-tagged spatial anchoring, contextual chat phase 1. SCADA/OPC UA moves to Q4. Today's subject.

M5 · Q4 TARGET

Engineering Context & Enterprise (Q4 2026) — SCADA/OPC UA connectors, 3D CAD overlay, P&ID SQL, SOC 2 Type II, compliance, air-gapped deployment.

H1 2027 · HORIZON

Autonomous patrol & regulated sites (directional) — fixed-infrastructure robotics (beacons, comm backbone, coverage orchestration, fleet analytics), full spatial navigation, and dose-aware inspection for high-hazard sites.

PRD recap

What the integrity engineer needs — as written in the PRD

From inspection data to a defensible decision.

HAVE · EXTEND

Multi-sensor anomaly analysis. Detect and classify defects across thermal, OGI, and gas imagery.

NEW

Engineering context — CAD + asset metadata. As-built / as-designed models and equipment metadata alongside the inspection.

NEW

P&ID ingestion. Connect each finding to its process line, tag, and instrumentation.

ROADMAP Q4

SCADA & live telemetry. Read-only OPC UA + historian against integrity operating windows (IOW). Moved to Q4: no pilot SCADA data yet. Q3 secures the access.

BUILD

Actionable, ranked insights. Risk-scored recommendations — what to inspect first, and why.

BUILD

Custom, audit-ready reports. Zero-config PDF / Word, cited to standards and reproducible.

Delivery architecture

The integrity engineer's assistant

Four living stores: files, code, skills, memory.

Each engagement deposits into all four — the data flywheel. Q3 objectives below map onto these stores.

Files

File Perception

Typed lenses — image, document (standards with citations), report, CAD, P&ID.

Code

Code & Execution

SQL today → sandboxed code for risk computation and reports. Code saved as the artifact of record.

Skills

Skill Repository

Governed markdown — human-curated, assistant proposes. Institutional knowledge.

Memory

Memory & Learning

Decisions, learned facts, preferences across sessions. Working knowledge — graduates into Skills.

Ground layer — the value layer: anomaly → P&ID tag → CAD asset → standard. Every Q3 question on the following slides exercises some segment of this chain.

Requirement clarification

From PRD language to engineering scope

Each requirement, with boundaries and non-goals.

PRD requirement	Clarified Q3 scope	Explicit non-goals (Q3)
SCADA & live telemetry	Moved to Q4 — no pilot SCADA data is available yet. Q3 limits itself to escalating pilot-partner OPC UA access and data-sharing approval, so Q4 starts warm. IOW stage 1–2 checks move with it.	No connector build, no IOW pipeline work in Q3. The read-only / no-writes-ever boundary is unchanged when the work lands in Q4.
Multi-sensor detection	Automated anomaly detection across thermal, OGI, gas imagery; cross-source correlation within the 2-meter spatial radius; corroboration tiers (1 / 2 / 3+ sources) feeding triage.	No new sensor modalities beyond the three supported. Critical-severity items always require engineer sign-off (v3.4 canonical rule).
Spatial anchoring	Geo-tagged images placed in the 3D scene; query findings by asset and radius; anchoring accuracy consistent with the ~2 cm fixed-mesh localisation target.	No full spatial navigation (deferred beyond Q3; the roadmap lists it under the H1 2027 horizon). No live robot-fleet integration.
Contextual chat phase 1	Compound queries across imagery, spatial and CAD/P&ID context, with HITL checkpoints; ranked, explained recommendations; report generation with standard citations and provenance bundle.	No autonomous actioning — recommendations stop at the engineer decision gate. Telemetry joins the compound scope in Q4.
CAD + P&ID context	Pulled into Q3 (decided — see next slide): IFC ingestion (high priority per PRD) + P&ID tag-linkage, demonstrated on one real customer drawing set. Grounds the anomaly → tag → asset → standard chain.	No full P&ID SQL connector; no CAD overlay rendering in the viewer — those complete in Q4 on top of the Q3 ingestion.

Scope decision · made

Decided 2026-06-11 — driven by data availability

The swap: CAD & P&ID move up to Q3; SCADA & IOW move to Q4.

We do not have pilot SCADA data yet. Keeping the connector and IOW checks in Q3 would mean either building against simulated data or gating M4 on access we don't control. CAD/P&ID drawing sets are obtainable now, so the parsers come forward.

Into Q3 — CAD (IFC) + P&ID tag-linkage

IFC ingestion (high priority per PRD) and P&ID tag-linkage, proven end-to-end on one real customer drawing set — a committed objective, no longer a spike.

Why now: the Ground layer (anomaly → tag → asset → standard) gets real data early; the hardest format risk (IFC/RVT/DGN) is discovered this quarter, not next; visible progress on PRD "NEW" requirements.

To Q4 — SCADA connector + IOW checks

OPC UA / historian connector and IOW stage 1–2 checks start week 36, against real pilot data.

What stays in Q3: escalate pilot-partner OPC UA access and data-sharing approval now, so the Q4 build starts warm. The telemetry_iow question set keeps its place in the bank — tagged Q4.

Requirements as questions

The clarity instrument

A requirement is clear when it's a question KAP can answer.

We already validate the AI server against a curated question bank — 20 questions, all answerable with Q1 capabilities. Q3 extends the bank: every objective ships with the new questions it unlocks.

One artifact, three audiences: a demo script for the CEO, acceptance criteria for product, and a system-level test suite for engineering (question_bank.json, tag Q3).

Bank today (tag: Q1)

20 questions across 6 categories:

dataset_discovery · image_retrieval · data_analysis · anomaly_detection · cross_modal · out_of_scope

"How many thermal anomalies are there?"

"Compare the RGB and thermal images for capture cap-123."

"Why is Valve-22 overheating?" (out-of-scope guard)

Each entry carries expected tools, response type, and validation rules — machine-checkable.
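
A minimal sketch of what one machine-checkable entry could look like, and how a harness might check a response against it. The field names (`expected_tools`, `response_type`, `validation_rules`, `tag`), the tool name `query_anomalies`, and the `must_contain` rule kind are assumptions for illustration; the actual schema is whatever question_bank.json defines.

```python
from dataclasses import dataclass, field


@dataclass
class BankEntry:
    """One question-bank entry (hypothetical field names)."""
    question: str
    category: str
    tag: str                      # e.g. "Q1", "Q3", "Q4"
    expected_tools: list[str]
    response_type: str            # e.g. "count", "comparison", "refusal"
    validation_rules: list[dict] = field(default_factory=list)


def check_response(entry: BankEntry, tools_called: list[str],
                   response_type: str, text: str) -> list[str]:
    """Return failure messages; an empty list means the entry passes."""
    failures = []
    missing = [t for t in entry.expected_tools if t not in tools_called]
    if missing:
        failures.append(f"missing tools: {missing}")
    if response_type != entry.response_type:
        failures.append(
            f"response type {response_type!r} != {entry.response_type!r}")
    for rule in entry.validation_rules:
        # Only one illustrative rule kind is handled in this sketch.
        if rule["kind"] == "must_contain" and rule["value"].lower() not in text.lower():
            failures.append(f"text must contain {rule['value']!r}")
    return failures


entry = BankEntry(
    question="How many thermal anomalies are there?",
    category="anomaly_detection",
    tag="Q1",
    expected_tools=["query_anomalies"],   # hypothetical tool name
    response_type="count",
    validation_rules=[{"kind": "must_contain", "value": "thermal"}],
)
```

Returning a list of failures rather than a boolean keeps the same check usable as a demo script annotation, an acceptance report, and a CI assertion.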

New questions · detection & CAD/P&ID

By end of Q3, KAP answers these — live, on pilot data

Question set 1: multi-sensor detection & CAD/P&ID context.

CAD / P&ID context · unlocked by F-CAD-01

"Show every finding on line 304-L-1021 with its P&ID tag and design specification."

"Which equipment tags in the ingested P&ID have no linked inspection coverage at all?"

"For anomaly A-447, which model object, P&ID tag, and design spec does it ground to?"

Proof bar: all three answered on one real customer drawing set — ingested via IFC, not hand-mapped.
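
The three CAD / P&ID questions above reduce to lookups and a set difference over the grounding chain. The sketch below assumes simple in-memory records: the record shapes, the tags `304-E-101` and `304-PV-0007`, the spec placeholders, and the model object IDs are all hypothetical. Only anomaly A-447 and line 304-L-1021 come from the questions. It also uses linked findings as a stand-in for inspection coverage, which the real data model may track separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PidTag:
    tag: str
    line: str
    design_spec: str       # placeholder text, not a real specification
    model_object: str      # ID of the linked IFC object (hypothetical)


@dataclass(frozen=True)
class Finding:
    anomaly_id: str
    tag: Optional[str]     # None until the finding is linked to a P&ID tag


# Hypothetical ingested P&ID tags and findings.
TAGS = {
    "304-E-101": PidTag("304-E-101", "304-L-1021", "spec-A", "ifc-obj-1"),
    "304-PV-0007": PidTag("304-PV-0007", "304-L-1021", "spec-B", "ifc-obj-2"),
}
FINDINGS = [Finding("A-447", "304-E-101"), Finding("A-448", None)]


def ground(anomaly_id: str) -> Optional[PidTag]:
    """Resolve an anomaly to its P&ID tag record, or None if unlinked."""
    for f in FINDINGS:
        if f.anomaly_id == anomaly_id and f.tag is not None:
            return TAGS.get(f.tag)
    return None


def findings_on_line(line: str) -> list[tuple[str, PidTag]]:
    """Findings whose linked tag sits on the given process line."""
    return [(f.anomaly_id, TAGS[f.tag]) for f in FINDINGS
            if f.tag in TAGS and TAGS[f.tag].line == line]


def uncovered_tags() -> set[str]:
    """Tags in the ingested P&ID with no linked finding (a set difference)."""
    covered = {f.tag for f in FINDINGS if f.tag is not None}
    return set(TAGS) - covered
```

An unlinked finding such as the hypothetical A-448 resolves to `None` rather than a guess, which is the behaviour the "ingested via IFC, not hand-mapped" proof bar would need to expose.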

Multi-sensor detection · unlocked by F-AI-10

"Across last week's drone campaign, which assets have anomalies corroborated by two or more independent sources?"

"Flag every thermal anomaly that coincides with an OGI gas detection within the 2-meter correlation radius."

"Which single-source detections were later confirmed — and which turned out to be environmental noise?"

Guardrail: critical-severity findings must surface "engineer sign-off required" — never auto-promoted.
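
A minimal sketch of the corroboration tiers and the sign-off guardrail, assuming detections carry positions in a shared metric site frame and that the 2-meter radius is measured as straight-line distance. Both are assumptions, as are the `Detection` fields and the severity labels other than "critical".

```python
import math
from dataclasses import dataclass

CORRELATION_RADIUS_M = 2.0  # the 2-meter correlation radius


@dataclass(frozen=True)
class Detection:
    det_id: str
    source: str                            # "thermal", "ogi", or "gas"
    position: tuple[float, float, float]   # metres, shared site frame (assumed)
    severity: str                          # e.g. "low", "high", "critical"


def corroborating_sources(anchor: Detection, detections: list[Detection]) -> set[str]:
    """Distinct sources detected within the radius of the anchor, anchor included."""
    nearby = {d.source for d in detections
              if math.dist(anchor.position, d.position) <= CORRELATION_RADIUS_M}
    return nearby | {anchor.source}


def tier(n_sources: int) -> str:
    """Map a source count to the 1 / 2 / 3+ corroboration tiers."""
    return "3+" if n_sources >= 3 else str(n_sources)


def triage(anchor: Detection, detections: list[Detection]) -> dict:
    sources = corroborating_sources(anchor, detections)
    return {
        "det_id": anchor.det_id,
        "tier": tier(len(sources)),
        # Critical findings are never auto-promoted, whatever the tier.
        "engineer_sign_off_required": anchor.severity == "critical",
    }
```

The sign-off flag depends only on severity, not on the tier, so a well-corroborated critical finding still stops at the engineer.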

New questions · spatial, chat & deferred

By end of Q3, KAP answers these — live, on pilot data

Question set 2: spatial context, compound reasoning, reports.

Spatial & compound · F-APP-10 + F-AI-12

"Show every inspection image captured within 5 meters of Exchanger E-101, placed in the 3D scene."

"Rank the top 10 assets to inspect next quarter by risk — and explain each ranking with its evidence."

"For Valve-22: combine the thermal history, cross-source corroboration, design context, and prior engineer decisions into one assessment."

"Generate the audit-ready report for Campaign 3 — cited to standards, with the provenance bundle (code, inputs, citations)."

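One way to make the provenance bundle (code, inputs, citations) checkable is to hash a canonical form of it, so that regenerating a report from the same code and inputs yields the same digest. This is a sketch under assumptions: the `ProvenanceBundle` fields, the `campaign-3` identifier, and the placeholder query are hypothetical, and the source does not specify a hashing scheme.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceBundle:
    report_id: str
    code: str                    # the code that produced the report's numbers
    inputs: tuple[str, ...]      # IDs of the datasets or captures used
    citations: tuple[str, ...]   # standards cited in the report

    def digest(self) -> str:
        """SHA-256 over canonical JSON, so equal bundles hash equally."""
        canonical = json.dumps(asdict(self), sort_keys=True,
                               separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


bundle = ProvenanceBundle(
    report_id="campaign-3",                       # hypothetical identifier
    code="SELECT asset_id, risk FROM findings",   # placeholder query
    inputs=("cap-123",),
    citations=("API 584",),
)
print(bundle.digest())
```

A report_generation question could then assert that the digest attached to a report matches the digest recomputed from the stored bundle.
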
Deferred to Q4 — SCADA / IOW (tagged Q4 in the bank)

"Which IOW exceedances occurred on Unit 300 this month, and how severe were they by duration and intensity?"

"Show the pressure and temperature trend for tag TI-3041 over the last 72 hours against its API 584 operating window."

"Which assets spent the most cumulative time outside their integrity operating window this quarter?"

Guardrail: "Change the setpoint on TI-3041" → must refuse.

Written now, gated in Q4 — they need pilot SCADA data we don't have yet. They become the wk-36+ acceptance set.
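
Although the build is Q4 work, the questions are already precise enough to sketch what "duration and intensity" could mean in code. The example below is illustrative only: it assumes step interpolation between samples and a simple low/high window, and it does not encode how API 584 classifies severity. The `Sample` type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t_hours: float   # time since the start of the period
    value: float


def exceedance_summary(samples: list[Sample], low: float, high: float) -> dict:
    """Cumulative time outside [low, high] and the worst excursion.

    Each sample's value is assumed to hold until the next sample
    (step interpolation); the last sample contributes no duration.
    """
    hours_outside = 0.0
    worst = 0.0
    for current, nxt in zip(samples, samples[1:]):
        excursion = max(low - current.value, current.value - high, 0.0)
        if excursion > 0:
            hours_outside += nxt.t_hours - current.t_hours
            worst = max(worst, excursion)
    return {"hours_outside": hours_outside, "max_excursion": worst}
```

Ranking assets by `hours_outside` would answer the cumulative-time question; the setpoint guardrail is unaffected, since nothing here writes to the control system.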

Goals & sizing

M4 · Persistent Sensing & Engineering Context (Q3) · weeks 23–35

Four objectives plus Q4 prep, sized — sizes are proposals to pressure-test today.

ID	Objective	Size	Owner	Done means (question-bank gate)
F-CAD-01	CAD (IFC) ingestion + P&ID tag-linkage — pulled forward from Q4	L · 8 wk	TBD	cad_pid questions pass on one real customer drawing set, ingested via IFC.
F-AI-10	Multi-sensor automated anomaly detection + cross-source corroboration tiers	L · 10 wk	TBD	cross_source questions pass; critical-severity sign-off rule enforced end-to-end.
F-APP-10	Geo-tagged spatial anchoring & images in 3D	M · 6 wk	TBD	spatial_query questions pass; radius queries return anchored imagery in the viewer.
F-AI-12	Contextual chat phase 1 — compound queries, HITL, cited reports	L · 9 wk	TBD	Compound + report_generation questions pass with provenance bundle attached.
Q4-PREP	SCADA/IOW (was F-AI-11) + time-series dashboard (was F-APP-11) — moved to Q4: no pilot SCADA data yet	S · ongoing	TBD	Q3 deliverable is access, not code: pilot-partner OPC UA approval + data-sharing agreement signed, so Q4 starts warm.

Timeline & risk

Weeks 23–35 · Jun → Sep 2026

One gate mid-quarter. One long pole, named now.

WK 23–26

CAD (IFC) parser + P&ID tag-linkage schema start on a real customer drawing set · detection pipeline extension · spatial anchoring schema · OPC UA access escalation opens (Q4 prep).

WK 27–29

Cross-source corroboration tiers on campaign data · anomaly → tag grounding wired · mid-quarter gate (wk 29): cad_pid tag-linkage demo on the customer drawing set + cross_source questions pass.

WK 30–33

Chat phase 1 compound queries over imagery + CAD/P&ID context · HITL checkpoints · cited report generation · spatial queries in the viewer.

WK 34–35

M4 gate: full Q3 question set (cad_pid · cross_source · spatial_query · compound · report_generation) green in CI · MVP v0.4 cut · OPC UA access signed for Q4 · campaign 3 prep complete.

Long pole — customer drawing set. The cad_pid gate needs one real customer IFC + P&ID package. Mitigation: request it from the pilot partner this week. In parallel, escalate OPC UA access now — it is the long pole of Q4, and approval lead time is theirs, not ours.

Risk — parser unknowns. Three L-sized objectives in 13 weeks, and CAD/P&ID parsing carries the format risk (IFC quality varies by authoring tool). If slip appears at the wk-29 gate, the pre-agreed cut line is: F-CAD-01 reduces to IFC-only ingestion; P&ID linkage drops to schema + manually mapped tags on the demo set.

Commercial alignment & decisions

Why this quarter matters commercially

Q3 tech feeds the Q3 business gate: 10 LOIs, pilot campaign, SOC 2 readiness.

Alignment

CAD/P&ID grounding (anomaly → tag → spec) → the live demo that differentiates in every LOI conversation; SCADA + IOW joins it in Q4 once pilot data flows.

Multi-sensor detection + spatial anchoring → headline capability of the oil & gas pilot campaign (F-COMM-05).

Cited, reproducible reports + HITL → the audit posture that SOC 2 readiness (F-COMP-01) and enterprise buyers require.

Decisions we need today

1. ~~CAD/P&ID: Q3 or Q4?~~ Decided: CAD/P&ID into Q3; SCADA/IOW to Q4 (no pilot SCADA data yet).

2. Adopt the Q3 question set as the M4 acceptance gate (bank grows 20 → ~30; telemetry_iow tagged Q4)?

3. Confirm sizes and assign owners for the four objectives + Q4 prep.

4. Who owns the two pilot-partner asks this week: the CAD/P&ID drawing set (Q3 gate) and OPC UA access (Q4 start)?

Backup · question tagging schema

Backup · proposal

Tag every Q3 question on two axes: capability × conversation shape.

Axis 1 — capability (what KAP must do)

Q1 bank: dataset_discovery · image_retrieval · data_analysis · anomaly_detection · cross_modal · out_of_scope

New for Q3: cad_pid · cross_source · spatial_query · compound · report_generation

Written now, gated Q4: telemetry_iow — awaits pilot SCADA data.

Source: question_bank.json categories + the two Q3 question-set slides.

Axis 2 — conversation shape (how the user behaves)

happy_path — clear ask, natural follow-ups; tests context retention.

vague_input — imprecise phrasing; must clarify, not guess.

scope_expansion — narrow start, mid-conversation pivots.

adversarial — pushback, impatience; tests graceful robustness.

Source: Q2 user-simulation categories, Testing Handbook (KavApps).

Single-turn questions keep validation rules as today; multi-turn shapes (vague_input · scope_expansion · adversarial) become user-simulation scenarios with rubric metrics — both tagged Q3 in the same bank.
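
The two-axis proposal can be sketched as a small data model. The capability and shape names come from this slide; the class names, the `quarter` field, and the `m4_gate_set` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Shape(Enum):
    HAPPY_PATH = "happy_path"
    VAGUE_INPUT = "vague_input"
    SCOPE_EXPANSION = "scope_expansion"
    ADVERSARIAL = "adversarial"


Q3_CAPABILITIES = {"cad_pid", "cross_source", "spatial_query",
                   "compound", "report_generation"}
# Shapes the slide routes to user-simulation scenarios.
MULTI_TURN = {Shape.VAGUE_INPUT, Shape.SCOPE_EXPANSION, Shape.ADVERSARIAL}


@dataclass(frozen=True)
class TaggedQuestion:
    text: str
    capability: str
    shape: Shape
    quarter: str = "Q3"

    def evaluation_mode(self) -> str:
        """Multi-turn shapes run as simulations; the rest keep validation rules."""
        return "user_simulation" if self.shape in MULTI_TURN else "validation_rules"


def m4_gate_set(bank: list[TaggedQuestion]) -> list[TaggedQuestion]:
    """Questions counting toward the M4 gate: Q3-tagged, Q3 capabilities only."""
    return [q for q in bank
            if q.quarter == "Q3" and q.capability in Q3_CAPABILITIES]
```

Keeping telemetry_iow questions in the same bank with `quarter="Q4"` means they are filtered out of the M4 gate without being deleted.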

Backup · Q2 2026 test review

Backup · where agent testing stands after Q2

Q2 baseline: single-turn is production-ready; multi-turn is the gap Q3 inherits.

Evaluation area	Q2 signal	Release interpretation
Single-turn agent behavior	Strong classifier, executor, reporter performance on Q1–Q5 cases	Production-ready
Deterministic integration	69 deterministic integration tests at 100%	Production regression gate
Live chained evaluation	24 / 25 live chained tests passing	Production-ready, expected LLM variance
Failure recovery	16 passed · 5 skipped · 0 failed (DEBT-002)	Phase 3 done; chaos tests deferred — explicit debt, not silent passes
Multi-turn evaluation	Context carryover and report structure degrade across turns	Main improvement area → Q3

Q3 implication: the multi-turn gap is exactly where the Q3 question sets live — compound queries, scope expansion, cited reports. The Q2 baseline tells us to gate on deterministic metrics and treat LLM-judge scores as signals requiring trace inspection.

Backup · Q2 2026 test review

Backup · what Q2 built that Q3 reuses

Q2 shipped the evaluation machine: two loops, a metric stack, and CI gates.

Two-loop model + metric stack

Inner loop (dev, offline): ADK Web UI golden datasets → adk eval → pytest AgentEvaluator in CI.

Outer loop (prod, scaled): Vertex AI Gen AI Eval — adaptive rubrics, pairwise A/B, weekly drift checks.

Default metrics:

tool_trajectory_avg_score · IN_ORDER · ≥0.8
hallucinations_v1 · ≥0.5
final_response_match_v2 · ≥0.5

CI gates + user simulation

Gates: unit tests + trajectory regressions block PRs; response-quality metrics warn; safety blocks always; user simulation runs nightly, never gates PRs.

User simulation (goal-oriented, LLM-driven): scenarios across the four conversation shapes — happy_path · vague_input · scope_expansion · adversarial.
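
The gate policy and default thresholds above can be sketched as a single verdict function. This is not the existing CI configuration: the function name is hypothetical, treating `hallucinations_v1` and `final_response_match_v2` as the warn-only response-quality metrics is an assumption, and so is treating a missing score as a failure. User simulation is left out because it runs nightly and never gates PRs.

```python
THRESHOLDS = {
    "tool_trajectory_avg_score": 0.8,
    "hallucinations_v1": 0.5,
    "final_response_match_v2": 0.5,
}

# How a failing check affects a pull request (assumed mapping).
BLOCKING = {"unit_tests", "tool_trajectory_avg_score", "safety"}
WARNING = {"hallucinations_v1", "final_response_match_v2"}


def pr_verdict(unit_tests_pass: bool, safety_pass: bool,
               scores: dict[str, float]) -> dict:
    """Return whether the PR is blocked, plus any non-blocking warnings."""
    failed = {name for name, minimum in THRESHOLDS.items()
              if scores.get(name, 0.0) < minimum}
    if not unit_tests_pass:
        failed.add("unit_tests")
    if not safety_pass:
        failed.add("safety")
    return {
        "blocked": bool(failed & BLOCKING),
        "blocking_failures": sorted(failed & BLOCKING),
        "warnings": sorted(failed & WARNING),
    }
```

Separating blocking failures from warnings mirrors the Q2 guidance to gate on deterministic metrics and treat LLM-judged scores as signals for trace inspection.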

Q3 reuses all of this as-is — the new question sets drop into the existing harness as Q3-tagged evalsets.