Database & Cloud Storage Handbook
Supabase schema, multi-cloud storage, migrations, and data types in KAP
Overview
KAP uses Supabase (managed Postgres, Auth, Storage, Realtime) as the primary database and thumbnail store, paired with Azure Blob Storage for high-resolution asset storage.
This handbook summarizes how the database and cloud storage fit the product, where migrations live, how TypeScript types stay aligned, and the multi-cloud storage architecture of KAP.
Product data types (conceptual)
Day-to-day features are organized around three user-visible data types; table names in Postgres may differ until a future rename.
| Concept | Primary table / API today |
|---|---|
| Datasets | datasets (campaigns are dataset rows under an organization) |
| Notes | location_notes (REST and routes under /api/.../notes) |
| Reports | pdf_reports (PDF URLs plus KRML fields: report_id, rendered, metadata, optional template_name) |
Work-order-style workflows use notes with structured metadata (for example kind: "work_order") rather than a separate work_orders table unless reporting needs justify one later.
KRML — shared report format (not engine-specific)
KRML (Kav AI Report Markup Language) is a platform-level convention: structured report sections, rendered Markdown (and optional PDF binaries via url), and JSON metadata stored on pdf_reports. Any part of the stack may produce or consume KRML-shaped rows—web exports, batch jobs, AG-UI chat report pipelines, or agent runtimes—as long as they honor the same table and metadata shapes (e.g. kind: "krml_report", layout templates with kind: "krml_layout_template").
The Hermes agent runtime currently ships the reference krml_* tool implementations (krml_render_report, krml_save_report, etc.) under ai/src/kavai/systems/hermes/. That is one integration path, not the definition of KRML. New engines or services should reuse pdf_reports and the same field semantics rather than inventing parallel report tables.
Schema changes (migrations)
- Location:
supabase/migrations/at the repository root. - Apply: Use your team's standard path (Supabase CLI
db push, Dashboard SQL, or CI) sosupabase_migrations.schema_migrationsstays in sync with the repo. - Idempotency: Prefer
IF NOT EXISTS/DROP IF EXISTSpatterns where replays or branches are common.
New tables and columns should follow existing RLS patterns: gate access through organization membership (organization_members + datasets.organization_id) unless the row is intentionally public.
TypeScript types
- Generated file:
web/lib/database.generated.ts(and companions if your project splits types). - Refresh: From
web/, runnpm run db:typesafter migrations are applied (requires Supabase CLI and project linkage as documented in the web package).
Do not hand-edit generated tables for long; regenerate after DDL changes so inserts and selects stay type-safe.
pdf_reports column semantics (KRML)
Persisted reports and templates live in pdf_reports (legacy hermes_* tables were removed in favor of this single store):
- Report rows:
report_id,name(title),rendered(Markdown),status,facility,metadata(typically includeskind: "krml_report"). - Layout templates: rows with
template_nameset andmetadata.kind: "krml_layout_template"(section_order,section_config).
Hermes registers LLM-callable tools named krml_* (krml_save_report, …) implemented in ai/src/kavai/systems/hermes/tools/krml_renderer.py and skills/krml_composition.py. Web or other services should use the same columns and metadata conventions when writing report rows so all consumers stay interoperable.
Cloud Storage Architecture
KAP employs a hybrid, multi-cloud storage architecture designed to optimize for heavy asset sizes (such as high-resolution RGB and thermal imagery) while ensuring rapid loading of thumbnails and documents.
Azure Blob Storage (Primary Asset Store)
To keep database-associated storage lightweight and cost-effective, raw high-fidelity assets are housed on Azure Blob Storage.
- RGB and Thermal Images: Stored in dedicated Azure Blob containers (e.g.,
raw-rgbandraw-thermal). This allows high-throughput pipeline access for the Orion and Argus ML runtimes. - Access Pattern: The backend generates short-lived, secure Shared Access Signature (SAS) tokens to serve these files directly to authorized front-end clients, preventing public exposure of sensitive industrial datasets.
Azure URL and Token Generation Mechanism
When a client requests a full-size image, the KAP server performs dynamic Shared Access Signature (SAS) token generation rather than storing public URLs in the database.
-
Resolution of Credentials:
- The dataset record is fetched containing the encrypted/stored access key under
dataset.credentials(accountNameandaccountKey). - The target storage configuration is derived from
dataset.storage_path, which is formatted asaccountName:containerName.
- The dataset record is fetched containing the encrypted/stored access key under
-
Blob Name Extraction:
- The full path to the image is retrieved from
image.storage_path(e.g.,container-name/DJI_20240807/DJI_20240807194032_0274_V.JPG). - The application strips the container prefix from the path to get the exact blob name relative to the container root:
const blobName = image.storage_path.split('/').slice(1).join('/');
- The full path to the image is retrieved from
-
SAS Query Parameter Construction:
- The server instantiates
StorageSharedKeyCredential(accountName, accountKey). - Read permissions are explicitly enabled:
const permissions = new BlobSASPermissions();
permissions.read = true; - An expiration timestamp is set (typically 60 minutes from the generation time).
- The
@azure/storage-bloblibrary compiles these parameters into a secure query string:const sasToken = generateBlobSASQueryParameters({
containerName,
blobName,
permissions,
expiresOn: expiryTime,
}, sharedKeyCredential).toString();
- The server instantiates
-
URL Compilation:
- The final signed URL is assembled and returned to the client as:
https://${accountName}.blob.core.windows.net/${containerName}/${blobName}?${sasToken}
- The final signed URL is assembled and returned to the client as:
This approach guarantees that raw, high-resolution imagery is never exposed to the public internet while minimizing database overhead by only storing the metadata path.
REST APIs for Azure Blob Storage
The Next.js backend exposes dedicated REST endpoints to handle container validation, file discovery, and SAS-signing:
/api/datasets/verify-storage(POST): Validates storage connection credentials (connectionStringoraccountName/accountKey) and container reachability./api/datasets/list-bucket(GET): Discovers and lists available blobs (images and videos) within a given container, supporting optional directoryprefixfiltering./api/datasets/onboarding(POST): Initiates crawler configuration to scan Azure containers and register raw imagery into the Postgres database./api/datasets/[slug]/image-url(POST): Generates a signed SAS URL to fetch a full-size image. This route performs active, cachedHEADchecks to detect cloud provider outages or subscription blockages (e.g.,403 Forbidden)./api/datasets/[slug]/sign-video(POST): Dynamically SAS-signs video streams (MP4s) for frontend browser playback./api/datasets/[slug]/index-images(POST): Triggers background worker execution to crawl the Azure container and index missing assets or update geodata.
Programmatic Asset Access in Backend Functions
Backend systems (Node.js web server and Python ML runtimes) access raw image data differently depending on the processing task.
Next.js (Node.js Backend)
Node.js servers utilize the standard @azure/storage-blob SDK wrapped in the unified CloudStorageService client (web/lib/cloud-storage.ts).
- Full Image Download: Downloads the complete binary blob into a local Buffer for processing or resizing:
const storageService = new CloudStorageService("azure", dataset.credentials);
const imageBuffer = await storageService.downloadBlob(containerName, blobName); - Fast Header/EXIF Parsing: To extract GPS and orientation tags without wasting network bandwidth on large images, the client retrieves only the first 1MB containing the header payload:
const headerBuffer = await storageService.downloadBlobHeader(containerName, blobName);
Python AI Runtimes (Argus & Orion)
Machine learning inference pipelines running in Python utilize two access patterns:
- Signed HTTP Fetch: The gateway fetches signed SAS URLs from the Next.js API layer and forwards them to the Python runtimes. Runtimes fetch the bytes over standard HTTP/HTTPS:
import requests
response = requests.get(signed_sas_url)
image_bytes = response.content - Direct SDK Integration: Since
azure-storage-blobis installed in thepixipython dependencies, offline batch runtimes can connect directly to Azure:from azure.storage.blob import BlobServiceClient
# Using Connection String or Shared Credentials
client = BlobServiceClient.from_connection_string(connection_string)
blob = client.get_blob_client(container=container, blob=blob_path)
# Read full bytes
image_bytes = blob.download_blob().readall()
Image UUID Resolution to Raw Azure Asset (Python)
If a Python backend function is initialized with only an image UUID, it must dynamically resolve the container name, cloud credentials, and exact blob path before fetching the file.
-
Required Parameters:
storage_pathof the target image (extracted from theimagestable).credentialsof the parent dataset (JSON containingconnectionStringoraccountName/accountKey).storage_pathof the parent dataset (extracted from thedatasetstable to determine the target container name).
-
Resolution and Download Steps:
import os
from supabase import create_client
from azure.storage.blob import BlobServiceClient
# A. Initialize Supabase Client (bypassing RLS with service role key for system operations)
supabase = create_client(
supabase_url=os.environ["SUPABASE_URL"],
supabase_key=os.environ["SUPABASE_SERVICE_ROLE_KEY"]
)
image_id = "your-image-uuid"
# B. Query the Image Metadata
img_res = supabase.table("images").select("storage_path, dataset_id").eq("id", image_id).single().execute()
image_data = img_res.data
raw_image_path = image_data["storage_path"] # e.g., "raw-rgb/DJI_0274_V.JPG"
dataset_id = image_data["dataset_id"]
# C. Query the Parent Dataset Storage and Credentials
ds_res = supabase.table("datasets").select("credentials, storage_path").eq("id", dataset_id).single().execute()
dataset_data = ds_res.data
credentials = dataset_data["credentials"] # e.g., {"accountName": "...", "accountKey": "..."}
storage_config = dataset_data["storage_path"] # e.g., "accountName:containerName"
# D. Parse Storage Target Details
# Extract the actual container name from the storage_path configuration
_, container_name = storage_config.split(":")
# Extract the relative blob path by stripping the container prefix from the image storage path
blob_name = "/".join(raw_image_path.split("/")[1:])
# E. Initialize Azure Blob Client and Fetch Content
if "connectionString" in credentials:
blob_service = BlobServiceClient.from_connection_string(credentials["connectionString"])
else:
from azure.storage.blob import StorageSharedKeyCredential
blob_service = BlobServiceClient(
account_url=f"https://{credentials['accountName']}.blob.core.windows.net",
credential=StorageSharedKeyCredential(credentials["accountName"], credentials["accountKey"])
)
blob_client = blob_service.get_blob_client(container=container_name, blob=blob_name)
image_bytes = blob_client.download_blob().readall()
Supabase Storage (Lightweight Assets & Reports)
Supabase Storage is utilized for highly interactive and frequently accessed metadata or generated assets.
- Thumbnails: Scaled-down versions of RGB and thermal images are stored directly in Supabase Storage buckets (e.g.,
thumbnails) to drive the high-performance UI galleries without fetching large MB-sized raw assets from Azure. - Generated Reports: Markdown and PDF binaries generated by KRML composition pipelines are stored in the
reportsbucket. - Access & Security: Access is restricted and gated via standard Row Level Security (RLS) policies based on user organization membership, mirroring database-level security.
Multi-Cloud Expansion (AWS S3 & Google Cloud Storage)
KAP is architected to support future multi-cloud enterprise deployments by abstracting the storage layer:
- Storage Provider Abstraction: A unified, provider-agnostic storage client interface allows seamless switching or concurrent use of Azure Blob Storage, AWS S3, and Google Cloud Storage (GCS).
- Interoperability: The
imagesandpdf_reportstables store relative URIs (e.g.,provider://bucket/path) rather than provider-specific hardcoded URLs, enabling the gateway runtime to resolve the correct signed URLs dynamically.
Related documentation
- Workspace and schema direction:
docs/proposals/20260418_workspace_architecture.md(Schema & Backend section). - Environment variables for DB access:
web/.env/.env.example(SUPABASE_*,POSTGRES_URL_NON_POOLING, etc.).
Quick reference (common tables)
The live project may include additional tables (Auth, Storage, Realtime, extensions). Application tables frequently touched by KAP include:
organizations,organization_members,profilesdatasets,images,data_groupslocation_notes,annotationspdf_reports,chat_sessions,messagesgas_readings,equipment,investigations(and related domain tables)
Use Supabase Studio or information_schema when you need an authoritative column list for your environment.