
Behavior Verification

This guide explains practical ways to verify conformance to:

  • /docs/standards/agent-behavior-standards
  • /docs/standards/audit-compliance-standards

It focuses on operational verification: checks, sampling, dashboards, and evidence packaging.

What you are verifying

At minimum, verify that you can:

  • reconstruct what an agent did (action ledger completeness)
  • attribute actions to an agent identity and workflow session
  • locate the rationale for significant actions
  • detect boundary-violating behavior early (scope, auth, tenancy, environments)
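
The checks in the layers below assume each action event carries enough structure to support those four goals. As a point of reference, a minimal illustrative event might look like the following; the field names are assumptions for this guide, not a schema mandated by the standards.

```python
# Illustrative action ledger event; field names and values are examples only.
example_event = {
    "agent_id": "deploy-bot-01",        # attribution: which agent acted
    "session_id": "wf-0001",            # correlation across a multi-step workflow
    "timestamp": "2024-01-01T00:00:00Z",
    "action_type": "deploy",
    "target": "service/payments",
    "environment": "prod",
    "tenant": "acme",
    "rationale_ref": "rat-8f3c2a",      # pointer to the rationale record, if significant
    "outcome": "success",
}
```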

Verification layers

Use multiple layers; each catches different failure modes.

Layer 1 — Schema validation for action logs

Goal: prevent “logs exist but are unusable”.

Practices:

  • validate presence and type of required fields
  • reject malformed or missing log events
  • enforce correlation/session IDs for multi-step workflows

Evidence:

  • sample log payloads (success + failure)
  • validation failure rates over time
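
A minimal sketch of this kind of check, reusing the illustrative field names above (a JSON Schema validator or similar tooling would work equally well):

```python
# Minimal schema check for action log events. Field names and types are
# illustrative assumptions; align them with your actual log schema.
REQUIRED_FIELDS = {
    "agent_id": str,
    "session_id": str,
    "timestamp": str,
    "action_type": str,
    "environment": str,
    "outcome": str,
}

def validate_event(event: dict) -> list[str]:
    """Return validation errors for one log event; an empty list means it passes."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors

# Reject or quarantine malformed events instead of dropping them silently,
# and track the validation failure rate over time.
print(validate_event({"agent_id": "deploy-bot-01"}))
# -> ['missing field: session_id', 'missing field: timestamp', ...]
```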

Layer 2 — Coverage checks (are you logging everything you think you are?)

Goal: avoid “silent paths” that bypass logging.

Practices:

  • enumerate governed agents and expected action types
  • compare the action types observed in logs against the expected set
  • flag missing event types (e.g., deploy, migration, secrets access)

Evidence:

  • Expected Actions Registry (per agent)
  • weekly coverage diff report
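
A sketch of a coverage diff, assuming an Expected Actions Registry keyed by agent_id and a stream of observed events; both structures and their contents are illustrative:

```python
from collections import defaultdict

# Expected Actions Registry: per agent, the action types you believe it performs.
EXPECTED_ACTIONS = {
    "deploy-bot-01": {"deploy", "rollback", "migration"},
    "support-agent-02": {"ticket_update", "refund"},
}

def coverage_diff(events: list[dict]) -> dict[str, dict[str, set]]:
    """Compare observed action types per agent against the expected registry."""
    observed = defaultdict(set)
    for event in events:
        observed[event["agent_id"]].add(event["action_type"])

    report = {}
    for agent_id, expected in EXPECTED_ACTIONS.items():
        seen = observed.get(agent_id, set())
        report[agent_id] = {
            "missing": expected - seen,     # expected but never logged: possible silent paths
            "unexpected": seen - expected,  # logged but not registered: update registry or investigate
        }
    return report
```

Running this weekly and diffing the output against the previous run gives you the coverage report listed above.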

Layer 3 — Sampling-based review of significant decisions

Goal: ensure rationale is present and useful.

Practices:

  • define what counts as “significant” (align with ABR-002 in standards)
  • review a weekly sample of significant actions:
    • does the rationale exist?
    • does it describe the context, decision criteria, and alternatives considered (where applicable)?
    • are assumptions recorded?

Evidence:

  • sampling checklist
  • annotated sample set (redacted if necessary)
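
A sketch of a weekly sampling pass, assuming events carry a significance flag (aligned with your ABR-002 definition) and rationale records are stored separately; the rationale shape shown is an assumption:

```python
import random

def sample_significant(events: list[dict], rationales: dict[str, dict], k: int = 20) -> list[dict]:
    """Draw a sample of significant actions and flag missing or incomplete rationale records."""
    significant = [e for e in events if e.get("significant")]  # align flag with your ABR-002 definition
    findings = []
    for event in random.sample(significant, min(k, len(significant))):
        rationale = rationales.get(event.get("rationale_ref", ""))
        findings.append({
            "event": event.get("session_id"),
            "has_rationale": rationale is not None,
            "has_context": bool(rationale and rationale.get("context")),
            "has_criteria": bool(rationale and rationale.get("criteria")),
            "has_alternatives": bool(rationale and rationale.get("alternatives")),
            "has_assumptions": bool(rationale and rationale.get("assumptions")),
        })
    return findings
```

The structured findings double as the annotated sample set, once redacted where necessary.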

Layer 4 — Boundary monitoring (scope, auth, tenancy, environments)

Goal: detect high-severity failures early.

Practices:

  • monitor for out-of-scope resource attempts
  • detect environment boundary violations (e.g., production actions initiated from non-production contexts)
  • add targeted tests for tenancy and authorization invariants

Evidence:

  • alert definitions and sample alerts
  • invariant test results
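
One way to express these invariants as a targeted test or alert rule: assert that no logged action touches an environment or tenant outside the agent's declared scope. The scope registry below is an assumption about how scopes are recorded:

```python
# Declared scopes per agent; illustrative contents.
AGENT_SCOPES = {
    "deploy-bot-01": {"environments": {"staging", "prod"}, "tenants": {"acme"}},
    "support-agent-02": {"environments": {"prod"}, "tenants": {"acme", "globex"}},
}

def boundary_violations(events: list[dict]) -> list[dict]:
    """Return events that touch an environment or tenant outside the agent's declared scope."""
    violations = []
    for event in events:
        scope = AGENT_SCOPES.get(event["agent_id"])
        if scope is None:
            violations.append({**event, "violation": "unknown agent"})
        elif event.get("environment") not in scope["environments"]:
            violations.append({**event, "violation": "environment out of scope"})
        elif event.get("tenant") not in scope["tenants"]:
            violations.append({**event, "violation": "tenant out of scope"})
    return violations

# Run the same check in two places: against live logs for alerting (early detection),
# and against recorded fixtures in CI as an invariant test.
```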

Layer 5 — Replay drills (can you reconstruct and explain an incident?)

Goal: confirm auditability under pressure.

Practices:

  • run quarterly replay drills using a selected workflow:
    • choose a “significant action”
    • reconstruct the full sequence from correlation IDs
    • identify decision points and approvals/overrides (if any)

Evidence:

  • replay drill notes
  • gaps found + corrective actions
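
A sketch of the reconstruction step of a drill: filter events by correlation/session ID, order them, and pull out the steps a reviewer walks through. Field names remain the illustrative ones used earlier:

```python
def reconstruct_workflow(events: list[dict], session_id: str) -> list[dict]:
    """Rebuild the ordered action sequence for one workflow session."""
    return sorted(
        (e for e in events if e.get("session_id") == session_id),
        key=lambda e: e["timestamp"],
    )

def decision_points(sequence: list[dict]) -> list[dict]:
    """Extract the steps a drill reviewer examines: significant actions, approvals, overrides."""
    return [
        e for e in sequence
        if e.get("significant") or e.get("action_type") in {"approval", "override"}
    ]
```

If the sequence cannot be rebuilt from correlation IDs alone, record that as a gap with a corrective action.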

Minimal verification checklist

  • Each governed agent has a defined identity (agent_id) and owner
  • Logs include correlation/session IDs for workflows
  • Significant actions include rationale records
  • Coverage report exists (expected vs observed action types)
  • Boundary violations are detectable (tests and/or alerts)
  • At least one replay drill has been completed and documented

Evidence packaging (what to keep ready)

Maintain a small evidence bundle you can hand to internal reviewers:

  • Agent Inventory (agents, scopes, environments)
  • Log Schema (field definitions)
  • Representative Logs
    • normal workflow sample
    • failure workflow sample
    • significant change sample with rationale
  • Coverage Report (recent)
  • Replay Drill record (most recent)

If you need strict consistency across teams, consider standardizing a recommended folder structure for the evidence bundle.