Engineering Evidence

Every claim on this site is backed by engineering evidence.
Not synthetic benchmarks. Not marketing. Artifacts.

9 capabilities · real artifacts · reproducible commands · source-of-truth links

2,000 concurrent workers · 0% failures · 1 production runtime
9 capabilities · live state
P-009

Fleet Retrieval

2,000 concurrent workers · 0% failures

Validated against the production runtime. The benchmark intentionally exceeded expected production load by driving each retrieval surface at full concurrency.

Production · Measured · Verified
P-005

GPU Saturation

96.9% mean SM utilization

Measured on production workloads, not synthetic benchmarks. 22× the throughput of stock TensorRT-LLM.

Production · Measured · Verified
P-001

Recovery

8 deterministic recoveries

State survives the worker. The resume packet is regenerated from PostgreSQL on every stand-down.

Live · Reproducible · Production
P-003

Evidence Packets

8,438 generated packets per WO

Gates don't pass on assertions. They pass on files. Every claim is a path on disk.

Production · Measured
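
As an illustration of a gate that passes on files rather than assertions, here is a minimal sketch. It is not the AgentOS implementation: the function name `gate_passes`, the claim-to-path mapping, and the rule that an empty file counts as missing are all assumptions made for the example.

```python
from pathlib import Path


def gate_passes(claims: dict[str, str], root: Path) -> tuple[bool, list[str]]:
    """Check that every claim points at a non-empty file under `root`.

    `claims` maps a human-readable claim to a relative artifact path
    (hypothetical structure). Returns (passed, claims_without_evidence).
    """
    missing = []
    for claim, rel_path in claims.items():
        artifact = root / rel_path
        # A claim only counts if its artifact exists on disk and has content.
        if not artifact.is_file() or artifact.stat().st_size == 0:
            missing.append(claim)
    return (not missing, missing)
```

The gate never inspects what a worker says it did; it only asks whether the file is there, which is the property the card describes.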
P-004

Blast Radius

14 callers · 4 files · 2 modules

Before any code change ships, the graph tells you the exact blast radius. Sub-second, no LLM.

Live · Reproducible · Measured
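
One way to compute a blast radius without an LLM is a breadth-first walk over a reversed call graph. The sketch below assumes the graph is available as `(caller, callee)` pairs plus a lookup from function to `(file, module)`; those structures, the function name `blast_radius`, and the choice to count transitive rather than direct callers are assumptions for illustration, not the product's actual data model.

```python
from collections import deque


def blast_radius(target, calls, location):
    """Count transitive callers of `target` and the files/modules they live in.

    calls:    iterable of (caller, callee) pairs (hypothetical input shape)
    location: dict mapping function name -> (file, module)
    """
    # Reverse the edges so we can walk from a callee to its callers.
    callers_of = {}
    for caller, callee in calls:
        callers_of.setdefault(callee, set()).add(caller)

    seen = set()
    queue = deque([target])
    while queue:
        fn = queue.popleft()
        for caller in callers_of.get(fn, ()):
            # Skip already-visited callers and cycles back to the target.
            if caller not in seen and caller != target:
                seen.add(caller)
                queue.append(caller)

    files = {location[c][0] for c in seen}
    modules = {location[c][1] for c in seen}
    return {"callers": len(seen), "files": len(files), "modules": len(modules)}
```

The result has the same shape as the card's headline (callers, files, modules), and the traversal is plain set and queue work, which is why it can stay sub-second on a pre-built graph.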
P-006

Memory Grounding

Real graph traversal · ~1.2s p50

Ground decisions in code, not model recall. Every hit returns file path, line range, and score.

Live · Reproducible
P-002

Deterministic Resume

Identical SHA-256 across 8 regenerations

Same PostgreSQL state regenerates the same packet bytes. Determinism is a property of the bytes.

Reproducible · Verified
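
Byte-level determinism usually comes down to a canonical serialization. The sketch below is an assumption about how such a check could look, not the actual packet format: it stands in a plain dict for the PostgreSQL state, serializes it as JSON with sorted keys and fixed separators, and hashes the bytes with SHA-256. `build_packet` and `packet_digest` are hypothetical names.

```python
import hashlib
import json


def build_packet(state: dict) -> bytes:
    """Serialize state canonically so equal state always yields equal bytes."""
    # sort_keys and fixed separators remove ordering and whitespace variance.
    return json.dumps(
        state, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")


def packet_digest(state: dict) -> str:
    """SHA-256 hex digest of the canonical packet bytes."""
    return hashlib.sha256(build_packet(state)).hexdigest()


def is_deterministic(state: dict, regenerations: int = 8) -> bool:
    """Regenerate the packet several times and confirm a single unique digest."""
    digests = {packet_digest(state) for _ in range(regenerations)}
    return len(digests) == 1
```

Comparing digests rather than parsed content is the point: two packets that decode to the same data but differ in key order or whitespace would fail this check.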
P-008

Cost Attribution

Per WO · phase · role · model · action

Cost is a field on every task, not a line on an invoice. Sub-penny precision per call.

Reproducible
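
To make "cost is a field on every task" concrete, here is a small sketch under stated assumptions: the record type `TaskCost`, its field names, and the helper `total_by` are hypothetical, chosen to mirror the five dimensions listed on the card. `Decimal` is used so that sub-penny amounts add up exactly instead of drifting as floats would.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TaskCost:
    """One call's cost, tagged with every attribution dimension (hypothetical)."""
    wo: str
    phase: str
    role: str
    model: str
    action: str
    cost_usd: Decimal  # exact decimal value, e.g. Decimal("0.0004")


def total_by(records, dimension: str) -> dict:
    """Sum cost_usd grouped by one dimension, such as "phase" or "model"."""
    totals = defaultdict(Decimal)  # Decimal() is an exact zero
    for record in records:
        totals[getattr(record, dimension)] += record.cost_usd
    return dict(totals)
```

Because every record carries all five tags, the same list can be rolled up by WO, phase, role, model, or action without a second data source.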
P-007

Model Routing

Schema-locked dispatcher

Every call routed by required task_type and schema_id. No ungoverned dispatches.

Production · Verified
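
A schema-locked dispatcher can be sketched as a lookup that refuses any call missing `task_type` or `schema_id`, or carrying a pair that is not registered. Only those two field names come from the card; the route table, the model names, and the `UngovernedDispatch` exception are invented for this example.

```python
class UngovernedDispatch(ValueError):
    """Raised when a call cannot be tied to a registered task_type/schema_id."""


# Hypothetical route table: (task_type, schema_id) -> model name.
ROUTES = {
    ("summarize", "summary.v1"): "model-small",
    ("code_review", "review.v2"): "model-large",
}


def route(call: dict) -> str:
    """Return the model for a call, or raise if the call is ungoverned."""
    task_type = call.get("task_type")
    schema_id = call.get("schema_id")
    if not task_type or not schema_id:
        raise UngovernedDispatch("task_type and schema_id are both required")
    try:
        return ROUTES[(task_type, schema_id)]
    except KeyError:
        raise UngovernedDispatch(
            f"no route registered for ({task_type!r}, {schema_id!r})"
        ) from None
```

There is deliberately no default branch: a call that does not match a registered pair fails loudly instead of falling through to some catch-all model.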

Engineering evidence, in the open.

These nine capabilities are the foundation. More evidence entries land as the system grows. If you want to put your own work through AgentOS, become a design partner.