P-009 · Fleet Retrieval
Validated with 2,000 concurrent AI workers.
Production
Measured
Verified
The benchmark intentionally exceeded expected production load by driving each retrieval surface at full concurrency. Real autonomous organizations spread requests across multiple tools at once rather than sending every worker to the same endpoint.
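One way to picture driving a single retrieval surface at full concurrency is a closed-loop load generator: each worker sends a request, waits for the response, then immediately sends the next one to the same endpoint. The sketch below is illustrative only and is not the harness behind these measurements. `call_tool`, `worker`, `run_benchmark`, and the simulated latency are hypothetical stand-ins.

```python
import asyncio
import random
import statistics
import time


async def call_tool(rng: random.Random) -> None:
    """Hypothetical stand-in for one retrieval request (simulated latency)."""
    await asyncio.sleep(rng.uniform(0.001, 0.005))


async def worker(deadline: float, latencies: list, rng: random.Random) -> None:
    """Closed loop: send a request, wait for it, repeat until the deadline."""
    while time.perf_counter() < deadline:
        start = time.perf_counter()
        await call_tool(rng)
        latencies.append(time.perf_counter() - start)


async def run_benchmark(workers: int, duration_s: float) -> dict:
    """Point every worker at the same surface and summarize the results."""
    rng = random.Random(0)
    latencies: list = []
    started = time.perf_counter()
    deadline = started + duration_s
    await asyncio.gather(*(worker(deadline, latencies, rng) for _ in range(workers)))
    elapsed = time.perf_counter() - started
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "requests": len(latencies),
        "rps": len(latencies) / elapsed,
        "p50": cuts[49],  # 50th percentile
        "p99": cuts[98],  # 99th percentile
    }


if __name__ == "__main__":
    print(asyncio.run(run_benchmark(workers=50, duration_s=0.5)))
```

In a closed loop like this, throughput and latency are coupled: with a fixed number of workers, slower responses directly lower the achieved requests per second.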
| Tool | Mode | RPS | p50 | p99 |
|---|---|---|---|---|
| Search Observations | keyword | 1,468 | 1.34s | 1.90s |
| Search Observations | by-id | 1,246 | 1.61s | 3.82s |
| Trace Symbol Dependencies | graph | 1,191 | 1.65s | 2.37s |
| List Indexes | registry | 1,070 | 1.87s | 2.51s |
| Create Observation | fire-and-forget | 1,079 | 1.59s | 3.49s |
| Create Observation | sync | 963 | 2.13s | 3.55s |
| Search Code | exact symbol | 268 | 8.02s | 11.06s |
| Search Code | keyword | 98 | 19.5s | 40.4s |
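As a rough sanity check on the table above, Little's Law says the average number of requests in flight equals throughput multiplied by mean latency. The sketch below multiplies each row's RPS by its p50, using the median as a stand-in for the mean (an assumption, since the mean is not reported). `ROWS` and `implied_in_flight` are names introduced here for illustration.

```python
# (tool, mode, RPS, p50 in seconds), copied from the table above.
ROWS = [
    ("Search Observations", "keyword", 1468, 1.34),
    ("Search Observations", "by-id", 1246, 1.61),
    ("Trace Symbol Dependencies", "graph", 1191, 1.65),
    ("List Indexes", "registry", 1070, 1.87),
    ("Create Observation", "fire-and-forget", 1079, 1.59),
    ("Create Observation", "sync", 963, 2.13),
    ("Search Code", "exact symbol", 268, 8.02),
    ("Search Code", "keyword", 98, 19.5),
]


def implied_in_flight(rps: float, latency_s: float) -> float:
    """Little's Law: requests in flight = throughput * latency.

    The p50 is used as an approximation of mean latency (assumption).
    """
    return rps * latency_s


for tool, mode, rps, p50 in ROWS:
    print(f"{tool} ({mode}): ~{implied_in_flight(rps, p50):.0f} in flight")
```

Every row works out to between roughly 1,700 and 2,150 requests in flight, which is consistent with about 2,000 workers each holding one outstanding request. The estimate is approximate because a median understates the mean when the latency distribution has a long tail.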

| Tool | Mode | RPS | p50 | p99 |
|---|---|---|---|---|
| Search Code | semantic | 445 | 2.63s | 3.58s |
| Search Code | hybrid | 448 | 2.61s | 3.37s |
| Search References | semantic | 340 | 1.95s | 8.93s |
| Search References | hybrid | 344 | 2.60s | 11.08s |
| Search Observations | hybrid | 810 | 0.81s | 31.1s |

| Tool | Concurrency | RPS | p50 | Notes |
|---|---|---|---|---|
| Explain Code Path | 100 | 11 | 8.79s | Bounded by available model throughput. Scales with the Inference Fabric's concurrent model serving. |
PostgreSQL wasn't dedicated. Redis wasn't dedicated. The GPU wasn't dedicated. FAFO Memory wasn't running on an empty machine.
The benchmark exercised the system the way customers would actually deploy it.