P-009 · Fleet Retrieval

Validated with 2,000 concurrent AI workers.

Production · Measured · Verified

The benchmark intentionally exceeded expected production load by driving each retrieval surface at full concurrency on its own. Real autonomous organizations distribute requests across multiple tools simultaneously rather than sending every worker to the same endpoint.

Validated concurrent workers
Tool                         Concurrent workers
Search Observations          2,000 ✓
Trace Symbol Dependencies    2,000 ✓
List Indexes                 2,000 ✓
Create Observation           2,000 ✓
Search Code                  2,000 ✓
Search References            1,500 ✓
Explain Code Path            100
Search Code validated modes: exact_symbol · keyword. Search References validated at 1,500 concurrent (default mode); hybrid and semantic modes validated at 1,200 concurrent. Explain Code Path is LLM-bound and scales with available model throughput.
Retrieval surfaces · peak throughput at 2,000 concurrent
Tool                         Mode               RPS      p50       p99
Search Observations          keyword            1,468    1.34s     1.90s
Search Observations          by-id              1,246    1.61s     3.82s
Trace Symbol Dependencies    graph              1,191    1.65s     2.37s
List Indexes                 registry           1,070    1.87s     2.51s
Create Observation           fire-and-forget    1,079    1.59s     3.49s
Create Observation           sync               963      2.13s     3.55s
Search Code                  exact symbol       268      8.02s     11.06s
Search Code                  keyword            98       19.5s     40.4s
Hybrid retrieval · peak throughput at 1,200 concurrent
Tool                         Mode               RPS      p50       p99
Search Code                  semantic           445      2.63s     3.58s
Search Code                  hybrid             448      2.61s     3.37s
Search References            semantic           340      1.95s     8.93s
Search References            hybrid             344      2.60s     11.08s
Search Observations          hybrid             810      0.81s     31.1s
Hybrid retrieval prioritizes recall over tail latency by combining semantic and lexical ranking. Under extreme synthetic concurrency, p99 latency grows, but 100% of responses still succeed.
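As an illustration of what combining semantic and lexical ranking can look like, the sketch below fuses two ranked lists with reciprocal rank fusion (RRF). This page does not state which fusion method FAFO Memory uses, so RRF itself, the function name, the constant k=60, and the document IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in,
    and the per-list scores are summed. RRF is a swapped-in example
    technique here, not a confirmed description of the product.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a semantic retriever.
lexical_hits = ["a", "b", "c"]
semantic_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([lexical_hits, semantic_hits])
```

Documents that appear in both lists ("a" and "b") rise to the top, while documents found by only one retriever are still kept. That is the recall-oriented behavior the paragraph above describes, at the cost of running two retrievers per query.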
Reasoning surfaces · LLM-bound
Tool                         Concurrency        RPS      p50
Explain Code Path            100                11       8.79s
Notes: Bounded by available model throughput. Scales with the Inference Fabric's concurrent model serving.
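A quick sanity check on the throughput tables above: if the load generator is closed-loop (each worker waits for a response before sending its next request, with no think time, which is an assumption since the methodology does not say so), Little's Law gives mean latency ≈ concurrency / RPS. The sketch below computes that implied mean for a few rows so it can be compared with the reported p50. The p50 is a median, not a mean, so only rough agreement should be expected.

```python
# Rows copied from the tables above:
# (tool, mode, concurrency, rps, reported_p50_seconds)
ROWS = [
    ("Search Observations", "keyword", 2000, 1468, 1.34),
    ("Search Code", "exact symbol", 2000, 268, 8.02),
    ("Search Code", "keyword", 2000, 98, 19.5),
    ("Search Code", "hybrid", 1200, 448, 2.61),
    ("Explain Code Path", "n/a", 100, 11, 8.79),
]


def implied_mean_latency(concurrency, rps):
    """Little's Law for a closed loop: L = lambda * W, so W = L / lambda."""
    if rps <= 0:
        raise ValueError("rps must be positive")
    return concurrency / rps


def compare(rows):
    """Return (tool, mode, implied mean latency, reported p50) per row."""
    return [
        (tool, mode, round(implied_mean_latency(n, rps), 2), p50)
        for tool, mode, n, rps, p50 in rows
    ]


comparison = compare(ROWS)
```

For example, 2,000 workers at 1,468 RPS imply a mean latency of about 1.36 s against a reported p50 of 1.34 s, and 2,000 workers at 98 RPS imply about 20.4 s against a reported p50 of 19.5 s.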
Benchmark environment
Workstation: Single production workstation
CPU: Intel Core Ultra · 24 cores
Memory: 256 GB DDR5
GPU: NVIDIA RTX 5080
Powers production inference services. Not used to accelerate retrieval. Retrieval benchmarks exercised the production runtime while these services remained active.
Platform runtime: Production AgentOS runtime · running concurrently:
  • AgentOS
  • Agent Swarm
  • FAFO Memory
  • Inference Fabric
  • FAFO Buffer
  • MCP endpoint
  • PostgreSQL
  • Redis
  • Vector indexes
  • Embedding service
  • Cross-encoder reranker
  • Local model execution
  • Background orchestration
  • Observability / telemetry services
Endpoint: Production MCP endpoint, not a stripped-down benchmark harness
Methodology
Window: 30s warmup · 60s measure · 30s cooldown per run
Captured: 2026-05-20 → 2026-05-22
Coverage: 7 fafo-memory tools across 5+ retrieval modes (keyword · semantic · hybrid · exact symbol · dependency · by-id · fire-and-forget) plus the LLM-bound reasoning surface
Measurement: REST mode against the live production deployment; p50/p95/p99 latency, RPS, and error rate per (tool, mode, concurrency)
Reproducibility: Equivalent workstation hardware running the same production AgentOS stack is expected to reproduce these results
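The measurement window and metrics described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual harness: the `Sample` type, the `summarize` function, the nearest-rank percentile method, and the choice to assign a request to the window by its start time are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    start: float    # seconds since the run began when the request was sent
    latency: float  # seconds until the response arrived
    ok: bool        # True if the response was successful


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list (q in 0..100)."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples, warmup=30.0, measure=60.0):
    """Compute RPS, p50/p95/p99 latency, and error rate.

    Only requests started inside the measurement window count;
    warmup and cooldown traffic is discarded (assumed convention).
    """
    window = [s for s in samples if warmup <= s.start < warmup + measure]
    if not window:
        raise ValueError("no samples inside the measurement window")
    latencies = sorted(s.latency for s in window)
    errors = sum(1 for s in window if not s.ok)
    return {
        "rps": len(window) / measure,
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        "error_rate": errors / len(window),
    }
```

One such summary would be produced per (tool, mode, concurrency) combination. Including failed requests in the latency percentiles is another assumption; a harness could equally report them separately.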
The claim
The production AgentOS runtime sustained 2,000 concurrent autonomous workers against its memory layer while the broader platform was also running: Agent Swarm, Inference Fabric, FAFO Buffer, PostgreSQL, Redis, embeddings, reranking, local model services, and background orchestration.

PostgreSQL wasn't dedicated. Redis wasn't dedicated. The GPU wasn't dedicated. FAFO Memory wasn't running on an empty machine.

The benchmark exercised the system the way customers would actually deploy it.