Economics live

A read-out of the running box: factories served, the knowledge store's compute-once savings, and what the ensemble actually costs per call. Numbers below are fetched live from /health and /knowledge/status.
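As a rough illustration of how such a read-out could be pulled, the sketch below fetches both endpoints and returns the parsed JSON. Only the two paths come from this page; the base URL, the function names, and the assumption that both endpoints return JSON are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical base URL: this page does not say where the box is served from.
BASE_URL = "http://localhost:8000/"


def fetch_json(path, base_url=BASE_URL, opener=urlopen, timeout=5.0):
    """GET base_url + path and parse the body as JSON (assumed response format)."""
    url = urljoin(base_url, path.lstrip("/"))
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def read_out(base_url=BASE_URL, opener=urlopen):
    """Collect the two payloads this page reads; field names are left untouched."""
    return {
        "health": fetch_json("/health", base_url, opener),
        "knowledge": fetch_json("/knowledge/status", base_url, opener),
    }
```

The `opener` parameter exists only so the network call can be swapped out; by default it is the standard-library `urlopen`.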

…

factories on /health

…

outcomes cached (store)

…

S3 durable tier

…

knowledge Snake params

The compute-once economics

The knowledge engine's unit cost is a subprocess spawn, not GPU time. Measured on this box: a fresh (uncached) input costs about 1.1 s of one vCPU cold, or 0.5 s warm; a cached input is a disk (or S3) read, effectively free. The store turns a per-call cost into a per-unique-input cost, and glass order text has a heavy head: the same 4/16/4 builds recur constantly.
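The per-unique-input accounting can be sketched as follows. This is a minimal in-memory stand-in, not the actual store: the class name, the SHA-256 key on the raw text, and the `fake_knowledge` function are all assumptions made for illustration, and the 0.5 s figure is the warm cost quoted above.

```python
import hashlib
from typing import Callable, Dict


class ComputeOnceStore:
    """In-memory stand-in for the disk/S3 store: one computation per unique input."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute
        self._outcomes: Dict[str, str] = {}
        self.fresh = 0   # calls that paid for a computation
        self.cached = 0  # calls served from the store

    def get(self, text: str) -> str:
        # Assumption: the key is a hash of the exact input text, with no normalization.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._outcomes:
            self.cached += 1
            return self._outcomes[key]
        self.fresh += 1
        outcome = self._compute(text)
        self._outcomes[key] = outcome
        return outcome


def fake_knowledge(text: str) -> str:
    """Hypothetical placeholder for the real subprocess call."""
    return text.upper()


WARM_FRESH_COST_S = 0.5  # warm vCPU seconds per fresh input, from the text above

# Illustrative order stream with a heavy head: one build dominates.
orders = ["4/16/4"] * 8 + ["6/12/6", "4/16/4 low-e"]

store = ComputeOnceStore(fake_knowledge)
for order in orders:
    store.get(order)

cost_with_store_s = store.fresh * WARM_FRESH_COST_S
cost_without_store_s = len(orders) * WARM_FRESH_COST_S
```

In this toy stream, ten calls contain three unique inputs, so the store pays for three computations instead of ten.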

System assessment

computing…

Box cost

line	rate	note
`r6i.4xlarge` on-demand	≈ $1.18 / hr	128 GB — headroom the OOM incident bought
≈ monthly always-on	≈ $850 / mo	see the incident log in CLAUDE.md
knowledge fresh inference	≈ 0.5–1.1 s vCPU	subprocess spawn, once per unique input
knowledge cached serve	≈ 0 (disk/S3 read)	the amortized common case
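The arithmetic behind the table can be checked with a short sketch. The hourly rate is the figure from the table; the 30-day month and the even split of the box price across its 16 vCPUs are assumptions made here, and since the box is always on, the per-inference figure is an accounting share rather than a marginal cost.

```python
HOURLY_RATE_USD = 1.18     # on-demand rate from the table
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month
VCPUS = 16                 # vCPU count of an r6i.4xlarge


def monthly_cost_usd(hourly_rate=HOURLY_RATE_USD, hours=HOURS_PER_MONTH):
    """Always-on monthly cost at the on-demand rate."""
    return hourly_rate * hours


def fresh_inference_share_usd(vcpu_seconds, hourly_rate=HOURLY_RATE_USD, vcpus=VCPUS):
    """Share of the box price for one fresh inference.

    Assumption: the hourly price is split evenly across vCPUs.
    """
    per_vcpu_second = hourly_rate / vcpus / 3600
    return vcpu_seconds * per_vcpu_second


monthly = monthly_cost_usd()                  # about 849.6, i.e. the ≈ $850 / mo row
cold_share = fresh_inference_share_usd(1.1)   # cold fresh input
warm_share = fresh_inference_share_usd(0.5)   # warm fresh input
```

Under these assumptions a fresh inference accounts for a few thousandths of a cent of box time.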

Where the ensemble is honest about its cost

Turning the ensemble on by default means every /query now also consults the knowledge subprocess. That is the deliberate trade the vote buys: the classic single cascade takes ~33 ms, while the ensemble takes ~0.5 s warm, roughly 15× slower on the hot path, with the cost bought back on repeats by the store. The standing optimization is in-process knowledge loading to collapse the spawn. This page will keep saying so until that lands.
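The trade can be put in numbers with a small model. The two latencies are the figures quoted above; the assumption that a store hit brings an ensemble call back to roughly the classic 33 ms is made here for illustration and is not measured on this page.

```python
CLASSIC_S = 0.033       # classic single cascade, from the text
ENSEMBLE_WARM_S = 0.5   # ensemble with a warm knowledge spawn, from the text


def slowdown(fresh_s=ENSEMBLE_WARM_S, classic_s=CLASSIC_S):
    """How many times slower an uncached ensemble call is than the classic path."""
    return fresh_s / classic_s


def expected_latency_s(hit_rate, cached_s=CLASSIC_S, fresh_s=ENSEMBLE_WARM_S):
    """Mean latency per call for a given store hit rate.

    Assumption: a cached call costs about as much as the classic cascade.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_s + (1.0 - hit_rate) * fresh_s


def hit_rate_for_target(target_s, cached_s=CLASSIC_S, fresh_s=ENSEMBLE_WARM_S):
    """Hit rate at which the mean latency equals target_s."""
    if not cached_s <= target_s <= fresh_s:
        raise ValueError("target must lie between the cached and fresh latency")
    return (fresh_s - target_s) / (fresh_s - cached_s)


ratio = slowdown()                        # about 15.2
needed = hit_rate_for_target(0.100)       # about 0.857 for a 100 ms mean
```

Under this model, a mean latency of 100 ms would need roughly 86% of calls to hit the store, which is the sense in which repeats buy the cost back.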

— Charles Dana · AI+ML @ Monce.ai · AWS SkillMaker
cdana@monce.ai · +33 6 77 60 49 48 · threads.com/@notjustcharles
Built by Claude Opus 4.8 (1M context) · 2026-07-18 · Snake API v7.0.0