Architecture live

The live topology of snake.aws.monce.ai — modules, routes, the request flow, and how the knowledge engine stays off the workers. Read from the code, not memory.

… factories on /health · … knowledge factories · … store entries · … API version

Request flow

                Route53  →  nginx (TLS, write-IP gate)  →  gunicorn (uvicorn workers)
                                                                │
                                                                ▼  FastAPI app
      ┌──────────────────────────────┬───────────────────────────┬─────────────────────┐
      │ /query /batch                │ /lookup                   │ /knowledge          │
      │  matcher.py (Snake SAT →      │  lookup.py (value_        │  knowledge_routes → │
      │   fuzzy → LLM cascade)        │   corrected memory)       │   subprocess        │
      │  models resident per factory  │  in-proc matcher          │   inference.py      │
      └───────────────┬──────────────┴───────────────┬───────────┴──────────┬──────────┘
                      │                               │                      │
              ensemble=true (default)                 │             load model → predict → EXIT
                      │                               │                      │  (no resident model)
                      ▼                               │                      ▼
                 assess.py  ── gathers all four votes ─┴──────────►  store (disk + S3)
                      │                                              compute once, then lookup
                      ▼
             400-point weighted verdict  →  match / candidates / composite audit

Modules

matcher.py

Three-tier cascade: Snake SAT → fuzzy → LLM. Models stay resident per factory, with field-bucket routing. Powers /query and /batch.
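The cascade can be pictured as an ordered list of tiers, tried in turn until one returns an answer. The sketch below is illustrative only and is not the code in matcher.py: the tier functions, the `Match` type and the sample catalogue are hypothetical, and the fuzzy tier uses Python's `difflib` as a stand-in for whatever scorer the real module uses.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Callable, Optional


@dataclass
class Match:
    article: str
    tier: str  # name of the tier that produced the answer


# Hypothetical catalogue; the real models are resident per factory.
CATALOGUE = {"steel bolt m8": "ART-001", "brass hinge 40mm": "ART-002"}


def sat_tier(text: str) -> Optional[str]:
    # Stand-in for the Snake SAT model: here, just an exact catalogue hit.
    return CATALOGUE.get(text)


def fuzzy_tier(text: str) -> Optional[str]:
    # Stand-in fuzzy tier built on difflib (an assumption, not the real scorer).
    close = get_close_matches(text, list(CATALOGUE), n=1, cutoff=0.8)
    return CATALOGUE[close[0]] if close else None


def llm_tier(text: str) -> Optional[str]:
    # Placeholder: a real tier would call an LLM; this one always abstains.
    return None


TIERS: list[tuple[str, Callable[[str], Optional[str]]]] = [
    ("sat", sat_tier),
    ("fuzzy", fuzzy_tier),
    ("llm", llm_tier),
]


def cascade(text: str) -> Optional[Match]:
    """Try each tier in order and return the first answer found."""
    for name, tier in TIERS:
        article = tier(text)
        if article is not None:
            return Match(article=article, tier=name)
    return None
```

Ordering the tiers from cheapest to most expensive means the LLM is only reached when both earlier tiers abstain.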

lookup.py

Deterministic memory that maps a value_corrected to its match, compartmentalized by field, with an exact → fuzzy fallback. Powers /lookup.
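A minimal sketch of a field-compartmentalized lookup with an exact-then-fuzzy fallback follows. The memory contents, the `cutoff` value and the use of `difflib` are assumptions made for illustration; lookup.py may normalize and score differently.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical memory: one compartment per field, mapping a corrected value
# to the article it was matched to. Keys and values are illustrative only.
MEMORY: dict[str, dict[str, str]] = {
    "colour": {"anthracite grey": "COL-7016"},
    "finish": {"matte": "FIN-01"},
}


def lookup(field: str, value_corrected: str, cutoff: float = 0.85) -> Optional[str]:
    """Resolve a value inside its own field compartment only."""
    compartment = MEMORY.get(field, {})
    key = value_corrected.strip().lower()  # assumed normalization
    # 1) Exact hit.
    if key in compartment:
        return compartment[key]
    # 2) Fuzzy fallback, still restricted to the same field.
    close = get_close_matches(key, list(compartment), n=1, cutoff=cutoff)
    return compartment[close[0]] if close else None
```

Because the fuzzy step never leaves the compartment, a value remembered under one field cannot resolve a query for another field, however similar the strings are.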

knowledge.py + inference.py

Catalogue vLookup + algorithmeai Snake v5.5.1, per (factory,field). Subprocess load-and-unload, store-backed.

assess.py

The 400-point weighted vote over all four engines, plus the composite audit. Wired into /query and /batch, where `ensemble` is on by default.
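One way to picture a 400-point weighted vote is shown below. The engine names, the even 100-point split and the confidence scaling are all assumptions; this page does not state the real weights or scoring rule used by assess.py.

```python
from collections import defaultdict

# Assumed split: 100 points per engine, 400 in total. The engine names and
# weights are hypothetical, not taken from assess.py.
WEIGHTS = {"matcher": 100, "lookup": 100, "vlookup": 100, "snake": 100}


def assess(votes: dict[str, tuple[str, float]]) -> dict:
    """votes maps engine name -> (candidate, confidence in [0, 1])."""
    scores: dict[str, float] = defaultdict(float)
    for engine, (candidate, confidence) in votes.items():
        scores[candidate] += WEIGHTS[engine] * confidence
    # Highest score first; ties keep insertion order because sort is stable.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "match": ranked[0][0] if ranked else None,
        "candidates": ranked,
        # Composite audit: which engine voted for which candidate.
        "audit": {engine: cand for engine, (cand, _) in votes.items()},
    }
```

For example, three engines backing candidate "A" with confidences 0.9, 1.0 and 0.5 outscore a single engine backing "B" at 0.8, and the audit still records the dissenting vote.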

Routes

| route | method | what |
|---|---|---|
| `/query` `/batch` | POST | article matching; `ensemble` default on |
| `/lookup` `/lookup/batch` | POST | value_corrected memory resolve |
| `/knowledge/query` `/batch` | POST | vLookup + Snake, store-backed |
| `/knowledge` | GET | landing: dropdowns, live query, truth-table editor, KPI |
| `/assess` `/assess/batch` | POST | 400-point ensemble verdict |
| `/assess/ui` | GET | datasource-conflict viewer |
| `/paper` `/math` `/economics` `/architecture` | GET | these surfaces |
| `/health` | GET | factory model stats (consumed by /economics) |
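As a rough illustration of calling one of the POST routes, the sketch below builds (but does not send) a request to /query using only the standard library. The payload field names `factory`, `field` and `text` are guesses inferred from the store key described on this page, not a documented schema; only the `ensemble` flag and the host name come from the page itself.

```python
import json
from urllib.request import Request

BASE_URL = "https://snake.aws.monce.ai"


def build_query(factory: str, field: str, text: str, ensemble: bool = True) -> Request:
    """Build a POST /query request; nothing is sent over the network."""
    # Payload field names are assumptions, not the documented schema.
    body = json.dumps(
        {"factory": factory, "field": field, "text": text, "ensemble": ensemble}
    ).encode("utf-8")
    return Request(
        f"{BASE_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; setting `ensemble=False` would, under these assumptions, skip the ensemble step.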

The knowledge engine stays off the workers

The Feb 2026 OOM incident showed that having every gunicorn worker load every model does not scale. So the knowledge layer keeps nothing resident: on a cache miss it shells out to scripts/inference.py, which loads one small Snake JSON, predicts, and exits. The result is written to a two-tier store, local disk (hot) and S3 (durable; it survives restarts and the recycling of an instance that has no Elastic IP), keyed by the tokenized (factory, field, normalize(text)).
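The compute-once-then-lookup pattern can be sketched as follows. Everything here is a stand-in chosen so the example runs anywhere: the child process is a one-line script rather than scripts/inference.py, the durable tier is a second local directory rather than S3, and the SHA-256 key and the `normalize` rule are assumptions about how the key is tokenized.

```python
import hashlib
import json
import subprocess
import sys
from pathlib import Path


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def store_key(factory: str, field: str, text: str) -> str:
    # Assumed tokenization: hash of the (factory, field, normalize(text)) triple.
    raw = json.dumps([factory, field, normalize(text)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Stand-in for scripts/inference.py: load, predict, print, exit.
CHILD = "import sys, json; print(json.dumps({'prediction': sys.argv[1].upper()}))"


def predict_in_subprocess(text: str) -> dict:
    # The model lives only in the child; its memory is freed when it exits.
    out = subprocess.run(
        [sys.executable, "-c", CHILD, text],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)


class TwoTierStore:
    def __init__(self, hot: Path, durable: Path):
        self.hot, self.durable = hot, durable  # durable stands in for S3

    def get(self, key: str):
        for tier in (self.hot, self.durable):
            path = tier / f"{key}.json"
            if path.exists():
                value = json.loads(path.read_text())
                if tier is self.durable:  # re-warm the hot tier
                    (self.hot / f"{key}.json").write_text(json.dumps(value))
                return value
        return None

    def put(self, key: str, value: dict) -> None:
        for tier in (self.hot, self.durable):
            (tier / f"{key}.json").write_text(json.dumps(value))


def knowledge_query(store: TwoTierStore, factory: str, field: str, text: str):
    key = store_key(factory, field, text)
    cached = store.get(key)
    if cached is not None:
        return cached, "store"
    result = predict_in_subprocess(normalize(text))
    store.put(key, result)
    return result, "subprocess"
```

The second element of the returned tuple reports whether the answer came from the store or from a fresh subprocess, which makes the cache behaviour easy to observe: the first call for a key pays the process start-up cost, and later calls for the same normalized text do not.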

Deploy topology

| | spec |
|---|---|
| Instance | `r6i.4xlarge` · 16 vCPU · 128 GB · eu-west-3 |
| Workers | gunicorn uvicorn workers behind nginx |
| Engine | algorithmeai v5.5.1 (numpy runtime), loaded per subprocess |
| Store | local disk + `s3://…/knowledge_store` (box IAM role) |
| Data | monce_db (catalogue, value_corrected) · vendored knowledge bundles |
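For readers unfamiliar with the "gunicorn uvicorn workers" arrangement, a hypothetical `gunicorn.conf.py` might look like the sketch below. Only the worker class reflects the table above; the bind address, worker count and timeout are illustrative assumptions, not the values deployed on this box.

```python
# gunicorn.conf.py (hypothetical): every value except worker_class is an
# illustrative assumption, not the deployed configuration.
import multiprocessing

# nginx terminates TLS and proxies to this local socket (assumed port).
bind = "127.0.0.1:8000"

# Run the FastAPI (ASGI) app under gunicorn using uvicorn's worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Assumed cap of one worker per vCPU. Each worker holds its own resident
# matcher models, so memory grows with this number.
workers = min(multiprocessing.cpu_count(), 16)

# Assumed request timeout in seconds.
timeout = 60
```

Gunicorn reads these module-level names as settings when started with `-c gunicorn.conf.py`.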

See /economics for the live cost read-out.

— Charles Dana · AI+ML @ Monce.ai · AWS SkillMaker
cdana@monce.ai · +33 6 77 60 49 48 · threads.com/@notjustcharles
Built by Claude Opus 4.8 (1M context) · 2026-07-18 · Snake API v7.0.0