Client (Excel, OCR pipeline, /comprendre, Hasna's /fix wizard)
|
| HTTPS POST /query { query, factory_id, field_type, ... }
v
+----------------+
| Route53 | monce.ai zone → A record → EC2 public IP
+----------------+
|
v
+----------------+
| nginx | TLS termination + IP allowlist on write endpoints
| (eu-west-3) | /synonyms /rebuild /rebuild_all -- IP-restricted
+----------------+ /query /batch -- public
|
| proxy_pass http://127.0.0.1:8000
v
+-----------------------------------------------------------------+
| gunicorn master (PID 1854426) -- 4 UvicornWorker children |
| --workers 4 --timeout 600 -k uvicorn.workers.UvicornWorker |
+-----------------------------------------------------------------+
|
v
+-----------------------------------------------------------------+
| FastAPI app (api/snake_api/main.py) |
| + middleware: CORS, metrics, auth |
| + routers: routes.py (54 endpoints), web.py (/ /ui), |
| fix.py (/fix*), onboard.py, paper/economics/math/ |
| architecture |
+-----------------------------------------------------------------+
|
v
+-----------------------------------------------------------------+
| ArticleMatcher.match_single (matcher.py) |
| |
| 1. Resolve factory_id → FactoryModel from model_manager |
| 2. Resolve field_type → bucket via FIELD_TYPE_TO_BUCKET |
| 3. Tier 1: Snake SAT vote on field_models[bucket] |
| if conf >= 0.80 and (field is global OR global confirms) |
| → return MatchResult(method=snake_field/sat) |
| 4. Tier 2: FuzzyMatcher (built from Snake's population) |
| if conf >= 0.50 → return MatchResult(fuzzy) |
| 5. Tier 3: optional LLM (Claude Haiku) if use_llm=True |
| 6. include_fuzzy_candidates: append fuzzy pool to candidates |
| for arbitrator on /ui |
+-----------------------------------------------------------------+
|
v
+-----------------------------------------------------------------+
| Response (FastAPI JSON) |
| { factory_id, version, query, match: {...}, candidates: [], |
| audit: "...", latency_ms: 3.7 } |
+-----------------------------------------------------------------+
| Tier | Method | Source | Threshold | Latency |
|---|---|---|---|---|
| 1 | Snake SAT vote | field_model + global_model | conf ≥ 0.80 | ~3 ms |
| 2 | Fuzzy (Levenshtein + bigram) | FuzzyMatcher built from Snake.population | conf ≥ 0.50 | ~1 ms |
| 3 | LLM (Claude Haiku) | monceai/charles-json | conf ≥ 0.70 | ~400 ms |
Most production traffic terminates at Tier 1. v5.4.6 dropped the in-Snake
synonym_hash exact-match path: when a query is a known synonym, SAT returns
1.0 by construction, so a confidence of 1.0 already identifies the exact-match case.
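The cascade in the table can be sketched as below. This is a minimal illustration, not the code in matcher.py: `match_cascade`, the `Scorer` callables, and this simplified `MatchResult` are hypothetical names, and the Tier 1 "field is global OR global confirms" check is omitted for brevity. Only the three thresholds come from the table.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Thresholds taken from the tier table.
SAT_THRESHOLD = 0.80
FUZZY_THRESHOLD = 0.50
LLM_THRESHOLD = 0.70


@dataclass
class MatchResult:
    # Simplified stand-in for the real MatchResult (assumption).
    num_article: Optional[str]
    confidence: float
    method: str


# Hypothetical scorer signature: query -> (num_article, confidence).
Scorer = Callable[[str], Tuple[Optional[str], float]]


def match_cascade(query: str, sat: Scorer, fuzzy: Scorer,
                  llm: Optional[Scorer] = None) -> MatchResult:
    """Return the result of the first tier whose confidence clears its threshold."""
    tiers = [("sat", sat, SAT_THRESHOLD), ("fuzzy", fuzzy, FUZZY_THRESHOLD)]
    if llm is not None:  # Tier 3 is optional (use_llm=True in the real API)
        tiers.append(("llm", llm, LLM_THRESHOLD))
    for method, scorer, threshold in tiers:
        article, conf = scorer(query)
        if article is not None and conf >= threshold:
            return MatchResult(article, conf, method)
    return MatchResult(None, 0.0, "none")
```

Because the tiers are tried in order and the first success returns, the cheaper tiers absorb most traffic and the LLM is consulted only when both earlier tiers fall below their thresholds.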
An incoming query carries a field_type (verre, intercalaire, remplissage,
faconnage, croisillon, forme, service, misc, or global). The matcher routes to the
appropriate bucket:
field_type from request
|
| matcher.FIELD_TYPE_TO_BUCKET (dict, single source of truth)
v
bucket name (one of 8)
|
v
model.field_models[bucket] if exists → Tier 1 SAT
else
model.global_model with field filter → Tier 1 SAT
The mapping at training time is driven by
monce_db.articles.type_article_monce, a curated FR taxonomy. Spanish
factories (F22, F23) get the same FR types because the data team curates the column, so
there is no language-specific code path.
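The routing above can be sketched as a small lookup with a fallback. The dictionary contents here are an assumption (an identity mapping over the eight non-global field types named earlier); the real FIELD_TYPE_TO_BUCKET in matcher.py may map differently. `select_model` and its return shape are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed identity mapping; "global" deliberately has no bucket of its own.
FIELD_TYPE_TO_BUCKET: Dict[str, str] = {
    name: name
    for name in ("verre", "intercalaire", "remplissage", "faconnage",
                 "croisillon", "forme", "service", "misc")
}


def select_model(field_type: str, field_models: Dict[str, Any],
                 global_model: Any) -> Tuple[Any, Optional[str]]:
    """Return (model, field_filter) for a request's field_type.

    A dedicated field model is used when one exists for the bucket;
    otherwise the global model is used, filtered to that bucket.
    """
    bucket = FIELD_TYPE_TO_BUCKET.get(field_type)
    if bucket is not None and bucket in field_models:
        return field_models[bucket], None
    # Fallback: global model, with the bucket (if any) as a filter.
    return global_model, bucket
```

Keeping the mapping in one dictionary means a new field type is a one-line change, and the language of the factory never enters the lookup.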
populate_models.py --train --factory N
|
| dumps: articles.json, synonyms.json, version.txt, global_model.json,
| field_models/*.json, clients/client_model.json
v
aws s3 sync → s3://snake-models-monce/models/factory_N/
|
| (separately) sudo kill -HUP
v
gunicorn master receives SIGHUP
|
| spawns 4 fresh worker processes (overlap window ~30 s)
v
each new worker: model_manager._load_all_models()
|
| s3.list_objects_v2(Prefix="models/") → discover factory_*
v
for fid in discovered_factory_ids:
download articles.json, synonyms.json, version.txt
download global_model.json → Snake(path)
download field_models/*.json → Snake(path) for each
download clients/client_model.json → Snake(path)
|
v
FactoryModel cached in model_manager._models[fid]
ClientFactoryModel cached in model_manager._client_models[fid]
|
| old workers drain in-flight requests & exit
v
API serves new versions
Factory IDs are discovered dynamically from the S3 listing, so new factories appear
without code changes. The factory_registry.py module merges S3-discovered
factories with monce_db.list_factories() for UI selectors.
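As an illustration of the discovery step, the sketch below extracts factory IDs from a list of S3 object keys such as those returned under the `models/` prefix. It works on plain strings so it runs without AWS access. `discover_factory_ids` and `merge_registry` are hypothetical helpers, and treating factory IDs as integers is an assumption based on the `factory_N` naming.

```python
import re
from typing import Iterable, List

# Matches keys like "models/factory_22/global_model.json".
_FACTORY_KEY = re.compile(r"^models/factory_(\d+)/")


def discover_factory_ids(keys: Iterable[str]) -> List[int]:
    """Collect the distinct factory IDs that appear in S3 object keys."""
    ids = set()
    for key in keys:
        match = _FACTORY_KEY.match(key)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


def merge_registry(s3_ids: Iterable[int], db_ids: Iterable[int]) -> List[int]:
    """Union of S3-discovered and database-listed factories, for UI selectors."""
    return sorted(set(s3_ids) | set(db_ids))
```

For example, keys for `factory_1` and `factory_22` yield `[1, 22]`, and keys outside the `models/factory_*/` layout are ignored.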
The /batch endpoint takes factory_id at the top level (one
factory per batch) but field_type per item. This is intentional: a single
incoming OCR'd document contains glass + intercalaire + gas mentions in mixed order,
and each line needs its own field-typed routing. Top-level field_type is
silently ignored to avoid accidentally constraining the entire batch to one bucket.
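A minimal sketch of that contract follows. The `items` and `query` key names and the `"global"` default for a missing per-item field_type are assumptions for illustration; only the top-level `factory_id`, the per-item `field_type`, and the ignored top-level `field_type` come from the description above.

```python
from typing import Any, Dict, List


def expand_batch(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one /batch payload into per-item match requests.

    factory_id is shared by the whole batch. A top-level field_type,
    if present, is never read, so it cannot constrain the items.
    """
    factory_id = payload["factory_id"]
    requests = []
    for item in payload.get("items", []):  # "items" is an assumed key name
        requests.append({
            "factory_id": factory_id,
            "query": item["query"],
            # Assumed default when an item carries no field_type.
            "field_type": item.get("field_type", "global"),
        })
    return requests
```

With this shape, a document mixing glass, intercalaire, and gas lines routes each line to its own bucket even if the caller also sent a stray top-level `field_type`.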
Worker process (~5 GB RSS)
+--------------------------------------------------------+
| Python interpreter + FastAPI + uvicorn       ~250 MB   |
| algorithmeai v5.4.6 + Cython _accel.so       ~5 MB     |
|                                                        |
| ModelManager._models: Dict[factory_id, FactoryModel]   |
|   F1   86 MB    F3   86 MB    F4   352 MB              |
|   F9   52 MB    F10  860 MB   F13  ...                 |
|   ... 13 factories ...   Σ ~2.5 GB raw JSON            |
|   + Python object overhead ~2x → ~4.5 GB resident      |
|                                                        |
| FuzzyMatcher cache (one per factory + bucket)          |
|   Built lazily from Snake.population on first hit      |
|   ~50 MB per (factory, bucket), invalidated on reload  |
+--------------------------------------------------------+
4 workers × ~5 GB ≈ 20 GB. The r6i.4xlarge has 128 GB, which leaves headroom for OCR storms, rebuild_all overlap, and future scaling to 8 workers if traffic grows.
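The headroom arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch: `memory_budget_gb` is a hypothetical helper, and it assumes the ~5 GB per-worker figure stays constant as the worker count changes.

```python
from typing import Tuple


def memory_budget_gb(workers: int, per_worker_gb: float = 5.0,
                     instance_gb: float = 128.0) -> Tuple[float, float]:
    """Return (resident_gb, headroom_gb) for a given worker count."""
    resident = workers * per_worker_gb
    return resident, instance_gb - resident


# Current deployment and the 8-worker scenario mentioned above.
current = memory_budget_gb(4)
scaled = memory_budget_gb(8)
```

Under these assumptions 4 workers use about 20 GB and 8 workers about 40 GB. During a reload, old and new workers coexist, so the transient peak is roughly double the steady-state figure.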
| t | State |
|---|---|
| 0 s | SIGHUP arrives at master. Master forks 4 new workers. |
| 0–30 s | New workers each call model_manager._load_all_models(): ~30 S3 GET requests, JSON parse, Snake() reconstruction, FuzzyMatcher build on first query. Old workers continue serving traffic. |
| ~30 s | New workers ready. Master sends graceful shutdown to old workers. |
| ~30–120 s | Old workers finish in-flight requests, exit. Routing fully on new workers. |
The reload is zero-downtime because old workers do not exit until they have drained their in-flight requests. gunicorn handles the connection routing.
Browser POST /query { include_fuzzy_candidates: true, top_k: 10 }
|
v
matcher returns: candidates = [snake top-k] + [fuzzy top-k]
| (sums to 1.0) (sums to 1.0)
v
/ui (web.py): renderCard() stashes raw candidates in ARB_STORE[card_id]
|
v
user clicks "Candidates ▼"
|
v
arbitrate(candidates, topK): merge by num_article, raw = snake + fuzzy,
renormalize so ΣP = 1.0
|
v
render: { #num, denom, conf, source: Snake|Fuzzy|Both }
slider 1–50 re-ranks instantly without re-fetching
The arbitrator is purely client-side, so it adds no backend RPS. The math is in /math §3.
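The merge step can be sketched as follows. The production arbitrator runs in the browser; this is a Python rendering of the same idea, not the code in web.py. The `conf` and `source` field names, and the choice to renormalize over the displayed top-k rather than over all merged candidates, are assumptions.

```python
from typing import Any, Dict, List


def arbitrate(candidates: List[Dict[str, Any]], top_k: int) -> List[Dict[str, Any]]:
    """Merge Snake and Fuzzy candidates by num_article and renormalize.

    Each candidate is assumed to look like
    {"num_article": ..., "conf": float, "source": "Snake" or "Fuzzy"}.
    """
    merged: Dict[Any, Dict[str, Any]] = {}
    for cand in candidates:
        row = merged.setdefault(cand["num_article"],
                                {"raw": 0.0, "sources": set()})
        row["raw"] += cand["conf"]          # raw = snake + fuzzy
        row["sources"].add(cand["source"])

    # Rank by raw score and keep the k best.
    ranked = sorted(merged.items(), key=lambda kv: kv[1]["raw"], reverse=True)
    ranked = ranked[:top_k]

    # Renormalize so the displayed probabilities sum to 1.0.
    total = sum(row["raw"] for _, row in ranked)
    result = []
    for num_article, row in ranked:
        source = "Both" if len(row["sources"]) > 1 else next(iter(row["sources"]))
        result.append({
            "num_article": num_article,
            "conf": row["raw"] / total if total else 0.0,
            "source": source,
        })
    return result
```

Because the function only needs the raw candidate list, changing `top_k` re-ranks from data already held on the client, which matches the slider behaviour described above.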
Charles Dana · Monce SAS · snake.aws.monce.ai · deployed 2026-05-20
Co-Authored-By: Claude (Anthropic)