Snake API: Architecture — Request Path

Charles Dana · Monce SAS · May 2026

snake.aws.monce.ai · /paper · /economics · /math

1. End-to-end block diagram

   Client (Excel, OCR pipeline, /comprendre, Hasna's /fix wizard)
        |
        | HTTPS POST /query  { query, factory_id, field_type, ... }
        v
   +----------------+
   |   Route53      |  monce.ai zone → A record → EC2 public IP
   +----------------+
        |
        v
   +----------------+
   |     nginx      |  TLS termination + IP allowlist on write endpoints
   |  (eu-west-3)   |  /synonyms /rebuild /rebuild_all -- IP-restricted
   +----------------+  /query /batch -- public
        |
        | proxy_pass http://127.0.0.1:8000
        v
   +-----------------------------------------------------------------+
   |  gunicorn master (PID 1854426)  --  4 UvicornWorker children    |
   |  --workers 4  --timeout 600  -k uvicorn.workers.UvicornWorker   |
   +-----------------------------------------------------------------+
        |
        v
   +-----------------------------------------------------------------+
   |  FastAPI app (api/snake_api/main.py)                            |
   |    + middleware: CORS, metrics, auth                            |
   |    + routers: routes.py (54 endpoints), web.py (/ /ui),         |
   |               fix.py (/fix*), onboard.py, paper/economics/math/ |
   |               architecture                                      |
   +-----------------------------------------------------------------+
        |
        v
   +-----------------------------------------------------------------+
   |  ArticleMatcher.match_single  (matcher.py)                      |
   |                                                                 |
   |   1. Resolve factory_id → FactoryModel from model_manager       |
   |   2. Resolve field_type → bucket via FIELD_TYPE_TO_BUCKET       |
   |   3. Tier 1: Snake SAT vote on field_models[bucket]             |
   |        if conf >= 0.80 and (field is global OR global confirms) |
   |          → return MatchResult(method=snake_field/sat)           |
   |   4. Tier 2: FuzzyMatcher (built from Snake's population)       |
   |        if conf >= 0.50  → return MatchResult(fuzzy)             |
   |   5. Tier 3: optional LLM (Claude Haiku) if use_llm=True        |
   |   6. include_fuzzy_candidates: append fuzzy pool to candidates  |
   |        for arbitrator on /ui                                    |
   +-----------------------------------------------------------------+
        |
        v
   +-----------------------------------------------------------------+
   |  Response (FastAPI JSON)                                        |
   |    { factory_id, version, query, match: {...}, candidates: [],  |
   |      audit: "...", latency_ms: 3.7 }                            |
   +-----------------------------------------------------------------+

2. Tier breakdown (the matching cascade)

   Tier  Method                        Source                                    Threshold    Latency
   1     Snake SAT vote                field_model + global_model                conf ≥ 0.80  ~3 ms
   2     Fuzzy (Levenshtein + bigram)  FuzzyMatcher built from Snake.population  conf ≥ 0.50  ~1 ms
   3     LLM (Claude Haiku)            monceai/charles-json                      conf ≥ 0.70  ~400 ms

Most production traffic terminates at Tier 1. v5.4.6 dropped the in-Snake synonym_hash exact-match path: when a query is a known synonym, SAT returns 1.0 by construction, so a confidence of 1.0 now marks the exact-match case.
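
The cascade above can be sketched in a few lines. This is a minimal illustration, not the code in matcher.py: the scorer callables, the MatchResult fields, and the method labels "snake_field", "llm" and "none" are assumptions, and the Tier 1 side condition (field is global, or the global model confirms) is left out for brevity. Only the three thresholds come from the table.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Thresholds taken from the tier table.
SAT_THRESHOLD = 0.80
FUZZY_THRESHOLD = 0.50
LLM_THRESHOLD = 0.70

# Hypothetical scorer type: query -> (num_article, confidence).
Scorer = Callable[[str], Tuple[str, float]]


@dataclass
class MatchResult:
    num_article: Optional[str]
    confidence: float
    method: str


def match_cascade(
    query: str,
    sat: Scorer,
    fuzzy: Scorer,
    llm: Optional[Scorer] = None,
    use_llm: bool = False,
) -> MatchResult:
    """Try each tier in order and return at the first one that clears its threshold."""
    article, conf = sat(query)
    if conf >= SAT_THRESHOLD:
        return MatchResult(article, conf, "snake_field")

    article, conf = fuzzy(query)
    if conf >= FUZZY_THRESHOLD:
        return MatchResult(article, conf, "fuzzy")

    # Tier 3 is opt-in: it only runs when the caller asks for it.
    if use_llm and llm is not None:
        article, conf = llm(query)
        if conf >= LLM_THRESHOLD:
            return MatchResult(article, conf, "llm")

    # Assumed fallback when no tier is confident enough.
    return MatchResult(None, 0.0, "none")
```

Because each tier returns early, the slower tiers run only for queries that the cheaper ones could not resolve with enough confidence.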

3. Field routing via type_article_monce

An incoming query carries a field_type (verre, intercalaire, remplissage, faconnage, croisillon, forme, service, misc, or global). The matcher routes to the appropriate bucket:

   field_type from request
        |
        | matcher.FIELD_TYPE_TO_BUCKET (dict, single source of truth)
        v
   bucket name (one of 8)
        |
        v
   model.field_models[bucket]    if exists  → Tier 1 SAT
        else
   model.global_model            with field filter  → Tier 1 SAT

The mapping at training time is driven by monce_db.articles.type_article_monce — a curated FR taxonomy. Spanish factories (F22, F23) get the same FR types because the data team curates the column. No language-specific code path.
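
A rough sketch of the routing step follows. The real contents of FIELD_TYPE_TO_BUCKET are not shown in this document, so the identity mapping below is an assumption, as are the function name pick_model and its return shape (model, field filter).

```python
from typing import Any, Dict, Optional, Tuple

# Assumed identity mapping over the 8 buckets; "global" has no bucket of its own.
FIELD_TYPE_TO_BUCKET: Dict[str, str] = {
    name: name
    for name in (
        "verre", "intercalaire", "remplissage", "faconnage",
        "croisillon", "forme", "service", "misc",
    )
}


def pick_model(
    field_type: str,
    field_models: Dict[str, Any],
    global_model: Any,
) -> Tuple[Any, Optional[str]]:
    """Return (model, field_filter) for a request's field_type.

    A dedicated field model needs no filter. When the bucket has no
    trained model, fall back to the global model and pass the bucket
    along so the caller can filter candidates by field.
    """
    bucket = FIELD_TYPE_TO_BUCKET.get(field_type)
    if bucket is not None and bucket in field_models:
        return field_models[bucket], None
    return global_model, bucket
```

For field_type "global" (or an unknown value) the lookup yields no bucket, so the global model is used without a filter.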

4. Model loading and reload (SIGHUP path)

   populate_models.py --train --factory N
        |
        | dumps: articles.json, synonyms.json, version.txt, global_model.json,
        |        field_models/*.json, clients/client_model.json
        v
   aws s3 sync → s3://snake-models-monce/models/factory_N/
        |
        | (separately) sudo kill -HUP 
        v
   gunicorn master receives SIGHUP
        |
        | spawns 4 fresh worker processes (overlap window ~30 s)
        v
   each new worker: model_manager._load_all_models()
        |
        | s3.list_objects_v2(Prefix="models/")  → discover factory_*
        v
   for fid in discovered_factory_ids:
        download articles.json, synonyms.json, version.txt
        download global_model.json → Snake(path)
        download field_models/*.json → Snake(path) for each
        download clients/client_model.json → Snake(path)
        |
        v
   FactoryModel cached in model_manager._models[fid]
   ClientFactoryModel cached in model_manager._client_models[fid]
        |
        | old workers drain in-flight requests & exit
        v
   API serves new versions

Factory IDs are discovered dynamically from S3 listing. New factories appear without code changes. The factory_registry.py module merges S3-discovered factories with monce_db.list_factories() for UI selectors.
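
To make the discovery step concrete, here is a small sketch that extracts factory IDs from a list of S3 object keys. It takes the keys as plain strings rather than calling S3, and it assumes that IDs are integers and that keys follow the models/factory_N/ layout shown in the diagram; the function name is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed key layout: models/factory_<N>/<file>
_FACTORY_KEY = re.compile(r"^models/factory_(\d+)/")


def discover_factory_ids(keys: Iterable[str]) -> List[int]:
    """Collect the distinct factory IDs that appear in a listing of S3 keys."""
    ids = set()
    for key in keys:
        match = _FACTORY_KEY.match(key)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)
```

A new factory_N/ prefix in the bucket then shows up in the result on the next reload, with no change to the code.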

5. Why field_type is per-query, not per-batch

The /batch endpoint takes factory_id at the top level (one factory per batch) but field_type per item. This is intentional: a single incoming OCR'd document contains glass + intercalaire + gas mentions in mixed order, and each line needs its own field-typed routing. Top-level field_type is silently ignored to avoid accidentally constraining the entire batch to one bucket.
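
The per-item routing can be illustrated as below. The "items" key, the default of "global" for items without a field_type, and the function name are assumptions made for the sketch; the document only states that factory_id is top-level and field_type is per item.

```python
from typing import Any, Dict, List, Tuple


def route_batch(batch: Dict[str, Any]) -> List[Tuple[Any, str, str]]:
    """Expand a batch into (factory_id, query, field_type) triples.

    factory_id is read once from the top level. field_type is read from
    each item; any top-level field_type is deliberately never consulted.
    """
    factory_id = batch["factory_id"]
    routed = []
    for item in batch["items"]:
        field_type = item.get("field_type", "global")  # assumed default
        routed.append((factory_id, item["query"], field_type))
    return routed
```

With this shape, one document can mix glass, intercalaire and gas lines in any order and each line still reaches its own bucket.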

6. Memory layout (single worker, all 13 factories)

   Worker process (~5 GB RSS)
   +--------------------------------------------------------+
   |  Python interpreter + FastAPI + uvicorn        ~250 MB |
   |  algorithmeai v5.4.6 + Cython _accel.so         ~5 MB  |
   |                                                        |
   |  ModelManager._models:  Dict[factory_id, FactoryModel] |
   |    F1   86 MB   F3   86 MB   F4  352 MB                |
   |    F9   52 MB   F10 860 MB   F13 ...                   |
   |    ... 13 factories ... Σ ~2.5 GB raw JSON             |
   |    + Python object overhead ~2x → ~4.5 GB resident     |
   |                                                        |
   |  FuzzyMatcher cache (one per factory + bucket)         |
   |    Built lazily from Snake.population on first hit     |
   |    ~50 MB per (factory, bucket), invalidated on reload |
   +--------------------------------------------------------+

Four workers at ~5 GB each come to ~20 GB. An r6i.4xlarge has 128 GB, which leaves headroom for OCR storms, rebuild_all overlap, and scaling to 8 workers if traffic grows.
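
The arithmetic can be checked with a short sketch. The reload peak is an inference rather than a measured figure: it assumes old and new workers are fully resident at the same time during the overlap window, which doubles the footprint. Function names are illustrative.

```python
WORKER_RSS_GB = 5       # approximate per-worker RSS from the layout above
INSTANCE_RAM_GB = 128   # r6i.4xlarge


def steady_state_gb(workers: int) -> int:
    """Memory used when only one generation of workers is alive."""
    return workers * WORKER_RSS_GB


def reload_peak_gb(workers: int) -> int:
    """Assumed worst case during a reload: old and new workers coexist."""
    return 2 * steady_state_gb(workers)


def headroom_gb(workers: int) -> int:
    """Free RAM at steady state."""
    return INSTANCE_RAM_GB - steady_state_gb(workers)
```

Under these assumptions, 4 workers use 20 GB at steady state and peak at 40 GB during a reload; 8 workers would use 40 GB and peak at 80 GB, still below 128 GB.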

7. SIGHUP under load (the reload window)

   t           State
   0 s         SIGHUP arrives at master. Master forks 4 new workers.
   0–30 s      New workers each call model_manager._load_all_models():
               ~30 S3 GET requests, JSON parse, Snake() reconstruction,
               FuzzyMatcher build on first query. Old workers continue
               serving traffic.
   ~30 s       New workers ready. Master sends graceful shutdown to old
               workers.
   ~30–120 s   Old workers finish in-flight requests, exit. Routing fully
               on new workers.

The reload is zero-downtime because old workers do not exit until they have drained their in-flight requests. gunicorn handles the connection routing.

8. The /ui Arbitrator data flow

   Browser POST /query  { include_fuzzy_candidates: true, top_k: 10 }
        |
        v
   matcher returns: candidates = [snake top-k] + [fuzzy top-k]
        |                          (sums to 1.0)   (sums to 1.0)
        v
   /ui (web.py): renderCard() stashes raw candidates in ARB_STORE[card_id]
        |
        v
   user clicks "Candidates ▼"
        |
        v
   arbitrate(candidates, topK): merge by num_article, raw = snake + fuzzy,
                                renormalize so ΣP = 1.0
        |
        v
   render: { #num, denom, conf, source: Snake|Fuzzy|Both }
   slider 1–50 re-ranks instantly without re-fetching

The arbitrator runs purely client-side, so it adds no backend RPS. The math is in /math §3.
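
The merge step can be sketched as follows. The real arbitrator runs in the browser; this Python version only illustrates the logic under stated assumptions: candidate dicts carry "num_article", "conf" and "source" keys, and renormalization happens after truncating to top_k so that the displayed rows sum to 1.0. The authoritative definition is the one in /math §3.

```python
from typing import Any, Dict, List


def arbitrate(candidates: List[Dict[str, Any]], top_k: int) -> List[Dict[str, Any]]:
    """Merge Snake and fuzzy candidates by article and renormalize.

    raw score = snake conf + fuzzy conf for the same num_article.
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for cand in candidates:
        row = merged.setdefault(
            cand["num_article"],
            {"num_article": cand["num_article"], "raw": 0.0, "sources": set()},
        )
        row["raw"] += cand["conf"]
        row["sources"].add(cand["source"])

    # Rank by raw score, keep the top_k rows, then rescale them to sum to 1.0.
    ranked = sorted(merged.values(), key=lambda r: r["raw"], reverse=True)[:top_k]
    total = sum(r["raw"] for r in ranked)

    result = []
    for row in ranked:
        source = "Both" if len(row["sources"]) > 1 else next(iter(row["sources"]))
        conf = row["raw"] / total if total else 0.0
        result.append({"num_article": row["num_article"], "conf": conf, "source": source})
    return result
```

Since the function works only on the stashed candidate list, moving the slider amounts to calling it again with a different top_k, with no new request.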

9. Summary

Charles Dana · Monce SAS · snake.aws.monce.ai · deployed 2026-05-20
Co-Authored-By: Claude (Anthropic)