Snake API: Economics of Deterministic Matching

Charles Dana · Monce SAS · May 2026

snake.aws.monce.ai

1. The headline number

Cost per matched line: ~$3.7 × 10⁻⁶

That is, $0.0000037 per query. Or: $3.70 per million matched lines. Flat. Linear. No bulk discount needed because the baseline is already negligible.

This is the cost of a fully-loaded production query: nginx ingress, gunicorn dispatch, FastAPI request parsing, Snake SAT vote across the relevant field model, fuzzy fallback if needed, JSON response with audit trail, full TLS round-trip. End to end. Same number whether the query hits Tier 1 (SAT 100%), Tier 2 (fuzzy), or returns no match.
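The tier structure described above (exact hit, fuzzy fallback, or no match, each with an audit trail) can be sketched as follows. This is only an illustration: a plain dictionary lookup stands in for the Snake SAT vote, which is not reproduced here, and `CATALOGUE`, `match_line`, the article ids, and the 0.8 cutoff are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical catalogue: known synonym -> article id.
# A dict lookup stands in for the SAT vote (assumption, not the real model).
CATALOGUE = {
    "blue widget 10mm": "ART-001",
    "red gasket 4/16/4": "ART-002",
}


def match_line(line, catalogue=CATALOGUE, cutoff=0.8):
    """Match one input line and return the result with a small audit trail."""
    key = line.strip().lower()

    # Tier 1: exact hit (stand-in for a 100% SAT vote).
    if key in catalogue:
        return {"article": catalogue[key], "tier": 1, "matched_on": key}

    # Tier 2: fuzzy fallback against the known synonyms.
    close = get_close_matches(key, list(catalogue), n=1, cutoff=cutoff)
    if close:
        return {"article": catalogue[close[0]], "tier": 2, "matched_on": close[0]}

    # No match: still a well-formed response, so every outcome has one shape.
    return {"article": None, "tier": None, "matched_on": None}


print(match_line("Blue Widget 10mm"))
print(match_line("blue widgit 10mm"))
print(match_line("something else entirely"))
```

Every branch returns the same shape, which is one way to keep the response format identical across tiers.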

2. Cost model

2.1 Fixed costs (always-on)

The whole stack lives on one EC2 instance. No queue, no Lambda, no serverless cold start.

Component | Spec | Monthly
--- | --- | ---
EC2 r6i.4xlarge | 16 vCPU, 128 GB RAM, eu-west-3 | $590
S3 storage (snake-models-monce) | ~3 GB models + backups | ~$1
Route53 hosted zone | monce.ai & aws.monce.ai | $0.50
Data transfer out | typical workload | ~$5
Total fixed | | ~$597 / mo

r6i.4xlarge is overspecified for the current load (peak usage so far: 21 GB of 123 GB with 4 workers). We could downsize to r6i.2xlarge (~$295/mo) at any time. We don't, because the headroom absorbs OCR batch storms and rebuild_all runs without thrashing.

2.2 Variable costs (per-query)

The instance is provisioned 24/7. Variable cost = (instance $/sec) ÷ (sustained q/s).

$/query = ($590 · 12) / (260 q/s · 86400 s/day · 365)

At sustained 260 q/s (single-instance benchmark), that's ~$0.00000086 (8.6 × 10⁻⁷) per query amortized. At a more conservative 50% utilization (roughly 130 q/s average, or ~11 million queries per day), the figure doubles to ~$1.7 × 10⁻⁶. The headline $3.70 per million queries sits above both, leaving further margin for idle periods.
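The amortization formula above can be checked numerically. The sketch below uses only the figures quoted in this section ($590/month, 260 q/s); the function name and the `utilization` parameter are illustrative, not part of the Snake codebase.

```python
SECONDS_PER_YEAR = 86_400 * 365


def cost_per_query(monthly_usd, peak_qps, utilization=1.0):
    """Amortized $/query for an always-on instance.

    utilization is the fraction of peak throughput actually served.
    """
    yearly_usd = monthly_usd * 12
    queries_per_year = peak_qps * utilization * SECONDS_PER_YEAR
    return yearly_usd / queries_per_year


full = cost_per_query(590, 260)                   # sustained benchmark rate
half = cost_per_query(590, 260, utilization=0.5)  # 50% utilization

print(f"100% utilization: ${full:.2e} per query")
print(f" 50% utilization: ${half:.2e} per query")
print(f"queries/day at 50%: {260 * 0.5 * 86_400:,.0f}")
```

Because the instance cost is fixed, the per-query cost scales inversely with utilization: halving the load doubles the amortized figure.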

3. Comparison: Snake API vs LLM API

The natural alternative is a Claude Haiku call to interpret each line. Same task, different substrate.

Approach | Latency | Cost / line | Determinism | Audit trail | Multi-tenant
--- | --- | --- | --- | --- | ---
Snake API | 3.7 ms | $3.7 × 10⁻⁶ | Yes | SAT clauses | 13 factories isolated
Claude Haiku 4.5 | ~400 ms | ~$0.0001 | No | Free-text only | Per-call prompting
GPT-4o-mini | ~600 ms | ~$0.00015 | No | Free-text only | Per-call prompting
Embedding similarity (OpenAI) | ~80 ms | ~$0.00002 | ~Yes | Cosine score | Per-tenant index
SageMaker endpoint | ~50 ms | ~$0.0001 | Yes | Model-dependent | Per-endpoint $/hr

Snake API is ~27× cheaper than Claude Haiku per matched line ($0.0001 ÷ $0.0000037).

3.1 At what volume does the math flip?

The fixed-cost crossover: if you serve k queries per month, Snake API beats Claude Haiku when:

$597 + k · $3.7 × 10⁻⁶ < k · $0.0001

Solving: k > 597 / (0.0001 − 0.0000037) ≈ 6.2 million queries/month. Below that, an LLM is cheaper if you have no other reason to run an EC2 instance. Above ~6M/month, Snake's economics dominate.

Production load on this EC2 is currently ~10–30M queries/month. We're firmly in Snake territory.
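The crossover can be reproduced in a few lines. The sketch below takes the $597 fixed monthly cost, the $3.70 per million lines from section 1, and the ~$0.0001 per line quoted for Claude Haiku; the function names are illustrative. The break-even point is dominated by the fixed cost, so it barely moves with the exact per-query figure.

```python
FIXED_MONTHLY_USD = 597.0
SNAKE_PER_QUERY_USD = 3.70 / 1_000_000   # "$3.70 per million matched lines"
LLM_PER_QUERY_USD = 0.0001               # approximate Claude Haiku cost per line


def monthly_cost_snake(queries):
    """Fixed instance cost plus the per-query term from the inequality."""
    return FIXED_MONTHLY_USD + queries * SNAKE_PER_QUERY_USD


def monthly_cost_llm(queries):
    """Pure pay-per-call cost, no fixed component."""
    return queries * LLM_PER_QUERY_USD


def break_even_queries():
    """Monthly volume k at which both approaches cost the same."""
    margin = LLM_PER_QUERY_USD - SNAKE_PER_QUERY_USD
    if margin <= 0:
        raise ValueError("the LLM is never more expensive per query")
    return FIXED_MONTHLY_USD / margin


print(f"break-even: {break_even_queries():,.0f} queries/month")
for k in (1_000_000, 10_000_000, 30_000_000):
    print(f"{k:>11,}  snake=${monthly_cost_snake(k):,.0f}  llm=${monthly_cost_llm(k):,.0f}")
```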

4. Memory and per-factory size

Factory | Articles | Synonyms | Global model | Field models | Client model | Total
--- | --- | --- | --- | --- | --- | ---
F1 (VIT) | 1,996 | 7,398 | 32 MB | 32 MB | 22 MB | ~86 MB
F3 (Monce) | 1,996 | 7,398 | 32 MB | 32 MB | 22 MB | ~86 MB
F4 (VIP) | 6,647 | 21,826 | 120 MB | 117 MB | 115 MB | ~352 MB
F9 (Eurovitrage) | 1,929 | 6,106 | 30 MB | 11 MB | 11 MB | ~52 MB
F10 (TGVI) | 17,173 | 51,639 | 440 MB | ~270 MB | 149 MB | ~860 MB
F13–F18 (FR) | 1.7K–6.8K | 5K–22K | 24–117 MB | ~50–100 MB | 3–9 MB | varies
F22 (ES) | 3,988 | 15,714 | 76 MB | ~30 MB | 20 MB | ~126 MB
F23 (ES) | 4,255 | 12,752 | 69 MB | ~30 MB | 1 MB | ~100 MB
All 13 | ~60K | ~195K | ~1.4 GB | ~700 MB | ~370 MB | ~2.5 GB on disk

Per-worker resident memory: ~5 GB (4 workers × ~5 GB = 20 GB total, which fits in 128 GB with massive headroom). The 2.5 GB on disk includes the serialized population used for fuzzy matching; the resident footprint is larger because of Python object overhead.
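The headroom arithmetic is simple enough to keep as a small helper, for example when considering a different worker count or instance size. This is a sketch using the figures above (128 GB, 4 workers, ~5 GB each); `memory_budget` is a hypothetical name.

```python
def memory_budget(total_ram_gb, workers, resident_gb_per_worker):
    """Return total resident memory, free memory, and the used fraction."""
    used_gb = workers * resident_gb_per_worker
    return {
        "used_gb": used_gb,
        "free_gb": total_ram_gb - used_gb,
        "utilization": used_gb / total_ram_gb,
    }


current = memory_budget(total_ram_gb=128, workers=4, resident_gb_per_worker=5)
print(current)

# Same worker count on a half-size instance (64 GB), for comparison.
print(memory_budget(total_ram_gb=64, workers=4, resident_gb_per_worker=5))
```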

5. Operational cost: rebuild_all

A full rebuild_all retrains every factory's article + client models from scratch. Triggered nightly via cron, on-demand by ops, or after a bulk synonym import.

Factor | Value
--- | ---
Wall-clock duration | ~9 min 20 s (32 min including the 13-factory ES expansion)
EC2 cost (during rebuild) | ~$0.50 (32 min × $0.94/hr at on-demand r6i.4xlarge)
Disk I/O | ~5 GB write to /tmp + ~3 GB upload to S3
S3 PUT requests | ~270 (13 factories × ~21 model files each)
S3 cost | ~$0.001 per rebuild
Worker reload (SIGHUP) | ~30 s per worker, all 4 in parallel

A full rebuild costs ~$0.50. Nightly cron: ~$15/month.
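The rebuild cost is duration times the hourly instance rate, so it can be recomputed for any duration. The sketch below assumes the $0.94/hr on-demand rate quoted in the table and a 30-run month for the nightly cron; the function names are illustrative.

```python
HOURLY_USD = 0.94  # on-demand r6i.4xlarge rate quoted in the table


def rebuild_cost(duration_min, hourly_usd=HOURLY_USD):
    """EC2 cost attributable to one rebuild of the given duration."""
    return duration_min / 60 * hourly_usd


def monthly_cron_cost(duration_min, hourly_usd=HOURLY_USD, runs_per_month=30):
    """Cost of running the rebuild once per night for a month."""
    return rebuild_cost(duration_min, hourly_usd) * runs_per_month


for label, minutes in (("base (9 min 20 s)", 9 + 20 / 60), ("full (32 min)", 32)):
    print(f"{label}: ${rebuild_cost(minutes):.2f} per run, "
          f"${monthly_cron_cost(minutes):.2f} per month")
```

Since the instance is provisioned around the clock anyway, this is an attribution of already-paid capacity rather than an extra bill.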

6. Adding a new factory

Onboarding a new factory (F14 VIO is the most recent example, F22/F23 the most recent locale expansion):

Step | Cost | Time
--- | --- | ---
Auto-onboarding cron picks up new factory_id from monce_db | $0 | 4h+15m polling cycle
populate_models.py training run | ~$0.05 (5–10 min EC2) | 5–10 min
S3 upload | ~$0.001 | 30 s
SIGHUP reload | $0 | 30 s
Total | ~$0.05 marginal | < 15 min from new tenant signup to live matching

No code changes needed. Factory IDs are discovered dynamically via S3 listing (model_manager._load_all_models()) and via monce_db.list_factories().
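Dynamic discovery of this kind usually amounts to parsing factory IDs out of object keys and merging them with the database list. The sketch below is an assumption-laden illustration: the `models/F<id>/...` key layout, the helper names, and the union of both sources are all hypothetical, and the real `model_manager._load_all_models()` may work differently.

```python
import re

# Assumed key layout: "models/F<id>/<file>". The actual bucket layout is not
# documented here, so treat this pattern as a placeholder.
FACTORY_KEY = re.compile(r"^models/F(\d+)/")


def factories_from_keys(keys):
    """Extract the sorted, de-duplicated factory ids found in S3-style keys."""
    ids = {int(m.group(1)) for key in keys if (m := FACTORY_KEY.match(key))}
    return sorted(ids)


def all_factories(s3_keys, db_factory_ids):
    """Union of factories seen in storage and factories known to the database."""
    return sorted(set(factories_from_keys(s3_keys)) | set(db_factory_ids))


keys = [
    "models/F1/global.bin",
    "models/F1/client.bin",
    "models/F10/global.bin",
    "backups/2026-05-19.tar",  # ignored: does not match the factory pattern
]
print(factories_from_keys(keys))
print(all_factories(keys, db_factory_ids=[1, 14]))
```

Keeping the parsing in a pure function makes it testable without touching S3; a real loader would feed it the keys returned by a bucket listing.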

7. Why we don't go serverless

Lambda would be tempting: scale-to-zero, pay-per-invoke. But each worker keeps ~5 GB of models resident and takes ~30 s to reload, so a cold start would land directly on the request path.

SnakeBatch (the v6 distributed trainer at snakebatch.aws.monce.ai) does use Lambda, because training is bursty and parallel. Inference is steady and stateful. Different workload, different infra.

8. Summary

Charles Dana · Monce SAS · snake.aws.monce.ai · deployed 2026-05-20
Co-Authored-By: Claude (Anthropic)