Four pieces of math live inside this endpoint: (1) the Dana Theorem — why SAT for classification is constructive, not search-based. (2) The arbitrator — how Snake and fuzzy combine into a probability-summing ranking. (3) Confidence semantics — what a 0.85 actually means. (4) Audit & CPU equations — how every clause and every cycle maps to an inequality you can verify.
Theorem. For any indicator function f : {0,1}n → {0,1} over a finite discrete domain, there exists a CNF formula φ such that φ(x) = ¬f(x) for all x, and φ can be constructed in polynomial time in |dom f|.
Proof sketch (constructive). Let T = {x : f(x) = 1} (target / members) and F = {x : f(x) = 0} (non-members), both finite. For each f ∈ F:
Number of clauses < |F| (each clause excludes at least one element). Cost per clause: O(|F| · n). Total: O(|F|2 · n), polynomial. With bucketing into chunks of size b: O((|T| + |F|) · b · n), linear in data.
□
What this doesn't claim: that SAT is in P. SAT is NP-complete because, for arbitrary CNFs, finding a satisfying assignment is hard. Dana's theorem flips the question: we don't solve SAT, we construct a SAT instance directly from data, then evaluate it (also polynomial). The data is the assignment. The hardness of SAT applies to a different problem.
For class C and query x, Snake stores L stochastic layers, each with one CNF φC,ℓ. Define the layer-vote indicator:
(x is consistent with class C in layer ℓ.) The query's probability of being in class C is the empirical vote rate:
The denominator is L (not |classes|) because clauses are independent per layer, and each layer votes for at most one class given x. Sum over classes:
with equality iff every layer reaches a verdict (no abstentions). In practice the API re-normalizes the returned distribution so Σ = 1 over the top-k candidates.
If x is a known synonym of class C* in the training population, then for every layer ℓ, φC*,ℓ(x) = 0 by construction (the clause was built to exclude non-members of C*, and x ∈ C*). So:
Conversely, if P(C* | x) = 1.0, every layer voted for C*, which (modulo the bucket assignment, see Snake v5.0.0 lookalike origins) means x sits inside C*'s consistent set in every layer's bucket. SAT confidence 1.0 is the formal statement of “exact synonym”. v5.4.6 dropped the separate hash lookup because this equivalence makes it redundant.
The /query endpoint with include_fuzzy_candidates=true returns two pools:
Concatenating the two response lists gives a sum ≈ 2.0 with overlap. The arbitrator fuses them into a single probability distribution.
For each candidate c in the union of both pools:
(treating PS(c) = 0 if c not in Snake's pool, and similarly for fuzzy.)
Claim. Σc Parb(c) = 1, for any non-empty candidate union.
Proof. Σc Parb(c) = Σc raw(c) / Z = Z / Z = 1. The only failure mode is Z = 0, which occurs only when both pools are empty. The endpoint guards against this by returning an empty list rather than dividing by zero.
□
Suppose three candidates: A votes 0.5 from Snake and 0.4 from fuzzy. B votes 0.5 from Snake only. C votes 0.6 from fuzzy only. Total raw = 0.9 + 0.5 + 0.6 = 2.0. Then:
| Candidate | PS | PF | raw | Parb |
|---|---|---|---|---|
| A (Both) | 0.5 | 0.4 | 0.9 | 45% |
| C (Fuzzy only) | 0 | 0.6 | 0.6 | 30% |
| B (Snake only) | 0.5 | 0 | 0.5 | 25% |
A wins despite having a lower max-source confidence than C (0.4 vs 0.6) because being voted by both sources is itself the strongest signal. This is the only ranking that falls out of the Σ = 1.0 constraint without ad-hoc weighting.
One could write Parb(c) = α PS(c) + (1−α) PF(c). This forces a hyperparameter α that depends on the relative trust we put in each source. The arbitrator's pure-sum-then-renormalize formulation is equivalent to α = 0.5 only when both pools are individually unit-sum, which is exactly the regime we're in. So α = 0.5 is not chosen, it's derived.
Operationally: out of L = 5 stochastic layers, 4.25 of them voted that x is consistent with the predicted class. Concretely with L = 5, allowable values are {0, 0.2, 0.4, 0.6, 0.8, 1.0}. The 0.85 you see in API responses is post-renormalization across top-k.
The default tier-1 threshold θ = 0.80 (configurable via
SNAKE_CONFIDENCE_THRESHOLD) corresponds to at least 4 out of 5 layers
agreeing. This is a fairness bound: with random SAT clauses, the chance that 4/5
independent layers vote for the same class spuriously decays geometrically in L.
For uniform-random p = 1/|classes|, with 1,000 classes (a typical factory): false agreement probability is bounded above by ~5 × 10−12. The tier-1 threshold is operationally safe.
Fuzzy returns Levenshtein-similarity scores in [0, 1]. The 0.50 threshold
(FUZZY_THRESHOLD) is a calibration choice, not a probability. It corresponds
to roughly “at least half the characters match modulo edit distance.” The two
thresholds — SAT 0.80 and fuzzy 0.50 — are not directly comparable; the
arbitrator's renormalization is what bridges them.
Every Snake response with include_audit=true exposes the SAT reasoning trail.
For a query x classified as C with audit row {layer, bucket, clause, literal}:
The audit trail makes Snake's prediction reproducible character by character. If a human disagrees, they can pinpoint which literal was the deciding one and either:
POST /synonyms (writes to snake_learning,
forces the next training to include this pair).POST /taboo (filters this (text, target) pair out before
SAT construction, both the hash and SAT see clean data).The closed-loop equation is:
Each rebuild solves for φ given the current curated population. Hasna's corrections become tomorrow's clauses.
Snake v5.4.6 ships a Cython-accelerated _accel.pyx for the inner loop.
The hot path is clause evaluation: given a query x and N clauses, compute
φ(x) = Λi Ci(x). The naive Python loop is:
Cython brings the per-literal cost from ~1 µs (Python attribute access + dict lookup) to ~30 ns (C struct + indexed array lookup), a ~33× speedup. For a typical factory's field model (~5,000 clauses, mean clause size ~8 literals):
| Implementation | Per-clause cost | Per-query φ eval |
|---|---|---|
| Pure Python | ~8 µs | ~40 ms |
| Cython _accel | ~240 ns | ~1.2 ms |
| Speedup | ~33× |
The end-to-end /query latency of ~3.7 ms includes: HTTP parse (~0.5 ms), Snake clause eval (~1.2 ms), fuzzy candidate fetch (~0.3 ms when enabled), arbitrator merge (~0.1 ms), JSON serialization (~0.5 ms), TLS round-trip (~1 ms). Cython is the largest single contributor.
At 260 q/s sustained on a single instance, the 1.2 ms inner loop is the load-bearing piece. If we regressed to pure Python, throughput would drop to ~30 q/s and the EC2 spec would need to triple to keep latency below 100 ms. Cython is the difference between this workload running on one r6i.4xlarge and needing a fleet.
Charles Dana · Monce SAS · snake.aws.monce.ai · deployed 2026-05-20
Co-Authored-By: Claude (Anthropic)