1,000+ TPS: measured, not extrapolated
The Fraud Decision Engine's rule engine scales near-linearly across commodity nodes behind a bank-facing HTTPS / mTLS edge — reaching 1,118 TPS at zero errors, with PostgreSQL transaction throughput rising linearly across the full range.
Banking 33-rule chain · HTTPS with mutual TLS · synchronous (zero-data-loss) writes · no ML model loaded. A controlled cloud-lab measurement, not a guarantee — see how we tested.
Near-linear horizontal scale-out
Adding stateless engine nodes adds throughput at roughly 215–236 TPS per commodity 4-vCPU node. One shared PostgreSQL keeps up: its transaction throughput climbs linearly from 2.4k to 10.3k xact/s through the 1,118-TPS peak. The 1,000-TPS line is first crossed by the 5th node.
The numbers
Every recorded point is 0 errors / 100% HTTP 200, with at least 6,600 requests per measured point, through the realistic HTTPS / mTLS bank-facing path (not a direct-to-engine shortcut).
| Engine nodes | Peak TPS (0 errors) | Scaling vs 1 node | PostgreSQL xact/s | p50 / p99 @ c=80 |
|---|---|---|---|---|
| 1 | 265.9 | 1.00× | 2,446 | 304 / 703 ms † |
| 2 | 492.8 | 1.85× | 4,467 | 116 / 575 ms |
| 3 | 711.0 | 2.67× | 6,460 | 62 / 443 ms |
| 4 | 942.7 | 3.55× | 8,234 | 63 / 318 ms |
| 5 | 1,118.0 | 4.20× | 10,318 | 64 / 268 ms ‡ |
† The 1-node p50/p99 are at high concurrency (c=80) on a single 4-vCPU node, where requests queue; at the per-node-optimal concurrency, latency is far lower.
‡ The 5-node peak (1,118 TPS) is at c=160. The 64 / 268 ms shown is the c=80 operating point (1,046 TPS — already >1,000 TPS). At higher concurrency throughput keeps climbing but tail latency rises: c=120 → 1,095 TPS (p99 374 ms); c=160 → 1,118 TPS (p99 552 ms). We report the operating point and the peak separately — we do not claim “p99 < 300 ms at peak.”
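The "Scaling vs 1 node" column can be reproduced from the peak-TPS column alone. The sketch below is illustrative only: the function and variable names are assumptions rather than part of FDE, and "efficiency" (speedup divided by node count) is a derived figure that the table itself does not report.

```python
# Peak TPS per engine-node count, copied from the table above.
PEAK_TPS = {1: 265.9, 2: 492.8, 3: 711.0, 4: 942.7, 5: 1118.0}


def scaling_report(peak_tps):
    """Return (nodes, speedup, efficiency) tuples, one per node count."""
    base = peak_tps[1]
    rows = []
    for nodes in sorted(peak_tps):
        speedup = peak_tps[nodes] / base   # the "Scaling vs 1 node" column
        efficiency = speedup / nodes       # 1.0 would be perfectly linear
        rows.append((nodes, round(speedup, 2), round(efficiency, 2)))
    return rows


for nodes, speedup, efficiency in scaling_report(PEAK_TPS):
    print(f"{nodes} node(s): {speedup:.2f}x vs 1 node, efficiency {efficiency:.0%}")
```

On these figures the efficiency stays between roughly 84% and 93% from 2 to 5 nodes, which is the sense in which the scale-out is near-linear rather than perfectly linear.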
Why it scales
The engine is stateless: no worker holds session state, so throughput scales with cores. The only stateful tier is one shared PostgreSQL. The binding constraint at this 4-vCPU node size is per-node application CPU; the shared PostgreSQL stays linear here. Scale the engine nodes up and PostgreSQL becomes the constraint near 1,100 TPS, which makes the database tier the next thing to scale.
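One way to check the "stays linear" claim against the published table is to divide PostgreSQL xact/s by peak TPS for each row. If the database were falling behind, that ratio would drift as nodes are added. This is a minimal sketch over the table's own numbers; the names are illustrative, and the ratio is a derived figure, not something the benchmark reports directly.

```python
# (peak TPS, PostgreSQL xact/s) for 1..5 engine nodes, from the results table.
ROWS = [
    (265.9, 2446),
    (492.8, 4467),
    (711.0, 6460),
    (942.7, 8234),
    (1118.0, 10318),
]


def xact_per_request(rows):
    """PostgreSQL transactions per evaluated request, one value per row."""
    return [round(xact / tps, 2) for tps, xact in rows]


ratios = xact_per_request(ROWS)
print(ratios)
print(f"spread: {min(ratios)} to {max(ratios)} xact per request")
```

The ratio stays in a narrow band around 9 transactions per request across all five rows, so database throughput is tracking request throughput rather than flattening.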
Add nodes, add throughput
~215–236 TPS per 4-vCPU node, near-linear to 5 nodes. Fewer, larger nodes reach 1,000 TPS sooner.
One database, linear
Shared PostgreSQL scaled 2.4k → 10.3k xact/s linearly through the 1,118-TPS peak. The database tier is the next scaling lever — and a deferred batched-write mode adds ~25% more on the same hardware (measured).
Tail latency improves
Aggregate p99 drops as nodes are added (less per-node queueing): 703 ms → 268 ms at the operating point.
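The link between throughput and latency at a fixed concurrency can be sketched with Little's law. This assumes the load generator is closed-loop, meaning each of the c connections sends its next request as soon as the previous one returns; the page does not state this, so treat the result as a rough consistency check rather than a measured value. The three data points are the 5-node figures quoted in the ‡ footnote.

```python
# (concurrency, measured TPS, measured p99 in ms) at 5 nodes, from the footnote.
FIVE_NODE_POINTS = [(80, 1046, 268), (120, 1095, 374), (160, 1118, 552)]


def implied_mean_latency_ms(concurrency, tps):
    """Little's law for a closed loop: mean latency = in-flight requests / throughput."""
    return 1000.0 * concurrency / tps


for c, tps, p99 in FIVE_NODE_POINTS:
    mean_ms = implied_mean_latency_ms(c, tps)
    print(f"c={c}: {tps} TPS, implied mean {mean_ms:.0f} ms, measured p99 {p99} ms")
```

Under that assumption, doubling concurrency from 80 to 160 raises throughput by only about 7% while the implied mean latency nearly doubles, which matches the footnote's rising p99. The same relation runs the other way when nodes are added: at a fixed concurrency, more throughput means less time spent queueing per request.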
How we tested
Test bed
- Alibaba Cloud lab, ap-southeast-1; all instances torn down after the run.
- Up to 5 engine nodes: commodity 4 vCPU / 8 GB, 8 workers each.
- One nginx edge / load balancer (HTTPS :443).
- One shared PostgreSQL (8 vCPU / 16 GB).
- In-VPC load generator driving a realistic banking wire-transfer payload.
Protocol
- Requests hit the load balancer at /v1/risk/evaluate over HTTPS and are routed to engines via mutual TLS, the realistic bank-facing path.
- JWT + HMAC request authentication.
- For each node count: route to exactly N engines, then sweep concurrency (c = 20 / 40 / 80 / 120, plus 160 at 5 nodes).
- ≥ 6,600 requests per measured point; every point 0 errors / 100% HTTP 200.
- Synchronous, zero-data-loss writes: each request commits its full ~33 rule-trigger rows inline before responding.
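The sweep described above can be sketched as a small closed-loop harness. This is not the load generator used for the benchmark: `run_point`, `sweep`, and `fake_request` are hypothetical names, and `fake_request` stands in for an authenticated POST to /v1/risk/evaluate so that the example runs without a network or credentials.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_point(send_request, concurrency, total_requests):
    """Drive total_requests calls through a fixed-size worker pool and summarize."""
    def one_call(_):
        start = time.perf_counter()
        status = send_request()
        return status, (time.perf_counter() - start) * 1000.0

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    latencies = sorted(ms for _, ms in results)
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "concurrency": concurrency,
        "requests": len(results),
        "errors": sum(1 for status, _ in results if status != 200),
        "tps": len(results) / elapsed,
        "p50_ms": latencies[len(latencies) // 2],
        "p99_ms": latencies[p99_index],
    }


def sweep(send_request, levels=(20, 40, 80, 120), total_requests=6600):
    """Run one measured point per concurrency level, in order."""
    return [run_point(send_request, c, total_requests) for c in levels]


def fake_request():
    # Hypothetical stand-in for the real HTTPS call; always reports HTTP 200.
    return 200


if __name__ == "__main__":
    for point in sweep(fake_request, levels=(20, 40), total_requests=200):
        print(point)
```

A point counts as valid only when `errors` is 0, which mirrors the 0 errors / 100% HTTP 200 criterion. A real harness would also need warm-up and a way to pin routing to exactly N engines, both of which are omitted here.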
What this benchmark is — and isn't
1. Rule engine, no ML model. This run exercises the deterministic banking rule chain (33 rules) with no machine-learning model loaded. ML-scored throughput is a separate, lower number; we don't conflate the two.
2. Operating point vs peak. The headline 1,046 TPS at p99 268 ms is the c=80 operating point. The 1,118 TPS peak runs hotter (p99 552 ms). We report both and never claim sub-300 ms p99 at peak.
3. A lab measurement, not a guarantee. A controlled cloud lab with synthetic load. Real throughput depends on your hardware, rule mix, payload size, and traffic shape.
4. At this node size the bottleneck is per-node CPU; scale the engines and PostgreSQL becomes the ceiling. On these 4-vCPU nodes per-node CPU binds first and the shared database stays linear (428 of 600 connections at 10.3k xact/s). With larger engine nodes PostgreSQL becomes the constraint (~95% CPU near 1,100 TPS), so the path past it is clear and measured: add database capacity, and/or use the deferred batched-write mode, which we measured at ~25% more throughput on the same hardware.
5. Measured by node, not extrapolated. 1,000 TPS was reached by adding a 5th node and measured directly. We don't extrapolate single-server or unbounded throughput.
6. Throughput, not failover. This measures steady-state capacity, not node-failure / high-availability behavior. The single benchmark load balancer is a test simplification; a production deployment uses an HA load-balancer pair.
With ML scoring on, FDE still scales near-linearly — at about three-quarters of rule-only throughput
On the same commodity 5-node cluster that reaches 1,000+ TPS on the rule engine, blending a gradient-boosted ML model into every decision sustained ≈ 710–775 TPS (about 73–76% of rule-only) at zero errors, and PostgreSQL stayed linear throughout. Adding ML costs roughly a quarter of throughput; the blend logic itself is essentially free.
What “blended” means: FDE scores every transaction two ways at once — the rule engine produces a score and an ML model produces a score — and combines them into one decision. This benchmark measures throughput when that ML score is computed and weighted into every decision.
| Engine nodes | Rule engine only | Rule + ML blended | Blended / rule-only |
|---|---|---|---|
| 1 | 278 | 203 | 73% |
| 2 | 493 | 362 | 74% |
| 3 | 707 | 539 | 76% |
| 4 | 941 | 699 | 74% |
| 5 | 1,015 | 711 | 70% |
The rule-only column is a same-day control on this cluster; the dedicated rule-only run peaks at the 1,118 TPS headline at the top of this page (instance/run variance). The blended engine scales with the same near-linear shape. The blend policy itself is essentially free — the cost is the ML serving path, not the combine math.
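The "Blended / rule-only" column is a plain ratio of the two TPS columns. The sketch below recomputes it; the names are illustrative. Because it works from the rounded TPS values printed in the table, a result can land a point away from the table's own percentage column.

```python
# TPS per engine-node count, copied from the blended-benchmark table.
RULE_ONLY = {1: 278, 2: 493, 3: 707, 4: 941, 5: 1015}
BLENDED = {1: 203, 2: 362, 3: 539, 4: 699, 5: 711}


def blended_share(rule_only, blended):
    """Blended throughput as a fraction of rule-only throughput, per node count."""
    return {nodes: blended[nodes] / rule_only[nodes] for nodes in sorted(rule_only)}


for nodes, share in blended_share(RULE_ONLY, BLENDED).items():
    print(f"{nodes} node(s): blended is {share:.0%} of rule-only, ML cost {1 - share:.0%}")
```

On these figures the ML serving path costs between roughly 24% and 30% of rule-only throughput, depending on node count.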
How we tested (blended)
Same topology as the rule-only benchmark above — N stateless engines behind an HTTPS/mTLS nginx load balancer, one shared PostgreSQL; banking mode, the full 33-rule chain, POST /v1/risk/evaluate, JWT+HMAC, 0 errors / 100% HTTP 200. The ML model is a gradient-boosted model on FDE's standard feature set, blended into every decision (model explanations off — the high-throughput serving configuration).
What this is — and isn't
1. A throughput benchmark, not an accuracy claim; accuracy is measured separately.
2. A representative synthetic model and synthetic load.
3. Model-dependent: the ~25% cost is for this gradient-boosted model; a lighter model costs less.
4. A controlled, ephemeral cloud lab.
A different test from our on-prem single-box benchmark
This is our cloud multi-node scale-out benchmark (rule engine, horizontal scale). It measures something different from our on-prem single-box KVM benchmark — full banking posture on one machine, including the optimized engine, at 257.1 RPS sustained (here RPS and TPS are the same — one request screens one transaction). Different hardware, different goals: read them side by side, not added together.
Want the numbers on your hardware?
We'll walk through the methodology and help you size a deployment for your transaction volume and latency targets.