Performance benchmark — lab measurement

1,000+ TPS, measured — not extrapolated

The Fraud Decision Engine's rule engine scales near-linearly across commodity nodes behind a bank-facing HTTPS / mTLS edge — reaching 1,118 TPS at zero errors, with PostgreSQL transaction throughput rising linearly across the full range.

Banking 33-rule chain · HTTPS with mutual TLS · synchronous (zero-data-loss) writes · no ML model loaded. A controlled cloud-lab measurement, not a guarantee — see how we tested.

  • 1,118 TPS: peak, 5 nodes, 0 errors
  • 1,046 TPS: sustained @ p99 268 ms
  • 4.2×: throughput, 1 → 5 nodes
  • 0: errors / non-200s, every point

Near-linear horizontal scale-out

Adding stateless engine nodes adds throughput at roughly 215–236 TPS per commodity 4-vCPU node. One shared PostgreSQL keeps up: its transaction throughput climbs linearly from 2.4k to 10.3k xact/s through the 1,118-TPS peak. The 1,000-TPS line is first crossed at the 5th node.

Peak TPS by engine node count: 266 at 1 node, 493 at 2, 711 at 3, 943 at 4, 1,118 at 5 nodes — crossing the 1,000-TPS target at the 5th node. PostgreSQL transaction throughput scales linearly from 2.4k to 10.3k xact/s. Commodity 4-vCPU nodes, HTTPS with mTLS, banking 33-rule chain, no ML model loaded, zero errors.

The numbers

Every recorded point is 0 errors / 100% HTTP 200, with at least 6,600 requests per measured point, through the realistic HTTPS / mTLS bank-facing path (not a direct-to-engine shortcut).

Engine nodes | Peak TPS (0 errors) | Scaling vs 1 node | PostgreSQL xact/s | p50 / p99 @ c=80
1 | 265.9 | 1.00× | 2,446 | 304 / 703 ms
2 | 492.8 | 1.85× | 4,467 | 116 / 575 ms
3 | 711.0 | 2.67× | 6,460 | 62 / 443 ms
4 | 942.7 | 3.55× | 8,234 | 63 / 318 ms
5 | 1,118.0 | 4.20× | 10,318 | 64 / 268 ms

The 1-node p50/p99 are at high concurrency (c=80) on a single 4-vCPU node, where requests queue; at the per-node-optimal concurrency, latency is far lower.
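The scaling column can be re-derived from the peak TPS figures in the table. The short Python sketch below does that and also reports scaling efficiency (speedup divided by node count), which is our own derived metric rather than one published in the benchmark. The function and variable names are illustrative.

```python
# Peak zero-error TPS per engine-node count, copied from the table above.
PEAK_TPS = {1: 265.9, 2: 492.8, 3: 711.0, 4: 942.7, 5: 1118.0}


def scaling_report(peak_tps):
    """Return {nodes: (speedup vs 1 node, efficiency vs ideal linear scaling)}."""
    base = peak_tps[1]
    report = {}
    for nodes, tps in sorted(peak_tps.items()):
        speedup = tps / base
        # Efficiency of 1.0 would mean perfectly linear scale-out.
        report[nodes] = (speedup, speedup / nodes)
    return report


for nodes, (speedup, efficiency) in scaling_report(PEAK_TPS).items():
    print(f"{nodes} node(s): {speedup:.2f}x speedup, {efficiency:.0%} of ideal linear")
```

An efficiency that stays well above 80% at every node count is one reasonable reading of "near-linear"; the exact threshold is a judgment call.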

The 5-node peak (1,118 TPS) is at c=160. The 64 / 268 ms shown is the c=80 operating point (1,046 TPS — already >1,000 TPS). At higher concurrency throughput keeps climbing but tail latency rises: c=120 → 1,095 TPS (p99 374 ms); c=160 → 1,118 TPS (p99 552 ms). We report the operating point and the peak separately — we do not claim “p99 < 300 ms at peak.”
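To make the operating-point versus peak distinction concrete, here is a minimal Python sketch using only the three 5-node points quoted above. It picks the highest-throughput point that fits a p99 budget and, separately, the raw peak. It also estimates mean latency with Little's law, which assumes a closed-loop load generator with no think time; that is our assumption about the test harness, and the estimate is not a measured figure. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepPoint:
    concurrency: int
    tps: float
    p99_ms: float


# 5-node concurrency sweep points quoted in the text above.
FIVE_NODE_SWEEP = [
    SweepPoint(concurrency=80, tps=1046, p99_ms=268),
    SweepPoint(concurrency=120, tps=1095, p99_ms=374),
    SweepPoint(concurrency=160, tps=1118, p99_ms=552),
]


def operating_point(points, p99_budget_ms):
    """Highest-throughput point whose p99 fits the budget, or None if none fits."""
    within_budget = [p for p in points if p.p99_ms <= p99_budget_ms]
    return max(within_budget, key=lambda p: p.tps) if within_budget else None


def peak_point(points):
    """Highest-throughput point regardless of latency."""
    return max(points, key=lambda p: p.tps)


def mean_latency_ms(point):
    """Little's law estimate: in-flight requests = throughput x mean latency."""
    return point.concurrency / point.tps * 1000.0


best = operating_point(FIVE_NODE_SWEEP, p99_budget_ms=300)
peak = peak_point(FIVE_NODE_SWEEP)
print(f"operating point: c={best.concurrency}, {best.tps} TPS, p99 {best.p99_ms} ms")
print(f"peak: c={peak.concurrency}, {peak.tps} TPS, p99 {peak.p99_ms} ms")
```

With a 300 ms p99 budget the selection lands on c=80, while the peak sits at c=160, which is why the two are reported separately.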

Why it scales

The engine is stateless: no worker holds session state, so throughput scales with cores. The only stateful tier is one shared PostgreSQL. At this 4-vCPU node size the binding constraint is per-node application CPU, and the shared PostgreSQL stays linear. With larger engine nodes, PostgreSQL becomes the constraint near 1,100 TPS, and the database tier is then the next thing to scale.
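The reasoning above can be read as a simple two-tier capacity model: the stateless tier adds capacity per node until the shared database caps it. The Python sketch below is a deliberate simplification, not FDE's sizing tool. The per-node figures in the example (220 TPS for a 4-vCPU node, 440 TPS for a hypothetical larger node) and the 1,100 TPS database ceiling are illustrative inputs based on the approximate numbers in this section.

```python
def estimate_cluster_tps(nodes, per_node_tps, db_ceiling_tps):
    """Return (estimated TPS, binding tier) for a stateless tier over one shared database.

    Simplified model: engine capacity grows linearly with node count and the
    shared database imposes a hard ceiling. Real systems degrade more gradually.
    """
    engine_capacity = nodes * per_node_tps
    if engine_capacity <= db_ceiling_tps:
        return engine_capacity, "engine CPU"
    return db_ceiling_tps, "PostgreSQL"


# Three 4-vCPU nodes at an assumed ~220 TPS each: the engines are the limit.
print(estimate_cluster_tps(nodes=3, per_node_tps=220, db_ceiling_tps=1100))

# Three hypothetical larger nodes at an assumed ~440 TPS each: the database is the limit.
print(estimate_cluster_tps(nodes=3, per_node_tps=440, db_ceiling_tps=1100))
```

The model is only useful for spotting which tier binds first; measured numbers should replace every input before it is used for sizing.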

Bank systems connect over HTTPS (TLS 1.3) to an nginx edge load balancer on port 443, which routes over mutual TLS to a pool of stateless engine nodes (4 vCPU, 8 workers each, scaling from 1 to 5 nodes). Every engine commits durably to one shared PostgreSQL (8 vCPU) before responding. A dedicated outbox flusher node is idle in this synchronous run.

Add nodes, add throughput

~215–236 TPS per 4-vCPU node, near-linear to 5 nodes. Fewer, larger nodes reach 1,000 TPS sooner.

One database, linear

Shared PostgreSQL scaled 2.4k → 10.3k xact/s linearly through the 1,118-TPS peak. The database tier is the next scaling lever — and a deferred batched-write mode adds ~25% more on the same hardware (measured).

Tail latency improves

Aggregate p99 drops as nodes are added (less per-node queueing): 703 ms → 268 ms at the operating point.

How we tested

Test bed

  • Alibaba Cloud lab, ap-southeast-1; all instances torn down after the run.
  • Up to 5 engine nodes — commodity 4 vCPU / 8 GB, 8 workers each.
  • One nginx edge / load balancer (HTTPS :443).
  • One shared PostgreSQL (8 vCPU / 16 GB).
  • In-VPC load generator driving a realistic banking wire-transfer payload.

Protocol

  • Requests hit the load balancer at /v1/risk/evaluate over HTTPS, routed to engines via mutual TLS — the realistic bank-facing path.
  • JWT + HMAC request authentication.
  • For each node count: route to exactly N engines, then sweep concurrency (c = 20 / 40 / 80 / 120, plus 160 at 5 nodes).
  • ≥ 6,600 requests per measured point; every point 0 errors / 100% HTTP 200.
  • Synchronous, zero-data-loss writes: each request commits its full ~33 rule-trigger rows inline before responding.
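For readers who want to reproduce the shape of this protocol, the Python sketch below shows one way a closed-loop concurrency sweep can be structured with the standard library. It is not the load generator used in the benchmark. The `send_request` callable is a placeholder for an authenticated HTTPS POST to /v1/risk/evaluate, and `fake_request` is a stand-in that only sleeps, so the block runs without a server.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def run_point(send_request, concurrency, min_requests=6600):
    """Closed-loop load: `concurrency` workers each issue requests back to back."""
    per_worker = math.ceil(min_requests / concurrency)

    def worker(_worker_id):
        latencies_ms, errors = [], 0
        for _ in range(per_worker):
            start = time.perf_counter()
            ok = send_request()  # expected to return True for an HTTP 200
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
            if not ok:
                errors += 1
        return latencies_ms, errors

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(worker, range(concurrency)))
    elapsed_s = time.perf_counter() - started

    all_latencies = [ms for latencies, _ in results for ms in latencies]
    cuts = quantiles(all_latencies, n=100)  # 99 cut points: cuts[49] ~ p50, cuts[98] ~ p99
    return {
        "concurrency": concurrency,
        "requests": len(all_latencies),
        "errors": sum(errors for _, errors in results),
        "tps": len(all_latencies) / elapsed_s,
        "p50_ms": cuts[49],
        "p99_ms": cuts[98],
    }


def sweep(send_request, levels=(20, 40, 80, 120), min_requests=6600):
    """Run one measured point per concurrency level."""
    return [run_point(send_request, c, min_requests) for c in levels]


def fake_request():
    """Stand-in for the real HTTPS call: sleeps 1 ms and reports success."""
    time.sleep(0.001)
    return True
```

Calling `sweep(fake_request, levels=(2, 4), min_requests=40)` exercises the harness quickly; a real run would pass a function that performs the authenticated request and would keep the default 6,600-request minimum per point.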

What this benchmark is — and isn't

  • 1. Rule engine, no ML model. This run exercises the deterministic banking rule chain (33 rules) with no machine-learning model loaded. ML-scored throughput is a separate, lower number — we don't conflate the two.
  • 2. Operating point vs peak. The headline 1,046 TPS at p99 268 ms is the c=80 operating point. The 1,118 TPS peak runs hotter (p99 552 ms). We report both and never claim sub-300 ms p99 at peak.
  • 3. A lab measurement, not a guarantee. A controlled cloud lab with synthetic load. Real throughput depends on your hardware, rule mix, payload size, and traffic shape.
  • 4. At this node size the bottleneck is per-node CPU; scale the engines and PostgreSQL becomes the ceiling. On these 4-vCPU nodes, per-node CPU binds first and the shared database stays linear (428 of 600 connections at 10.3k xact/s). With larger engine nodes, PostgreSQL becomes the constraint (~95% CPU near 1,100 TPS). The path past it is measured: add database capacity, switch to the deferred batched-write mode (about 25% more throughput on the same hardware), or both.
  • 5. Measured by node, not extrapolated. 1,000 TPS was reached by adding a 5th node and measured directly. We don't extrapolate single-server or unbounded throughput.
  • 6. Throughput, not failover. This measures steady-state capacity, not node-failure / high-availability behavior. The single benchmark load balancer is a test simplification; a production deployment uses an HA load-balancer pair.

Companion benchmark · ML scoring on

With ML scoring on, FDE still scales near-linearly — at about three-quarters of rule-only throughput

On the same commodity 5-node cluster that reaches 1,000+ TPS on the rule engine, blending a gradient-boosted ML model into every decision sustained ≈ 710–775 TPS (about 73–76% of rule-only) at zero errors, and PostgreSQL stayed linear throughout. Adding ML costs roughly a quarter of throughput; the blend logic itself is essentially free.

What “blended” means: FDE scores every transaction two ways at once — the rule engine produces a score and an ML model produces a score — and combines them into one decision. This benchmark measures throughput when that ML score is computed and weighted into every decision.
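As a rough illustration of what combining two scores can look like, here is a minimal Python sketch of a weighted blend. This is not FDE's blend policy, which is not described on this page. The weighted-average formula, the 0-to-1 score range, the `ml_weight` default, the `block_threshold` value, and the "block" / "allow" labels are all hypothetical.

```python
def blend_scores(rule_score, ml_score, ml_weight=0.5):
    """Weighted average of two risk scores in [0, 1]. The weight is hypothetical."""
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be between 0 and 1")
    return (1.0 - ml_weight) * rule_score + ml_weight * ml_score


def decide(rule_score, ml_score, ml_weight=0.5, block_threshold=0.8):
    """Turn the blended score into a single decision. The threshold is hypothetical."""
    blended = blend_scores(rule_score, ml_score, ml_weight)
    return "block" if blended >= block_threshold else "allow"


print(decide(rule_score=0.9, ml_score=0.9))  # both scorers agree on high risk
print(decide(rule_score=0.1, ml_score=0.2))  # both scorers agree on low risk
```

Whatever the real policy is, a combine step of this kind is a handful of arithmetic operations per request, which fits the observation that the cost sits in computing the ML score rather than in blending it.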

Peak zero-error TPS, rule engine only vs rule + ML blended, on the same 5-node cluster: 278 vs 203 at 1 node, 493 vs 362 at 2, 707 vs 539 at 3, 941 vs 699 at 4, and 1,015 vs 711 at 5 nodes. Blended sustains 73 to 76 percent of rule-only through 4 nodes and 70 percent at 5 nodes, and scales with the same near-linear shape.

Engine nodes | Rule engine only (TPS) | Rule + ML blended (TPS) | Blended / rule-only
1 | 278 | 203 | 73%
2 | 493 | 362 | 74%
3 | 707 | 539 | 76%
4 | 941 | 699 | 74%
5 | 1,015 | 711 | 70%
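
The ratio column can be recomputed from the two TPS columns. The Python sketch below does so from the rounded figures in the table, so a result may differ by a point from a published percentage that was derived from unrounded measurements. Names are illustrative.

```python
# Peak zero-error TPS per engine-node count, copied from the table above.
RULE_ONLY_TPS = {1: 278, 2: 493, 3: 707, 4: 941, 5: 1015}
BLENDED_TPS = {1: 203, 2: 362, 3: 539, 4: 699, 5: 711}


def blended_ratio(rule_only, blended):
    """Return {nodes: blended TPS as a fraction of rule-only TPS}."""
    return {nodes: blended[nodes] / rule_only[nodes] for nodes in sorted(rule_only)}


for nodes, ratio in blended_ratio(RULE_ONLY_TPS, BLENDED_TPS).items():
    print(f"{nodes} node(s): blended is {ratio:.1%} of rule-only, ML cost {1 - ratio:.1%}")
```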

The rule-only column is a same-day control run on this cluster. The dedicated rule-only run peaks at the 1,118 TPS headline at the top of this page; the gap between the two is instance and run variance. The blended engine scales with the same near-linear shape. The blend policy itself is essentially free: the cost is the ML serving path, not the combine math.

How we tested (blended)

Same topology as the rule-only benchmark above — N stateless engines behind an HTTPS/mTLS nginx load balancer, one shared PostgreSQL; banking mode, the full 33-rule chain, POST /v1/risk/evaluate, JWT+HMAC, 0 errors / 100% HTTP 200. The ML model is a gradient-boosted model on FDE's standard feature set, blended into every decision (model explanations off — the high-throughput serving configuration).

What this is — and isn't

  • 1. A throughput benchmark, not an accuracy claim — accuracy is measured separately.
  • 2. A representative synthetic model and synthetic load.
  • 3. Model-dependent — the ~25% cost is for this gradient-boosted model; a lighter model costs less.
  • 4. A controlled, ephemeral cloud lab.

A different test from our on-prem single-box benchmark

This is our cloud multi-node scale-out benchmark (rule engine, horizontal scale). It measures something different from our on-prem single-box KVM benchmark — full banking posture on one machine, including the optimized engine, at 257.1 RPS sustained (here RPS and TPS are the same — one request screens one transaction). Different hardware, different goals: read them side by side, not added together.

Want the numbers on your hardware?

We'll walk through the methodology and help you size a deployment for your transaction volume and latency targets.