How We Made Our Fraud Engine 2.5× Faster in a Day

In one working day, our Fraud Decision Engine went from 103.3 to 257.1 decisions per second on the same laptop-class machine, roughly 2.5× the throughput of our published baseline, without adding a single CPU core, relaxing one security control, or changing one decision the engine makes. The gain came entirely from deleting redundant work inside each request, not from buying hardware.

That last point is the whole story. It is easy to make software faster by throwing a bigger box at it, and easy to make a benchmark look good by quietly switching checks off. We did neither. Every number below was measured in full banking posture — encrypted, mutually authenticated, durably persisted — and every decision the engine returned after the work was byte-for-byte identical to the one it returned before. This is a story about finding waste, not adding muscle.

The result, in plain numbers

On a single laptop-class machine — an Intel Core i7-11800H (8 cores / 16 threads) running two KVM virtual machines, one for the application and one for PostgreSQL — sustained throughput more than doubled while latency fell at the same time.

Metric (sustained, full banking posture)	Before	After	Change
Throughput (sustained)	103.3 RPS	257.1 RPS	2.5×
Throughput (peak)	—	293.5 RPS	—
p50 latency	147 ms	74 ms	~2× faster
p95 latency	446.5 ms	122.8 ms	3.6× faster
Errors	0	0	unchanged
Data-loss invariant	exact	exact	unchanged

The 2.5× is measured against our published baseline of 103.3 RPS. These are lab measurements on hardware we control — not a performance guarantee, and not a figure we are extrapolating to a production server.
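The ratios in the table follow directly from the before and after columns. A quick check in Python, using only the figures above:

```python
# Before/after figures copied from the table above.
before = {"rps": 103.3, "p50_ms": 147.0, "p95_ms": 446.5}
after = {"rps": 257.1, "p50_ms": 74.0, "p95_ms": 122.8}

# Throughput improves when the number goes up; latency improves when it goes down.
throughput_gain = after["rps"] / before["rps"]
p50_speedup = before["p50_ms"] / after["p50_ms"]
p95_speedup = before["p95_ms"] / after["p95_ms"]

print(f"throughput:  {throughput_gain:.2f}x")         # 2.49x
print(f"p50 latency: {p50_speedup:.2f}x faster")      # 1.99x
print(f"p95 latency: {p95_speedup:.2f}x faster")      # 3.64x
```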

Figure: On-prem KVM benchmark in full banking posture. Sustained throughput rises from 70.8 RPS (VMware, two VMs) to 257.1 RPS (KVM, optimized engine; 293.5 RPS peak), and p50/p95 latency drops from 147/446.5 ms to 74/123 ms, with zero errors on a single 8-core laptop-class KVM machine. Lab measurements, not a performance guarantee.

Why a bank should care about per-request CPU

A single node's throughput is bounded by one thing: how much CPU each decision costs. Cut that cost and the same machine serves more decisions — no new hardware required. That is exactly the lever this work pulled.

The engine is stateless: every durable fact lives in PostgreSQL, nothing in the application process. Because a node holds no session state, you scale out by adding nodes behind the load balancer, with capacity designed to grow linearly per core until the shared database becomes the binding constraint. (That database-scaling step is on our roadmap, not a number we are claiming today.) But the first and cheapest win is making each decision itself cheaper, which is what we did here.
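The sizing logic in this section can be written as a small back-of-envelope model. The sketch below is an illustration, not our sizing tool: the function names are hypothetical, and the core counts, per-decision CPU costs, and database ceiling are made-up inputs, not measurements from this benchmark.

```python
def node_capacity_rps(cores: int, cpu_ms_per_decision: float) -> float:
    """Upper bound on one node's throughput if CPU is the only limit."""
    return cores * 1000.0 / cpu_ms_per_decision


def cluster_capacity_rps(nodes: int, cores: int, cpu_ms_per_decision: float,
                         db_ceiling_rps: float) -> float:
    """Stateless nodes add up linearly until the shared database caps the total."""
    return min(nodes * node_capacity_rps(cores, cpu_ms_per_decision), db_ceiling_rps)


# Illustrative inputs only (assumptions, not benchmark data).
before = node_capacity_rps(cores=4, cpu_ms_per_decision=20.0)
after = node_capacity_rps(cores=4, cpu_ms_per_decision=10.0)
capped = cluster_capacity_rps(nodes=3, cores=4, cpu_ms_per_decision=10.0,
                              db_ceiling_rps=1000.0)

print(before, after, capped)  # 200.0 400.0 1000.0
```

Halving the CPU cost per decision doubles what the same node can serve, which is the lever described above. Adding nodes helps only until the assumed database ceiling is reached.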

Running this on-premise on KVM matters for a second reason: the bank owns the metal. Fixed CPU governor, no noisy neighbours, no shared-tenant variance — so a per-core cost model measured in a lab is one a customer can size their own deployment against, with full data residency and full security posture on hardware they control. Performance becomes a property you can measure and reproduce, not a black box you have to trust.

We measured before we touched a line of code

The fastest way to waste an optimization week is to guess. So we started with a measured CPU profile using py-spy, which found that roughly half of the application's CPU was removable-class work — effort that produced no part of the decision.

The profile also overturned two popular assumptions before we spent any time on them. The async stack was already optimal, so there was no win waiting there. And the usual suspects — logging, JSON handling, in-process TLS, authentication crypto, PII encryption, and the machine-learning path — were each immaterial, a couple of percent at most. We spent zero effort on any of them. The waste was somewhere far less glamorous.

The five changes that did the work

Each change shipped on its own, with its own benchmark and a test that asserts the engine's decisions did not move. In rough order of contribution:

Flatten the request context once (+75%). Every rule was rebuilding a full deep copy of the request context — once for every rule in the banking chain, dozens of times per request. We now compute one flattened view per request and pass it down. This single change added 75% throughput on its own; the CPU spent evaluating rules fell from roughly 19% to under 2%.
Cache configuration lookups (+13%). Threshold configuration was read from the database repeatedly within a single request — the scoring fallback alone probes several combinations. A short-lived, tenant-scoped cache with explicit invalidation, negative caching, and fail-closed behaviour on errors removed the duplicate reads. This is the change that pushed sustained throughput past our target.
Serialize once, seed once (+15%). The decision snapshot was being serialized several times per request, and a registry table was being re-seeded on every single call. We now serialize the snapshot once and reuse it, and seed the registry at startup instead of per request — a 15% gain, measured together.
Stop the connection pool from churning (zero errors at peak). The database connection pool was opening and closing connections under load. Throughput-neutral on its own — but once the engine got faster, bursts briefly exposed this as a handful of fail-closed rejections at peak. Steady-state pool sizing and a bounded wait took burst rejections from 74 to 0. This is the change that lets the headline say “zero errors” under peak load.
Collapse the thread hops (tighter tail latency). The evaluation path made eight separate hand-offs to worker threads for synchronous lookups; we batched them into four. Throughput was flat by design — this was a latency win, pulling the p95 tail down further.
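To make the first change concrete, here is a minimal sketch of "flatten once" next to the wasteful per-rule shape. It is not our engine code: the `flatten` helper, the dotted-key layout, and the two example rules are assumptions made up for illustration.

```python
import copy
from types import MappingProxyType
from typing import Callable, Mapping

Rule = Callable[[Mapping[str, object]], bool]


def flatten(context: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys: {"txn": {"amount": 5}} -> {"txn.amount": 5}."""
    flat = {}
    for key, value in context.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def evaluate_per_rule_copy(context: dict, rules: list) -> list:
    # The wasteful shape: every rule pays for its own deep copy and flatten.
    return [rule(flatten(copy.deepcopy(context))) for rule in rules]


def evaluate_flatten_once(context: dict, rules: list) -> list:
    # One flattened view per request, shared by every rule.
    # MappingProxyType makes the view read-only, so no rule can alter what the next one sees.
    view = MappingProxyType(flatten(context))
    return [rule(view) for rule in rules]


# Hypothetical rules and request context.
rules = [
    lambda v: v["txn.amount"] > 10_000,
    lambda v: v["customer.country"] != v["txn.country"],
]
context = {"txn": {"amount": 12_500, "country": "SG"}, "customer": {"country": "MY"}}

# Same decisions either way; only the amount of copying differs.
assert evaluate_per_rule_copy(context, rules) == evaluate_flatten_once(context, rules)
```

The read-only view is one way to keep the isolation that per-rule copies provided while paying the flattening cost once per request. Note that it only guards the top-level mapping; mutable values such as lists would still be shared.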

None of these is exotic. Every one replaces “do this N times” with “do this once” — flatten once, cache with invalidation, serialize once, seed at startup, hold a steady pool. That is also why the wins are durable: they survive restarts and redeploys, and a regression that reintroduces the waste trips a test in CI rather than quietly eroding throughput.
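The caching pattern is the one with the most moving parts, so a sketch may help. The class below is a hypothetical illustration of a short-lived, tenant-scoped cache with explicit invalidation, negative caching, and fail-closed errors; the names, the two-second TTL, and the loader signature are assumptions, not our production code.

```python
import time
from typing import Callable, Optional

_MISSING = object()  # negative-cache marker: "we looked, and the row does not exist"


class ConfigUnavailable(Exception):
    """Raised when configuration cannot be loaded; callers should fail closed."""


class TenantConfigCache:
    def __init__(self, loader: Callable[[str, str], Optional[float]],
                 ttl_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (tenant_id, key) -> (expires_at, value or _MISSING)

    def get(self, tenant_id: str, key: str) -> Optional[float]:
        cache_key = (tenant_id, key)
        entry = self._entries.get(cache_key)
        if entry is not None and entry[0] > self._clock():
            return None if entry[1] is _MISSING else entry[1]
        try:
            value = self._loader(tenant_id, key)
        except Exception as exc:
            # Fail closed: drop any expired entry and surface the error,
            # rather than serving a stale or default threshold.
            self._entries.pop(cache_key, None)
            raise ConfigUnavailable(f"{tenant_id}/{key}") from exc
        stored = _MISSING if value is None else value
        self._entries[cache_key] = (self._clock() + self._ttl, stored)
        return value

    def invalidate(self, tenant_id: str) -> None:
        # Explicit invalidation, scoped to one tenant.
        for cache_key in [k for k in self._entries if k[0] == tenant_id]:
            del self._entries[cache_key]


# Usage with a fake loader standing in for the database read.
calls = []

def fake_loader(tenant_id: str, key: str) -> Optional[float]:
    calls.append((tenant_id, key))
    return {"bank-a": {"score.high": 0.9}}.get(tenant_id, {}).get(key)

cache = TenantConfigCache(fake_loader)
for _ in range(5):
    cache.get("bank-a", "score.high")  # present: one real read, then cached
    cache.get("bank-a", "score.card")  # absent: one real read, then negative-cached
print(len(calls))  # 2: one read per distinct key, not ten
```

Negative caching matters for a fallback that probes several combinations, because most probes miss; without it, every miss would go back to the database.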

What we deliberately did not change

“Same workload, same hardware, same security posture, just a faster engine” is literally true here. Four things were held fixed across both the before and after runs:

The hardware — the same single host, the same VM shapes. No cores, RAM, or nodes added.
The security posture — TLS 1.3 mutual-TLS across services, PII-encrypted decision snapshots, and signed, authenticated requests, all active in both runs.
The decisions — the full banking rule chain, scoring thresholds, and routing are byte-identical before and after, pinned by a golden-master replay. No rule output changed; no thresholds were tuned.
Durability — synchronous, zero-data-loss persistence, with every event, rule trigger, audit record, and encrypted snapshot committed before the API responds. We re-counted the rows after every run to prove it.
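A golden-master replay of the kind mentioned above can be sketched in a few lines: record a digest of every decision from the old code path, then replay the same requests through the new one and compare. The function names, the canonical JSON encoding, and the toy `decide_*` functions are assumptions for illustration, not our test harness.

```python
import hashlib
import json


def canonical_bytes(decision: dict) -> bytes:
    # Sorted keys and fixed separators, so equal decisions always serialize to equal bytes.
    return json.dumps(decision, sort_keys=True, separators=(",", ":")).encode("utf-8")


def record_golden(requests: list, decide) -> list:
    """Digest of each decision, in request order."""
    return [hashlib.sha256(canonical_bytes(decide(r))).hexdigest() for r in requests]


def replay_mismatches(requests: list, decide, golden: list) -> list:
    """Indices of requests whose decision no longer matches the golden master."""
    current = record_golden(requests, decide)
    assert len(current) == len(golden), "golden master does not cover this request set"
    return [i for i, (now, then) in enumerate(zip(current, golden)) if now != then]


# Toy stand-ins for the engine before and after optimization.
def decide_before(request: dict) -> dict:
    amounts = [dict(request)["amount"] for _ in range(3)]  # redundant copies
    return {"action": "review" if amounts[-1] > 10_000 else "allow", "amount": amounts[-1]}


def decide_after(request: dict) -> dict:
    amount = request["amount"]  # same logic, work done once
    return {"amount": amount, "action": "review" if amount > 10_000 else "allow"}


requests = [{"amount": 50}, {"amount": 25_000}]
golden = record_golden(requests, decide_before)
mismatches = replay_mismatches(requests, decide_after, golden)
print(mismatches)  # [] means no decision moved
```

Run in CI, an empty mismatch list is the condition for shipping a performance change; any non-empty list points at the exact requests whose decisions moved.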

The engine got faster, not smarter. It catches exactly what it caught before — it simply costs less to run.

Reproducible by design — and honest about its limits

This was not a lucky run. The whole effort was profiling-driven: a measured profile picked the targets, one change shipped at a time with its own benchmark, and a re-profile afterward confirmed the targeted hot spots had collapsed. The run methodology is fixed and documented, so the same measurement can be reproduced rather than taken on faith.

And the limits, stated plainly: this was measured on a laptop-class CPU, under synthetic banking traffic, in our lab. It is not a guarantee, and it is not production data. The engine is stateless and scales linearly per core, so server-class hardware will exceed these numbers — but we are not going to quote you a server figure we have not measured. Higher single-box throughput and multi-node scale-out are on our roadmap, deliberately not numbers we are putting in a benchmark today.

The fastest way to make software faster is usually to make it do less. We found that roughly half of each request was motion that changed no outcome; most systems running flat out are hiding the same. Before you size a bigger server, point a profiler at the request path — the cheapest capacity you will ever buy is the work you stop doing.

Run-True Decision is building a fraud decision engine purpose-built for Southeast Asian banks. Talk to us to learn more.
