Why We Built a Messaging Framework With Zero System Calls

TL;DR: scaleRT is messaging infrastructure for systems where latency is measured in nanoseconds: matching engines, execution stacks, risk engines, market data pipelines, and the rest of the brokerage stack.


When you are building latency-sensitive financial infrastructure — a matching engine, an execution gateway, a risk engine, market data distribution — every microsecond matters, and a disproportionate share is consumed by your messaging layer.

This was the pattern we encountered repeatedly. The core logic was fast. But the time it took for messages to arrive on the network and reach the right thread — and for results to leave the system — was dominated by infrastructure we didn't write and couldn't control: kernel transitions, thread contention, and the unpredictable overhead of general-purpose message brokers. In a matching engine, that shows up as order-to-fill latency; in other systems it shows up as tick-to-action delay, risk check latency, or execution report lag.

So we built scaleRT.

We built it in C. Not C with a framework on top, not a “systems language” that compiles to C — just C, the same language in which Linux, Windows, and virtually every other operating system kernel are written. There’s a reason for that: when your job is to manage hardware, memory, and execution with zero abstraction tax, C is what you reach for. No garbage collector deciding when to pause your thread. No runtime making allocation decisions on your behalf. No virtual machine between your logic and the hardware. Operating systems are written in C because they can’t afford overhead they don’t control — and neither can we.


What scaleRT Actually Does

scaleRT is an event-driven messaging runtime purpose-built for applications where latency is the product — whether that’s a matching engine, a broker’s execution stack, or any component that can’t afford to wait on the kernel or the message bus. It handles message delivery, event dispatch, logging, redundancy, and monitoring — and it does all of it without making a single system call on the main thread after initialization.

That constraint is the design decision from which everything else follows.

No system calls on the hot path. Traditional messaging requires the kernel for every network operation: sending a message, receiving a message, checking a socket. Each kernel transition costs microseconds and introduces scheduling jitter that’s impossible to eliminate. scaleRT eliminates this entirely. All network I/O is handled by a dedicated companion process called IOServer, which communicates with the application via lock-free shared memory ring buffers. The main thread — where your matching logic, pricing engine, risk checks, or execution logic run — stays in user space permanently. No context switches. No scheduling interference. No unpredictable behavior.
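The shared-memory channel between the application and IOServer can be pictured as a single-producer, single-consumer ring buffer built on C11 atomics. The sketch below is illustrative only: the names (`spsc_ring`, `spsc_push`, `spsc_pop`) and the fixed 1024-slot layout are our assumptions, not scaleRT’s actual API.

```c
/* Minimal SPSC ring buffer sketch using C11 atomics.
   Hypothetical names and layout; not scaleRT's real interface. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAP 1024                /* power of two so we can mask */

typedef struct {
    _Atomic uint64_t head;           /* advanced by the consumer */
    _Atomic uint64_t tail;           /* advanced by the producer */
    uint64_t buf[RING_CAP];
} spsc_ring;

/* Producer side: returns false when the ring is full (never blocks). */
static bool spsc_push(spsc_ring *r, uint64_t msg)
{
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAP)
        return false;                /* full: caller backs off */
    r->buf[tail & (RING_CAP - 1)] = msg;
    /* Release: the consumer must see the slot before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty (never blocks). */
static bool spsc_pop(spsc_ring *r, uint64_t *msg)
{
    uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                /* empty */
    *msg = r->buf[head & (RING_CAP - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Note that neither side ever blocks or enters the kernel: full and empty conditions are reported to the caller, which is what keeps the main thread in user space permanently.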

Lock-free everything. There are no mutexes anywhere in scaleRT’s hot path. Message delivery, event dispatch, logging — all lock-free. This is not merely about avoiding contention — it means latency is determined by your application logic, not by which thread acquired a lock first.

One thread, no compromises. scaleRT uses a single-threaded cooperative event loop with priority-based dispatch. High-priority work (processing an incoming order) is never preempted by low-priority work (background housekeeping). There are no thread context switches, no race conditions, and no non-deterministic behavior under load.
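A cooperative, priority-ordered dispatch loop of this kind can be sketched in a few dozen lines. Everything here (`evloop`, `ev_post`, `ev_step`, three priority bands) is a hypothetical simplification, not scaleRT’s real dispatcher.

```c
/* Sketch of a single-threaded, priority-based cooperative event loop.
   Hypothetical names; scaleRT's real dispatcher differs. */
#include <stdbool.h>
#include <stddef.h>

#define NPRIO 3                       /* 0 = highest priority */
#define QCAP  64

typedef void (*ev_handler)(void *ctx);
typedef struct { ev_handler fn; void *ctx; } event;

typedef struct {
    event  q[NPRIO][QCAP];
    size_t head[NPRIO], tail[NPRIO];  /* one FIFO per priority band */
} evloop;

static bool ev_post(evloop *l, int prio, ev_handler fn, void *ctx)
{
    if (l->tail[prio] - l->head[prio] == QCAP) return false;
    l->q[prio][l->tail[prio]++ % QCAP] = (event){fn, ctx};
    return true;
}

/* One dispatch step: always take from the highest non-empty band,
   so housekeeping never delays order processing. Handlers run to
   completion; there is no preemption and nothing to lock. */
static bool ev_step(evloop *l)
{
    for (int p = 0; p < NPRIO; p++) {
        if (l->head[p] != l->tail[p]) {
            event e = l->q[p][l->head[p]++ % QCAP];
            e.fn(e.ctx);
            return true;
        }
    }
    return false;                     /* nothing pending */
}

/* Demo handler used in the usage example. */
static int last_prio = -1;
static void note_prio(void *ctx) { last_prio = *(int *)ctx; }
```

Because there is exactly one thread and handlers run to completion, the loop needs no locks and its behavior under load is deterministic: the next event dispatched is always the oldest event in the highest occupied band.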


The Messaging Layer: Glide

At the core of scaleRT is Glide, a topic-based publish/subscribe protocol built for financial infrastructure. Glide is transport-independent: the same topic API runs over lock-free shared memory, multicast, or a TCP bridge, with the protocol’s reliability machinery (packet loss detection, heartbeat liveness, duplicate detection) layered on top of whichever transport is configured.

That transport independence has significant operational implications. The same application binary runs in your development environment (shared memory), your colo data center (multicast with kernel bypass), and your cloud disaster recovery site (TCP bridge). Deployment topology is a configuration decision, not a code change.


Operational Infrastructure Without Latency Cost

Logging. scaleRT’s LogFlow framework enqueues log messages as binary data on a lock-free queue. A background thread handles all string formatting and file I/O. The main thread never formats a string, never opens a file, never makes a system call to write a log line. The result is full diagnostic logging with zero measurable latency impact.
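The idea of deferring all formatting off the hot path can be made concrete with a small sketch: the hot path stores a format ID plus raw integer arguments, and a background thread does the `fprintf`. The names (`log_rec`, `log_enqueue`, `log_drain`) and the two-argument record are our assumptions, not LogFlow’s actual interface.

```c
/* Sketch of deferred binary logging: the hot path records raw values;
   a background thread formats and writes them later.
   Hypothetical names; not LogFlow's real API. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LOG_CAP 256

typedef struct {
    uint32_t fmt_id;       /* index into a table of format strings */
    uint64_t arg0, arg1;   /* raw binary arguments, no formatting */
} log_rec;

static log_rec        log_buf[LOG_CAP];
static _Atomic size_t log_tail;     /* producer cursor (hot path) */
static _Atomic size_t log_head;     /* consumer cursor (background) */

/* Hot path: one ~24-byte store. No snprintf, no file I/O, no syscall. */
static bool log_enqueue(uint32_t fmt_id, uint64_t a0, uint64_t a1)
{
    size_t t = atomic_load_explicit(&log_tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&log_head, memory_order_acquire);
    if (t - h == LOG_CAP) return false;   /* full: drop or count overflow */
    log_buf[t % LOG_CAP] = (log_rec){fmt_id, a0, a1};
    atomic_store_explicit(&log_tail, t + 1, memory_order_release);
    return true;
}

/* Background thread: all string formatting and file I/O happen here. */
static const char *log_fmts[] = { "order %llu filled qty %llu\n" };

static bool log_drain(FILE *out)
{
    size_t h = atomic_load_explicit(&log_head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&log_tail, memory_order_acquire);
    if (h == t) return false;             /* nothing to write */
    log_rec r = log_buf[h % LOG_CAP];
    fprintf(out, log_fmts[r.fmt_id],
            (unsigned long long)r.arg0, (unsigned long long)r.arg1);
    atomic_store_explicit(&log_head, h + 1, memory_order_release);
    return true;
}
```

The asymmetry is the point: the expensive operations (formatting, `write`) are paid only by the background thread, while the main thread’s cost per log line is a handful of stores.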

Monitoring. scaleRT exports metrics to Carbon/Graphite for Grafana dashboarding and alerting. Three built-in stat types — counters, values, and timers — cover everything from message throughput to latency distributions (e.g. order-to-fill, tick-to-action). The Glide protocol automatically tracks its own health: packet loss, heartbeat liveness, duplicate detection. Event loop metrics expose system load before it becomes a problem. All collected with negligible overhead. All exported without touching the critical path.
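The three stat types map to very small data structures, which is why collection can be cheap enough for the hot path. This sketch uses illustrative names (`stat_counter`, `stat_value`, `stat_timer`), not scaleRT’s actual API.

```c
/* Minimal sketch of the three stat types: counter, value, timer.
   Illustrative names only; not scaleRT's real metrics interface. */
#include <stdint.h>

typedef struct { uint64_t n; } stat_counter;                 /* monotonic count */
typedef struct { int64_t  v; } stat_value;                   /* last-seen gauge */
typedef struct { uint64_t n, sum_ns, max_ns; } stat_timer;   /* latency samples */

static void counter_inc(stat_counter *c)        { c->n++; }
static void value_set(stat_value *s, int64_t v) { s->v = v; }

static void timer_record(stat_timer *t, uint64_t ns)
{
    t->n++;
    t->sum_ns += ns;                 /* mean = sum_ns / n at export time */
    if (ns > t->max_ns) t->max_ns = ns;
}
/* A background exporter would periodically flush these to Carbon in its
   plaintext form: "<metric.path> <value> <unix-timestamp>\n". */
```

Recording a sample is one or two arithmetic operations on process-local memory; the network export to Carbon happens off the critical path on the exporter’s schedule.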

Failover. RAIN (Redundant Array of Independent Nodes) provides hot-standby redundancy with automatic failover. It uses heartbeat-based health monitoring and priority-based primary election — simpler and lower-overhead than Raft, because latency-critical components (a matching engine, for example) can’t afford the latency of distributed consensus on every operation. Three missed heartbeats marks a node as down. Promotion is automatic. Applications implement two callbacks — upgrade and downgrade — and RAIN manages the remainder.
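The heartbeat-and-promotion logic described above can be sketched as a per-interval state update. The names (`rain_node`, `rain_tick`), the priority comparison, and the two-node framing are our simplifying assumptions, not RAIN’s real implementation.

```c
/* Sketch of heartbeat-based failover with priority-based election.
   Hypothetical names and two-node framing; not RAIN's real API. */
#include <stdbool.h>

#define HB_MISS_LIMIT 3      /* three missed heartbeats => peer is down */

typedef struct rain_node {
    int  priority;           /* lower value wins the primary election */
    int  peer_priority;
    int  missed;             /* consecutive heartbeats missed from peer */
    bool primary;
    void (*upgrade)(struct rain_node *);    /* app callback: become primary */
    void (*downgrade)(struct rain_node *);  /* app callback: back to standby */
} rain_node;

/* Called once per heartbeat interval. */
static void rain_tick(rain_node *n, bool peer_heartbeat_seen)
{
    if (peer_heartbeat_seen) {
        n->missed = 0;
        /* Peer alive: the higher-priority node should hold primary. */
        bool should_lead = n->priority < n->peer_priority;
        if (should_lead && !n->primary) { n->primary = true;  n->upgrade(n); }
        if (!should_lead && n->primary) { n->primary = false; n->downgrade(n); }
    } else if (++n->missed >= HB_MISS_LIMIT && !n->primary) {
        n->primary = true;   /* peer declared down: promote automatically */
        n->upgrade(n);
    }
}

/* Demo callbacks used in the usage example. */
static int promote_count, demote_count;
static void demo_upgrade(rain_node *n)   { (void)n; promote_count++; }
static void demo_downgrade(rain_node *n) { (void)n; demote_count++; }
```

The application’s only obligations are the two callbacks; health tracking, the miss threshold, and the promotion decision live entirely inside the tick.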


Managing It at Scale: Logos

A single scaleRT deployment might span multiple data centers with hundreds of nodes and thousands of topics. Manually coordinating multicast addresses, tracking node registrations, and managing topic-to-network bindings does not scale.

Logos is the control plane that handles this: a REST API and management system for allocating multicast addresses, tracking node registrations, and managing topic-to-network bindings across the deployment.


Production Validation

scaleRT is not a research project. It is the foundation of Synapse, our matching engine deployed in institutional trading venues for multi-instrument, credit-aware order execution — one example of the kind of latency-critical system scaleRT is built for.

In Synapse, scaleRT handles order ingestion, execution report distribution, database persistence handoff, and hot-standby failover. The matching thread runs entirely on the scaleRT event loop — zero system calls, zero dynamic allocation, zero locking. The nanosecond-level latency Synapse delivers is a direct consequence of the infrastructure underneath it.

The same design choices apply to any broker or financial infrastructure component where message latency dominates: execution gateways, risk engines, market data distribution, or matching. Every decision in scaleRT was made for production workloads and validated under the conditions those systems actually face.

If your infrastructure is the bottleneck, we built scaleRT so it doesn’t have to be.


To learn more about scaleRT or discuss how it can support your matching engine, execution stack, or other latency-critical infrastructure, get in touch.

More from Velio Labs