Why We Built a Messaging Framework With Zero System Calls

TL;DR: scaleRT is messaging infrastructure for systems where latency is measured in nanoseconds: matching engines, execution stacks, risk engines, market data pipelines, and the rest of the brokerage stack.


When you are building latency-sensitive financial infrastructure — a matching engine, an execution gateway, a risk engine, market data distribution — every microsecond matters, and a disproportionate share is consumed by your messaging layer.

This was the pattern we encountered repeatedly. The core logic was fast. But the time it took for messages to arrive on the network and reach the right thread — and for results to leave the system — was dominated by infrastructure we didn't write and couldn't control: kernel transitions, thread contention, and the unpredictable overhead of general-purpose message brokers. In a matching engine, that shows up as order-to-fill latency; in other systems it shows up as tick-to-action delay, risk check latency, or execution report lag.

So we built scaleRT.

We built it in C. Not C with a framework on top, not a “systems language” that compiles to C — just C, the same language in which Linux, Windows, and virtually every other operating system kernel are written. There’s a reason for that: when your job is to manage hardware, memory, and execution with zero abstraction tax, C is what you reach for. No garbage collector deciding when to pause your thread. No runtime making allocation decisions on your behalf. No virtual machine between your logic and the hardware. Operating systems are written in C because they can’t afford overhead they don’t control — and neither can we.


What scaleRT Actually Does

scaleRT is an event-driven messaging runtime purpose-built for applications where latency is the product — whether that’s a matching engine, a broker’s execution stack, or any component that can’t afford to wait on the kernel or the message bus. It handles message delivery, event dispatch, logging, redundancy, and monitoring — and it does all of it without making a single system call on the main thread after initialization.

That constraint is the design decision from which everything else follows.

No system calls on the hot path. Traditional messaging requires the kernel for every network operation: sending a message, receiving a message, checking a socket. Each kernel transition costs microseconds and introduces scheduling jitter that’s impossible to eliminate. scaleRT eliminates this entirely. All network I/O is handled by a dedicated companion process called IOServer, which communicates with the application via lock-free shared memory ring buffers. The main thread — where your matching logic, pricing engine, risk checks, or execution logic run — stays in user space permanently. No context switches. No scheduling interference. No unpredictable behavior.
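The shared-memory channel between the application and IOServer can be pictured as a single-producer, single-consumer ring buffer built on C11 atomics. The sketch below is illustrative only: the names (`spsc_ring`, `spsc_push`, `spsc_pop`) and the fixed 1024-slot layout are our assumptions, not scaleRT’s actual API.

```c
/* Minimal SPSC ring buffer sketch using C11 atomics.
   Hypothetical names and layout; not scaleRT's real interface. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAP 1024                /* power of two so we can mask */

typedef struct {
    _Atomic uint64_t head;           /* advanced by the consumer */
    _Atomic uint64_t tail;           /* advanced by the producer */
    uint64_t buf[RING_CAP];
} spsc_ring;

/* Producer side: returns false when the ring is full (never blocks). */
static bool spsc_push(spsc_ring *r, uint64_t msg)
{
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAP)
        return false;                /* full: caller backs off */
    r->buf[tail & (RING_CAP - 1)] = msg;
    /* Release: the consumer must see the slot before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty (never blocks). */
static bool spsc_pop(spsc_ring *r, uint64_t *msg)
{
    uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                /* empty */
    *msg = r->buf[head & (RING_CAP - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Note that neither side ever blocks or enters the kernel: full and empty conditions are reported to the caller, which is what keeps the main thread in user space permanently.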

Lock-free everything. There are no mutexes anywhere in scaleRT’s hot path. Message delivery, event dispatch, logging — all lock-free. This is not merely about avoiding contention — it means latency is determined by your application logic, not by which thread acquired a lock first.

One thread, no compromises. scaleRT uses a single-threaded cooperative event loop with priority-based dispatch. High-priority work (processing an incoming order) is never preempted by low-priority work (background housekeeping). There are no thread context switches, no race conditions, and no non-deterministic behavior under load.
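A cooperative, priority-ordered dispatch loop of this kind can be sketched in a few dozen lines. Everything here (`evloop`, `ev_post`, `ev_step`, three priority bands) is a hypothetical simplification, not scaleRT’s real dispatcher.

```c
/* Sketch of a single-threaded, priority-based cooperative event loop.
   Hypothetical names; scaleRT's real dispatcher differs. */
#include <stdbool.h>
#include <stddef.h>

#define NPRIO 3                       /* 0 = highest priority */
#define QCAP  64

typedef void (*ev_handler)(void *ctx);
typedef struct { ev_handler fn; void *ctx; } event;

typedef struct {
    event  q[NPRIO][QCAP];
    size_t head[NPRIO], tail[NPRIO];  /* one FIFO per priority band */
} evloop;

static bool ev_post(evloop *l, int prio, ev_handler fn, void *ctx)
{
    if (l->tail[prio] - l->head[prio] == QCAP) return false;
    l->q[prio][l->tail[prio]++ % QCAP] = (event){fn, ctx};
    return true;
}

/* One dispatch step: always take from the highest non-empty band,
   so housekeeping never delays order processing. Handlers run to
   completion; there is no preemption and nothing to lock. */
static bool ev_step(evloop *l)
{
    for (int p = 0; p < NPRIO; p++) {
        if (l->head[p] != l->tail[p]) {
            event e = l->q[p][l->head[p]++ % QCAP];
            e.fn(e.ctx);
            return true;
        }
    }
    return false;                     /* nothing pending */
}

/* Demo handler used in the usage example. */
static int last_prio = -1;
static void note_prio(void *ctx) { last_prio = *(int *)ctx; }
```

Because there is exactly one thread and handlers run to completion, the loop needs no locks and its behavior under load is deterministic: the next event dispatched is always the oldest event in the highest occupied band.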


The Messaging Layer: Glide

At the core of scaleRT is Glide, a topic-based publish/subscribe protocol built for financial infrastructure. Glide is transport-independent: the same topic API runs over lock-free shared memory, multicast, or a TCP bridge, with the protocol’s reliability machinery (packet loss detection, heartbeat liveness, duplicate detection) layered on top of whichever transport is configured.

That transport independence has significant operational implications. The same application binary runs in your development environment (shared memory), your colo data center (multicast with kernel bypass), and your cloud disaster recovery site (TCP bridge). Deployment topology is a configuration decision, not a code change.


Operational Infrastructure Without Latency Cost

Logging. scaleRT’s LogFlow framework enqueues log messages as binary data on a lock-free queue. A background thread handles all string formatting and file I/O. The main thread never formats a string, never opens a file, never makes a system call to write a log line. The result is full diagnostic logging with zero measurable latency impact.
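The idea of deferring all formatting off the hot path can be made concrete with a small sketch: the hot path stores a format ID plus raw integer arguments, and a background thread does the `fprintf`. The names (`log_rec`, `log_enqueue`, `log_drain`) and the two-argument record are our assumptions, not LogFlow’s actual interface.

```c
/* Sketch of deferred binary logging: the hot path records raw values;
   a background thread formats and writes them later.
   Hypothetical names; not LogFlow's real API. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LOG_CAP 256

typedef struct {
    uint32_t fmt_id;       /* index into a table of format strings */
    uint64_t arg0, arg1;   /* raw binary arguments, no formatting */
} log_rec;

static log_rec        log_buf[LOG_CAP];
static _Atomic size_t log_tail;     /* producer cursor (hot path) */
static _Atomic size_t log_head;     /* consumer cursor (background) */

/* Hot path: one ~24-byte store. No snprintf, no file I/O, no syscall. */
static bool log_enqueue(uint32_t fmt_id, uint64_t a0, uint64_t a1)
{
    size_t t = atomic_load_explicit(&log_tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&log_head, memory_order_acquire);
    if (t - h == LOG_CAP) return false;   /* full: drop or count overflow */
    log_buf[t % LOG_CAP] = (log_rec){fmt_id, a0, a1};
    atomic_store_explicit(&log_tail, t + 1, memory_order_release);
    return true;
}

/* Background thread: all string formatting and file I/O happen here. */
static const char *log_fmts[] = { "order %llu filled qty %llu\n" };

static bool log_drain(FILE *out)
{
    size_t h = atomic_load_explicit(&log_head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&log_tail, memory_order_acquire);
    if (h == t) return false;             /* nothing to write */
    log_rec r = log_buf[h % LOG_CAP];
    fprintf(out, log_fmts[r.fmt_id],
            (unsigned long long)r.arg0, (unsigned long long)r.arg1);
    atomic_store_explicit(&log_head, h + 1, memory_order_release);
    return true;
}
```

The asymmetry is the point: the expensive operations (formatting, `write`) are paid only by the background thread, while the main thread’s cost per log line is a handful of stores.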

Monitoring. scaleRT exports metrics to Carbon/Graphite for Grafana dashboarding and alerting. Three built-in stat types — counters, values, and timers — cover everything from message throughput to latency distributions (e.g. order-to-fill, tick-to-action). The Glide protocol automatically tracks its own health: packet loss, heartbeat liveness, duplicate detection. Event loop metrics expose system load before it becomes a problem. All collected with negligible overhead. All exported without touching the critical path.
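The three stat types map to very small data structures, which is why collection can be cheap enough for the hot path. This sketch uses illustrative names (`stat_counter`, `stat_value`, `stat_timer`), not scaleRT’s actual API.

```c
/* Minimal sketch of the three stat types: counter, value, timer.
   Illustrative names only; not scaleRT's real metrics interface. */
#include <stdint.h>

typedef struct { uint64_t n; } stat_counter;                 /* monotonic count */
typedef struct { int64_t  v; } stat_value;                   /* last-seen gauge */
typedef struct { uint64_t n, sum_ns, max_ns; } stat_timer;   /* latency samples */

static void counter_inc(stat_counter *c)        { c->n++; }
static void value_set(stat_value *s, int64_t v) { s->v = v; }

static void timer_record(stat_timer *t, uint64_t ns)
{
    t->n++;
    t->sum_ns += ns;                 /* mean = sum_ns / n at export time */
    if (ns > t->max_ns) t->max_ns = ns;
}
/* A background exporter would periodically flush these to Carbon in its
   plaintext form: "<metric.path> <value> <unix-timestamp>\n". */
```

Recording a sample is one or two arithmetic operations on process-local memory; the network export to Carbon happens off the critical path on the exporter’s schedule.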

Failover. RAIN (Redundant Array of Independent Nodes) provides hot-standby redundancy with automatic failover. It uses heartbeat-based health monitoring and priority-based primary election — simpler and lower-overhead than Raft, because latency-critical components (a matching engine, for example) can’t afford the latency of distributed consensus on every operation. Three missed heartbeats marks a node as down. Promotion is automatic. Applications implement two callbacks — upgrade and downgrade — and RAIN manages the remainder.
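The heartbeat-and-promotion logic described above can be sketched as a per-interval state update. The names (`rain_node`, `rain_tick`), the priority comparison, and the two-node framing are our simplifying assumptions, not RAIN’s real implementation.

```c
/* Sketch of heartbeat-based failover with priority-based election.
   Hypothetical names and two-node framing; not RAIN's real API. */
#include <stdbool.h>

#define HB_MISS_LIMIT 3      /* three missed heartbeats => peer is down */

typedef struct rain_node {
    int  priority;           /* lower value wins the primary election */
    int  peer_priority;
    int  missed;             /* consecutive heartbeats missed from peer */
    bool primary;
    void (*upgrade)(struct rain_node *);    /* app callback: become primary */
    void (*downgrade)(struct rain_node *);  /* app callback: back to standby */
} rain_node;

/* Called once per heartbeat interval. */
static void rain_tick(rain_node *n, bool peer_heartbeat_seen)
{
    if (peer_heartbeat_seen) {
        n->missed = 0;
        /* Peer alive: the higher-priority node should hold primary. */
        bool should_lead = n->priority < n->peer_priority;
        if (should_lead && !n->primary) { n->primary = true;  n->upgrade(n); }
        if (!should_lead && n->primary) { n->primary = false; n->downgrade(n); }
    } else if (++n->missed >= HB_MISS_LIMIT && !n->primary) {
        n->primary = true;   /* peer declared down: promote automatically */
        n->upgrade(n);
    }
}

/* Demo callbacks used in the usage example. */
static int promote_count, demote_count;
static void demo_upgrade(rain_node *n)   { (void)n; promote_count++; }
static void demo_downgrade(rain_node *n) { (void)n; demote_count++; }
```

The application’s only obligations are the two callbacks; health tracking, the miss threshold, and the promotion decision live entirely inside the tick.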


Managing It at Scale: Logos

A single scaleRT deployment might span multiple data centers with hundreds of nodes and thousands of topics. Manually coordinating multicast addresses, tracking node registrations, and managing topic-to-network bindings does not scale.

Logos is the control plane that handles this: a REST API and management system for allocating multicast addresses, tracking node registrations, and managing topic-to-network bindings across the deployment.


Production Validation

scaleRT is not a research project. It is the foundation of Synapse, our matching engine deployed in institutional trading venues for multi-instrument, credit-aware order execution — one example of the kind of latency-critical system scaleRT is built for.

In Synapse, scaleRT handles order ingestion, execution report distribution, database persistence handoff, and hot-standby failover. The matching thread runs entirely on the scaleRT event loop — zero system calls, zero dynamic allocation, zero locking. The nanosecond-level latency Synapse delivers is a direct consequence of the infrastructure underneath it.

The same design choices apply to any broker or financial infrastructure component where message latency dominates: execution gateways, risk engines, market data distribution, or matching. Every decision in scaleRT was made for production workloads and validated under the conditions those systems actually face.

If your infrastructure is the bottleneck, we built scaleRT so it doesn’t have to be.


To learn more about scaleRT or discuss how it can support your matching engine, execution stack, or other latency-critical infrastructure, get in touch.

More from Velio Labs