The Case Against Cloud
TL;DR — Start in the cloud; it is the right place to prove a market. For exchanges and low-latency trading, cloud hits a ceiling: virtualization, network overlay, lack of hardware control, and broken CPU isolation cap performance, while compounding costs drain margin. Bare metal is the path when latency and determinism matter. scaleRT and Synapse are built so that moving from cloud to bare metal is a configuration change, not a rewrite.
Somewhere along the way, the industry decided that cloud was the answer to every infrastructure question. Startups launch on AWS. Enterprises migrate to Azure. And a generation of engineers has internalized the assumption that owning hardware is a relic — something legacy banks do because they have not modernized yet.
For most software, that assumption holds. But exchanges are not most software.
We build trading technology and matching engines. We deploy them for trading and exchange clients where the difference between 1 microsecond and 100 microseconds is the difference between a competitive venue and a cautionary tale. And over the past several years, the same pattern has repeated with every client we have worked with: they start in the cloud, they hit a ceiling, and they call us.
This post is about that ceiling — what it is, why the cloud cannot fix it, and what the path forward actually looks like. We call the assumption that cloud can scale to meet any requirement cloudmania, and for low-latency exchange infrastructure, it breaks down.
The Cloud Was the Right Call
Let us be clear: cloud infrastructure is the correct starting point for a new exchange platform. When you are proving a market, iterating on contract design, onboarding your first participants, and figuring out your regulatory posture, the last thing you need is a rack-and-stack project. Cloud gives you elastic capacity, managed databases, and operational simplicity. You ship your matching engine on a VM, wire up your gateways, and focus on building the business.
scaleRT — the messaging runtime that underpins Synapse — was designed with this in mind. Its transport layer is configuration-driven: shared memory, UDP multicast, TCP bridge. The same binary runs in the cloud or on bare metal. We tell our trading and exchange clients to start in the cloud because the software does not care. It will perform well enough to get you to market, and it will not lock you into infrastructure decisions you will regret later.
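To give a feel for what configuration-driven transport selection can look like, here is a hypothetical config fragment. The file name, section names, and keys below are invented for this post to illustrate the idea, not scaleRT's actual schema:

```
# transport.conf (hypothetical schema, for illustration only)

# Cloud profile: portable transports that run on any VM.
[profile.cloud]
transport      = tcp_bridge     # TCP fan-out where multicast is unavailable
market_data    = tcp_bridge

# Bare-metal profile: same binary, faster paths enabled.
[profile.bare_metal]
transport      = shared_memory  # co-located processes on the same host
market_data    = udp_multicast  # one packet, every subscriber
kernel_bypass  = true           # e.g. OpenOnload on a supported NIC
```

The point is that the application code never references a transport directly; the deployment does.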
The problem is what happens after "well enough."
Where Cloudmania Breaks Down
Cloudmania is the belief that cloud infrastructure can scale to meet any requirement if you throw enough instances, availability zones, and managed services at it. For web applications, this is largely true. For a matching engine, it is not — and the reasons are architectural, not operational.
You Cannot Bypass a Kernel You Do Not Own
scaleRT integrates with Solarflare OpenOnload to eliminate the operating system's network stack from the message path entirely. On bare metal, an order arrives on the wire and reaches the matching thread without a single kernel transition. The NIC delivers the packet directly to user space memory. No interrupt handling, no socket buffers, no context switches.
In the cloud, this is impossible. You do not control the NIC. You do not control the hypervisor. Every packet traverses a virtualized network interface, software-defined routing, and often multiple overlay network hops before it reaches your application. You cannot bypass a kernel that belongs to your cloud provider.
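On the bare-metal side, it is worth seeing how little the application has to change. Onload intercepts the standard BSD sockets API via LD_PRELOAD, so acceleration is a launch-time concern. The binary name here is illustrative; the launcher, latency profile, and polling knob are part of OpenOnload's documented interface:

```shell
# Run the engine with Onload intercepting its sockets via LD_PRELOAD;
# the latency profile favors busy-wait polling over interrupt-driven wakeups.
onload --profile=latency ./matching_engine

# Equivalent knob: spin in user space for up to 1ms waiting for data
# before falling back to blocking, keeping interrupts off the hot path.
EF_POLL_USEC=1000 onload ./matching_engine
```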
Network Virtualization Is the Bottleneck
Of all the performance taxes the cloud imposes on trading infrastructure, network virtualization is the most consequential. Every cloud provider interposes a software-defined networking layer between your application and the physical wire. Packets are encapsulated in overlay protocols (VXLAN, GRE, Geneve), routed through virtual switches, processed by security group rules, and often traverse multiple hops across hypervisor boundaries before reaching your process.
For a web application serving requests in tens of milliseconds, this overhead is noise. For a matching engine processing orders in single-digit microseconds, it is the dominant contributor to both latency and jitter. The virtualized network path introduces variance that is fundamentally unpredictable — packet processing times depend on hypervisor load, overlay routing decisions, and contention from other tenants sharing the same physical network infrastructure. This is not a tuning problem. You cannot configure your way out of a network architecture that was designed for multi-tenant isolation, not deterministic latency.
On bare metal, a packet arrives on the physical NIC and — with kernel bypass enabled — is delivered directly to your application's memory space. One hop. Deterministic. Measurable in nanoseconds. The network is not virtualized because it does not need to be. You own it.
For our trading and exchange clients, network virtualization is typically the single largest source of latency variance in their cloud deployments. It is also the single largest improvement when they migrate to bare metal.
Shared Memory Requires Shared Hardware
scaleRT's fastest transport delivers sub-microsecond latency between co-located processes via shared memory ring buffers. The matching engine, the order gateway, and the market data publisher communicate through memory regions on the same physical host — no serialization, no network stack, no copies.
In a cloud environment, even processes nominally on the "same host" may be separated by virtualization boundaries. Shared memory either does not work or introduces latency that defeats its purpose. You are paying for the abstraction of isolation, and that abstraction costs you the transport mechanism that matters most.
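The mechanism is easiest to see in miniature. Below is an illustrative single-producer/single-consumer ring buffer over a shared memory segment. It is Python, and orders of magnitude slower than a production transport like scaleRT's, but the structure is the same: fixed-size slots, head/tail indices, and no kernel transition per message.

```python
from multiprocessing import shared_memory
import struct

SLOT_SIZE = 64   # fixed-size slots, roughly a cache line per message
NUM_SLOTS = 1024
HDR = 16         # two 8-byte indices: head (next write), tail (next read)

class SpscRing:
    """Minimal single-producer/single-consumer ring over shared memory."""

    def __init__(self, name=None):
        size = HDR + SLOT_SIZE * NUM_SLOTS
        if name is None:
            # Creator: allocate the segment and zero head/tail.
            self.shm = shared_memory.SharedMemory(create=True, size=size)
            self.shm.buf[:HDR] = bytes(HDR)
        else:
            # Peer process: attach to the same physical pages by name.
            self.shm = shared_memory.SharedMemory(name=name)

    def _get(self, off):
        return struct.unpack_from("<Q", self.shm.buf, off)[0]

    def _set(self, off, val):
        struct.pack_into("<Q", self.shm.buf, off, val)

    def push(self, payload: bytes) -> bool:
        assert len(payload) <= SLOT_SIZE
        head, tail = self._get(0), self._get(8)
        if head - tail == NUM_SLOTS:               # ring full
            return False
        off = HDR + (head % NUM_SLOTS) * SLOT_SIZE
        self.shm.buf[off:off + SLOT_SIZE] = payload.ljust(SLOT_SIZE, b"\0")
        self._set(0, head + 1)                     # publish after the write
        return True

    def pop(self):
        head, tail = self._get(0), self._get(8)
        if head == tail:                           # ring empty
            return None
        off = HDR + (tail % NUM_SLOTS) * SLOT_SIZE
        msg = bytes(self.shm.buf[off:off + SLOT_SIZE])
        self._set(8, tail + 1)
        return msg

    def close(self):
        self.shm.close()
        self.shm.unlink()
```

A production implementation would add atomic publishes, cache-line padding, and busy-wait consumers; the essential property is that producer and consumer touch the same physical memory, which is exactly what a virtualization boundary takes away.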
Jitter Is Not a Bug — It Is the Architecture
The cloud performance conversation usually focuses on throughput or average latency. These are the wrong metrics for a matching engine. The metric that matters is tail latency — the P99 and P99.9 numbers that define worst-case execution quality.
A matching engine that processes orders in 5 microseconds on average but spikes to 500 microseconds once per thousand events delivers a worse participant experience than one that consistently runs at 10 microseconds. Market makers notice. They adjust their quoting accordingly. Liquidity degrades. The platform suffers.
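The arithmetic behind that example is worth making concrete. The distributions below are synthetic and purely illustrative: one engine averages 5 microseconds with a single 500-microsecond spike per thousand events, the other runs a flat 10 microseconds.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (ceiling convention)."""
    s = sorted(samples)
    return s[max(0, math.ceil(p / 100.0 * len(s)) - 1)]

# Engine A: 5 us on 999 events out of 1,000, then one 500 us spike.
engine_a = [5.0] * 999 + [500.0]
# Engine B: 10 us on every event, no spikes.
engine_b = [10.0] * 1000

stats = {
    "mean_a": sum(engine_a) / len(engine_a),  # 5.495 us: the "faster" engine
    "p99_a": percentile(engine_a, 99.0),      # 5.0 us: even P99 hides the spike
    "max_a": max(engine_a),                   # 500 us: what participants price in
    "max_b": max(engine_b),                   # 10 us: boring and dependable
}
```

Note that with one spike per thousand events, even P99 looks healthy; only P99.9 and the worst case expose it. That is why market makers calibrate to tail metrics, not averages.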
On bare metal with CPU isolation, NUMA-aware placement, and interrupt affinity tuning, jitter sources are systematically eliminated. On a cloud VM, they are systemically guaranteed: hypervisor scheduling, noisy neighbors, network virtualization, storage I/O contention. These are not failure modes — they are the normal operating characteristics of shared infrastructure.
CPU isolation is not an optimization — it is a prerequisite. A matching engine must guarantee that the thread evaluating an incoming order is never interrupted. Any preemption — even for a few microseconds — inflates execution latency unpredictably, widens the gap between arrival and acknowledgment, and introduces variance that participants can measure. Market makers calibrate their quoting strategies to your worst-case latency, not your average. If your tail numbers are unstable, spreads widen and liquidity suffers. CPU isolation is how you keep those numbers tight.
CPU isolation in a virtualized environment does not work. On bare metal, isolcpus removes cores from the kernel scheduler entirely. Combined with cgroup cpusets, IRQ affinity masks, and nohz_full for tickless operation, this guarantees that the matching core executes nothing but your matching thread. The core belongs to you.
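On a host you own, the usual recipe looks something like this. Core numbers and the IRQ number are illustrative; the right layout depends on your socket and NUMA topology:

```shell
# Kernel boot parameters (e.g. via GRUB): reserve cores 2-5 for the engine.
#   isolcpus   removes the cores from the general scheduler
#   nohz_full  stops the periodic scheduler tick on them
#   rcu_nocbs  moves RCU callback processing off them
GRUB_CMDLINE_LINUX="isolcpus=2-5 nohz_full=2-5 rcu_nocbs=2-5"

# Steer a NIC interrupt away from the matching cores
# (mask 0x3 = cores 0 and 1; IRQ number is illustrative).
echo 3 > /proc/irq/24/smp_affinity

# Pin the matching engine onto one of the isolated cores.
taskset -c 3 ./matching_engine
```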
In a cloud VM, the vCPU you "isolate" inside the guest is a time-sliced thread on a physical core managed by the hypervisor. The hypervisor can — and does — preempt your vCPU to service other tenants or run its own housekeeping. Your guest-level isolcpus configuration is invisible to the hypervisor scheduler. The jitter is coming from below the floor you are standing on.
Hardware You Cannot Tune in the Cloud
On bare metal, the processor itself becomes a tuning surface. Disabling Turbo Boost and locking the CPU frequency governor to a fixed clock rate eliminates the frequency scaling that would otherwise cause instruction-level timing to vary between operations. Disabling hyperthreading ensures the matching core is never sharing execution resources with a sibling thread. Disabling power saving states (C-states) prevents the processor from entering low-power modes that add microseconds of wake-up latency when the next order arrives.
These are BIOS-level and kernel-level controls that produce measurable, reproducible improvements in both average and tail latency. In the cloud, none of them are available. You do not control the BIOS. You do not control the CPU power governor. You cannot disable hyperthreading on a core that a hypervisor is scheduling across multiple tenants. The hardware tuning that delivers the final tier of deterministic performance is simply not exposed to you.
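To make that concrete, here is roughly what the tuning looks like on a Linux host with an Intel CPU. The sysfs path shown is specific to the intel_pstate driver, and hyperthreading itself is a BIOS setting with no runtime switch:

```shell
# Lock every core to the performance governor: no frequency scaling.
cpupower frequency-set -g performance

# Disable Turbo Boost so instruction timing never varies with thermal
# headroom (intel_pstate driver; other drivers expose different knobs).
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

# Forbid deep C-states at boot so a core never pays microseconds of
# wake-up latency on the next order (set alongside isolcpus):
#   intel_idle.max_cstate=0 processor.max_cstate=1 idle=poll
```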
Cloud environments do not expose hardware performance counters — the low-level CPU metrics (cache misses, branch mispredictions, memory bandwidth utilization) that are essential for profiling and tuning a latency-sensitive workload. On bare metal, these counters are the foundation of systematic performance optimization. In the cloud, the hypervisor owns the PMU (Performance Monitoring Unit), and you are tuning blind.
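On hardware you own, those counters are one perf invocation away (the core number is illustrative):

```shell
# Sample hardware counters on the isolated matching core (core 3) for 10s.
# Cache misses and branch mispredictions are the usual first suspects when
# a hot path's latency drifts.
perf stat -e cycles,instructions,cache-misses,branch-misses -C 3 -- sleep 10
```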
Multicast Was Born to Be Free
Market data distribution on an exchange is a one-to-many problem. On bare metal with UDP multicast, this is solved at the network layer: one packet sent, all subscribers receive it simultaneously. The cost is the same whether you have ten consumers or ten thousand.
In the cloud, multicast is not available. Every market data consumer requires a dedicated TCP stream through scaleRT's bridge process. This works — but it adds latency, increases CPU overhead, and incurs egress charges on every byte. For a platform distributing Level 2 data across thousands of contracts to hundreds of participants, the egress bill alone can exceed the cost of owning a network switch.
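The asymmetry is visible even at the sockets API level. Here is a sketch of a multicast publisher and subscriber; the group address is from the administratively scoped range and purely illustrative:

```python
import socket
import struct

MCAST_GROUP = ("239.1.1.1", 5000)  # illustrative administratively-scoped group

def make_publisher(ttl=1):
    """One UDP socket; a single sendto() reaches every subscribed host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 1 keeps packets on the local segment; raise it to cross routers.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock

def publish(sock, payload: bytes):
    # Cost is one packet regardless of subscriber count: the switch
    # replicates it in hardware. In the cloud this call has no
    # equivalent, so each consumer needs its own unicast stream.
    sock.sendto(payload, MCAST_GROUP)

def make_subscriber():
    """Join the group; every published packet is delivered."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MCAST_GROUP[1]))
    mreq = struct.pack("4sl", socket.inet_aton(MCAST_GROUP[0]),
                       socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

One `publish` call versus N dedicated TCP streams is the entire cost model of market data distribution in a sentence.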
Cloud Costs Do Not Scale — They Compound
Cloud pricing is designed to be easy to adopt and difficult to leave. At low volume, the bills are manageable and predictable. As an exchange grows — more contracts, more participants, more market data — cloud costs do not scale linearly with value delivered. They compound.
Egress charges grow with every additional market data consumer. Compute costs grow with every redundant instance spun up to compensate for the performance variance the platform itself introduces. Storage costs grow with every audit log, every replay stream, every compliance archive that regulators require you to retain. And reserved instance commitments — the mechanism cloud providers offer to reduce per-hour costs — lock you into capacity forecasts that may not match your actual growth trajectory, converting variable expense into a fixed liability without the benefit of owning the asset.
The result is an infrastructure budget that becomes less predictable as the business becomes more successful. Volume spikes that should be revenue events become cost events. Capacity planning becomes cloud bill forecasting. And the engineering team spends cycles optimizing cloud spend instead of optimizing the platform.
On bare metal, the economics invert:
- No per-hour compute charges. Your matching engine runs 24/7 on hardware you own.
- No egress fees. Multicast is free on your own network.
- No virtualization tax. Every CPU cycle goes to your application, not a hypervisor.
- Predictable monthly costs that do not spike with volume or contract count.
As volumes grow, the per-transaction cost on bare metal decreases while the cloud bill increases. For a platform processing millions of contracts across thousands of markets, the total cost of ownership shifts decisively toward bare metal within the first year. The cloud bill that once seemed like a predictable operating expense becomes, in retrospect, a scaling tax on your own success.
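The compounding effect is easy to model. Every number below is a hypothetical placeholder (plug in your own egress rate, instance fleet cost, and colo quote); the shape of the curves is the point. Cloud spend grows with every consumer and every byte, while the bare-metal line is flat:

```python
def monthly_cloud_cost(consumers, gb_per_consumer,
                       egress_per_gb=0.09,   # hypothetical $/GB egress rate
                       compute=12_000.0):    # hypothetical instance fleet $/mo
    """Cloud spend scales with market data fan-out: every consumer is metered."""
    return compute + consumers * gb_per_consumer * egress_per_gb

def monthly_bare_metal_cost(colo_and_amortized_hw=25_000.0):
    """Flat: colo, power, cross-connects, amortized servers and switches.
    Multicast fan-out adds nothing per consumer."""
    return colo_and_amortized_hw

# Growth scenario: 100 -> 800 consumers, each pulling 2 TB/month of L2 data.
# Cloud grows from ~$30k to ~$156k/month; bare metal stays at $25k.
for consumers in (100, 200, 400, 800):
    cloud = monthly_cloud_cost(consumers, gb_per_consumer=2_000)
    metal = monthly_bare_metal_cost()
```

Under these assumptions the lines cross before the first doubling of the consumer base; with your own numbers the crossover moves, but the slopes do not.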
The Migration Is Not a Rewrite
Here is what cloudmania gets wrong about the path to bare metal: the assumption that it requires ripping everything out and starting over. It does not — if the software was designed correctly from the beginning.
scaleRT's transport abstraction means that moving from cloud to bare metal is a configuration change, not a code change. The same Synapse binary that ran on EC2 runs on a tuned bare metal host. The difference is what you can now enable: kernel bypass, shared memory, multicast, CPU isolation. Features that were always in the software but could not be activated because the infrastructure did not support them.
This is why we built scaleRT the way we did. The cloud is a starting point, not a destination. The software should never be the reason you cannot move.
How Velio Labs Gets You There
1. Measure first. Instrument the existing cloud deployment to establish a latency baseline (application, network, infrastructure) so migration targets are concrete.
2. Design for the workload. Network topology for multicast and kernel bypass; compute layout with CPU pinning and NUMA alignment; co-location of matching engine, gateways, and market data for shared memory transport; storage sized for audit logging off the execution path.
3. Tune the stack. BIOS, kernel, NIC firmware, switch config, IRQ affinity, hugepages, tickless operation — hardware alone does not guarantee performance.
4. Validate end-to-end. Deploy Synapse and scaleRT on the new infrastructure and profile latency against the cloud baseline.
5. Keep what works in the cloud. Operations dashboards, monitoring, client portals, DR, and dev environments can stay in the cloud; the TCP Bridge connects both environments.
Start in the Cloud. Graduate to Bare Metal.
Cloudmania tells you that the cloud is the end state. For most software, it probably is. For an exchange — where latency is measured in microseconds, where market makers evaluate your infrastructure before committing liquidity, where regulators expect deterministic behavior and complete auditability — the cloud is a starting point.
Build on scaleRT and Synapse so that your software does not constrain your infrastructure choices. Start in the cloud because it is the right place to prove a market. And when you hit the ceiling — when participants demand better execution, when market makers compare you against bare metal venues, when your egress bill exceeds the cost of a network switch — call us.
Velio Labs professional services designs, builds, and operates bare metal datacenter environments for exchange and trading platform operators. Learn more about our infrastructure practice or contact us to discuss your migration.