How to build an exchange (Jane Street talk from 2017)
Source: https://www.youtube.com/watch?v=b1e4t2k2KJY
This presentation from Jane Street provides a deep dive into the engineering required to build a high-performance, reliable financial exchange.
I. Core Concepts and Messages
- Limit Order Book (LOB): The central data structure of the exchange. Bids (buys) are sorted in decreasing order of price, and offers (sells) are sorted in increasing order, creating a visible market.
- Key Transactions: The majority of market messages consist of New Orders (buying/selling at a specific price), Cancels, and Executions (when a new order is “marketable” and crosses with an existing order in the LOB, generating a trade).
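The interaction of these three message types can be sketched in a few lines of Python. This is a minimal illustration, not the exchange's actual implementation: the names `Order`, `LimitOrderBook`, `new_order`, and `cancel` are hypothetical, prices are assumed to be integer ticks, and orders at the same price are assumed to fill oldest first (time priority is not stated in the summary above).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    side: str   # "buy" or "sell"
    price: int  # integer ticks (assumption), avoids floating-point rounding
    qty: int


class LimitOrderBook:
    def __init__(self):
        self.bids = {}    # price -> deque of resting buys, oldest first
        self.asks = {}    # price -> deque of resting sells, oldest first
        self.orders = {}  # order_id -> resting Order, used by cancel()

    def best_bid(self):
        # Highest resting buy price. A linear scan keeps the sketch short;
        # a real book would keep price levels in a sorted structure.
        return max(self.bids) if self.bids else None

    def best_ask(self):
        # Lowest resting sell price.
        return min(self.asks) if self.asks else None

    def new_order(self, order):
        """Match while the order is marketable, then rest any remainder.

        Returns executions as (resting_id, incoming_id, price, qty) tuples.
        """
        if order.side == "buy":
            own, opposite, best = self.bids, self.asks, self.best_ask
        else:
            own, opposite, best = self.asks, self.bids, self.best_bid
        executions = []
        while order.qty > 0 and opposite:
            price = best()
            if order.side == "buy":
                marketable = order.price >= price
            else:
                marketable = order.price <= price
            if not marketable:
                break
            queue = opposite[price]
            resting = queue[0]
            fill = min(order.qty, resting.qty)
            executions.append((resting.order_id, order.order_id, price, fill))
            order.qty -= fill
            resting.qty -= fill
            if resting.qty == 0:
                # Fully filled: remove the resting order and any empty level.
                queue.popleft()
                del self.orders[resting.order_id]
                if not queue:
                    del opposite[price]
        if order.qty > 0:
            # Not (fully) marketable: the remainder rests in the book.
            own.setdefault(order.price, deque()).append(order)
            self.orders[order.order_id] = order
        return executions

    def cancel(self, order_id):
        """Remove a resting order; return False if it is no longer live."""
        order = self.orders.pop(order_id, None)
        if order is None:
            return False
        book = self.bids if order.side == "buy" else self.asks
        queue = book[order.price]
        queue.remove(order)
        if not queue:
            del book[order.price]
        return True
```

A buy at or above the best offer generates executions immediately; anything left over rests on the bid side, and a cancel for an order that has already been filled simply returns `False`.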
II. System Requirements
The exchange must meet stringent requirements to function correctly and fairly:
- Scale: Handling up to 3 million messages per second at peak rates, with millions of live orders and thousands of participants.
- Fairness: Information must reach all competing participants as close to simultaneously as possible.
- Reliability & Durability: Trades must be final. The system cannot “forget” or reverse an executed trade, as that would disrupt participants who have committed capital based on that execution.
- Robustness: The system must be insulated from bad behavior or carelessness from individual clients.
III. The Architecture: A “Crazy Design”
The system achieves its goals through a high-speed, minimalist, and deterministic architecture:
- Single Matching Engine: The heart of the system is a single application, the Matching Engine, which keeps the entire market state in memory on a single, high-spec x86 machine. Its job is to maintain the LOB and process transactions as simply as possible.
- Decoupling with Multicast (UDP): The auxiliary applications (Client Ports, Drop Ports, Market Data, Trade Reporters) all need the same updates at nearly the same time. The Matching Engine publishes over UDP multicast, so a single send reaches every subscribed application at approximately the same moment.
- Handling Unreliability: Because UDP is unreliable, dedicated retransmitters record messages. If an application misses a message, it requests a replay from a retransmitter to restore its state.
- State Machine Replication for Failover: All applications are built using State Machine Replication. They maintain their state by deterministically applying the totally ordered stream of transactions from the Matching Engine. Because the same inputs always produce the same state, any component (like a client port) can be rebuilt after a failure by replaying the day’s messages.
- Passive Matching Engine: To ensure the core system doesn’t fail, a Passive Matching Engine runs alongside the primary one. It listens to all incoming orders and mechanically runs the identical state machine code. If the primary Matching Engine fails, the passive one can be instantly switched in without losing any transaction integrity.
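The retransmitter and replay ideas above can be sketched together. This is a toy model under stated assumptions: `Retransmitter`, `Replica`, and `on_multicast` are hypothetical names, messages are simplified to `(symbol, quantity)` pairs, and the "state" is just a running total per symbol. A real listener would request the replay over the network rather than through a direct method call.

```python
class Retransmitter:
    """Records every sequenced message so listeners can ask for a replay."""

    def __init__(self):
        self.log = []  # log[i] holds the message with sequence number i + 1

    def record(self, seq, msg):
        if seq != len(self.log) + 1:
            raise ValueError("retransmitter itself saw a gap")
        self.log.append(msg)

    def replay(self, first_seq, last_seq):
        # Return messages first_seq..last_seq inclusive, in order.
        return [self.log[s - 1] for s in range(first_seq, last_seq + 1)]


class Replica:
    """A deterministic state machine fed by the sequenced stream."""

    def __init__(self, retransmitter):
        self.retransmitter = retransmitter
        self.next_seq = 1    # sequence number expected next
        self.positions = {}  # toy state: symbol -> net quantity

    def _apply(self, msg):
        # Deterministic transition: same messages in the same order
        # always yield the same state.
        symbol, qty = msg
        self.positions[symbol] = self.positions.get(symbol, 0) + qty
        self.next_seq += 1

    def on_multicast(self, seq, msg):
        if seq < self.next_seq:
            return  # duplicate or already recovered via replay
        if seq > self.next_seq:
            # Gap detected: UDP dropped something, so fill it from the log.
            for missed in self.retransmitter.replay(self.next_seq, seq - 1):
                self._apply(missed)
        self._apply(msg)
```

A replica that misses messages and one that starts late both converge on the same state as a replica that heard everything, which is the same property a passive matching engine relies on:

```python
rt = Retransmitter()
stream = [("ABC", 10), ("XYZ", -5), ("ABC", -3), ("XYZ", 7)]
for seq, msg in enumerate(stream, start=1):
    rt.record(seq, msg)

late = Replica(rt)
late.on_multicast(4, stream[3])  # first message heard is number 4
print(late.positions)
```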
IV. Concurrency and Flow Control
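- Latency Determines Throughput: The system uses a simple flow control: each client is limited to a single unacknowledged transaction in flight with the Matching Engine. A port's throughput is therefore bounded by the inverse of the round-trip latency, which ties it directly to the system’s low single-digit microsecond latency. For example, at a round trip of 5 microseconds, a single port can complete at most about 200,000 transactions per second.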
- Locking and Sequencing: To prevent race conditions (e.g., a client sending a cancel for an order that is simultaneously being executed), every message is assigned a unique sequence number per “topic” (client port), with the Matching Engine acting as the sequencer. If a message arrives carrying an incorrect sequence number, the Engine simply drops it. The client’s port must then process the correct transaction stream, apply the execution, and only then generate a Cancel Reject to the client.
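The cancel-versus-execution race can be illustrated with a small sketch. It rests on an assumption that goes beyond the summary above: each request from a port is stamped with the last sequence number that port has applied on its own topic, and the engine drops any request whose stamp is out of date. The classes `MatchingEngine` and `ClientPort` and all of their methods are hypothetical, and events are handed over by direct calls rather than by multicast.

```python
class MatchingEngine:
    def __init__(self):
        self.topic_seq = {}    # port name -> last sequence number on its topic
        self.live_orders = {}  # order_id -> owning port name

    def _publish(self, port, event):
        # The engine is the only sequencer for each topic.
        self.topic_seq[port] = self.topic_seq.get(port, 0) + 1
        return (self.topic_seq[port], event)

    def execute(self, order_id):
        # Engine-side event: the order traded against someone else's order.
        port = self.live_orders.pop(order_id)
        return self._publish(port, ("executed", order_id))

    def request(self, port, last_seen_seq, kind, order_id):
        if last_seen_seq != self.topic_seq.get(port, 0):
            return None  # stale view of the topic: drop the request
        if kind == "new":
            self.live_orders[order_id] = port
            return self._publish(port, ("accepted", order_id))
        if kind == "cancel" and order_id in self.live_orders:
            del self.live_orders[order_id]
            return self._publish(port, ("cancelled", order_id))
        return self._publish(port, ("rejected", order_id))


class ClientPort:
    def __init__(self, name, engine):
        self.name = name
        self.engine = engine
        self.last_seen_seq = 0
        self.open_orders = set()
        self.to_client = []  # messages sent back to the client

    def apply(self, sequenced):
        # Apply one sequenced event from this port's topic.
        seq, (kind, order_id) = sequenced
        self.last_seen_seq = seq
        if kind == "accepted":
            self.open_orders.add(order_id)
        else:
            self.open_orders.discard(order_id)
        self.to_client.append((kind, order_id))

    def new_order(self, order_id):
        result = self.engine.request(self.name, self.last_seen_seq, "new", order_id)
        if result is not None:
            self.apply(result)

    def cancel(self, order_id):
        if order_id not in self.open_orders:
            # The port already knows the order is gone: reject locally.
            self.to_client.append(("cancel_reject", order_id))
            return
        result = self.engine.request(self.name, self.last_seen_seq, "cancel", order_id)
        if result is not None:
            self.apply(result)
```

In this sketch, a cancel sent while an execution is still on its way to the port is silently dropped by the engine. Once the port applies the execution, a retried cancel finds the order closed and produces the Cancel Reject, so the client always sees the execution first.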
Conclusion
The key takeaways are that speed and determinism simplify the overall architecture. State Machine Replication is a robust technique that makes the system testable and repeatable, and the determinism it enforces is essential for auditability and regulatory compliance.