High Frequency Trading Project

Summary

The global financial markets process trillions of dollars daily, making microsecond-level order execution critical for high-frequency trading (HFT) profitability. Standard CPU-based trading engines often struggle with performance bottlenecks due to sequential execution and memory access limitations. This project addresses the need for ultra-low-latency, high-throughput computing to execute large volumes of trades in real time. By overcoming software limitations, the system provides a scalable solution that benefits modern financial institutions and traders relying on speed and reliability.

Technical Approach/Methodology

The system uses a hybrid hardware-software architecture. An FPGA-based Tensor Processing Unit (TPU) runs a lightweight neural network for intelligent order placement, and a novel FPGA-based MultiQueue BRAM architecture performs deterministic, parallel order matching. Meanwhile, the host computer manages over 100,000 orders using lock-free Red-Black Trees for cache-efficient, large-scale storage. The machine learning component uses a Proximal Policy Optimization (PPO) reinforcement learning agent to dynamically rank trades for caching in the FPGA's local memory.
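
To make the order-matching idea concrete, here is a minimal software model, not the project's FPGA design. The internals of the MultiQueue BRAM architecture are not described here, so the sketch assumes price-time priority with one FIFO queue per price level. The names `Order`, `MultiQueueBook`, and `submit` are hypothetical. Hardware would likely compare price levels in parallel, whereas this model finds the best level with a sequential `min`/`max`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    side: str   # "buy" or "sell"
    price: int  # price in integer ticks (assumed representation)
    qty: int


class MultiQueueBook:
    """Software model: one FIFO queue per price level, per side."""

    def __init__(self):
        self.bids = {}  # price -> deque of resting buy orders
        self.asks = {}  # price -> deque of resting sell orders

    def submit(self, order):
        """Match an incoming order, rest any remainder, and return the fills."""
        fills = []  # tuples of (resting_id, incoming_id, price, qty)
        if order.side == "buy":
            own, opposite, best = self.bids, self.asks, min
            crosses = lambda level: level <= order.price
        else:
            own, opposite, best = self.asks, self.bids, max
            crosses = lambda level: level >= order.price

        while order.qty > 0 and opposite:
            level = best(opposite)        # best opposing price level
            if not crosses(level):
                break                     # no more marketable prices
            queue = opposite[level]
            resting = queue[0]            # oldest order at this level
            traded = min(order.qty, resting.qty)
            fills.append((resting.order_id, order.order_id, level, traded))
            order.qty -= traded
            resting.qty -= traded
            if resting.qty == 0:
                queue.popleft()
                if not queue:
                    del opposite[level]   # drop the empty price level

        if order.qty > 0:
            # The unfilled remainder rests at the back of its price queue.
            own.setdefault(order.price, deque()).append(order)
        return fills


book = MultiQueueBook()
book.submit(Order(1, "sell", 101, 5))
book.submit(Order(2, "sell", 100, 3))
fills = book.submit(Order(3, "buy", 101, 6))
```

In this example the incoming buy order first takes the cheaper ask at 100 and then part of the ask at 101, leaving two units resting at 101. Keeping a separate queue per price level makes time priority within a level a simple FIFO pop, which is the property that maps naturally onto independent on-chip memories.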

Outcomes

Our team built a complete end-to-end high-frequency trading system with a real-time web-based user terminal. Key accomplishments include a trade matching engine that achieves a throughput of 75 million orders per second with a deterministic 2-cycle matching latency. The hardware-accelerated TPU pipeline evaluates order priority in 20 nanoseconds at 200 MHz and achieved 63% order capture accuracy during mock market backtesting. Deliverables include the hardware-accelerated trade matcher, the TPU decision engine, the optimized software backend, and a live React-based UI that tracks order books and market depth.
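
The timing figures above can be sanity-checked with simple arithmetic. The sketch below converts the 20 ns TPU latency into clock cycles at the stated 200 MHz clock, and converts the 75 million orders per second throughput into an average interval between orders. The function names are illustrative. The matching engine's own clock frequency is not stated here, so its 2-cycle latency is not converted to nanoseconds.

```python
def cycles_for_latency(latency_ns: float, clock_mhz: float) -> float:
    """Number of clock cycles that elapse in latency_ns at clock_mhz."""
    return latency_ns * clock_mhz / 1000.0


def interval_ns(orders_per_second: float) -> float:
    """Average time between orders, in nanoseconds, at a given throughput."""
    return 1e9 / orders_per_second


tpu_cycles = cycles_for_latency(20, 200)  # 20 ns at 200 MHz is 4 cycles
order_interval = interval_ns(75e6)        # roughly 13.3 ns per order
```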

Project Media

Project Poster