
System Architecture

Planned improvements to the data layer and application backend: TimescaleDB for OHLCV, SQS for background tasks, Redis sorted sets for worker tracking, and global rate limiting.

Triggers by user count

| User Count | Architecture Change | Business Value |
|---|---|---|
| 1K | Rate limiting middleware for all endpoints | Prevents per-user abuse beyond webhooks |
| 5K | SQS for email notifications and audit log writes | Durable delivery; no lost tasks on API restart |
| 10K | SQS for all background tasks; Redis sorted set for worker tracking | Remove O(N) key scan bottleneck; offload all async work from API hot path |
| Any | TimescaleDB for OHLCV candle storage | Shared historical data across users; enables in-app backtest product feature |

TimescaleDB for OHLCV Candles

The backtesting engine currently reads historical data from local .npz files. The planned migration moves that data into TimescaleDB:

graph TB
    subgraph current["Current"]
        CCXT["CCXT Fetch"] --> NPZ["Local .npz Files"]
        NPZ --> Backtest["Backtester"]
    end
    subgraph future["Planned"]
        Ingest["Ingestion Pipeline"] --> TS["TimescaleDB<br/>Hypertables"]
        TS --> API2["REST API"]
        API2 --> Backtest2["Backtester"]
        TS --> Resample["Continuous Aggregates<br/>(1m → 5m → 1h → 1d)"]
    end
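The resampling step in the planned diagram (1m → 5m → 1h → 1d) amounts to a per-bucket aggregation: first open, highest high, lowest low, last close, summed volume. The sketch below shows that logic in plain Python so the expected output of a continuous aggregate can be reasoned about or checked. The `Candle` type, the `resample` function, and the Unix-seconds timestamp convention are assumptions made for illustration, not names from the codebase.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candle:
    ts: int        # bucket start in Unix seconds (assumed convention)
    open: float
    high: float
    low: float
    close: float
    volume: float


def resample(candles: Iterable[Candle], bucket_seconds: int) -> list[Candle]:
    """Roll fine-grained candles up into coarser, fixed-width buckets."""
    buckets: dict[int, list[Candle]] = {}
    for c in sorted(candles, key=lambda c: c.ts):
        # Align each candle to the start of the bucket that contains it.
        start = c.ts - (c.ts % bucket_seconds)
        buckets.setdefault(start, []).append(c)
    return [
        Candle(
            ts=start,
            open=group[0].open,                   # first candle in the bucket
            high=max(c.high for c in group),
            low=min(c.low for c in group),
            close=group[-1].close,                # last candle in the bucket
            volume=sum(c.volume for c in group),
        )
        for start, group in sorted(buckets.items())
    ]
```

For example, `resample(one_minute_candles, 300)` turns 1m candles into 5m candles. In the planned design this work would be done inside the database, so a function like this is mainly useful as a reference for validating aggregate output.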

Key Features:

  • Hypertables: automatic time-based partitioning for OHLCV data
  • Compression: 10-20x compression for historical data
  • Continuous Aggregates: pre-computed resampling (1m → higher timeframes)
  • DataSource Protocol: swap DiskSource for TimescaleSource with zero backtester changes
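The "zero backtester changes" claim rests on the backtester depending only on an interface, not on a concrete storage class. A minimal sketch of that idea using `typing.Protocol` follows. The method name `load`, its parameters, the `Candle` fields, and the `InMemorySource` stand-in are all assumptions for illustration; the real DiskSource and TimescaleSource signatures are not shown in this document.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Candle:
    ts: int        # Unix seconds (assumed convention)
    open: float
    high: float
    low: float
    close: float
    volume: float


class DataSource(Protocol):
    """Hypothetical interface that every candle source would satisfy."""

    def load(self, symbol: str, timeframe: str, start: int, end: int) -> Sequence[Candle]:
        ...


class InMemorySource:
    """Stand-in for DiskSource or TimescaleSource; satisfies DataSource structurally."""

    def __init__(self, candles: dict[tuple[str, str], list[Candle]]) -> None:
        self._candles = candles

    def load(self, symbol: str, timeframe: str, start: int, end: int) -> Sequence[Candle]:
        series = self._candles.get((symbol, timeframe), [])
        return [c for c in series if start <= c.ts < end]


def total_return(source: DataSource, symbol: str, timeframe: str, start: int, end: int) -> float:
    """Toy 'backtest': close-to-close return over the requested range."""
    candles = source.load(symbol, timeframe, start, end)
    if len(candles) < 2:
        return 0.0
    return candles[-1].close / candles[0].close - 1.0
```

Because `total_return` only calls `source.load(...)`, any object with a matching method can be passed in, which is the property that would let a TimescaleDB-backed source replace the .npz-backed one without touching backtester code.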


Application (Backend) Improvements

SQS for All Background Tasks

| Task Type | Current | Future | Trigger |
|---|---|---|---|
| Order fill verification | SQS FIFO (deployed) | SQS FIFO | |
| Email notifications | BackgroundTasks (in-process) | SQS | 5K users |
| Webhook logging | BackgroundTasks | SQS | 10K users |
| Audit log writes | Synchronous | SQS | 10K users |

Why Migrate?

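FastAPI BackgroundTasks run in-process, so if the API task crashes or restarts, any pending background tasks are lost. SQS provides durable, at-least-once delivery, with a dead-letter queue (DLQ) for messages that repeatedly fail. Because delivery is at-least-once, consumers must tolerate duplicate messages.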
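A minimal sketch of what moving a task from BackgroundTasks to SQS could look like, assuming a boto3 SQS client is injected. Only `send_message(QueueUrl=..., MessageBody=...)` is a real boto3 call here; the message shape, the function names `enqueue_task` and `handle_message`, and the in-memory `seen` set (a stand-in for a durable deduplication store) are hypothetical.

```python
import json
import uuid
from typing import Any, Callable, Protocol


class SqsClient(Protocol):
    """The single boto3 SQS client method this sketch relies on."""

    def send_message(self, *, QueueUrl: str, MessageBody: str) -> dict[str, Any]:
        ...


def enqueue_task(sqs: SqsClient, queue_url: str, task_type: str, payload: dict[str, Any]) -> str:
    """Publish a task instead of running it in-process; returns the task id."""
    task_id = str(uuid.uuid4())
    body = json.dumps({"task_id": task_id, "type": task_type, "payload": payload})
    sqs.send_message(QueueUrl=queue_url, MessageBody=body)
    return task_id


def handle_message(
    body: str,
    seen: set[str],
    handlers: dict[str, Callable[[dict[str, Any]], None]],
) -> bool:
    """Process one message body; returns False when it is a duplicate.

    At-least-once delivery means the same message can arrive twice,
    so the consumer skips task ids it has already completed.
    """
    msg = json.loads(body)
    if msg["task_id"] in seen:
        return False
    handlers[msg["type"]](msg["payload"])
    seen.add(msg["task_id"])  # mark as done only after the handler succeeds
    return True
```

In production the client would come from `boto3.client("sqs")`, and the worker would delete each message from the queue only after `handle_message` returns, so that a crash mid-task leads to redelivery rather than loss.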

Redis Sorted Set for Worker Tracking

Replace the SCAN-based orphan detection with a Redis sorted set:

| Current | Future |
|---|---|
| SCAN all worker:active:* keys | ZADD worker:all {timestamp} {user_id} |
| O(N) full key scan | O(log N) sorted set operations |
| Slow at 10K+ keys | Cost grows logarithmically with worker count |
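A sketch of how the sorted set could drive orphan detection: each worker heartbeat writes the current timestamp as the member's score, and stale workers are found with a score-range query instead of a key scan. The key name `worker:all` and the ZADD usage come from the comparison above; the use of ZRANGEBYSCORE and ZREM, the function names, and the timeout parameter are assumptions. The calls match the redis-py client methods `zadd`, `zrangebyscore`, and `zrem`.

```python
from typing import Any, Protocol

WORKERS_KEY = "worker:all"  # key name from the comparison above


class RedisLike(Protocol):
    """Subset of the redis-py client interface used by this sketch."""

    def zadd(self, name: str, mapping: dict[str, float]) -> int: ...
    def zrangebyscore(self, name: str, min: Any, max: Any) -> list: ...
    def zrem(self, name: str, *values: str) -> int: ...


def heartbeat(r: RedisLike, user_id: str, now: float) -> None:
    """Record (or refresh) the last-seen timestamp for a user's worker."""
    r.zadd(WORKERS_KEY, {user_id: now})


def find_orphans(r: RedisLike, now: float, timeout_s: float) -> list:
    """Return workers whose last heartbeat is older than timeout_s.

    With a real redis-py client, members come back as bytes unless the
    client was created with decode_responses=True.
    """
    return list(r.zrangebyscore(WORKERS_KEY, "-inf", now - timeout_s))


def remove_worker(r: RedisLike, user_id: str) -> None:
    """Drop a worker from tracking once it has stopped or been cleaned up."""
    r.zrem(WORKERS_KEY, user_id)
```

The range query only touches entries at the stale end of the set, so its cost depends on how many orphans exist rather than on how many workers are tracked in total.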

Rate Limiting Middleware

Rate limiting is currently applied per user, and only on webhook endpoints. The plan is a global rate limiting middleware covering all endpoints:

  • Per-IP rate limiting (complement to WAF)
  • Per-user rate limiting across all endpoints
  • Configurable limits per endpoint group
  • Redis-based sliding window (already implemented for webhooks)
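To make the sliding-window idea and the per-group limits concrete, here is an in-memory sketch of a sliding-window-log limiter. It is a stand-in only: the existing webhook limiter is Redis-based and its implementation is not shown here, and the class name, the `limits` mapping, and the group names are hypothetical. The `identity` argument can be a user id or an IP address, which covers both the per-user and per-IP cases with one mechanism.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Sliding-window-log rate limiter keyed by (endpoint group, identity)."""

    def __init__(self, limits: dict[str, tuple[int, float]]) -> None:
        # endpoint group -> (max requests, window length in seconds)
        self._limits = limits
        self._hits: dict[tuple[str, str], deque] = defaultdict(deque)

    def allow(self, group: str, identity: str, now: Optional[float] = None) -> bool:
        """Return True and record the request if it fits within the window."""
        now = time.monotonic() if now is None else now
        max_requests, window = self._limits[group]
        hits = self._hits[(group, identity)]
        # Evict timestamps that have slid out of the window.
        while hits and hits[0] <= now - window:
            hits.popleft()
        if len(hits) >= max_requests:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

A middleware would call `allow(group, identity)` before dispatching a request and return HTTP 429 when it yields False. A Redis-backed version would keep the same timestamps in shared storage so that every API instance enforces the same limit.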
