System Architecture

Planned improvements to the data layer and application backend: TimescaleDB for OHLCV, SQS for background tasks, Redis sorted sets for worker tracking, and global rate limiting.

Triggers by user count

User Count	Architecture Change	Business Value
1K	Rate limiting middleware for all endpoints	Prevents per-user abuse beyond webhooks
5K	SQS for email notifications and audit log writes	Durable delivery; no lost tasks on API restart
10K	SQS for all background tasks; Redis sorted set for worker tracking	Remove O(N) key scan bottleneck; offload all async work from API hot path
Any	TimescaleDB for OHLCV candle storage	Shared historical data across users; enables in-app backtest product feature

TimescaleDB for OHLCV Candles

The backtesting engine currently uses local .npz files for historical data. The planned migration to TimescaleDB enables:

graph TB
    subgraph current["Current"]
        CCXT["CCXT Fetch"] --> NPZ["Local .npz Files"]
        NPZ --> Backtest["Backtester"]
    end
    subgraph future["Planned"]
        Ingest["Ingestion Pipeline"] --> TS["TimescaleDB<br/>Hypertables"]
        TS --> API2["REST API"]
        API2 --> Backtest2["Backtester"]
        TS --> Resample["Continuous Aggregates<br/>(1m → 5m → 1h → 1d)"]
    end

Key Features:

- Hypertables: Automatic time-based partitioning for OHLCV data
- Compression: 10-20x compression for historical data
- Continuous Aggregates: Pre-computed resampling (1m → higher timeframes)
- DataSource Protocol: Swap DiskSource for TimescaleSource with zero backtester changes
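The DataSource Protocol point can be illustrated with a minimal sketch. The names `Candle`, `DataSource`, `InMemorySource`, and `total_return` below are hypothetical and are not taken from the codebase; `InMemorySource` stands in for either DiskSource or TimescaleSource. The sketch only shows that code written against the protocol does not change when the source behind it is swapped.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Candle:
    ts: int        # candle open time, epoch seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


class DataSource(Protocol):
    """Anything that can return candles for a symbol and time range."""

    def load(self, symbol: str, start: int, end: int) -> Sequence[Candle]: ...


class InMemorySource:
    """Hypothetical stand-in for DiskSource or TimescaleSource."""

    def __init__(self, candles: dict[str, list[Candle]]) -> None:
        self._candles = candles

    def load(self, symbol: str, start: int, end: int) -> Sequence[Candle]:
        # Half-open range [start, end), ordered by timestamp.
        rows = self._candles.get(symbol, [])
        return sorted((c for c in rows if start <= c.ts < end), key=lambda c: c.ts)


def total_return(source: DataSource, symbol: str, start: int, end: int) -> float:
    """Toy 'backtest': buy at the first open, sell at the last close."""
    candles = source.load(symbol, start, end)
    if not candles:
        return 0.0
    return candles[-1].close / candles[0].open - 1.0
```

Because `total_return` depends only on `load`, a source backed by .npz files and one backed by a TimescaleDB query could be passed in interchangeably, provided both return candles in the same shape and order.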

Application (Backend) Improvements

SQS for All Background Tasks

Task Type	Current	Future	Trigger
Order fill verification	SQS FIFO (deployed)	SQS FIFO	—
Email notifications	BackgroundTasks (in-process)	SQS	5K users
Webhook logging	BackgroundTasks	SQS	10K users
Audit log writes	Synchronous	SQS	10K users

Why Migrate?

FastAPI BackgroundTasks run in-process, so pending background tasks are lost if the API task crashes or restarts. SQS provides durable, at-least-once delivery, with a dead-letter queue (DLQ) for messages that repeatedly fail.
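At-least-once delivery means a consumer may receive the same message more than once, so handlers generally need to be idempotent. The sketch below is a pure in-memory simulation, not AWS code: `drain`, `MAX_RECEIVES`, and the message tuples are hypothetical, and the retry threshold is only analogous to an SQS redrive policy. It shows deduplication by message ID and parking a repeatedly failing message in a dead-letter list.

```python
from collections import deque
from typing import Callable

MAX_RECEIVES = 3  # assumed retry threshold, analogous to an SQS maxReceiveCount


def drain(
    messages: list[tuple[str, str]],
    handler: Callable[[str], None],
) -> tuple[list[str], list[str]]:
    """Process (message_id, body) pairs with at-least-once semantics.

    Returns (processed_ids, dead_letter_ids).
    """
    queue = deque((msg_id, body, 0) for msg_id, body in messages)
    processed: set[str] = set()  # idempotency record; a DB table or Redis set in practice
    done: list[str] = []
    dead: list[str] = []

    while queue:
        msg_id, body, failures = queue.popleft()
        if msg_id in processed:
            continue  # duplicate delivery: acknowledge without re-running the handler
        try:
            handler(body)
        except Exception:
            failures += 1
            if failures >= MAX_RECEIVES:
                dead.append(msg_id)  # give up and park the message in the DLQ
            else:
                queue.append((msg_id, body, failures))  # make it available again
            continue
        processed.add(msg_id)
        done.append(msg_id)
    return done, dead
```

In a real deployment the idempotency record would need to outlive the consumer process; an in-process set like the one above would be lost on restart, just like the BackgroundTasks it replaces.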

Redis Sorted Set for Worker Tracking

Replace the SCAN-based orphan detection with a Redis sorted set:

Current	Future
`SCAN` all `worker:active:*` keys	`ZADD worker:all {timestamp} {user_id}`
O(N) full key scan	O(log N) per `ZADD`; O(log N + M) to read M stale entries
Slow at 10K+ keys	Cost grows with the number of stale entries returned, not total keys
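A minimal sketch of how the sorted set could drive orphan detection, assuming each worker writes its heartbeat time as the score for its `user_id`. The helper names, the `STALE_AFTER` timeout, and `FakeSortedSetClient` are hypothetical; the helpers call only `zadd`, `zrangebyscore`, and `zrem` in the form redis-py exposes them, and the fake client exists so the example runs without a Redis server.

```python
import time
from typing import Optional

WORKERS_KEY = "worker:all"  # key name from the ZADD example in the table
STALE_AFTER = 60.0          # assumed heartbeat timeout in seconds


def heartbeat(client, user_id: str, now: Optional[float] = None) -> None:
    """Record the worker's latest heartbeat time as its sorted-set score."""
    ts = time.time() if now is None else now
    client.zadd(WORKERS_KEY, {user_id: ts})


def find_orphans(client, now: Optional[float] = None) -> list[str]:
    """Return workers whose last heartbeat is older than STALE_AFTER."""
    ts = time.time() if now is None else now
    return list(client.zrangebyscore(WORKERS_KEY, "-inf", ts - STALE_AFTER))


def remove_worker(client, user_id: str) -> None:
    """Drop a worker from tracking once it has stopped or been cleaned up."""
    client.zrem(WORKERS_KEY, user_id)


class FakeSortedSetClient:
    """In-memory stand-in for the three client methods used above."""

    def __init__(self) -> None:
        self._sets: dict[str, dict[str, float]] = {}

    def zadd(self, name, mapping):
        self._sets.setdefault(name, {}).update(mapping)

    def zrangebyscore(self, name, min_score, max_score):
        lo, hi = float(min_score), float(max_score)  # float("-inf") is valid
        members = self._sets.get(name, {})
        ordered = sorted(members.items(), key=lambda kv: (kv[1], kv[0]))
        return [member for member, score in ordered if lo <= score <= hi]

    def zrem(self, name, *values):
        for value in values:
            self._sets.get(name, {}).pop(value, None)
```

With a real redis-py client, members come back as bytes unless the client is created with `decode_responses=True`, so the return type of `find_orphans` would depend on that setting.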

Rate Limiting Middleware

Rate limiting is currently applied per user at the webhook level only. The plan is a global rate limiting middleware covering all endpoints:

Per-IP rate limiting (complement to WAF)
Per-user rate limiting across all endpoints
Configurable limits per endpoint group
Redis-based sliding window (already implemented for webhooks)
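The sliding-window idea can be sketched in memory as follows. This is an illustration of the algorithm only, not the existing Redis-based webhook limiter: `SlidingWindowLimiter`, `LIMITERS`, `check`, and the specific limits are hypothetical, and a multi-instance deployment would keep the timestamps in Redis rather than in process memory.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        ts = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= ts - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(ts)
        return True


# Hypothetical per-endpoint-group limits; the numbers are placeholders.
LIMITERS = {
    "webhooks": SlidingWindowLimiter(limit=60, window=60.0),
    "default": SlidingWindowLimiter(limit=300, window=60.0),
}


def check(group: str, identity: str, now: Optional[float] = None) -> bool:
    """Check a request for an endpoint group; identity is a user ID or an IP."""
    limiter = LIMITERS.get(group, LIMITERS["default"])
    return limiter.allow(f"{group}:{identity}", now)
```

A middleware could call `check` twice per request, once with the client IP and once with the authenticated user ID, and reject the request with HTTP 429 if either call returns False.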
