System Architecture
Planned improvements to the data layer and application backend: TimescaleDB for OHLCV, SQS for background tasks, Redis sorted sets for worker tracking, and global rate limiting.
Triggers by user count
| User Count | Architecture Change | Business Value |
|---|---|---|
| 1K | Rate limiting middleware for all endpoints | Prevents per-user abuse beyond webhooks |
| 5K | SQS for email notifications and audit log writes | Durable delivery; no lost tasks on API restart |
| 10K | SQS for all background tasks; Redis sorted set for worker tracking | Remove O(N) key scan bottleneck; offload all async work from API hot path |
| Any | TimescaleDB for OHLCV candle storage | Shared historical data across users; enables in-app backtest product feature |
TimescaleDB for OHLCV Candles
The backtesting engine currently uses local .npz files for historical data. The planned migration to TimescaleDB moves this data into a shared database and relies on the following features.
Key Features:
- Hypertables: Automatic time-based partitioning for OHLCV data
- Compression: 10-20x size reduction expected for historical data
- Continuous Aggregates: Pre-computed resampling (1m → higher timeframes)
- DataSource Protocol: Swap DiskSource for TimescaleSource — zero backtester changes
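The DataSource idea in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the project's actual interface: the `Candle` fields, the `load` method, the `candles` table name, and the injected `run_query` callable are all assumptions. The `DiskSource` here is an in-memory stand-in for the real .npz reader.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Protocol, Sequence


@dataclass(frozen=True)
class Candle:
    ts: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float


class DataSource(Protocol):
    """Anything that can return candles for a symbol and a half-open time range."""

    def load(self, symbol: str, start: datetime, end: datetime) -> Sequence[Candle]: ...


class DiskSource:
    """Stand-in for the file-backed source (the real one would read .npz files)."""

    def __init__(self, candles_by_symbol: Dict[str, List[Candle]]) -> None:
        self._data = candles_by_symbol

    def load(self, symbol: str, start: datetime, end: datetime) -> Sequence[Candle]:
        return [c for c in self._data.get(symbol, []) if start <= c.ts < end]


class TimescaleSource:
    """Hypothetical database-backed source.

    `run_query` is injected so this sketch needs no database driver;
    the table and column names are assumptions.
    """

    SQL = (
        "SELECT ts, open, high, low, close, volume FROM candles "
        "WHERE symbol = %s AND ts >= %s AND ts < %s ORDER BY ts"
    )

    def __init__(self, run_query: Callable[[str, tuple], Iterable[tuple]]) -> None:
        self._run_query = run_query

    def load(self, symbol: str, start: datetime, end: datetime) -> Sequence[Candle]:
        rows = self._run_query(self.SQL, (symbol, start, end))
        return [Candle(*row) for row in rows]


def total_return(source: DataSource, symbol: str, start: datetime, end: datetime) -> float:
    """Toy buy-and-hold 'backtest' that depends only on the DataSource protocol."""
    candles = source.load(symbol, start, end)
    if not candles:
        return 0.0
    return candles[-1].close / candles[0].open - 1.0
```

Because `total_return` only calls `load`, passing a `TimescaleSource` instead of a `DiskSource` requires no change to the calling code, which is the property the bullet describes.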
Application (Backend) Improvements
SQS for All Background Tasks
| Task Type | Current | Future | Trigger |
|---|---|---|---|
| Order fill verification | SQS FIFO (deployed) | SQS FIFO | — |
| Email notifications | BackgroundTasks (in-process) | SQS | 5K users |
| Webhook logging | BackgroundTasks | SQS | 10K users |
| Audit log writes | Synchronous | SQS | 10K users |
Why Migrate?
FastAPI BackgroundTasks run in-process: if the API task crashes or restarts, any pending background tasks are lost. SQS provides durable, at-least-once delivery, with a dead-letter queue (DLQ) for messages that repeatedly fail.
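At-least-once delivery means a consumer may see the same message twice, so handlers should be idempotent. The sketch below illustrates one way to do that. The message envelope (`task_id`, `task_type`, `payload`), the function name `build_send_kwargs`, and the `IdempotentConsumer` class are all hypothetical, and the in-memory `_seen` set stands in for a shared store.

```python
import json
import uuid
from typing import Any, Callable, Dict, Set


def build_send_kwargs(queue_url: str, task_type: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for an SQS send_message call on a standard queue."""
    body = {"task_id": str(uuid.uuid4()), "task_type": task_type, "payload": payload}
    return {"QueueUrl": queue_url, "MessageBody": json.dumps(body)}


class IdempotentConsumer:
    """Runs each task at most once per task_id, even if the message is redelivered."""

    def __init__(self, handlers: Dict[str, Callable[[Dict[str, Any]], None]]) -> None:
        self._handlers = handlers
        # Assumption: in a real deployment this set would live in Redis or the database.
        self._seen: Set[str] = set()

    def handle(self, message_body: str) -> bool:
        """Return True if the task ran, False if it was a duplicate delivery."""
        task = json.loads(message_body)
        if task["task_id"] in self._seen:
            return False
        # If the handler raises, the task_id is not recorded, so the message can be
        # retried and eventually routed to the DLQ by the queue's redrive policy.
        self._handlers[task["task_type"]](task["payload"])
        self._seen.add(task["task_id"])
        return True
```

With boto3, the producer side would look like `boto3.client("sqs").send_message(**build_send_kwargs(...))`. Whether the project uses boto3 directly is an assumption.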
Redis Sorted Set for Worker Tracking
Replace the SCAN pattern currently used for orphan detection with a Redis sorted set:
| Current | Future |
|---|---|
| SCAN all worker:active:* keys | ZADD worker:all {timestamp} {user_id} |
| O(N) full key scan | O(log N) sorted set operations |
| Slow at 10K+ keys | Fast at any scale |
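A rough sketch of how the sorted set could drive orphan detection, assuming the score is the worker's last heartbeat time and `r` is a redis-py style client exposing `zadd`, `zrangebyscore`, and `zrem`. The function names and the `max_age_seconds` threshold are hypothetical.

```python
import time
from typing import Any, List, Optional

WORKERS_KEY = "worker:all"  # key name from the ZADD example in this section


def heartbeat(r: Any, user_id: str, now: Optional[float] = None) -> None:
    """Record or refresh a worker's last-seen timestamp as its sorted-set score."""
    ts = time.time() if now is None else now
    r.zadd(WORKERS_KEY, {user_id: ts})


def find_orphans(r: Any, max_age_seconds: float, now: Optional[float] = None) -> List[Any]:
    """Return members whose last heartbeat is older than max_age_seconds.

    A range query by score touches only the stale entries rather than every key.
    With redis-py, members come back as bytes unless decode_responses=True.
    """
    ts = time.time() if now is None else now
    return r.zrangebyscore(WORKERS_KEY, "-inf", ts - max_age_seconds)


def remove_worker(r: Any, user_id: str) -> None:
    """Drop a worker from tracking, for example after a clean shutdown."""
    r.zrem(WORKERS_KEY, user_id)
```

A periodic job could call `find_orphans(r, max_age_seconds=60)` and clean up each returned worker; the 60-second threshold is only an example value.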
Rate Limiting Middleware
Rate limiting is currently applied per user at the webhook level only. The plan is a global rate limiting middleware that covers all endpoints:
- Per-IP rate limiting (complement to WAF)
- Per-user rate limiting across all endpoints
- Configurable limits per endpoint group
- Redis-based sliding window (already implemented for webhooks)
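The sliding-window logic can be illustrated with a small in-memory sketch. This is not the existing webhook implementation, which is Redis-based and not shown here. The class name, the key format, and the injected `clock` are assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window log: allow at most `limit` hits per key in any `window_seconds` span."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Return True and record the hit if the key is under its limit."""
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Keys such as `"ip:203.0.113.7"` or `"user:42:orders"` would give per-IP and per-user, per-endpoint-group limits respectively. A Redis-backed version would typically keep the same timestamps in a sorted set per key so that all API instances share one view.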