Worker Isolation System¶
The defining architectural decision of 4pass is per-user worker isolation: every active user gets a dedicated ECS container running their broker session. This is not a convenience — it is a requirement driven by four hard constraints.
Business takeaway
Per-user isolation is non-negotiable for broker API session binding and fault containment. Pool pre-warming (897ms claim latency) makes this model economically viable at scale — without it, cold-start latency of 45-60 seconds would be unacceptable for a trading platform. The result: users get dedicated, fault-isolated broker sessions with sub-second readiness at a variable cost of $1.61/month per user.
Why Per-User Isolation¶
| Constraint | Explanation |
|---|---|
| Broker API Binding | Some broker SDKs (notably Shioaji) bind sessions to the originating IP or process. Sharing a process across users would break session affinity. |
| Credential Security | Each worker loads exactly one user's decrypted broker credentials. No worker ever has access to another user's secrets, even in the event of a memory dump or core dump. |
| Fault Isolation | A broker SDK crash, OOM, or hung connection kills only that user's worker. All other users are unaffected. The orchestrator restarts the failed worker within 60 seconds. |
| Resource Isolation | Memory-hungry broker sessions are capped at 1024 MB hard limit. A runaway process triggers an OOM kill on the container only — the EC2 host and its other 59 workers continue uninterrupted. |
Non-Negotiable
Shared-worker architectures fail here. A single Shioaji session consuming 800 MB would starve other users. A broker API timeout in one session would block the event loop for everyone. Per-user isolation is the only viable model.
Worker Lifecycle¶
Lifecycle Timings¶
| State Transition | Duration | Mechanism |
|---|---|---|
| Pool idle → Claimed | ~332ms | SQS pool-claim message received |
| Claimed → Active | 1–5s | Credential decryption + broker handshake |
| Active → Processing | <10ms | Redis BLPOP on request queue |
| Processing → Active | 50–500ms | Broker API call + response write |
| Active → Idle timeout | 30 min | No orders or control messages |
| Health check interval | 60s | Broker session ping |
| Heartbeat interval | 5s | Redis SET with 30s TTL |
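The state transitions above can be sketched as a small transition table. This is an illustrative model only — the state names and the `transition` helper are hypothetical, not identifiers from the 4pass codebase:

```python
from enum import Enum

class WorkerState(Enum):
    POOL_IDLE = "pool_idle"      # pre-warmed, USER_ID=-1
    CLAIMED = "claimed"          # claim message received
    ACTIVE = "active"            # broker session established
    PROCESSING = "processing"    # handling an order request
    STOPPED = "stopped"          # idle timeout / shutdown

# Legal moves, taken from the lifecycle timings table above.
TRANSITIONS = {
    WorkerState.POOL_IDLE: {WorkerState.CLAIMED},                        # SQS pool-claim message
    WorkerState.CLAIMED: {WorkerState.ACTIVE},                           # decrypt creds + broker handshake
    WorkerState.ACTIVE: {WorkerState.PROCESSING, WorkerState.STOPPED},   # BLPOP hit / 30 min idle
    WorkerState.PROCESSING: {WorkerState.ACTIVE},                        # broker call + response write done
}

def transition(current: WorkerState, new: WorkerState) -> WorkerState:
    """Return the new state, or raise if the table forbids the move."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Encoding the transitions as data makes illegal moves (e.g. claiming an already-claimed worker) fail loudly instead of silently corrupting state.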
Pool Pre-Warming¶
The pool is the key to sub-second worker readiness. Pre-warmed workers sit in the ECS cluster with USER_ID=-1, running a stripped-down event loop that does nothing but wait for a claim message on the SQS pool-claim queue.
How It Works¶
- `pool_manager` Lambda runs every 5 minutes via EventBridge
- Counts current pool workers in Redis (key pattern `worker:active:-1:*` or metadata scan)
- Compares the count to the target pool size (a Terraform variable)
- Launches or terminates workers to match the target
Claim Flow¶
- API determines the user needs a worker → sends to `worker-control.fifo`
- `worker_control` Lambda checks Redis for pool availability
- Lambda sends a claim message to the `pool-claim` queue: `{ user_id, encrypted_credentials }`
- Pool worker's `BLPOP` picks up the message
- Worker transitions from `USER_ID=-1` to `USER_ID={claimed_user}`
- Worker decrypts credentials, establishes broker connection
- Worker sets `worker:active:{user_id}` in Redis with 30s TTL
- Worker begins heartbeat loop (5s interval)
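The worker-side half of the claim flow can be sketched as follows. The `PoolWorker` class and `handle_claim` method are hypothetical names, and credential decryption plus the broker handshake are stubbed out so the transition logic runs standalone:

```python
POOL_USER_ID = -1  # sentinel user id for pre-warmed pool workers

class PoolWorker:
    """Minimal sketch of the claim transition (illustrative, not the
    actual 4pass worker class)."""

    def __init__(self):
        self.user_id = POOL_USER_ID  # pool workers start as USER_ID=-1

    def handle_claim(self, message: dict, decrypt=lambda blob: blob) -> int:
        if self.user_id != POOL_USER_ID:
            raise RuntimeError("worker already claimed")
        credentials = decrypt(message["encrypted_credentials"])
        # ...establish broker connection with `credentials`...
        self.user_id = message["user_id"]  # USER_ID=-1 -> USER_ID={claimed_user}
        # ...set worker:active:{user_id} in Redis with 30s TTL,
        #    then start the 5s heartbeat loop...
        return self.user_id
```

Rejecting a second claim guards against the duplicate-worker race described in the anomaly table later in this page.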
Benchmark Comparison¶
| Path | Step | Latency |
|---|---|---|
| Pool Claim | SQS → Lambda invocation | 565ms |
| | Lambda → pool claim → worker ready | 332ms |
| | **Total** | **897ms** |
| RunTask (cold) | SQS → Lambda invocation | 565ms |
| | Lambda → ECS RunTask → task running | 3,103ms |
| | **Total** | **3,659ms** |
| Cold EC2 Start | No instance available → ASG launch | 45,000–60,000ms |
| | + ECS task scheduling | +3,000ms |
| | **Total** | **48,000–63,000ms** |
4× Faster with Pool
Pool pre-warming reduces user-perceived latency from 3.6 seconds to under 1 second. For a trading platform where seconds matter, this is the difference between executing at the intended price and missing the move.
Redis Queue Communication¶
All communication between the API and workers flows through Redis lists using a request/response pattern. There is no direct network connection between the API process and any worker.
Key Patterns¶
| Key | Type | TTL | Operations |
|---|---|---|---|
| `trading:user:{user_id}:requests` | List | None | API: LPUSH / Worker: BLPOP (blocking, 30s timeout) |
| `trading:response:{request_id}` | String | 60s | Worker: SET / API: GET with polling |
| `worker:active:{user_id}` | String | 30s | Worker: SET every 5s / API & Lambda: GET |
| `worker:metadata:{user_id}` | Hash | 30s | Worker: HSET (task_arn, instance_id, launched_at) |
| `worker:control:{user_id}:messages` | List | None | API: LPUSH / Worker: LPOP (non-blocking check) |
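The request/response round trip can be simulated end to end. The sketch below substitutes an in-memory `MiniRedis` (a dict plus `deque`) for redis-py so it runs without a Redis server; key names follow the table above, while the payload fields (`request_id`, `action`) are illustrative:

```python
import json
from collections import deque

class MiniRedis:
    """Tiny in-memory stand-in for the Redis operations the pattern uses."""
    def __init__(self):
        self.lists = {}
        self.strings = {}
    def lpush(self, key, value):
        self.lists.setdefault(key, deque()).appendleft(value)
    def blpop(self, key):
        q = self.lists.get(key)
        return q.popleft() if q else None  # real BLPOP blocks up to 30s
    def set(self, key, value, ex=None):
        self.strings[key] = value          # `ex` TTL ignored in this sketch
    def get(self, key):
        return self.strings.get(key)

r = MiniRedis()

# API side: enqueue a request under the user's request list.
request = {"request_id": "req-1", "action": "place_order"}
r.lpush("trading:user:42:requests", json.dumps(request))

# Worker side: pop the request, perform the broker call, write the response.
msg = json.loads(r.blpop("trading:user:42:requests"))
r.set(f"trading:response:{msg['request_id']}",
      json.dumps({"status": "filled"}), ex=60)

# API side: poll the response key by request_id.
response = json.loads(r.get("trading:response:req-1"))
```

Because the API only ever polls `trading:response:{request_id}`, the worker and API never need a direct network connection, exactly as the section describes.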
Heartbeat Mechanism¶
The heartbeat is the foundation of the worker liveness protocol:
- Worker calls `SET worker:active:{user_id} {timestamp} EX 30` every 5 seconds
- If the worker dies, the key expires in 30 seconds (6 missed heartbeats)
- API checks the key before routing orders — if absent, triggers worker start
- Maintenance Lambda scans all `worker:active:*` keys to detect orphans
The 30s TTL with 5s refresh gives a 25-second buffer — enough to survive brief Redis connectivity blips without falsely declaring a worker dead.
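The liveness arithmetic is simple enough to state as code. A minimal sketch — the helper names are illustrative, and in production the check is implicit in whether the Redis key still exists:

```python
HEARTBEAT_INTERVAL_S = 5   # worker refresh cadence
HEARTBEAT_TTL_S = 30       # EX value on the SET

def is_alive(last_heartbeat: float, now: float) -> bool:
    """Live while the key would still exist in Redis, i.e. within
    TTL seconds of the last SET ... EX 30."""
    return (now - last_heartbeat) < HEARTBEAT_TTL_S

def missed_heartbeats(last_heartbeat: float, now: float) -> int:
    """How many 5s refresh windows have elapsed without a SET."""
    return int((now - last_heartbeat) // HEARTBEAT_INTERVAL_S)
```

At expiry the worker has missed 30 / 5 = 6 consecutive refreshes, matching the "6 missed heartbeats" figure above.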
Connection Management¶
Each worker maintains broker connections lazily — connections are created on first use and cached for the session lifetime.
Connection Key¶
Connections are keyed on a 4-tuple: (broker_name, broker_type, simulation, account_id). A user with both a Shioaji simulation account and a Shioaji live account gets two separate broker sessions.
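A lazy cache keyed on that 4-tuple can be sketched as below. The `ConnectionCache` class name and `factory` callback are illustrative; the real worker's factory would authenticate against the broker SDK:

```python
from typing import Callable, Dict, Tuple

# (broker_name, broker_type, simulation, account_id)
ConnKey = Tuple[str, str, bool, str]

class ConnectionCache:
    """Lazy per-account connection cache (illustrative sketch)."""

    def __init__(self, factory: Callable[[ConnKey], object]):
        self._factory = factory
        self._conns: Dict[ConnKey, object] = {}

    def get(self, key: ConnKey) -> object:
        # Create on first use, reuse for the rest of the session.
        if key not in self._conns:
            self._conns[key] = self._factory(key)
        return self._conns[key]

    def invalidate(self, key: ConnKey) -> None:
        # On connection error: drop it so the next order reconnects.
        self._conns.pop(key, None)
```

Because `simulation` is part of the key, a user's Shioaji simulation and live accounts resolve to two distinct cached sessions, as described above.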
Connection Lifecycle¶
| Event | Action |
|---|---|
| First order for account | Create connection, authenticate with broker |
| Subsequent orders | Reuse cached connection |
| Connection error | Mark invalid, auto-reconnect on next order |
| Health check (60s) | Ping broker API, verify session alive |
| Idle timeout (30m) | Close all connections, shut down worker |
| Credential reload | Close affected connection, reconnect with new credentials |
Health Check¶
Every 60 seconds during idle periods, the worker pings each active broker connection:
- If healthy: no action
- If unhealthy: close connection, mark for reconnect on next order
- If broker API unreachable: log warning, retry on next check
This catches stale sessions before they cause order failures.
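The three outcomes map to actions mechanically, which a short sketch makes concrete. `run_health_check` and the action labels are illustrative names; `ping` stands in for whatever session-verification call the broker SDK offers:

```python
def run_health_check(connections: dict, ping) -> dict:
    """Sketch of the 60s idle-period health check.

    `ping(conn)` returns True (healthy), False (unhealthy),
    or raises (broker API unreachable).
    """
    actions = {}
    for key, conn in list(connections.items()):
        try:
            healthy = ping(conn)
        except Exception:
            actions[key] = "retry_next_check"        # log warning, try again in 60s
            continue
        if healthy:
            actions[key] = "ok"                      # no action
        else:
            connections.pop(key)                     # close the stale session
            actions[key] = "reconnect_on_next_order" # lazy reconnect
    return actions
```

Note the asymmetry: an unhealthy session is closed immediately, while an unreachable broker API leaves the connection in place so a transient outage does not force a needless re-login.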
Control Messages¶
Workers monitor a control message queue in addition to the order queue.
Currently supported control messages:
| Message | Action |
|---|---|
| `reload_credentials` | Close broker connections, re-fetch and decrypt credentials from database, reconnect. No worker restart needed. |
| `shutdown` | Graceful shutdown — close all connections, stop heartbeat, exit. |
This enables credential rotation without service interruption. When a user updates their broker password in the dashboard, the API pushes a reload_credentials message, and the worker picks it up within seconds.
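The worker's non-blocking control check can be sketched as a dispatch loop. Here a `deque` stands in for the Redis list (`popleft` playing the role of `LPOP`), and the `worker` object's method names are illustrative:

```python
from collections import deque

def check_control_messages(queue: deque, worker) -> None:
    """Drain pending control messages without blocking.

    The real worker does an LPOP on worker:control:{user_id}:messages
    between order requests; the deque here is a runnable stand-in.
    """
    while queue:
        msg = queue.popleft()  # LPOP equivalent: non-blocking pop
        if msg == "reload_credentials":
            worker.reload_credentials()  # reconnect in place, no restart
        elif msg == "shutdown":
            worker.shutdown()            # close connections, stop heartbeat
            break                        # no point draining after shutdown
```

Processing stops at `shutdown` so that messages queued behind it are never acted on by a worker that is supposed to be exiting.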
Orphan Detection & Recovery¶
The maintenance system runs continuously to detect and clean up inconsistencies between Redis state and ECS reality.
Anomaly Types¶
| Anomaly | Detection | Resolution | Cause |
|---|---|---|---|
| Orphan Mark | Redis key exists, no matching ECS task | Delete Redis key | Task crashed without cleanup |
| Orphan Task | ECS task running, no Redis key | Stop ECS task | Redis key expired (network partition) |
| Stale Mark | Redis key TTL expired but not deleted | Clean up associated keys | Worker froze (deadlock, OOM) |
| Duplicate Workers | Two tasks for same user_id | Stop older task | Race condition in claim flow |
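The first two anomaly types are pure set differences between Redis marks and ECS tasks. A minimal sketch under that simplification — stale marks and duplicates need TTL and per-user task data, so they are omitted, and `classify_anomalies` is an illustrative name:

```python
def classify_anomalies(redis_marks: set, ecs_tasks: set) -> dict:
    """Compare the set of user_ids marked in Redis against the set
    of user_ids with a running ECS task."""
    return {
        # Key exists, no task: task crashed without cleanup -> delete key.
        "orphan_marks": redis_marks - ecs_tasks,
        # Task runs, no key: key expired (network partition) -> stop task.
        "orphan_tasks": ecs_tasks - redis_marks,
    }
```

The symmetry is the point: either side of the Redis/ECS pair can be the stale one, and each direction has a distinct cleanup action.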
Fan-Out Architecture¶
The maintenance function uses a fan-out pattern for parallel processing:
- Coordinator (`maintenance`) scans all Redis marks and ECS tasks in one pass
- Partitions anomalies into batches of 100
- Invokes a `maintenance_worker` Lambda for each batch (async `InvokeAsync`)
- Each worker processes its batch independently
- Results published to CloudWatch custom metrics
This scales linearly: at 10,000 active workers, the coordinator invokes ~100 maintenance_worker Lambdas that process in parallel, completing the full scan in under 5 seconds.
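The partitioning step is plain list chunking. A minimal sketch (the function name `partition` is illustrative; the coordinator would pass each batch as the payload of one `maintenance_worker` invocation):

```python
def partition(items: list, batch_size: int = 100) -> list:
    """Split the anomaly list into batches of at most `batch_size`;
    the coordinator fans out one Lambda invocation per batch."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

At 10,000 items this yields exactly 100 batches, matching the fan-out figure above.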
Resource Allocation¶
| Resource | Value | Notes |
|---|---|---|
| CPU | 64 units | 3.125% of one vCPU. Sufficient — workers are I/O-bound (waiting on broker API responses). |
| Memory (soft limit) | 384 MB | ECS scheduling target. Normal working set is 150–250 MB. |
| Memory (hard limit) | 1024 MB | OOM kill threshold. Broker SDK memory leaks are caught here. |
| ECS task count per instance | 60 | Conservative cap (could theoretically fit 84 by soft limit). |
OOM Kill Behavior
When a container exceeds 1024 MB, the Linux kernel's OOM killer terminates the container process. The EC2 instance is unaffected — all other containers continue running. The ECS agent detects the stopped task, the maintenance Lambda detects the missing heartbeat within 60 seconds, and the orchestrator launches a replacement. The user experiences a brief interruption (< 2 minutes) but no data loss — all order state is in Redis and PostgreSQL.