Infrastructure

Planned infrastructure improvements: compute (Graviton, Reserved Instances), database (Aurora Serverless v2), cache (Valkey cluster), and cross-region disaster recovery.

Triggers by user count

| User Count | Infrastructure Change | Estimated Savings |
|---|---|---|
| 1K | Reserved Instances for always-on baseline | ~37% on 2-3 instances |
| 5K | 1-year RI for peak capacity | ~$680/month |
| 10K | Graviton (ARM64), deployed | ~20% on all EC2 |
| 50K | Provisioned Valkey cluster (3-6 shards); 3-year RI + Savings Plans | ~57% on all EC2 |
| 50K+ | Cross-region disaster recovery (Aurora Global Database, Route 53 failover) | HA, not cost savings |
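
As a rough illustration, the triggers above can be expressed as a lookup that returns every change already due at a given user count. This is a minimal sketch: the function name `changes_due` is hypothetical, and treating both the "50K" and "50K+" rows as a 50,000-user threshold is an assumption.

```python
# Thresholds and change descriptions are copied from the trigger table.
# Assumption: the "50K" and "50K+" rows both apply from 50,000 users upward.
TRIGGERS = [
    (1_000, "Reserved Instances for always-on baseline"),
    (5_000, "1-year RI for peak capacity"),
    (10_000, "Graviton (ARM64)"),
    (50_000, "Provisioned Valkey cluster (3-6 shards); 3-year RI + Savings Plans"),
    (50_000, "Cross-region disaster recovery"),
]


def changes_due(user_count: int) -> list[str]:
    """Return every change whose user-count trigger has been reached."""
    return [change for threshold, change in TRIGGERS if user_count >= threshold]


if __name__ == "__main__":
    for change in changes_due(12_000):
        print(change)
```

Keeping the thresholds in one list means the table and any alerting built on it can be updated in a single place.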

Graviton (r7g / m7g) — Deployed

| Current | Target | Savings | Trigger |
|---|---|---|---|
| r7g.xlarge ($0.258/hr) | ✅ Deployed | ~20% vs Intel | Production |
| m7g.large ($0.102/hr) | ✅ Deployed | ~16% vs Intel | Production |

Migration Complete

Graviton (ARM64) migration is complete. Production runs on r7g.xlarge (ECS workers) and m7g.large (API) with ~20% and ~16% cost savings vs Intel equivalents.


Reserved Instances / Savings Plans

| Tier | Strategy | Estimated Savings |
|---|---|---|
| 1K users | 1-year RI for always-on baseline | ~37% on 2-3 instances |
| 5K users | 1-year RI for peak capacity | ~$680/month saved |
| 50K users | 3-year RI + Savings Plans | ~57% on all EC2 |
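
To make the percentages concrete, the arithmetic behind a discount estimate can be sketched as below. This is an illustration only: it assumes the always-on baseline is three r7g.xlarge instances at the $0.258/hr rate quoted on this page and a 730-hour month, neither of which the tier table states, and `monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(hourly_rate: float, instances: int, discount: float = 0.0) -> float:
    """Monthly cost of a fleet after applying a fractional discount (0.37 = 37%)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return hourly_rate * instances * HOURS_PER_MONTH * (1.0 - discount)


if __name__ == "__main__":
    # Assumption: baseline of 3 x r7g.xlarge at the on-demand rate from this page.
    on_demand = monthly_cost(0.258, 3)
    reserved = monthly_cost(0.258, 3, discount=0.37)
    print(f"on-demand: ${on_demand:.2f}/month")
    print(f"1-year RI: ${reserved:.2f}/month")
    print(f"saved:     ${on_demand - reserved:.2f}/month")
```

The same function covers the 50K tier by passing `discount=0.57` and the full EC2 instance count.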

Aurora Serverless v2

Replace instance-based RDS with Aurora Serverless v2 for automatic scaling:

graph LR
    subgraph current["Current (Instance-Based)"]
        RDS1["db.t3.large<br/>Fixed capacity<br/>Manual scaling"]
    end
    subgraph future["Future (Serverless v2)"]
        RDS2["Aurora Serverless v2<br/>0.5 - 128 ACU<br/>Auto-scaling"]
    end
    current -->|"Migration at 1K users"| future

Benefits:

  • Auto-scales ACUs (Aurora Capacity Units) with load
  • Scales down to the 0.5 ACU minimum during off-hours (cost savings)
  • No manual instance type changes
  • Seamless Multi-AZ failover

Trade-off: at sustained high load, Serverless v2 costs more per unit of capacity than an instance covered by Reserved Instances.
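
The trade-off can be framed as a break-even point: the average ACU level above which serverless costs more per hour than a reserved instance. The sketch below is a simplified model that ignores storage, I/O, and capacity differences. The prices are placeholders chosen for round numbers, not AWS quotes, and both function names are hypothetical.

```python
def break_even_acu(provisioned_hourly: float, acu_hourly: float) -> float:
    """Average ACU above which serverless costs more than the provisioned instance."""
    if acu_hourly <= 0:
        raise ValueError("acu_hourly must be positive")
    return provisioned_hourly / acu_hourly


def cheaper_option(avg_acu: float, provisioned_hourly: float, acu_hourly: float) -> str:
    """Compare hourly cost at a given average ACU level."""
    serverless_hourly = avg_acu * acu_hourly
    return "serverless" if serverless_hourly < provisioned_hourly else "provisioned"


if __name__ == "__main__":
    # Placeholder prices (assumptions): substitute current regional pricing.
    PROVISIONED_HOURLY = 0.24  # effective hourly rate of a reserved instance
    ACU_HOURLY = 0.12          # price per ACU-hour

    print("break-even ACU:", break_even_acu(PROVISIONED_HOURLY, ACU_HOURLY))
    for avg_acu in (0.5, 1.5, 4.0):
        print(avg_acu, "->", cheaper_option(avg_acu, PROVISIONED_HOURLY, ACU_HOURLY))
```

Feeding in the observed average ACU from monitoring, rather than the peak, gives the more realistic comparison, since serverless only bills for capacity actually in use.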


ElastiCache Valkey Cluster Mode

| Scale | Current | Future |
|---|---|---|
| 1K-10K | Serverless (auto-scaling) | Continue serverless |
| 50K+ | Serverless | Provisioned cluster (3-6 shards) |

At 50K+ users (about 17,500 concurrent workers), throughput is sustained and high, so provisioned cluster mode offers better price-performance than serverless.
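
For context on what "3-6 shards" means for key placement: in cluster mode, each key maps to one of 16,384 hash slots via CRC16, and slots are divided among shards. The sketch below reproduces that slot calculation in plain Python. The even, contiguous slot split in `shard_for` is an assumption for illustration, since the real slot-to-shard assignment comes from the cluster configuration.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in cluster mode


def key_slot(key: str) -> int:
    """Hash slot for a key, honouring {hash tags} so related keys share a slot."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag is used
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


def shard_for(key: str, shards: int) -> int:
    """Assumption: slots are split into equal contiguous ranges across shards."""
    return key_slot(key) * shards // HASH_SLOTS


if __name__ == "__main__":
    for key in ("foo", "{user:1}:profile", "{user:1}:sessions"):
        print(key, key_slot(key), shard_for(key, 3))
```

Keys that share a hash tag, such as the two `{user:1}` keys here, land in the same slot, which matters for multi-key operations once the cache moves from serverless to a sharded cluster.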


Cross-Region Disaster Recovery

At 50K+ users, deploy cross-region backup:

  • Aurora Global Database (read replica in secondary region)
  • ElastiCache Global Datastore
  • S3 cross-region replication for backups
  • Route 53 failover routing
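
The Route 53 piece comes down to a pair of record sets, one marked `PRIMARY` and one `SECONDARY`, with a health check attached to the primary. The sketch below only builds the request payload and does not call AWS. The field names follow the Route 53 `ChangeResourceRecordSets` request shape as I understand it, while the hostnames, set identifiers, health check ID, and TTL are hypothetical placeholders.

```python
from typing import Optional


def failover_record(name: str, target: str, role: str, set_id: str,
                    health_check_id: Optional[str] = None) -> dict:
    """Build one UPSERT change for a failover-routed CNAME record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "CNAME",
        "TTL": 60,  # assumption: short TTL so clients pick up a failover quickly
        "SetIdentifier": set_id,
        "Failover": role,
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name: str, primary: str, secondary: str,
                          health_check_id: str) -> dict:
    """Pair a health-checked primary record with a secondary record."""
    return {
        "Comment": "Cross-region failover (illustrative)",
        "Changes": [
            failover_record(name, primary, "PRIMARY", "primary-region", health_check_id),
            failover_record(name, secondary, "SECONDARY", "secondary-region"),
        ],
    }


if __name__ == "__main__":
    # Hypothetical hostnames and health check ID.
    batch = failover_change_batch(
        "api.example.com.",
        "api-primary.example.com",
        "api-secondary.example.com",
        "hypothetical-health-check-id",
    )
    for change in batch["Changes"]:
        rrset = change["ResourceRecordSet"]
        print(rrset["Failover"], "->", rrset["ResourceRecords"][0]["Value"])
```

A payload of this shape would be passed as the change batch of a Route 53 record update. When the primary's health check fails, Route 53 answers with the secondary record.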