Infrastructure
Infrastructure improvements by area: compute (Graviton, already deployed, plus Reserved Instances), database (Aurora Serverless v2), cache (Valkey cluster), and cross-region disaster recovery.
Triggers by user count
| User Count | Infrastructure Change | Estimated Savings |
|---|---|---|
| 1K | Reserved Instances for always-on baseline | ~37% on 2-3 instances |
| 5K | 1-year RI for peak capacity | ~$680/month |
| 10K | Graviton (ARM64) — deployed | ~20% on all EC2 |
| 50K | Provisioned Valkey cluster (3-6 shards); 3-year RI + Savings Plans | ~57% on all EC2 |
| 50K+ | Cross-region disaster recovery (Aurora Global Database, Route 53 failover) | HA, not cost savings |
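The trigger table can be read as a simple threshold lookup. The sketch below is only an illustration of that reading: the `TRIGGERS` list and `changes_due` function are hypothetical names, and the "50K+" row is treated as a 50,000-user threshold.

```python
# (threshold, change) pairs copied from the trigger table.
# Assumption: the "50K+" row is folded into the 50,000 threshold.
TRIGGERS = [
    (1_000, "Reserved Instances for always-on baseline"),
    (5_000, "1-year RI for peak capacity"),
    (10_000, "Graviton (ARM64)"),
    (50_000, "Provisioned Valkey cluster (3-6 shards); 3-year RI + Savings Plans"),
    (50_000, "Cross-region disaster recovery"),
]


def changes_due(user_count: int) -> list[str]:
    """Return every change whose user-count trigger has been reached."""
    return [change for threshold, change in TRIGGERS if user_count >= threshold]


if __name__ == "__main__":
    for users in (500, 5_000, 60_000):
        print(users, changes_due(users))
```

A list of pairs rather than a dict keeps the two rows that share the 50K threshold separate.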
Graviton (r7g / m7g) - Deployed
| Instance (hourly rate) | Status | Savings | Environment |
|---|---|---|---|
| r7g.xlarge ($0.258/hr) | ✅ Deployed | ~20% vs Intel | Production |
| m7g.large ($0.102/hr) | ✅ Deployed | ~16% vs Intel | Production |
Migration Complete
Graviton (ARM64) migration is complete. Production runs on r7g.xlarge (ECS workers) and m7g.large (API), saving ~20% and ~16% respectively compared with the Intel equivalents.
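To see what these percentages imply in hourly terms, the sketch below backs out the Intel-equivalent rate from each Graviton rate. It treats the approximate savings figures as exact, so the results are rough estimates only. The function name is hypothetical.

```python
# Hourly rates and savings fractions taken from the Graviton table.
# The savings are approximate ("~20%", "~16%"), so results are estimates.
GRAVITON = {
    "r7g.xlarge": {"hourly_usd": 0.258, "savings_vs_intel": 0.20},
    "m7g.large": {"hourly_usd": 0.102, "savings_vs_intel": 0.16},
}


def implied_intel_hourly(hourly_usd: float, savings: float) -> float:
    """Back out the comparison rate implied by a fractional saving."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be a fraction in [0, 1)")
    return hourly_usd / (1.0 - savings)


if __name__ == "__main__":
    for name, row in GRAVITON.items():
        intel = implied_intel_hourly(row["hourly_usd"], row["savings_vs_intel"])
        print(f"{name}: implied Intel-equivalent rate ${intel:.4f}/hr")
```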
Reserved Instances / Savings Plans
| Tier | Strategy | Estimated Savings |
|---|---|---|
| 1K users | 1-year RI for always-on baseline | ~37% on 2-3 instances |
| 5K users | 1-year RI for peak capacity | ~$680/month saved |
| 50K users | 3-year RI + Savings Plans | ~57% on all EC2 |
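The percentage tiers translate into dollar figures once a fleet size and hourly rate are fixed. The following is a minimal sketch under stated assumptions: a 730-hour month, a discount applied uniformly to every instance, and a hypothetical function name. The example inputs (three r7g.xlarge instances at the rate listed earlier) are illustrative, not a statement of the actual fleet.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month

# Discount fractions from the table above (approximate values).
RI_DISCOUNT = {"1-year RI": 0.37, "3-year RI + Savings Plans": 0.57}


def ri_monthly_savings(hourly_usd: float, instance_count: int, discount: float) -> float:
    """Estimated monthly saving when `discount` applies to every instance."""
    if instance_count < 0:
        raise ValueError("instance_count must be non-negative")
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be a fraction in [0, 1]")
    baseline_monthly = hourly_usd * HOURS_PER_MONTH * instance_count
    return baseline_monthly * discount


if __name__ == "__main__":
    # Illustrative input: 3 instances at $0.258/hr.
    for plan, discount in RI_DISCOUNT.items():
        print(plan, round(ri_monthly_savings(0.258, 3, discount), 2))
```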
Aurora Serverless v2
Replace instance-based RDS with Aurora Serverless v2 for automatic scaling:
Benefits:
- Auto-scales ACU (Aurora Capacity Units) with load
- Scales to zero during off-hours (cost savings)
- No manual instance type changes
- Seamless Multi-AZ failover
Trade-off: at sustained high load, Serverless v2 capacity costs more than equivalent provisioned instances covered by Reserved Instances.
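One way to reason about this trade-off is a break-even calculation: the average ACU level at which serverless spend matches a fixed provisioned instance. The sketch below assumes a simple linear cost model and a 730-hour month. The prices in the example are placeholders, not quoted AWS rates, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def serverless_monthly(avg_acu: float, acu_hour_usd: float) -> float:
    """Monthly serverless cost for a given average ACU level."""
    return avg_acu * acu_hour_usd * HOURS_PER_MONTH


def break_even_acu(provisioned_hourly_usd: float, acu_hour_usd: float) -> float:
    """Average ACU above which the provisioned instance is cheaper."""
    if acu_hour_usd <= 0:
        raise ValueError("acu_hour_usd must be positive")
    return provisioned_hourly_usd / acu_hour_usd


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    acu_price = 0.12          # USD per ACU-hour (assumed)
    provisioned_price = 0.48  # USD per hour after RI discount (assumed)
    print("break-even average ACU:", break_even_acu(provisioned_price, acu_price))
    print("serverless at 2 ACU:", serverless_monthly(2, acu_price))
```

Below the break-even level, or when the database idles for long stretches, serverless is the cheaper option under this model.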
ElastiCache Valkey Cluster Mode
| Scale | Current | Future |
|---|---|---|
| 1-10K | Serverless (auto-scaling) | Continue serverless |
| 50K+ | Serverless | Provisioned cluster (3-6 shards) |
At 50K+ users with 17,500 concurrent workers, provisioned cluster mode provides better price-performance than serverless at sustained high throughput.
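For context on what sharding means in cluster mode: Valkey and Redis clusters map each key to one of 16,384 hash slots using CRC16 (XMODEM variant), and slots are distributed across shards. The sketch below computes a key's slot, including hash-tag handling. The even contiguous slot split in `shard_for_slot` is a simplifying assumption, since real clusters can assign slots to shards arbitrarily.

```python
import binascii

TOTAL_SLOTS = 16384


def hash_slot(key: bytes) -> int:
    """Cluster hash slot for a key: CRC16 (XMODEM) mod 16384.

    If the key contains a non-empty {tag}, only the tag is hashed,
    so related keys can be placed in the same slot.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the XMODEM CRC16.
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


def shard_for_slot(slot: int, shard_count: int) -> int:
    """Shard index under an even, contiguous slot split (assumption)."""
    return slot * shard_count // TOTAL_SLOTS


if __name__ == "__main__":
    for key in (b"session:42", b"{user1000}.following", b"{user1000}.followers"):
        slot = hash_slot(key)
        print(key, slot, shard_for_slot(slot, 3))
```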
Cross-Region Disaster Recovery
At 50K+ users, deploy cross-region disaster recovery:
- Aurora Global Database (read replica in secondary region)
- ElastiCache Global Datastore
- S3 cross-region replication for backups
- Route 53 failover routing
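As an illustration of the Route 53 failover item, the sketch below builds a change batch with a primary and a secondary record, in the shape accepted by the Route 53 `ChangeResourceRecordSets` API. It only constructs the dictionary and makes no AWS calls. The hostname, IP addresses, health check ID, and function names are all hypothetical placeholders.

```python
from typing import Optional


def failover_record(name: str, ip: str, role: str, set_id: str,
                    health_check_id: Optional[str] = None, ttl: int = 60) -> dict:
    """Build one UPSERT change for a failover-routed A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        # Route 53 uses the health check to decide when to fail over.
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name: str, primary_ip: str, secondary_ip: str,
                          primary_health_check_id: str) -> dict:
    """Primary record (health-checked) plus a secondary standby record."""
    return {
        "Comment": "Cross-region failover pair",
        "Changes": [
            failover_record(name, primary_ip, "PRIMARY", "primary-region",
                            primary_health_check_id),
            failover_record(name, secondary_ip, "SECONDARY", "secondary-region"),
        ],
    }


if __name__ == "__main__":
    # Placeholder values from documentation-reserved ranges.
    batch = failover_change_batch("api.example.com.", "203.0.113.10",
                                  "198.51.100.10", "hc-placeholder-id")
    print(batch)
```

The resulting dictionary could be passed as the `ChangeBatch` argument to a Route 53 client along with a hosted zone ID, which is omitted here.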