Infrastructure Explorer
Free-form sizing on a single screen. Edit any value and watch the numbers recompute. For multi-use-case sizing, use the wizard.
Configuration inputs
Hardware target, model & precision, workload pattern, data architecture, and operations are all editable dimensions. Concurrency & tokens for the current scenario:
- Concurrent users: 50
- Context (input) tokens per request: 4096
- Output tokens per request: 1024
Sizing summary
| Metric | Value |
|---|---|
| VRAM | 199.3 GB |
| GPUs | 10 |
| System RAM | 448.0 GB |
| Storage | 320.3 GB |
| Power | 9.1 kW |
| 3-yr TCO | $943.4K |
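Two of the summary figures follow from simple rules of thumb. The sketch below reproduces them under assumed constants: the 1.3× power-overhead factor and the 64 GB DIMM rounding step are guesses chosen to match the displayed output, not published formulas, and the GPU count is taken as given (it is throughput-driven, not a pure VRAM ÷ 80 GB division).

```python
import math

# Hedged sizing sketch: reconstructs two summary numbers from rules of
# thumb. The 1.3x overhead factor and 64 GB DIMM step are assumptions
# chosen to match the displayed output, not the explorer's actual code.

GPUS = 10            # from the summary above
GPU_TDP_W = 700.0    # H100 SXM, per the BOM
VRAM_GB = 199.3      # total VRAM requirement from the summary

# Power: GPU draw plus an assumed 1.3x host/cooling overhead.
power_kw = GPUS * GPU_TDP_W * 1.3 / 1000
print(f"Power: {power_kw:.1f} kW")        # 9.1 kW

# System RAM: >=2x total VRAM (per the BOM note), rounded up in 64 GB steps.
ram_gb = math.ceil(2 * VRAM_GB / 64) * 64
print(f"System RAM: {ram_gb:.1f} GB")     # 448.0 GB
```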
VRAM breakdown
Customer Support RAG: KV cache exceeds model weights
KV cache is 104.9 GB vs 84.0 GB of weights; together they account for 188.9 GB of the 199.3 GB VRAM total. This is the #1 surprise at scale — consider PagedAttention (vLLM), reducing the agent multiplier, or capping concurrent users. A sketch of the underlying math follows.
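A minimal sketch of the per-request KV-cache arithmetic, assuming a hypothetical model geometry (80 layers, 8 KV heads, head dim 128, FP16 cache). The actual model and the agent multiplier behind the 104.9 GB figure aren't shown on this screen, so the constants below are illustrative only.

```python
# Hedged KV-cache sizing sketch. The model geometry below is a
# hypothetical example (roughly a 70B-class GQA model), NOT the model
# actually configured in the explorer.

def kv_cache_gb(concurrent_users: int, tokens_per_request: int,
                n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    # Per token: 2 tensors (K and V) x layers x KV heads x head dim x dtype bytes.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return concurrent_users * tokens_per_request * per_token_bytes / 1e9

# 50 users each holding a full 4096-in + 1024-out context:
print(f"{kv_cache_gb(50, 4096 + 1024):.1f} GB")  # ~83.9 GB for this geometry
```

The gap to the displayed 104.9 GB would come from the real model's geometry and the agent multiplier the alert mentions. The point stands either way: cache grows linearly with both users and context length, while weights stay fixed.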
Multi-node deployment
The 10 GPUs span 2 server nodes (8-GPU HGX chassis). You'll need InfiniBand, or an equivalent low-latency fabric, between them for tensor parallelism; see the launch sketch below.
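A minimal launch sketch, assuming a recent vLLM build with pipeline-parallel support running on a Ray cluster that already spans both nodes. The model name and parallel sizes are placeholders, not the explorer's configuration; in practice the product of the two sizes must equal the number of GPUs you actually serve with.

```python
# Hedged sketch: serving one model across two HGX nodes with vLLM.
# Assumes a Ray cluster already joins the nodes; the model and parallel
# sizes are placeholders, not the configuration sized above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=8,     # shard each layer across a node's 8 NVLink GPUs
    pipeline_parallel_size=2,   # chain the 2 nodes over the InfiniBand fabric
    gpu_memory_utilization=0.90,
)
out = llm.generate(["ping"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

Keeping tensor parallelism inside a node and pipeline parallelism across nodes is the common pattern: TP exchanges activations at every layer and wants NVLink bandwidth, while PP crosses the fabric only at stage boundaries.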
Only one use case configured
If you're sizing for the whole org, add the other workloads now. Aggregate sizing surfaces shared-infrastructure savings (e.g., a shared embedding GPU or deduplicated model files).
Hardware BOM
| Item | Quantity | Spec | Cost |
|---|---|---|---|
| NVIDIA H100 SXM | 10 | 80 GB · 3350 GB/s · 700 W | $300K |
| Server nodes (8-GPU HGX) | 2 | Chassis, CPU, motherboard, PSU | $40K |
| System RAM (DDR5 ECC) | 448.0 GB | ≥2× total VRAM, HNSW headroom | $2.2K |
| Enterprise NVMe storage | 320.3 GB | Models, vector index, logs | $48 |
| Networking fabric | 1 | InfiniBand HDR (multi-node) | $50K |
| Power distribution + UPS | 2 | PDU + battery backup per node | $10K |
| Total CAPEX | | | $402.3K |
| Annual OPEX (power + cooling + licenses + ops) | | | $180.4K |
| 3-Year TCO | | | $943.4K |
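A quick arithmetic check of the totals, with line items copied from the table. The table displays rounded values, so the CAPEX sum lands within $0.1K of the shown figure.

```python
# Worked check of the BOM totals, all values in $K as shown above.
capex_items = {
    "GPUs (10x H100 SXM)": 300.0,
    "Server nodes (2x HGX)": 40.0,
    "System RAM": 2.2,
    "NVMe storage": 0.048,
    "Networking fabric": 50.0,
    "Power + UPS": 10.0,
}
capex = sum(capex_items.values())
opex_annual = 180.4
tco_3yr = capex + 3 * opex_annual

print(f"CAPEX:    ${capex:.1f}K")    # $402.2K (table shows $402.3K; items are rounded)
print(f"3-yr TCO: ${tco_3yr:.1f}K")  # $943.4K
```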