Untitled Sizing

1 use case on NVIDIA H100 SXM

Total GPUs: 10 (across 2 server nodes)
Total VRAM: 199.3 GB
System RAM: 448.0 GB
Storage: 320.3 GB
CPU cores: 56
Power: 9.1 kW
Annual OPEX: $180.4K
3-Year TCO: $943.4K
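
As a sanity check on the headline numbers, the 3-year TCO is CAPEX plus three years of OPEX. A minimal sketch using the figures above (the report's $943.4K differs only by rounding of pre-rounded inputs):

```python
capex = 402.3        # total CAPEX in $K, from the bill of materials below
annual_opex = 180.4  # $K per year: power + cooling + licenses + ops

tco_3yr = capex + 3 * annual_opex
print(f"3-year TCO: ${tco_3yr:.1f}K")  # 943.5 -- report shows $943.4K (rounding)
```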

Insights from your configuration

Customer Support RAG: KV cache exceeds model weights
KV cache is 104.9 GB versus 84.0 GB of model weights. This is the most common surprise at scale: consider paged KV-cache allocation (PagedAttention, as in vLLM), reducing the agent multiplier, or capping concurrent users.
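
For intuition on where a figure like 104.9 GB comes from: KV-cache demand grows linearly with layer count, KV heads, context length, and concurrency. A minimal estimator sketch; the model shape below (80 layers, 8 GQA KV heads, head dim 128, FP16 cache, 8K context, 42 concurrent sequences) is an illustrative assumption, not the exact workload behind this report:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, concurrent, bytes_per_elem=2):
    """K and V tensors, one pair per layer, cached for every token of every sequence."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * concurrent / 1024**3

# Illustrative 70B-class shape with grouped-query attention (assumed, not from the report)
print(kv_cache_gib(n_layers=80, n_kv_heads=8, head_dim=128,
                   seq_len=8192, concurrent=42))  # ~105 GiB, near the figure above
```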
Multi-node deployment
The 10 GPUs span 2 server nodes, so you'll need InfiniBand (or an equivalent low-latency fabric) between them for tensor parallelism.
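
To see why the fabric matters: tensor parallelism runs roughly two all-reduces over the activations per transformer layer for every generated token, so decode traffic scales with hidden size, depth, and throughput. A rough estimate; the hidden size, layer count, throughput, and TP degree below are assumptions for illustration:

```python
def tp_allreduce_gbit_s(hidden, n_layers, tokens_per_sec, tp_degree, bytes_per_elem=2):
    """Approximate per-GPU all-reduce traffic for tensor-parallel decoding.

    ~2 all-reduces of a hidden-size activation per layer per token; a ring
    all-reduce moves about 2*(N-1)/N of the payload through each GPU.
    """
    payload = 2 * n_layers * hidden * bytes_per_elem       # bytes per token
    per_gpu = payload * 2 * (tp_degree - 1) / tp_degree    # ring all-reduce factor
    return per_gpu * tokens_per_sec * 8 / 1e9              # Gbit/s per GPU

print(f"{tp_allreduce_gbit_s(8192, 80, 1000, 8):.0f} Gbit/s per GPU")  # ~37
```

Bandwidth is only half the story: decode-time messages are small, so fabric latency per all-reduce often dominates, which is why a low-latency interconnect (not just a fat pipe) is called for.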
Only one use case configured
If you're sizing for the whole org, add the other workloads now: aggregate sizing surfaces shared-infrastructure savings (e.g., a shared embedding GPU, deduplicated model files). A sketch of the dedup effect follows.
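
Naively summing use cases double-counts any base model they share, while counting each distinct model's weights once (assuming those workloads can share one serving instance) shrinks the VRAM bill. All workload names and numbers below are hypothetical:

```python
# Hypothetical workloads: (name, base model, weights_gb, kv_cache_plus_overhead_gb)
workloads = [
    ("support-rag", "llama-70b", 84.0, 115.3),
    ("internal-qa", "llama-70b", 84.0,  40.0),  # shares a base model with support-rag
    ("embeddings",  "bge-large",  1.3,   2.0),
]

naive = sum(w + k for _, _, w, k in workloads)

# Count each distinct model's weights once; per-workload KV/overhead still adds up
distinct = {model: w for _, model, w, _ in workloads}
deduped = sum(distinct.values()) + sum(k for *_, k in workloads)

print(f"naive: {naive:.1f} GB, deduped: {deduped:.1f} GB")  # 326.6 vs 242.6
```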

VRAM breakdown by use case

Hardware bill of materials

| Item | Quantity | Spec | Cost |
|---|---|---|---|
| NVIDIA H100 SXM | 10 | 80 GB · 3350 GB/s · 700 W | $300K |
| Server nodes (8-GPU HGX) | 2 | Chassis, CPU, motherboard, PSU | $40K |
| System RAM (DDR5 ECC) | 448.0 GB | ≥2× total VRAM, HNSW headroom | $2.2K |
| Enterprise NVMe storage | 320.3 GB | Models, vector index, logs | $48 |
| Networking fabric | 1 | InfiniBand HDR (multi-node) | $50K |
| Power distribution + UPS | 2 | PDU + battery backup per node | $10K |
| Total CAPEX | | | $402.3K |
| Annual OPEX (power + cooling + licenses + ops) | | | $180.4K |
| 3-Year TCO | | | $943.4K |
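
The line items should reconcile with the totals. A quick check in $K as listed (the NVMe line is $48, i.e. 0.048):

```python
line_items = {
    "NVIDIA H100 SXM (x10)":          300.0,
    "Server nodes (8-GPU HGX, x2)":    40.0,
    "System RAM (DDR5 ECC)":            2.2,
    "Enterprise NVMe storage":          0.048,
    "Networking fabric":               50.0,
    "Power distribution + UPS (x2)":   10.0,
}
capex = sum(line_items.values())
print(f"Total CAPEX: ${capex:.1f}K")  # 402.2 from rounded line items; report shows $402.3K
```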

CAPEX breakdown