Pick your models and precision

Precision is the single biggest lever — INT8 typically halves your GPU count vs FP16 with near-zero quality loss for most enterprise tasks.

Customer Support RAG

Model architecture and inference precision

Model preset
Precision

Recommended for cost savings — halves GPU count vs FP16

Advanced: architecture overrides