Nvidia Blackwell GPU Costs Nearly Double, But Per-Token Inference Expense Falls 35x vs. Hopper

According to Nvidia's latest blog analysis, Blackwell GPUs cost nearly twice as much per hour as the Hopper generation, yet deliver 35-fold lower per-token inference costs. With DeepSeek-R1 as the test model, Blackwell (GB300 NVL72) rents at $2.65 per GPU per hour versus Hopper's $1.41, while per-GPU throughput jumps from 90 to 6,000 tokens per second. This roughly 65x throughput gain cuts the cost per million tokens from $4.20 to $0.12.
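The arithmetic behind these figures can be checked with a minimal sketch. It assumes cost per token is simply the hourly rental price divided by tokens generated per hour at full utilization; the function name `cost_per_million_tokens` is illustrative and does not come from Nvidia's analysis.

```python
def cost_per_million_tokens(price_per_gpu_hour: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens for a single GPU.

    Assumption: the GPU runs at the stated throughput for the whole hour.
    """
    tokens_per_hour = tokens_per_second * 3600
    return price_per_gpu_hour / tokens_per_hour * 1_000_000


# Inputs are the rounded figures quoted in the article.
hopper = cost_per_million_tokens(1.41, 90)        # about $4.35
blackwell = cost_per_million_tokens(2.65, 6000)   # about $0.12

print(f"Hopper:    ${hopper:.2f} per million tokens")
print(f"Blackwell: ${blackwell:.2f} per million tokens")
print(f"Cost ratio: {hopper / blackwell:.1f}x")
```

With the rounded inputs, the Hopper result comes out near $4.35 rather than the reported $4.20, presumably because the published price and throughput figures are themselves rounded. The ratio still lands at roughly 35x.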

The $0.12 figure assumes full software optimization, including FP4 low-precision inference and multi-token prediction (MTP). Without MTP, the cost per million tokens is approximately $2.35; with MTP enabled it drops to $0.11, a roughly 21x reduction attributable to that feature alone.
