According to Nvidia's latest blog analysis, Blackwell GPUs cost nearly twice as much per hour as the Hopper generation, yet deliver roughly 35-fold lower per-token inference costs. With DeepSeek-R1 as the test model, Blackwell (GB300 NVL72) rents at $2.65 per GPU per hour versus Hopper's $1.41, but per-GPU throughput jumps from 90 to 6,000 tokens per second. That roughly 65x throughput gain cuts the cost per million tokens from $4.20 to $0.12.
The $0.12 figure assumes full software optimization, including FP4 low-precision inference and multi-token prediction (MTP). Without MTP, the cost per million tokens is approximately $2.35; with MTP enabled it drops to $0.11, a roughly 21x improvement from that feature alone.
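
The arithmetic behind these figures can be checked with a short sketch. It assumes a single GPU running at the quoted steady-state throughput for a full hour; the function name `cost_per_million_tokens` is made up for illustration and does not come from Nvidia's analysis. The Hopper result lands near $4.35 rather than the quoted $4.20, a gap that may simply reflect rounding in the published throughput numbers.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(price_per_gpu_hour: float, tokens_per_second: float) -> float:
    """Dollars per one million tokens for one GPU at steady throughput.

    Hypothetical helper: assumes the GPU is fully utilized for the whole hour.
    """
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return price_per_gpu_hour / tokens_per_hour * 1_000_000


# Figures quoted in the article (DeepSeek-R1 as the test model).
hopper_cost = cost_per_million_tokens(price_per_gpu_hour=1.41, tokens_per_second=90)
blackwell_cost = cost_per_million_tokens(price_per_gpu_hour=2.65, tokens_per_second=6000)

price_ratio = 2.65 / 1.41          # hourly price increase, Blackwell vs Hopper
throughput_ratio = 6000 / 90       # per-GPU throughput increase
cost_ratio = hopper_cost / blackwell_cost  # equals throughput_ratio / price_ratio

# MTP impact, using the with/without costs quoted in the article.
mtp_ratio = 2.35 / 0.11

print(f"Hopper:    ${hopper_cost:.2f} per million tokens")
print(f"Blackwell: ${blackwell_cost:.2f} per million tokens")
print(f"Price ratio: {price_ratio:.2f}x, throughput ratio: {throughput_ratio:.1f}x")
print(f"Cost reduction: {cost_ratio:.1f}x, MTP factor: {mtp_ratio:.1f}x")
```

The cost reduction is the throughput ratio divided by the price ratio, which is why a throughput gain of roughly 65x at nearly double the hourly price yields about a 35x lower cost per token.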