The GPU Shortage: What's Real, What's Not, and What Comes Next
For most of 2023 and 2024, the narrative was simple: there aren't enough GPUs. NVIDIA couldn't manufacture H100s fast enough, lead times stretched past a year, and companies hoarded allocation like a strategic asset.
That era is ending. H100 lead times have collapsed from 52 weeks to under 8. TSMC's CoWoS packaging capacity — the bottleneck that constrained HBM-equipped GPU production — has expanded significantly. And with Blackwell ramping, H100 and even H200 inventory is becoming more available as hyperscalers refresh their fleets.
But the picture isn't as simple as "the shortage is over."
What's actually happening
The market is bifurcating. At the high end — Blackwell, GB200 NVL72 racks, large-scale training clusters — demand still outstrips supply. NVIDIA's newest hardware is allocated to the largest buyers first, and lead times for rack-scale configurations remain measured in quarters, not weeks.
At the mid-range — H100, A100, and increasingly H200 — supply has caught up. Spot pricing on H100 instances has dropped 40-60% from peak. Secondary-market prices for H100 SXM cards have fallen below list price for the first time.
The real constraint has shifted
For most AI teams, the bottleneck is no longer "can I get GPUs?" It's:
- Can I get GPUs configured the way I need them? Interconnect topology, memory configuration, and networking setup matter as much as the GPU itself for training workloads; the first sketch after this list shows one way to verify what a node actually gives you.
- Can I get them at a price that makes my unit economics work? Falling spot prices help, but the gap between on-demand cloud pricing and dedicated bare metal remains wide.
- Can I get them with predictable availability? Spot instances can be interrupted. Reserved capacity requires long commitments. The middle ground (flexible contracts with guaranteed availability) is where most teams actually want to be; the second sketch below works through the spot-versus-reserved break-even math.
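On the configuration point: topology is checkable before you commit. Here's a minimal sketch using the NVML Python bindings (the pynvml package; its presence and a working NVIDIA driver are assumptions about your environment) that counts active NVLink links per GPU. On a PCIe-only box every link probe fails, which tells you cross-GPU traffic is stuck on the host bus.

```python
# Minimal sketch: verify NVLink topology before committing to a node.
# Assumes the pynvml package (NVML Python bindings) and an NVIDIA
# driver are installed; on PCIe-only systems every link probe fails.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)

        # Count NVLink links that report as active on this GPU.
        active_links = 0
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
            except pynvml.NVMLError:
                break  # link not supported on this GPU: stop probing
            if state == pynvml.NVML_FEATURE_ENABLED:
                active_links += 1

        print(f"GPU {i}: {name}, {mem.total / 2**30:.0f} GiB, "
              f"{active_links} active NVLink links")
finally:
    pynvml.nvmlShutdown()
```

The same information (plus whether peer traffic crosses a PCIe host bridge) is visible from `nvidia-smi topo -m`, but doing it programmatically lets you gate an automated provisioning check on it.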
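On the pricing and availability points, the tradeoff reduces to back-of-envelope arithmetic: interruptions don't change the spot rate, they inflate how many billed hours one useful hour costs. The numbers below are made-up placeholders, not market quotes; plug in your own.

```python
# Back-of-envelope sketch: effective cost per useful GPU-hour.
# All rates and the overhead fraction are hypothetical placeholders.
SPOT_RATE = 1.00        # $/GPU-hr spot (placeholder)
RESERVED_RATE = 1.40    # $/GPU-hr with a long commitment (placeholder)
INTERRUPTION_OVERHEAD = 0.15  # fraction of spot hours lost to preemption,
                              # re-queueing, and redoing work since the
                              # last checkpoint (placeholder)

# Interruptions inflate billed hours per useful hour.
effective_spot = SPOT_RATE / (1 - INTERRUPTION_OVERHEAD)
print(f"effective spot: ${effective_spot:.2f}/useful GPU-hr, "
      f"reserved: ${RESERVED_RATE:.2f}")

# Overhead level at which spot stops being cheaper than reserved:
# SPOT_RATE / (1 - x) = RESERVED_RATE  =>  x = 1 - SPOT_RATE / RESERVED_RATE
break_even = 1 - SPOT_RATE / RESERVED_RATE
print(f"spot wins only while overhead stays under {break_even:.0%}")
```

At these placeholder rates spot is still cheaper ($1.18 vs $1.40 per useful hour) until interruption overhead passes about 29%; the appeal of the middle-ground contracts is that they remove the overhead term from the equation entirely.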
What this means for AI teams
If you're running inference or fine-tuning workloads, you're in a buyer's market. Shop aggressively, negotiate hard, and consider regions outside the U.S. where infrastructure costs are structurally lower.
If you're planning large-scale training, the constraint isn't individual GPUs — it's integrated systems. Superclusters with proper NVLink and InfiniBand fabric, adequate cooling, and reliable power aren't commodity products. They're engineered systems, and the teams that can deliver them reliably are the ones worth talking to.
The shortage narrative served its purpose. The reality that replaces it is more interesting: GPU compute is becoming a market with real price discovery, regional arbitrage, and differentiated offerings. That's healthier for everyone building with AI.