Glossary term
Throughput
The amount of data or work processed per unit of time; the performance that bandwidth ultimately enables for AI workloads.
Throughput is the amount of data or work processed per unit of time: tokens per second for inference, samples per second for training. It is the practical performance metric that bandwidth, compute, and latency jointly determine. Maximizing throughput per dollar is the core optimization target of inference economics, because it sets the marginal cost of serving AI: at a fixed hardware cost per hour, doubling throughput halves the cost of each token served. See Inference Economics 101.
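The relationships above can be illustrated with a minimal Python sketch. It assumes hardware billed at a flat hourly rate and kept fully busy. The function names and every number are hypothetical illustrations, not measurements. To show how latency feeds into throughput, it borrows Little's law (average concurrency = throughput × latency), a standard queueing relation that the entry itself does not name.

```python
def throughput(units_processed: float, elapsed_seconds: float) -> float:
    """Work processed per unit time, e.g. tokens/s (inference) or samples/s (training)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return units_processed / elapsed_seconds


def littles_law_throughput(concurrent_requests: float, latency_seconds: float) -> float:
    """Steady-state requests/s implied by Little's law (L = lambda * W, so lambda = L / W).

    Assumption: a stable system where requests arrive and finish at the same average rate.
    """
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return concurrent_requests / latency_seconds


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Serving cost per 1M tokens, assuming a flat hourly price and full utilization."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical run: 12,000 tokens generated in 4 seconds.
    print(f"{throughput(12_000, 4.0):.0f} tokens/s")
    # Hypothetical server: 32 requests in flight, 2 s average latency each.
    print(f"{littles_law_throughput(32, 2.0):.0f} requests/s")
    # Hypothetical $3.60/hour accelerator: doubling tokens/s halves cost per token.
    print(f"${cost_per_million_tokens(3.60, 1000.0):.2f} per 1M tokens at 1000 tokens/s")
    print(f"${cost_per_million_tokens(3.60, 2000.0):.2f} per 1M tokens at 2000 tokens/s")
```

Keeping cost per token as hourly price divided by tokens per hour makes the trade-off explicit: any change that raises sustained tokens per second on the same hardware (better batching, higher memory bandwidth utilization) lowers the serving cost proportionally, while added latency at fixed concurrency lowers throughput.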