AI · Infrastructure · April 2026 · 8 min read

L40S vs A100 vs H100 — which GPU for which job.

Cloud catalogs make chip names look like currencies. The common failure mode is buying the newest generation when your workload is mostly steady-state serving, where memory capacity, batching, and a good inference engine matter as much as peak TFLOPS [1].

This article complements the H100 explainer: it pulls L40S and A100 into the same practical comparison [2].

Three families, fast.

H100 (Hopper): the data center GPU most often cited for heavy training and large-scale inference. It pairs Tensor Cores with high-bandwidth memory (HBM) and is the default assumption in most new GenAI TCO models [2].

A100 (Ampere): still common in broad training and HPC-style stacks; a balanced historical workhorse in many price lists [2].

L40S (Ada): often a strong play for efficient inference, visualization-adjacent stacks, and power-conscious deployments; not a universal H100 replacement for every training run [1].

The right question is not which chip is newest. It is which chip meets your SLO at your context size and your concurrency, with every candidate compared on the same $/1M tokens basis.
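The $/1M tokens figure can be derived from two numbers: the hourly price of the GPU and the throughput you actually measured on it. A minimal sketch, assuming one GPU at steady load; the function name and the example price and throughput are hypothetical, not vendor figures:

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens for one GPU at steady load."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs: a 2.00 USD GPU hour sustaining 1,000 tokens/second.
example_cost = cost_per_million_tokens(gpu_hour_usd=2.0, tokens_per_second=1000.0)
print(f"{example_cost:.4f} USD per 1M tokens")
```

The throughput must come from a run at your own context size and concurrency; a catalog peak number makes the result look cheaper than it will be in production.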

FIG. 1: GPU family ↔ primary workload, train vs serve (simplified schematic).

A Nuqta-field rule.

We often start pilots on smaller cards and scale up only after the same metrics (tokens/second and latency) have been measured on identical prompts. That keeps procurement honest [5].
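One way to turn such a pilot run into comparable numbers is to log each request's latency and output token count, then reduce the log to throughput and latency percentiles. The sketch below is illustrative only: `RequestSample`, `summarize`, and the sample values are hypothetical, and the nearest-rank percentile is one simple convention among several.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_s: float    # end-to-end latency of one request, in seconds
    output_tokens: int  # tokens generated for that request


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples, wall_clock_s):
    """Reduce one benchmark run to throughput and latency percentiles."""
    latencies = [s.latency_s for s in samples]
    total_tokens = sum(s.output_tokens for s in samples)
    return {
        "tokens_per_second": total_tokens / wall_clock_s,
        "p50_latency_s": percentile(latencies, 50),
        "p95_latency_s": percentile(latencies, 95),
    }


# Hypothetical run: four requests served concurrently within 2 seconds.
run = [RequestSample(0.8, 100), RequestSample(1.0, 100),
       RequestSample(1.2, 100), RequestSample(2.0, 100)]
print(summarize(run, wall_clock_s=2.0))
```

Running the same prompt set through the same summary on each card is what makes the resulting numbers comparable.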

Pair this with inference vs training economics: training spends hours; inference spends tokens for as long as the product runs [4].

Frequently asked questions.

Is L40S enough for hard Arabic workloads? It depends on model size and context — not the language name [3].
Should I always jump from A100 to H100? Not if the bottleneck is the serving engine. Sometimes software raises the ceiling before you need to buy silicon [2].
How do I make vendor A vs B comparable? Fix precision, driver, and engine versions before comparing tokens/sec [3].
What about vLLM? It raises throughput — it does not change GPU physics [2].
Does this apply in Oman? Yes. Supply, contracts, and colocation still filter which SKUs you can land. Also read digital sovereignty in Oman [5].
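The vendor comparison answer above can be enforced mechanically: record the settings of each benchmark run and refuse to compare tokens/sec until they match. This is a minimal sketch under assumptions; `RunConfig`, its field names, and the placeholder values are hypothetical and not taken from any vendor's report format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunConfig:
    precision: str        # e.g. "fp16" or "fp8"
    driver_version: str
    engine: str
    engine_version: str
    batch_size: int
    context_tokens: int


def config_mismatches(a: RunConfig, b: RunConfig) -> list:
    """Names of the settings that differ between two benchmark runs."""
    return [f.name for f in fields(RunConfig)
            if getattr(a, f.name) != getattr(b, f.name)]


# Placeholder values for illustration only.
vendor_a = RunConfig("fp16", "driver-A", "engine-X", "v1", 16, 8192)
vendor_b = RunConfig("fp8", "driver-A", "engine-X", "v1", 16, 8192)

diff = config_mismatches(vendor_a, vendor_b)
if diff:
    print("Not comparable yet, settings differ:", diff)
```

An empty list means the two runs share the listed settings; it does not prove the prompts or the traffic pattern were the same, so those still need to be checked separately.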

Closing.

NVIDIA public pages describe the product lines — the decision still needs a measured load, not a catalog guess [1][2].

Ask your vendor for one comparable benchmark line: same load, same batch, same context. Then compare $/1M useful tokens at your SLO [4].
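Once each vendor has returned that line, the selection itself is simple: drop every option that misses the SLO, then rank the rest by cost. The sketch below assumes "useful" means tokens served by a configuration whose measured p95 latency is within the SLO; that reading, the `Candidate` fields, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    gpu_hour_usd: float       # quoted hourly price
    tokens_per_second: float  # measured under the agreed load
    p95_latency_s: float      # measured under the agreed load


def usd_per_million_tokens(c: Candidate) -> float:
    return c.gpu_hour_usd / (c.tokens_per_second * 3600) * 1_000_000


def cheapest_within_slo(candidates: List[Candidate],
                        slo_p95_s: float) -> Optional[Candidate]:
    """Cheapest candidate whose p95 latency meets the SLO, or None."""
    eligible = [c for c in candidates if c.p95_latency_s <= slo_p95_s]
    return min(eligible, key=usd_per_million_tokens, default=None)


# Hypothetical quotes, not real SKU prices or benchmark results.
quotes = [
    Candidate("sku-small", 1.0, 400.0, 1.8),
    Candidate("sku-mid", 2.0, 600.0, 2.5),
    Candidate("sku-large", 3.0, 1500.0, 0.9),
]
best = cheapest_within_slo(quotes, slo_p95_s=2.0)
print(best.name if best else "no candidate meets the SLO")
```

In this made-up data the most expensive card per hour is the cheapest per token once the SLO filter is applied, which is why the comparison has to be run on measured numbers.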

Sources.

[1] NVIDIA — L40S GPU (product).

[2] NVIDIA — H100 / A100 data center overviews.

[3] MLCommons — MLPerf Inference.

[4] Nuqta — internal GPU procurement notes, April 2026.

[5] Nuqta — pilot-to-prod playbooks, April 2026.

