The AI OS · Letter #51
September 27, 2025

The Chip War Explained (Without Melting Your Brain)

Why GPUs run your AI, why the U.S.–China “ship war” exists, and what it means for you.


“Why does my AI assistant sometimes answer instantly…and other times it spins like a slow pizza?” my friend asked.

“Because sometimes it’s ‘chatty.’ Other times it’s thinking,” I said.

“Thinking?”

“Yep. When it thinks, it rents a small supercomputer for a few seconds.”

“From where?”

“From the middle of a global tug-of-war over the fastest chips on Earth.”

He blinked. “Okay…start at the beginning.”

Let’s do exactly that.

The Headlines You’ve Seen (and What They Really Mean)

US restricts advanced Nvidia GPUs to China.

Translation: The US is trying to slow China from training and running the biggest, brainiest AI systems by limiting total compute that reaches Chinese labs and data centers.

China’s labs make surprising gains anyway (hello, DeepSeek).

Translation: With fewer “fancy” chips, Chinese engineers squeezed more from what they had—going ultra low-level and rewriting parts of the software stack so the hardware acts smarter than it looks on paper.

TSMC sits in the middle of everything.

Translation: One company in Taiwan manufactures the most advanced chips for basically everyone. That’s a supply-chain choke point—and a geopolitical headache.

Cloud loopholes and smuggling stories.

Translation: If you cap chip exports, people rent them through clouds abroad or… get creative with shipping routes. GPUs are small, pricey, and easy to move. Think diamonds, not dishwashers.


Chips also depend on “critical minerals,” and those minerals are quite rare…

[Figure] Source: Global Critical Mineral Outlook 2024, International Energy Agency (2024).

Wait—Why All This Now?

Because AI went from autocomplete to actual reasoning.

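Older models were like fast typists: quick, cheap, good enough for short replies. Newer “reasoning models” (think OpenAI o1, DeepSeek-R1) do something different at inference time (aka when you ask a question): they write out a long internal chain of thought before answering, which burns far more compute and memory per reply.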

So if AI is going to plan trips, write code, prove theorems, and manage boring tasks for millions of people, we need a mountain of chips—not just to build the models, but to RUN them all day, every day.

That’s the fuel of this race.

The Hardware Moves (what changed on the chips)

Key idea: For today’s reasoning AI, memory capacity and how fast chips talk to each other can matter as much as raw “speed.” We’re not only bench-pressing; we’re running a library and a switchboard.
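A rough way to see why memory matters: when a model generates text one token at a time, each step has to stream the model’s weights out of GPU memory, so memory bandwidth often caps speed before raw FLOPS do. This is a back-of-envelope sketch under loud simplifying assumptions (batch size 1, every weight read once per token, about 2 FLOPs per parameter, KV-cache traffic and overheads ignored). The helper name and both “chips” are hypothetical, made-up numbers, not real products.

```python
def decode_tokens_per_sec(params_billion, bytes_per_param, mem_bw_gb_s, tflops):
    """Rough upper bound on single-stream generation speed (hypothetical helper).

    Assumes every generated token reads each weight once from memory and
    costs about 2 FLOPs per parameter; ignores KV-cache reads, batching,
    and all other overheads.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    memory_limit = mem_bw_gb_s * 1e9 / weight_bytes              # tokens/s if bandwidth-bound
    compute_limit = tflops * 1e12 / (2 * params_billion * 1e9)  # tokens/s if compute-bound
    return min(memory_limit, compute_limit)


# Made-up chips running a made-up 8B-parameter model stored in 16-bit weights.
chip_a = decode_tokens_per_sec(8, 2, mem_bw_gb_s=2000, tflops=1000)  # more FLOPS
chip_b = decode_tokens_per_sec(8, 2, mem_bw_gb_s=4000, tflops=400)   # more bandwidth
print(f"chip A: {chip_a:.0f} tokens/s, chip B: {chip_b:.0f} tokens/s")
```

This prints 125 tokens/s for chip A and 250 for chip B: the chip with 2.5x fewer FLOPS comes out twice as fast here, because in this regime the bottleneck is moving weights, not multiplying them.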

[Figure] Source: SemiAnalysis x Lennart Heim

The Transformer in Your Pocket (Yes, You Use One)

Let’s demystify the buzzwords—attention, KV-cache, context window—using a regular conversation.

  1. Context window = how much the model can “keep in mind.”
     If you paste a long doc and ask questions, that doc occupies context. Long answers also occupy context. More context = more memory used.
  2. Attention = everyone at the party checks everyone else.
     Each new word the model writes asks, “Which previous words matter?” That cross-checking scales badly as things get longer.
  3. KV-cache = sticky notes the model keeps so it doesn’t re-read everything from scratch.
     It stores “keys” and “values” (summaries of prior tokens) in GPU memory so future tokens can reference them quickly.
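Here is a toy sketch of attention with a KV-cache, to make the sticky notes concrete. It is a simplification, not how any production model is written: one attention head, tiny 2-number vectors, and each token’s vector reused directly as its query, key, and value (real models compute these with learned weight matrices). The names `attend`, `KVCache`, and `decode` are hypothetical.

```python
import math


def attend(query, keys, values):
    """Single-head dot-product attention of one query over every cached token."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the value vectors: "how much each earlier word matters"
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


class KVCache:
    """The sticky notes: one key and one value per token seen so far."""

    def __init__(self):
        self.keys, self.values = [], []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def decode(token_vectors):
    cache = KVCache()
    comparisons = 0
    outputs = []
    for x in token_vectors:
        # Toy projections (assumption): reuse the token vector as query, key and value
        cache.append(x, x)
        outputs.append(attend(x, cache.keys, cache.values))
        # The new token checks every cached token, including itself
        comparisons += len(cache.keys)
    return outputs, len(cache.keys), comparisons


outputs, cache_len, comparisons = decode([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
print(cache_len, comparisons)  # 4 sticky notes, 1 + 2 + 3 + 4 = 10 look-ups
```

The cache means earlier tokens are never re-processed, but each new token still has to look at every note on the wall, and the wall keeps growing.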

Here’s the kicker:

The Quadratic Problem (The Part That Eats Compute)

If you double the total length (prompt + the model’s long chain of thought), the sticky notes (the KV-cache) double in size, but the cross-checking does more than double: it roughly quadruples, because every token checks every token that came before it. That’s the dreaded quadratic growth.
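A quick arithmetic check of the linear-versus-quadratic point. The model shape used as defaults below (32 layers, 8 key/value heads, head size 128, 16-bit numbers) is a hypothetical mid-sized configuration chosen only for illustration, and the function names are ours.

```python
def kv_cache_bytes(seq_len, layers=32, kv_heads=8, head_dim=128, bytes_per_value=2):
    """KV-cache size: one key and one value vector per head, per layer, per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


def attention_pairs(seq_len):
    """Causal attention: token i checks tokens 1..i, so 1 + 2 + ... + n checks in total."""
    return seq_len * (seq_len + 1) // 2


short_len, long_len = 4096, 8192
memory_ratio = kv_cache_bytes(long_len) / kv_cache_bytes(short_len)
work_ratio = attention_pairs(long_len) / attention_pairs(short_len)
print(f"KV-cache grows {memory_ratio:.1f}x, attention work grows {work_ratio:.2f}x")
```

Doubling the length doubles the sticky notes (2.0x) while the token-to-token checks grow about 4x (printed as 4.00x). Under these assumptions, the cache for 4,096 tokens is 512 MiB per request, which is why long reasoning runs crowd the GPU.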

Analogy: Imagine a pizza oven (the GPU). Short orders (simple chats) let you bake many pies at once. An order for a 2-meter wedding pizza (reasoning) hogs the oven so others wait—and your price per slice jumps.
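The pizza oven in numbers: GPU memory left over after loading the weights is a fixed budget, and every in-flight request needs its own KV-cache. This sketch assumes 40 GB of free memory and the same hypothetical model shape as above (about 128 KiB of cache per token); all figures are illustrative, not measurements.

```python
def max_concurrent_requests(free_memory_gb, tokens_per_request, kv_bytes_per_token):
    """How many requests' sticky notes fit in the GPU memory left after the weights."""
    per_request = tokens_per_request * kv_bytes_per_token
    return int(free_memory_gb * 1e9 // per_request)


# Hypothetical model shape: 2 (K and V) x 32 layers x 8 heads x 128 dims x 2 bytes
KV_BYTES_PER_TOKEN = 2 * 32 * 8 * 128 * 2

chats = max_concurrent_requests(40, 1_000, KV_BYTES_PER_TOKEN)      # short orders
reasoning = max_concurrent_requests(40, 32_000, KV_BYTES_PER_TOKEN)  # wedding pizzas
print(chats, reasoning)
```

This prints `305 9`: the same oven serves roughly 305 short chats at once but only 9 long reasoning runs. If the oven costs the same per hour either way, 9 customers split the bill instead of 305, so each slice costs far more.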

How Smart Models Cheat the Bill (In a Good Way)

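Two big efficiency tricks you’ll hear about:

  1. Mixture of Experts (MoE) = many specialist sub-networks, but each token is routed to only a few of them, so only a fraction of the model does work on any given token.
  2. Multi-head Latent Attention (MLA) = DeepSeek’s way of compressing keys and values into a smaller latent vector, so the KV-cache (the sticky notes) takes up less GPU memory.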

DeepSeek’s magic: They leaned on MoE + MLA, and—when hardware interconnect was limited—they literally wrote custom low-level code (below typical libraries like NCCL) to schedule communication across GPUs. Translation: They squeezed every last drop from the chips they had.
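To see why Mixture of Experts (MoE) saves compute, here is a minimal sketch of top-k routing: a small router scores every expert, only the best `top_k` actually run, and their outputs are blended. This is a generic softmax top-k router for illustration, not DeepSeek’s exact gating; the experts are toy scaling functions and every name here is hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token, router_weights, experts, top_k=2):
    """Route one token to its top_k experts and mix their outputs.

    router_weights: one score vector per expert (hypothetical toy router).
    experts: list of callables, each a tiny stand-in for a feed-forward network.
    """
    scores = [sum(t * w for t, w in zip(token, wvec)) for wvec in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])  # mixing weights for the winners only
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)  # only the chosen experts ever run
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[0.1, 0.0], [0.9, 0.0], [0.5, 0.0], [0.2, 0.0]]
out, chosen = moe_layer([1.0, 0.0], router, experts, top_k=2)
print(chosen)  # [1, 2]: only 2 of the 4 experts did any work
```

The model can hold many experts’ worth of knowledge while paying for only a couple of them per token; the catch is that experts often live on different GPUs, which is where fast interconnect (and DeepSeek’s custom communication code) comes in.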


Back to the Geopolitics (aka The “Ship War”)

Why does Washington care so much?

Why does Beijing persevere?

Why does Taiwan matter so much?

What about those loopholes?

How This Touches Your Everyday AI :)

Key Takeaways (Put This in Your Notes App)

Practical Playbooks (Whether You’re a User, Builder, or Exec)

If you’re a casual user

If you build products

If you run a team or budget

Frequently Asked (in Plain English)

Q: Why do some answers cost $10?

Because the model wrote a lot to itself before talking to you (reasoning). Those internal “drafts” fill memory that grows with every token, and the attention work needed to process them grows quadratically with length.

Q: Are cheaper chips useless?

No. For many tasks, memory-rich “mid” chips with good interconnect (even with lower FLOPS) can be fantastic.

Q: Will this all just get cheap soon?

It’ll get cheaper per unit, but demand explodes faster (Jevons paradox). Net: we’ll do way more with AI, and total spend still rises.
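The Jevons arithmetic fits in one line: total spend scales by how much usage grows divided by how much the unit price falls. The numbers below are hypothetical; the point is that spend only rises when usage grows faster than prices drop.

```python
def spend_change(price_drop_factor, usage_growth_factor):
    """Multiplier on total spend when the unit price falls and usage grows."""
    return usage_growth_factor / price_drop_factor


print(spend_change(10, 20))  # price 10x cheaper, usage 20x higher: spend doubles (2.0)
print(spend_change(10, 5))   # usage grows slower than price falls: spend halves (0.5)
```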

Q: Why does everyone worship TSMC?

Because they can actually make the cutting-edge stuff at scale. Talent density + process discipline = magic.

A Quick, Friendly Glossary

Where This Goes Next (And How to Prepare)

Wrap Up

The chip war isn’t abstract. It shows up every time your AI takes a breath to “think.” Memory isn’t boring plumbing—it’s the price meter. And the global fight over who controls that meter will shape everything from your app’s UX to your quarterly budget.

You don’t need to fear the acronyms. You just need to know which ones move your bill. Now you do.

Until the next one,

— Charafeddine

New letters now publish on charafeddine.co
