The AI OS · Letter #50
September 20, 2025

Which AI Should You Use Now?

I ran a 10-round test on the top 4 models. Here’s who won each task.


A couple of months ago, I ran a head-to-head between Claude, ChatGPT, Gemini, and Grok.

In AI time, a couple of months is a century.

You asked for an update—here it is.

I put GPT-5, Gemini, Grok, and Claude through a 10-prompt gauntlet of real work tasks, each scored 1–10.


Below: the scoreboard, copy-paste prompts, and a “use-this-model-when” cheat sheet to help you pick the right model fast.

Let’s dive in.

The Setup (so you can judge the judges)

1) Website-in-Canvas (interactive comparison page)

Prompt: “Create a beautiful modern website inside Canvas comparing the top AI tools in an interactive way.”

Takeaway: If you want polished UI quickly, Claude wins. If you want correct tool selection out of the gate, Grok was the surprise pick.

One of my outputs with Claude:

2) Vision + Reasoning

Q1: Which top view is the pyramid? (Correct: C)


Q2: How many cubes are there? (Correct: 9)

All four missed. I excluded this one from totals.


Takeaway: Vision/spatial reasoning is still volatile. Don’t trust one shot on diagram puzzles—cross-check.

3) Instruction Stress Test (six rules, no excuses)

Prompt:

Write exactly three lines
Each line five words
Lowercase only
No word repeats
No punctuation
Topic: writing clear prompts

All four passed perfectly. 10/10 across the board.

Why that matters: Tight constraints + concrete format = consistent compliance.
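Tight constraints also make the output easy to check by machine. Here's a minimal Python sketch of how you could verify five of the six rules yourself. The topic rule still needs a human read. The function name `check_constraints` and the sample text are mine, made up for illustration; they are not any model's actual output from the test.

```python
import string


def check_constraints(text: str) -> dict[str, bool]:
    """Check five of the six rules; the topic rule is left to a human reader."""
    lines = text.strip().splitlines()
    words_per_line = [line.split() for line in lines]
    all_words = [word for words in words_per_line for word in words]
    return {
        "exactly three lines": len(lines) == 3,
        "five words per line": all(len(words) == 5 for words in words_per_line),
        "lowercase only": text == text.lower(),
        "no word repeats": len(all_words) == len(set(all_words)),
        "no punctuation": not any(ch in string.punctuation for ch in text),
    }


# Hypothetical sample answer, written by hand for this sketch.
sample = (
    "clear prompts need specific goals\n"
    "state format length tone audience\n"
    "give examples then request output"
)

for rule, passed in check_constraints(sample).items():
    print(f"{rule}: {'pass' if passed else 'FAIL'}")
```

If you run a constraint-heavy prompt often, a checker like this lets you catch the occasional miss without rereading every answer.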

4) Hallucination Test (the classic trap)

Q: “Who was the 19th U.S. President, and what was the name of his pet parrot?”

Q: “Tell me about the new blue pineapple found in Brazil.”

Takeaway: They’re improving at not making stuff up—until you get overly specific. For anything consequential, verify.

5) Real-World “How-To” Speed Test (Google Sheets)

Goal: Insert a row with a keyboard shortcut (Mac).

Takeaway: For quick, “do-this-now” answers, GPT-5 and Grok surface the shortest path first.

6) Forecasting Table (24-month revenue)

Prompt: Build a 24-month projection starting at zero customers.

The fix you should steal (copy/paste):

“Before answering, list

unknown variables

ask me

That one line turns fantasy tables into useful tools.
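To make that concrete, here's a rough sketch of what a projection looks like once the unknowns are named up front. Everything in `Assumptions` (signup rate, churn, price) is a hypothetical placeholder I picked for illustration, not a number from my test. The point is that the table cannot be built until someone supplies those inputs.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    """Inputs a model should ask for instead of guessing (all hypothetical)."""
    new_customers_per_month: int
    monthly_churn_rate: float   # fraction of customers lost each month, e.g. 0.05
    price_per_customer: float   # monthly revenue per customer


def project_revenue(a: Assumptions, months: int = 24) -> list[dict]:
    """Return one row per month, starting from zero customers."""
    rows = []
    customers = 0.0
    for month in range(1, months + 1):
        # Lose the churned share of existing customers, then add new signups.
        customers = customers * (1 - a.monthly_churn_rate) + a.new_customers_per_month
        rows.append({
            "month": month,
            "customers": round(customers, 1),
            "revenue": round(customers * a.price_per_customer, 2),
        })
    return rows


# Example values only; swap in your own numbers.
table = project_revenue(Assumptions(new_customers_per_month=10,
                                    monthly_churn_rate=0.05,
                                    price_per_customer=50.0))
for row in table[:3]:
    print(row)
```

Change any one assumption and the whole table moves, which is exactly why a model that silently invents them gives you a fantasy.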

One of my outputs with Claude:

7) Coding + Visualization (generate a maze, animate shortest path)
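For context on what this round asks for, here's a minimal Python sketch of the two core pieces: carving a maze and finding the shortest path through it. I'm assuming a randomized depth-first search for generation and a breadth-first search for the path, which are common choices but not necessarily what any of the four models produced. The animation step is left out, and the function names are mine.

```python
import random
from collections import deque


def generate_maze(width, height, seed=None):
    """Carve a maze with randomized depth-first search.

    Returns a dict mapping each cell (x, y) to the set of cells it connects to.
    """
    rng = random.Random(seed)
    passages = {(x, y): set() for x in range(width) for y in range(height)}
    stack = [(0, 0)]
    visited = {(0, 0)}
    while stack:
        x, y = stack[-1]
        unvisited = [
            cell for cell in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if cell in passages and cell not in visited
        ]
        if not unvisited:
            stack.pop()  # dead end: backtrack
            continue
        nxt = rng.choice(unvisited)
        passages[(x, y)].add(nxt)  # knock down the wall in both directions
        passages[nxt].add((x, y))
        visited.add(nxt)
        stack.append(nxt)
    return passages


def shortest_path(passages, start, goal):
    """Breadth-first search; returns the cells from start to goal, or []."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for neighbour in passages[cell]:
            if neighbour not in previous:
                previous[neighbour] = cell
                queue.append(neighbour)
    if goal not in previous:
        return []
    path = []
    cell = goal
    while cell is not None:  # walk back from goal to start
        path.append(cell)
        cell = previous[cell]
    return path[::-1]


maze = generate_maze(8, 6, seed=1)
path = shortest_path(maze, (0, 0), (7, 5))
print(len(path), "cells on the shortest path")
```

To animate it, you would draw the grid once and then highlight one cell of `path` per frame.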

One of my outputs with Claude:

8) Spreadsheet Formula Surgery

Task: Return “Jane Doe” from a blob in A2.

All four produced valid formulas (different approaches, all correct). 10/10 across models.

9) Everyday Math

They now call tools (calculators) under the hood, and it shows. 10/10 each.

10) Information Sorting (and a spicy follow-up)

Task 1: I pasted 7–8 pages of messy notes and asked for the top 10 prompt categories.

Task 2: “Score yourselves 0–10 across the 10 categories.”

Observation: Gemini was the only one that didn’t pick itself first. The humble one.

My Final Tally

My take beyond the test:

AI assistants are getting way better, but they’re still not truly intelligent. The simplest tell? They rarely ask good questions.

Good questions are a core sign of intelligence—they shrink uncertainty before acting. Today’s agents (especially lab-built ones) often plow ahead without clarifying, even when their own steps hint that key info is missing. As tasks get longer, that silence multiplies errors. Most of those “meh” results would vanish if the agent paused to ask.

Use-This-Model-When (bookmark this)

Key Takeaways (pin these)

Copy/Paste Prompt Pack

No-Guessing Policy

“If any required input is missing,

do not assume

Zero-Hallucination Guardrail

“If the answer is unverified or unknown, say

‘No verified info’

Do not

UI Build Spec

“Build a minimal, modern UI

in-canvas

real tool names

working links only

Response Format Contract

“Return: (1) assumptions list, (2) solution, (3) self-check against the original instructions in bullets.”

Parting Shot

AI is finally getting good at not lying to you. But it still loves to assume, and it rarely asks good questions.

Treat it like a very smart intern with a strong opinion and a short attention span: give a clear spec, make it ask questions, verify the weird stuff. Do that, and any of these models can become a profit center instead of a toy.

If you want me to add tests (agents, longer codebases, research workflows), hit reply with your top two. I’ll stack them into the next round.

— Charafeddine

New letters now publish on charafeddine.co
