March 29, 2025

I Compared ChatGPT, Gemini, and Grok—Here’s What Each One Actually Does Best

Once upon a time, we spent 6 hours in Photoshop just to make a cat look surprised.

That was 2024.

Now?

I can just tell an AI, “Give this cat the face of someone who just saw their code deleted,” and get three memes in 15 seconds.

That’s INCREDIBLE.

It’s a shift in how we create, think, and build.

I've spent a big chunk of this week playing with these AI generation tools with my kids.

I've turned many of our warm family memories into images with that soft Ghibli glow.

I'm thinking of turning this into a short Ghibli-style film for my kids. They loved this!

Anyway...

Today, I’ll walk you through a breakdown of Google’s Gemini, xAI’s Grok, and OpenAI’s ChatGPT-4o — three of the most powerful tools in image generation and editing today.

But I’m not just comparing features.

I’ll show you what matters:

How these tools fit into your workflow.

Which one is right for what kind of creator.

And where the opportunity is hiding.

AI Image Editing Showdown: The 3 Heavyweights

→ Google Gemini (AI Studio)

→ xAI’s Grok (Aurora model)

→ ChatGPT-4o (OpenAI’s new all-in-one model)

These aren’t just prompt-to-image tools.

They’re collaborative visual assistants, and that’s the big difference compared with a tool like Midjourney.

Midjourney is a very powerful image generation (prompt-to-image) tool. However, control over generation is very limited, and edits are very complex (check out my old deep dive on Midjourney here).

Are they all the same?

No.

For me, each image editing tool plays to a different kind of user.

Let’s break it down.

Which One’s for You?

Tool	Best For	Strengths	Limitations
Gemini	Developers + Builders	API-first, multimodal, super fast	Experimental UI, dev environment
Grok	Social-first Creators + Memers	Dead simple UX, photoreal edits	One-shot prompts, limited control
ChatGPT-4o	All-purpose Creators & Teams	Natural convo flow, precision editing	API not open (yet), usage limits

What Changed and Why You Should Care?

The major shift comes down to three breakthroughs that set these tools apart from the "usual" image generation AI tools:

The ability to talk to visuals: ask for edits, image combinations, tweaks, style application, etc.
The ability to generate text in images with high accuracy
The ability to generate diagrams and complex workflows in the image and adjust them

That's huge.

You don't need layers, masks, or design tools.

You need language.

You need imagination.

And a bit of promptcraft.

Let’s Meet the Contenders

Google Gemini (1.5 Flash & 2.0 Flash)

A tool built for builders.

Runs in Google AI Studio — a playground for devs. You chat with Gemini, give it images, audio, text — it understands all of it (check out my previous letter).

Gemini 2.0 Flash now creates images natively.

And it does so with multi-turn memory.

Example:

“Generate a photo of a horse.”

“Now make it black and white, in a field of yellow flowers.”

Gemini remembers, and edits.

But: it’s currently locked in a developer console. Not quite a plug-and-play app. And the "edit memory" is very limited so far.

If you’re building a product or need AI that works across media types, Gemini is your stack.
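For builders, the multi-turn editing described above can be pictured as a session that keeps every instruction and replays the full history on each edit. This is only a minimal sketch: `EditSession`, `ImageBackend`, and `fake_backend` are hypothetical names invented for illustration, not part of any Gemini SDK, and the fake backend simply echoes the prompts it receives instead of calling a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical backend signature: receives every instruction so far and
# returns image bytes. A real app would wrap an image-model API call here.
ImageBackend = Callable[[List[str]], bytes]


@dataclass
class EditSession:
    """Stores each instruction so every edit is made with full context."""
    backend: ImageBackend
    history: List[str] = field(default_factory=list)

    def send(self, instruction: str) -> bytes:
        # Record the new instruction, then pass the whole history along
        self.history.append(instruction)
        return self.backend(self.history)


def fake_backend(prompts: List[str]) -> bytes:
    # Stand-in for a real model: encodes the prompts it was given
    return " | ".join(prompts).encode("utf-8")


session = EditSession(backend=fake_backend)
first = session.send("Generate a photo of a horse.")
second = session.send("Now make it black and white, in a field of yellow flowers.")
```

Because the second call carries the first prompt with it, the backend can edit the same horse instead of starting over. A tool without this memory would only ever see the latest instruction.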

xAI Grok + Aurora

This one’s for the chaos agents.

Grok lives inside X (Twitter), but it’s also accessible outside of X. You hit “Edit Image,” upload something, and type what you want. Done.

Simple. Fast. Surprisingly photoreal.

“Generate a sunset image”

“Add a horse and make this sunset feel like a happy ending.”

Result? Warm tones, glowing light.

It feels like Instagram filters on steroids — no technical knowledge needed.

Drawback: no step-by-step edits. No (or poor) memory so far, though things are moving fast. No way to select parts. If it messes up, you try again.

But if you’re creating viral memes or visual riffs?

Grok is a social weapon.

OpenAI ChatGPT-4o

This is Photoshop via chat — not 100% true, but seriously close.

Upload an image.

Generate one from a prompt.

Click to edit. Draw a box. Describe your change.

It remembers everything and keeps refining. Honestly the outputs are incredibly accurate.

Plenty of examples are being shared online right now. It turns creating ads into play. I’ll start from this Apple ad.

“Replace the tagline with ‘AI for Everyone’”

“Now replace the logo with this one (attached). Use an ‘Inter’ font for the text. Remove the website ‘apple.com’ and replace the image with a very productive person using AI. Keep the same vibe and style.”

Hey, I just created a professional ad in under a minute!

You can go on...

“Make the logo bigger.”

“Add a blue outline.”

“Now place a cat mascot next to it.”

Done, done, done.

You can also just talk to it — no clicks needed.

Want to change a vibe or mood? Ask.

Want infographic-style text overlays? It nails that too.

Right now, it’s the most accessible, powerful and controllable option for creators who don’t code.

UX & Workflow: How It Feels To Use

Here’s how each one fits into daily creative work.

Google Gemini

Feels like talking to a smart assistant inside your app or project. Or you can use it in Google AI Studio, a testing platform for devs.

Works best when paired with:

A dev project (e.g., build an app that auto-generates visuals)
A CMS (e.g., auto-generate images from article metadata)
Custom pipelines (you control the backend)

It’s a builder’s Lego set. But not friendly for casuals yet.
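As a rough illustration of the CMS idea above, here is one way article metadata could be turned into an image prompt. It is a sketch under stated assumptions: the metadata keys (`title`, `tags`, `style`), the helper names, and the `generate_image` callable are all hypothetical, and a real pipeline would pass in its own function that calls an image model.

```python
from typing import Any, Callable, Dict


def build_image_prompt(meta: Dict[str, Any]) -> str:
    """Turn article metadata into a single text prompt for an image model."""
    title = str(meta.get("title", "")).strip()
    tags = [str(t) for t in meta.get("tags", [])]
    # Assumed default style; swap in whatever suits the publication
    style = str(meta.get("style", "clean editorial illustration"))

    parts = [f"Header image for an article titled '{title}'"]
    if tags:
        parts.append("themes: " + ", ".join(tags))
    parts.append(f"style: {style}")
    return "; ".join(parts)


def generate_header(meta: Dict[str, Any],
                    generate_image: Callable[[str], bytes]) -> bytes:
    # generate_image is injected so any image backend can be plugged in
    return generate_image(build_image_prompt(meta))


article = {"title": "AI Image Editing Showdown", "tags": ["Gemini", "Grok"]}
prompt = build_image_prompt(article)
```

Keeping the prompt builder separate from the model call means the prompt logic can be tested without spending any API quota.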

Grok

You’re scrolling X. See a funny photo.

Tap “Edit Image.”

Type: “Put clown makeup on the person.”

Boom. It works (usually).

No sign-up, no software, no learning curve.

It’s great for:

Fast content remixing
Reactive posts during trends
Meme-making and punchy social commentary

But again — one-shot edits only. No memory, no selections.

ChatGPT-4o

Feels like brainstorming with a designer… who executes instantly.

(I used to pay $20-60/hour for my designers)

Upload a draft.

Say: “Change the background to dark blue.”

Then: “Add a spotlight effect.”

Then: “Make the text pop more.”

You keep iterating without restarting.

Great for:

Brainstorming design ideas
Landing page graphics
Marketing visuals
Prototyping logos, infographics, product shots
Mixed media workflows (copy + image)
Almost everything...

ChatGPT-4o is the only image editing AI here with a real conversational memory (for both images and designs), and its capabilities are, to be very honest, astonishing.

It’s the most polished experience for creators who think in words and ideas, not pixels.

However, it's very, very slow so far...

Testing the Tools

I ran all three through a set of real-world tasks.

Test 1: Photorealism & Detail

Prompt A: “Busy urban market at sunset. Street vendors. Neon lights.”

Follow-up: “Add a neon sign that says ‘Open 24/7.’”

Prompt B: “Modern smartphone on a desk with reflections.”

Follow-up: “Emphasize metallic finish, add shadow.”

🧪 Results:

Gemini

Prompt A:

Weird characters.

Follow-up:

Maintains the highest fidelity between the original image and the follow-up edit in this complex setting, unlike other tools that introduce unintended changes during editing.

Prompt B:

Follow-up:

Weird reflection on the table.

Grok

Prompt A:

Weird faces.

Follow-up:

Not very realistic.

Prompt B:

Not bad. The reflections look convincingly realistic (though the phone's positioning defies physics 😄)

Follow-up:

Metallic reflections on the phone are convincingly realistic, but the request seems to have been interpreted too literally (“naïvely”).

ChatGPT-4o

Prompt A:

The faces look strange, but the overall composition appears more polished.

Follow-up:

Not bad overall. The general look is good, but the limits on fine detail show (Midjourney handles these better, for example).

Prompt B:

Highly accurate.

Follow-up:

💡 Overall Qualitative Comparison:

All models struggle with details in complex settings with multiple characters, particularly Gemini
Gemini is the fastest
Gemini maintains the highest fidelity when editing images
ChatGPT-4o produces the most polished results overall compared to the other models
Grok has difficulty interpreting slightly ambiguous requests
ChatGPT-4o has a significant issue with completing images, often stopping midway through generation (you will notice this in many images here). This will likely be resolved in future updates.

After 20 tests like this… this is how I would synthesize my observations: