March 29, 2025

I Compared ChatGPT, Gemini, and Grok—Here’s What Each One Actually Does Best

Once upon a time, we spent 6 hours in Photoshop just to make a cat look surprised.

That was 2024.

Now?

I can just tell an AI, “Give this cat the face of someone who just saw their code deleted,” and get three memes in 15 seconds.

That’s INCREDIBLE.

It’s a shift in how we create, think, and build.

I've spent a big chunk of this week playing with these AI generation tools with my kids.

I've turned many of our warm family memories into images with that soft Ghibli glow.

I'm thinking of turning this into a short Ghibli-style film for my kids. They loved this!

Anyway...

Today, I’ll walk you through a breakdown of Google’s Gemini, xAI’s Grok, and OpenAI’s ChatGPT-4o — three of the most powerful tools in image generation and editing today.

But I’m not just comparing features.

I’ll show you what matters:

How these tools fit into your workflow.

Which one is right for what kind of creator.

And where the opportunity is hiding.

AI Image Editing Showdown: The 3 Heavyweights

→ Google Gemini (AI Studio)

→ xAI’s Grok (Aurora model)

→ ChatGPT-4o (OpenAI’s new all-in-one model)

These aren’t just prompt-to-image tools.

They’re collaborative visual assistants, and that’s the big difference compared with a tool like Midjourney.

Midjourney is a very powerful prompt-to-image tool. However, control over generation is limited, and edits are complex (see my old deep dive on Midjourney).

Are they all the same?

No.

For me, each image editing tool plays to a different kind of user.

Let’s break it down.

Which One’s for You?

Tool       | Best For                       | Strengths                              | Limitations
Gemini     | Developers + Builders          | API-first, multimodal, super fast      | Experimental UI, dev environment
Grok       | Social-first Creators + Memers | Dead simple UX, photoreal edits        | One-shot prompts, limited control
ChatGPT-4o | All-purpose Creators & Teams   | Natural convo flow, precision editing  | API not open (yet), usage limits

What Changed and Why You Should Care?

The major shift comes down to three breakthroughs that set these tools apart from the "usual" image generation AI tools:

  • The ability to talk to visuals: ask for edits, image combinations, tweaks, style changes, and more
  • The ability to generate text in images with high accuracy
  • The ability to generate diagrams and complex workflows in the image and adjust them

That's huge.

You don't need layers, masks, or design tools.

You need language.

You need imagination.

And a bit of promptcraft.

Let’s Meet the Contenders

Google Gemini (1.5 Flash & 2.0 Flash)

A tool built for builders.

Runs in Google AI Studio — a playground for devs. You chat with Gemini, give it images, audio, text — it understands all of it (check out my previous letter).

Gemini 2.0 Flash now creates images natively.

And it does so with multi-turn memory.

Example:

“Generate a photo of a horse.”

“Now make it black and white, in a field of yellow flowers.”

Gemini remembers, and edits.

But: it’s currently locked in a developer console. Not quite a plug-and-play app. And the "edit memory" is very limited so far.

If you’re building a product or need AI that works across media types, Gemini is your stack.
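For the builders reading this, here is a minimal sketch of what "multi-turn memory" means in practice: every follow-up is sent along with the earlier prompts, so the model can read "it" as the horse from the first turn. This is not Gemini's actual API. `EditSession`, `ImageModel`, and `fake_model` are hypothetical names I made up for illustration, and the "model" here only echoes the instructions it can see.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a real image model call: it receives the full
# conversation so far and returns a description of the resulting image.
ImageModel = Callable[[List[str]], str]


@dataclass
class EditSession:
    """Keeps every prompt so each follow-up is interpreted in context."""
    model: ImageModel
    history: List[str] = field(default_factory=list)

    def send(self, prompt: str) -> str:
        # Append the new instruction, then pass the whole history to the model.
        self.history.append(prompt)
        return self.model(list(self.history))


def fake_model(history: List[str]) -> str:
    # Placeholder "model" that only shows which instructions it can see.
    return " -> ".join(history)


session = EditSession(model=fake_model)
first = session.send("Generate a photo of a horse.")
second = session.send("Now make it black and white, in a field of yellow flowers.")
print(second)
```

In a real project you would swap `fake_model` for a call to whichever image model you use. The point of the sketch is only that the second request carries the first one with it.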

xAI Grok + Aurora

This one’s for the chaos agents.

Grok lives inside X (Twitter), but it's also accessible outside of X. You hit “Edit Image,” upload something, and type what you want. Done.

Simple. Fast. Surprisingly photoreal.

“Generate a sunset image”

“Add a horse and make this sunset feel like a happy ending.”

Result? Warm tones, glowing light.

It feels like Instagram filters on steroids — no technical knowledge needed.

Drawback: no step-by-step edits. Little or no memory so far (things are moving fast). No way to select parts of the image. If it messes up, you try again.

But if you’re creating viral memes or visual riffs?

Grok is a social weapon.

OpenAI ChatGPT-4o

This is Photoshop via chat — not 100% true, but seriously close.

Upload an image.

Generate one from a prompt.

Click to edit. Draw a box. Describe your change.

It remembers everything and keeps refining. Honestly the outputs are incredibly accurate.

There are many examples being shared online right now. It has turned creating ads into play. I’ll start from this Apple ad.

“Replace the tagline with ‘AI for Everyone’”

“Now replace the logo with this one (attached). Use the ‘Inter’ font for the text. Remove the website ‘apple.com’ and replace the image with a very productive person using AI. Keep the same vibe and style.”

Hey, I just created a professional ad in under a minute!

You can go on...

“Make the logo bigger.”

“Add a blue outline.”

“Now place a cat mascot next to it.”

Done, done, done.

You can also just talk to it — no clicks needed.

Want to change a vibe or mood? Ask.

Want infographic-style text overlays? It nails that too.

Right now, it’s the most accessible, powerful and controllable option for creators who don’t code.

UX & Workflow: How It Feels To Use

Here’s how each one fits into daily creative work.

Google Gemini

Feels like talking to a smart assistant inside your app or project. Or you can use it in Google AI Studio, a testing platform for devs.

Works best when paired with:

  • A dev project (e.g., build an app that auto-generates visuals)
  • A CMS (e.g., auto-generate images from article metadata)
  • Custom pipelines (you control the backend)
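The CMS case could look roughly like this: take an article's metadata and turn it into an image prompt your backend can hand to a model. This is a sketch under assumptions. The `title`, `tags`, and `style` fields and the `build_image_prompt` helper are hypothetical, and the call to the image model itself is left out on purpose.

```python
from typing import Any, Dict, List


def build_image_prompt(meta: Dict[str, Any]) -> str:
    """Turn article metadata into a single image prompt string."""
    title = str(meta.get("title", "")).strip()
    tags: List[str] = [str(t) for t in meta.get("tags", [])]
    # Assumed default style; a real pipeline would pick its own.
    style = str(meta.get("style", "clean editorial illustration"))

    parts = [f"Header image for an article titled '{title}'"]
    if tags:
        parts.append("themes: " + ", ".join(tags))
    parts.append(f"style: {style}")
    return "; ".join(parts)


article = {
    "title": "AI Image Editing Showdown",
    "tags": ["Gemini", "Grok", "ChatGPT-4o"],
}
prompt = build_image_prompt(article)
print(prompt)
```

Keeping the prompt builder separate from the model call means you can change models later without touching your CMS logic.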

It’s a builder’s Lego set. But not friendly for casuals yet.

Grok

You’re scrolling X. See a funny photo.

Tap “Edit Image.”

Type: “Put clown makeup on the person.”

Boom. It works (usually).

No sign-up, no software, no learning curve.

It’s great for:

  • Fast content remixing
  • Reactive posts during trends
  • Meme-making and punchy social commentary

But again — one-shot edits only. No memory, no selections.

ChatGPT-4o

Feels like brainstorming with a designer… who executes instantly.

(I used to pay $20-60/hour for my designers)

Upload a draft.

Say: “Change the background to dark blue.”

Then: “Add a spotlight effect.”

Then: “Make the text pop more.”

You keep iterating without restarting.

Great for:

  • Brainstorming design ideas
  • Landing page graphics
  • Marketing visuals
  • Prototyping logos, infographics, product shots
  • Mixed media workflows (copy + image)
  • Almost everything...

In my tests, ChatGPT-4o is the only one of these tools with a dependable conversational (image and design) memory, and its capabilities are honestly astonishing.

It’s the most polished experience for creators who think in words and ideas, not pixels.

However, it's very slow so far.

Testing the Tools

I ran all three through a set of real-world tasks.

Test 1: Photorealism & Detail

Prompt A: “Busy urban market at sunset. Street vendors. Neon lights.”

Follow-up: “Add a neon sign that says ‘Open 24/7.’”

Prompt B: “Modern smartphone on a desk with reflections.”

Follow-up: “Emphasize metallic finish, add shadow.”

🧪 Results:

Gemini
  • Prompt A:

Weird characters.

  • Follow-up:

Maintains the highest fidelity between the original image and the follow-up edit in this complex setting, unlike the other tools, which introduce unintended changes during editing.

  • Prompt B:
  • Follow-up:

Weird reflection on the table.

Grok
  • Prompt A:

Weird faces.

  • Follow-up:

Not very realistic.

  • Prompt B:

Not bad. The reflections look convincingly realistic (though the phone's positioning defies physics 😄)

  • Follow-up:

Metallic reflections on the phone are convincingly realistic, but the request seems to have been interpreted too literally (“naïvely”).

ChatGPT-4o
  • Prompt A:

The faces look strange, but the overall composition appears more polished.

  • Follow-up:

Not bad overall. The general look is good, but the limits on fine detail show (Midjourney handles these better, for example).

  • Prompt B:

Highly accurate.

  • Follow-up:

💡 Overall Qualitative Comparison:

  • All models struggle with details in complex settings with multiple characters, particularly Gemini
  • Gemini is the fastest
  • Gemini maintains the highest fidelity when editing images
  • ChatGPT-4o produces the most polished results overall compared to the other models
  • Grok has difficulty interpreting slightly ambiguous requests
  • ChatGPT has a significant issue with completing images, often stopping midway through generation (you will notice this in many images here). This will likely be resolved in future updates.

After 20 tests like this… this is how I would synthesize my observations:

Test 2: Creative Control

Prompt A: “Cartoon robot with a heart in a colorful world.”

Follow-up: “Change robot to blue/green. Add speech bubble: ‘Hello, Future!’”

Prompt B: “Minimalist mug on white background.”

Follow-up: “Replace design with black ‘M’. Add subtle shadow.”

🧪 Results:

Gemini
  • Prompt A:
  • Follow-up:
  • Prompt B:
  • Follow-up:
Grok
  • Prompt A:
  • Follow-up:
  • Prompt B:
  • Follow-up:

High realism!

ChatGPT-4o
  • Prompt A:
  • Follow-up:
  • Prompt B:
  • Follow-up:

💡 Overall Qualitative Comparison:

  • Grok offers the best balance between accuracy and speed
  • While not evident in this example, ChatGPT-4o demonstrates exceptional understanding and precision when creating stylized effects and elements.
  • ChatGPT ranks highly on the Leaderboard because of its powerful image editing capabilities, including various style options like Ghibli-style rendering.
  • Though Grok shows better accuracy in this particular test, ChatGPT excels at image editing and style manipulation.

Recall the ad example above? When I tried it with Grok, here's what I got: a complete mess.

Overall synthesis of my observations after several tests:

Test 3: Speed

Overall synthesis of my observations after several tests: ChatGPT is noticeably slower compared to other tools, while Gemini Flash demonstrates remarkable speed.

Test 4: Marketing Content

Prompt A: “Header image for tech landing page. Modern, trustworthy. Tech graphics.”

Follow-up: “Add semi-transparent overlay for text.”

Prompt B: “Trendy coffee shop for local ad.”

Follow-up: “Add handwritten ‘Grand Opening’ banner.”

🧪 Results:

Gemini
  • Prompt A:
  • Follow-up:

I usually got awkward outputs for this one...

  • Prompt B:
  • Follow-up:
Grok
  • Prompt A:

Grok just kept giving me images like this. Despite trying multiple prompts, I kept getting the same type of image and never got the result I wanted. Try it and let me know.

  • Prompt B:

Same for Prompt B…

ChatGPT-4o
  • Prompt A:

Brilliant.

  • Follow-up:
  • Prompt B:

Wow. This used to cost a lot of money and time. The results you can achieve with a simple prompt are remarkable.

  • Follow-up:

Overall synthesis of my observations after several tests:

What Should You Use?

Here’s a quick framework:

Need                                | Tool
API access & custom workflows       | Google Gemini
Fast edits in social content        | xAI Grok
Edgy, unrestricted content          | Grok
Controlled multi-step visuals       | ChatGPT-4o
Everything else (if you’re patient) | ChatGPT-4o
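If you like your frameworks executable, the table above is really just a lookup with a default. Here is a small sketch. `RECOMMENDATIONS` and `recommend_tool` are names I made up, and the mapping only encodes my own picks from this letter, nothing official.

```python
# My picks from the framework above, keyed by need.
RECOMMENDATIONS = {
    "API access & custom workflows": "Google Gemini",
    "Fast edits in social content": "xAI Grok",
    "Edgy, unrestricted content": "Grok",
    "Controlled multi-step visuals": "ChatGPT-4o",
}

# "Everything else (if you're patient)" falls through to this default.
DEFAULT_TOOL = "ChatGPT-4o"


def recommend_tool(need: str) -> str:
    """Return the suggested tool for a need, or the default for anything else."""
    return RECOMMENDATIONS.get(need, DEFAULT_TOOL)


print(recommend_tool("Fast edits in social content"))
print(recommend_tool("Landing page graphics"))
```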

Where This Is Going

This shift is bigger than tools.

It’s about how we create.

Here’s what I see happening:

1. Creation and editing merge.

No more generate → export → edit. You just keep evolving the image.

2. Design gets democratized.

Non-designers now make publish-worthy visuals. Power shifts to those with vision, not tools.

3. AI agents take over workflows.

You’ll say “Make a product campaign” and get images, copy, layout, maybe even video.

4. New tools, new businesses.

Smart builders will wrap these models into custom apps. Think niche image editors, auto-meme bots, visual storytellers.

5. Ethics and quality matter more.

Expect questions around deepfakes, bias, and copyright. You’ll need good judgment—and maybe a watermark.

Final Thought

We're living in an era where a single person, armed with AI tools like these, can build and run what used to require an entire company. The "one-person billion-dollar company" isn't science fiction anymore—it's becoming a real possibility.

Think about it: You can now generate professional visuals, write compelling copy, design interfaces, and automate workflows, all from your laptop. What once required a team of designers, copywriters, and developers can now be accomplished by one creative mind with the right AI tools.

Remember:

You’re not here to compete with AI.

You’re here to create with it.

Stay sharp,

Until the next one,

— Charafeddine
