Comparing AI Image Recognition

GPT-4o vs. Gemini vs. Claude 3.5 Sonnet on Image Recognition

I compared three LLMs on image recognition: GPT-4o, Gemini, and Claude 3.5 Sonnet. I gave each model a picture of my bookshelf and asked it to identify the books.
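If you want to reproduce the setup, here's a minimal sketch of the test harness, not my exact code. It assumes the official Python SDKs, API keys in the usual environment variables, a local "bookshelf.jpg", and model names that were current when I ran this; the prompt wording is a placeholder too.

```python
import base64
import os

import anthropic
import google.generativeai as genai
from openai import OpenAI
from PIL import Image

PROMPT = "List every book title you can identify on this shelf, in order."

with open("bookshelf.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# GPT-4o: the image goes in as a base64 data URL inside the message content.
gpt = OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(gpt.choices[0].message.content)

# Claude 3.5 Sonnet: the image is a base64 "source" block alongside the text.
claude = anthropic.Anthropic().messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_b64}},
            {"type": "text", "text": PROMPT},
        ],
    }],
)
print(claude.content[0].text)

# Gemini: the SDK accepts a PIL image directly next to the prompt.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini = genai.GenerativeModel("gemini-1.5-pro").generate_content(
    [PROMPT, Image.open("bookshelf.jpg")]
)
print(gemini.text)
```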

Here's how I scored the results:

📚 Books identified: How many of the 20 books shown did it correctly identify?

✅ Correct Guesses: How many of its guesses were right?

❌ Incorrect Guesses: How many of its guesses were wrong?

🧐 Attention Score: How many books did it identify in a row before it started hallucinating?

It’s useful to track correct and incorrect guesses separately because some models are bolder than others, and some are more comfortable hallucinating. For example, LLaVA gets zeros across the board because it refused to guess at all; it wanted a higher-quality photo.
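To make those definitions concrete, here's a minimal scoring sketch. The exact-string matching and the example titles are simplifying assumptions: real guesses need fuzzy title matching, and `ground_truth` stands in for the set of 20 books actually on the shelf.

```python
def score(guesses: list[str], ground_truth: set[str]) -> dict[str, int]:
    """Score one model's ordered list of guessed book titles."""
    correct = [g for g in guesses if g in ground_truth]
    incorrect = [g for g in guesses if g not in ground_truth]

    # Attention Score: length of the initial streak of correct guesses
    # before the first hallucination.
    attention = 0
    for guess in guesses:
        if guess not in ground_truth:
            break
        attention += 1

    return {
        "books_identified": len(set(correct)),  # unique real titles found
        "correct_guesses": len(correct),
        "incorrect_guesses": len(incorrect),
        "attention_score": attention,
    }

# Hypothetical example: two real titles, then one hallucination.
print(score(["Dune", "Snow Crash", "The Midnight Compiler"],
            {"Dune", "Snow Crash", "Hyperion"}))
# -> {'books_identified': 2, 'correct_guesses': 2,
#     'incorrect_guesses': 1, 'attention_score': 2}
```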

Detailed Results

ChatGPT: [results image]

Gemini: [results image]

Claude: [results image]

Claude did not add any new books to the list even after I asked it about the second shelf.

In general, Claude 3.5 Sonnet seems very powerful. I think Claude 3 Sonnet was already equivalent to GPT-4o, but I don’t think y’all are ready for that conversation. On vision, though, it still seems behind.

Gemini did not do well here, but it performed very well in a separate test where I used video instead of a still photo. I’m still waiting on the other models to support video.
