
A Thorough Evaluation of Claude 3, ChatGPT, and Gemini

Gianni Originally published Mar 18, 24, updated Dec 10, 24

Not long after ChatGPT went live more than eighteen months ago, a slew of other chatbots hit the market, but not all of them have been helpful. Claude stands well above most of the competition, alongside Gemini and ChatGPT.

The Claude 3 model family is a new set of AI models that Anthropic unveiled recently. The developer offers three models: Opus, Sonnet, and Haiku. Each differs in price, speed, and intelligence.

Generative AI experts can't resist comparing Claude 3 with the other top AI systems. Anthropic reports that Opus, the most capable Claude 3 model, outperforms the best-known models from OpenAI and Google on its benchmarks.

To help you choose the right tool, we have included a detailed comparison of the three chatbots.

Key Takeaways:

  • Claude 3 demonstrated better performance in comprehension, logic, and technical coding help tasks when compared with Gemini and ChatGPT. Because of its intelligence and adaptability, the Opus model stood apart.
  • Various test situations revealed the strengths and shortcomings of each AI model. Claude 3 struggled to solve specific mathematical problems. But it was great at other things, including following directions and writing descriptions.
  • When it came to producing text in a variety of forms, Gemini and GPT-4 showed remarkable speed. They performed well across the board, but particularly in basic text-generating activities.
  • Some tough queries requiring reasoning or context awareness were beyond all three models. In a few instances, they gave incomplete or inaccurate answers.
  • Users looking for AI help with video editing may find Filmora's chatbot, AI Copilot, a viable option.
In this article
  1. A Thorough Comparison Between Claude 3, GPT-4, and Gemini
  2. A Side-by-Side Evaluation of ChatGPT-4, Gemini, and Claude 3
  3. Claude, ChatGPT, or Gemini - Who Comes Out on Top After Tests?
  4. Final Thoughts

A Thorough Comparison Between Claude 3, GPT-4, and Gemini

We pitted ChatGPT, Gemini, and Claude against one another. Our questions tested their ability to handle practical business tasks, such as extracting information from documents and writing emails.

Across the seven tests, Claude won or shared the win in three (tests 1, 4, and 6), and so did ChatGPT and Gemini (tests 3, 4, and 5). Test 2 had no winner, and the last one ended in a tie. Because Claude was the only chatbot to win any test outright, we give it the edge in this matchup.

Below are all the queries we asked the three chatbots.

1) Writing Descriptions of Products

Writing original descriptions for your items can be a huge pain if you own an online shop or sell a lot of products online. We wanted a description for a generic children's toy range, so we asked ChatGPT, Gemini, and Claude for help. Here is how Claude fared:

We had to ask Claude for a somewhat longer description, since we had not seen it write as many product descriptions as ChatGPT. In the end it did well: the writing is captivating, and the sentence structure is superb.

Compared with ChatGPT and Gemini, Claude produces superior product descriptions. It sounds much more personable. If you were generating product descriptions in bulk, you would need to edit Claude's output far less than the output of the other two.

  • Test Prompt: Create a unique 50-word product description for me. I sell kid's toys online. I have a huge collection of toys in varying qualities and prices.
  • Claude 3's Response
product description writing test claude
  • Gemini's Response
product description writing test gemini
  • ChatGPT's Response
product description writing test chatgpt
  • Winner: Claude

2) Calculating the Accurate Duration

In this test, we try to trick the AI models to see whether they show any real reasoning. Unfortunately, Claude 3 Opus, like Gemini, does not pass. We even noted in the system prompt that the question is tricky and should be thought through carefully. Even so, the Opus model got its math wrong.

GPT-4 erred on this test as well, and its results were inconsistent. Even after we adjusted our prompt, GPT-4 still gave incorrect answers when we reran the query this morning.

  • Test Prompt: It took one hour to dry 25 shirts. So, how long would it take to air-dry five shirts in sunlight?
  • Claude 3's Response
duration calculation test claude
  • Gemini's Response
duration calculation test gemini
  • ChatGPT's Response
duration calculation test chatgpt
  • Winner: None
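
The trap in this prompt is that it invites proportional reasoning. The short sketch below contrasts that tempting model with a parallel-drying model. It is only an illustration: the article does not spell out the expected answer, so the assumption that all shirts dry side by side (with enough space and sunlight) and both function names are ours.

```python
def naive_proportional_time(known_hours, known_shirts, shirts):
    """Scale drying time with the shirt count (the tempting but wrong model)."""
    return known_hours * shirts / known_shirts


def parallel_drying_time(known_hours, known_shirts, shirts):
    """Assume every shirt dries at the same time, so the count does not matter.

    Assumption: there is enough space and sunlight for all shirts at once.
    """
    return known_hours


naive = naive_proportional_time(1.0, 25, 5)
parallel = parallel_drying_time(1.0, 25, 5)
print(f"proportional model: {naive * 60:.0f} minutes")
print(f"parallel model: {parallel * 60:.0f} minutes")
```

Under the parallel assumption, five shirts need the same hour that 25 shirts did, while the proportional model would claim 12 minutes.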

3) Figure Out a Math Problem

Our next question asked the Claude 3 Opus model for a mathematical shortcut that avoids computing the full product. Still no success. Every time we ran the prompt, the results were incorrect to varying degrees. This is surprising, since Claude 3 Opus topped the math benchmark, beating competitors like GPT-4 and Gemini.

Different prompting techniques may produce better outcomes from the Claude 3 Opus model. For now, GPT-4 and Gemini gave the right answer to the prompt below.

  • Test Prompt: 132*321 has a tens digit (A) and a unit digit (B); find the value of A + B. Can you provide the simplest solution?
  • Claude 3's Response
math problem test claude
  • Gemini's Response
math problem test gemini
  • ChatGPT's Response
math problem test chatgpt
  • Winner: Gemini and GPT-4
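
The "simplest solution" the prompt asks for relies on the fact that only the last two digits of each factor affect the last two digits of the product. A minimal sketch of that shortcut follows; the function name is hypothetical and chosen for illustration.

```python
def tens_and_units_sum(x, y):
    """Return (A, B, A + B) for the tens digit A and units digit B of x * y."""
    # Only the last two digits of each factor affect the last two digits
    # of the product, so reduce modulo 100 before multiplying.
    last_two = ((x % 100) * (y % 100)) % 100
    tens, units = divmod(last_two, 10)
    return tens, units, tens + units


a, b, total = tens_and_units_sum(132, 321)
print(a, b, total)  # prints: 7 2 9

# Cross-check against the full product, 132 * 321 = 42372.
assert (132 * 321) % 100 == 10 * a + b
```

Here 32 * 21 = 672, so the product ends in 72, giving A = 7, B = 2, and A + B = 9.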

4) The Orange Counting Test

Let's try the well-known orange test, which probes LLMs' reasoning abilities. The Claude 3 Opus model got this question right: you now have five oranges. But to get the right answer, we had to tell it in the system prompt that it is a brilliant helper with a knack for advanced thinking. Without that system prompt, Opus gave an inaccurate result. Gemini and GPT-4, as in our previous testing, provided accurate results.

  • Test Prompt: I ate one orange yesterday, and I now have five oranges. So, how many oranges do I have now?
  • Claude 3's Response
orange counting test claude
  • Gemini's Response
orange counting test gemini
  • ChatGPT's Response
orange counting test chatgpt
  • Winner: GPT-4, Gemini, and Claude 3 Opus.

5) Weight Calculation

Next, we asked each of the three AI models whether one pound of potatoes is heavier than one kilogram of tomatoes. Claude 3 Opus answered incorrectly. The GPT-4 and Gemini AI models gave accurate responses.

Because a kilogram is about 2.2 times heavier than a pound, a kilogram of tomatoes weighs more than a pound of potatoes.

  • Test Prompt: A pound of potatoes weighs more than a kilogram of tomatoes.
  • Claude 3's Response
weight calculation test claude
  • Gemini's Response
weight calculation test gemini
  • ChatGPT's Response
weight calculation test chatgpt
  • Winner: ChatGPT and Gemini
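
The comparison comes down to a single unit conversion. The sketch below checks it, using the standard definition of the international pound (0.45359237 kg); the variable and function names are illustrative only.

```python
KG_PER_LB = 0.45359237  # exact definition of the international avoirdupois pound


def pounds_to_kg(pounds):
    """Convert a weight in pounds to kilograms."""
    return pounds * KG_PER_LB


potatoes_kg = pounds_to_kg(1.0)  # one pound of potatoes
tomatoes_kg = 1.0                # one kilogram of tomatoes

print(f"potatoes: {potatoes_kg:.3f} kg, tomatoes: {tomatoes_kg:.3f} kg")
print("pound of potatoes heavier?", potatoes_kg > tomatoes_kg)
print(f"kilogram / pound ratio: {tomatoes_kg / potatoes_kg:.2f}")
```

What is being weighed does not matter; only the units do, so the statement in the prompt is false.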

6) Adhere to the Provided Directions

The Claude 3 Opus model does an excellent job of following the user's instructions. It essentially outclassed the other AI models in this test. We asked it to come up with five sentences that end with the word "chocolate." It produced three completely reasonable sentences that do just that.

By comparison, GPT-4 managed to produce only some such sentences. Gemini came last, failing to produce even three of them.

Thus, Claude 3 Opus is a reliable AI model if your job requires it to adhere strictly to user instructions.

  • Test Prompt: Create five phrases that conclude with the word "chocolate."
  • Claude 3's Response
following user directions test claude
  • Gemini's Response
following user directions test gemini
  • ChatGPT's Response
following user directions test chatgpt
  • Winner: Claude 3 Opus
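
A test like this can be scored mechanically rather than by eye. The following is a rough sketch of such a check, not the method used in the article: the helper names are hypothetical and the sample sentences are invented for illustration, not actual model output.

```python
import re


def ends_with_word(sentence, word):
    """True if the last word of the sentence (ignoring punctuation) is `word`."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return bool(words) and words[-1] == word.lower()


def count_compliant(sentences, word):
    """Count how many sentences end with the target word."""
    return sum(ends_with_word(s, word) for s in sentences)


# Hypothetical model output, invented for illustration only.
sample = [
    "Nothing beats a mug of hot chocolate.",
    "She saved the last square of chocolate!",
    "Chocolate is my favorite dessert.",
]
print(count_compliant(sample, "chocolate"), "of", len(sample))
```

In this sample the third sentence starts with the target word instead of ending with it, so only two of the three sentences comply.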

7) Offering Personal Advice

We wanted to test how ChatGPT, Gemini, and Claude react to a more personal scenario, so we asked them to advise someone struggling with mental health issues. As these technologies become increasingly ingrained in our lives, they should respond to such requests appropriately.

The responses provided by all chatbots are excellent. Each began by reassuring the user that their emotions were valid. It is hard to find fault with them.

The methods recommended by each chatbot were also much the same. They matched what any kind person would recommend to a friend having trouble with the problems listed in the prompt.

  • Test Prompt: Lately, I've been dealing with so many mental health issues and feel lonely. Is there anything you would tell someone in this predicament?
  • Claude 3's Response
personal advice test claude
  • Gemini's Response
personal advice test gemini
  • ChatGPT's Response
personal advice test chatgpt
  • Winner: Tie
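
The winners listed for the seven tests above can be tallied with a few lines of code. This is a sketch under our own scoring assumptions: a shared win counts for every model named, "None" and "Tie" award no points, and GPT-4 is counted under ChatGPT.

```python
from collections import Counter

# Winners as listed at the end of each test; empty lists mean "None" or "Tie".
winners = {
    "Writing Descriptions of Products": ["Claude"],
    "Calculating the Accurate Duration": [],
    "Figure Out a Math Problem": ["Gemini", "ChatGPT"],
    "The Orange Counting Test": ["ChatGPT", "Gemini", "Claude"],
    "Weight Calculation": ["ChatGPT", "Gemini"],
    "Adhere to the Provided Directions": ["Claude"],
    "Offering Personal Advice": [],
}

# Every listed winner gets one point, whether the win was shared or not.
wins = Counter(model for names in winners.values() for model in names)
# Outright wins are tests with exactly one listed winner.
outright = Counter(names[0] for names in winners.values() if len(names) == 1)

for model in ("Claude", "ChatGPT", "Gemini"):
    print(f"{model}: {wins[model]} wins, {outright[model]} outright")
```

Under these assumptions each chatbot is credited with three wins, and Claude is the only one with outright wins (two).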

A Side-by-Side Evaluation of ChatGPT-4, Gemini, and Claude 3

Feature | Claude 3 | Gemini | ChatGPT
--- | --- | --- | ---
Company | Anthropic AI | Google AI | OpenAI
Release Time | March 4th, 2024 | 2022 (First release) | 2020
Platform | Cloud-based | Cloud-based | Cloud-based
Price | Subscription-based | Free and paid versions available | Free and paid versions available
Visual Input | Image input supported | Image input supported | No
Pros | High maximum context length; excellent benchmark performance | Early release; excellent visual understanding | Consistently improving; reasoning and understanding capabilities
Cons | Requires subscription; potentially slower free version than others | Limited maximum context length; limited public information | No visual input support; limited access (controlled)

Claude, ChatGPT, or Gemini - Who Comes Out on Top After Tests?

All three are large language models (LLMs) at the cutting edge of artificial intelligence. Here is how they compare:

Claude 3 shines in reasoning-based activities and visual interpretation tasks like graphs and charts. One possible downside is its speed, which may be slower than alternatives like Gemini and GPT-4, particularly in their free versions.

OpenAI's GPT-4 is a fast text generator. Because access to it is restricted, less information is available about its capabilities.

When dealing with code or factual language, Gemini is an excellent choice. The most recent version, Gemini Ultra, may not be doing well on some benchmarks.

1) Coding Performance:

Claude 3's primary function is to ease general writing tasks. It offers some help in coding assignments. It can help with code completion, error detection, and syntax recommendations.

Launched with the intention of becoming a code creation tool, Gemini has now grown in scope. It provides passable coding speed, whereas Claude 3 offers more depth and specialization.

While not intended for coding jobs in particular, ChatGPT can help with questions about coding. Even though it isn't as efficient as Claude 3 or Gemini, it may provide general coding help.

2) Expertise Level:

If you need help with finishing your code, troubleshooting, or advice, Claude 3 is the one to call.

Gemini can handle a wide range of text-generating jobs.

ChatGPT can generate text, have conversations, answer questions, and much more. It doesn't have the same concentration on coding jobs as Claude 3 and, to a lesser degree, Gemini.

3) Response to Prompts:

Because of its expertise in coding-related inquiries, Claude 3 may provide efficient and rapid answers, depending on the difficulty of the coding work.

Gemini's response time likewise depends on the difficulty of the task at hand, though it is efficient across a wider range of tasks.

ChatGPT's response time depends on the query's complexity and the system's current load. It isn't as well suited to coding jobs as Claude 3.

4) Availability and Price

Claude offers a free version with restrictions. Premium features require a paid subscription, which some people may not be able to afford.

Gemini provides free and paid plans to suit the individual's budget and requirements.

ChatGPT suits customers with different budgets, offering free and paid options. However, a subscription is necessary to access premium services.

5) Restrictions and Ethical Aspects:

Concerns about data privacy, inaccuracy in results, and possible abuse of technology are present in all three models. Each of the three models relies on the correct management of private data and the assurance of fair results.

Though they excel at text-based tasks, Claude 3, Gemini, and ChatGPT cannot handle video input. Video frames and footage are beyond the capabilities of these models, since they work mainly with textual data. Thus, they wouldn't be very helpful to users who are trying to edit videos.

Don't worry! You have the option of using Filmora's AI Copilot chatbot, which is tailored to assist users in creating video content. With this feature, users get access to various capabilities designed to streamline the video editing process. Moreover, AI Copilot can examine video footage, understand editing needs, and provide pertinent recommendations.

In short, it is video-specific and provides a unique solution to the demands of people who make videos. Check out the following video introducing AI Copilot.

AI Copilot Editing - New Smart Feature in Filmora 13


Final Thoughts

By comparing Claude 3, Gemini, and ChatGPT, we can see where each model excels and where it falls short. Claude 3's specialized capabilities make it a strong choice for coding-related jobs, Gemini is adaptable across various text-based applications, and ChatGPT stands out for its extensive range of features.

However, none of the three models can take video as input, which makes them of little use for video editing tasks.

The AI Copilot chatbot from Filmora is a great solution for users who want AI's help while making videos. This tool provides personalized support by suggesting and executing various actions to expedite the editing process. So, get Filmora to try AI Copilot now!
