OpenAI recently released o1 and o1 pro during its 12 Days of OpenAI live updates, offering unlimited access through the $200-per-month ChatGPT Pro subscription. With so much speculation surrounding their capabilities, I wondered: is this premium subscription worth the investment? To answer this, I pitted the two models against each other in six challenging tasks. This article explores their strengths, weaknesses, and overall performance. By the end, you’ll have a clear understanding of whether the ChatGPT Pro subscription is the right choice for you.
Challenge 1: Solving a Zebra Problem
For the first test of o1 vs o1 pro, I picked a hard-level zebra problem from this website. Let’s see which model cracks it!
Prompt:
Solve this zebra problem:
o1 Response:
Entering this response as the solution shows that it is incorrect:
o1 pro Response:
Entering this response as the solution shows that it is correct:
Observation:
o1 pro took much longer than o1 to respond. o1 failed to solve the problem, whereas o1 pro succeeded!
Verdict:
o1 ❌ | o1 pro ✅
Challenge 2: Find the Differences
Prompt:
Find 3 differences in the two images:
o1 Response:
- Item in hand: In the left image, she’s holding a spoon. In the right image, she’s holding an ice cream cone instead.
- Hair accessory: In the left image, her hair is plain. In the right image, she is wearing a small pink bow in her hair.
- Background object: In the left image, there is a green sewing machine on the shelf behind her. In the right image, the sewing machine is replaced by a simple green box (or frame).
Observation:
Only the second difference (the hair accessory) is correct; the other two are incorrect.
o1 pro Response:
- The girl’s hairstyle accessory: In the right image, she has a pink bow in her hair, which is not present in the left image.
- What she’s holding in her hand: In the left image, she’s holding a spoon. In the right image, she’s holding an ice cream cone instead.
- The color of the middle scoop in the ice cream bowl: In the left image, the middle scoop is pink, while in the right image, that scoop has turned white.
Observation:
Only the first difference is correct; the other two are incorrect.
Neither o1 nor o1 pro identified all three differences correctly. However, o1 was faster in generating its response.
Verdict:
o1 ❌ | o1 pro ❌
Challenge 3: Solve a Math Problem
In this challenge, I give the models a problem on computing indefinite integrals. Let’s see which one is able to solve it!
Prompt:
Solve this math problem:
o1 and o1 pro Response (Same):
Observation:
Both models provided the correct answer, but o1 was much faster than o1 pro in finding the solution.
Verdict:
o1 ✅ | o1 pro ✅
Challenge 4: Analyzing Article
Prompt:
Read the article – https://www.analyticsvidhya.com/blog/2024/07/building-agentic-rag-systems-with-langgraph/ to understand the process of building a vector database for Wikipedia data. Summarize the key steps in a concise manner.
o1 Response:
o1 pro Response:
Observation:
The “o1 pro response” is closer to the actual implementation in the article. Here’s why:
The article provides a much more detailed, step-by-step implementation involving:
- Using specific libraries like LangChain and OpenAI embeddings
- Loading Wikipedia data from a specific archive
- Using Chroma as the vector database
- Implementing advanced RAG components like:
  - Query rephrasing
  - Document relevance grading
  - Web search integration
  - A complex LangGraph workflow
The o1 pro response captures more nuance by mentioning:
- Specific embedding models (sentence-transformers)
- Vector database options
- Metadata storage
- Testing retrieval
- Integration with a RAG pipeline
By contrast, the initial “o1 response” is more generic and lacks the technical depth demonstrated in the article. So the o1 pro response is significantly closer to the article’s actual implementation, though still a simplified version of the comprehensive approach detailed in the full article.
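To make the idea of a vector database a little more concrete, here is a minimal sketch in plain Python. It is not the article's implementation: the article uses LangChain, OpenAI embeddings, and Chroma, while this toy version swaps in a bag-of-words "embedding" and an in-memory list so it runs without any dependencies. The class name `ToyVectorStore`, the helper names, and the two sample documents are all hypothetical and only there for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real pipeline calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database such as Chroma."""

    def __init__(self):
        self.rows = []  # each row: (vector, text, metadata)

    def add(self, text, metadata):
        # Store the vector alongside the raw text and its metadata.
        self.rows.append((embed(text), text, metadata))

    def search(self, query, k=1):
        # Rank stored documents by similarity to the query and return the top k.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]


# Made-up sample documents standing in for Wikipedia chunks.
store = ToyVectorStore()
store.add("Paris is the capital of France", {"title": "Paris"})
store.add("The zebra is an African equine with stripes", {"title": "Zebra"})

top = store.search("capital of France", k=1)
print(top[0][1]["title"])
```

The "testing retrieval" step mentioned in the o1 pro response corresponds to the last two lines: run a query whose answer you already know and check that the expected document comes back first.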
Verdict:
o1 ❌ | o1 pro ✅
Challenge 5: Image Creation
Prompt:
Create an image of a cat.
o1 Response:
o1 pro Response:
Observation:
Neither o1 nor o1 pro was able to generate an image, indicating that neither version supports image generation. However, on giving the same prompt to GPT-4o, I got this response:
Hence, it is safe to say that GPT-4o beats both o1 and o1 pro in image generation!
Verdict:
o1 ❌ | o1 pro ❌
Challenge 6: Creating a Logical Flow Chart
Prompt:
Create a comprehensive flow chart illustrating the Reflection Pattern in Agentic AI.
o1 Response:
o1 pro Response:
Both provided incomplete flow charts, so I decided to update my prompt. Here’s my updated prompt:
New Prompt:
These are the steps involved in the reflection pattern:
1. Generate Initial Output
2. Self-Review/Critique the Output
3. Identify Errors, Gaps, or Improvement Areas
4. Develop Improvement Suggestions
5. Revise/Refine the Output
6. Repeat Steps 2-5 Until Satisfactory Result is Achieved

Create a new flow chart now
o1 Response:
o1 pro Response:
Observation:
Even though the content of both responses is the same, o1 definitely wins by providing an actual flow chart, whereas o1 pro only provided the correct content.
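For readers who think better in code than in flow charts, the reflection steps listed in my updated prompt can be sketched as a simple loop. This is only an illustration under stated assumptions: `generate`, `critique`, `suggest`, and `revise` are hypothetical stubs standing in for model calls, and the "satisfactory" check is a made-up rule (the output must contain two marker tags). Neither o1 nor o1 pro produced this code.

```python
REQUIRED_PARTS = ("example", "summary")  # made-up quality criteria for the demo


def generate(task):
    """Generate Initial Output (hypothetical stub for a model call)."""
    return "draft answer for: " + task


def critique(output):
    """Self-review the output and identify gaps: returns the missing parts."""
    return [part for part in REQUIRED_PARTS if "[" + part + "]" not in output]


def suggest(gaps):
    """Develop improvement suggestions: here, one tag to append per gap."""
    return ["[" + gap + "]" for gap in gaps]


def revise(output, suggestions):
    """Revise the output by applying only the first suggestion, so the loop iterates."""
    return output + " " + suggestions[0]


def reflect(task, max_rounds=5):
    output = generate(task)
    history = [output]
    for _ in range(max_rounds):  # repeat until satisfactory or out of rounds
        gaps = critique(output)
        if not gaps:  # satisfactory result achieved
            break
        output = revise(output, suggest(gaps))
        history.append(output)
    return output, history


final, history = reflect("explain the reflection pattern")
print(final)
print(len(history), "versions produced")
```

The `max_rounds` cap is a design choice of this sketch: without it, a critic that is never satisfied would loop forever.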
Verdict:
o1 ✅ | o1 pro ❌
Result Chart: o1 vs o1 pro
Challenge | Verdict |
---|---|
Zebra Problem | o1 pro succeeded, but was slower |
Find Differences | Both models performed poorly |
Math Problem | Both solved correctly, o1 was faster |
Analyzing Article | o1 pro provided more depth |
Image Creation | Neither could generate images (GPT-4o could) |
Creating a Logical Flow Chart | o1 won by creating an actual flow chart |
o1 pro seems to have a slight edge in problem-solving depth and accuracy, particularly in complex tasks like solving the zebra problem and analyzing technical articles. However, o1 tends to be faster and performs well in simpler tasks. The verdict appears to be that o1 pro is marginally better, especially for more complex or technical challenges that require deeper understanding.
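As a quick sanity check on "marginally better", the per-challenge verdicts (the ✅ and ❌ marks from each section) can be tallied in a few lines. The dictionary below simply transcribes those marks; the variable names are my own.

```python
# (o1 passed, o1 pro passed) for each challenge, transcribed from the verdicts above.
verdicts = {
    "Zebra Problem": (False, True),
    "Find Differences": (False, False),
    "Math Problem": (True, True),
    "Analyzing Article": (False, True),
    "Image Creation": (False, False),
    "Creating a Logical Flow Chart": (True, False),
}

o1_passes = sum(o1 for o1, _ in verdicts.values())
o1_pro_passes = sum(pro for _, pro in verdicts.values())

print(f"o1: {o1_passes}/{len(verdicts)}, o1 pro: {o1_pro_passes}/{len(verdicts)}")
```

Six prompts is a very small sample, so a one-challenge gap should be read as a rough impression rather than a measured difference.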
End Note
While o1 pro shows promise in complex problem-solving, it’s important to consider your specific needs and budget. For basic to intermediate tasks, GPT-4o or other more affordable alternatives might suffice. If complex problem-solving is a priority and you’re willing to invest, o1 pro could be a valuable tool.
However, given that OpenAI is continually refining these models, it might be wise to wait for further updates before making a definitive decision. OpenAI is likely to add more benefits to the $200 ChatGPT Pro plan in the future.
What are your thoughts on this? Let me know in the comment section below.