yego.me

Why OpenAI's o1 Is A Huge Deal | YC Decoded


Nov 3, 2024

OpenAI's newest model is finally here. It's called o1, it's much better at mathematics and coding questions, and it scores high on many of the toughest benchmarks out there. So, what's the secret to why it works? Let's take a look.

OpenAI recently released two brand-new models: o1-preview and o1-mini. These are the models Sam Altman had been hinting at for months, previously codenamed Q* and Strawberry. Together they represent an entirely new class of models designed to reason, or think through complex problems.

o1 really is the first system that can do fairly advanced reasoning. If you give it a difficult programming challenge, a difficult math problem, or a difficult science question, you can get pretty extraordinary results. According to OpenAI, it performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology, and it excels in math and coding.

It's worth noting that, compared with GPT-4o, users don't always prefer o1 for more informal, subjective tasks like creative writing or editing text. This is likely a result of the unusual way OpenAI trained o1. It's fair to say that o1-preview and o1-mini amount to an entirely new kind of LLM.

If o1 is reasoning, the question is how closely that resembles the way humans work through a complex problem. It uses a chain-of-thought process to break the question down into smaller steps. Many of us have already used a similar strategy when prompting earlier models like GPT-4, telling them to think step by step or to take a breath and go line by line.

o1 works through the steps, recognizes its own mistakes, tries to correct them, tries different strategies, and refines its approach as needed. In other words, it's not just spitting out answers; it's working through problems in a way that mirrors human reasoning.
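For readers who haven't tried it, here is a minimal sketch of what "telling a model to think step by step" looks like at the prompt level. The helper `build_prompt` and the `Q:`/`A:` layout are hypothetical choices for illustration; the trigger phrase "Let's think step by step." is the one popularized by zero-shot chain-of-thought prompting. No model is called here; the function only builds the text you would send.

```python
# Hypothetical helper: build either a plain prompt or a zero-shot
# chain-of-thought prompt. No model is called; this only produces text.
COT_TRIGGER = "Let's think step by step."


def build_prompt(question: str, step_by_step: bool = False) -> str:
    """Return the prompt text, optionally ending with a chain-of-thought trigger."""
    prompt = f"Q: {question}\nA:"
    if step_by_step:
        # The trigger phrase nudges the model to write out intermediate
        # steps before committing to a final answer.
        prompt += f" {COT_TRIGGER}"
    return prompt


if __name__ == "__main__":
    question = (
        "John has one pizza cut into eight equal slices. John eats three "
        "slices, and his friend eats two slices. How many slices are left?"
    )
    print(build_prompt(question))
    print()
    print(build_prompt(question, step_by_step=True))
```

The only difference between the two prompts is the trailing phrase; everything else about the request stays the same.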

People were already doing this, which is why we already had a term for it: chain of thought, introduced in a 2022 paper by Google Brain researchers. Here's a simple example of chain-of-thought reasoning: "John has one pizza cut into eight equal slices. John eats three slices, and his friend eats two slices. How many slices are left?"

Chain of thought breaks this down. First, identify the total number of slices: the pizza is cut into eight equal slices. Next, count the slices eaten by John and his friend: John eats three and his friend eats two, so five in total. Finally, subtract the slices eaten from the original number to find how many are left: 8 - 5 = 3 slices.
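The same three steps can be written out explicitly. This minimal sketch uses a hypothetical `solve_pizza` function that records each intermediate result as a line of text, mirroring the chain of thought a model would produce before stating its final answer.

```python
def solve_pizza(total_slices: int, eaten: dict[str, int]) -> tuple[int, list[str]]:
    """Solve the pizza problem, returning the answer and the reasoning steps."""
    steps: list[str] = []

    # Step 1: identify the total number of slices.
    steps.append(f"Step 1: the pizza has {total_slices} slices.")

    # Step 2: add up how many slices everyone ate.
    eaten_total = sum(eaten.values())
    detail = " + ".join(str(n) for n in eaten.values())
    steps.append(f"Step 2: slices eaten = {detail} = {eaten_total}.")

    # Step 3: subtract the slices eaten from the original number.
    remaining = total_slices - eaten_total
    steps.append(f"Step 3: slices left = {total_slices} - {eaten_total} = {remaining}.")

    return remaining, steps


if __name__ == "__main__":
    answer, reasoning = solve_pizza(8, {"John": 3, "friend": 2})
    for line in reasoning:
        print(line)
    print("Answer:", answer)  # Answer: 3
```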

Without chain of thought breaking a problem into steps, an LLM simply predicts the most likely next token, and a single request often doesn't give it enough context to get a multi-step problem right. So if many people were already using chain of thought manually, what exactly did OpenAI do differently? They haven't said much, but here's a good guess. Their AI researchers have said that no amount of prompt engineering on GPT-4 could get it to rival o1's abilities.

Instead, the new model was trained in a novel fashion using reinforcement learning, a type of machine learning in which a model learns by trial and error from its own actions, using rewards and penalties as signals for desirable and undesirable behavior. Rather than training only on human-written chains of thought, OpenAI trained o1 further with large-scale reinforcement learning.

This means they allowed it to generate its own synthetic chains of thought that emulate human reasoning. These chains of thought are judged by a reward model and then used to train and fine-tune the model further over time. OpenAI has found that o1's performance consistently improves both with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).
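OpenAI hasn't published its training recipe, so the following is only a toy illustration of learning from rewarded reasoning, not o1's actual method. It swaps in the textbook REINFORCE policy-gradient update: a policy picks one of three hard-coded "strategies" for the pizza problem, a hypothetical reward function returns 1 for the correct answer and 0 otherwise, and the policy's logits are nudged toward strategies that earned reward. The strategy names, learning rate, step count, and seed are all assumptions.

```python
import math
import random

# Toy stand-ins for chains of thought: each strategy yields a final answer.
STRATEGIES = {
    "subtract eaten slices from total": 8 - (3 + 2),  # 3 (correct)
    "add all the numbers": 8 + 3 + 2,                 # 13 (wrong)
    "subtract only John's slices": 8 - 3,             # 5 (wrong)
}
CORRECT_ANSWER = 3


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(answer: int) -> float:
    """Hypothetical reward: 1 for a correct final answer, 0 otherwise."""
    return 1.0 if answer == CORRECT_ANSWER else 0.0


def train(steps: int = 500, lr: float = 0.1, seed: int = 0) -> dict[str, float]:
    rng = random.Random(seed)
    names = list(STRATEGIES)
    logits = [0.0] * len(names)
    baseline = 0.0  # running average reward, reduces update variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a strategy from the current policy (trial).
        i = rng.choices(range(len(names)), weights=probs)[0]
        r = reward(STRATEGIES[names[i]])
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # REINFORCE: d log p_i / d logit_j = 1[j == i] - p_j
        for j in range(len(logits)):
            indicator = 1.0 if j == i else 0.0
            logits[j] += lr * advantage * (indicator - probs[j])
    return dict(zip(names, softmax(logits)))


if __name__ == "__main__":
    for name, p in train().items():
        print(f"{p:.3f}  {name}")
```

With the fixed seed, probability mass shifts toward the strategy that subtracts all eaten slices. A real system would replace the hard-coded strategies with sampled chains of thought and the exact-match check with a learned or rule-based reward.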

What this means is that not only can the base model keep improving with further training, but in production, when you ask o1 a complex problem, the longer it is allowed to think, the more compute OpenAI spends on it, and the more accurate its response tends to be.
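One simple, well-known way to turn extra inference compute into accuracy is to sample several independent answers and take a majority vote (often called self-consistency). This is a swapped-in technique for illustration; OpenAI hasn't said that o1 works this way. The sketch assumes each sample is independently correct with probability `p` and scores answers only as correct or incorrect, so the vote's accuracy is a binomial tail. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Simplifying assumptions: each sample is correct with probability p,
    independently, and answers are scored only as correct or incorrect.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number to avoid ties")
    need = n // 2 + 1  # smallest strict majority
    # Sum the binomial probabilities of k >= need correct samples.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 9, 25):
        print(f"{n:>2} samples: {majority_vote_accuracy(0.6, n):.3f}")
```

For example, with `p = 0.6` a single sample is right 60% of the time, while a majority vote over three samples is right 64.8% of the time; spending more compute on more samples keeps raising that figure as long as `p` is above one half.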

Does this mean o1 will only keep improving? Well, yes. We know the unreleased versions of o1 are still evolving. o1-preview has been described as an early version of the fully baked model, which we can hopefully expect in the coming weeks or months. A few startups have already received early access, and their results have been nothing short of staggering.

In fact, recently published research showed that, given enough chain-of-thought steps, an LLM can in principle solve any inherently serial problem. This suggests the sky truly is the limit for this series of models, given enough compute. According to Sam Altman, we can expect rapid improvement in these models over time, given these inference-time scaling laws.
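To make "inherently serial" concrete, a common example is composing a long sequence of permutations: each step depends on the result of the previous one, so there is no obvious shortcut that skips the intermediate states. The sketch below, with hypothetical helpers `compose` and `compose_with_scratchpad`, writes every intermediate state to a trace, much as a chain of thought lets a model perform one small computation per step instead of the whole composition at once. It illustrates the idea only; it is not the construction used in the research.

```python
Perm = tuple[int, ...]


def compose(p: Perm, q: Perm) -> Perm:
    """Apply p first, then q: result[i] = q[p[i]]."""
    return tuple(q[p[i]] for i in range(len(p)))


def compose_with_scratchpad(perms: list[Perm]) -> tuple[Perm, list[Perm]]:
    """Compose perms left to right, recording every intermediate state."""
    if not perms:
        raise ValueError("need at least one permutation")
    state: Perm = tuple(range(len(perms[0])))  # start from the identity
    trace: list[Perm] = []
    for p in perms:
        # Each step needs the previous state: this is the serial dependency.
        state = compose(state, p)
        trace.append(state)  # the "scratchpad" line for this step
    return state, trace


if __name__ == "__main__":
    cycle = (1, 2, 0)
    final, trace = compose_with_scratchpad([cycle, cycle, cycle])
    for step, state in enumerate(trace, start=1):
        print(f"after step {step}: {state}")
    print("final:", final)
```

Composing the 3-cycle `(1, 2, 0)` with itself three times passes through `(1, 2, 0)` and `(2, 0, 1)` before returning to the identity `(0, 1, 2)`.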

Sam compared the current o1 models to the GPT-2 stage, hinting that we will likely see a leap to the GPT-4 stage within a few years. So, is o1 actually reasoning? Without getting too philosophical, we think it's fair to say yes.

o1 tackles complex problems that require planning by generating its own sequence of intermediate steps, working through them, and often, but not always, arriving at a correct answer. Perhaps it is more accurate to say that o1 marks a shift from models that memorize answers to models that memorize reasoning.

Of course, o1 still needs work. It occasionally hallucinates, forgets details, and struggles with out-of-distribution problems. As with all models, its results can be improved somewhat with better prompt engineering, especially prompts that outline edge cases or guide its reasoning style.

So, what's next? According to OpenAI's own researchers, the company has exciting updates planned, including support for additional tools such as code interpreter and browsing, longer context windows, and eventually multimodality. The only real question that remains is: what will you build with o1?
