
Why OpenAI's o1 Is A Huge Deal | YC Decoded


5m read
· Nov 3, 2024

OpenAI's newest model is finally here. It's called o1, and it's much better at questions around mathematics and coding, scoring big on many of the toughest benchmarks out there. So, what's the secret to why it works? Let's take a look.

OpenAI recently released two brand-new models: o1-preview and o1-mini. These are the models that Sam Altman has been hinting at for months, the ones previously codenamed Q* and Strawberry. Together they represent an entirely new class of models designed to reason, or think, through complex problems.

o1 really is the first system that can do pretty advanced reasoning. You know, if you give it a difficult programming challenge, a difficult math problem, or a difficult science question you need help with, you can get pretty extraordinary results. It performs similarly to PhD students on challenging benchmark tasks in areas like physics, chemistry, and biology, and it excels in math and coding.

It's worth noting that when compared to GPT-4, users don't always prefer o1 for more informal, subjective tasks like creative writing or editing text. This is likely a result of the unique way in which OpenAI trained o1. It's fair to say that o1-preview and o1-mini amount to an entirely new kind of LLM.

If o1 is reasoning, the question is how similar its process is to how humans work through a complex problem. It uses a chain-of-thought process to break the question down into smaller steps. Many of us have already used this strategy when prompting earlier models like GPT-4, telling them to think step by step or to take a breath and go line by line.
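As a rough illustration, here is what that kind of manual chain-of-thought prompt might look like using the OpenAI Python SDK. The model name and prompt wording are just placeholders, not anything o1-specific:

```python
# A minimal sketch of manual chain-of-thought prompting with the
# OpenAI Python SDK. The model name and prompt text are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical choice; any chat model works here
    messages=[
        {"role": "system",
         "content": "Think step by step. Show each intermediate step "
                    "before giving the final answer."},
        {"role": "user", "content": question},
    ],
)

print(response.choices[0].message.content)
```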

It'll work through the steps, recognize its own mistakes, try to correct them, try different strategies, and fine-tune its approach as needed. In other words, it's not just spitting out answers; it's working through problems in a way that mirrors human reasoning.

Now, people were already doing this, and we already had a term for it: chain of thought, introduced in 2022 by Google Brain researchers. Here's an example of chain-of-thought reasoning in the style of that paper: "John has one pizza cut into eight equal slices. John eats three slices, and his friend eats two slices. How many slices are left?"

Chain of thought breaks this down. First, identify the total number of slices: the pizza is cut into eight equal slices. Then calculate the number of slices eaten: John eats three and his friend eats two, so five slices in total. Finally, subtract the five slices eaten from the original eight to find how many are left: three slices.
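The same decomposition, written out as plain arithmetic just to make each intermediate step explicit:

```python
# The pizza example, with each chain-of-thought step made explicit.
total_slices = 8             # the pizza is cut into eight equal slices
eaten = 3 + 2                # John eats three, his friend eats two
remaining = total_slices - eaten
print(remaining)             # -> 3
```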

Without chain of thought breaking a problem down into steps, an LLM just tries to predict the most likely next token, and in any given request there often isn't enough context for that to work. If lots of people were already using manual chain of thought, how exactly did OpenAI approach this? They haven't said much, but here's a good guess: their AI researchers have said no amount of prompt engineering on GPT-4 could get it to rival the abilities of o1.

Instead, the new model was trained in an entirely novel fashion via reinforcement learning. This is a type of machine learning that allows a model to learn by trial and error from its own actions, often using rewards and punishments as signals for positive and negative behavior. Instead of only training on human-written chains of thought, OpenAI trained o1 further with large-scale reinforcement learning.
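To get a feel for that trial-and-error loop, here is a minimal reward-driven learner. This toy two-armed bandit is only a sketch of the general idea of reinforcement learning, entirely unrelated to OpenAI's actual training pipeline:

```python
# A toy illustration of learning from reward signals: a two-armed
# bandit where the agent discovers the better action by trial and error.
import random

true_payoffs = [0.3, 0.7]        # hidden reward probabilities per action
estimates = [0.0, 0.0]           # the agent's running value estimates
counts = [0, 0]

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-looking action.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: estimates[a])

    reward = 1.0 if random.random() < true_payoffs[action] else 0.0

    # Incrementally update the estimate for the chosen action.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # roughly converges toward [0.3, 0.7]
```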

This means they allowed it to generate its own synthetic chains of thought that emulate human reasoning. These chains of thought are judged by a reward model and then used to train and fine-tune the model further over time. OpenAI has found that o1 consistently improves with more reinforcement learning and with more time spent thinking.
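OpenAI hasn't published the details, but one plausible shape for such a loop looks like this. Everything here, including the `generate`, `score`, and `fine_tune` functions, is a hypothetical placeholder, not a real API:

```python
# A hypothetical sketch of training on self-generated chains of thought.
# None of these functions correspond to a real API; they stand in for
# whatever OpenAI's internal pipeline actually does.

def training_round(model, reward_model, problems, samples_per_problem=8):
    keep = []
    for problem in problems:
        # Sample several candidate chains of thought per problem.
        chains = [model.generate(problem) for _ in range(samples_per_problem)]
        # Score each chain; keep only the highest-reward reasoning.
        best = max(chains, key=lambda c: reward_model.score(problem, c))
        keep.append((problem, best))
    # Fine-tune the model toward its own best reasoning traces.
    model.fine_tune(keep)
    return model
```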

What this means is that not only can the base model continue to improve with further training, but also that in production, when you, the user, ask o1 a complex problem, the longer it is allowed to think, and the more compute OpenAI spends on that thinking, the more accurate its response is going to be.
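One simple way to see how extra inference-time compute can buy accuracy is self-consistency: sample several answers and take the majority vote. This is one known technique in this family, sketched here with a stand-in solver; OpenAI hasn't said exactly what o1 does at inference time:

```python
# A sketch of spending more inference compute for better answers via
# majority voting over sampled answers. `sample_answer` is a stand-in
# for a real model call; o1's actual mechanism is not public.
from collections import Counter
import random

def sample_answer(question):
    # Placeholder: a noisy solver that is right 60% of the time.
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def majority_vote(question, n_samples):
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# More samples (more compute) -> the majority answer is right more often.
for n in (1, 5, 25, 125):
    hits = sum(majority_vote("q", n) == "42" for _ in range(200))
    print(f"{n:>3} samples: {hits / 200:.0%} correct")
```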

Does this mean that o1 will only keep improving? Well, yes. We know the unreleased versions of o1 are still evolving. o1-preview has been described as an early version of the fully baked model, which we can hopefully expect to be released in the coming weeks or months. A few startups have already received early access, and the results have been nothing short of staggering.

In fact, recently published research showed that by using chain of thought, an LLM can in principle solve any inherently serial problem. This means the sky truly is the limit for this series of models, given enough compute. According to Sam Altman, we can expect rapid improvement in these models over time, thanks to these inference-time scaling laws.

Sam compared the current o1 models to being at the GPT-2 stage, hinting that we will likely see a leap to the GPT-4 stage within a few years. So, is o1 actually reasoning? Without getting too philosophical, we think it is fair to say yes, it is.

o1 tackles complex problems that require planning by generating its own sequence of intermediate steps, working through them, and often, but not always, arriving at a correct answer. Perhaps it is more accurate to say that o1 marks a shift from models that memorize answers to ones that memorize reasoning.

Of course, o1 still needs work. It hallucinates occasionally, forgets details, and struggles with problems that fall outside its training distribution. Like all models, its results can be improved with better prompt engineering, especially prompts that outline edge cases or guide its reasoning style.
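For example, a prompt that outlines edge cases up front might look like this. The task and wording are made up purely for illustration:

```python
# An illustrative prompt that spells out edge cases explicitly.
prompt = """Write a Python function that parses a date string.

Work through your reasoning before writing code, and handle these
edge cases explicitly:
- empty strings
- two-digit years
- invalid months or days (e.g., 2024-13-45)

State your assumptions for each case before giving the final code."""
```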

So, what's next? According to OpenAI's own researchers, the company has some exciting updates planned, including support for additional tools such as code interpreter and browsing, longer context windows, and eventually even multimodality. The only real question that remains is: what will you build with o1?
