
Why OpenAI's o1 Is A Huge Deal | YC Decoded


5m read
· Nov 3, 2024

OpenAI's newest model is finally here. It's called o1, and it's much better at questions around mathematics and coding, scoring big on many of the toughest benchmarks out there. So, what's the secret to why it works? Let's take a look.

OpenAI recently released two brand-new models: o1-preview and o1-mini. These are the models that Sam Altman has been hinting at for months, the ones previously codenamed Q* and Strawberry. Together they represent an entirely new class of models designed to reason, or think, through complex problems.

o1 really is the first system that can do pretty advanced reasoning. You know, if you give it a difficult programming challenge, a difficult math problem, or a difficult science question you need help with, you can get pretty extraordinary results. It performs similarly to PhD students on challenging benchmark tasks in areas like physics, chemistry, and biology, and it excels in math and coding.

It's worth noting that when compared to GPT-4, users don't always prefer o1 for more informal, subjective tasks like creative writing or editing text. This is likely a result of the unique way in which OpenAI trained o1. It's fair to say that o1-preview and o1-mini amount to an entirely new kind of LLM.

If o1 is reasoning, the question is how similar its process is to how humans work through a complex problem. It uses a chain-of-thought process to break the question down into smaller steps. Many of us have already used this strategy when prompting earlier models like GPT-4, telling them to think step by step or to take a breath and go line by line.
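As a rough illustration, here is what that kind of manual chain-of-thought prompt might look like using the OpenAI Python SDK. The model name and prompt wording are just placeholders, not anything o1-specific:

```python
# A minimal sketch of manual chain-of-thought prompting with the
# OpenAI Python SDK. The model name and prompt text are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical choice; any chat model works here
    messages=[
        {"role": "system",
         "content": "Think step by step. Show each intermediate step "
                    "before giving the final answer."},
        {"role": "user", "content": question},
    ],
)

print(response.choices[0].message.content)
```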

It'll work through the steps, recognize its own mistakes, try to correct them, try different strategies, and fine-tune its approach as needed. In other words, it's not just spitting out answers; it's working through problems in a way that mirrors human reasoning.

Now, people were already doing this, and we already had a term for it: chain of thought, introduced in 2022 by Google Brain researchers. Here's an example of chain-of-thought reasoning in the style of that paper: "John has one pizza cut into eight equal slices. John eats three slices, and his friend eats two slices. How many slices are left?"

Chain of thought breaks this down. First, identify the total number of slices: the pizza is cut into eight equal slices. Then calculate the number of slices eaten: John eats three and his friend eats two, so five slices in total. Finally, subtract the five slices eaten from the original eight to find how many are left: three slices.
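The same decomposition, written out as plain arithmetic just to make each intermediate step explicit:

```python
# The pizza example, with each chain-of-thought step made explicit.
total_slices = 8             # the pizza is cut into eight equal slices
eaten = 3 + 2                # John eats three, his friend eats two
remaining = total_slices - eaten
print(remaining)             # -> 3
```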

Without chain of thought breaking a problem down into steps, an LLM just tries to predict the most likely next token, and in any given request there often isn't enough context for that to work. If lots of people were already using manual chain of thought, how exactly did OpenAI approach this? They haven't said much, but here's a good guess: their AI researchers have said no amount of prompt engineering on GPT-4 could get it to rival the abilities of o1.

Instead, the new model was trained in an entirely novel fashion via reinforcement learning. This is a type of machine learning that allows a model to learn by trial and error from its own actions, often using rewards and punishments as signals for positive and negative behavior. Instead of only training on human-written chains of thought, OpenAI trained o1 further with large-scale reinforcement learning.
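To get a feel for that trial-and-error loop, here is a minimal reward-driven learner. This toy two-armed bandit is only a sketch of the general idea of reinforcement learning, entirely unrelated to OpenAI's actual training pipeline:

```python
# A toy illustration of learning from reward signals: a two-armed
# bandit where the agent discovers the better action by trial and error.
import random

true_payoffs = [0.3, 0.7]        # hidden reward probabilities per action
estimates = [0.0, 0.0]           # the agent's running value estimates
counts = [0, 0]

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-looking action.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: estimates[a])

    reward = 1.0 if random.random() < true_payoffs[action] else 0.0

    # Incrementally update the estimate for the chosen action.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # roughly converges toward [0.3, 0.7]
```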

This means they allowed it to generate its own synthetic chains of thought that emulate human reasoning. These chains of thought are judged by a reward model and then used to train and fine-tune the model further over time. OpenAI has found that o1 consistently improves with more reinforcement learning and with more time spent thinking.
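OpenAI hasn't published the details, but one plausible shape for such a loop looks like this. Everything here, including the `generate`, `score`, and `fine_tune` functions, is a hypothetical placeholder, not a real API:

```python
# A hypothetical sketch of training on self-generated chains of thought.
# None of these functions correspond to a real API; they stand in for
# whatever OpenAI's internal pipeline actually does.

def training_round(model, reward_model, problems, samples_per_problem=8):
    keep = []
    for problem in problems:
        # Sample several candidate chains of thought per problem.
        chains = [model.generate(problem) for _ in range(samples_per_problem)]
        # Score each chain; keep only the highest-reward reasoning.
        best = max(chains, key=lambda c: reward_model.score(problem, c))
        keep.append((problem, best))
    # Fine-tune the model toward its own best reasoning traces.
    model.fine_tune(keep)
    return model
```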

What this means is that not only can the base model continue to improve with further training, but also that in production, when you, the user, ask o1 a complex problem, the longer it is allowed to think, and the more compute OpenAI spends on that thinking, the more accurate its response is going to be.
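One simple way to see how extra inference-time compute can buy accuracy is self-consistency: sample several answers and take the majority vote. This is one known technique in this family, sketched here with a stand-in solver; OpenAI hasn't said exactly what o1 does at inference time:

```python
# A sketch of spending more inference compute for better answers via
# majority voting over sampled answers. `sample_answer` is a stand-in
# for a real model call; o1's actual mechanism is not public.
from collections import Counter
import random

def sample_answer(question):
    # Placeholder: a noisy solver that is right 60% of the time.
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def majority_vote(question, n_samples):
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# More samples (more compute) -> the majority answer is right more often.
for n in (1, 5, 25, 125):
    hits = sum(majority_vote("q", n) == "42" for _ in range(200))
    print(f"{n:>3} samples: {hits / 200:.0%} correct")
```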

Does this mean that o1 will only keep improving? Well, yes. We know the unreleased versions of o1 are still evolving. o1-preview has been described as an early version of the fully baked model, which we can hopefully expect to be released in the coming weeks or months. A few startups have already received early access, and the results have been nothing short of staggering.

In fact, recently published research showed that by using chain of thought, an LLM can in principle solve any inherently serial problem. This means the sky truly is the limit for this series of models, given enough compute. According to Sam Altman, we can expect rapid improvement in these models over time, thanks to these inference-time scaling laws.

Sam compared the current o1 models to being at the GPT-2 stage, hinting that we will likely see a leap to the GPT-4 stage within a few years. So, is o1 actually reasoning? Without getting too philosophical, we think it is fair to say yes, it is.

o1 tackles complex problems that require planning by generating its own sequence of intermediate steps, working through them, and often, but not always, arriving at a correct answer. Perhaps it is more accurate to say that o1 marks a shift from models that memorize answers to ones that memorize reasoning.

Of course, o1 still needs work. It hallucinates occasionally, forgets details, and struggles with problems that fall outside its training distribution. Like all models, its results can be improved with better prompt engineering, especially prompts that outline edge cases or guide its reasoning style.
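For example, a prompt that outlines edge cases up front might look like this. The task and wording are made up purely for illustration:

```python
# An illustrative prompt that spells out edge cases explicitly.
prompt = """Write a Python function that parses a date string.

Work through your reasoning before writing code, and handle these
edge cases explicitly:
- empty strings
- two-digit years
- invalid months or days (e.g., 2024-13-45)

State your assumptions for each case before giving the final code."""
```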

So, what's next? According to OpenAI's own researchers, the company has some exciting updates planned, including support for additional tools such as code interpreter and browsing, longer context windows, and eventually even multimodality. The only real question that remains is: what will you build with o1?
