
Why OpenAI's o1 Is A Huge Deal | YC Decoded


5m read · Nov 3, 2024

OpenAI's newest model is finally here. It's called o1, and it's much better at questions involving mathematics and coding, scoring big on many of the toughest benchmarks out there. So, what's the secret to why it works? Let's take a look.

OpenAI recently released two brand new models: o1-preview and o1-mini. These are the models Sam Altman has been hinting at for months, the ones previously codenamed Q* and Strawberry. Together they represent an entirely new class of models that are designed to reason, or think, through complex problems.

o1 really is the first system that can do pretty advanced reasoning. If you give it a difficult programming challenge, a difficult math problem, or a difficult science question you need help with, you can get pretty extraordinary results. It performs similarly to PhD students on challenging benchmark tasks in areas like physics, chemistry, and biology, and excels in math and coding.

It's worth noting that when compared to GPT-4, users don't always prefer o1 for more informal, subjective tasks like creative writing or editing text. This is likely a result of the distinctive way in which OpenAI trained o1. It's fair to say that o1-preview and o1-mini amount to an entirely new kind of LLM.

If o1 is reasoning, the question is how similar its process is to the way humans work through a complex problem. It makes use of a chain-of-thought process to break the question down into smaller steps. Many of us have already used this strategy when prompting earlier models like GPT-4, telling them to think step by step or to take a breath and go line by line.
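Before o1, you had to ask for that behavior explicitly in the prompt. Here's a minimal sketch of manual step-by-step prompting using the OpenAI Python client; the model choice and prompt wording are our own illustration, not something from the video:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Manual chain-of-thought prompting: explicitly ask the model to reason
# in steps before answering. o1 does this on its own; earlier models
# like GPT-4 had to be nudged.
response = client.chat.completions.create(
    model="gpt-4",  # illustrative choice of an earlier, non-reasoning model
    messages=[{
        "role": "user",
        "content": (
            "A train leaves at 3:40 pm and arrives at 6:15 pm. "
            "How long is the trip? Think step by step, stating each "
            "intermediate result before giving the final answer."
        ),
    }],
)
print(response.choices[0].message.content)
```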

It'll work through the steps, recognize its own mistakes, try to correct them, try different strategies, and fine-tune its approach as needed. In other words, it's not just spitting out answers; it's working through problems in a way that mirrors human reasoning.

Now, people were already doing this manually, and there's a term for it: chain of thought, introduced in 2022 by Google Brain researchers. Here's an example of chain-of-thought reasoning on a simple word problem: "John has one pizza cut into eight equal slices. John eats three slices, and his friend eats two slices. How many slices are left?"

Chain of thought breaks this down. First, identify the total number of slices: the pizza is cut into eight equal slices. Then, calculate the number of slices eaten: John eats three and his friend eats two, so five slices in total. Finally, subtract the slices eaten from the original count to find out how many are left: eight minus five leaves three slices.
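The same decomposition, written out as explicit intermediate steps in code, just to make the structure of the reasoning concrete:

```python
# The pizza problem as explicit intermediate steps, mirroring how a
# chain of thought decomposes the question instead of jumping to the answer.
total_slices = 8                   # Step 1: the pizza has eight slices
eaten = 3 + 2                      # Step 2: John eats 3, his friend eats 2
remaining = total_slices - eaten   # Step 3: subtract to get what's left
print(remaining)                   # -> 3
```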

Without chain of thought breaking a problem into steps, an LLM just tries to predict the most likely next token, and for many requests that single shot simply doesn't carry enough context. So if lots of people were already using manual chain-of-thought prompting, how exactly did OpenAI approach this? They haven't said much, but here's a good guess: their AI researchers have said that no amount of prompt engineering on GPT-4 could get it to rival the abilities of o1.

Instead, the new model was trained in an entirely novel fashion via reinforcement learning. This is a type of machine learning that allows a model to learn by trial and error from its own actions, often using rewards and punishments as signals for positive and negative behavior. Instead of training only on human-written chains of thought, OpenAI trained o1 further with large-scale reinforcement learning.
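To make "trial and error with rewards" concrete, here's a toy reinforcement-learning loop. This is a generic multi-armed-bandit sketch of the idea, not OpenAI's (undisclosed) training setup:

```python
import random

ACTIONS = ["A", "B", "C"]
TRUE_REWARD = {"A": 0.2, "B": 0.8, "C": 0.5}  # hidden payoff probabilities

estimates = {a: 0.0 for a in ACTIONS}  # the agent's learned value estimates
counts = {a: 0 for a in ACTIONS}

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=estimates.get)

    # The environment rewards (1) or punishes (0) the chosen action.
    reward = 1.0 if random.random() < TRUE_REWARD[action] else 0.0

    # Nudge the running average reward for this action toward what we saw.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # converges toward TRUE_REWARD; the agent learns to pick "B"
```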

This means they allowed it to generate its own synthetic chains of thought that emulate human reasoning. These chains of thought are judged by a reward model and then used to train and fine-tune the model further over time. OpenAI has found that o1 consistently improves with more reinforcement learning and with more time spent thinking.
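OpenAI hasn't published the recipe, but one plausible shape for it is rejection sampling: generate many candidate chains of thought, score them with a reward model, and keep only the best as new training data. A hedged sketch, where `generate_chain` and `reward_model` are hypothetical stand-ins:

```python
def best_chains(problem, generate_chain, reward_model, n_samples=16, keep=2):
    """Sample candidate chains of thought and keep the highest-scoring ones.

    `generate_chain(problem)` returns one synthetic chain of thought;
    `reward_model(chain)` scores it. The survivors would become targets
    for further fine-tuning. Purely illustrative, not OpenAI's method.
    """
    candidates = [generate_chain(problem) for _ in range(n_samples)]
    ranked = sorted(candidates, key=reward_model, reverse=True)
    return ranked[:keep]
```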

What this means is that the base model can not only continue to improve with further training, but also that in production, when you, the user, ask o1 a complex problem, the longer it is allowed to think, and the more compute OpenAI spends on that thinking, the more accurate its response is going to be.
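One well-known way to trade inference compute for accuracy, separate from whatever o1 does internally, is self-consistency: sample several independent chains of thought and majority-vote on the final answer. A minimal sketch, with `ask_model` as a hypothetical sampling function:

```python
from collections import Counter

def self_consistent_answer(problem, ask_model, n_samples=8):
    """Majority-vote over several sampled answers.

    More samples means more inference-time compute and, typically,
    higher accuracy. `ask_model(problem)` is a hypothetical function
    that samples one chain of thought and returns its final answer.
    """
    answers = [ask_model(problem) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```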

Does this mean that o1 will only keep improving? Well, yes. We know the unreleased versions of o1 are still evolving. o1-preview has been described as an early version of the fully baked model, which we can hopefully expect in the coming weeks or months. A few startups have already received early access, and for them the results have been nothing short of staggering.

In fact, recently published research showed that, with enough chain-of-thought steps, a transformer can in principle solve any inherently serial problem. This means the sky truly is the limit for this series of models, given enough compute. According to Sam Altman, we can expect rapid improvement in these models over time, given these inference-time scaling laws.
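For intuition on "inherently serial": some computations are long dependency chains where each step needs the previous result. Computing the parity of a bitstring by folding XOR left to right is a classic example; a chain of thought lets a model emit each intermediate value instead of having to produce the final result in one shot. A toy illustration:

```python
def parity_with_steps(bits):
    """Compute parity serially, recording every intermediate state.

    Each step depends on the previous one, so the recorded states play
    the same role as a chain of thought's intermediate steps.
    """
    state, steps = 0, []
    for b in bits:
        state ^= b            # this step cannot run until the last one did
        steps.append(state)   # the "thought" after consuming each bit
    return state, steps

answer, chain = parity_with_steps([1, 0, 1, 1, 0, 1])
print(answer, chain)  # -> 0 [1, 1, 0, 1, 1, 0]
```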

Sam compared the current o1 models to being at the GPT-2 stage, hinting that we will likely see a leap to the GPT-4 stage within a few years. So, is o1 actually reasoning? Without getting too philosophical, we think it is fair to say yes, it is.

o1 tackles complex problems that require planning by generating its own sequence of intermediate steps, working through them, and often, but not always, arriving at a correct answer. Perhaps it is more accurate to say that o1 marks a shift from models that memorize answers to models that memorize reasoning.

Of course, o1 still needs work. It occasionally hallucinates, forgets details, and struggles with problems that fall outside its training distribution. Like all models, its results can be improved with better prompt engineering, especially prompts that spell out edge cases or guide its reasoning style.
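As an example of what "prompts that spell out edge cases" might look like (the wording here is our own, not an OpenAI recommendation):

```python
# An illustrative prompt that enumerates edge cases up front so the model's
# reasoning has to account for them.
prompt = """Write a Python function that parses a duration string like
'1h30m' into a number of seconds.

Edge cases to handle explicitly:
- a bare number with no unit ('90') is invalid
- zero values are fine ('0h0m' -> 0)
- out-of-order units ('30m1h') are invalid

Reason through each edge case before writing the final code."""
```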

So, what's next? According to OpenAI's own researchers, the company has some exciting updates planned, including support for additional tools such as code interpreter and browsing, longer context windows, and eventually even multimodality. The only real question that remains is: what will you build with o1?
