yego.me

Why OpenAI's o1 Is A Huge Deal | YC Decoded


5m read
·Nov 3, 2024

OpenAI's newest model is finally here. It's called o1, it's much better at math and coding questions, and it scores highly on many of the toughest benchmarks out there. So, what's the secret to why it works? Let's take a look.

OpenAI recently released two brand-new models: o1-preview and o1-mini. These are the models Sam Altman has been hinting at for months, the ones previously codenamed Q* and Strawberry. Together they represent an entirely new class of models designed to reason, or think, through complex problems.

o1 really is the first system that can do pretty advanced reasoning. If you give it a difficult programming challenge, a difficult math problem, or a difficult science question, you can get pretty extraordinary results. It performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology, and it excels at math and coding.

It's worth noting that, compared with GPT-4, users don't always prefer o1 for more subjective tasks like creative writing or editing text. This is likely a result of the unusual way OpenAI trained o1. It's fair to say that o1-preview and o1-mini amount to an entirely new kind of LLM.

If o1 is reasoning, the question is how similar its process is to the way humans work through a complex problem. It uses a chain-of-thought process to break a question down into smaller steps. Many of us have already used this strategy when prompting earlier models like GPT-4, telling them to "think step by step" or to "take a deep breath" and go line by line.

o1 works through the steps, recognizes its own mistakes, tries to correct them, tries different strategies, and refines its approach as needed. In other words, it isn't just spitting out answers; it's working through problems in a way that mirrors human reasoning.

People were already doing this by hand; there's even a term for it, chain-of-thought prompting, introduced in a 2022 paper by Google Brain researchers. Here's an example of chain of thought, direct from the paper: "John has one pizza cut into eight equal slices. John eats three slices, and his friend eats two slices. How many slices are left?"

Chain of thought breaks this down. First, identify the total number of slices: the pizza is cut into eight equal slices. Next, calculate the number of slices eaten by John and his friend: John eats three and his friend eats two, five in total. Finally, subtract the slices eaten from the original number to find how many are left: eight minus five leaves three slices.
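To make the decomposition concrete, here is a toy Python sketch that solves the pizza problem one explicit step at a time and records each intermediate step, roughly what a chain-of-thought trace writes out. It's plain arithmetic, not a language model, and the function name `slices_left_with_steps` is hypothetical.

```python
def slices_left_with_steps(total_slices: int, eaten: dict[str, int]) -> tuple[int, list[str]]:
    """Solve the pizza problem one explicit step at a time, recording each
    step the way a chain-of-thought trace would (illustrative only)."""
    steps = []
    # Step 1: identify the total number of slices.
    steps.append(f"The pizza is cut into {total_slices} equal slices.")
    # Step 2: add up how many slices were eaten.
    total_eaten = sum(eaten.values())
    steps.append(f"Slices eaten: {' + '.join(str(n) for n in eaten.values())} = {total_eaten}.")
    # Step 3: subtract the eaten slices from the original count.
    left = total_slices - total_eaten
    steps.append(f"Slices left: {total_slices} - {total_eaten} = {left}.")
    return left, steps


left, steps = slices_left_with_steps(8, {"John": 3, "friend": 2})
for step in steps:
    print(step)
```

Each printed line is one intermediate result that the next step builds on, which is exactly the context a model lacks when it tries to jump straight to the answer.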

Without chain of thought breaking a problem into steps, an LLM simply tries to predict the most likely next token, and a single request often doesn't contain enough context to get the answer right. If lots of people were already using chain of thought manually, how exactly did OpenAI approach this? They haven't said much about the details, but we can make a good guess. Their researchers have said that no amount of prompt engineering on GPT-4 could get it to rival o1's abilities.

Instead, the new model was trained in an entirely novel fashion, via reinforcement learning (RL). RL is a type of machine learning that lets a model learn by trial and error from its own actions, using rewards and penalties as signals for good and bad behavior. Rather than training only on human-written chains of thought, OpenAI trained o1 further with large-scale reinforcement learning.

This means they let it generate its own synthetic chains of thought that emulate human reasoning. These chains of thought are judged by a reward model and then used to train and fine-tune o1 further over time. OpenAI has found that o1 consistently improves with more reinforcement learning and with more time spent thinking.
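OpenAI hasn't published its training recipe, so the following is only a toy Python sketch of the general loop described above. A "policy" samples one of three hypothetical reasoning strategies for summing a list, each strategy writes out a short chain, a stand-in reward model scores the final answer, and a REINFORCE policy-gradient update shifts probability toward the chains that earned reward. The strategies, the reward model, and the choice of REINFORCE are all assumptions for illustration; a real system updates the weights of an LLM, not three numbers.

```python
import math
import random


# Hypothetical "reasoning strategies"; each returns (chain_of_thought, answer).
def add_step_by_step(nums):
    steps, total = [], 0
    for n in nums:
        total += n
        steps.append(f"running total = {total}")
    return steps, total


def guess_last(nums):
    return [f"guess: {nums[-1]}"], nums[-1]


def double_first(nums):
    return [f"guess: 2 * {nums[0]}"], 2 * nums[0]


STRATEGIES = [add_step_by_step, guess_last, double_first]


def reward_model(nums, answer):
    # Stand-in reward model (assumption): 1.0 if the final answer is right, else 0.0.
    return 1.0 if answer == sum(nums) else 0.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train(steps=300, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(STRATEGIES)  # start with no preference between strategies
    baseline = 0.0                    # running average reward, reduces update variance
    for _ in range(steps):
        nums = [rng.randint(1, 9) for _ in range(rng.randint(2, 5))]
        probs = softmax(logits)
        a = rng.choices(range(len(STRATEGIES)), weights=probs)[0]
        chain, answer = STRATEGIES[a](nums)  # sample a chain of thought (unused beyond scoring here)
        r = reward_model(nums, answer)       # judge it with the reward model
        # REINFORCE: d log pi(a) / d logit_b = 1[a == b] - p_b
        for b in range(len(logits)):
            indicator = 1.0 if a == b else 0.0
            logits[b] += lr * (r - baseline) * (indicator - probs[b])
        baseline = 0.9 * baseline + 0.1 * r
    return softmax(logits)


probs = train()
print([round(p, 3) for p in probs])
```

In this toy setup the step-by-step strategy ends up with most of the probability mass, because it is the only one that is reliably rewarded; the "guess" strategies are only occasionally right by luck.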

This means not only can the base model keep improving with further training, but in production, when you, the user, give o1 a complex problem, the longer it is allowed to think, the more compute it uses, and the more accurate its response tends to be.
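o1's extra thinking happens inside a longer chain of thought, and OpenAI hasn't published how thinking time maps to accuracy. As a simple, swapped-in illustration of "more inference compute, better answers," this Python sketch uses a different technique, majority voting over several independent samples (sometimes called self-consistency), under the idealized assumption that each sample is independently correct with probability `p` and answers are simply right or wrong.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct,
    if each sample is correct with probability p (binary right/wrong answers)."""
    if n % 2 == 0:
        raise ValueError("use an odd n so there are no ties")
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

With p = 0.6, a single sample is right 60% of the time, while a vote over 5 samples is right with probability 0.68256, and accuracy keeps rising as n grows. The catch is the assumption p > 0.5: if each sample is more likely wrong than right, spending more compute on voting makes things worse.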

Does this mean o1 will only keep improving? Well, yes. We know the unreleased versions of o1 are still evolving: o1-preview has been described as an early version of the fully baked model, which we can hopefully expect to be released in the coming weeks or months. A few startups have already received early access, and the results for them have been nothing short of staggering.

In fact, recently published research proved that, given enough chain-of-thought steps, an LLM can in principle solve any inherently serial problem. This suggests that, with enough compute resources, the sky truly is the limit for this series of models. According to Sam Altman, we can expect rapid improvement in these models over time, given these inference-time scaling laws.
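One way to picture an "inherently serial" problem: apply a long sequence of permutations to a list, where each step depends on the result of the previous one. Composing permutations of five items is a commonly cited example of a problem believed not to parallelize well. A chain of thought lets a model write each intermediate state down as tokens instead of computing the whole composition in a single pass. This Python sketch only shows that step-by-step trace; it isn't a model, and the choice of example and the function name `compose_with_trace` are assumptions for illustration.

```python
def compose_with_trace(state, perms):
    """Apply permutations one after another, recording every intermediate state.
    perm[i] is the index that the item currently at index i moves to."""
    trace = [tuple(state)]
    for perm in perms:
        new_state = [None] * len(state)
        for i, item in enumerate(state):
            new_state[perm[i]] = item  # move the item at index i to index perm[i]
        state = new_state
        trace.append(tuple(state))  # one "chain-of-thought step" per permutation
    return state, trace


final, trace = compose_with_trace(
    ["a", "b", "c", "d", "e"],
    [(1, 2, 3, 4, 0), (1, 0, 2, 3, 4), (4, 3, 2, 1, 0)],  # rotate, swap first two, reverse
)
for step in trace:
    print(step)
```

Each entry in `trace` depends on the one before it, so there is no obvious shortcut that skips the intermediate states; writing them out is what makes the long computation tractable.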

Sam compared the current o1 models to GPT-2, hinting that we will likely see a leap to the GPT-4 stage within a few years. So, is o1 actually reasoning? Without getting too philosophical, we think it's fair to say yes.

o1 tackles complex problems that require planning by generating its own sequence of intermediate steps, working through them, and often, but not always, arriving at a correct answer. Perhaps it's more accurate to say that o1 marks a shift from models that memorize answers to models that memorize the reasoning.

Of course, o1 still needs work. It occasionally hallucinates, forgets details, and struggles with problems that fall outside its training distribution. Like all models, its results can be improved somewhat with better prompt engineering, especially prompts that outline edge cases or guide its reasoning style.

So, what's next? According to OpenAI's own researchers, the company has some exciting updates planned, including support for additional tools such as a code interpreter and browsing, longer context windows, and eventually multimodality. The only real question that remains: what will you build with o1?
