The 10 Trillion Parameter AI Model With 300 IQ
If O1 is this magical, what does it actually mean for Founders and Builders? One argument is it's bad for Builders because maybe O1 is just so powerful that OpenAI will just capture all the value. You mean they're going to capture a light cone of all future value? Yeah, there you go. They'll capture a light cone of all present, past, and future value. Oh my God.
The alternative, more optimistic scenario is we see for ourselves how much time founders spend, especially during the batch, on getting prompts to work correctly and getting the outputs to be accurate. But if that becomes more deterministic and accurate, then they can just spend their time on bread-and-butter software things. The winners will just be whoever builds the best user experience and gets all these nitty-gritty details right.
[Music]
Correct. Welcome back to another episode of The Light Cone. We are sort of in this moment where OpenAI has raised the largest venture round ever: $6.6 billion, with a B. Here's what Sarah Friar, the CFO of OpenAI, said about how they're going to use the money: "It's compute first, and it's not cheap; it's great talent second. And then, of course, it's all the normal operating expenses of a more traditional company. But I think there is no denying that we are on a scaling law right now where orders of magnitude matter. The next model is going to be an order of magnitude bigger, and the next one on and on. So that does make it very capital intensive. So it's really about orders of magnitude."
Let's live in the future. There's 10 trillion parameters out there: 10 trillion parameter large language models, two orders of magnitude out from the state-of-the-art today. What happens? Are people actually going to be throwing queries and actually using these 10 trillion parameter models? It seems like you'd be waiting, you know, 10 minutes per token.
Yeah, for a bit of context right now: the frontier models—it's not public exactly how many parameters they have—are roughly in the hundreds of billions. Llama 3.1 is 405 billion; Anthropic's is speculated to be around 500 billion; GPT-4 is rumored to be roughly in that range. Getting to 10 trillion would be on the order of a two-order-of-magnitude jump.
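As a quick sanity check on those numbers, the jump is just a base-10 logarithm of the ratio. A minimal sketch (the parameter counts are the public or rumored figures mentioned above, treated as estimates, not confirmed sizes):

```python
import math

# Rumored/public parameter counts (estimates from the discussion above)
params = {
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
    "Llama 3.1": 405e9,
    "hypothetical frontier": 10e12,   # the 10-trillion-parameter scenario
}

def orders_of_magnitude(small: float, big: float) -> float:
    """How many powers of ten separate two model sizes."""
    return math.log10(big / small)

# GPT-2 -> GPT-3 really was roughly the two-order-of-magnitude leap discussed
print(round(orders_of_magnitude(params["GPT-2"], params["GPT-3"]), 2))   # ~2.07

# Today's ~400B frontier -> 10T is closer to 1.4 orders of magnitude
print(round(orders_of_magnitude(params["Llama 3.1"], params["hypothetical frontier"]), 2))
```

Interestingly, by this arithmetic the jump from today's frontier to 10 trillion is a bit under a factor of 100, so "two orders of magnitude" is a round-number way of describing it.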
I think the level of potential innovation could be the same leap we saw from GPT-2, which was around 1.5 billion parameters and came out around the time of the scaling-laws paper—one of those seminal papers where people figured out, "Okay, this is the Transformer architecture. What if we just throw a bunch of engineering at it and do a lot more of it? Where does this scale?" And that scaling behavior was proved out when GPT-3 got released at about 175 billion parameters. So that's the two-order-of-magnitude leap, and we saw what happened with that.
That created this new flourishing era of AI companies. We saw it—we experienced it back in 2023 when we started seeing all these companies building on top of GPT-3.5 that were starting to work, and it created this giant wave. So we could probably expect, if this scaling law continues, that the feeling will be similar to what we felt in that year of transition, 2022 to 2023. Yeah, that was the moment when everything changed. So it would be pretty wild if that happens again.
I think there's one interesting aspect to this, which is that the current generation of state-of-the-art models, especially given O1's chain of thought, basically rival normal human intelligence. You could make a strong case that AGI is basically already here. For the majority of the tasks that 98% of knowledge workers do day-to-day, it is now possible for a software engineer, probably sitting in front of Cursor, to write something that gets to, you know, 90 to 98% accuracy and actually does what a human knowledge worker with a 120 IQ would be doing all day.
And that's a really big deal. There are probably hundreds of companies that each of us have worked with over the past few years that are literally doing that day-to-day right now. The weird, interesting question is: at 10 trillion parameters, at, you know, 200 to 300 IQ—sort of ASI, beyond what a normal human being could do—what does that unlock?
There's a great article in The Atlantic about Terence Tao, the famous Australian-born mathematician who is literally north of 200 IQ, and how he uses ChatGPT right now—it's sort of unlocking new capabilities for him. There are examples of this happening quite a few times in human history. You could argue that nuclear power was like that: you had to model theoretically that something like nuclear fission was possible before anyone experimentally tried to do it.
Fourier transforms, yeah. Maybe the thing is: a lot of the capabilities right now are here, but they're not evenly distributed. If you go walk down the street and talk to a random Joe, they don't feel the AI. They're just living their normal life, and stuff is still just normal—it hasn't changed. But I think the historical lesson is that sometimes these discoveries take time to really pan out.
This is the example we were discussing: the Fourier transform was a mathematical representation that Joseph Fourier introduced in the early 1800s, in a seminal treatise about representing functions that repeat in periods. Before the Fourier transform, these were written as long sums and series that were very expensive to add up and use to actually model the equation.
But he found this very elegant way: instead of doing those long sums of series, you could collapse all these functions into sine and cosine waves that each need only two numbers—basically an amplitude and a frequency. With that, you could represent every periodic signal and function. I mean, it sounds like really cool math—which is how some of these LLM use cases sound: "Okay, cool, they can do all this coding." But for the Fourier transform, it took another 150 years, until the 1950s, before people figured out what to do with it.
It turned out that Fourier transforms were super good at representing signals, and we need signals to represent everything in the analog world digitally, because bits are ones and zeros—how do you compress that? One of the big applications was radio waves: it made telecommunications a lot more efficient, along with image representation, encoding, and information theory. It just unlocks so much of the modern world.
Like, the internet and cell towers work because of this theory, but it took 150 years until the average Joe could feel the Fourier transform. Interesting—that's a really powerful idea. It took a while, then. And apparently the 1950s are when color TV happened, so that was unlocked by Fourier transforms as well.
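The idea that a periodic signal collapses into a few sinusoids, each described by just an amplitude and a frequency, is easy to see with a discrete Fourier transform. A minimal sketch using NumPy's FFT (the 5 Hz and 12 Hz signal is made up for illustration):

```python
import numpy as np

# A made-up periodic signal: two sine waves at 5 Hz and 12 Hz
fs = 1000                      # sampling rate (samples/sec)
t = np.arange(0, 1, 1 / fs)    # one second of samples
signal = 3.0 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# The FFT turns 1000 time-domain samples into amplitudes per frequency
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
amplitudes = 2 * np.abs(spectrum) / len(signal)

# The two dominant frequencies pop right out of the spectrum
peaks = freqs[np.argsort(amplitudes)[-2:]]
print(sorted(peaks))  # [5.0, 12.0]
```

A thousand raw samples reduce to two (amplitude, frequency) pairs, which is exactly the compression property that made Fourier analysis the backbone of radio, image codecs, and telecom.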
That's right. If you apply that to the AI stuff happening today, though, where do you start the clock ticking from? It's not clear if you start it from the ChatGPT moment two years ago, or from all of the research that's been going on for decades. We've talked about this before, but we might actually be decades into this now, and it's starting to hit the inflection moment, potentially.
Yeah, for sure. If we run with Diana's example of Fourier transforms, all the math underpinning this new AI stuff is linear algebra that's like 100 years old. It just turned out that nobody knew how far you could push it, if you have all the GPUs to compute it. I guess that's one potential way these 10 trillion parameter models actually alter the face of what humans are capable of.
They sort of unlock something about the nature of reality and our ability to model it, and then somehow it leads to either nuclear weapons or color TV. The other big thing is that this is all in software. Compared to Fourier transforms, where a lot of the applications showed up in physical devices—record players or telephones, like you said—
it takes a while for that kind of technology to get adopted, because you have to buy the updated device and all of that. Now we have Facebook and Google, with a pretty decent percentage of the world using their software already, so these things can start rolling out immediately. And another thing that's starting to be noticed is Meta in particular coming out with their Meta Ray-Bans, the consumer device.
I think for consumers, once this becomes something that's visual in your smart glasses, plus a voice app you can talk to that is indistinguishable from a human being, that's going to be a real change-the-world moment. They'll start feeling the AI once they can talk to it all the time.
I mean, it seems like there's really a bifurcation in what we might expect when we have this capability. At the extreme, you're going to have people like Terence Tao pushing the edge and boundary of our understanding of the modelable world—and maybe that's actually worth tens or hundreds of millions of dollars of inference on these 10 trillion parameter models. Then the more likely way this ends up being useful for the rest of us is actually distillation.
So, you know, there's some evidence that, for instance, Meta's 405B model was mostly useful for making their 70 billion parameter model much, much better. And you actually see this today. There was this moment where we thought people might just go to GPT-4 and distill out all the weights—and there's some evidence certain governmental entities are doing that already. But with GPT-4 itself—you know, it became GPT-4o—OpenAI has now enabled distillation internal to its own API.
So you can use O1, or even GPT-4 or GPT-4o, and distill it down into a much cheaper model that's internal to them, like GPT-4o mini—and that's sort of their lock-in capability. Yeah, this isn't talked about much, but it is interesting that you have these giant models—the 400 or 500 billion, whatever, parameter models—that are basically the teacher models, because they're mega-trained with everything and took forever. They're the teacher, the master model, that teaches the student models, which are these smaller ones that are faster and cheaper.
Because doing inference on a 405 billion parameter model is very expensive. So we have evidence that all these distilled models are working. Companies in the batch are building with the latest and greatest, and they're not going for the giant model with all the parameters—the "give me the biggest thing so it works best" approach. We have evidence that's not the case. People are not going for the big model.
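The teacher–student setup described here can be sketched in a few lines: the student is trained to match the teacher's *softened* output distribution rather than hard labels. A toy illustration in plain NumPy—the "models" are just fixed logit vectors for one input, not real networks, and the numbers are made up:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature = softer targets."""
    z = logits / temperature
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy "teacher": a big model's logits over 4 classes for one input
teacher_logits = np.array([4.0, 1.5, 0.5, -2.0])

# At temperature > 1, the soft targets expose the teacher's "dark knowledge":
# the relative plausibility of the non-top classes, not just the argmax.
soft_targets = softmax(teacher_logits, temperature=3.0)

def distillation_loss(student_logits, soft_targets, temperature=3.0):
    """KL divergence between teacher and student distributions."""
    p = soft_targets
    q = softmax(student_logits, temperature)
    return float(np.sum(p * np.log(p / q)))

# An untrained "student" starts far from the teacher...
print(distillation_loss(np.zeros(4), soft_targets))
# ...and a student whose logits match the teacher's has zero loss
print(distillation_loss(teacher_logits, soft_targets))
```

In a real pipeline the student's weights are updated by gradient descent to drive this loss down across a large corpus of teacher outputs, which is why a 70B student can inherit much of a 405B teacher's behavior at a fraction of the inference cost.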
We actually have stats from the batch. I mean, Harj, we kind of talked about them. Yeah, Jared ran some numbers on this, and it's fascinating. But I think the bigger meta point is the fact that the startups and founders building this stuff are even choosing smaller models versus bigger models at all—they just have choice now. Even a year ago, when this entire industry started existing, everything was built on top of ChatGPT, right?
There was 100% market share—the ChatGPT-wrapper meme. Especially over the last six months, I feel like we've seen people start talking about the other models, like Claude Sonnet, with word of mouth that it's almost better at coding than ChatGPT, and people just starting to use different models. And the numbers Jared ran for the Summer 24 batch are fascinating, because it seems that trend has just continued: we have more diversification in the LLMs and models that developers are building on top of.
Some of the stuff that really stood out: Claude, in just six months from the winter batch to the summer batch, has gone from like 5% developer market share to like 25% of companies in the batch—which is huge. We've never seen a jump like that, right? Llama has gone from 0% to 8%. One thing we know from running YC for a long time is that whatever the companies in the batch use is a very good predictor of what the best companies in the world are using, and therefore which products will be most successful.
A lot of YC's most successful companies—you could have basically predicted which ones they would be just by running a poll of what the companies in the batch use. If we take OpenAI's latest fundraise and the O1 model off the table for a second, it would seem like, among developers and builders, OpenAI was losing. They went from being the only game in town to bleeding market share to the other models at a pretty rapid rate.
The interesting thing, though, is maybe they're coming back. What was that stat—it seemed like 15% of the batch is already using O1, even though it's not fully available yet? It's only like two weeks old now? Yep. And we're seeing some interesting things with O1. As we speak, we're actually hosting a hackathon downstairs, in person, to give YC companies early access to O1, and Sam himself was here and did the kickoff. There's a bunch of OpenAI researchers and engineers working with the teams, and after only about four hours of hacking, we've already heard of—I actually saw some demos as I walked by—things built that were not possible before with any other model.
Do you have some examples? One of the companies I'm working with is Freestyle. They're building a cloud platform fully built in TypeScript—based on, if you're familiar with them, durable objects—with this really cool framework that makes front-end and back-end seamless to develop, and it's really cool to use. What was cool is that after working on it for just a couple of hours, they showed me a demo that was mind-blowing. They basically got a version of Replit Agent working with their product. All they had to prompt O1 with was their developer API, some of their developer documentation, and some of their code.
They could just prompt it, "Build me a web app that writes a to-do list," and it would just—boom—work. It was able to reason and infer from the documentation; it took a lot longer, but it got there and built the actual app. What's interesting for us to talk about is: if O1 is this magical, what does it actually mean for founders and builders?
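The pattern Freestyle demoed—packing your own API docs into the prompt and letting the reasoning model figure out the integration—looks roughly like this. A sketch only: the example docs, the `build_prompt` helper, and the model name are illustrative, and the actual API call is made only if a key is present:

```python
import os

# Illustrative stand-in for a startup's own developer docs (not Freestyle's real API)
DEVELOPER_DOCS = """
POST /apps               create a new app, body: {"name": str}
POST /apps/{id}/deploy   deploy the app's current code
GET  /apps/{id}/logs     stream the app's logs
"""

def build_prompt(task: str, docs: str) -> str:
    """Pack the task plus the full API documentation into one prompt,
    letting a reasoning model work out the integration on its own."""
    return (
        "You are given the following developer API documentation:\n"
        f"{docs}\n"
        f"Task: {task}\n"
        "Write a complete web app that accomplishes the task using only this API."
    )

prompt = build_prompt("Build me a web app that writes a to-do list", DEVELOPER_DOCS)

if os.environ.get("OPENAI_API_KEY"):
    # Requires the `openai` package; O1-class models take longer but reason more deeply.
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```

The notable part is what's absent: no few-shot examples, no hand-tuned scaffolding—just the docs and the ask, with the model's chain of thought doing the rest.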
And one argument is it's bad for builders, because maybe O1 is just so powerful that OpenAI will capture all the value, and everything valuable built on top of this stuff will just be owned by them. You mean they're going to capture a light cone of all future value? Yeah, there you go. They'll capture a light cone of all present, past, and future value. Oh my God.
The alternative, more optimistic scenario: we see for ourselves how much time founders spend, especially during the batch, on the tooling around prompts—getting prompts to work correctly, getting the outputs to be accurate, human-in-the-loop—all of this time spent just getting the core product working.
But if that becomes deterministic and accurate, then they can just spend their time on bread-and-butter software things—better UI, better customer experience, more sales, more relationships. In which case it may be a better time to start now than ever, because maybe all of the knowledge you learned about getting prompts accurate and working was just temporary knowledge that's no longer relevant as these things get more powerful.
Actually, we had this conversation with Jake Heller from Casetext, where getting the legal copilot to 100% was the huge unlock, and it was really hard. Yep—he had this whole talk about all the things he had to do to actually get the thing accurate enough.
Yeah, imagine if he didn't have to do any of that—if on day one you could be guaranteed 100% accuracy, as though you were just building a web app on top of a database. The barrier to entry to build these things goes way down, there's going to be more competition than ever, and then it will probably just start to look more like a traditional winner-takes-all software market.
Jared has an example. There's a company, DryMerge, that you work with, and they went from 80% accuracy to pretty much 99%—for all intents and purposes, 100%—using O1, and it unlocked a bunch of things. Do you want to talk about them? Yeah, just by swapping out GPT-4o for O1. I think there might be an even more bullish version, Harj, which is that there are use cases right now where people are not able to use LLMs, because even though they're trying to get the accuracy high enough, they just can't get it accurate enough to actually roll out in production.
Especially if you think about really mission-critical jobs where the stakes are dire, it's pretty hard to use LLMs for that. As they keep getting more accurate, those applications will start to actually work. And there's a lot of evidence inside the greater YC portfolio. I was meeting a company from 2017—I think I tweeted about them. A year or two ago, they were at $50 million annualized revenue, growing 50% a year, but they were not profitable.
They knew that they needed to raise more money, but in the year since, they automated about 60% of their customer support tickets, and they went from something that needed to raise another round imminently to something that was totally cash flow break-even while still growing 50% year-on-year. That's sort of like the dream scenario for building enterprise value because you’re big enough that you know you’re a going concern and then you’re literally compounding your growth with like no additional capital coming in.
So it's companies like that that end up becoming half a billion, a billion dollars a year in revenue and driving hundreds of millions of dollars in free cash flow. That's sort of the dream for founders at some level, and it's one of the more dramatic examples I've seen thus far. And it's not an isolated case. As we're talking—it's 2024 now—we're still in this overhang moment where companies on this path raised way too much money, at, you know, 30x or 40x next-twelve-months revenue multiples, and are seemingly struggling.
But, you know, maybe never raising another round is actually pretty good news for them, because they can go from not profitable to break-even to potentially very profitable. I think that narrative is not out there, and it's really, really good news for founders. It's probably starting to catch attention. The Klarna CEO got a lot of attention a few weeks ago—I mean, it's unclear how much of it is real—but at least they're pitching that they're replacing their internal systems of record for HR and sales with home-built, LLM-created apps. At least that was the insinuation.
Yeah, what was it—they got rid of Workday. Yeah, that's pretty wild, honestly. I mean, so that's good if you treat OpenAI as the Google of the next 20 years: you want to invest in Google and all the things that Google enabled. Like Airbnb—Google could have done Airbnb, but it probably wouldn't, just from, I don't know, Coase's theory of the firm. It's just too inefficient and too difficult; it requires too much domain expertise to actually pull that off.
So what are the new Googles that are getting built? There are these vertical agents. What are some examples we could talk about? I loved working with this company called TaxGPT from the last YC batch. They started off really, literally, as a wrapper—it's in the name, TaxGPT—but my favorite thing about them is that it turned out tax advice was sort of like Casetext, actually: being able to do basic RAG on existing case law and existing policy documents from the IRS, or internationally.
That was just the wedge that got them in front of tens of thousands of accountants and accounting firms. And now what they're doing is building an enterprise business on document upload. So you get them in cheap or free for the thing people are Googling for, and then, once they know about you and trust you, you get this $10,000 or $100,000-a-year ACV contract that takes over real workflow—one that actually eliminates tens to hundreds of hours of work per accountant.
Another interesting thing about the O1 model: we were saying that originally ChatGPT was the only thing you could build on top of—OpenAI was the only game in town—and then there were all these models. I think the sort of alpha leak we have here, right now in this room, is that downstairs, people are building at the cutting edge of O1, with access that even the public doesn't have.
And what we're seeing is that this is a real major step forward. O1 is going to be a big deal for any programmer or engineer building an AI application. The interesting question is: will this cycle repeat, where it gives OpenAI a temporary lead, market share goes back up toward 100%, but then within six months LLaMA gets updated, Claude comes out with its new release, Gemini keeps getting better, and there are just, you know, four different models with equivalent levels of reasoning? Or will this be the first time OpenAI has a true breakthrough?
And I would just define a true breakthrough as something that's actually defensible. Like, if no one else can replicate it, then that puts them in a really powerful position. But we don't know, and I think that's what's interesting. It’s like OpenAI seems like it is continually the one pushing the envelope. They always seem to be the first ones to make major breakthroughs, but they have never been able to maintain the lead so far.
I think the other thing that's interesting about O1 is that it makes the GPU needs even bigger, because it moves a lot of the computation to inference—it takes a lot more compute and time to do inference. So I think it's also going to change a lot of the dynamics underneath, for a lot of the companies building AI infrastructure. Which is some food for thought. It seems like there are two different types of use cases.
I believe they did just enable distillation from O1 into GPT-4o, so it's conceivable that for relatively rote and repetitive use cases, you could use O1 for the difficult ones, then distill it out and pay GPT-4o or 4o-mini prices from there. And then there's this other type of problem that is very specific. I imagine most companies' coding situations are a bit more like that, where you need to pay for the full O1 experience because it's fairly detailed and specific.
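One way to picture that split is as a simple router: send hard, novel requests to the expensive reasoning model and everything rote to the distilled one. A sketch under stated assumptions—the prices are illustrative, not current list prices, and `looks_hard` is a stand-in for whatever difficulty heuristic you'd actually use:

```python
# Illustrative per-million-token prices; real pricing changes often
PRICE_PER_MTOK = {"o1": 15.00, "gpt-4o-mini": 0.15}

def looks_hard(request: str) -> bool:
    """Stand-in difficulty heuristic; in practice this might be a small
    classifier, or a check against a cache of previously solved task shapes."""
    return any(w in request.lower() for w in ("prove", "plan", "debug", "architect"))

def route(request: str) -> str:
    """Pick the reasoning model for hard requests, the distilled one otherwise."""
    return "o1" if looks_hard(request) else "gpt-4o-mini"

def estimated_cost(request: str, tokens: int) -> float:
    return PRICE_PER_MTOK[route(request)] * tokens / 1_000_000

print(route("Summarize this support ticket"))        # routed to the cheap model
print(route("Debug why this migration deadlocks"))   # routed to the reasoning model
print(estimated_cost("Summarize this support ticket", 2000))
```

At these illustrative prices, the rote traffic runs at roughly a hundredth of the per-token cost, which is the whole economic point of distilling the teacher into a student.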
It depends on what you're building, right? If you're in enterprise software and you can pass the cost on to your customer, and they can tolerate higher latency and don't care as much about it being instant, then you can just use O1 a lot. If you're building consumer apps, probably not. But speaking of consumer apps, the other thing that was striking about OpenAI's releases is this real-time voice API.
Super cool—it's pretty remarkable. And the most telling thing to me is that the ongoing usage-based pricing is about $9 per hour, which points to something powerful. If I were a macro trader, I would be very, very bearish on countries that rely heavily on call centers right now, because $9 an hour is right around what a call center costs. This is another thing we're definitely seeing within the batch, right?
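The $9-an-hour comparison is easy to make concrete. A back-of-the-envelope sketch—the human-agent cost figures below are illustrative assumptions, not numbers from the episode:

```python
# Back-of-the-envelope: AI voice vs. a human call-center agent.
AI_VOICE_PER_HOUR = 9.00   # the real-time voice API usage price cited above

# Hypothetical fully loaded human agent costs (wage + overhead), per hour
human_agent_per_hour = {
    "offshore call center": 8.00,
    "domestic call center": 25.00,
}

for label, cost in human_agent_per_hour.items():
    ratio = AI_VOICE_PER_HOUR / cost
    print(f"{label}: AI costs {ratio:.2f}x the human rate")

# The striking part isn't today's ratio: the AI price only goes down
# from here, and it scales to unlimited concurrent calls with no
# hiring, training, or turnover.
```

Even where the AI is slightly more expensive per hour today, the trend line and the elasticity of supply are what make the macro bet one-directional.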
It's clear that voice is almost a killer app. There's a company I worked with in this batch—in my group, at least—that does AI voice for debt collection, and their traction is just phenomenal. It's working incredibly well. A whole bunch of the voice apps in S24 were some of the fastest-growing, just explosive companies. It was a clear trend for S24, and I remember working with companies in prior batches trying to do voice, and it just wasn't working well enough.
The latency was too high, exactly—it got confused by interruptions and things like that—and it's just turned the corner to where it's finally working. There's another company I work with, Happy Robot, that landed on the idea of a voice agent for coordinating all the phone calls in logistics. Think of a truck driver who needs to go from point A to point B: these are all people just calling to check where you are—there's no "Find My Friends" for trucks—and they started getting a lot of usage.
And I think we've talked about this before: at this point, AI has passed the Turing test and is solving all of these very menial problems over the phone. That's pretty wild. I guess one thing that's maybe under-discussed is to what degree the engineering teams in these incumbent industries take this seriously—it feels pretty binary. The vast majority of companies and organizations, especially the ones founded four or more years ago, actually don't take any of this seriously.
They have literally no initiative on this stuff, and I sort of wonder how generational it is. I'm realizing that the managers and VPs at these places are probably my age now. I'm 43, and if I wasn't here seeing exactly what was happening, I would be tempted to say this is just the same old thing—"AI, yeah, yeah, yeah." But I think it's the rate of improvement that people don't get if they're not as close to it as we are.
I just think your average corporate enterprise person is certainly used to technology disrupting things, but over pretty long time frames. And if anything, they become cynical. "Oh, yeah, the cloud." Like cloud was such a buzzword for a long time. It totally did change how enterprise software is built and delivered, but it took like a decade or so. And so I suspect everyone's feeling that way about AI.
Your natural default mode is to be cynical: "Oh yeah, it's not going to be ready for a while." And if you looked at this stuff even six months ago—like we were just talking about—if you looked at an AI voice app six months ago, you'd say, "Oh, this is years away from being anything we need to take seriously." And then, three to four months later, it's at some real major inflection point.
And I think that's what catches even people within tech—it's surprising all of us how quickly this stuff is moving. It's the fastest any tech has ever improved, I think: certainly faster than processors, certainly faster than the cloud. And it's kind of fun to watch; it's been remarkable to see. Another example of this in the batch: with a lot of the technical founders, sometimes I sit with them and just watch how they code—the before and the after.
Before this wave of AI, it was just standard: you have your IDE and things in the terminal, and people shipped fine. But the demos and products we're seeing founders build during the batch are a next level of polish. And when you sit with them and watch them code, it's like they're living in the future. At this point, GitHub Copilot is already kind of old news.
They're using the latest coding assistants—a lot of them using something like Cursor, or beyond. This is something Jared pulled out as well when we asked the founders about their IDEs, right? Yeah, we surveyed the Summer 24 founders, and half the batch is using Cursor, compared to only 12% using GitHub Copilot. That was surprising to me.
They're not even using the fully agentic coding agents, like Replit's—it's still sort of Copilot-phase stuff. But even just going from GitHub Copilot to Cursor, which is the next step up in terms of how much the AI actually does, is this incredible breakthrough they see very quickly. I mean, this was evident today at the hackathon—I was impressed with what they built.
I was looking at their editor—like, "See? Cool." It's another sign of why founders have the advantage, right? It feels to me like when GitHub Copilot first came on the scene: it's GitHub plus Microsoft. It has all the developers, plus all the capital, plus the inside track with OpenAI. How could any coding IDE compete with them?
It would just get subsumed. And yet Cursor has come out of nowhere and is, according to our numbers, like five times the size of GitHub Copilot within the batch—which, again, like you were saying earlier, startup founders are usually the tastemakers on this kind of thing. There are certain types of businesses where it doesn't make sense to go after startup founders as your early customers, but for developer tools, it definitely does.
Stripe and AWS both wanted to own YC batches in particular, and that worked out really well for them. So it's probably a really good sign for Cursor, honestly, that they have such good penetration within the YC batch. Yeah, I would definitely say Cursor is pretty awesome—but AltaVista was awesome too. I remember using that as a search engine, and then the next thing came along and was ten times better.
And this is the way it's going to go—the only people who really win are developers, because of all this competition. So I think, again, that takes us to the optimistic view of all of this: as the models get more powerful, the winners will just be whoever builds the best user experience and gets all these nitty-gritty details correct. And that's why Cursor can beat a GitHub Copilot that has all the advantages.
AltaVista is a great example—Google still came along and crushed them, right? So there's still room for someone to do to Cursor what Cursor has done to GitHub Copilot. So let's get back to 10 trillion parameters. What world do you think we'll live in if this is made real—with ASI, or something approaching it? What will humans actually do, and how much more awesome will it be?
Well, I'll give a steel man for a really bullish case, which is that the thing that is holding back the rate of scientific and technological progress is arguably the number of smart people who can actually analyze all the information that we already know about the world. There are millions of scientific papers already out there, an incredible amount of data, but like try reading all of it. It's far beyond the scale of any human's comprehension.
And if we make the models smart enough that they can actually do original thinking and deep analysis with correct logic, and you can let loose a near-infinite amount of intelligence on the near-infinite amount of data and knowledge we have about the world, you can just imagine it coming out with crazy scientific discoveries: room-temperature fusion, room-temperature superconductors, time travel, flying cars—all the stuff humans haven't been able to invent yet.
Like, with enough intelligence, maybe we’ll finally invent it all. Sign me up for that future; sounds great. I totally agree with you. I think, you know, what this might be is not merely a bicycle for the mind; it might actually be a self-driving car or even crazier, maybe a rocket to Mars. So with that, we'll see you guys next time.
[Music]