
Better AI Models, Better Startups


28 min read · Nov 3, 2024

Every time there's an OpenAI product release now, it feels like there's a bunch of startups waiting with bated breath to see whether OpenAI is going to kill their startup. This is actually a really crazy moment for all startups. The more modalities and capabilities each model gets, the better off every startup is. You have to be on top of these announcements and kind of know what you're going to build in anticipation of them before someone else does, versus being worried about OpenAI or Google being the ones to build it.

Welcome back to another episode of the Light Cone. I'm Gary, this is Jared, Harj, and Diana, and we're some of the group partners at YC who have funded companies that have gone on to be worth hundreds of billions of dollars in aggregate. Today we're at an interesting moment in the innovation of large language models, in that we've seen a lot of really new tech come out just in the last few weeks, whether it's GPT-4o or Gemini 1.5. Harj, how are you thinking about what it means for these models to be so much better?

Anytime I see a new announcement from one of the big AI companies with the release of a new model, the first thing I think about is what it means for startups, and in particular YC startups. When I was watching the OpenAI demos, it was pretty clear to me that they're really targeting consumer. All of the demos were cool consumer use cases and applications, which makes sense: that's kind of what ChatGPT was, a consumer app that went really viral. I just wonder what it means for the consumer companies that we're funding, and in particular how they'll compete with OpenAI for these users.

What did you think? Even if we take it back to first principles, how do consumer products win? Is it more about the product or the distribution? And how do you compete with OpenAI on either of those things?

Yeah, that's a great question. I think ultimately it's both. How I want it to be is that the best product wins. How it actually is, is that whoever has the best distribution and a sufficiently good product seems to win. I actually think we're in this moment where the better the model becomes, the better off you are by default: if you're already using GPT-4, you can change one line of code and suddenly be using GPT-4o. You basically get smarter by default every generation, and that's really, really powerful. The IQ of these things is still not that high; GPT-4 is arguably around 85. If Claude 3 really is at 100, or the next few models end up being closer to 110, 120, 130, this is actually a really crazy moment for all startups.

The most interesting thing is adding new capabilities. Having the same model be great at coding, for instance, means you might have a breakthrough in reasoning, not through the model reasoning by itself, but because the model can write code and the code can do better. Even right now, there seems to be a lot of evidence that if, instead of prompting the model to do the work itself, you have it write code and you execute the code, it can do things that reasoning alone could not do. So the more modalities and capabilities each model gets, the better off every startup is.
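
Here's a minimal sketch of that pattern in Python. The `generate_code` helper is a placeholder for a real model call (it returns a canned snippet so the example runs on its own), and the prompt and execution setup are illustrative assumptions, not how any particular product does it.

```python
# Sketch: "let the model write code" instead of "let the model reason in prose".
# generate_code() is a placeholder for an LLM call; it returns a canned snippet
# here so the example is self-contained.

def generate_code(question: str) -> str:
    # In practice this would be a chat-completion call asking the model to
    # respond with Python only. The canned answer below is just for illustration.
    return (
        "total = sum(n * n for n in range(1, 101))\n"
        "result = total\n"
    )

def answer_with_code(question: str):
    code = generate_code(question)
    scope: dict = {}
    exec(code, scope)          # a real system would run this in a sandbox, not exec it directly
    return scope.get("result")

print(answer_with_code("What is the sum of the squares of 1 through 100?"))  # 338350
```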

I mean, the cool thing about GPT-4o is that you can get better structured output. In this particular case, it's better at producing JSON, which is a sign of large language models not just outputting English but also languages for computers, so you can build even better applications on top. That signals that a better model can be better for startups and make it easier to integrate, because one of the challenges for startups has always been coaxing LLMs to output the right thing so you can actually process it in regular business logic.
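
As a concrete illustration, here's a minimal sketch using the openai Python client's JSON mode; the model name, prompt, and output fields are assumptions made up for this example, not anything from the episode.

```python
# Sketch: asking for machine-readable JSON instead of free-form English,
# so the result can flow straight into regular business logic.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},   # JSON mode
    messages=[
        {"role": "system",
         "content": "Reply only with JSON containing 'company' and 'sentiment' keys."},
        {"role": "user",
         "content": "Summarize: 'Acme's new release delighted its enterprise customers.'"},
    ],
)

data = json.loads(resp.choices[0].message.content)  # now it's just regular data
print(data["company"], data["sentiment"])
```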

The other thing I thought about when I was looking at the demos, as it relates to startups, is that if only one of these companies has the most powerful model by some distance, that is indeed bad for startups, because you have to depend on them being friendly and having a nice API for you to build on top of. If there are multiple equivalently powerful models, you're much better off as a startup. It was funny, maybe coincidental, maybe not, that the OpenAI announcement was, what, one day before Google's, right? So under the hood, what's the difference between the way GPT-4o works and the way Gemini 1.5 works, and do you have any opinions on their relative strengths?

Yeah, so the reason GPT-4o was so interesting is that it added the speech modality, and also video processing, on top of text. The way they do that is still primarily a text-based transformer model underneath, basically GPT-4, and what they've done is bolt on modules so it has different paths to handle these different types of data. OpenAI famously also built and launched Whisper, which is one of the state-of-the-art models for automatic speech recognition. That's probably what they're doing: they took the architecture of Whisper and bolted it into GPT-4, and they also bolted on DALL·E, and combined, that became GPT-4o. This is why, in terms of reasoning capabilities, GPT-4o isn't better per se than GPT-4 by any margin. That's how it works: it's basically adding modules.

That's how they describe it in the white paper. The difference versus Gemini 1.5 is that, on the technical aspects and merits, I'm actually more excited by the Gemini one. I know it's counterintuitive, because GPT-4o and OpenAI have captured everyone's attention, and they're so good at the demos, right? Singing happy birthday a bit off key, that's so human. "Happy birthday to you, happy birthday to you, happy birthday dear Jordan..." Google I/O kind of missed the mark in terms of demos, but reading their white paper, what's interesting about Gemini 1.5 is that it's actually a true mixture of experts. That's a newer technique where they train a giant model from the ground up with the actual text, image, and audio data, and the network activates a specific path for these different data types.

Unlike the OpenAI model, which has these bolted-on modules, this one truly is one unified model. Different parts of the network activate depending on the data input, so it becomes very energy-efficient. I think the reason Google was able to do it is that they have the engineering hammer: they have TPUs, so they can really afford to throw a lot of data at it, because it's very expensive to take all that text, image, and video and train this giant thing on a distributed cluster. They're on, I think, their fifth generation of TPUs now, and what's pretty cool about what they've done is that this is the first big model release that's using a mixture of experts.
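
To make the "different parts of the network activate" idea concrete, here's a toy routing sketch in plain NumPy. It's not from the Gemini paper, just an illustration of how a small gate can send each input to only a couple of experts so most of the network stays idle per input.

```python
# Toy mixture-of-experts routing: a gate scores the experts for each input and
# only the top-k experts actually run, which is what makes MoE compute-efficient.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 8, 4, 2

gate_w = rng.normal(size=(d, n_experts))                        # router weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # each expert is a tiny layer

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ gate_w                          # one score per expert
    top = np.argsort(scores)[-k:]                # pick the k best experts for this input
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()   # softmax over the chosen ones
    # only the chosen experts do any work; the rest are skipped entirely
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_forward(rng.normal(size=d)).shape)     # (8,)
```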

I think they talked a bit about it at the previous launch, but everyone was a bit disillusioned after the demo with the duck turned out not to be real. It is a duck, yes. But this one was described better. I mean, the interesting thing is that I think this time they learned their lesson, and it's actually working. The other cool thing about Gemini is that it has a context window of a million tokens, which is huge; GPT-4o's is 128,000. Imagine what you can do with that, because a million tokens is about five books of 500 pages or more. And the cool thing about Gemini 1.5 is that their white paper says they got it to work in research with a 10-million-token window. Which brings a question for all of you: what does that mean for startups, especially the infrastructure startups we're funding that do a lot of RAG?
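
The back-of-the-envelope math behind that book comparison, with rough conversion factors that are assumptions rather than anything official:

```python
# Rough sizing of a 1M-token context window (all conversion factors are estimates).
tokens = 1_000_000
words_per_token = 0.75          # common rule of thumb for English text
words_per_page = 300            # typical paperback page
pages_per_book = 500

words = tokens * words_per_token                     # ~750,000 words
books = words / (words_per_page * pages_per_book)
print(f"~{words:,.0f} words, roughly {books:.0f} books of {pages_per_book} pages")
# ~750,000 words, roughly 5 books of 500 pages
```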

There's the controversial argument that all these startups building tooling around RAG, which is a whole industry right now, might become obsolete. What do you all think about that? I feel like the people who care a lot about data privacy and where the data is stored are still going to want some sort of RAG system: they want the data stored somewhere they control versus all in the context window. But it's not clear that that's going to be the biggest part of the market. In general, people who care this much about any behind-the-scenes architectural thing tend to be early adopters, not mass-market consumers. So my guess is people just want a massive context window, because then you can start building the kinds of consumer apps people are excited about, right? Like the assistant that just has all this context on me, that knows everything about me.

Currently, I think the best way you can do that is to run Llama or one of these open-source models and throw a bunch of your personal emails at it. That's a project the hobbyists on Reddit are doing a lot: trying to get a personal AI that has all the information about you. But if you had an infinite context window, you wouldn't need to do all of that. I think you'd still need RAG to be able to store everything; that's the long-term, permanent memory. And then what you actually want is a separate workflow to pull out the interesting things about that user and their intentions, so you have a little summary of bullet points of things you know about the user. You can actually see some version of this even now in ChatGPT. If you go into the settings under GPT-4o, it actually now has a memory.
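
A rough sketch of that two-part idea, long-term storage plus a separate fact-extraction pass. Here `extract_facts` stands in for an LLM call and the storage is just Python lists, so this illustrates the workflow rather than how ChatGPT's memory actually works.

```python
# Sketch: keep full history in long-term storage (the RAG side) and run a
# separate pass that distills short memory bullets about the user.

long_term_store: list[str] = []   # in reality: a vector DB or similar
memory_bullets: list[str] = []    # the small, always-in-context summary

def extract_facts(message: str) -> list[str]:
    # A real version would prompt a model: "List durable facts about the user."
    # This canned rule is only here so the example runs on its own.
    if "deformed faces" in message:
        return ["User does not want deformed faces in generated images."]
    return []

def ingest(message: str) -> None:
    long_term_store.append(message)            # everything stays retrievable later
    for fact in extract_facts(message):        # only the distilled bits stay in context
        if fact not in memory_bullets:
            memory_bullets.append(fact)

ingest("No, no, no, I really want no deformed faces.")
print(memory_bullets)
```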

So you can see a concrete version of this inside ChatGPT. I was just using it to generate some Where's Waldo images for my son, and it wasn't quite doing what I wanted. It kept making really deformed faces, so I kept prompting it back: no, no, no, I really want no deformed faces. And then for a while, I'd said I wanted a red robot in the corner, and it kept making all of the characters various shades of red, and I said no, no, no, I really don't want you to do that. I repeated it four or five times, and then I went and looked at my settings, and it said: Gary really doesn't want deformed faces in his generated images. We should also try not to use red.

It was interesting to see that, from maybe 10 or 15 different chat interactions where I was getting frustrated, it was definitely developing some sort of memory based on my experience with it. The most interesting thing was that you could see what the machine had pulled out from your interactions thus far, and you could delete it as necessary. And maybe an infinite window doesn't necessarily mean that the retrieval is actually accurate. Yeah, and this is more anecdotal, from what founders have told us in practice, versus what the research paper benchmarks show, which is a very lab-like setting. So in practice, I do tend to agree that a RAG pipeline and infrastructure is still very much needed, exactly for what you said: privacy, and people wanting to fine-tune models on their own data and not have it leak out over the wire, over the internet.

The other thing is, yeah, it's maybe still more accurate to do it on your own when you really want very precise information; I think you still need RAG. The analogy I like to think about is processors back in the 90s. When Moore's law was actually scaling, it wasn't just CPU processing speed getting faster; the memory cache levels were also getting bigger and bigger. But now, more than 30 years later, we still have a very complex architecture for how we do different kinds of caching when retrieving data out of databases. You might have a fast memory store like Redis for high availability, and then you still have things stored in your browser cache. There are still lots of layers of how things get cached, and I think RAG is going to be a foundational thing that stays. It'll be like how we work with databases now: lots of levels.
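
A tiny illustration of that layered-caching analogy: the in-process dict stands in for something like Redis and the slow lookup stands in for a database or vector index, with no particular product's API implied.

```python
# Sketch of layered retrieval: check a fast cache first, fall back to the slow
# store, and populate the cache on the way back, like classic database caching.
import time

fast_cache: dict[str, str] = {}                      # stand-in for a Redis-style fast tier

def slow_store_lookup(query: str) -> str:
    time.sleep(0.05)                                 # pretend this is a vector-DB search
    return f"documents relevant to {query!r}"

def retrieve(query: str) -> str:
    if query in fast_cache:                          # cache hit: cheap
        return fast_cache[query]
    result = slow_store_lookup(query)                # cache miss: pay the full cost once
    fast_cache[query] = result
    return result

retrieve("quarterly revenue")    # slow the first time
retrieve("quarterly revenue")    # served from the fast tier
```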

The tricky thing about the context window, and the Gemini team may have already fixed this by now, but certainly a lot of the founders I talked to said this, is that the million-token context window sort of lacks specificity. Literally, if you ask for retrieval from its own context window, from the prompt, it sometimes just can't seem to recall or pick out the specific thing you already fed into it. The tricky thing there is you'd rather have a 128k context window that you knew was pretty rock solid than a system that's still a bit of a black box, where you don't really know what's going on, and for all you know it's just randomly picking up half a million of the tokens.

And that, again, is probably fixable. I can't imagine that's a permanent situation for a million or 10-million-token context window. But something we're seeing from the field for now, also in enterprise and business use cases, is that people care a lot about what specific data is being retrieved, who's doing it, logging all of this stuff, and permissioning around data. So a giant context window is not necessarily what you want in an enterprise use case. You probably want particularly sensitive data stored somewhere else, retrieved when it's needed, and you want to know who's making the requests and filter it appropriately.
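
A sketch of what permissioned, logged retrieval might look like; the roles, documents, and logger setup here are all made up for illustration.

```python
# Sketch: retrieval that checks who is asking, filters what they may see, and
# logs the request, which a raw giant context window doesn't give you.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("retrieval")

DOCS = [
    {"id": 1, "text": "Public product FAQ",       "allowed_roles": {"employee", "contractor"}},
    {"id": 2, "text": "Customer payment records", "allowed_roles": {"finance"}},
]

def retrieve_for(user: str, role: str, query: str) -> list[str]:
    log.info("user=%s role=%s query=%r", user, role, query)     # audit trail
    return [d["text"] for d in DOCS if role in d["allowed_roles"]]

print(retrieve_for("diana", "employee", "how do refunds work?"))
```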

Exactly, I think that will stay. I was really encouraged by what you said, actually, about how the Google technology is maybe better than OpenAI's itself. It feels very Googly, actually: hey, they've got the technology, but they just don't know how to get the polish around it right. That means OpenAI does not have some leap-forward, unassailable tech advantage. If Google has something comparable, then we should expect to see Anthropic come in, we should expect to see Meta come in, and what we're seeing at the batch level is that the models are pretty abstracted out, right? On a day-to-day basis, founders are already using different models to prototype versus to build and scale. The ecosystem of model routers and observability and ops software around this stuff just keeps progressing really quickly.

So it's funny: my initial reaction whenever I hear about the model releases is not to worry for the startups so much, because they're not that reliant on any one model. I worry if there's one model that's very, very good and becomes dominant and sort of takes over the world. I'm less and less worried if there are many different alternatives, because then you have a market. And a marketplace means non-monopoly pricing, which means a thousand flowers can actually bloom. Other startups can actually make choices and have gross margin of their own, and I'd much rather see thousands of companies make a billion dollars a year each than one or two, let alone seven, companies worth a trillion dollars.

I think we have a dark horse that is yet TBD. We don't know when Llama 3 with 400 billion parameters comes out, because that's still being trained, and that's one where it's like, wow, it could really turn the tables as well. The interesting thing about Meta is they have probably one of the largest clusters. I was reading that, in terms of who has paid Nvidia the most money in the past year, Meta is apparently number one by a decent bit, actually. The funny thing is they have this giant cluster not because they foresaw this whole shift that happened in the last couple of years with large language models. They acquired lots of GPUs because they needed to train their recommendation models, which actually use a similar architecture with deep neural nets, to compete with TikTok and build really good recommendations on Instagram. That's just very classic tech innovation and disruption, right? They were basically worried about competing with TikTok, they stockpiled a bunch of GPUs, and it turns out the GPUs are just really valuable for this completely different use case that's going to change the world.

Jared, on that note, if you zoom out, how does this cycle of startups being worried about the elephant in the room, in this case OpenAI, or maybe Google, competing with them and crushing them, compare to when we first moved out here? Even in that era when Facebook was rising and Google was going from a search engine company to a multi-product company. Do you see any similarities or differences?

Yeah, it reminds me of that a lot. Every time there's an OpenAI product release now, it feels like there's a bunch of startups waiting with bated breath to see whether OpenAI is going to kill their startup, and then there's all this internet commentary afterwards about which startups got killed by the latest OpenAI release. It reminds me a lot of when the three of us got to YC, in the 2005 to 2010 era. There were all these companies innovating in the same idea space as Google and Facebook, building related products and services, where the big question was always: what happens if Google does this? When startups were pitching to investors, that was the main question they'd always get: but isn't Google going to do this? The best response to that, by the way, was: well, what if Google gets into VC?

Which it did. And that's a great VC. A lot of the people building AI apps now, this is the first hype cycle they've ever been in, but we've all been through multiple hype cycles. So I think it's interesting, for the people who are in the middle of this hype cycle where all of this is new, to look back on past hype cycles and see how the history of what happened there can inform their decisions about what to work on. If we take Google as an example, one thing that's interesting, if you look back, is that there was competing with Google in a very head-on way, which was: hey, we're going to build a better search engine.

And YC definitely funded a lot of companies trying that. I feel like the approach people would go after was the vertical search engine: we're going to build a better Google for real estate, for example. Some of those made it, didn't they? I mean, you could argue that something like Redfin or Zillow clearly did have vertical access to data, or Kayak for travel, I guess, or Algolia for enterprise search.

That's true. Okay, those I hadn't thought of. I hadn't thought of Zillow as a search engine, but yeah, it's essentially that. It's exactly that: it's vertical search. But you have to monetize, not necessarily the same way a search engine would. You have to have other services, you have to become a broker, you basically have to make money in all these other ways. So it ends up different; it doesn't look at all like Google.

And the data integrations are very different. You have to really go and connect to the MLS, and a regular search engine wouldn't; PageRank wouldn't necessarily work with the MLS. Yeah, Redfin's very interesting, because I'm very addicted to Redfin, and it has actually caused me to buy property I normally wouldn't buy. So in that respect, those are interesting consumer scenarios. Ultimately, a great consumer product is about buying just a little bit of someone's brain, such that during the course of their day, and it doesn't have to be every day, but ideally it is, they think to use it.

None of those companies would have said that they had better tech or that they beat Google on technology, right? Anyone who went up head-on against Google to build the better general-purpose search engine just got crushed. In general, most of the vertical search engines didn't work, and certainly nothing that looked anything like Google worked. The ones I remember the most were the ones in the vein of Google apps. When Google expanded beyond search and started launching Google Docs and Sheets and Slides and Maps and Photos and all these separate apps, there were a lot of companies we funded at YC that were either going to be crushed or not by the next Google product.

Yeah, that's the classic case of being able to bundle software in. I mean, this is what Microsoft did to Netscape, right? Once you can start bundling in software, especially in the enterprise, people don't necessarily want to buy 10 different solutions from 10 different vendors all the time. If you can offer a good-enough product across several different use cases and bundle them together, enterprises often want that. I mean, famously, Dropbox was in that potential-roadkill category, right? Drew actually talks about it when he comes back and gives the dinner talks, about the fear around Google Drive, and the whole Carousel thing, right?

Yeah, in fact there was a time, this was after the batch, when Dropbox had launched and Google was working on Google Drive but hadn't launched it. It was called G Drive, it was the secret project inside of Google, and news of it leaked to the press. The whole world just decided that Dropbox's goose was cooked, that it was over. Google was going to launch G Drive, and because it was Google, they had infinite money. They were going to do the same move that they're doing now, which is just throw infinite money at the product and give away infinite storage for free.

How could a startup possibly compete with Google spending billions of dollars to give away infinite storage for free? That was infinite storage; now it's infinite tokens. So what are the big companies trying to do right now that maybe you should avoid doing? The super obvious one is that OpenAI released GPT-4o, which is multimodal, and it also simultaneously released the first version of the desktop app. But that version of the desktop app is merely a skin on the web experience.

But if you put two and two together, surely it's going to look a lot more like Siri. I mean, the Scarlett Johansson voice, they just pulled that, right? Yeah. They're like, oh shoot. Who knows, are they getting sued? Who knows. That's what Twitter says today, anyway. But if you look at the details of that, you can sort of sketch out what's going to happen with LLMs on the desktop. The desktop has access to all your files, and not just that, but all of your applications. It has access to your IDE locally. It has access to your browser.

It can do transactions for you. That's starting to look like the true personal assistant that's directly consumer, and that sounds like a whole category. We're going to interface with computers using, potentially, voice, and we'll certainly have the expectation of a lot of smarts. That seems like where they're going, and that's going to be one of the fights. When I was thinking back to that first era of companies, one thought I had is that it was actually fairly predictable what Google would build.

Not 100% predictable; with Dropbox, it was unclear if Google would win that space. But a lot of them are actually pretty obvious in hindsight, like ad tech, for example. All of ad tech just never stuck around, because it was too strategic to Google and Facebook, so they had to own all of it. Almost all of vertical search just didn't survive. It's pretty easy to imagine what the next round of OpenAI product releases is going to be, and if you can easily imagine that what you're building is going to be in the next OpenAI release, maybe you should be using that as a framework. OpenAI really wants to capture the imagination, the sci-fi imagination, of everyone.

So it's the general-purpose AI system that you just talk to, and it figures out what you want and does everything. It seems hard to compete with them on that. That's like competing with Google on search, right? That's clearly going to be the core, because those were the early signs of what ChatGPT is being used for as well, just in a very, very rudimentary way, right?

Yeah, which is the same thing with Google. They always wanted to own products where billions of people would all use the same product. Anything like that was going to be really tough as a startup. When I think of the products I use, like Perplexity, not a YC company, but Perplexity is a product I use a lot because it's much better for research. If I need to fix a toaster, it's way easier for me to type the model of the toaster into Perplexity and get back specific links and YouTube videos.

Just the whole workflow. It was Diana who told me about it, actually, and I've been using it a lot as a replacement for my regular search. Yeah, I was trying to use Perplexity for a while and I couldn't get it, because I was trying to use it the same way I would use the OpenAI ChatGPT app, and I was like, oh, but ChatGPT is just so much better, because I can type in fuzzy things and it figures it out and comes back with smart things, and Perplexity just wasn't as good for that use case.

But for the specific "hey, I have this task and I want source material and links back for it," it works much, much better. It doesn't capture the imagination, right? OpenAI is not going to release a model and demo that, oh look, if you search, it gives you the links back, or it shows you the YouTube videos it's referring to. The demo is not as cool, actually. Gemini 1.5 has that feature, and nobody really talks about the demos from Google I/O; they're kind of, you know, just there. So maybe one way to figure out how not to be roadkill is: if you can build the valuable but unsexy thing that OpenAI isn't going to demo on stage, because it doesn't capture the sci-fi imagination, you might survive.

Yeah, that's definitely a whole line of thinking. Google was never going to do Instacart's or DoorDash's business, or Uber's, so all of that was fair game, and all of those turned out to be decacorns, or potentially, like Airbnb, even a hundred-billion-dollar company. The other thing people always underestimate is the size of new markets. I remember for a long time people didn't believe LinkedIn could be a big company, because, well, Facebook won social networking, and LinkedIn's just a social network; it's just going to be the work tab on your Facebook profile.

Why would you need something else? Same thing with Twitter. I remember when I first moved to San Francisco in 2007, some of the first people I met were the early Facebook employees, and they saw Twitter growing, and they were like, ah, yeah, we're going to release status updates or something and Twitter's going to be done, it's just a feature. But it turned out Twitter was a whole other thing. Instacart and DoorDash are another great example of this. I remember when the iPhone came out and Android became pervasive, it was like, oh, Apple and Google are just going to dominate mobile. But there were all these things that they would never build. It's probably the same in this AI world, right? There are all these things that the big companies are never going to build.

And we probably have more appetite for using multiple AI agent-type apps than just the one OpenAI one. A huge meta category is basically almost anything that's B2B. Google basically never built anything B2B; they only built mass consumer software. And so if you look at the YC unicorns, a ton of them built some B2B thing, like Segment or something, that Google was never going to build. Segment? That's just not interesting to them.

Right, because I think in B2B, people really underestimate the human part of it. So much of it is actually the sales machine: being willing to go out and figure out who you sell to, do the sales, listen to someone give you all the things they're unhappy about, note them down, take them back to your engineering team, and say, yeah, we need to tweak this, this, and this, and all these details, right? I think of one of our AI companies at YC that's doing really well, called PermitFlow. They literally just expedite the process of applying for construction permits, not just for individuals but for big construction companies now as well. It's really hard to imagine that being the next OpenAI release, right?

Like, hey guys, we built a feature for filing your construction permits. Can you imagine turning up for your first day of work as an OpenAI engineer and being told, okay, you're going to work on the construction permit workflow feature? I don't think it works that way. Well, if you join those two ideas together, something interesting happens, though. It seems sort of inevitable sometime in the next two to five years, assuming the OpenAI digital assistant comes out, that it's going to be on your desktop and will actually know everything about you. It'll know what you're doing, and it'll know minute to minute what task you're trying to complete.

And then it's conceivable, if you mash that with a launch that I think they probably didn't invest enough in, which was the GPT store, you could imagine that extending into B2B as well, and then they would charge that vig. But the thing I don't think is going to work for B2B is that there's a lot of sensitivity around the workflows and the data, because they're highly proprietary, especially in spaces like fintech and healthcare. And for good reasons: they should be heavily regulated, and there's a lot of private data to protect for consumers. So the other area where we've been having success with AI B2B applications has been fintech.

We funded Greenlight, which is doing KYC, using AI to replace the humans doing a lot of the validation of consumer identities. We also have Greenboard, right? Right. They both start with green. Greenboard is doing a lot of the compliance work for banks as well. Yeah, and Bronco is doing it in AR, and there are a bunch more companies doing things in payments and just any of the boring day-to-day work that someone is doing by rote. This can basically supercharge that and have one person do the work of 10.

Yeah, we called this episode Better Models, Better Startups, and I think that's literally true for B2B companies. B2B software business models are so much about upsell: how do I make more money per customer next year than I did this year? And every time the model gets better, you can just pass that along as an upsell, a premium feature or an upgrade to the software, and your end user doesn't care, right? They just care about what the software can do for them.

So I think there's a world where the models keep getting better, you've got your choice of which one to use, and the additional functionality is something you just charge your customers for, and you make more money. Yeah, that's definitely what we're seeing at YC. Last batch, there were founders making $6 million a year right at the beginning of the batch, and it ended up being north of $30 million by the end of the batch. That's some really outrageous revenue growth in a very, very short amount of time, three or four months. And that's on the back of a few people working on B2B software: they can focus on a particular thing that makes a lot of money, and people are willing to fork out a lot of cash if they see ROI pretty much immediately.

There aren't as many founders working in this area as there should be, given the size of the opportunity. To your point, Harj, people often underestimate how big these markets are. Using LLMs to automate various jobs is probably as large an opportunity as SaaS, all of SaaS combined, right? Because SaaS is basically the tools for the workers to do the jobs. The AI equivalent of SaaS just does the jobs: it's the tool plus the people.

So it should be just as large, and there should be a lot more people working on this. There might be billions to trillions of dollars per year of transactional labor spend that's sitting on someone's cash flow statement right now, but it'll turn into software revenue at 10x, which will be interesting for market caps over the next 10 or 20 years.

I was doing office hours with a startup this morning that asked me this question: hey, you probably saw the GPT-4o launch, should we be worried about it? My reply was that you shouldn't be worried about that so much as about the other startups that are competing with you. Because ultimately, it's all the stuff we're talking about: whoever builds the best product on top of these models, with all the right nuances and details, is going to win, and that's going to be one of the other startups in the space. So I think the meta thing as a startup now is you have to be on top of these announcements and kind of know what you're going to build in anticipation of them, before someone else does, versus being worried about OpenAI or Google being the ones to build them.

Let's talk a little bit about consumer, because we did talk about what could be a real threat for consumer startups if you're going up against, basically, assistants, some sort of assistant-type thing, and OpenAI is hinting, well, strongly signaling, that they're going in that direction. What about opportunities for consumer AI companies? Where are the areas where they could flourish?

Well, here's an edgy one. Anything that involves legal or PR risk is challenging for incumbents to take on. Microsoft giving money to OpenAI in the first place, you could argue, was really about that. When image diffusion models first came out at Google, they were not allowed to generate the human form, for PR and legal risk reasons. This is a large part of what created the opportunity for OpenAI in the first place: Google was too scared to jeopardize their golden goose by releasing this technology to the public. The same thing could probably be true now for startups.

Things that are increasingly edgy are often the places where there are great startup opportunities. Things like Replika, which was an AI NLP company working in this space for many years, even before LLMs were a thing, and is still one of the top companies doing the AI boyfriend or girlfriend. And the wild thing about Replika is that its users have been in touch with their AI boyfriend or girlfriend for many years. Earlier we were talking about a million-token context window; you can imagine that virtual entity knowing everything about you, over many, many years, even your deepest, darkest secrets and desires. That's pretty wild stuff, but it's going to look weird like that.

And people might not be paying attention. I mean, Character AI has really, really deep retention, and people are spending hours per day using things like that. So whatever happens in consumer, it might be non-obvious, and it might be very weird like that. There's also a lot of edgier stuff around deepfakes applied in different spaces. There's a company that you work with, Jared: Infinity AI, right?

Yeah, Infinity AI lets you turn any script into a movie, and that movie can involve famous characters, so it enables you to make famous people say whatever is on your mind, which is edgy, which is part of what makes it interesting and cool. Google would never launch that. Would never launch that, and it's the same move that OpenAI did to Google, which is being willing to release something that's really edgy. Well, OpenAI is now the incumbent, guys. They can't release super edgy stuff like that anymore.

We're going to see a lot of that during election season in particular, right? Because it's interesting when you think about it. Anything that is explicitly using the likeness of a famous person for profit is going to get shut down. On the other hand, if I make a meme with Will Smith and a caption, no one's going to sue me for that. And a lot of this content is right in the middle, right? It's not that I'm trying to build a video where I want people to believe these people are really saying these things; what if it's a joke, or satire? Where does that fit?

And yeah, you can't imagine Facebook rolling this out on Instagram anytime soon, right? They want to stay well clear of that. But you're already seeing this version of memes 2.0, basically deepfakes, making the rounds and becoming viral tweets, right?

Yeah, hey, why don't we close out by going to a question that one of our audience asked us on Twitter? Thank you, Sandip, for this question. The question is: what specific update from OpenAI, Google, or Meta excited each of you, and why? I'll give one. The thing that really excited me about the OpenAI release was the emotion in the generated voice. I didn't realize how much I was missing this from the existing text-to-speech models until I heard the OpenAI voice.

"Oh, a bedtime story about robots and love. I got you covered! Once upon a time, in a world not too different from ours, there was a robot named Byte." It's amazingly better compared to the incumbent text-to-speech models, because it actually knows what it's saying. The existing ones, by contrast, sound so robotic. They're totally understandable, but they're just very boring to listen to.

And the OpenAI one felt like you were talking to a human. Mine was the translator demo, the idea of basically having a live translator in your pocket. It's personal for me because my wife is Brazilian; her parents don't speak English, and so I've been learning Portuguese, but it's coming along very slowly. The idea of having a translator that's always in my pocket, that makes it easy for me to communicate with anyone anywhere in the world, is really exciting.

"Hey, how's it been going? Have you been up to anything interesting recently?" It's a massive idea. It could change the world. You could go live in a foreign country where you don't speak the language. It has huge consequences. Douglas Adams' Hitchhiker's Guide to the Galaxy made real; that's a pretty cool one for me, I guess.

What's funny about GPT-4o is it sounds like maybe it was actually just a reorg. Basically, there was a reorg at OpenAI, and they realized they want all of the teams rowing in the same direction. What that means is probably really good things for their assistant desktop product, but also eventually robotics, which might be a really big deal down the road. This Chinese company called Unitree announced a $116,000 humanoid biped robot, though Twitter warns me that it's another $50,000 if you actually want open API access. Previously they made a $114,000 version of that robot.

But I think unified models mean it's more and more likely that practical robotics is actually not that far away. Famous last words, of course; we've been saying that pretty consistently for many years in a row, but this time it's different. For me, it's maybe a bit more of a technical one. I know it doesn't sound too fancy, but the halving of the cost is a huge thing. If you extrapolate that, it probably means a lot of these models are hitting some kind of asymptote in how much better they can get, which also means they're becoming more stable. That can open up the space for actual custom silicon to process all of this and enable a lot more low-power processing, to enable robotics and to build the device you mentioned, and actually have it in your pocket without being tied to the internet.

So with all these things, we could perhaps see the excitement of new tech product releases again. I kind of miss those days when every product tech demo was very exciting; now it's just kind of a feature. True. We could be excited about new things coming up. Well, we're going to be really excited to see what you guys all come up with. That's it for this week. We'll see you next time.
