
IPFS, CoinList, and the Filecoin ICO with Juan Benet and Dalton Caldwell


31m read
·Nov 3, 2024

Hey, this is Craig Cannon, and you're listening to Y Combinator's podcast. Today's episode is with Dalton Caldwell, who's a partner at YC, and Juan Benet, who's the founder of Protocol Labs, a YC company that's working on IPFS, Filecoin, and CoinList.

If you're just getting into cryptocurrency, I highly recommend listening to episode 244 of Tim Ferriss' podcast, which does a pretty good job of covering all the terms and explaining how they all connect to each other. Before we get started, I want to let you know that this is a really long episode, so it's pretty much broken up into three parts. Part one starts right after this, and it's an explanation of IPFS and Filecoin. Part two is our conversation with Dalton, and that starts around minute 11. Part three is Dalton answering questions from Twitter, and that starts around one hour and 40 minutes in.

All right, here we go. Let's just start with a description of all the words we've been talking about: IPFS, Protocol Labs, etc. So Protocol Labs is a research, development, and deployment lab for networks that I started to really build the IPFS project, build Filecoin, and create a place where we could create the kinds of projects that could turn into something like IPFS or Filecoin, or other things down the road.

I really wanted to build an organization that someone like Satoshi could have seen as a way to build a project, and be like, "Oh yeah, instead of doing this on my own, anonymously, I could go and build it through something like Protocol Labs." It was born out of a personal frustration: when I was starting the IPFS project, I didn't have such an organization that I could go to and build the project through. The only options were either a university or Google.

In the university case, it would have been killed by the publish-or-perish world, where the response would have been, "Hey, this is way too ambitious; focus on one little thing, maybe publish that, and move on to the next thing." It would not have been an implementation project, similar to how the web could never really have been built as a grad-student project.

The flip side is that this kind of tech is something Google might be interested in funding, in the sense that Google funds a lot of protocols and a lot of research, but it also runs counter to basic Google positions around data control and how information flows. It hits direct opposition there, so it's the kind of thing that probably shouldn't be funded by, or under the direct control of, Google. It's the kind of stuff that has the potential to really rebalance power on the internet.

So I figured I would start an organization separate from Google. Protocol Labs is really a group that is trying to create a number of these projects and protocols around things that we think are broken on the internet. The charge we have for ourselves, the mission, is to go and improve and upgrade a whole bunch of the software and protocol machinery that we have running the internet, both in the low-level parts of the internet and in more user-facing pieces.

We have a very open-ended perspective of, "Hey, we just want to improve computing in general and improve the pipeline from research to products that people use." It just happens that, for now and for the next few years, we're super focused on how information moves around the internet, how to distribute it better, how to change and rebalance the power associated with that information, give people sovereignty over their data, and make the internet more efficient.

We want to make the internet route around things like attackers and hostile censorship, and to give information more permanence; there's a whole bunch of questions around this. The first of the two main projects is IPFS, the InterPlanetary File System. It's used by a ton of organizations: businesses, projects, blockchain networks, governments, and so on.

It's used in a whole bunch of cases. A short way to describe it is that it's a large-scale, content-addressed distribution system: a peer-to-peer protocol for moving around any kind of content—files, data, hypermedia, whatever—with proper content addressing and cryptographic verification and all that kind of stuff.

There's also a whole bunch of tooling around the guts of making all of that work, which is peer-to-peer networking and the ability to work across a lot of different transports; there's no end to the important pieces of peer-to-peer machinery that you have to build. The IPFS project was really about that too. It's used by many people, both in the blockchain space and outside it. It's used in the blockchain space because it fits really well with the model where you have authenticated data structures, you have hashes, and you address things that way.

Outside the blockchain space, it's used because people want to distribute things in a better way. People want to address things by what they are, not where they are. It's really time to make that transition from location addressing to content addressing, in a big way.
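To make that distinction concrete, here is a minimal sketch of content addressing. It's a simplification (IPFS actually uses multihash-encoded CIDs and chunked DAGs rather than a bare SHA-256 digest), but it shows the key property: the address is derived from the bytes themselves, so content fetched from any untrusted peer can be checked against the address it was requested by.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// address derives a content address from the bytes themselves.
// Real IPFS uses multihash-encoded CIDs; a bare SHA-256 digest is
// enough to illustrate the idea.
func address(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// fetch simulates retrieving content from an arbitrary, untrusted peer
// and verifying it against the requested address before accepting it.
func fetch(addr string, fromPeer func() []byte) ([]byte, error) {
	content := fromPeer()
	if address(content) != addr {
		return nil, errors.New("content does not match its address")
	}
	return content, nil
}

func main() {
	doc := []byte("hello, interplanetary file system")
	addr := address(doc)
	fmt.Println("address:", addr)

	// It doesn't matter who serves the bytes; only the hash match matters.
	got, err := fetch(addr, func() []byte { return doc })
	fmt.Println(string(got), err)
}
```

With location addressing, the same URL can silently return different bytes tomorrow; with content addressing, the address changes whenever the content does.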

We set out to do that, and we have to slog through the really hard work of doing it, and we're doing it. It's great, and we're committed, but there's more to go.

What's the current status of making it all human-readable? I know that was an issue early on.

Oh, like human-readable names? Yeah, human-readable names are interesting. Human-readable names should map to content, and people should use them with the awareness that a name is subject to a consensus protocol. Human-readable names require either a consensus protocol that is global in scale and makes everyone agree on the value a name maps to, or something that only gives a name a relative meaning.

There's GNS, the GNU Name System, which relies on a trust graph. It maps more closely to how humans think about names: I might call a friend Jeremy, and I know him as Jeremy, but he has a last name as well, and other names that he goes by on the internet, and other people call other people Jeremy too, right?

So GNS is an interesting approach: using trust graphs and social networks to name things. It's a really interesting and good one, but it doesn't give you URIs or names that you can print on a billboard so that a ton of people can look at them and type them into a computer, which is the whole point of human-readable naming. So you really are stuck with consensus.

When you construct names with consensus, you either have something hierarchical like DNS, or you have something like blockchain naming—Namecoin, ENS, or Blockstack—and human-readable naming is important for things people actually have to type. But I think we have a massive addiction to human-readable naming; it shouldn't be used in a lot of places, because it brings in baggage around needing a consensus system, a network stack, and a whole bunch of other things that you normally shouldn't need just to address or point to some information.
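As a rough illustration of that division of labor (a hypothetical sketch, not IPNS, ENS, or any real naming protocol), the human-readable layer can be thought of as nothing more than a mutable, consensus-maintained table from names to content addresses, while everything underneath keeps linking by hash:

```go
package main

import "fmt"

// NameRegistry stands in for whatever system the network agrees on for
// names: a DNS-like hierarchy, a blockchain naming system, and so on.
// Hypothetical sketch: the registry only maps names to content
// addresses; the content itself is still fetched and verified by hash.
type NameRegistry struct {
	records map[string]string // human-readable name -> content address
}

// Publish updates the mutable pointer behind a name.
func (r *NameRegistry) Publish(name, contentAddr string) {
	r.records[name] = contentAddr
}

// Resolve turns a name back into the immutable content address.
func (r *NameRegistry) Resolve(name string) (string, bool) {
	addr, ok := r.records[name]
	return addr, ok
}

func main() {
	reg := &NameRegistry{records: map[string]string{}}

	// The name is only an entry point; the durable link is the hash.
	reg.Publish("docs.example", "QmExampleContentHash")

	if addr, ok := reg.Resolve("docs.example"); ok {
		fmt.Println("docs.example ->", addr)
	}
}
```

Only the table needs global agreement; the hashes it points to are self-certifying, which is why hashes can stay the primary way of linking.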

We still want hashes to be the main thing people use to link to things, just maybe allowing human readability as an entry point to all of that. Okay, huh. Do you want me to describe Filecoin first, or do you want to dive deeper?

Okay, yeah. So then the Filecoin project was born out of IPFS as a way to incentivize the distribution of content in the IPFS network. When you think about the problem of storing bytes of data in the world, you have a situation where there are a lot of people with disks and a lot of people with data. It’s effectively a market where people want to buy storage, and some people want to provide storage and provide a valuable service.

So, in the old peer-to-peer tradition, people would just share resources and kind of try and hope to achieve the right balance. It works for some use cases, but it doesn’t work for others. What was really missing there was understanding that this is actually a spectrum. On one end, some people contribute massive amounts of storage and don’t really need to use it, and on the other end, you have people that are asking for massive amounts of storage, want to store all their data, and don’t plan to contribute any storage.

The basic move is, "Hey, introduce a currency," and now you've mediated this market. So that's what Filecoin is about: creating a currency that can mediate this market. Now, there's a whole second aspect to it, which is that you can look at a network like Bitcoin as an entity that managed to get tons of people on the planet to amass massive amounts of computing power to maintain the Bitcoin blockchain.
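Before getting to that second aspect, here is a toy sketch of the first point, that introducing a currency mediates the market. It's a hypothetical illustration rather than Filecoin's actual deal-making protocol; the miners, clients, and prices are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Ask is an offer to provide storage; Bid is a request to buy it.
// Prices are in an arbitrary token unit per GiB per month.
type Ask struct {
	Miner string
	Price float64
	GiB   int
}

type Bid struct {
	Client   string
	MaxPrice float64
	GiB      int
}

// matchOne pairs a bid with the cheapest ask that can satisfy it.
// A greatly simplified stand-in for a currency-mediated storage market.
func matchOne(bid Bid, asks []Ask) (Ask, bool) {
	sort.Slice(asks, func(i, j int) bool { return asks[i].Price < asks[j].Price })
	for _, a := range asks {
		if a.Price <= bid.MaxPrice && a.GiB >= bid.GiB {
			return a, true
		}
	}
	return Ask{}, false
}

func main() {
	asks := []Ask{
		{Miner: "minerA", Price: 0.8, GiB: 500},
		{Miner: "minerB", Price: 0.5, GiB: 200},
	}
	bid := Bid{Client: "clientX", MaxPrice: 0.6, GiB: 100}

	if deal, ok := matchOne(bid, asks); ok {
		fmt.Printf("%s stores %d GiB for %s at %.2f tokens/GiB-month\n",
			deal.Miner, bid.GiB, bid.Client, deal.Price)
	}
}
```

The currency is what lets the two ends of the spectrum described above, people with spare disks and people with data, clear against each other instead of relying on goodwill.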

Can you create a different proof-of-work function that maintains the blockchain, but that, instead of just crunching hashes to find one below a target, causes a valuable side effect? That valuable side effect is that you have to store a whole bunch of files in order to have power in the consensus.

So a way of framing it: if you want to participate in the Filecoin consensus and maintain the blockchain, what counts as your influence over the consensus is not your CPU power but rather the amount of storage you are providing to the rest of the network.

For that, we use proofs of storage, and specifically a new kind of proof we came up with, which we call a proof of replication. It checks and verifies that content has been correctly and independently stored, meaning not just on different physical hardware, but that a distinct arrangement of bytes somewhere is being used to store this replica, and you can't just deduplicate it away.

You can't cheat it, meaning you can't pre-generate a lot of the content and fake the proof. That's a specific hard problem, but the point is that with this new work function we can organize massive amounts of storage to sell in the network. You get a lot of people to mine the currency, and then you can sell all the storage that gets supplied to users, and the currency is set up to mediate a decentralized storage network.
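To make "influence proportional to storage, not CPU" concrete, here is a toy leader-election sketch. It is not Filecoin's actual consensus algorithm; it only assumes each miner has some amount of storage that the network has already verified through proofs like the ones just described, and it picks block producers with probability proportional to that storage.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Miner holds the amount of storage the network has verified this
// miner is providing (e.g. via proofs of replication).
type Miner struct {
	ID            string
	ProvenStorage int64 // in bytes
}

// electLeader picks the next block producer with probability
// proportional to proven storage rather than hash power.
// Toy sketch: real protocols derive their randomness from the chain.
func electLeader(miners []Miner, rng *rand.Rand) Miner {
	var total int64
	for _, m := range miners {
		total += m.ProvenStorage
	}
	ticket := rng.Int63n(total)
	for _, m := range miners {
		if ticket < m.ProvenStorage {
			return m
		}
		ticket -= m.ProvenStorage
	}
	return miners[len(miners)-1] // unreachable if total > 0
}

func main() {
	miners := []Miner{
		{ID: "small", ProvenStorage: 1 << 40},  // 1 TiB
		{ID: "large", ProvenStorage: 10 << 40}, // 10 TiB
	}
	rng := rand.New(rand.NewSource(42))

	wins := map[string]int{}
	for i := 0; i < 1000; i++ {
		wins[electLeader(miners, rng).ID]++
	}
	// "large" should win roughly ten times as often as "small".
	fmt.Println(wins)
}
```

Swapping hash power for proven storage is what turns the mining race into something with a useful side effect: the work that secures the chain is the same work that stores users' data.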

So let's kick it off. What's your first question?

So my first question is: maybe start with the timeline of you as a founder, what your initial idea was, why you started the company, and just how we got here.

Sounds good. So it's probably late 2013 or so. I'd been working on a whole bunch of knowledge tools, meaning software tools that can help you learn faster or help scientists figure out what's in papers and so on better.

I found this really annoying problem, which is that datasets, like scientific datasets, were not well-versioned, not well-managed, and so on. There's a whole bunch to that problem, but it struck me as a hugely lacking thing given what computer scientists already have: we have versioning, and we also have BitTorrent. We know how to move large amounts of content around very efficiently in a peer-to-peer way.

What really seemed to be missing was the combination: using something like BitTorrent to let these datasets be distributed worldwide, well-versioned, and so on. That sent me on a path of re-engaging with a whole bunch of stuff I'd been thinking about many years before, all the peer-to-peer stuff. My background is in systems and networking; I studied at Stanford.

At the time, I was looking into things like wireless networks and why peer-to-peer networks like Skype work. I always thought that was a very untapped area of potential. It just seemed like the potential there was vastly underutilized.

All the problems of usability—I don't know if you know my whole background, but the first company I started was doing distributed social networking. A lot of these ideas keep recycling every two years. One thing we noticed is how hard it was for users to deal with the negative side effects of just having something be peer-to-peer.

BitTorrent works pretty well, but even with Skype, you really didn't know it was there; unless your upstream bandwidth was saturating and you got a nasty letter from your ISP or something, you had no knowledge of it as a user. My takeaway from that era was that usability always trumped the elegance of peer-to-peer models.

Then I saw YouTube take off. YouTube is exactly the sort of thing you would expect to be built on top of BitTorrent, but in fact it was entirely centralized and they were streaming everything themselves. And holy cow, it worked so well, and Flash video worked so well.

The culmination of those events shaped what I brought into this: usability, to me, is such an important piece of getting these distributed systems used by end users.

Absolutely, without a question.

Very famously, I think Drew Houston has pointed out how there were a whole bunch of clunky sync file-sharing things that just did not work. The big thrust of Dropbox, for a while, was get usability right, get the user experience flawless.

It almost didn't matter what you did underneath the hood, as long as you made sure the experience was flawless. Back in the day, everyone said, "Well, we have rsync; that's good enough; we don't need Dropbox; we have rsync," right?

But then there’s this other fundamental difference. Yes, absolutely, building these systems is hard, and you have to pay attention to the UX, but there’s a whole bunch of places where economically it makes a ton of sense to do something better and to do something that has a different arrangement.

There were a whole bunch of problems. There was a period of time, basically from 2003 to 2009 or so, where peer-to-peer was sort of dead. I call it a peer-to-peer winter, similar to the AI winter (there have been a series of AI winters), and there were probably earlier peer-to-peer winters too, because peer-to-peer is actually a pretty old concept.

A lot of people have been struggling with the differences between making things peer-to-peer or making them centralized since the beginning of the internet. There's a whole bunch of reasons why a lot of the companies and products that were getting built around that time failed, and why there were very few success stories.

I think Skype and BitTorrent were probably the biggest success stories from that entire time. Skype, you know, didn't really talk about peer-to-peer very much. And BitTorrent, aside from Blizzard and a few others, was mostly used for moving around a lot of movies.

That said, though, it doesn't negate the actual computer science behind it. The actual engineering reasons for choosing to do something peer-to-peer make a ton of sense, and that connects very well with Protocol Labs as a company.

The key thing is to understand deeply what the benefits of using some technology are: what, from a research and theory perspective, the underlying theoretical differences are between doing a thing one way or another, between centralized and decentralized models, between doing things peer-to-peer and doing things in hierarchical, structured ways.

Those different properties can give you a different range of opportunities. Now, peer-to-peer is a lot harder to build because you don't have a lot of control. When you build centralized things, it's a lot easier to get going; there are lots of established ways of doing things.

When you're rolling out changes (I mean, we could enumerate all of these), it's easy to roll out a website; it's hard to distribute a software upgrade. And I would argue that it's easy to roll out a website today because you're working on top of decades of centralized engineering, whereas we haven't had the same deep level of engineering on the peer-to-peer side.

So the majority of groups that end up going into peer-to-peer have to create a lot of stuff from scratch, because it either hadn't been done or had been done in a way that wasn't reusable. This was actually one of the big thrusts of the IPFS project: create a whole bunch of reusable infrastructure—a huge toolkit that people can use to build applications in peer-to-peer land without having to reinvent everything from scratch.

This was a huge frustration for us, like, "Okay, great. It's 2013-2014, and we have to go back and rewrite tons of normal peer-to-peer stuff that could have been written ten years before," mostly because the language and tooling had changed, or because we wanted to do a few things differently and couldn't just reuse the libraries that were out there, or because the libraries made a whole bunch of assumptions about reality that were broken. Very famously, a lot of people assume, from an engineering perspective, that you're going to work on top of TCP and that the thing you're talking to is a TCP port, not a UDP port or some other transport, and that can make a library completely unusable for a project like this. And then there are things like NAT traversal, a wonderful problem that waves of people have had to deal with.

People get hung up on how things scale, but when you actually think about the total magnitude of data in a problem, sometimes you realize, "Oh yeah, just throw that onto one server," or maybe replicate it across five servers that are full copies of the index, and you're done, right?

To put this in context: in a lot of ways, history is repeating itself; the same ideas cycle back. Marc Andreessen has said this before: people kept funding ideas that didn't work, over and over again, because eventually they would work, the way the Webvan idea eventually worked as Instacart.

So it seems like a lot of these ideas are well-known to researchers and computer scientists who are trying them again. There are a bunch of things that are different. You listed a few of them, but just to enumerate them, the tools are better; is that one of them?

Yeah, massively so: the tools are better. And there's something about the hardware infrastructure too: bandwidth, CPU, storage.

The actual raw numbers available to people have changed, partly due to Moore's-Law-type effects. But it's not just Moore's Law, because you have to account for the fact that there are accelerating returns in computing and storage, and not so much in bandwidth, right?

An interesting comparison is that storage costs are decreasing super rapidly, whereas bandwidth costs are not. It always feels like the internet is really slow because we keep building larger and larger applications out of larger media, but then we can't move it around as easily.

Wait, say that again. There's a divergence between storage and bandwidth: storage is getting cheaper at a rapid rate, whereas bandwidth is not. Because of that, you end up with the feeling that you're constantly saturating your pipe and that the internet is constantly slow, but really you're just putting a lot more data through it.

And bandwidth is just not improving as fast. Eventually we're going to get to a point where it might actually be cheaper to physically ship stuff to consumers than to drive it through the internet. It's crazy.

I mean, already, if you look at how large companies move data, they don't send it over the internet. They send it in packages, or they move it around physically in some other way, or over direct fiber; for data-center-to-data-center transfers you have a direct fiber line, and it's not actually on the internet.

That's right. If you have fiber laid, it's not really going over the public internet; you're in the core, using a dedicated link between two data centers. But even simpler: if you're a company trying to put data into Amazon, they'll say, "Hey, ship it to us on hard drives and we'll load it for you."
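The arithmetic behind that is simple. The sketch below uses made-up but realistic numbers to compare sending a dataset over a saturated link with just shipping the disks:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical but realistic numbers: 100 TB of data over a link
	// sustaining 1 Gbit/s, versus about two days to ship drives.
	const dataBits = 100e12 * 8 // 100 TB expressed in bits
	const linkBitsPerSec = 1e9  // 1 Gbit/s sustained

	transfer := time.Duration(dataBits/linkBitsPerSec) * time.Second
	shipping := 48 * time.Hour

	fmt.Printf("over the wire: %.1f days\n", transfer.Hours()/24)
	fmt.Printf("by courier:    %.1f days\n", shipping.Hours()/24)
	// Over the wire: ~9.3 days. The courier wins, which is why cloud
	// providers offer physical data-import services at all.
}
```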

So there's packet switching, and then there's also package switching. Okay, so are those the big differences I was trying to enumerate? Am I forgetting any other major factor about why this time, like, we're running the same play again, but this time it's going to work?

Well, I don't think it's the same play. When you think about what the projects are trying to do, what they're building, and what applications people are going for, it's very different. Maybe Mojo Nation was the one exception; they were really far ahead, thinking about cryptocurrency and resource sharing and all of that.

Because remember, as a user I could rent out my hard drive space, I could rent out my CPU, and my bandwidth. There were three things: storage, bandwidth, and CPU, and you earned Mojo at some rate for each of them.

There were a few people around that time, especially on the cypherpunk mailing lists; you can go back and read a bunch of ideas that have only become reality in the last few years. There were definitely a lot of people already thinking about the things we're doing now, but they were nowhere close to being able to do them.

So one big difference between this wave and the last one is being able to actually reach a range of applications that were just dreams and ideas back then but were far out of reach; that makes this wave quite different in its goals.

When you think about peer-to-peer and what was working really well with peer-to-peer networks at the time, it was mostly pretty simple peer-to-peer structures. There were people using DHTs; there were definitely people doing some amount of distribution of files and so on—but it was mostly around very simple file-sharing problems.

To summarize, the use case really matters—that’s what you’re saying?

Well, I think both the tooling and the use cases that people got to are very different. You didn't yet have smart contracts; you had the beginnings of what smart contracts were going to do, but you didn't have them with the level of trust that people assume today.

That is a very important piece of infrastructure. Once you deploy something like Ethereum, a whole bunch of other things become instantly possible that you didn't have before. You did not have this kind of worldwide computer, effectively, that allows you to run very expensive but trustless code, where you don't have to trust the computers running the code to accept its output.

You have a way to verify that all the computations are being done correctly.

Let's try a thought experiment. There was, for example, infinite demand for free music. I remember, I was in college with Napster. There was a product that everyone wanted. Yes, it was illegal, but there was infinite demand for it.

What is the closest analog for the current generation, the thing you think there is inherent consumer demand for that can be the equivalent driver pushing this wave?

There’s a lot there because, first of all, it’s not about consumers, okay? The peer-to-peer wave, the reason why it’s massive is not because consumers are using it. I think that’s one of the things that Silicon Valley has failed to understand.

I think in 2013 and '14, a lot of the blockchain tech was being built in New York and Europe, far ahead of Silicon Valley. I remember having a lot of conversations with people here and in New York and Europe, and the level of thought outside of Silicon Valley was vastly superior.

It was very surprising and annoying to me, because I was like, "Wait, this is supposed to be the place where all the tech gets developed." The reality is that it wasn't.

There were more people thinking about it elsewhere. Certainly people in Silicon Valley understood all of this and thought about it, but in Silicon Valley the understanding of what businesses or value propositions might actually be useful was dramatically centered on consumers.

In reality, what Bitcoin and Ethereum did was allow you to create any kind of financial instrument extremely cheaply, with almost free verification that the instrument is proceeding correctly, which is not normally a consumer need.

Okay, well, let's set the consumer part aside. What is the burning desire or need that you think this best solves?

It's not one thing. Like, really, it isn't.

Well, okay, then name five things. What is the burning need? It's okay if it's not consumers, but what is the thing about Filecoin that is going to make people, whether businesses or consumers, get really excited about using it?

Well, Filecoin isn't representative of the entire industry, right? Filecoin is one example, and its point is to serve a whole different market. That's a separate argument, which I think makes sense with or without a peer-to-peer winter or summer.

The thinking around Filecoin is about the massive latent storage that's out there and putting it to good use. There are exabytes of storage not in use right now, and if you were to add them to the market, you would drive the price of storage down significantly.

That ends up being true whether or not there is currently a peer-to-peer wave, and whether or not people are excited about peer-to-peer in any way: you can build a network like Filecoin that uses decentralization, and uses cryptographically created financial assets, to organize a massive group of people on the planet to do this.

Setting aside all the excitement around decentralization, just think about Bitcoin as a way to incentivize people to add tons of hardware to a network. Nothing like it had happened before: it generated a massive, massive amount of computing power dedicated to trying to find hashes below a target.

You have tens of thousands of people that worked very hard to add a bunch of hardware to this, and you end up with this insane hash rate where, when you work out the amount of power and computation it’s using, it’s one of the most powerful computer networks on the planet.

So when you take that idea, creating a very strong financial incentive for people around the planet to do something, and couple it with building some other kind of resource-sharing network, something like Filecoin, you can organize a massive network as well.

You can put all of that latent storage, which is already depreciating and going to waste, to valuable use. And that's the kind of business you have to think about here.

These networks are services and businesses that are solving some set of problems, but that's fundamentally different from trying to box it in and say there's one thing the entire industry is trying to do.

In fact, it can be completely different from the rest of the industry; we're using the building blocks from the industry to create a really powerful service.

The reason I mentioned financial instruments is that the fundamental innovation here is the ability to create financial instruments extremely cheaply, without spending tens to hundreds of thousands of dollars.

Instead, it can be done in a few lines of code, and you don’t have to litigate this in court if it goes wrong; it just automatically settles in a computer.

What happened with the blockchain stuff is that software began to eat finance and law in a way that had never happened before. There were a whole bunch of things that were kind of waiting—a lot of ideas that people had for a long time.

Some of them a few years old, some of them decades old, and they got knocked loose by the existence of a digital currency.

Alright, let's go back. You were interested in distributed systems; this was interesting to you. How did this turn into a company? What was the thing you applied to YC with?

Right. I applied to YC with the plan of doing this, building both IPFS and Filecoin, with a company called Protocol Labs.

From the beginning, it was a large-scale plan to go and build a whole bunch of different things around distributed, peer-to-peer systems, a lot of it around decentralization, with the business model of taking portions of the currency.

This was in 2014 when this was a very new thing; people weren't doing this. There was basically Ethereum and a couple of other groups that had gotten to the same conclusion.

Aside from a few side projects that we started along the way, and delays to our timelines (software takes much longer than expected), we pretty much followed the plan.

The plan had both IPFS and Filecoin in it. And, connecting to what I was saying earlier, I had this problem around datasets and versioning, and that led down the rabbit hole of really thinking through how information moves.

How does information move on the internet in the first place? How does addressing work in general? It turns out that with HTTP and so on, we do all this location addressing, which works for certain use cases but is absolutely terrible for a bunch of other use cases.

It introduces brittleness to the infrastructure. So this was about exploring a set of ideas that had been well-trodden by lots of groups before me, and before the current wave of peer-to-peer.

Of course, I ran Mojo Nation in my dorm at Stanford. It had a lot of the primitives in there, right? I ran a node, I had storage space on my PC; it was great.

I was not familiar with Mojo Nation until I chatted with Zooko about it, and it turns out Mojo Nation pioneered a lot of this! I thought it was so cool. I drank the Kool-Aid.

That was in 1999. Yeah, that was my favorite. It was an era where you had the beginnings of decentralized networking and a whole bunch of people building peer-to-peer networks. It was very promising; everyone was just getting connected to the internet.

You could now build large-scale infrastructure and so on. And then it just kind of died (this is the winter again), and there were a whole bunch of reasons why that happened.

People could sit around debating them, but I think part of it was that the primary use case people had for peer-to-peer was copyright infringement, and that was not a viable strategy for a lot of companies.

Another thing was that it was right around the time of the rise of the cloud. Google and other companies invested deeply in building large-scale, centralized, hierarchical infrastructure.

They ended up funding a ton of work down the road in a bunch of labs, and a lot of the labs that had been doing peer-to-peer research switched entirely to doing cloud infrastructure research.

I think another point is that there was no digital currency, so you couldn't pay people properly. You had the beginnings of cryptographic money, and Mojo Nation had a currency, Mojo, as we talked about, but it didn't have the right properties or the level of trust needed to sustain it; digital currencies just weren't quite there yet.

I think another one was that the hardware people had didn't warrant a peer-to-peer structure yet, meaning it made sense for a number of use cases, but for a different set of use cases it didn't really make much sense.

