
Fermat's Library Cofounders João Batalha and Luís Batalha


42m read
·Nov 3, 2024

You guys are brothers, right?

Yeah, yeah.

Okay, he's the older one. I'm two years younger.

Okay, and what made you want to start Fermat's Library?

Oh, so just for the people that don't know what it is, Fermat's Library is a platform for annotating papers. The way to think about it: imagine a PDF viewer in your browser, and then you have annotations on the side that support LaTeX and Markdown. You can add annotations to parts of papers that you think are particularly tough to understand, or where you think more content could be added.

So it's something that we've done with the four of us that started Fermat's Library. We all have a technical background, and after college we kept on reading papers. Every once in a while, we had this internal journal club where we'd read a paper and present it to the others. I remember, for instance, presenting the Bitcoin paper a few years back to Luís and Micael, who don't have a CS background. You kind of have to go into the fundamentals for the Bitcoin paper: okay, what's a hash function, what's public-key encryption? We were already doing this, and we knew that you also see this behavior offline in places like universities.

We wanted to take that experience and bring it online. There's a lot of content that you end up producing while you're trying to read a paper, which can sometimes be the densest content a human can read. The language can be incredibly Spartan, and sometimes there's a step in a paper where they say, "Oh, this should be obvious," but then you look at it and it's like, okay, I don't get it. We knew there was a lot of content that you end up producing while trying to understand a paper, and we wanted to bring that online.

But at least you were in physics?

Yeah, before this I studied physics together with Micael, and then I went to MIT. I studied economics, and you studied CS, right?

Yeah, yeah.

So a lot of the papers are around physics, math, economics, biology, CS, right?

Yeah, yeah. Because that way you kind of solved the cold start problem by just annotating papers yourselves.

Exactly right, and yeah, that was also about getting the authors in there.

Exactly, that kind of growth. The first paper was the Bitcoin paper, yep. And it's still the most commented, right?

Yeah, that one has a good number of comments. It's been there the longest, and it's been quoted: there are a bunch of news sites that have pointed back to it, saying, okay, if you want to read it, go to the annotated version. We've had a few notable people comment there, like Lawrence Lessig on the Bitcoin paper, and a bunch of people from the Bitcoin community.

But the larger goal with Fermat's Library is to try to move things in the right direction, meaning move science towards what people call open science. Mhmm, and that encompasses a number of things, starting with open data, which means sharing the data that you've used for whatever research you're publishing. You want to make that easily accessible, so that if people want to replicate the results of your study or use the data in their own research, they have an easy time doing that.

So that's open data. You also have open code: publishing the code or the algorithms that you used and making those more easily available to people. There's also open publishing, which means publishing papers that are not behind paywalls. There are a lot of things within open science. We want to push things in that direction, and also try to build a platform that makes it easier for people to collaborate.

And we think scientists could be collaborating remotely a lot more than they are nowadays. Or at least that's the way we think about it. But it's starting to change.

Yeah, I think this is actually a trend: we're seeing more and more people collaborating online around papers. For instance, there's this famous example around the Erdős discrepancy problem, a famous problem posed by Paul Erdős, the famous mathematician, about 80 years ago. Terence Tao, the Fields Medalist, was trying to solve a problem, and he put on his blog that he was trying a certain approach.

Then there was this guy from Germany who just wrote a comment there, about the size of a tweet, saying that the Erdős discrepancy problem had a similar flavor, and that some of the machinery they were using could be applied to it. That turned out to be the key to cracking the problem, and they ended up publishing the solution to the discrepancy problem, which was probably one of the biggest milestones in number theory in 2016. That was all thanks to a comment on his blog and to the fact that they were collaborating online on that problem.
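For context, here is the statement of the Erdős discrepancy problem (written from general knowledge, not from the interview): Erdős conjectured, and Tao proved in 2015, that no ±1 sequence can keep its partial sums bounded along every homogeneous arithmetic progression.

```latex
% Erdős discrepancy problem (Tao's theorem, 2015):
% every sequence of +1s and -1s has unbounded discrepancy
% along homogeneous arithmetic progressions.
\forall\, f:\mathbb{N}\to\{-1,+1\}:\qquad
\sup_{n,\,d\,\in\,\mathbb{N}}\;\left|\sum_{j=1}^{n} f(jd)\right| \;=\; \infty
```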

This also ties into the Polymath Project, which was started by another Fields Medalist, Tim Gowers. It was actually a social experiment to see if it was possible to solve math problems by collaborating online. And yeah, they were able to solve problems that way. With that comment, you kind of see it working.

You look at GitHub, and then you think of the impact that GitHub has had.

Yeah, for open source. Open source, of course, existed much before GitHub, but it has really allowed a lot more people to come in and be able to get into open source and start contributing. And there are a number of other really interesting platforms—you have Wikipedia just for more general knowledge, or you have Stack Overflow for just programmers helping each other.

We think that there could be something similar to that but for science in general, right?

Well, did you listen to the Joe Rogan episode with Peter Attia?

Parts of it, yeah.

Yeah, there was a really good one. I don't know if they talk about arXiv in particular around publishing papers, but he talks about having full-time staff.

Oh yeah, just scrubbing the data, looking for interesting information coming out. And again, in the context of Stack Overflow, that's the place where programmers find specific answers to problems, whereas with arXiv, good luck finding that stuff.

Yeah, and so have you guys thought about addressing just discoverability in the context of particular fields?

It's a really tough problem.

Yeah, like paper recommendations, for instance; it's really hard. Right now we're doing one paper a week, in addition to the browser extension. We also have a tool used internally at universities and research groups by people who read papers together and add annotations. But for now, we have the weekly journal: we release a paper every week that we select and annotate, or that somebody in the community annotates.

Then we have the arXiv extension that adds a bunch of features on top of arXiv, like BibTeX extraction, reference extraction, and comments. Eventually, we definitely want to add things like a recommendation engine and make it easier to discover papers that are relevant to you.

Yeah, that's something we definitely want to add to our arXiv extension, but it's a tough problem.

It is, yeah. Initially, we started Fermat's Library, as João said, as a journal club. Then we saw that people liked the interface and liked reading the annotations. So now we are starting to expand and turn Fermat's Library into more of a platform. That's why we decided to do the arXiv Chrome extension.

Because arXiv, for people that don't know what it is, is basically a place where papers live before they go to journals, in the form of preprints. So they're like drafts before they go to journals. What we did is build a Chrome extension that brings the whole commenting interface to arXiv papers. You don't have to go to another website; you're just reading arXiv papers, and you see the comments on the side if you have the Chrome extension installed.

A lot of these papers don’t even have comments.

They don’t—It’s like best case, you're emailing the author.

Exactly, yeah. What arXiv does is basically just host papers; that's the core functionality of arXiv. One of the things that we noticed, especially in areas like machine learning and deep learning, is that arXiv is super important because new papers are coming out at such a high rate that people don't wait for papers to go through journals before they start building on top of them and using what other people discover.

So all the papers are published on arXiv, and you need a way to distinguish good-quality work from bad work if you're reading a paper on arXiv that hasn't been peer-reviewed yet, say about machine learning. I think that's why the Librarian extension is so important.

Does the Librarian extension have a rating mechanism as well? Like, how do you distinguish good from bad work?

Right now, it's only through the comments, but we are actually thinking about implementing some sort of rating system for papers. We're probably going to run a few surveys with our audience too, because you could do it in a number of ways. You could have likes and dislikes, or upvotes and downvotes, so you could have a single overall rating for the whole paper. You could also imagine rating it on a number of different aspects of the paper.

It could be about, okay, how big is their dataset if they're using one, or what do you think about their methods? So you could have a more complex rating. We've been thinking about that a lot and trying to figure out what makes the most sense there. But that's also definitely something we would love to add to our arXiv extension, yeah.

So how do you think the collaboration plays out then? Because I understand how, you know, say for instance, you're a physicist, you start commenting on someone else's paper, you start a discussion that creates a new project. Do you think you'll go further than that? Are you talking about forking and that kind of stuff?

Yeah, I think there are a lot of things you could do once you have a platform with more people on it doing more stuff. That's why the way we've been growing Fermat's Library is with a goal far in the future where we are a much broader platform.

Right now, we're focused mostly on solving problems that people have nowadays. We were actually largely inspired, for our arXiv extension, by a survey that the arXiv team did. I don't know how many people they surveyed, but they surveyed arXiv users and then published a paper describing the problems those people reported while using arXiv and the features they most wanted to see.

The arXiv folks basically said, "Hey, we're going to be the platform to build upon. We're not going to do all these things that people want us to do, but here it is; this is what people want to see. If anybody else wants to work on this, here are the results of the survey."

Since then, they've done a pretty great job of building an API and moving toward becoming more of a platform. So there are a lot of ways we've envisioned that you could have collaboration around science.

So, yeah, like forking a paper or forking some research data. Exactly. There are a lot of things that could be done there. It's not something we're focusing on right now; right now, we're just trying to solve the problems that people pointed out and create a place where people can post comments and discuss a paper.

An example of the problems people mentioned was reference extraction. In a PDF, at the bottom of the paper you have the references that were used, and most of the time, when people want to look up a reference, they have to copy the text from the PDF, paste it into Google, and try to find the link to the paper.

One of the things we did with our Chrome extension is let them just click a button and see a list of references with links to the papers. That was one of the features most requested by arXiv users.
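To give a flavor of what reference extraction involves, here is a toy sketch (my own illustration with made-up sample entries; the real Librarian pipeline is certainly more robust): pull bracket-numbered entries out of a bibliography block and look for arXiv IDs to link to.

```python
import re

# Toy bibliography text (hypothetical sample, not real extracted PDF text).
bibliography = """
[1] S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008.
[2] A. Vaswani et al. Attention Is All You Need. arXiv:1706.03762, 2017.
"""

# Capture "[n] rest-of-line" entries one line at a time.
entries = re.findall(r"\[(\d+)\]\s*(.+)", bibliography)

for num, text in entries:
    # If the entry carries a modern arXiv identifier, build a direct link.
    match = re.search(r"arXiv:(\d{4}\.\d{4,5})", text)
    link = f"https://arxiv.org/abs/{match.group(1)}" if match else None
    print(num, text.strip(), "->", link)
```

Real bibliographies are far messier (line wrapping, many citation styles, no arXiv ID), which is why this is a hard feature to do well.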

Our initial idea was: we want to convince people to install the Chrome extension, so let's solve the hair-on-fire problems they're describing here. Once we have people using the Chrome extension, then we can expand into open collaboration around papers, since they're already there, yeah.

So, do you guys know of anyone working on publishing negative results? This is something that fascinates me.

Yeah, and basically the problem is that as an academic, you're not incentivized to publish negative results, because you want to publish things that have high impact so you can get a job, or a tenure position, or just get people to even care about your work, right?

So, they don't publish. Do you know anyone working on that?

I know of researchers who study that a lot, but unfortunately it's a very large problem, though people are becoming more aware of it. Along with negative results, you also have people doing a lot of research into p-value hacking.

Yeah, explain that.

Yes, so the p-value is essentially a measure of statistical significance: roughly, the probability of getting results at least as extreme as yours if there were no real effect. It's the standard people use to decide whether the results from an experiment are worthy of being published, conventionally with a threshold of p < 0.05. For the most part, that has worked fine until now, or, I mean, that's arguable, but people are looking into it and thinking, okay, should we do things differently, and should we be much stricter about what's considered the gold standard for publishing?
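A small arithmetic aside (my illustration, not from the interview) shows why p-hacking is so easy: if you test many independent null hypotheses at the 0.05 level, the chance that at least one comes out "significant" by luck grows quickly.

```python
# Familywise false-positive probability when testing k independent
# true-null hypotheses at significance level alpha.
def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k true-null tests looks 'significant'."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 100):
    # With 20 shots at the data, you "find" something ~64% of the time.
    print(k, round(familywise_error(k), 3))
```

This is exactly why slicing a dataset twenty different ways until something clears p < 0.05 produces unreliable results.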

We've thought of doing things there with Fermat's Library too, so that when you're looking at a paper, you get an idea of, okay, how solid is this paper?

Exactly. This is more specific to certain areas. If you're talking about medicine or biology, the statistical significance of the results you're presenting is really important; arguably the most important thing.

So we've thought of doing something there with Fermat's Library, either via some API where you could send us the DOI of a paper and we'd send you back some information regarding the p-values, or via the Chrome extension, where you'd see that information displayed very prominently, saying, "Hey, there might be some p-value hacking here," or "This is very solid research."

It's a very big problem, and people are realizing how prevalent it is, especially in fields like economics and biology.

Biology and nutrition, I mean. This came up when I was talking to a friend who's doing a PhD in bio at Cambridge. It's a big thing, yeah. Only by attending a conference in the States did he realize that there was someone in Australia working on the exact same problem as him concurrently, and they were failing at the same types of experiments.

But because they don't publish them, no one knows the results, no one knows the methods. Essentially, it's like these traveling-salesman-type searches people are so excited about tackling with quantum computing: all these permutations are being tried at a smaller scale, but no one's publishing anything, so the progress isn't happening.

Yeah, and part of it is just the way research is done. You come into it usually trying to find some correlation, so you're going to have that bias: you're trying to find correlations you can publish.

You might need to change things dramatically in order to get people to start publishing negative results, which could be incredibly useful for other researchers.

Yeah, but there are a bunch of people working on that. There’s this researcher at Stanford—I’m forgetting his name; it’s John, and then I forget his last name—but he actually just went on this podcast.

Oh, really?

Yeah, the Lovett Podcast, yeah. So you should listen to that podcast. And actually, Tymor has been talking to that professor.

I think he's a professor at Stanford, and he has analyzed this subject more in relation to economics, I believe. He's found that a lot of the things we're talking about here are prevalent in economics as well.

Cool, let's go into the Twitter questions. So we have a ton of them. You guys are very popular on Twitter; congrats on your great following!

Let’s see, let’s start with something broad.

Tanner Goblinstein asks, "What are the most interesting papers you read in the past couple of years that are not widely known?"

Yeah, that's interesting. I ended up reading all sorts of papers from different areas.

Like, how do you get the papers?

Actually, it's funny, it's just like a random walk.

Yeah, or sometimes something prompts it. For instance, a few months ago, I got a Fitbit to track my sleep, and so I wanted to read papers about sleep. That got me into a random walk through sleep research, and I found a bunch of interesting things. I ended up annotating a paper about a big study in Finland on the association between sleep and mortality. There are a bunch of really interesting things I learned from it; for instance, that sleeping less than seven hours is associated with higher mortality.

But if you sleep more than eight hours, that is also associated with higher mortality.

Really?

Yeah, so have you changed your life based on that?

Yeah, I try. Well, it's not that I was sleeping much more than that anyway, you know. But there's also another thing from that research: apparently, sleep quality doesn't matter as much, at least for mortality, which is kind of counterintuitive. But it seems that sleep quality is very closely related to the amount of sleep you're getting.

Okay, so seven hours of okay sleep versus seven hours of great sleep is kind of hard to distinguish.

Seriously, so you can sleep on an airplane your whole life then!

Apparently. Maybe your life will be a little bit more miserable. But it's hard sometimes to pick favorites.

But there is one. For instance, there was this kind of random paper published in the '90s about Simpson's paradox and the hot hand phenomenon in basketball.

So the hot hand phenomenon in basketball is the idea that, because a player just made a shot, the next one has a higher chance of going in. There was a researcher who looked at a dataset from the Celtics to see if that was true for free throws.

Before that, they'd asked about a hundred students at Stanford and Cornell whether they thought that a player who just made the first free throw had a higher chance of making the second one.

Something like sixty-eight of the hundred students agreed and thought that was true. And these are people from Stanford and Cornell.

So then they looked at the data, and what they found was that this seemed not to be the case. You're not more likely to make your second free throw because you made the first one; you're just significantly more likely to make the second one overall.

Yeah, okay.

And this was done in the '90s with, I don't know how many free throws, maybe like 5,000; they looked at some data from the Celtics.

Yeah.

Then I went and got a dataset from Kaggle with like 600,000 free throws, re-ran the same analysis they ran for the study in the '90s, and looked at the results. The pattern is pretty clear: on the second free throw, players are just significantly better, regardless of the first one.
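The analysis described above boils down to comparing two conditional probabilities. Here's a minimal sketch on synthetic data (the rates 0.72/0.78 are made up for illustration, not the Kaggle or Celtics numbers): the second attempt is modeled as better than the first but independent of it, which reproduces the "no hot hand at the line" pattern.

```python
import random

random.seed(0)

# Synthetic (first_made, second_made) free-throw pairs. Assumed model:
# the second attempt succeeds at a higher rate, independent of the first.
P_FIRST, P_SECOND = 0.72, 0.78
pairs = [(random.random() < P_FIRST, random.random() < P_SECOND)
         for _ in range(100_000)]

def p_second_given(first_made: bool) -> float:
    """Empirical P(make 2nd attempt | outcome of 1st attempt)."""
    subset = [snd for fst, snd in pairs if fst == first_made]
    return sum(subset) / len(subset)

print(f"P(make 2nd | made 1st):   {p_second_given(True):.3f}")
print(f"P(make 2nd | missed 1st): {p_second_given(False):.3f}")
# Both estimates land near 0.78: the first shot carries no information.
```

With real data, a gap between the two conditional rates would be the signature of a genuine hot hand effect.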

And, yeah, it doesn't matter if they made their first one or missed it. Their paper then tried to explain why people think there is a hot hand phenomenon, and that's related to Simpson's paradox. For people that don't know what Simpson's paradox is, learning about it also really changed my worldview a little bit.

But basically what it says is that you can get two different, apparently valid conclusions out of the same data depending on how you split it.

So an example: between 2000 and 2013, the median wage for high school dropouts in the U.S. dropped; for high school graduates it dropped; for people with an undergrad degree it dropped; and for people with a graduate degree or higher it also dropped.

So across the board, for all of those segments, the median wage dropped, but in aggregate, then it went up.

So you look at it and it's like, okay, what's going on here? It turns out that a lot more people got a degree, shifting the population towards higher education, and that's why the aggregate goes up even though each individual segment goes down.
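The wage example can be reproduced with toy numbers (mine, for illustration; not the actual wage data being cited): every group's wage falls, but the population shifts toward the better-paid group, so the weighted average rises.

```python
# (share of workers, wage) per education group, two points in time.
# Toy numbers chosen to mimic the pattern described, not real data.
year_2000 = {"dropout": (0.30, 30_000), "degree": (0.70, 60_000)}
year_2013 = {"dropout": (0.10, 28_000), "degree": (0.90, 58_000)}

def overall(groups):
    """Population-weighted average wage."""
    return sum(share * wage for share, wage in groups.values())

# Each group's wage fell...
assert year_2013["dropout"][1] < year_2000["dropout"][1]
assert year_2013["degree"][1] < year_2000["degree"][1]

# ...yet the aggregate rose, because the mix shifted toward "degree".
print(overall(year_2000), overall(year_2013))
```

Same data, opposite conclusions, depending on whether you look per group or in aggregate: that is Simpson's paradox.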

Simpson's paradox is that depending on how you cut the data, you might get different results, and both cuts can be valid. In this case, it's pretty easy to understand what the right way to look at the data is.

But in other cases, it's not clear whether you should include some variable or cut the data a different way. Relating it back to the basketball case: the results were different depending on whether you looked at it on a player-by-player basis or in aggregate. Once you collapsed it all into the same table, you got different results than when you looked at it player by player.

So yeah, I forget exactly which way it went, but I think when you collapsed it, you didn't see the hot hand phenomenon, while if you looked at it player by player, you saw it.

So they're arguing that that's why people had the idea, and why you get 68 students out of 100 saying they believe in the hot hand phenomenon.

Yeah, yeah.

And so, yeah, some of the papers are really random like that. It's funny, you're getting these little tidbits of trivia. Has it been relevant to you in terms of physics?

I mean you basically, you're working on software now, right?

Yeah, but I also ended up discovering really cool physics papers. For instance, my two favorite papers were both written by Freeman Dyson. One of them is where he proposed the concept of a Dyson sphere. It's just one page, and he basically explains how an advanced civilization would need more energy than we can generate on Earth.

So you'd have to go to a star and build a shell around it to extract the star's energy. And it's funny because, with really simple math and physics, he was able to work out: okay, is this sphere stable? Is it going to last indefinitely?
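A rough back-of-envelope (my numbers from general physics knowledge, not taken from Dyson's paper) shows the scale of the gap. Earth intercepts only the tiny solid-angle fraction of the Sun's output subtended by its disk, so a full shell at Earth's orbital distance would capture roughly a billion times more power:

```latex
L_{\odot} \approx 3.8\times10^{26}\,\mathrm{W},
\qquad
P_{\text{Earth}}
 \;=\; L_{\odot}\,\frac{\pi R_{\oplus}^{2}}{4\pi d^{2}}
 \;\approx\; 3.8\times10^{26}\cdot
   \frac{\left(6.4\times10^{6}\,\mathrm{m}\right)^{2}}
        {4\left(1.5\times10^{11}\,\mathrm{m}\right)^{2}}
 \;\approx\; 1.7\times10^{17}\,\mathrm{W}
```

Here $L_{\odot}$ is the Sun's luminosity, $R_{\oplus}$ Earth's radius, and $d$ the Earth-Sun distance; $L_{\odot}/P_{\text{Earth}} \sim 2\times10^{9}$.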

So it's really an interesting paper. The other one that I really like is Feynman's proof of the Maxwell equations, also written up by Freeman Dyson. It shows Feynman's intuition about the physics, and it's also really simple and easy to read, even if you don't have a physics background.

But one of the things I noticed from finding and annotating all these papers is that in the '60s, and through much of the 20th century, a lot of these discovery papers were only one or two pages, and they're fun and fairly simple to read.

The discovery of the neutron is maybe one column; same with the discovery of the positron. The Dyson sphere paper is a really short paper and fairly accessible.

Why do you think they've gotten so long? Is it sort of like, you know, David Foster Wallace citing a million things because he doesn’t have confidence?

I think it's also a consequence of the field developing.

Yeah, you have more complex questions, and it's harder to write them up. Papers are also a little more detailed about methodology, and the format has gotten more formal, in the sense that people follow a very specific structure, and I think that has added to the length.

But yeah, nowadays they tend to be longer. Like the gravitational wave paper that we annotated: that's what, like 15 pages maybe?

It would be interesting to analyze the size constraints that journals were imposing 50 or 60 years ago compared to now, and whether they were forcing people to write shorter papers.

Not sure how it was back then, but I mean, if the discovery-of-the-positron paper were published today, I bet it wouldn't be just a single column.

Yeah, well, are they intended to be more reproducible now?

Good question—maybe! Maybe, it’s just more complex problems that they're tackling now.

Yeah, it might be the case.

Yeah, it's definitely not going back. You don't really see a trend toward shorter papers anywhere.

But yeah, it's interesting when you go back to the '60s and '50s; it was pretty nuts—glory days!

Alright, cool, so let's go to another question.

Polaris 7 asks, "What are the necessary ingredients in a good and impactful science writing?"

Mhmm, this is also a good question. I don't think I'm qualified to answer; I haven't published that many papers. But one of the things we noticed, or at least I noticed from reading papers, is that sometimes it's not the discovery paper that is the most impactful one.

For instance, when quantum electrodynamics was developed, there were three guys working on the problem: Feynman, Schwinger, and Tomonaga. They were working independently and publishing papers on quantum electrodynamics.

The most impactful paper was actually published by Freeman Dyson, who took the time to analyze and unify the work of Feynman, Tomonaga, and Schwinger. He wrote a paper that helped other researchers understand what quantum electrodynamics was, and really helped spread their work.

So it was actually the most impactful paper.

In other words, clear writing!

Exactly, yeah.

Also, the question here is about impactful scientific writing, and so you have, of course, writing papers, and then you also have scientific writing in the sense of explaining some concept to a general audience.

I think it's the same there: you want to make it clear and accessible. But take even something like the Bitcoin paper. I mean, I studied cryptography in college, and even then it took me a few reads through it to actually get it.

It's a beautiful paper, but the language is very Spartan; you have to read every sentence in there, so it can be very challenging to approach.

I think you always benefit if you can make it as clear and accessible as possible, because you never know what audience will end up reading your paper. Of course, you can expect other people in your field to read it, but sometimes things turn out to be useful in different fields, especially with interactions between math and physics.

I think it's always beneficial for science if you make it as accessible as possible.

Yeah, what does impact mean?

Yeah, it's a question as well.

Did you see that one?

No. Adam Babbitt asks basically about metrics for value-add: what does impact mean? Is it the number of citations you get, or the number of people that learn about a certain subject because of a paper?

So in that way, a review paper can have a really big impact compared to a discovery paper.

It's one of the problems we also think about a lot: these metrics, what the incentives in science are, what makes people want to publish a paper, and why people should bother clarifying a paper and making it understandable to as many people as possible. Do they have the incentives to do that?

And how can you create incentives to do that?

Right, and if the metric is just the number of citations, sometimes that's not aligned with making the paper understandable and comprehensible to a large audience.

I mean, is that a question that you guys have to tackle? Because, you know, on one hand, you want to illuminate these papers that people could potentially learn from, and then on the other hand, you're running a site with content, right? And you want things that are going to capture attention.

So I saw you have a Charlie Munger post on there, right?

Micael annotated the Charlie Munger piece.

Oh, one of our other co-founders.

Yeah, yeah. So it's squarely a non-technical paper, but Charlie Munger has millions of fans. Exactly, right!

Yeah, so you kind of have to balance those two things.

Yeah, it's not easy, and citations are definitely a proxy. If the paper is getting cited a lot, it has some sort of importance, but it's definitely not perfect.

If you look at the most cited papers in different fields, you might be surprised; they might not be the ones you expect. I remember looking at the most cited papers in computer science: they're definitely very impactful, but reading through the top ten, some of them I had never heard of before.

And sometimes, and this is more specific to certain fields, very important concepts or discoveries never really get published in one paper that then gets a ton of citations; that knowledge spreads in some other way.

So citations are not perfect, but I wouldn't say we have a great answer for that.

What's a better proxy?

Or how you should go about it. I don't think anybody really has a better answer to that right now, not that we've heard of, but yeah, it's an interesting problem.

You know, who knows what people will start using in the future? You could measure impact by how many people are talking about a paper on social media or reading about it, or, if there's code in a public repo, how many forks the repo has.

Yeah.

And then it depends on the field, right? If you take bio, bio papers can be used very directly, say, in industry, right?

You can publish a paper about a drug, and that can be used worldwide and save lives. So for that field there are a bunch of other metrics you could use to calculate the impact of a paper, but for the more traditional sciences like physics and math, yeah, it's hard.

Hmm, question up top.

Arsalan Yaarvesey asks basically about working in public and the speed of publishing. They say: since scientific papers usually go through scrutiny and evaluation before getting published, how do you cope with not always being updated and up to speed in a world with daily news and contributions?

This kind of relates to what we were talking about before, about people publishing to arXiv before things are really tested. Where do you guys fall in that dynamic of publishing as soon as possible?

Like with something like machine learning, where things are being put out all the time, versus going through peer review before getting something out.

This kind of loops into peer review, which is a whole world unto itself.

Yeah, people are talking a lot about that. For our weekly journal, we generally are not publishing the most recent research. There's definitely sometimes a lot of catching up to do. I remember annotating a paper about this machine learning algorithm to play one-on-one poker; that was way out of my league.

I had to spend a good amount of time researching it and also figuring out, okay, how relevant is this? Because, you know, I'm not in the field, so it's hard for me to gauge the impact of the paper.

So, yeah, sometimes it takes us a lot of reading up before we can actually say, okay, this is worth publicizing and giving it our sort of stamp of approval for our audience, saying, "Hey, you should read this; I think you'll like it."

It can take a while sometimes. But looping back to peer review, that's also something where the way things work nowadays does not seem to be perfect.

We would love to see, either via Fermat's Library or some other platform, something that tries to tackle that and make peer review a better system, or change it significantly.

I think there's a lot of work left to be done.

Mm-hmm.

Which can have a very significant impact on science, right? One of the most important aspects of science is having a very skeptical mindset, looking at a result with a very critical eye and asking, okay, is this something that we can build upon? Is this something that we're going to add to our foundations and build more science on top of?

That’s a very important aspect of science, and I think it’s—it’s not perfect, and it could be better.

Yeah, so Anvil Rotterdam asks, "Have you ever thought about building a tool for annotating books?" Something like what Patrick Collison was talking about in this thread where he basically says, "I’d pay a lot more for books if I could see the highlights, annotations, and marginalia of friends or people I follow."

You know, it's actually a really good question, and we have a friend, Jess Riddle from the Perimeter Institute, who is a researcher there that writes about this on his blog. I think that besides annotating academic papers, it also makes total sense to annotate books, and especially kind of introductory books about science.

He gives the example of a book used by thousands of students to learn classical mechanics, Goldstein. There's a section in that book about a transformation called the Legendre transform, and it does a bad job of explaining what it is.

Apart from that section, the rest of the book is awesome! It’s really nice if you want to learn classical mechanics.

But if you want to write a book that does a better job of explaining the Legendre transform, it has to be better than the whole Goldstein book for anyone to adopt it, right? Otherwise, people just keep using the Goldstein book.

So it would make sense for books to be annotated and also be open source.

So in that sense, you would just commit a new chapter, a new explanation for that, and keep all the other chapters and then just change that bit instead of having to write a new book and then convince people to adopt your book just because of that.

So I think it makes total sense to do more introductory books.

No, and we've thought about that, the type of things you could do if you had a platform where books kept being updated. You could have, okay, this is the standard text for learning calculus, and it's constantly kept up to date; you're adding exercises to it.

People are forking it, and if you need more information about something you're not understanding, you could deep dive into it, with a bunch of additional content attached to it.

It really feels like something that should exist, and we've thought about doing something with Fermat's Library for that.

Yeah, there are so many things, but just in terms of copyright: are there massive issues there or is that possible?

I think you might be facing some of the same challenges that Wikipedia is facing, to an extent.

Then, yeah, it would—I think it would depend a lot on the format that is used. I do think for something like this, you’d probably benefit from having some editor or like a team of editors to see, okay, what should we add or should we not? To an extent, to be some curating voice.

In terms of copyright, you could run into some issues, although the classic books, like the ones on electromagnetism, are out of copyright.

Yeah, yeah. I mean, my impression was that these are maybe even like current books coming out—like popular fiction—even as annotated by famous persons.

So, I mean, maybe they gave away their notes for free, and they were just the layer on top, oh yeah, but if you wanted to, you know, resell your own version of the book, yeah, that’s interesting.

There's also some legislation; well, there's fair use, where you can use a piece of content if you're adding on to it. That's why you can have a video on YouTube with a snippet from a movie if you're reviewing it. There's some precedent for doing this type of thing.

But yeah, for more general books, I also agree that it would be amazing because we were just talking about this—we've talked about this for a while now.

Right, because you read a book, and the purpose of that book, it's not only for you to absorb all the knowledge that is there, but it’s also to get you thinking about what's being talked about in the book, and then you might reach some other conclusion. You might go on a tangent, and when you're reading it, that knowledge might never be shared with anybody else. You might just read it yourself, and you think, okay, this just made me think about something else.

It would be really—there’s a lot of knowledge that is being lost, and it’d be great if you could capture it in some way. The Amazon Kindle highlights site is one of the saddest things I've ever seen.

Have you ever done that?

We have Kindles, but do we have anything more than that?

Oh yeah, so there’s a whole web interface for looking at all of your highlights across all of your Kindle books. It’s not good.

So do you use it for anything?

Okay, sometimes I go back. So like the best way that I’ve found for me personally to retain is to buy the audiobook and go through a book a couple of times, and then my retention goes way up.

But occasionally I’ll be just like, how was that passage in whatever book? And I’ll go back onto Amazon and you can link it from Amazon.

Yeah.

You can dig through your highlights from your Kindle books. I think I've seen a startup that does that in a better way.

It pulls all your highlights and organizes them.

Yeah, the Kindle—I remember looking into this.

But what I've started doing is, well, I also use a Kindle. I usually don't write annotations on the Kindle; I mostly highlight. But if I'm reading a physical book, for the past few years, where before I would maybe never write anything, now I try to write a lot more in it.

Yeah, and at some point if I have time, I’ll try to go through the book, see where I wrote things, and then write that in some notebook.

Because just going through the exercise of looking at what you highlighted can be very helpful.
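The Kindle-highlights review the speakers describe can also be done locally: Kindle devices store highlights in a plain-text "My Clippings.txt" file whose entries are separated by a line of equals signs. The sketch below assumes that rough layout (title line, metadata line, body); the exact format can vary by device and language, so treat this as an illustration.

```python
# Rough sketch of pulling Kindle highlights out of "My Clippings.txt" for
# review, grouped by book title. Assumes the common layout:
#   <title line> / <metadata line> / blank / <highlight text> / "=========="

def parse_clippings(text: str) -> dict[str, list[str]]:
    """Group highlight bodies by book title."""
    highlights: dict[str, list[str]] = {}
    for entry in text.split("=========="):
        lines = [ln.strip() for ln in entry.strip().splitlines() if ln.strip()]
        if len(lines) < 3:
            continue  # skip empty or metadata-only entries
        title, body = lines[0], " ".join(lines[2:])
        highlights.setdefault(title, []).append(body)
    return highlights
```

Dumping the result into a notebook, as Luís describes doing by hand, then becomes a loop over `highlights.items()`.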

Yeah, I mean, I was an English major in college, so like I've forgotten more books than a lot of people ever read in college.

Yeah!

And one of my professors actually recommended this, which is basically take a five by seven index card, and as you're reading the book, you’re making little notes, right? You’re like, alright, this character does this or this is an important point, and then at the end, you basically write a paragraph to your future self describing your memories of the book and what happens and important ideas—it can really trigger it—

Yeah, it really helps you retain it.

That, like, I know.

Yeah, but I remember in school, back in Portugal, we all had to read this epic poem called "The Lusiads," by Luís de Camões. It's about the Portuguese sailing from Portugal all the way to India; it covers the Portuguese discoveries.

I remember we had the original version, which was pretty thick, and then we also had a version with annotations on the side for each verse; not for all of them, but for a lot of them.

That made such a big difference, right? Because you're reading this in old Portuguese, which by itself is already hard to follow. And he's making references you have no clue about; there's so much historical context.

Yeah, even the names are different; India was not called India. So everything is different, and when you're reading through it the first time, it sounds great, it rhymes, but you don't understand a lot of the context behind it.

If you go through it, and okay, you read through it, and then on the side you have all this rich content that really only adds on to your experience and makes it much more memorable.

You can map it out in your mind and create many more connections.

It really enriches your experience. Of course, you have this because in this case, this is an epic poem that everybody has to read, and there's a large incentive to publish an annotated version of this book that is no longer under copyright.

So you can have those types of things, but for a lot of more recent books, I think you could benefit a lot from having that to some extent, right?

If you're reading through a few pages and you love what the author is talking about, you might want to dig deeper into that topic right there. There should be some place where you could do that, but nobody has actually built this!

I think for most people that defaults to the blogosphere; some people write summaries, or Amazon reviews, yeah.

But the thing is, sometimes that content does exist, and being able to find it easily, having it at your fingertips, can make all the difference. Maybe you could spend a minute searching on Google and find the content you're looking for.

But if it was right there, you could just click, and it would pop up, and you’d see it—then it would be much more likely that you would end up reading that content.

Do you find that annotations sometimes are best done by someone who is not the author of a paper?

What's interesting is that the authors of a paper sometimes aren't going to know where people will struggle to understand it. I remember when I was annotating the Ethereum white paper, I went through it and then emailed him; he was super quick to reply, and he replied with some of the questions that he gets the most about Ethereum.

But when you're writing it, you have no clue; you've worked it out in your own mind. Some steps you might skip because you've internalized them so much. You only know where people are going to struggle once you put it out there and start getting questions.

And so, yeah, sometimes the authors are not the best annotators. Every time we talk with an author, I think it's easier for them to answer questions about their paper than to annotate it.

Yeah, if you have another person annotating the paper, I think it's easier for them. With the authors, we see that a lot; they say, "Just ask me questions; I'll answer them."

But sometimes, I don’t know how to enhance or add content to my own paper.

Yeah, you guys can provide that service for sure; it’s kind of worth noting that this is a side project for you.

Yeah, I mean I have so many questions about how you go about building this thing that's definitely consuming a lot of your time! I mean, it has to, right? Between finding, reading papers, making all those graphics and tweets and stuff that you guys do, how do you find that balance? What’s your whole philosophy around this?

Yeah, it definitely takes its time. It is something that we actively tried to do after college, while we were working, before doing Fermat's Library: reading papers and staying up to date. So we were already looking into research before.

It’s just something that we would enjoy, and we found it good to have some sort of peer pressure amongst ourselves to present papers to each other, right? Because that really forces you to understand something well, right?

I think it was Feynman who said that you don't understand something until you can explain it to a freshman in college, yeah.

And that is very true, so we tried to do that amongst each other. Then we started Fermat's Library, and we thought, okay, maybe we can bring this online. We were already spending a considerable amount of time doing this type of stuff.

But it is mostly late at night that I'll be trying to fix bugs. People on Hacker News don't seem to think that it is a side project.

So, yes sir, there are definitely bugs, and sorry about that! We try to fix them when we have time.

Yeah, but it definitely takes its time. It's also something that all of us really like doing. I mean, I'll start looking at Wikipedia articles about quantum computing, then spend three hours clicking through article after article, and by the end I've found five papers to annotate and produced ten or fifteen tweets.

So it’s something that we really enjoy doing!

Yeah, and I think that's the real genius of it, right? It's basically figuring out a way to turn what would be your hobby anyway into this little engine with a forcing function, because this type of thing is really easy to let go of, right? Sometimes you might not feel like understanding a paper to the point where you could annotate it.

There’s like— it takes a while to get a good grip, especially if it’s not an area that you’re super familiar with, of course.

That’s definitely not the type of effort you do on Saturday night, right?

Unless you add a forcing function that you know that within a couple of weeks, you're going to be putting this to a lot of people.

That's my favorite part of the podcast. The software stuff is pretty easy for me; it could be anyone in the room, and we can do a podcast. But when we do physics ones, or anything math, I'm just like, okay, I have to take a couple of days to really prepare.

I couldn't even become an expert if I dedicated a week to it, but I want to be conversant to a certain extent, and that part's fun!

Yeah, you definitely feel the pressure when you're writing these annotations, because people call you out on it.

"Okay, this is wrong," or "You missed this!" So when you're writing, you want to be really careful to make sure that what you're saying is correct, because somebody reading through that paper is going to use your annotation to help them understand it.

So we feel a responsibility toward those people to do a good job at it!

When we put an annotation, we want to stand by it, and you want it to be of quality!

It's funny; it's like a circle: the more you annotate a paper, the more people are at the edge of starting to understand what the paper is about.

So you start getting more and more questions because the circle expands, and then you just have more people that are starting to understand this topic about number theory or physics or whatever.

So you get more and more questions about the paper.

So, it’s like—and then when do you stop explaining a certain concept?

So it's like you want to annotate a paper about number theory. Okay, do you have to explain what a prime number is, for instance? Or do you have to explain what a rational number is?

So it's really interesting once you start thinking about that—how deep do you go?

And you've got to be careful with those videos, because you could get discovered on YouTube as an explainer series.

No, we've done a few of those. We annotated a paper that, I think, was a proof of the irrationality of the square root of two, and there was a fourteen-year-old kid from Russia who, because of that paper, came up with an alternative proof. He sent us that proof, and I read it, and it was apparently legit!

And I told him to submit it to a math journal, and I think he did. I haven't heard back yet, but I reached out to him to see if he actually was able to publish it.

So it’s also nice to see, you know, we can inspire people sometimes to do these types of things.

And I also think, especially with Twitter, one of the things that we learned is that learning something—learning a concept or learning a fact is really, really addictive.

We see that on Twitter almost every day. People come back, and we have hundreds of thousands of users that read our tweets.

I think that’s why people really like when they have a good teacher, and then when they can go to a class and really learn something, I think the problem is that usually that requires a lot of effort from people.

You either have to go to a class or you have to read a book to learn something, and I think what we’re able to do with our Twitter account was to provide that same feeling of acquiring a quantum of knowledge—but at the cost of reading a tweet, which is really easy for the reader.

Sometimes it’s really hard to make those tweets; it requires a lot of reading and thinking about how can you explain something with just so few characters and an image, maybe.

But once you're able to effectively teach someone a fact, people really like that. I think there should be more people exploring that on Twitter.

It's a very particular medium, but there are a lot of people attracted by it. A few years ago I would have been very surprised, but now you have all these scientific explainers, people with millions of followers, and what they're followed for is scientific content; their audience just wants to learn.

That’s something very uplifting that we’ve learned—that there are a lot of people out there that want to learn!

I think it's too easy to get down on those people—just like, oh, you know, this is like basic fun facts or whatever. But like at the end of the day, that’s good, yeah, people are excited to learn!

And then you extrapolate out a little bit more, and you look at someone like Dan Carlin doing the Hardcore History podcast. If you had written that down objectively, like, "Alright, I'm going to produce twenty-five hours of content about the Khans," and people are going to be into it?

I would have told you, "No way!"

Yeah, and then you look at it; it's millions and millions of downloads.

Yeah, that's pretty cool. There are some things you look at, and it really catches you by surprise.

I mean, this is parallel, but it's like Wikipedia for instance. If somebody had pitched Wikipedia to me—I would have never guessed it would be possible.

Yeah, because you'd be like, how are you going to do this?

No incentive, just people out of goodwill adding content to it, and it's going to be good, reliable content that you can use to learn.

And that’s just, right, that’s not something that you would initially think would fit with human nature.

But people surprise you positively, right?

And the same goes for Stack Overflow. People just, out of goodwill, they will go out and explain—or try to help you solve your problems.

There's something to be said that humans have some untapped fountains of goodwill that we might not be leveraging as much as we could.

You know, you see bright spots here and there, and like Wikipedia or Stack Overflow on projects that, if you pitched them to me before they existed, I would be very skeptical that they would be able to get to the point that they are today.

Of all the parallel universes, we are in the universe where Wikipedia actually exists!

There have to be a lot of universes where it didn't survive!

Yeah, well, I mean it’s like when you talk about you guys expanding, you almost don’t have to over-engineer the incentive mechanism.

You know, if you believe that it’s true—like annotating more papers is objectively interesting!

Exactly, yeah!

For sure we have people—I think, you know, we'll always have people that are going to be interested in consuming the content and reading, then you have the other side—how do you create incentives for people to annotate the papers, right? That’s a different game.

But, yeah, some things—it takes some time, and we are totally—when we started this, we knew that it would take time until people cared at all about what we were doing.

Then it takes even more time to make any sort of impact on the issues that we care about. But for a lot of these things, say you look at arXiv: arXiv was started in, let me guess!

So it started in August 1991, and it has taken a long time to get to where it is today.

If you look at the graph of submissions for arXiv, it's almost completely linear!

There’s no startup exponential growth; it’s like completely linear!

But it’s arguably one of the things that has impacted the making of science or the distribution of science the most.

But it just took a while to grow, and it seems like it's just going to keep growing linearly.

But sometimes, that’s what you need.

And so we are totally mindful of that, and we know that this might take a really long time until you can get to our ultimate vision and to build that out.

But, you know, some things—they just take some time.

So do you feel pressure to achieve profitability or even sustainability in the business?

Not at all! We never really thought about that.

Yeah, because this is a side project.

We never really thought about monetizing or achieving profitability. For some of these communities, like Stack Overflow, it's a for-profit company, and I think it does a great job at what it does. I'm actually happy that it is a for-profit company, because there's just more independence.

And if they have a good leadership that takes it in the right direction, it’s great because they don’t need donations to keep going.

Wikipedia is a non-profit, and they’ve been doing great, so it’s possible to do it both ways.

Because we have very limited resources, we try to focus all of our attention on the areas that are the most important for what we're trying to achieve.

So that means our next step is going to be building the Chrome extension for arXiv rather than anything else, because we think that's what has the biggest impact.

So that's why we never delved into profitability; we just pay the costs ourselves. There are just server costs, because we do all the work!

So it’s never something that’s been in our minds a lot.

We think you could build these types of platforms either for profit or non-profit, so yeah—just something we kind of defer further down into the future.

It's a good question, for instance, whether arXiv would have survived.

If they were a startup, could they have raised money with that kind of linear growth, if they were not inside a university, right?

It’s a good question!

Yeah, I mean, plenty of companies without startup growth raise money and become profitable or sustainable businesses!

Yeah, right; you're just like, okay, what are you going to charge that people care about?

Because, yeah, I mean, arXiv is great because it's open, right?

And so many other journals may be dying out because they’re not!

Yeah, one of the trends that we've also noticed is a lot of people building journals on top of arXiv, and we're even collaborating with a few journals.

One of them is the Quantum journal, which is an overlay journal on top of arXiv, in the quantum physics category.

What they do is basically this: what is an overlay journal? It's just a list of links to papers. They don't have any hosting costs; they just have a page with links to all the papers they've decided to publish, and all the papers are hosted on arXiv.
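The overlay-journal structure just described is simple enough to sketch as a data structure: a curated list of arXiv identifiers, each resolving to an arXiv-hosted page, with the journal hosting nothing itself. The record fields and the sample title below are hypothetical illustrations (the arXiv abstract-page URL scheme is real).

```python
# Minimal sketch of an overlay journal: a curated list of arXiv IDs.
# The journal's only artifact is this list; arXiv serves every paper.
# Field names and the sample entry are illustrative, not a real schema.

accepted = [
    {"arxiv_id": "1509.05363", "title": "Example accepted paper"},
]

def paper_url(arxiv_id: str) -> str:
    # arXiv's abstract page for a given identifier; the journal only links.
    return f"https://arxiv.org/abs/{arxiv_id}"

# The journal's "table of contents" is just the rendered list of links.
links = [paper_url(p["arxiv_id"]) for p in accepted]
```

This is why the hosting costs are essentially zero: the journal's contribution is curation and peer review, not distribution.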

So it's completely open, and our partnership with them is basically that all the papers they publish get the Fermat's Library commenting interface.

We are seeing more and more of these journals popping up.

For instance, the Erdős discrepancy solution was published on one of these open journals called Discrete Analysis, and I think it's totally possible that these open journals get to a point where they have, you know, a reputation like Science or Nature, as long as you convince people to publish their papers on these journals.

The idea is: if you already have your position, your Fields Medal, your Nobel Prize, just publish in an open journal.

That’s what Terence Tao did with the Erdős discrepancy, and I think that’s what other people are doing.

Tim Gowers, the mathematician who is also a Fields Medalist and who founded the Discrete Analysis open journal, wrote a blog post about this a while ago.

His mission was to convince famous mathematicians, people in those situations, to publish in open journals, yeah!

Because for a young researcher who's trying to get a position in a super competitive field, you need the traditional journals, right? If you want to get your postdoc at a renowned university, you need to have that.

So that's what's keeping it alive. These big names endorsing the open journals, I think that's going to be the growth agent that increases the reputation of these open journals!

Absolutely!

And it’s interesting because it is—it is a problem, and we definitely believe that that's the right direction.

And while you're in the U.S., right? Like while I was studying at MIT, you don't even realize it, because within the MIT network, everything is open!

When I was an undergrad, I didn't even realize this: if you're literally on the MIT Wi-Fi, you have access to these journals that are paywalled, and you don't even see that, okay, this would be $30 if I were five blocks down that way!

But Luís was studying in Portugal, and so we would talk and compare. Even in Portugal, where you have well-funded universities, the research groups might not be able to afford all the journals, and so sometimes you just have a lot of trouble accessing research.

Yeah, in the U.S., big institutions have access to it. But in a lot of other parts of the world, the fact that a lot of research is being published in non-open journals has a significant impact.

Well, especially when like legit CS papers are written by people who aren't associated with any university, right?

They're just hobbyists writing things; why would they have a hundred journal subscriptions?

Exactly!

I remember even researchers in my research group sometimes having to VPN through CERN to get access to these papers.

Yeah, or like, I would have to email you and ask you to send me the PDF.

You’re a good brother.

Okay, so if someone wants to contribute or help out, what can they do to help you guys there?

I think there are a few ways you can help us out. You can annotate a paper on Fermat's Library, so email us if you want to annotate a paper, exactly!

And spread the word, if you're at a university, or if you have a journal club, yeah.

If you have a research group and you want to annotate papers and share them among your peers: when you create an account on Fermat's Library, you can also upload your own papers!

You have that option, and then you can share them with whoever you want! You can create your own lists. We have people at universities that already use it, be it for classes where students have to read papers and post annotations on Fermat's Library, or within research groups where they all decide to read a paper.

So if you're at a university and you want to use this, it's completely free! You just need to sign up.

Yeah, those are the two main ways that you can help us out. We’re also taking cryptocurrency donations; there’s that!

But really, most of our costs are just server costs, and we don't have to pay salaries to anybody!

So yeah, that’s about it; that’s the way to help us!

Cool! Alright, thanks guys!

Thank you for having us!
