HTTP and HTML | Internet 101 | Computer Science | Khan Academy
I'm Jasmine Lawrence, and I'm a program manager on the Xbox One engineering team. One of our biggest features is called Xbox Live. It's an online service that connects gamers from all around the world, and we rely on the internet to make that happen. This is no easy task, and there are a lot of things happening behind the scenes. The internet is totally changing how people interact and connect.
But how does it work? How do the computers all across the world actually communicate with each other? Let's look at web browsing first. You open a web browser; it's the app you use to access the web pages. Next, you type in the web address or URL, which stands for Uniform Resource Locator, of the website you want to visit, like tumblr.com.
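As a rough illustration of what a URL contains, the sketch below splits an address into the pieces a browser needs before it can ask for anything. The helper name `describe_url` and the example address are assumptions made for this illustration, not something from the video.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts a browser uses to locate a resource."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,     # which protocol to speak, e.g. "https"
        "host": parts.hostname,     # which server to contact
        "path": parts.path or "/",  # which document to ask for ("/" if none given)
    }


info = describe_url("https://www.tumblr.com/login")
print(info)
```

The host tells the browser which server to talk to, and the path is the name of the document it will request from that server.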
[Music]
Hi, I'm David Karp, the founder of Tumblr, and we're here today to talk about how those web browsers we use every day actually work. So, you've probably wondered what actually happens when you type an address into your web browser and then hit enter, and it really is about as crazy as you can imagine.
So in that moment, your computer starts talking to another computer called a server that's usually thousands of miles away, and in milliseconds, your computer asks that server for a website. That server starts to talk back to your computer in a language called HTTP, and HTTP stands for Hypertext Transfer Protocol. You can kind of think of it as the language that one computer uses to ask another computer for a document.
It's actually pretty straightforward. If you were to intercept the conversation between your computer and a web server on the internet, it's mainly made up of something called GET requests. Those are really very simply the word "GET" and then the name of the document that you're requesting. So if you're trying to log into Tumblr and load our login page, all you're doing is sending a GET request to Tumblr's server that says "GET /login."
That tells Tumblr's server that you want all of the HTML code for the Tumblr login page. HTML stands for Hypertext Markup Language, and you can think of that as the language that you use to tell a web browser how to make a page look.
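To make the GET request concrete, here is a minimal sketch of the text a browser might send for the login page. It assumes HTTP/1.1 and shows only the `Host` and `Connection` headers; a real browser sends more, and `build_get_request` is a name invented for this example.

```python
def build_get_request(host: str, path: str) -> str:
    """Build the text of a simple HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # the word GET, then the document being requested
        f"Host: {host}",         # which website on that server we want
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


request_text = build_get_request("www.tumblr.com", "/login")
print(request_text)
```

The first line is the part described above: the word "GET" followed by the name of the document.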
So if you think about something like Wikipedia, which is really just a big simple document, HTML is the language that you use to make that title big and bold, to make the font the right font, to link certain text to certain other pages, to make some text bold, to make some text italic, to put an image in the middle of the page, to align the image to the right, to align the image to the left.
The text of a web page is included directly in the HTML, but other parts, like images or videos, are separate files with their own URLs that need to be requested. The browser sends a separate HTTP request for each of these and displays them as they arrive. If a web page has a lot of different images, each of them causes a separate HTTP request, and the page loads more slowly.
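The following sketch shows, under simplifying assumptions, how a program could find the extra files a page refers to. The sample page, the `example.com` addresses, and the `ResourceFinder` class are all made up for illustration; a real browser also handles stylesheets, scripts, and much more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the URLs of images and videos referenced by an HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # Each <img> or <video> tag with a src attribute points at a separate file.
        if tag in ("img", "video"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative addresses against the page's own URL.
                self.resources.append(urljoin(self.base_url, src))


# A made-up page: the text and formatting live in the HTML, the images do not.
PAGE = (
    '<h1>Example</h1><p>Some <b>bold</b> text and a <a href="/about">link</a>.</p>'
    '<img src="/images/cat.png"><img src="https://example.org/dog.jpg">'
)

finder = ResourceFinder("https://example.com/index.html")
finder.feed(PAGE)
print(finder.resources)
# ['https://example.com/images/cat.png', 'https://example.org/dog.jpg']
```

Each URL in the resulting list would need its own HTTP request, which is why image-heavy pages take longer to finish loading.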
Now, sometimes when you browse the web, you're not just requesting pages with GET requests. Sometimes you send information, like when you fill out a form or type a search query. Your browser sends this information in plain text to the web server using an HTTP POST request.
So let's say you log in to Tumblr. Well, the first thing you do is you make a POST request; that is a POST to Tumblr's login page that has some data attached to it. It has your email address; it has your password that goes to Tumblr's server. Tumblr's server figures out that, okay, you're David. It sends a web page back to your browser that says "success, logged in as David."
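A login POST might look roughly like the sketch below. The field names `email` and `password`, the sample values, and the `/login` path are assumptions for illustration; the actual form fields a site uses may differ.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, fields: dict) -> str:
    """Build the text of a simple form-style HTTP POST request (illustrative only)."""
    body = urlencode(fields)  # e.g. "email=...&password=..."
    head = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('utf-8'))}",  # size of the attached data
    ]
    # Headers, then a blank line, then the data being sent.
    return "\r\n".join(head) + "\r\n\r\n" + body


post_text = build_post_request(
    "www.tumblr.com",
    "/login",
    {"email": "david@example.com", "password": "hunter2"},  # hypothetical values
)
print(post_text)
```

Notice that the password appears in the request body as ordinary readable text, which is why the secure channel discussed later matters.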
But along with that web page, it also attaches a little bit of invisible cookie data that your browser sees and knows to save. And that's really important because it's really the only way that a website can remember who you are.
All that cookie data really is, is an ID card for Tumblr. It's a number that identifies you as David, and your web browser holds on to that number. Then, the next time you refresh Tumblr, the next time you go to tumblr.com, your web browser knows to automatically attach that ID number with the request it sends over to Tumblr's server.
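The cookie exchange can be sketched with Python's standard `http.cookies` module. The cookie name `session_id` and its value are invented for this example; real sites choose their own names and use long, hard-to-guess values.

```python
from http.cookies import SimpleCookie


def cookie_header_from_response(set_cookie_value: str) -> str:
    """Turn a server's Set-Cookie value into the Cookie header a browser sends back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)  # the browser saves what the server sent
    # On later requests, only the name=value pairs are sent back to the server.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical value the server might attach to the "logged in" response.
cookie_header = cookie_header_from_response("session_id=12345; Path=/")
print("Cookie:", cookie_header)
# Cookie: session_id=12345
```

The browser stores the ID once and then attaches it to every later request to the same site, so the server can recognize who is asking.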
So now, Tumblr's server sees the request coming from your browser, sees the ID number, and knows, okay, this is a request from David.
Now, the internet is completely open; all of its connections are shared, and information is sent in plain text. This makes it possible for hackers to snoop on any personal information that you send over the internet.
But safe websites prevent this by asking your web browser to communicate on a secure channel using something called Secure Sockets Layer and its successor, Transport Layer Security. You can think of SSL and TLS as a layer of security wrapped around your communications to protect them from snooping or tampering.
SSL and TLS are active when you see the little lock that appears in your browser address bar next to the "https." The HTTPS protocol ensures that your HTTP requests are secure and protected. When a website asks your browser to engage in a secure connection, it first provides a digital certificate, which is like an official ID card proving that it's the website it claims to be.
Digital certificates are published by certificate authorities, which are trusted entities that verify the identities of websites and issue certificates for them, just like a government can issue IDs or passports. Now, if a website tries to start a secure connection without a properly issued digital certificate, your browser will warn you.
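As a hedged sketch of the certificate check, Python's standard `ssl` module can create a context that refuses connections to servers without a valid certificate from a trusted authority. The function names below are invented for this example, and `fetch_certificate` is defined but not called here because it needs a live network connection.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Create a TLS context that checks certificates against trusted authorities."""
    return ssl.create_default_context()


def fetch_certificate(host: str, port: int = 443) -> dict:
    """Open a TLS connection and return the server's verified certificate details.

    Raises ssl.SSLCertVerificationError if the certificate is not properly
    issued for this host, which is the situation a browser warns about.
    """
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.getpeercert()


context = make_verifying_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # a certificate is mandatory
print(context.check_hostname)                    # and it must match the site's name
```

With these defaults, the connection is set up only if the certificate chain leads back to an authority the system already trusts and the name on the certificate matches the site.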
That's the basics of web browsing, the part of the internet we see day-to-day. To summarize, HTTP and DNS manage the sending and receiving of HTML, media files, or anything on the web. What makes this possible under the hood are TCP/IP and router networks that break down and transfer information in small packets.
Those packets themselves are made up of binary sequences of ones and zeros that are physically sent through electric wires, fiber optic cables, and wireless networks. Fortunately, once you've learned how one layer of the internet works, you can rely on it without remembering all the details, and we can trust that all those layers will work together to successfully deliver information at scale and with reliability.
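To picture packets and binary, the toy sketch below cuts a message into small chunks and prints the ones and zeros for a couple of bytes. It is a deliberate simplification: real packets are much larger and carry addressing headers, and the 4-byte size and function names are assumptions chosen to keep the output short.

```python
from typing import List


def split_into_packets(data: bytes, size: int) -> List[bytes]:
    """Cut a message into chunks of at most `size` bytes (no headers, toy example)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def to_bits(chunk: bytes) -> str:
    """Show each byte of a chunk as eight ones and zeros."""
    return " ".join(f"{byte:08b}" for byte in chunk)


message = b"GET /login"
packets = split_into_packets(message, 4)
print(packets)         # [b'GET ', b'/log', b'in']
print(to_bits(b"GE"))  # 01000111 01000101

# The receiver puts the chunks back together to recover the original message.
reassembled = b"".join(packets)
print(reassembled == message)  # True
```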
[Music]