How the World Wide Web Works: HTTP as a Communication Channel

This video explains the structure of the WWW and how HTTP acts as the communication channel between users and the web.
Welcome to the third and last part of the World Wide Web Fundamentals session. In this last part, we will discuss what happens behind the scenes from the moment you type a website address into your browser’s address bar until the content gets rendered on the screen. We will focus on the World Wide Web architecture: clients, servers, and the workflow. We will then look at the HTTP protocol in detail and close the session with a brief overview of core web technologies.
Using the World Wide Web requires three major components. First, clients: web browsers, but also smartphones, wearables, cars and, as the Internet of Things advances, any connected device you can come up with. Second, servers: a bunch of remote computers running different pieces of software that together are responsible for listening to and satisfying client requests. Third, the communication channel where the client-server conversation happens. Let’s assume our client is a web browser. We type the address, and then what? Well, first the browser needs to translate the website address into an IP address. The IP address is required to open a communication channel between the client and the right server. You can think of this as a yellow pages search. This translation is done by querying special servers called DNS (Domain Name System) servers, or name servers.
There are plenty of them, but no single one knows how to translate every website address. When a server does not know the answer, it asks another DNS server until it gets a response, which is in turn returned to our client. After that, our client is ready to dial the right server to request the page you’ve typed into the browser address bar. All this happens within a few milliseconds and behind the scenes. But there’s a lot more. Only after the server accepts the client’s call do the client and server start talking HTTP to each other. The client says, “I want the homepage”. The server reads this message and does whatever it needs to return the page to the client. Meanwhile, the client is waiting.
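The address translation step described above can be tried from a few lines of Python. This is only a minimal sketch: it asks the operating system's resolver, which does the DNS querying on our behalf, and the helper name `resolve` is made up for this illustration.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the distinct IP addresses the system resolver reports for hostname."""
    # getaddrinfo hands the question to the operating system's resolver,
    # which queries DNS servers (or local sources such as a hosts file).
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4, (ip, port, ...) for IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling `resolve` with a real hostname needs network access and returns whatever addresses that site currently publishes. Passing something that is already an IP address, such as `"127.0.0.1"`, simply returns it unchanged.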
When the server is ready to return CodeRed’s homepage, it sends the response saying, “Everything was okay with your request. HTML below”. To get CodeRed’s complete homepage, several requests and responses may be required. Whether other resources are required is a decision made by the client while processing the first and subsequent responses. The HTTP protocol is like a grammar, establishing how words should be composed to form meaningful sentences. It establishes two main sections for both requests and responses: headers and body, separated by an empty line. In this case, the body is empty, meaning that the client has no content to send to the server. GET is the HTTP request method, also known as the verb.
There are several methods with different meanings. When the client has information to send to the server, for instance to submit a form, the web browser uses the POST method. In the link below, you’ll find a list of HTTP methods and what they are intended for. The forward slash means the home page, or the server’s main resource. If you’re looking for the CodeRed login page, then this becomes /account/login. As you may expect, HTTP has evolved since it was first proposed by Sir Tim Berners-Lee back in 1990. 1.1 is the protocol version of this message. The first HTTP version was 0.9, then came 1.0 and 1.1, and now there’s HTTP/2.
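To make the three pieces of that first line concrete (method, resource, protocol version), here is a small sketch that splits a request line apart. The names `RequestLine` and `parse_request_line` are invented for this example, and it assumes a well-formed HTTP/1.x line with single spaces between the parts.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str   # the verb, e.g. GET or POST
    target: str   # the requested resource, e.g. / or /account/login
    version: str  # the protocol version, e.g. 1.1


def parse_request_line(line: str) -> RequestLine:
    """Split a line such as 'GET / HTTP/1.1' into method, target, and version."""
    method, target, version = line.strip().split(" ")
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP version: {version!r}")
    # Keep only the number after the 'HTTP/' prefix.
    return RequestLine(method, target, version[len("HTTP/"):])
```

For instance, `parse_request_line("GET /account/login HTTP/1.1")` gives the method `GET`, the target `/account/login`, and the version `1.1`.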
I recommend you research the differences, especially those introduced by HTTP/2. Nowadays, a single server can host hundreds or even thousands of websites, so the client has to explicitly say which website it wants the page from. This is what the Host header is for. There are several standard headers, and all of them follow the Host header format: the header name, followed by a colon, and then the header value. The first line, containing the HTTP request method, is an exception. HTTP responses have the same format as requests. They also have headers followed by an empty line and then the content, if any. The first line in the response is special, as in requests, and it is called the status line. It includes the protocol version, HTTP/1.1, the status code, 200, and the corresponding message, “OK”. Again, there’s a list of standard status codes and messages that you should know about, and you’ll find them in the link below. The Content-Type header is used to indicate the media type of the body. In this case, the server is returning HTML.
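Since requests and responses share the same layout, one small parser can take either apart. The sketch below is illustrative only: the function names and the sample messages (including the host `www.example.com`) are made up here, and it ignores details a real client must handle, such as repeated headers or a status line without a message.

```python
def parse_message(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a raw HTTP/1.x message into first line, headers, and body."""
    # Headers and body are separated by an empty line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    first_line, *header_lines = head.split("\r\n")
    headers: dict[str, str] = {}
    for line in header_lines:
        # Each header is "Name: value"; names are case-insensitive,
        # so they are stored in lowercase here.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return first_line, headers, body


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    version, code, message = line.split(" ", 2)
    return version, int(code), message


# Hypothetical sample messages, written out by hand for this sketch.
SAMPLE_REQUEST = "GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html></html>"
)
```

Parsing `SAMPLE_REQUEST` yields an empty body, matching the GET case discussed earlier, while parsing `SAMPLE_RESPONSE` yields the status line, a `content-type` of `text/html`, and the HTML as the body.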
But there are several other formats, like JPEG or PNG for graphics, MP3 for audio, or even JSON and XML for data exchange. This header provides some guidance to the client so that it knows how it should handle the response. For the sake of completeness, let’s talk about POST requests. As I said before, the HTTP POST method is used when the client wants to send content to the server.
It has the exact same structure we discussed before: first the headers, followed by an empty line, and then the request body. Let’s skip the request line and the Host header, since we have already discussed them in great detail. The Content-Type request header tells the server what type of data is actually being sent. In this case, we’re sending data in the application/x-www-form-urlencoded format. Looking at the request body, we can understand how this format works: the variable name, user, followed by an equals sign, and then the variable value, John. To send multiple variables, we repeat this pattern and concatenate everything together using an ampersand as a separator. When a variable value has special characters, the value has to be URL encoded.
That is exactly what happened with the password variable’s value. %26 corresponds to an ampersand character, which, as we saw before, has a special meaning as the variable separator. Please note that encoding is different from encryption. HTTP itself does not encrypt your passwords when they are sent to the server as part of a standard POST request, and that remains true of the HTTP message even when you’re using HTTPS. HTTPS is in fact HTTP over TLS or SSL. The word “over” is very important: what is encrypted is the communication channel. The protocol remains the same plain HTTP described so far. The Content-Length header indicates the size of the entity body in bytes. We should have 30 characters in the request body.
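The form encoding and the Content-Length calculation can be reproduced with Python's standard library. This is a sketch under stated assumptions: `build_form_post` is a name made up here, and the field values and host in the usage note are invented, so the resulting body length is not the 30 characters from the example in the video.

```python
from urllib.parse import parse_qs, urlencode


def build_form_post(path: str, host: str, fields: dict[str, str]) -> bytes:
    """Assemble a raw HTTP/1.1 POST request carrying URL-encoded form fields."""
    # urlencode joins name=value pairs with '&' and percent-encodes
    # special characters inside names and values.
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # size of the body in bytes
        "\r\n"  # the empty line that separates headers from body
    )
    return head.encode("ascii") + body
```

With the invented fields `{"user": "John", "pass": "a&b"}`, the body becomes `user=John&pass=a%26b`, which is 20 bytes long. Anyone can reverse it with `parse_qs`, no key required, which is exactly why encoding must not be mistaken for encryption.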
There’s a lot more about HTTP you should know, but it is out of scope here. What we have discussed so far is a good starting point to help you understand most of the web application vulnerabilities out there. I strongly recommend you read more about the HTTP protocol. Let’s now briefly discuss the core technologies of the World Wide Web.

In this video, you will learn about the structure of HTTP. Understanding how data moves from you to the web, and how HTTP facilitates this, will help you understand the vulnerabilities in the WWW structure.

This article is from the free online

Advanced Cyber Security Training: OWASP Top 10 and Web Application Fundamentals

Created by
FutureLearn - Learning For Life
