HTTP Programming

Let’s get acquainted with HTTP by writing programs. We’ll even write a little web server. From scratch.

Unit Goals

To become acquainted with HTTP and write some basic programs utilizing the protocol.

Overview

Previously, we created our very own application layer protocols. The date and capitalization application protocols were rather trivial, and our Tic-Tac-Toe protocol (TTTP) was a bit more interesting. Our chat application also had its very own protocol.

But there’s an existing, well-known protocol called HTTP (read about it at Wikipedia) that we can layer applications on top of!

We’ll study HTTP in more detail later, but for now, let’s use it to make an application where clients simply send a message to the server asking for the current datetime. HTTP is a stateless, request-response protocol. For our app, the client will send an HTTP request like:

GET /date

and the HTTP response will be something like:

2019-03-28T04:49:47.952Z

A Simple HTTP Server

The first HTTP server we are going to write will run on port 59999 and feature two endpoints:

GET / for returning an HTML “page” for the application
GET /date for returning the server’s current datetime as plain text.

datewebserver.js

const http = require('http');

http.createServer((request, response) => {
  if (request.method === 'GET' && request.url === '/date') {
    response.writeHead(200, { 'Content-Type': 'text/plain' });
    response.end(new Date().toISOString(), 'utf-8');
  } else if (request.method === 'GET' && request.url === '/') {
    response.writeHead(200, { 'Content-Type': 'text/html' });
    response.end('<h1>The Date Client</h1><a href="/date">Get date from server</a>');
  } else {
    response.writeHead(404, { 'Content-Type': 'text/plain' });
    response.end('Sorry, that’s not there');
  }
}).listen(59999);

console.log('Date Server running at port 59999');

Discussion:

Because HTTP is so common, Node.js has a built in module called http.
HTTP requests begin with a method (such as GET, PUT, DELETE) and a URI (which is an identifier for the resource being created, retrieved, updated, deleted, or otherwise manipulated).
HTTP responses have a response code, such as 200 for OK, 201 for CREATED, 404 for NOT FOUND, 400 for BAD REQUEST, and dozens more.
Both requests and responses have metadata in their headers. One of the most common headers is Content-Type, which we used here to distinguish our plain text responses from our HTML ones.
We use writeHead() to write headers, and write() and/or end() to write the response body.
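
The routing decisions in datewebserver.js can also be pulled out into a plain function, which makes them easy to check without opening a socket. What follows is only a minimal sketch, not part of the original server: the names route and createDateServer, and the shape of the object that route returns, are assumptions made for illustration.

```javascript
const http = require('http');

// Hypothetical helper: maps a method and URL to a status, headers, and body.
// The optional now parameter exists only so the date can be fixed in tests.
function route(method, url, now = new Date()) {
  if (method === 'GET' && url === '/date') {
    return { status: 200, headers: { 'Content-Type': 'text/plain' }, body: now.toISOString() };
  }
  if (method === 'GET' && url === '/') {
    return {
      status: 200,
      headers: { 'Content-Type': 'text/html' },
      body: '<h1>The Date Client</h1><a href="/date">Get date from server</a>',
    };
  }
  return { status: 404, headers: { 'Content-Type': 'text/plain' }, body: 'Sorry, that’s not there' };
}

// Builds (but does not start) a server that delegates every request to route().
function createDateServer() {
  return http.createServer((request, response) => {
    const { status, headers, body } = route(request.method, request.url);
    response.writeHead(status, headers);
    response.end(body);
  });
}

console.log(route('GET', '/whatever').status); // 404
```

Calling createDateServer().listen(59999) would start this version on the same port as the original.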

Run the server:

$ node datewebserver.js
Date Server running at port 59999

HTTP Clients

There are at least four ways to use this server. For illustration, we’ll run the server on localhost. For classwork, we’ll put the clients and servers on different machines.

nc

First, we can just use nc (and if you are taking a networking class, you should do this). With the server running on your local box:

$ nc localhost 59999
GET /date

HTTP/1.1 200 OK
Content-Type: text/plain
Date: Thu, 28 Mar 2019 04:49:47 GMT
Connection: close

2019-03-28T04:49:47.952Z

You have to hit the Enter key TWICE after entering the GET line. That is mandated by the protocol, as we’ll soon see.

That response was pretty large.

That’s right, an HTTP response contains a lot of metadata, which we’ll cover later. And yes, part of the response metadata is the server date, which might make you wonder why anyone would ever write a date server....

What about the other endpoint?

$ nc localhost 59999
GET /

HTTP/1.1 200 OK
Content-Type: text/html
Date: Thu, 28 Mar 2019 05:05:11 GMT
Connection: close

<h1>The Date Client</h1><a href="/date">Get date from server</a>

And what about that 404 thing?

$ nc localhost 59999
GET /whatever

HTTP/1.1 404 Not Found
Content-Type: text/plain
Date: Thu, 28 Mar 2019 05:06:58 GMT
Connection: close

Sorry, that’s not there

By the way, Node’s http module does some pretty good parsing of requests and handles a good deal of the protocol for you. These examples will just give you an idea of what’s in HTTP:

$ nc localhost 59999
bzusdfyiwuef
HTTP/1.1 400 Bad Request

cURL

The second way is to use curl. Use the -i option to see the whole response:

$ curl -i -X GET localhost:59999/date
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Thu, 28 Mar 2019 05:08:42 GMT
Connection: keep-alive
Transfer-Encoding: chunked

2019-03-28T05:08:42.217Z

Without -i you just get the response body:

$ curl -X GET localhost:59999/date
2019-03-28T05:10:15.931Z

Exercise: Try curl on the other endpoint (/) and on non-existent endpoints.

Web Browsers

Okay, yes, the third way is to use a web browser, which speaks HTTP already. With your server running on your local machine, enter http://localhost:59999 into the browser’s address bar, hit Enter, and see the page. Click on the link to hit the other endpoint.

Also, enter the URI of the date endpoint directly: http://localhost:59999/date. That worked great, didn’t it?

Programmatically, from a Server or Command Line

Your favorite programming language should have an http library for making requests. If you are making requests programmatically, you’ll have to examine the response in some detail and take one of several actions depending on the status code. You’ll also have to check and process all the response headers, and read in the response body. The code can get clunky. Here’s how it looks in JavaScript using the raw http library:

datewebclient.js

const http = require('http');

http.get('http://localhost:59999/date', (res) => {
  if (res.statusCode !== 200) {
    console.log(`Received status code ${res.statusCode}`);
    res.resume();
    return;
  }
  let dateString = '';
  res.on('data', (chunk) => { dateString += chunk; });
  res.on('end', () => console.log(dateString));
}).on('error', (e) => {
  console.error(`Request failed: ${e.message}`);
});

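DON’T PANIC!!! If you npm install request then it’s much, much easier. (The request package has been deprecated since 2020. It still installs and runs, so it is fine for seeing the idea, but new code should prefer a maintained alternative.)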

datewebclient-request.js

const request = require('request');

request('http://localhost:59999/date', (error, response, body) => {
  if (error) {
    console.error(error);
  } else if (response.statusCode !== 200) {
    console.log(`Received status code ${response.statusCode}`);
  } else {
    console.log(body);
  }
});

Oh wait, aren’t promises much more awesome than those silly callbacks?

$ npm install request request-promise --save

datewebclient-promise.js

const rp = require('request-promise');

rp('http://localhost:59999/date').then((body) => {
  console.log(body);
}).catch((error) => {
  console.log(error.message || 'Error');
});

Interesting.... request-promise treats both non-success HTTP responses and failures to connect as reasons to reject a promise.
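
The same reject-on-failure behavior can be sketched with fetch, which is built into browsers and into Node.js 18 and later. Unlike request-promise, fetch rejects only on network failures, so the status check has to be written by hand. The helper name getText and its injectable fetchImpl parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: resolves with the body text for 2xx responses and
// rejects for any other status. fetchImpl defaults to the global fetch
// (available in Node.js 18+), and can be swapped out for testing.
async function getText(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Received status code ${res.status}`);
  }
  return res.text();
}

// Usage, assuming the date server from earlier is running locally:
// getText('http://localhost:59999/date')
//   .then((body) => console.log(body))
//   .catch((error) => console.error(error.message));
```

With this wrapper, a 404 and a refused connection both end up in the catch handler, much as they do with request-promise.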

Programmatically, from a Client-Side Script?

Let’s review the different ways we’ve hit the server so far:

Running nc and typing in the HTTP request by hand.
Typing the URIs into a web browser address bar, or (equivalently) clicking on a link in an HTML document. This only worked because we were making GET requests and we did not have any special headers to set.
Running scripts that make requests from the command line, or from another server (but not inside of a browser).

Now what if we wanted to write our own web client, and not use the web page delivered by that web server? In other words: can we write a web app that uses the date web server just for its data, and not use its HTML?

This is what fetch is for. Let’s try:

datewebclient-fetch.html

<html>
  <head>
    <meta charset="utf-8">
    <title>Date Web Client</title>
  </head>
  <body>
    <p>IP Address of Server: <input id="ip" type="text"></p>
    <p><button>Get Current Date</button></p>
    <p id="response"></p>
    <script>
      document.querySelector('button').addEventListener('click', () => {
        const host = document.querySelector('#ip').value;
        const messageArea = document.querySelector('#response');
        fetch(`http://${host}:59999/date`).then((res) => {
          messageArea.textContent = `${res.status} `;
          return res.text();
        }).then((data) => {
          messageArea.textContent += data;
        }).catch((error) => {
          messageArea.textContent = error;
        })
      });
    </script>
  </body>
</html>

Now if we load this file into the browser, enter localhost into the IP box, and hit the button, we get:

Access to fetch at 'http://localhost:59999/date' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

CORS? That’s Cross-Origin Resource Sharing. Basically, web browsers stop scripts from reading resources fetched from origins (scheme + host + port) other than the one that served the page the script is running in. How can you get around this? If you really were writing a web service, one that just delivers data and not web pages, then you can return a response header Access-Control-Allow-Origin, allowing access to certain web clients or to everyone (with the value *).

Classwork: Get into groups of three. One person will run the server. The other two will use the HTML client and try to hit the server. Both should see CORS errors. Then the person writing the server will add
    'Access-control-allow-origin': 'http://ip_address_of_one_of_the_clients'
into the argument to writeHead for the /date endpoint. Now only that one client will be able to hit the server. Next, the server programmer will change that header to
    'Access-control-allow-origin': '*'
and the group should verify that both clients can now hit the server.
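
The header logic from the classwork might be sketched as a small function that builds the headers for the /date endpoint. This is one possible arrangement under stated assumptions, not the required solution: the name dateHeaders, the allowedOrigins parameter, and the example address 192.0.2.10 are all hypothetical.

```javascript
// Hypothetical helper: returns the response headers for the /date endpoint,
// adding Access-Control-Allow-Origin only when the caller's origin is allowed.
// allowedOrigins is either the string '*' or an array of origin strings.
function dateHeaders(requestOrigin, allowedOrigins) {
  const headers = { 'Content-Type': 'text/plain' };
  if (allowedOrigins === '*') {
    headers['Access-Control-Allow-Origin'] = '*';
  } else if (requestOrigin && allowedOrigins.includes(requestOrigin)) {
    headers['Access-Control-Allow-Origin'] = requestOrigin;
    // The response now differs by Origin, so tell caches about it.
    headers['Vary'] = 'Origin';
  }
  return headers;
}

console.log(dateHeaders('http://192.0.2.10', ['http://192.0.2.10']));
```

Inside the server, the call would look something like response.writeHead(200, dateHeaders(request.headers.origin, allowed)), where allowed is whatever list the server programmer chooses.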

The other way to “get around” CORS is to just make these kinds of calls from other servers. It’s only the call from the browser that is impacted by the same-origin policy.

HTTP Requests and Responses

Let’s look at HTTP, the protocol, in a little more detail.

Formats

HTTP Request format:

Method SP RequestURI SP HTTPVersion CR LF

ZERO OR MORE REQUEST HEADERS

CR LF

OPTIONAL MESSAGE BODY

Example:

PUT https://ehr.example.com/types/Lab.Diagnosis HTTP/1.1
authorization: Bearer hbGciOiJIUz8I1NiJ9JzdWIiOiJh
content-type: application/json
Origin: https://e1ak8fcq37hprs.cloudfront.net
Referer: https://e1ak8fcq37hprs.cloudfront.net/
User-Agent: Mozilla/5.0 AppleWebKit/537.36 Chrome/72.0.3626.96

{"display":"Lab Diagnosis","schema":{"code":"string"}}
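
To make the request layout concrete, here is a rough sketch of a function that serializes a request in that format. The empty line that ends the headers is the reason for hitting Enter twice in nc. The name formatRequest and its parameters are assumptions made for illustration; real clients do considerably more (validation, Content-Length, and so on).

```javascript
// Hypothetical helper: serializes a request as
//   Method SP RequestURI SP HTTPVersion CRLF, headers, CRLF, optional body.
function formatRequest(method, uri, headers = {}, body = '') {
  const CRLF = '\r\n';
  const headerLines = Object.entries(headers)
    .map(([name, value]) => `${name}: ${value}${CRLF}`)
    .join('');
  return `${method} ${uri} HTTP/1.1${CRLF}${headerLines}${CRLF}${body}`;
}

const text = formatRequest('GET', '/date', { Host: 'localhost:59999' });
console.log(JSON.stringify(text));
// "GET /date HTTP/1.1\r\nHost: localhost:59999\r\n\r\n"
```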

HTTP Response format:

HTTPVersion SP StatusCode SP ReasonPhrase CR LF

ZERO OR MORE RESPONSE HEADERS

CR LF

OPTIONAL MESSAGE BODY

Example:

HTTP/1.1 200 OK
access-control-allow-headers: Content-Type,Authorization
access-control-allow-methods: GET,PUT,POST,DELETE,PATCH,HEAD,OPTIONS
access-control-allow-origin: *
content-encoding: gzip
content-length: 94
content-type: application/json
date: Thu, 28 Mar 2019 16:10:20 GMT
status: 200
via: 1.1 9f6b9465776576cba700d600678836e.cloudfront.net (CloudFront)
x-amz-apigw-id: XQq8-Edb8625FbxA=
x-amzn-remapped-content-length: 92
x-amzn-requestid: fc22c7c7-5173-11e9-b44d-b393bfa59355
x-cache: Miss from cloudfront

{
  "post_type": {
    "display": "Lab Diagnosis",
    "name": "Lab.Diagnosis",
    "schema": {
      "code":"string"
    }
  }
}
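
Going the other way, the response layout can be taken apart with a few string operations. The following is a simplified sketch that assumes a complete, well-formed response held in a single string; it does no error handling and ignores things like chunked transfer encoding. The name parseResponse is hypothetical.

```javascript
// Hypothetical helper: splits raw response text into its parts.
function parseResponse(raw) {
  // The first empty line (CRLF CRLF) separates the head from the body.
  const separator = raw.indexOf('\r\n\r\n');
  const head = separator === -1 ? raw : raw.slice(0, separator);
  const body = separator === -1 ? '' : raw.slice(separator + 4);

  const [statusLine, ...headerLines] = head.split('\r\n');
  // Status line: HTTPVersion SP StatusCode SP ReasonPhrase
  const [, version, code, reason] = statusLine.match(/^(\S+) (\d{3}) ?(.*)$/);

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    // Header names are case-insensitive, so store them in lower case.
    headers[line.slice(0, colon).toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { version, statusCode: Number(code), reason, headers, body };
}

const sample = 'HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\n\r\nSorry';
console.log(parseResponse(sample).statusCode); // 404
```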

More details at MDN. Also you should check out the RFCs.

Request Methods

HTTP was designed around the idea of an infinite number of types of resources (nouns) that are manipulated with a very small number of methods (verbs). The methods are:

Method	Brief Summary
GET	Requests a representation of the target resource
HEAD	Identical to GET except that the server must not send a message body in the response
POST	Requests that the target resource process the representation enclosed in the request according to the resource’s own specific semantics
PUT	Requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload
DELETE	Requests that the origin server remove the association between the target resource and its current functionality
PATCH	Requests that a set of changes described in the request entity be applied to the resource identified by the Request-URI. The set of changes is represented in a format called a “patch document” identified by a media type
CONNECT	Requests that the recipient establish a tunnel to the destination origin server identified by the request-target and, if successful, thereafter restrict its behavior to blind forwarding of packets, in both directions, until the tunnel is closed
OPTIONS	Requests information about the communication options available for the target resource, at either the origin server or an intervening intermediary
TRACE	Requests a remote, application-level loop-back of the request message
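
As a small illustration of methods in practice, a server can check the method against what each resource supports and answer 405 Method Not Allowed, with an Allow header listing the supported methods, when there is no match. The table of paths and the name checkMethod below are assumptions for this sketch and are not part of the date server shown earlier.

```javascript
// Hypothetical table of which methods each path supports.
const allowedMethods = new Map([
  ['/', ['GET', 'HEAD']],
  ['/date', ['GET', 'HEAD']],
]);

// Returns the status code (plus an Allow header for 405) for a request.
function checkMethod(method, path) {
  const allowed = allowedMethods.get(path);
  if (!allowed) {
    return { status: 404, headers: {} };
  }
  if (!allowed.includes(method)) {
    return { status: 405, headers: { Allow: allowed.join(', ') } };
  }
  return { status: 200, headers: {} };
}

console.log(checkMethod('DELETE', '/date').status); // 405
```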

Exercise: Read, and study, the official descriptions of every method from the RFC itself. You can also find summaries at Wikipedia or MDN or the W3C, but really you should know where to find the excruciatingly complete details.

Headers

Both requests and responses can be packed with metadata which are called headers. Here are some of the common ones:

Request Headers	A-IM Accept Accept-Charset Accept-Encoding Accept-Language Accept-Datetime Access-Control-Request-Method Access-Control-Request-Headers Authorization Cache-Control Connection Content-Length Content-Type Cookie Date Expect Forwarded From Host If-Match If-Modified-Since If-None-Match If-Range If-Unmodified-Since Max-Forwards Origin Pragma Proxy-Authorization Range Referer TE User-Agent Upgrade Via Warning
Response Headers	Accept-Patch Accept-Ranges Age Allow Alt-Svc Cache-Control Connection Content-Disposition Content-Encoding Content-Language Content-Length Content-Location Content-Range Content-Type Date Delta-Base ETag Expires IM Last-Modified Link Location Pragma Proxy-Authenticate Public-Key-Pins Retry-After Server Set-Cookie Strict-Transport-Security Trailer Transfer-Encoding Tk Upgrade Vary Via Warning WWW-Authenticate

Check out the list of registered headers at IANA. Note that you can always add your own headers for your own organization or application. Custom headers were traditionally given names starting with x-, though RFC 6648 has deprecated that convention for new headers.

Media Types

Information is generated, transmitted, and stored as bit sequences. A media type describes the way in which bits are to be interpreted. Use media types as the values of the Content-Type and other headers. Examples:

text/html
image/png
audio/mp4
video/H264
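
A server that delivers files typically picks the Content-Type value from the file extension. Here is a minimal sketch of that idea; the lookup table is deliberately tiny, and the name contentTypeFor is an assumption made for illustration.

```javascript
const path = require('path');

// Hypothetical lookup table; a real server would cover many more types.
const mediaTypes = {
  '.html': 'text/html',
  '.txt': 'text/plain',
  '.json': 'application/json',
  '.png': 'image/png',
};

// Returns the Content-Type value for a file name, falling back to
// application/octet-stream (arbitrary binary data) for unknown extensions.
function contentTypeFor(fileName) {
  const extension = path.extname(fileName).toLowerCase();
  return mediaTypes[extension] || 'application/octet-stream';
}

console.log(contentTypeFor('index.html')); // text/html
console.log(contentTypeFor('logo.PNG'));   // image/png
```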

More notes here.

Response Status Codes

Every HTTP response carries a three-digit status code on its first line. Most are listed below. For a nice summary, see Wikipedia; for the official documentation, see RFC 7231, Section 6.

1xx : INFORMATIONAL
100	Continue
101	Switching Protocols
102	Processing
103	Early Hints
2xx : SUCCESS
200	OK
201	Created (usually you should set the Location header for this)
202	Accepted (used for asynch requests)
203	Non-Authoritative Information
204	No Content
205	Reset Content
206	Partial Content
207	Multi-Status
208	Already Reported
226	IM Used
3xx : REDIRECT
300	Multiple Choices
301	Moved Permanently
302	Found (SUPERSEDED BY 303 AND 307)
303	See Other
304	Not Modified
305	Use Proxy
307	Temporary Redirect
308	Permanent Redirect
4xx : CLIENT ERROR
400	Bad Request
401	Unauthorized
402	Payment Required
403	Forbidden
404	Not Found
405	Method Not Allowed (service doesn't support the requested method at that URI)
406	Not Acceptable (server can't give back a representation in a requested format)
407	Proxy Authentication Required
408	Request Timeout
409	Conflict
410	Gone
411	Length Required
412	Precondition Failed
413	Payload Too Large
414	URI Too Long
415	Unsupported Media Type (server can't process the request body)
416	Range Not Satisfiable
417	Expectation Failed
418	I'm a Teapot
421	Misdirected Request
422	Unprocessable Entity
423	Locked
424	Failed Dependency
425	Too Early
426	Upgrade Required
428	Precondition Required
429	Too Many Requests
431	Request Header Fields Too Large
451	Unavailable For Legal Reasons
5xx : SERVER ERROR
500	Internal Server Error
501	Not Implemented
502	Bad Gateway
503	Service Unavailable
504	Gateway Timeout
505	HTTP Version Not Supported
506	Variant Also Negotiates
507	Insufficient Storage
508	Loop Detected
509	Bandwidth Limit Exceeded
510	Not Extended
511	Network Authentication Required
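
Node’s http module exposes the standard reason phrases as http.STATUS_CODES, and the class of a code is just its first digit. The short sketch below combines the two; the helper name statusClass is an assumption, and the class names are taken from the list above.

```javascript
const http = require('http');

// Maps a status code to the class names used in the list above.
function statusClass(code) {
  const classes = {
    1: 'INFORMATIONAL',
    2: 'SUCCESS',
    3: 'REDIRECT',
    4: 'CLIENT ERROR',
    5: 'SERVER ERROR',
  };
  return classes[Math.floor(code / 100)] || 'UNKNOWN';
}

console.log(statusClass(404), '-', http.STATUS_CODES[404]); // CLIENT ERROR - Not Found
console.log(statusClass(201), '-', http.STATUS_CODES[201]); // SUCCESS - Created
```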

HTTP in Practice

Did you notice that our simple webserver had two endpoints, one returning an HTML page and the other just returning raw data? That’s great. HTTP is designed to accept and return resources, which can be raw data, text, HTML, images, videos, whatever.

Some HTTP-based servers are centered around delivering HTML and related content. These are often called webapps. Some are all about exchanging pure data, usually in JSON (or XML, which is still around). These are often called web services or web APIs. If you follow certain principles in your web API architecture, you might get to call your API a REST API.

HTTP Frameworks

In practice, you’d never write an application using the http module directly. You would use a framework written by someone else. That someone else probably built their framework using http directly so you don’t have to.

In no particular order, here are some frameworks that I happen to know about. They are offered without any endorsement:

Language	Frameworks
JavaScript	Express Koa Hapi Ember Angular React Vue
Python	Django Pyramid Twisted Tornado Flask Sanic
Ruby	Rails EventMachine Sinatra Padrino Hanami Cuba Goliath Scorched
Go	Buffalo Iris Gin Martini Revel Gorilla Echo Web.go Goji Beego
Java	SpringMVC Play Grails Wicket Vert.X
Scala	Lift Play Finch Akka Chaos Scalatra BlueEyes
Rust	Warp Thruster Rustful Rustless Tide Nickel Pencil Rocket Canteen Gotham

Later in the course we’ll learn how to build some interesting web applications, using one or more of these frameworks. Remember that HTTP is just a request-response protocol; to build a complete web app there’s tons to learn about HTML, CSS, and JavaScript (for the front-end), and data stores and similar things (for the back-end). Much more to come.

Limitations of HTTP

HTTP is great for requests and responses. But what about applications like interactive games or chats, where the server has to notify the client at any time (not just as a response)? Here the request-response model gets in the way, and working around it adds a lot of protocol overhead.

Exercise: Consider writing the Tic Tac Toe application we saw earlier to run in a browser over HTTP. A client can POST a move and receive a response (not your turn, square already taken, move accepted), but how will the client know when the other player moved? Should the client make periodic requests to the server to see if the other player moved? Why or why not?
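
One way to explore the exercise is to write the polling loop and count what it costs. The sketch below is an assumption-laden illustration, not a recommended design: pollForChange, fetchState, and the default interval are all hypothetical, and fetchState stands in for whatever HTTP GET would return the current game state.

```javascript
// Hypothetical helper: repeatedly asks for the game state until it differs
// from lastSeen or maxAttempts is reached. fetchState is any async function
// that returns the current state (for example, a wrapper around an HTTP GET).
async function pollForChange(fetchState, lastSeen, intervalMs = 1000, maxAttempts = 10) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const state = await fetchState();
    if (state !== lastSeen) {
      return { state, attempts: attempt };
    }
    // Nothing changed: wait, then spend another full request to ask again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return { state: lastSeen, attempts: maxAttempts };
}
```

Every attempt is a complete request and response, headers included, even when the answer is "nothing has changed." A short interval wastes requests and a long one delays the news of the other player’s move.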

For games and notifications, WebSockets are almost always better!

Appendix: Node’s http Module

Because HTTP is so massively popular, Node.js comes with a built-in module called http to help you write HTTP servers! How convenient! Here are some of the module highlights:

Values	METHODS STATUS_CODES globalAgent maxHeaderSize
Functions	createServer() request() get()
class Agent
Properties	freeSockets maxFreeSockets maxSockets requests sockets
Methods	createConnection() keepSocketAlive() reuseSocket() destroy() getName()
class Server
Events	checkContinue checkExpectation clientError close connect connection request upgrade
Properties	headersTimeout listening maxHeadersCount timeout keepAliveTimeout
Methods	close() listen() setTimeout()
class IncomingMessage
Events	aborted close
Properties	aborted complete headers httpVersion method rawHeaders rawTrailers socket statusCode statusMessage trailers url
Methods	destroy() setTimeout()
class ClientRequest
Events	abort connect continue information response socket timeout upgrade
Properties	aborted connection finished maxHeadersCount path socket
Methods	abort() end() flushHeaders() getHeader() removeHeader() setHeader() setNoDelay()
class ServerResponse
Events	close finish
Properties	connection finished headersSent sendDate socket statusCode statusMessage
Methods	addTrailers() end() getHeader() getHeaders() getHeaderNames() hasHeader() removeHeader() setHeader() setTimeout() write() writeContinue() writeHead() writeProcessing()

Summary

We’ve covered:

Writing a trivial HTTP server in Node
A little overview of what’s in Node’s http module
Different ways to hit an HTTP server (nc, curl, browser, and programmatically)
Basics of HTTP requests and responses
What an HTTP framework is
One limitation of HTTP