HTTP Programming

Let’s get acquainted with HTTP by writing programs. We’ll even write a little web server. From scratch.

Unit Goals

To become acquainted with HTTP and write some basic programs utilizing the protocol.

Overview

Previously, we created our very own application layer protocols. The date and capitalization application protocols were rather trivial, and our Tic-Tac-Toe protocol (TTTP) was a bit more interesting. Our chat application also had its very own protocol.

But there’s an existing, well-known protocol called HTTP (read about it at Wikipedia) that we can layer applications on top of!

We’ll study HTTP in more detail later, but for now, let’s use it to make an application where clients simply send a message to the server asking for the current datetime. HTTP is a stateless, request-response protocol. For our app, the client will send an HTTP request like:

GET /date

and the HTTP response will be something like:

2019-03-28T04:49:47.952Z

A Simple HTTP Server

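The first HTTP server we are going to write will run on port 59999 and feature two endpoints: GET /date, which returns the current datetime as plain text, and GET /, which returns a small HTML page linking to /date.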

datewebserver.js
const http = require('http');

http.createServer((request, response) => {
  if (request.method === 'GET' && request.url === '/date') {
    response.writeHead(200, { 'Content-Type': 'text/plain' });
    response.end(new Date().toISOString(), 'utf-8');
  } else if (request.method === 'GET' && request.url === '/') {
    response.writeHead(200, { 'Content-Type': 'text/html' });
    response.end('<h1>The Date Client</h1><a href="/date">Get date from server</a>');
  } else {
    response.writeHead(404, { 'Content-Type': 'text/plain' });
    response.end('Sorry, that’s not there');
  }
}).listen(59999);

console.log('Date Server running at port 59999');

Discussion:

Run the server:

$ node datewebserver.js
Date Server running at port 59999

HTTP Clients

There are at least four ways to use this server. For illustration, we’ll run the server on localhost. For classwork, we’ll put the clients and servers on different machines.

nc

First, we can just use nc (and if you are taking a networking class, you should do this). With the server running on your local box:

$ nc localhost 59999
GET /date

HTTP/1.1 200 OK
Content-Type: text/plain
Date: Thu, 28 Mar 2019 04:49:47 GMT
Connection: close

2019-03-28T04:49:47.952Z

You have to hit the Enter key TWICE after entering the GET line. That is mandated by the protocol, as we’ll soon see.
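
To see why the second Enter matters, here is a rough sketch of what nc is doing for you, written with Node's net module. The names buildRequest and rawGet are made up for this illustration, and the request uses the full HTTP/1.1 form with a Host header rather than the bare GET line typed above. The empty line at the end is how the server knows the headers are finished.

```javascript
const net = require('net');

// Build the text of a minimal HTTP/1.1 request. Lines end with CR LF, and an
// empty line (the second "Enter") marks the end of the header section.
function buildRequest(method, path, host) {
  return [
    `${method} ${path} HTTP/1.1`,
    `Host: ${host}`,
    'Connection: close', // ask the server to close the socket when done
    '',                  // the blank line that ends the headers
    '',
  ].join('\r\n');
}

// Hypothetical helper: send the request over a plain TCP socket and print
// whatever bytes come back, headers and all.
function rawGet(host, port, path) {
  const socket = net.connect(port, host, () => {
    socket.write(buildRequest('GET', path, host));
  });
  socket.setEncoding('utf-8');
  socket.on('data', (chunk) => process.stdout.write(chunk));
  socket.on('error', (e) => console.error(`Connection failed: ${e.message}`));
}

// With the date server running: rawGet('localhost', 59999, '/date');
```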

That response was pretty large.

That’s right, an HTTP response contains a lot of metadata, which we’ll cover later. And yes, part of the response metadata is the server date, which might make you wonder why anyone would ever write a date server....

What about the other endpoint?

$ nc localhost 59999
GET /

HTTP/1.1 200 OK
Content-Type: text/html
Date: Thu, 28 Mar 2019 05:05:11 GMT
Connection: close

<h1>The Date Client</h1><a href="/date">Get date from server</a>

And what about that 404 thing?

$ nc localhost 59999
GET /whatever

HTTP/1.1 404 Not Found
Content-Type: text/plain
Date: Thu, 28 Mar 2019 05:06:58 GMT
Connection: close

Sorry, that’s not there

By the way, Node’s http module does some pretty good parsing of requests and handles a good deal of the protocol for you. These examples will just give you an idea of what’s in HTTP:

$ nc localhost 59999
bzusdfyiwuef
HTTP/1.1 400 Bad Request
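
That 400 came from Node itself, not from our code. If you want a say in what happens, a server can listen for the clientError event. The following is a minimal sketch; createStrictServer is a made-up name, and the handler follows the pattern shown in the Node documentation for this event.

```javascript
const http = require('http');

// Build (but do not start) a server that answers malformed requests itself.
function createStrictServer() {
  const server = http.createServer((request, response) => {
    response.writeHead(200, { 'Content-Type': 'text/plain' });
    response.end('ok');
  });
  // Node emits 'clientError' when it cannot parse what the client sent.
  server.on('clientError', (err, socket) => {
    // Nothing to do if the connection is already gone.
    if (err.code === 'ECONNRESET' || !socket.writable) return;
    socket.end('HTTP/1.1 400 Bad Request\r\n\r\n');
  });
  return server;
}

// To try it with nc: createStrictServer().listen(59999);
```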

cURL

The second way is to use curl. Use the -i option to see the whole response:

$ curl -i -X GET localhost:59999/date
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Thu, 28 Mar 2019 05:08:42 GMT
Connection: keep-alive
Transfer-Encoding: chunked

2019-03-28T05:08:42.217Z

Without -i you just get the response body:

$ curl -X GET localhost:59999/date
2019-03-28T05:10:15.931Z
Exercise: Try curl on the other endpoint (/) and on non-existent endpoints.

Web Browsers

Okay, yes, the third way is to use a web browser, which speaks HTTP already. With your server running on your local machine, enter http://localhost:59999 into the browser’s address bar, hit Enter, and see the page. Click on the link to hit the other endpoint.

Also, enter the URI of the date endpoint directly: http://localhost:59999/date. That worked great, didn’t it?

Programmatically, from a Server or Command Line

Your favorite programming language should have an http library for making requests. If you are making requests programmatically, you’ll have to examine the response in some detail and take one of several actions depending on the status code. You’ll also have to check and process all the response headers, and read in the response body. The code can get clunky. Here’s how it looks in JavaScript using the raw http library:

datewebclient.js
const http = require('http');

http.get('http://localhost:59999/date', (res) => {
  if (res.statusCode !== 200) {
    console.log(`Received status code ${res.statusCode}`);
    res.resume();
    return;
  }
  let dateString = '';
  res.on('data', (chunk) => { dateString += chunk; });
  res.on('end', () => console.log(dateString));
}).on('error', (e) => {
  console.error(`Request failed: ${e.message}`);
});

DON’T PANIC!!! If you npm install request then it’s much, much easier (though note that the request package has since been deprecated by its maintainers):

datewebclient-request.js
const request = require('request');

request('http://localhost:59999/date', (error, response, body) => {
  if (error) {
    console.error(error);
  } else if (response.statusCode !== 200) {
    console.log(`Received status code ${response.statusCode}`);
  } else {
    console.log(body);
  }
});

Oh wait, aren’t promises much more awesome than those silly callbacks?

$ npm install request request-promise --save
datewebclient-promise.js
const rp = require('request-promise');

rp('http://localhost:59999/date').then((body) => {
  console.log(body);
}).catch((error) => {
  console.log(error.message || 'Error');
});

Interesting.... request-promise treats both non-success HTTP responses and failures to connect as reasons to reject a promise.
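
Recent versions of Node (18 and later) also ship a global fetch function, so a promise-based client needs no extra packages. Unlike request-promise, fetch rejects only when the request itself fails, so the sketch below checks res.ok to get the same "reject on non-success" behavior. The name getBody and the fetchImpl parameter are assumptions of this sketch; the parameter exists only so the function can be tried without a running server.

```javascript
// Fetch the body at url, treating any non-2xx status as an error.
// fetchImpl defaults to the global fetch (Node 18+); passing a stand-in is
// only a convenience for experimenting without a server.
async function getBody(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Received status code ${res.status}`);
  }
  return res.text();
}

// With the date server running:
// getBody('http://localhost:59999/date')
//   .then((body) => console.log(body))
//   .catch((error) => console.error(error.message));
```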

Programmatically, from a Client-Side Script?

Let’s review the different ways we’ve hit the server so far: with nc, with curl, from a web browser loading the server’s own page, and from Node programs.

Now what if we wanted to write our own web client, and not use the web page delivered by that web server? In other words: can we write a web app that uses the date web server just for its data, and not use its HTML?

This is what fetch is for. Let’s try:

datewebclient-fetch.html
<html>
  <head>
    <meta charset="utf-8">
    <title>Date Web Client</title>
  </head>
  <body>
    <p>IP Address of Server: <input id="ip" type="text"></p>
    <p><button>Get Current Date</button></p>
    <p id="response"></p>
    <script>
      document.querySelector('button').addEventListener('click', () => {
        const host = document.querySelector('#ip').value;
        const messageArea = document.querySelector('#response');
        fetch(`http://${host}:59999/date`).then((res) => {
          messageArea.textContent = `${res.status} `;
          return res.text();
        }).then((data) => {
          messageArea.textContent += data;
        }).catch((error) => {
          messageArea.textContent = error;
        });
      });
    </script>
  </body>
</html>

Now if we load this file into the browser, enter localhost into the IP box, and hit the button, we get:

Access to fetch at 'http://localhost:59999/date' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

CORS? That’s Cross-Origin Resource Sharing. Basically, web browsers stop your script from reading resources from origins (scheme + host + port) other than the one that served the page your JavaScript is running in. How can you get around this? If you really were writing a web service, one that just delivers data and not web pages, then you can return a response header Access-Control-Allow-Origin, allowing access to certain web clients or to everyone (with the value *).

Classwork: Get into groups of three. One person will run the server. The other two will use the HTML client and try to hit the server. Both should see CORS errors. Then the person writing the server will add
    'Access-Control-Allow-Origin': 'http://ip_address_of_one_of_the_clients'
into the argument to writeHead for the /date endpoint. Now only that one client will be able to hit the server. (The browser compares this value with the origin of the page making the request, which is the origin reported in the CORS error message.) Next, the server programmer will change that header to
    'Access-Control-Allow-Origin': '*'
and the group should verify that both clients can now hit the server.

The other way to “get around” CORS is to make these kinds of calls from other servers. Only calls made from a browser are subject to the same-origin policy.
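
For the server side of the classwork, one way to organize the header logic is a small function that decides which CORS headers to send. This is only a sketch: corsHeaders, handleDate, and the address in allowedOrigins are invented for illustration, and a real service would also need to think about preflight (OPTIONS) requests.

```javascript
const http = require('http');

// Hypothetical allow-list; replace with the origins of your own clients.
const allowedOrigins = ['http://192.168.1.20:8080'];

// Return the CORS headers to send for a request from the given origin.
function corsHeaders(origin, allowed = allowedOrigins) {
  if (allowed.includes('*')) return { 'Access-Control-Allow-Origin': '*' };
  if (allowed.includes(origin)) {
    // Echo the origin back, and tell caches the response depends on it.
    return { 'Access-Control-Allow-Origin': origin, Vary: 'Origin' };
  }
  return {}; // no header: the browser will block the cross-origin read
}

// Request listener for the /date endpoint with CORS headers added.
function handleDate(request, response) {
  response.writeHead(200, {
    'Content-Type': 'text/plain',
    ...corsHeaders(request.headers.origin), // Node lowercases header names
  });
  response.end(new Date().toISOString());
}

// http.createServer(handleDate).listen(59999);
```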

HTTP Requests and Responses

Let’s look at HTTP, the protocol, in a little more detail.

Formats

HTTP Request format:

Method SP RequestURI SP HTTPVersion CR LF

ZERO OR MORE REQUEST HEADERS

CR LF

OPTIONAL MESSAGE BODY

Example:

PUT https://ehr.example.com/types/Lab.Diagnosis HTTP/1.1
authorization: Bearer hbGciOiJIUz8I1NiJ9JzdWIiOiJh
content-type: application/json
Origin: https://e1ak8fcq37hprs.cloudfront.net
Referer: https://e1ak8fcq37hprs.cloudfront.net/
User-Agent: Mozilla/5.0 AppleWebKit/537.36 Chrome/72.0.3626.96

{"display":"Lab Diagnosis","schema":{"code":"string"}}
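
To make the request format concrete, here is a rough sketch of a function that takes a raw request apart along exactly those lines: request line, headers, blank line, body. parseRequest is a made-up name, and the sketch skips plenty that a real parser (like the one inside Node) has to handle, such as repeated headers and body encodings.

```javascript
// Split a raw HTTP/1.x request into its parts, following the format above.
// A sketch only: it ignores repeated headers, chunked bodies, and bad input.
function parseRequest(raw) {
  const headEnd = raw.indexOf('\r\n\r\n'); // the blank line that ends the headers
  const head = headEnd === -1 ? raw : raw.slice(0, headEnd);
  const body = headEnd === -1 ? '' : raw.slice(headEnd + 4);
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, target, version] = requestLine.split(' ');
  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    // Header names are case-insensitive, so store them lowercased.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, target, version, headers, body };
}

const parsed = parseRequest(
  'PUT /types/Lab.Diagnosis HTTP/1.1\r\n' +
  'Content-Type: application/json\r\n' +
  '\r\n' +
  '{"code":"string"}'
);
console.log(parsed.method, parsed.target, parsed.headers['content-type']);
```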

HTTP Response format:

HTTPVersion SP StatusCode SP ReasonPhrase CR LF

ZERO OR MORE RESPONSE HEADERS

CR LF

OPTIONAL MESSAGE BODY

Example:

HTTP/1.1 200 OK
access-control-allow-headers: Content-Type,Authorization
access-control-allow-methods: GET,PUT,POST,DELETE,PATCH,HEAD,OPTIONS
access-control-allow-origin: *
content-encoding: gzip
content-length: 94
content-type: application/json
date: Thu, 28 Mar 2019 16:10:20 GMT
status: 200
via: 1.1 9f6b9465776576cba700d600678836e.cloudfront.net (CloudFront)
x-amz-apigw-id: XQq8-Edb8625FbxA=
x-amzn-remapped-content-length: 92
x-amzn-requestid: fc22c7c7-5173-11e9-b44d-b393bfa59355
x-cache: Miss from cloudfront

{
  "post_type": {
    "display": "Lab Diagnosis",
    "name": "Lab.Diagnosis",
    "schema": {
      "code":"string"
    }
  }
}
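
Going the other direction, the response format can be sketched as a function that assembles the status line, headers, blank line, and body into one string. formatResponse is a hypothetical helper (Node's response.writeHead and response.end do this work for you); it looks up the reason phrase in http.STATUS_CODES and adds a Content-Length header computed from the body.

```javascript
const http = require('http');

// Assemble a raw HTTP/1.1 response: status line, headers, blank line, body.
function formatResponse(statusCode, headers, body = '') {
  const reason = http.STATUS_CODES[statusCode] || 'Unknown';
  // Content-Length counts bytes, not characters.
  const allHeaders = { ...headers, 'Content-Length': Buffer.byteLength(body) };
  const headerLines = Object.entries(allHeaders)
    .map(([name, value]) => `${name}: ${value}`);
  return [`HTTP/1.1 ${statusCode} ${reason}`, ...headerLines, '', body].join('\r\n');
}

console.log(formatResponse(404, { 'Content-Type': 'text/plain' }, 'Sorry'));
```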

More details at MDN. Also you should check out the RFCs.

Request Methods

HTTP was designed around the idea of an infinite number of types of resources (nouns) that are manipulated with a very small number of methods (verbs). The methods are:

Method   Brief Summary
GET      Requests a representation of the target resource
HEAD     Identical to GET except that the server must not send a message body in the response
POST     Requests that the target resource process the representation enclosed in the request according to the resource’s own specific semantics
PUT      Requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload
DELETE   Requests that the origin server remove the association between the target resource and its current functionality
PATCH    Requests that a set of changes described in the request entity be applied to the resource identified by the Request-URI. The set of changes is represented in a format called a “patch document” identified by a media type
CONNECT  Requests that the recipient establish a tunnel to the destination origin server identified by the request-target and, if successful, thereafter restrict its behavior to blind forwarding of packets, in both directions, until the tunnel is closed
OPTIONS  Requests information about the communication options available for the target resource, at either the origin server or an intervening intermediary
TRACE    Requests a remote, application-level loop-back of the request message
Exercise: Read, and study, the official descriptions of every method from the RFC itself. You can also find summaries at Wikipedia or MDN or the W3C, but really you should know where to find the excruciatingly complete details.
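
Here is one way the "few verbs, many nouns" idea might show up in server code: a table of resources, each listing the methods it supports. The routes and the dispatch function below are invented for illustration. An unknown resource gets a 404, while a known resource asked to do something it does not support gets a 405 along with an Allow header naming the methods that would work.

```javascript
// Hypothetical route table: path -> (method -> handler returning a body).
const routes = new Map([
  ['/date', new Map([['GET', () => new Date().toISOString()]])],
  ['/notes', new Map([
    ['GET', () => 'all notes'],
    ['POST', () => 'note created'],
  ])],
]);

// Decide how to answer a request for the given method and path.
function dispatch(method, path) {
  const resource = routes.get(path);
  if (!resource) {
    return { status: 404, headers: {}, body: 'Not Found' };
  }
  const handler = resource.get(method);
  if (!handler) {
    // The resource exists, but not for this verb: say which verbs it accepts.
    const allow = [...resource.keys()].join(', ');
    return { status: 405, headers: { Allow: allow }, body: 'Method Not Allowed' };
  }
  return { status: 200, headers: {}, body: handler() };
}

console.log(dispatch('DELETE', '/notes'));
```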

Headers

Both requests and responses can be packed with metadata which are called headers. Here are some of the common ones:

Request Headers: A-IM, Accept, Accept-Charset, Accept-Encoding, Accept-Language, Accept-Datetime, Access-Control-Request-Method, Access-Control-Request-Headers, Authorization, Cache-Control, Connection, Content-Length, Content-Type, Cookie, Date, Expect, Forwarded, From, Host, If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Origin, Pragma, Proxy-Authorization, Range, Referer, TE, User-Agent, Upgrade, Via, Warning

Response Headers: Accept-Patch, Accept-Ranges, Age, Allow, Alt-Svc, Cache-Control, Connection, Content-Disposition, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-Range, Content-Type, Date, Delta-Base, ETag, Expires, IM, Last-Modified, Link, Location, Pragma, Proxy-Authenticate, Public-Key-Pins, Retry-After, Server, Set-Cookie, Strict-Transport-Security, Trailer, Transfer-Encoding, Tk, Upgrade, Vary, Via, Warning, WWW-Authenticate

Check out the list of registered headers at IANA. Note that you can always add your own headers for your own organization or application. Such headers traditionally start with x-, although RFC 6648 deprecated that naming convention for new headers.

Media Types

Information is generated, transmitted, and stored as bit sequences. A media type describes the way in which bits are to be interpreted. Use media types as the values of the Content-Type and other headers. Examples used in this unit include text/plain, text/html, and application/json.

More notes here.
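
A server that delivers files usually has to pick a media type for each one. A minimal sketch, assuming a small hand-written table keyed by file extension (the table and the name contentTypeFor are made up here; real servers use much longer lists):

```javascript
const path = require('path');

// A tiny, hand-picked table of extensions and their media types.
const mediaTypes = {
  '.txt': 'text/plain; charset=utf-8',
  '.html': 'text/html; charset=utf-8',
  '.json': 'application/json',
  '.png': 'image/png',
  '.mp4': 'video/mp4',
};

// Choose a Content-Type value for a file, falling back to "just bytes".
function contentTypeFor(filename) {
  const ext = path.extname(filename).toLowerCase();
  return mediaTypes[ext] || 'application/octet-stream';
}

console.log(contentTypeFor('index.html'));
console.log(contentTypeFor('mystery.bin'));
```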

Response Status Codes

Every HTTP response carries a three-digit status code on its first line. Most are listed below. For a nice summary, see Wikipedia; for the official documentation, see RFC 7231, Section 6 (since superseded by RFC 9110).

1xx : INFORMATIONAL
  100 Continue
  101 Switching Protocols
  102 Processing
  103 Early Hints
2xx : SUCCESS
  200 OK
  201 Created (usually you should set the Location header for this)
  202 Accepted (used for async requests)
  203 Non-Authoritative Information
  204 No Content
  205 Reset Content
  206 Partial Content
  207 Multi-Status
  208 Already Reported
  226 IM Used
3xx : REDIRECT
  300 Multiple Choices
  301 Moved Permanently
  302 Found (SUPERSEDED BY 303 AND 307)
  303 See Other
  304 Not Modified
  305 Use Proxy
  307 Temporary Redirect
  308 Permanent Redirect
4xx : CLIENT ERROR
  400 Bad Request
  401 Unauthorized
  402 Payment Required
  403 Forbidden
  404 Not Found
  405 Method Not Allowed (service doesn't support the requested method at that URI)
  406 Not Acceptable (server can't give back a representation in a requested format)
  407 Proxy Authentication Required
  408 Request Timeout
  409 Conflict
  410 Gone
  411 Length Required
  412 Precondition Failed
  413 Payload Too Large
  414 URI Too Long
  415 Unsupported Media Type (server can't process the request body)
  416 Range Not Satisfiable
  417 Expectation Failed
  418 I'm a Teapot
  421 Misdirected Request
  422 Unprocessable Entity
  423 Locked
  424 Failed Dependency
  425 Too Early (listed as Unordered Collection in older references)
  426 Upgrade Required
  428 Precondition Required
  429 Too Many Requests
  431 Request Header Fields Too Large
  451 Unavailable For Legal Reasons
5xx : SERVER ERROR
  500 Internal Server Error
  501 Not Implemented
  502 Bad Gateway
  503 Service Unavailable
  504 Gateway Timeout
  505 HTTP Version Not Supported
  506 Variant Also Negotiates
  507 Insufficient Storage
  508 Loop Detected
  509 Bandwidth Limit Exceeded
  510 Not Extended
  511 Network Authentication Required
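
Since the first digit tells you the class of the response, client code can often branch on that digit alone. A small sketch (describeStatus and the classes table are made-up names) that combines the class with the reason phrase Node already knows from http.STATUS_CODES:

```javascript
const http = require('http');

// First digit of the status code -> class of response.
const classes = {
  1: 'informational',
  2: 'success',
  3: 'redirect',
  4: 'client error',
  5: 'server error',
};

// Describe a status code using its class and Node's table of reason phrases.
function describeStatus(code) {
  const statusClass = classes[Math.floor(code / 100)] || 'unknown';
  const reason = http.STATUS_CODES[code] || 'unregistered';
  return `${code} ${reason} (${statusClass})`;
}

console.log(describeStatus(404)); // 404 Not Found (client error)
console.log(describeStatus(201)); // 201 Created (success)
```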

HTTP in Practice

Did you notice that our simple webserver had two endpoints, one returning an HTML page and the other just returning raw data? That’s great. HTTP is designed to accept and return resources, which can be raw data, text, HTML, images, videos, whatever.

Some HTTP-based servers are centered around delivering HTML and related content. These are often called webapps. Some are all about exchanging pure data, usually in JSON (or XML, which is still around). These are often called web services or web APIs. If you follow certain principles in your web API architecture, you might get to call your API a REST API.

HTTP Frameworks

In practice, you’d rarely write an application using the http module directly. You would usually use a framework written by someone else. That someone else probably built their framework using http directly so you don’t have to.
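
To get a feel for what such a framework does on your behalf, here is a toy version in a couple dozen lines. It is not how any particular framework is implemented; createApp and its get method are invented for this sketch. The trick is that the "app" is itself a request listener, so it can be handed straight to http.createServer.

```javascript
const http = require('http');

// A toy "framework": register handlers by path, get back a request listener.
function createApp() {
  const routes = new Map();
  // The app itself is a request listener, so it can be handed to createServer.
  const app = (request, response) => {
    const handler = routes.get(`${request.method} ${request.url}`);
    if (handler) {
      handler(request, response);
    } else {
      response.writeHead(404, { 'Content-Type': 'text/plain' });
      response.end('Not Found');
    }
  };
  app.get = (path, handler) => {
    routes.set(`GET ${path}`, handler);
    return app; // allow chaining
  };
  return app;
}

const app = createApp().get('/date', (request, response) => {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end(new Date().toISOString());
});

// http.createServer(app).listen(59999);
```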

In no particular order, here are some frameworks that I happen to know about. They are offered without any endorsement:

Language    Frameworks
JavaScript  Express Koa Hapi Ember Angular React Vue
Python      Django Pyramid Twisted Tornado Flask Sanic
Ruby        Rails EventMachine Sinatra Padrino Hanami Cuba Goliath Scorched
Go          Buffalo Iris Gin Martini Revel Gorilla Echo Web.go Goji Beego
Java        SpringMVC Play Grails Wicket Vert.X
Scala       Lift Play Finch Akka Chaos Scalatra BlueEyes
Rust        Warp Thruster Rustful Rustless Tide Nickel Pencil Rocket Canteen Gotham

Later in the course we’ll learn how to build some interesting web applications, using one or more of these frameworks. Remember that HTTP is just a request-response protocol; to build a complete web app there’s tons to learn about HTML, CSS, and JavaScript (for the front-end), and data stores and similar things (for the back-end). Much more to come.

Limitations of HTTP

HTTP is great for requests and responses. But what about applications like interactive games or chats, where the server has to notify the client at any time (not just as a response)? In plain request-response HTTP the server only speaks when spoken to, and the per-request overhead of the protocol makes workarounds inefficient.

Exercise: Consider writing the Tic Tac Toe application we saw earlier to run in a browser over HTTP. A client can POST a move and receive a response (not your turn, square already taken, move accepted), but how will the client know when the other player moved? Should the client make periodic requests to the server to see if the other player moved? Why or why not?

For games and notifications, WebSockets are almost always better!

Appendix: Node’s http Module

Because HTTP is so massively popular, Node.js comes with a built-in module called http to help you write HTTP servers! How convenient! Here are some of the module highlights:

Values      METHODS STATUS_CODES globalAgent maxHeaderSize
Functions   createServer() request() get()

class Agent
  Properties  freeSockets maxFreeSockets maxSockets requests sockets
  Methods     createConnection() keepSocketAlive() reuseSocket() destroy() getName()

class Server
  Events      checkContinue checkExpectation clientError close connect connection request upgrade
  Properties  headersTimeout listening maxHeadersCount timeout keepAliveTimeout
  Methods     close() listen() setTimeout()

class IncomingMessage
  Events      aborted close
  Properties  aborted complete headers httpVersion method rawHeaders rawTrailers socket statusCode statusMessage trailers url
  Methods     destroy() setTimeout()

class ClientRequest
  Events      abort connect continue information response socket timeout upgrade
  Properties  aborted connection finished maxHeadersCount path socket
  Methods     abort() end() flushHeaders() getHeader() removeHeader() setHeader() setNoDelay()

class ServerResponse
  Events      close finish
  Properties  connection finished headersSent sendDate socket statusCode statusMessage
  Methods     addTrailers() end() getHeader() getHeaders() getHeaderNames() hasHeader() removeHeader() setHeader() setTimeout() write() writeContinue() writeHead() writeProcessing()
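
A few of these pieces can be poked at without starting a server at all. The snippet below is just an exploratory sketch: it looks at the METHODS value, creates a Server without calling listen(), and attaches a handler to the server's request event (which is what passing a callback to createServer does for you).

```javascript
const http = require('http');

// METHODS is an array of the method names Node's parser understands.
console.log(http.METHODS.includes('PATCH'));

// createServer() returns a Server; it is not listening until listen() is called.
const server = http.createServer();
console.log(server.listening);

// Attaching to the 'request' event is equivalent to passing a callback
// to createServer().
server.on('request', (request, response) => {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('hello');
});
console.log(server.listenerCount('request'));
```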

Summary

We’ve covered:

  • Writing a trivial HTTP server in Node
  • A little overview of what’s in Node’s http module
  • Different ways to hit an HTTP server (nc, curl, browser, Node programs, fetch)
  • Basics of HTTP requests and responses
  • What an HTTP framework is
  • One limitation of HTTP