Introduction to Networks

We already know the social, political, and economic impact of computer networking. We already know it’s popular. We already know it’s cool. Let’s get started with technical details.

Unit Goals

To get a big-picture understanding of networking as a field and how the concept of layering makes the operation of large-scale networks possible.

What is a Network?

Here are a few terms to get started:

A network: A group of devices that can communicate with each other over links. Each device is called a host. Each host has a unique address.
An internet: A network of networks. On an internet, each host has an address of the form n/h where n is the network number and h is the number of the host on network n. As long as all of the networks in the internet have unique network numbers, combining the network number and host number will give unique global names. Therefore from the outside an internet looks like a single network!
A router: A device that appears simultaneously on two or more networks. (Usually this is a computer or device with two or more network interface cards, or NICs.)
The Internet: The biggest internet around. “You know it when you see it.”
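
As a small illustration of the n/h address form described above, here is a minimal sketch. The `Address` type and its `parse` helper are hypothetical names invented for this example; they are not part of any real networking library.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """A hypothetical internet address of the form n/h."""
    network: int  # n: the network number
    host: int     # h: the host number on network n

    @classmethod
    def parse(cls, text: str) -> "Address":
        # Split "2/7" into network 2 and host 7.
        n, h = text.split("/")
        return cls(int(n), int(h))

    def __str__(self) -> str:
        return f"{self.network}/{self.host}"


# Two hosts may share host number 7 as long as they sit on different networks.
a = Address.parse("2/7")
b = Address.parse("3/7")
print(a, b, a == b)  # 2/7 3/7 False
```

Because the network numbers differ, the combined names stay unique even though the host numbers collide.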

Exercise: Make sure you can explain the terms network, host, address, internet, and router without looking at a definition.

Why Do We Need Internets?

You know what doesn’t work? Connecting one billion devices to each other directly. Just connecting 9 devices like that requires 36 bi-directional links; a billion devices would need 499,999,999,500,000,000 (half a quintillion).
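
The link counts above follow from the fact that n devices form n(n-1)/2 distinct pairs. A quick sketch to check the arithmetic (the function name `full_mesh_links` is made up for this example):

```python
def full_mesh_links(n: int) -> int:
    """Number of bi-directional links needed to connect n devices pairwise."""
    return n * (n - 1) // 2


print(full_mesh_links(9))             # 36
print(f"{full_mesh_links(10**9):,}")  # 499,999,999,500,000,000
```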

If you try such a thing, you will fail, and an internet will just evolve:

An internet may start with a single, global ISP, then multiple ISPs will arise, then some regional ISPs, etc. Then big content providers might build content delivery networks, too.

Internets give us a bunch of advantages:

Heterogeneity: Networks that appear on an internet do not have to have the same type. It is fine to connect a token-ring network to an Ethernet one, or an FDDI to an X.25, etc. Internets aren’t about physical media; they are more abstract. Software gives us the illusion of a single, universal network.
Performance: Segmenting a network into an internet using routers can increase performance in many cases because some networks work by broadcasting frames to every other device on the network. Isolating a particularly busy segment reduces the traffic that the rest of the devices have to put up with.
Security: Routers can be programmed to let only certain traffic through to particular segments.
Economy and Scale: Some networking technologies (such as Ethernet and Wi-Fi) are super fast but can only work over short distances. Connecting a local network to an internet allows communication over a much wider area. Technologies that enable super fast communication over hundreds of kilometers or more are crazy expensive, so they would be employed not in homes but deep inside the internet.

Exercise: Vint Cerf and Bob Kahn get a ton of credit for pioneering or popularizing internetworking by inventing TCP/IP, something we will learn in detail later on. What are the key ideas of TCP/IP? Why were they so powerful? Also, find Cerf and Kahn’s landmark paper and read it.

How To Study Computer Networking

The field of computer networks is very large and has a few overlapping areas of study. One coarse breakdown of the field into topics is:

Area	Topics
Data Transmission	Hardware; Physical media (e.g., wire, satellite, radio, infrared, optical fiber); Data rate, throughput, bandwidth; Carrier signals; Modems; How data is encoded and transmitted along links; Channels and multiplexing; Lots of fun physics and electrical engineering
Packet Switching	Packet formats; Packet flow within a network; Routing between networks; Dealing with loops and congestion; Queueing Theory; Lots of math
Network Architecture	Intranetwork Topologies; Internetwork Topologies; Layers; Protocols; The 4-layer, 5-layer, and 7-layer Models; APIs for each layer; Management and Governance; Standards
Network Applications	Well known apps, e.g., Email, DNS, FTP, Web; Client-server vs. P2P applications; Socket APIs; Middleware; Security; Firewalls; Performance

It’s hard to study each section on its own; instead, some interesting path through the topics should be taken. The approach taken by Douglas Comer in the 5th edition of his popular book is:

  1 Internet applications, protocols, layers, client-server architectures, socket programming.
  2 Data communication, hardware, modulation, multiplexing on channels, channel encoding.
  3 Packet switching.
  4 The major protocols of the Internet.
  5 Performance, security, management, bootstrapping, multimedia.

Sections 2, 3, and 5 are of course augmented with case studies of these topics on the global Internet.

The Major Networking Concepts

Before studying each topic in detail, we should get big-picture overviews of the most important conceptual topics that make networking and internetworking possible. These conceptual topics are:

Layers
Packets
Routing
Security
Administration

Layers

In order to be understood by humans, complex systems must be designed in a hierarchical fashion, with clear separation of concerns between layers. Internets are complex. A commonly accepted approach to network design is the four layer model:

APPLICATION LAYER

TRANSPORT LAYER

NETWORK LAYER

LINK LAYER

Conceptually, each layer talks to the corresponding layer on the other host via some sort of protocol. Within a host, layers talk only to the layer just below or above. And they don’t care how any of the other layers are implemented; they use inter-layer APIs (e.g. the link library provides services that the network library invokes).

The layers are (yes I know they are “out of order”):

Application: Applications are programs like the World Wide Web, BitTorrent, or Skype. Applications on different computers talk to each other as if there is a (generally) reliable, bi-directional byte stream to communicate over. Applications use protocols like HTTP to talk to each other. Applications don’t know or care how the data gets from one host to another. Even though applications only see streams, the data is (behind the scenes) actually delivered in packets.
Link: The job of the link layer is to get packets across a single link. How this happens is determined by hardware and other physical characteristics.
Network: The network layer’s job is to get packets all the way across the network from the source host to the destination host. It reads the header in each packet to see where it is going and consults routing tables in the routers to see where to send it next. It uses the services of the link layer to send it one link at a time. It does not care how the link layer works (or whether it is Ethernet or Wi-Fi or DSL or 4G or whatever). Generally the network layer makes no guarantees that the packets will arrive in order, makes no guarantees they will not be duplicated, and makes no guarantees they will even arrive at all!
Transport: This layer is responsible for (if desired) retransmitting and reordering packets to provide reliability on top of the unreliable network layer, and handles congestion. (TCP does this, UDP does not). While the network layer is concerned with routing packets to the right destination host, the transport layer gets data to the right application running on the host.
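
To make the reordering and retransmission idea concrete, here is a minimal sketch of what a reliable transport layer does with segments that arrive out of order, duplicated, or not at all. It is a simplification under stated assumptions: segments are numbered 0, 1, 2, ... (real TCP numbers bytes rather than whole segments), and the function name `reassemble` is invented for this example.

```python
def reassemble(segments, expected_count):
    """Rebuild a byte stream from (sequence_number, payload) pairs.

    Returns (data, missing): the bytes that can be delivered in order so far,
    and the sequence numbers that would need to be retransmitted.
    """
    received = {}
    for seq, payload in segments:
        received.setdefault(seq, payload)  # keep the first copy, drop duplicates

    missing = [s for s in range(expected_count) if s not in received]

    deliverable = []
    for s in range(expected_count):
        if s not in received:
            break  # stop at the first gap; later data must wait
        deliverable.append(received[s])
    return b"".join(deliverable), missing


# Segment 1 was lost, segment 2 arrived twice, and the order is scrambled.
arrived = [(2, b" wo"), (0, b"hel"), (2, b" wo"), (3, b"rld")]
data, missing = reassemble(arrived, expected_count=4)
print(data, missing)  # b'hel' [1]

# After segment 1 is retransmitted, the whole stream can be delivered.
data, missing = reassemble(arrived + [(1, b"lo,")], expected_count=4)
print(data, missing)  # b'hello, world' []
```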

So let’s review:

For an Application on host A to send data to host B:

The application asks the transport layer to break up the data stream into transport-layer packets (sometimes called segments), and then send the packets.
The transport layer asks the network layer to deliver the transport-layer packet. The network layer creates a network-layer packet (sometimes called a datagram) with the transport-layer packet inside.
The network layer figures out the first hop and asks the link layer to deliver it. The link layer wraps the datagram into its own kind of packet (sometimes called a frame) and sends it along the link to the next hop.
At the new hop, which is probably a router, the link layer unwraps the datagram and hands it to its network layer for processing. The network layer will figure out the next hop, among other things (like decrementing a TTL count), then tell the link layer to send it over the next link.
This repeats until the packet arrives at the destination. Then the link layer passes it up to the network layer, which passes it to the transport layer. If the transport protocol is reliable, like TCP, it will try to reassemble packets in the correct order, request retransmission of missing packets if necessary, and append data to the stream being consumed by the application layer.
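
The wrapping and unwrapping in the steps above can be sketched in a few lines. This is only an illustration: the bracketed text "headers" are placeholders invented for the example, not real TCP, IP, or Ethernet headers, and the function names are hypothetical.

```python
# Order in which the sender's layers wrap the application data.
LAYERS = ["transport", "network", "link"]


def wrap(layer: str, payload: bytes) -> bytes:
    """Prepend a placeholder header for the given layer."""
    return f"<{layer}>".encode() + payload


def unwrap(layer: str, packet: bytes) -> bytes:
    """Strip the given layer's placeholder header, checking it is present."""
    header = f"<{layer}>".encode()
    if not packet.startswith(header):
        raise ValueError(f"expected a {layer} header")
    return packet[len(header):]


def send(data: bytes) -> bytes:
    """Sender side: segment inside datagram inside frame."""
    packet = data
    for layer in LAYERS:
        packet = wrap(layer, packet)
    return packet


def receive(frame: bytes) -> bytes:
    """Receiver side: peel the headers off in the opposite order."""
    packet = frame
    for layer in reversed(LAYERS):
        packet = unwrap(layer, packet)
    return packet


frame = send(b"GET /")
print(frame)           # b'<link><network><transport>GET /'
print(receive(frame))  # b'GET /'
```

A router in the middle of the path would only remove and re-add the outermost (link) header, leaving the datagram inside untouched apart from fields such as the TTL.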

Exercise: For another (and slightly more detailed) summary, read this brief article.

Exercise: Explain how the air-travel analogy works. Consider ticketing, baggage, gate, runways, and air traffic controllers. (Here’s a picture that might help.)

Other Layer Models

You are likely to come across a 5-layer model (that splits the link layer and renames a couple):
  5 Application
  4 Transport
  3 Internet
  2 Network Interface
  1 Physical
A much older 7-layer model, called the OSI Reference Model, splits up the application layer to allow for connections and security (in the 4- and 5-layer models, these concerns are part of the apps):
  7 Application
  6 Presentation (incl. encrypt/decrypt)
  5 Session (incl. open/close connections)
  4 Transport (segments, TCP, UDP)
  3 Network (datagrams, packets, IP)
  2 Data Link (frames)
  1 Physical

Packets

Most computer networks are packet switched as opposed to circuit switched. Circuit switching gives you a dedicated, pre-routed line between the two parties; packet switching breaks up the message into packets and routes them all independently through the network.

Each packet has a header and a body.

HEADER

BODY

The body contains the data being sent. The header of course varies depending on the type of packet, but typical header items (these may or may not appear in all packet types) include:

Header size and/or body size and/or packet size
Version number and/or packet type
Magic number
Checksum or hash
Source and destination address
Sequence number (if this packet is part of a reliable protocol)

Each packet type specifies the precise location of each value within the header. For example, an IP version 4 packet has the following specification:

Bits	Description
0..3	Version: this is always 4 in IPv4
4..7	IHL: Internet Header Length. The number of 32-bit words in the header. The minimum value is 5. The protocol allows a number of options (extra 32-bit words that go in the header), so if there were, say, two such options, the value would be 7.
8..13	DSCP: Differentiated Services Code Point (see RFC 3260)
14..15	ECN: Explicit Congestion Notification (see RFC 3168)
16..31	Total Packet Length: The total packet size (header + body) in bytes. Note the minimum is 20, because the smallest possible header is 20 bytes. Because this is a 16-bit field, the maximum value is 65,535 bytes.
32..47	Identification
48..50	Flags
51..63	Fragment Offset
64..71	Time to Live
72..79	Protocol
80..95	Header Checksum
96..127	Source IP Address
128..159	Destination IP Address
160..(160+oc*32-1)	Options
(160+oc*32..)	Packet Body

Whoa! Too much, too soon!

Yes and no. Of course this doesn’t feel like the time to discuss the intricate details of IP packets. We’re just in overview mode. However, it does help to see real, concrete examples. Focus for now on what’s in the packet, conceptually, not where exactly everything fits. Get a feel, too, for how the protocol designers allowed for customization of packets in the header.

Packet format documentation is rarely shown in tables, but rather laid out in a more compact form, like so:

0	4	8	14	16	19
Version	IHL	DSCP	ECN	Total Length
Identification				Flags	Fragment Offset
TTL		Protocol		Header Checksum
Source IP Address
Destination IP Address
Options (if IHL > 5)
Body
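
To see how the bit positions in the specification translate into practice, here is a hedged sketch that pulls the fixed 20-byte part of an IPv4 header apart using Python's standard `struct` module. The function name `parse_ipv4_header` and the sample bytes are made up for illustration; options are not decoded and the header checksum is not verified (the sample leaves it at zero).

```python
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed first 20 bytes of an IPv4 header (options are not decoded)."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    # "!" means network (big-endian) byte order; B = 1 byte, H = 2 bytes, 4s = 4 raw bytes.
    (ver_ihl, dscp_ecn, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F
    return {
        "version": ver_ihl >> 4,                 # bits 0..3
        "ihl": ihl,                              # bits 4..7, in 32-bit words
        "header_bytes": ihl * 4,
        "dscp": dscp_ecn >> 2,                   # bits 8..13
        "ecn": dscp_ecn & 0x03,                  # bits 14..15
        "total_length": total_length,            # bits 16..31
        "identification": identification,
        "flags": flags_frag >> 13,               # bits 48..50
        "fragment_offset": flags_frag & 0x1FFF,  # bits 51..63
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }


# A hand-built sample header: version 4, IHL 5, total length 28, TTL 64, protocol 17.
sample = bytes([
    0x45, 0x00, 0x00, 0x1C,   # version/IHL, DSCP/ECN, total length
    0x12, 0x34, 0x40, 0x00,   # identification, flags + fragment offset
    0x40, 0x11, 0x00, 0x00,   # TTL, protocol, checksum (left as zero here)
    192, 0, 2, 1,             # source address
    198, 51, 100, 2,          # destination address
])
header = parse_ipv4_header(sample)
print(header["version"], header["header_bytes"], header["total_length"])  # 4 20 28
print(header["source"], "->", header["destination"])  # 192.0.2.1 -> 198.51.100.2
```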

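Here’s something really important, and very cool. Note how each layer’s packet gets encapsulated within the packet of the layer beneath it: a transport-layer segment travels inside a network-layer datagram, which in turn travels inside a link-layer frame.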

Routing

Routing refers to how the path from source to destination is computed. A routing algorithm determines this. Generally, the routing algorithm is responsible for helping to populate the routing table at each router.

Routing Tables

We’ll oversimplify for now. Each router has a table mapping the destination network to the router it needs to forward the packet to.

Classwork: Let’s do a routing worksheet!

We’ll build (trivial, static) routing tables for each of the networks in the internet example at the top of these notes. I’ll start with the table for network 2:

Dest. Network	Forward to
1	3
2	(local)
3	4
4	4

Create the other three tables. After you’ve finished, we’ll discuss ways to simplify the table (since we can’t actually list all of the destination networks in one table).
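
As a rough sketch of how a router might consult such a table, the snippet below stores the network 2 table from the worksheet as a dictionary and looks up the next hop for a destination written in the n/h form used earlier. The names `ROUTES_NET2` and `next_hop` are invented for this example, and a real router does considerably more than a dictionary lookup.

```python
# Static routing table for network 2, copied from the worksheet:
# destination network -> where to forward the packet.
ROUTES_NET2 = {1: 3, 2: "local", 3: 4, 4: 4}


def next_hop(table: dict, destination: str):
    """Return the forwarding entry for a destination address of the form n/h."""
    network = int(destination.split("/")[0])  # only the network number matters
    if network not in table:
        raise LookupError(f"no route to network {network}")
    return table[network]


print(next_hop(ROUTES_NET2, "4/17"))  # 4
print(next_hop(ROUTES_NET2, "2/5"))   # local
```

Note that the host number plays no part in the decision until the packet reaches its destination network.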


Congestion

Routing algorithms have to be adaptive. Routers accept packets and then forward them. Packets may come in faster than they can be sent out, so they are queued in the router’s packet buffer. If too many packets are stored in the queue, incoming packets may have to be dropped. A routing algorithm might then reroute certain traffic because of this.
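
The buffering behavior described above can be sketched with a bounded queue that drops arrivals once it is full. This is one simple policy chosen for illustration (dropping the newest arrival); the class name `RouterQueue` and its methods are hypothetical, and real routers may use other policies.

```python
from collections import deque


class RouterQueue:
    """A fixed-capacity packet buffer that drops new arrivals when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0

    def enqueue(self, packet) -> bool:
        """Accept a packet if there is room; otherwise count it as dropped."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def dequeue(self):
        """Remove and return the oldest queued packet, or None if empty."""
        return self.buffer.popleft() if self.buffer else None


# Five packets arrive back to back at a router that can hold only three.
q = RouterQueue(capacity=3)
results = [q.enqueue(f"pkt{i}") for i in range(5)]
print(results)      # [True, True, True, False, False]
print(q.dropped)    # 2
print(q.dequeue())  # pkt0
```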

Exercise: Why else might a routing algorithm need to reroute traffic?

Performance

Network engineers have to take performance into account. There are tons of ways performance can be affected. But there are little calculations you will want to be good at making. Let’s just do a single one for now.

A packet is P bits. The medium transmits R bits/second. The end-to-end delay introduced by the router, if it reads the whole packet into memory before sending it out, is:

The time needed to read the packet in = P/R, plus
The time needed to transmit the packet from memory onto the outgoing link, call this T, plus
The time needed to send the packet along to the next hop = P/R

So 2(P/R) + T.

That was just a trivial example, of course. A lot was rolled into that T.
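
Plugging numbers into 2(P/R) + T shows the scale involved. The values below (a 12,000-bit packet, 100 Mbit/s links, and T = 20 microseconds) are assumptions picked for the example, not measurements, and the function name is made up.

```python
def store_and_forward_delay(packet_bits: float, rate_bps: float, t_seconds: float) -> float:
    """Delay through one store-and-forward router: 2(P/R) + T, in seconds."""
    return 2 * (packet_bits / rate_bps) + t_seconds


# P = 12,000 bits (a 1500-byte packet), R = 100 Mbit/s, T = 20 microseconds (assumed).
d = store_and_forward_delay(12_000, 100_000_000, 20e-6)
print(f"{d * 1e6:.0f} microseconds")  # 260 microseconds
```

Each P/R term contributes 120 microseconds here, so with these assumed values the two transmission times dominate T.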

Security

Networks are shared resources and need to be convenient to use. Convenience is often at odds with security. The big security questions are: (1) How do you attack? (2) How do you defend? (3) How do you prevent attacks?

Some topics we will be considering:

Cryptography to prevent eavesdropping attacks and provide a means to sign messages. Cryptosystems can be symmetric (shared key) or asymmetric (public key).
How can keys be securely distributed?
How do we know users are who they say they are (authentication)?
How do we know which users are allowed to do which operations (authorization)?
How do we build in security at all levels: transport-level security, wireless security, email security, etc.?
How do we detect intruders? How do we detect malware? How do we prevent intruders and malware?
How do we deal with certain kinds of traffic?

Administration

In real life, networks have to be:

Set up
Maintained (analyzed, upgraded, troubleshot, etc.)

Analysis and troubleshooting are done with various tools:

Command line utilities: ping, traceroute, netstat, tcpdump, etc.
Graphical packet analyzers. One is Wireshark.

These will be covered later in the course.

Summary

We’ve covered:

What a network is, and why at large scale we need internets
Issues that come up in the study of networks
Why layering helps us understand networks
The 4-layer network model
The basic ideas behind packets, including an example format (IP)
Issues that come up in packet routing, and what a routing table is
Questions we might ask when worrying about network security