# Cryptology

Crypto stuff? Sounds fun.

## Unit Goals

To understand what cryptography and cryptanalysis are all about.

These notes serve as an introduction to crypto and not a complete curriculum to make you an expert who is licensed to implement ciphers, hashes, signatures, or any such thing in any real-world application. Sure, it’s fine to play around at home, but don’t EVER try to implement cryptographic routines for a customer or your own company. Leave it to the pros. There are good libraries out there. Use them.

However, you should always check, and ask, whether the library you are using implements an algorithm that has not yet been cracked, and has sufficient key length.

This has been an important public service announcement.

## Background

It’s conventional to talk about crypto in a setting with a few characters named Alice (the sender), Bob (the receiver), Eve (the eavesdropper), and Mallory (a powerful malicious actor).

Alice and Bob can be people, clients and servers, peer computers, data stores, network routers, etc. Eve can observe the communication. Mallory can not only eavesdrop but also spoof one of the participants, add, modify or delete actual messages, hijack the connection, do a denial of service, inject malware, etc.

Goals for a message $m$ sent from Alice to Bob:

• Confidentiality: It should cost Eve more to recover $m$ than $m$ is worth.
• Authentication: Bob should be able to verify that it was Alice that sent $m$.
• Integrity: Bob should be able to verify that $m$ was not tampered with.
• Nonrepudiation: Alice should not be able to deny sending $m$.

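The best cryptosystems assume that Eve and Mallory know the encryption algorithm $E$, the decryption algorithm $D$, the ciphertext $c$, and, if the encryption key $k_e$ differs from the decryption key $k_d$, then $k_e$ as well. Most cryptosystems do not rely on their algorithms being kept secret, because: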

• If your secret algorithm is compromised (someone could leave your group), you have to change it, and that is way harder than just changing a key!
• Public algorithms can be subjected to thousands of experts looking for flaws, so you have some degree of confidence in those that withstand scrutiny.

Further reading: Security Through Obscurity, Kerckhoffs’s Principle, and this discussion.

The study of cryptology includes the design of various ciphers, cryptanalysis methods (attacks), key exchange, key authentication, cryptographic hashing, digital signing, and social issues (legal, political, etc.). See Wikipedia’s topics in cryptography page.

## Definitions

Words to know:

Cryptography
The art and science of making ciphers.
Cryptanalysis
The art and science of breaking ciphers, i.e., the extraction of $m$ given $c$, $E$, $D$, and possibly $k_e$.
Cryptology
The study of cryptography and cryptanalysis.
Exercise: Find out about steganography. How is it different from cryptography?
Cryptosystem
A particular suite of algorithms and protocols for encryption, decryption, and key generation. Examples: Cramer-Shoup cryptosystem, Rabin cryptosystem, Benaloh cryptosystem, RSA cryptosystem.
Cryptographic System
Any system that uses cryptography.
Cipher
An algorithm used in a cryptosystem.
Exercise: How is a “code” different from a “cipher”? Are codes more secure than ciphers? Why aren’t they used as often?
Confusion
The property of having the relationship between the plaintext, ciphertext, and key so complicated as to be useless to the cryptanalyst.
Diffusion
The property of having statistical patterns in the plaintext spread widely throughout the ciphertext.

## Timelines

History is important in any body of knowledge. We learn from the successes and failures of the past.

## Kinds of Ciphers

Here are some useful categories of ciphers. Note that a particular cipher may belong to more than one of these categories.

• Classical: A cipher easy enough to be performed by hand, usually character-based. Also called manual.
• Modern: Pretty much any cipher that isn’t classical.
• Substitution: Each character of the plaintext is replaced with one or more characters to make the ciphertext.
• Transposition: Characters in the plaintext are rearranged to form the ciphertext.
• Monoalphabetic: A substitution cipher in which a character of the plaintext is always replaced by the same character.
• Polyalphabetic: A substitution cipher that essentially uses multiple monoalphabetic substitution mappings.
• Homophonic: A substitution in which one character can map to one of a set of characters.
• Polygraphic: A substitution of blocks of characters for blocks of characters.
• Periodic: A polyalphabetic cipher in which the replacement scheme repeats.
• Non-periodic: Self-explanatory if you understand periodic.
• Block: Encryption takes place not per character but per blocks of characters.
• Stream: A cipher operating on a data stream of unknown length, usually incorporating feedback.
• Secret Key: A cipher in which $k_e$ and $k_d$ are the same or trivially derivable from one another; requires the parties to meet in secret to exchange the keys they’ll be using. Also called symmetric.
• Public Key: A scheme in which everyone’s encryption key is publicly known but their decryption key is kept secret. Also called asymmetric.

Security and Cryptology

Ciphers are fun to study, but they vary widely in terms of their usefulness in security. One of the biggest security flaws in a system is misusing crypto. That’s right: the problem is its misuse, not its omission!

Just because a cipher is listed on this page does not mean you should use it. Our goal is to learn a wide range of ciphers, both historical and modern.

## Secret Key Cryptography

Secret key (a.k.a. symmetric key) ciphers are much faster than public key ciphers, but key management can be a huge problem.

• If $n$ people in a group need to communicate, they need $\frac{n(n-1)}{2}$ keys.
• Keys must be distributed securely (in secret).
• Keys must be kept safe.
• Keys should be changed frequently, which feeds back into the distribution headache.

NOTE

In the character-based examples below, we’ll assume (without any loss of generality) a 26 symbol alphabet (A..Z).

### Caesar Cipher

A completely pathetic and insecure cipher by modern standards. The encryption key $k_e$ is a small integer and $k_d = k_e$. To encrypt, add $k_e$ to each plaintext character; to decrypt, subtract.

For example, with $k_e = 5$, ATTACKATDAWN becomes FYYFHPFYIFBS.

Trivial to crack: just guess $k_e$.
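
To see how little work "just guess" really is, here is a minimal Python sketch of the shift plus a brute-force loop over all 26 keys. The function names are made up for this illustration, and it assumes the text contains only the letters A..Z.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26-symbol alphabet A..Z

def caesar_encrypt(plaintext: str, k: int) -> str:
    """Shift every letter forward by k positions, wrapping around at Z."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in plaintext)

def caesar_decrypt(ciphertext: str, k: int) -> str:
    """Decrypting is just shifting backward by the same amount."""
    return caesar_encrypt(ciphertext, -k)

def caesar_brute_force(ciphertext: str) -> list:
    """Return all 26 candidate plaintexts; a reader or a dictionary picks the real one."""
    return [caesar_decrypt(ciphertext, k) for k in range(26)]

print(caesar_encrypt("ATTACKATDAWN", 5))  # FYYFHPFYIFBS
```

The brute-force list always contains the real plaintext, which is the whole problem with a key space of size 26.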

### Monoalphabetic Substitution

Instead of simply adding a fixed offset to each character, you can precompute a substitution table by generating a random permutation of your alphabet. For example:

    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    MQHPSVJYCURFTBILAKWNGZDOEX


So ATTACKATDAWN is now MNNMHRMNPMDB.

You don’t crack this by guessing the key (there are $n!$ possible keys for an alphabet of $n$ symbols, roughly $4 \times 10^{26}$ for $n = 26$), but frequency analysis can crack any monoalphabetic substitution cipher, provided the message is long enough.

For techniques whose key is a permutation, one way to make the key easier to remember is to pick a phrase, lay out its unique letters, then fill in missing letters in order. For example, PREMATURE OPTIMIZATION IS THE ROOT OF ALL EVIL yields this substitution mapping:

    PREMATUOIZNSHFLVBCDGJKQWXY
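
Both ideas in this section (a permutation table, and deriving that table from a phrase) fit in a few lines of Python. This is only a sketch; `key_from_phrase` and `substitute` are names invented here, and the input is assumed to be uppercase A..Z.

```python
import string

ALPHABET = string.ascii_uppercase

def key_from_phrase(phrase: str) -> str:
    """Unique letters of the phrase in order of first appearance,
    followed by the unused letters in alphabetical order."""
    seen = []
    for ch in phrase.upper():
        if ch in ALPHABET and ch not in seen:
            seen.append(ch)
    seen += [ch for ch in ALPHABET if ch not in seen]
    return "".join(seen)

def substitute(plaintext: str, key: str) -> str:
    """Replace each letter by the key letter in the same alphabet position."""
    return plaintext.translate(str.maketrans(ALPHABET, key))

print(key_from_phrase("PREMATURE OPTIMIZATION IS THE ROOT OF ALL EVIL"))
print(substitute("ATTACKATDAWN", "MQHPSVJYCURFTBILAKWNGZDOEX"))  # MNNMHRMNPMDB
```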


### Homophonic Substitution

Each plaintext letter maps to one or more symbols in the ciphertext. The number of symbols assigned to a letter should be proportional to that letter’s frequency (to defeat frequency analysis). Example:

    A   12 15 36 50 56 70 81 95
    B   51 84
    C   16 44 65
    D   04 06 48 82
    E   01 17 19 34 47 49 58 60 67 77 85 90
    F   13 27
    G   09 28
    H   26 42 53 59 68 71
    I   35 73 76 86 91 96
    J   18
    K   07
    L   29 40 54 87
    M   25 30
    N   21 61 62 69 74 94
    O   02 03 08 10 57 75 93
    P   41 98
    Q   97
    R   32 38 43 45 80 83
    S   14 22 39 79 89 99
    T   00 20 23 33 46 52 72 78 88
    U   11 64 66
    V   37
    W   63 92
    X   31
    Y   24 55
    Z   05


To encrypt, choose randomly among possibilities. For example, one possible encryption of ATTACKATDAWN is:

    56 78 20 95 65 07 12 72 06 50 92 61
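
Here is a rough Python sketch of the table lookup, using the homophone table from this section. The names `HOMOPHONES`, `REVERSE`, `encrypt`, and `decrypt` are made up for the example. Because encryption picks randomly, two runs will usually produce different ciphertexts, but every one of them decrypts to the same plaintext.

```python
import secrets

TABLE_TEXT = """
A 12 15 36 50 56 70 81 95
B 51 84
C 16 44 65
D 04 06 48 82
E 01 17 19 34 47 49 58 60 67 77 85 90
F 13 27
G 09 28
H 26 42 53 59 68 71
I 35 73 76 86 91 96
J 18
K 07
L 29 40 54 87
M 25 30
N 21 61 62 69 74 94
O 02 03 08 10 57 75 93
P 41 98
Q 97
R 32 38 43 45 80 83
S 14 22 39 79 89 99
T 00 20 23 33 46 52 72 78 88
U 11 64 66
V 37
W 63 92
X 31
Y 24 55
Z 05
"""

# letter -> list of two-digit symbols that may stand in for it
HOMOPHONES = {
    line.split()[0]: line.split()[1:]
    for line in TABLE_TEXT.strip().splitlines()
}
# symbol -> letter, for decryption
REVERSE = {code: letter for letter, codes in HOMOPHONES.items() for code in codes}

def encrypt(plaintext: str) -> str:
    """Pick one of the letter's symbols at random for each plaintext letter."""
    return " ".join(secrets.choice(HOMOPHONES[ch]) for ch in plaintext)

def decrypt(ciphertext: str) -> str:
    """Each symbol belongs to exactly one letter, so decryption is a plain lookup."""
    return "".join(REVERSE[code] for code in ciphertext.split())

print(encrypt("ATTACKATDAWN"))
print(decrypt("56 78 20 95 65 07 12 72 06 50 92 61"))  # ATTACKATDAWN
```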


### Simple Vigenère

The cipher known as the simple shift Vigenère cipher was not invented by Vigenère at all... it seems to have been first described by Giovan Battista Bellaso. The key is a string that you add to the plaintext with modular addition, like in this example (A=0, B=1, C=2, ..., Z=25):

    Plaintext:  TAKEACOPYOFYOURPOLICYTONORMAWILCOXONTHETHIRDFLOOR
    Key:        QUARKQUARKQUARKQUARKQUARKQUARKQUARKQUARKQUARKQUAR
    Ciphertext: JUKVKSIPPYVSOLBFILZMONOEYHGANSBWOOYDNHVDXCRUPBIOI


To generate ciphertext by hand you can use a code wheel or a tabula recta.
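
If you would rather let a computer turn the wheel, the modular addition can be sketched in Python like this. The function names are invented for the example, and the text and key are assumed to contain only A..Z.

```python
from itertools import cycle

A = ord("A")

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Add each key letter to the matching plaintext letter, mod 26 (A=0 ... Z=25)."""
    return "".join(
        chr((ord(p) - A + ord(k) - A) % 26 + A)
        for p, k in zip(plaintext, cycle(key))  # cycle() repeats the key as needed
    )

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Subtract the key letters instead of adding them."""
    return "".join(
        chr((ord(c) - ord(k)) % 26 + A)
        for c, k in zip(ciphertext, cycle(key))
    )

message = "TAKEACOPYOFYOURPOLICYTONORMAWILCOXONTHETHIRDFLOOR"
print(vigenere_encrypt(message, "QUARK"))
```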

This scheme isn’t secure since the key repeats. If the key length can be determined, the cryptanalyst can do multiple frequency analyses (one for each shift value in the key). Methods for determining key length are the Kasiski method and the Friedman test.

For binary data (i.e., a sequence of bits) modular addition base-2 is just a simple xor. Example:

    Plaintext:  0110000101010000111101001010101010010000001111101
    Key:        0000011100000111000001110000011100000111000001110
    Ciphertext: 0110011001010111111100111010110110010111001110011
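
The binary version is short enough to sketch in Python. The helper name `xor_bits` is made up, and bit strings are kept as text purely to match the example; real code would work on bytes.

```python
from itertools import cycle

def xor_bits(bits: str, key: str) -> str:
    """XOR a bit string with a repeating key, one bit at a time."""
    return "".join(str(int(b) ^ int(k)) for b, k in zip(bits, cycle(key)))

plaintext = "0110000101010000111101001010101010010000001111101"
key = "00000111"  # the 8-bit pattern that repeats in the example
ciphertext = xor_bits(plaintext, key)
print(ciphertext)
print(xor_bits(ciphertext, key) == plaintext)  # True: XOR is its own inverse
```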


### Auto-Key Vigenère

Vigenère actually created an autokey cipher which is stronger because the key never repeats. Instead the “key” is made up of the keyphrase followed by the plaintext, like this:

    Plaintext:  TAKEACOPYOFYOURPOLICYTONORMAWILCOXONTHETHIRDFLOOR
    Key:        QUARKTAKEACOPYOFYOURPOLICYTONORMAWILCOXONTHETHIRD
    Ciphertext: JUKVKVOZCOHMDSFUMZCTNHZVQPFOJWCOOTWYVVBHUBYHYSWFU


That one used the plaintext as part of the key. You could also use the ciphertext. See how?

### Modern Auto-Key Ciphers

You can still crack autokey Vigenère ciphers by linguistic analysis, because the key contains text and is thus likely to have high-frequency letters. Modern auto-key ciphers generate the shift values with a random number generator. The key seeds the generator.

Exercise: Implement an autokey cipher in the programming language of your choice.

If the key:

• is as long as or longer than the message being encrypted
• is truly randomly generated
• is used once and only once

Then you have a provably secure cipher called the one-time pad. Your actual algorithm can use polyalphabetic substitution or even simply XORing the message with the key, as long as you meet the three criteria above.

The one-time pad can never be cracked. It is a perfect encryption scheme, from a mathematical perspective, anyway.
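
A minimal Python sketch of the XOR flavor follows. One loud caveat: `secrets.token_bytes` draws from the operating system's cryptographic random number generator, which is an assumption standing in for the "truly random" requirement above, so treat this as a demonstration of the mechanics and not of the security proof. The function names are invented for the example.

```python
import secrets

def otp_encrypt(message: bytes):
    """Generate a fresh pad as long as the message and XOR the two together."""
    pad = secrets.token_bytes(len(message))  # must never be reused
    ciphertext = bytes(m ^ p for m, p in zip(message, pad))
    return pad, ciphertext

def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XORing with the same pad undoes the encryption."""
    return bytes(c ^ p for c, p in zip(ciphertext, pad))

pad, ciphertext = otp_encrypt(b"ATTACKATDAWN")
print(otp_decrypt(ciphertext, pad))  # b'ATTACKATDAWN'
```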

Exercise: Why aren’t one time pads commonly used, then, given that they are the most secure ciphers possible?

### Playfair

This is an example of a polygraphic substitution cipher. It replaces pairs of characters. The key is a permutation of {A..I,K..Z}, for example:

    Z C B M L
    G D A Q E
    T U O K H
    F S X V N
    P I Y R W


To encrypt, write out the plaintext (without spaces or punctuation), sticking in an X between double letters and at the end if necessary to make the text have even length. Then for each pair of letters:

• Let $(a,b)$ be the row and column of the first character and $(c,d)$ be the row and column of the second.
• If $a \neq c$ and $b \neq d$ then return $(a,d)(c,b)$.
• If $a = c$ then return $(a,(b+1) \bmod 5)(c,(d+1) \bmod 5)$.
• If $b = d$ then return $((a+1) \bmod 5,b)((c+1) \bmod 5,d)$.

Example:

    THEN ATTACK FROM THE EAST ⇒
    TH EN AT XT AC KF RO MT HE XE AS TX ⇒
    UT HW GO FO DB TV YK ZK NH NA DX OF


Decryption runs the rules in reverse. The Playfair cipher is pretty insecure.
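
Here is a rough Python sketch of the encryption rules, using the key square from this section. Two things are assumptions of the sketch and not stated above: J is folded into I (the square has no J), and a pair that lands on different rows and columns is swapped rectangle-style, so the first letter keeps its row and takes the second letter's column. The function names are made up.

```python
GRID = ["ZCBML", "GDAQE", "TUOKH", "FSXVN", "PIYRW"]
POS = {ch: (r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row)}

def prepare(text: str) -> str:
    """Drop non-letters, put an X between doubled letters, pad to even length."""
    letters = [ch for ch in text.upper().replace("J", "I") if ch in POS]
    out = []
    for ch in letters:
        if out and out[-1] == ch:
            out.append("X")
        out.append(ch)
    if len(out) % 2:
        out.append("X")
    return "".join(out)

def playfair_encrypt(text: str) -> str:
    prepared = prepare(text)
    result = []
    for i in range(0, len(prepared), 2):
        (a, b), (c, d) = POS[prepared[i]], POS[prepared[i + 1]]
        if a == c:    # same row: move each letter one column right
            result += [GRID[a][(b + 1) % 5], GRID[c][(d + 1) % 5]]
        elif b == d:  # same column: move each letter one row down
            result += [GRID[(a + 1) % 5][b], GRID[(c + 1) % 5][d]]
        else:         # rectangle: keep the rows, swap the columns
            result += [GRID[a][d], GRID[c][b]]
    return "".join(result)

print(playfair_encrypt("THEN ATTACK FROM THE EAST"))  # UTHWGOFODBTVYKZKNHNADXOF
```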

### Four-square

Encrypts digraphs like Playfair, but slightly stronger because it allows for double letters and doesn’t yield reversed ciphertext digraphs for reversed plaintext digraphs. Example:

    a b c d e    G I V E M
    f g h i k    L B R T Y
    l m n o p    O D A H C
    q r s t u    F K N P Q
    v w x y z    S U W X Z

    P R E M A    a b c d e
    T U O I Z    f g h i k
    N S H F L    l m n o p
    V B C D G    q r s t u
    K Q W X Y    v w x y z


Example:

    THEN ATTACK FROM THE EAST ⇒
    TH EN AT TA CK FR OM TH EE AS TX ⇒
    NI VL EV FM MO BV DF NI MA VV NX


Okay, so slightly stronger than Playfair but so what? Computers can crack these things in seconds, or perhaps minutes (given enough ciphertext).
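
For the curious, the lookup can be sketched in Python using the four squares shown above. The rule coded here is the usual four-square one, which the example is consistent with: find the first letter in the upper-left square and the second in the lower-right, then take the upper-right letter at (row of first, column of second) and the lower-left letter at (row of second, column of first). Folding J into I and padding with X are assumptions of the sketch.

```python
PLAIN = ["ABCDE", "FGHIK", "LMNOP", "QRSTU", "VWXYZ"]  # upper-left and lower-right
UPPER_RIGHT = ["GIVEM", "LBRTY", "ODAHC", "FKNPQ", "SUWXZ"]
LOWER_LEFT = ["PREMA", "TUOIZ", "NSHFL", "VBCDG", "KQWXY"]
POS = {ch: (r, c) for r, row in enumerate(PLAIN) for c, ch in enumerate(row)}

def four_square_encrypt(text: str) -> str:
    letters = [ch for ch in text.upper().replace("J", "I") if ch in POS]
    if len(letters) % 2:
        letters.append("X")  # pad to an even number of letters
    out = []
    for i in range(0, len(letters), 2):
        r1, c1 = POS[letters[i]]      # first letter, located in the upper-left square
        r2, c2 = POS[letters[i + 1]]  # second letter, located in the lower-right square
        out.append(UPPER_RIGHT[r1][c2])
        out.append(LOWER_LEFT[r2][c1])
    return "".join(out)

print(four_square_encrypt("THEN ATTACK FROM THE EAST"))  # NIVLEVFMMOBVDFNIMAVVNX
```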

### Simple Block Transposition

The simplest transposition cipher breaks up the message into blocks of size $n$, then scrambles each block according to a permutation of $(1..n)$.

For example, if our key is $(4,1,6,3,2,5)$, the message GETTHATHEALTHINSPECTOR is encrypted as TGATEHATTEHLSHENIPRCOT.
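
A small Python sketch of this scheme follows. The convention is inferred from the example: position $i$ of each ciphertext block takes the plaintext character at position $key[i]$, and in a short final block the missing positions are simply skipped. The function name is made up.

```python
def block_transpose(plaintext: str, key) -> str:
    """Scramble each block of len(key) characters according to the 1-based permutation key."""
    n = len(key)
    out = []
    for start in range(0, len(plaintext), n):
        block = plaintext[start:start + n]
        # skip key entries that point past the end of a short final block
        out.extend(block[k - 1] for k in key if k <= len(block))
    return "".join(out)

print(block_transpose("GETTHATHEALTHINSPECTOR", (4, 1, 6, 3, 2, 5)))
# TGATEHATTEHLSHENIPRCOT
```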

### Columnar Transposition

Write out the message row by row in a grid, then read it out in columns. Totally insecure. The key is just the number of rows. Guess it.

### Rail Fence

The rail fence is no better than the last one, just funkier. The key is the number of rails on which you write the plaintext in an up and down fashion, generating the ciphertext by reading one rail at a time.

Example: To encrypt "fill out and file a WS2475 form" on 4 rails:

    f     t     l     4     m
     i   u a   i e   2 7   r
      l o   n f   a s   5 o
       l     d     w     f


you then read out the ciphertext "ftl4miuaie27rlonfas5oldwf". This is trivial to crack. Just guess $k$.
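
The zigzag is easy to sketch in Python. The function name is invented, and the input is assumed to have spaces removed and letters lowercased already, as in the example.

```python
def rail_fence_encrypt(plaintext: str, rails: int) -> str:
    """Write the text in a zigzag across the rails, then read the rails top to bottom."""
    rows = [[] for _ in range(rails)]
    rail, step = 0, 1
    for ch in plaintext:
        rows[rail].append(ch)
        if rails > 1:
            if rail == 0:
                step = 1       # bounce off the top rail
            elif rail == rails - 1:
                step = -1      # bounce off the bottom rail
            rail += step
    return "".join("".join(row) for row in rows)

print(rail_fence_encrypt("filloutandfileaws2475form", 4))
# ftl4miuaie27rlonfas5oldwf
```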

### Combining Substitution and Transposition

Transposition alone is very weak; substitution alone is very weak; combining them is better. You can mix a lot of the classic substitution ciphers with various transpositions, or use some special combination ciphers like bifid. Also, most of the famous rotor machines and modern ciphers use this combination; in fact they apply these transformations many times.

### Bifid

This one substitutes letters with their coordinates in a grid and does a columnar transposition on the coordinates. Example:

    Z C B M L
    G D A Q E
    T U O K H
    F S X V N
    P I Y R W


Write the (row, column) coordinates under each letter of the plaintext (e.g., "A" is at row 1, column 2; "T" is at row 2, column 0, etc.):

    ATTACKATDAWN
    122102121143
    200213201244


Then read out in rows, group by twos and look up the ciphertext letters:

    122102121143200213201244
    A U B A D R T B Q T A W
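
The same steps, sketched in Python with the grid from this section. The function name is made up, and the plaintext is assumed to use only letters that appear in the grid.

```python
GRID = ["ZCBML", "GDAQE", "TUOKH", "FSXVN", "PIYRW"]
POS = {ch: (r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row)}

def bifid_encrypt(plaintext: str) -> str:
    rows = [POS[ch][0] for ch in plaintext]  # first line of coordinates
    cols = [POS[ch][1] for ch in plaintext]  # second line of coordinates
    stream = rows + cols                     # read out row by row
    # regroup in pairs and look each (row, column) pair up in the grid
    return "".join(GRID[stream[i]][stream[i + 1]] for i in range(0, len(stream), 2))

print(bifid_encrypt("ATTACKATDAWN"))  # AUBADRTBQTAW
```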


### Trifid

Like Bifid, but on a cube. Example:

    Z C B     M L F    V N P
    G D A     Q E X    I R W
    T U O     K H S    Y . J


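To encrypt, first write the (layer, row, column) coordinates under each letter of the plaintext:

    ATTACKATDAWN
    000001000022
    122102121110
    200210201221

Then read out in rows, group by threes, and look up the ciphertext letters:

    000001000022122102121110200210201221
    Z  C  Z  O  S  F  H  Q  V  I  N  .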


### Enigma

The Enigma was the famous German rotor machine from World War II (actually a family of machines). Most versions implemented a polyalphabetic substitution cipher with a period of 16900 plus a plugboard for scrambling (transposition). The key consisted of the order of the rotors, the starting position of each rotor, the ring settings, and the plugboard settings (about $1.6 \times 10^{20}$ possibilities). There was a new key each day (more or less) prepublished in codebooks.

The Allies were able to crack it thanks to some weaknesses in its design...

• No letter would encrypt to itself
• Self-reciprocity meant that there were fewer scrambler setup possibilities

...but more importantly, many weaknesses in the way it was used...

• It was really easy to find cribs. Most messages began with a weather report.
• Early on, message keys appeared twice in succession.

...and by obtaining codebooks from captured vessels.

You can read about how the Enigma was broken from the NSA, and from Wikipedia.

### Modern Cryptographic Methods

Now that we have Shannon’s information theory, very powerful computers, and centuries of theory and practice behind us, we have modern techniques that:

• Operate on bit strings, not character strings
• Are careful to completely mask patterns and redundancies in the plaintext
• Use random keys (that can be reused, too)
• Ensure that very slight changes in the plaintext affect a large portion of the ciphertext (and vice versa). This is called the Avalanche Effect.

In addition, it’s nice if the cipher is:

• Efficient
• Fault-tolerant

Most ciphers are either block ciphers or stream ciphers. Block ciphers require padding and can operate in different modes (see Schneier’s book or the Wikipedia article):

• ECB — Electronic Codebook
• CBC — Cipher Block Chaining
• CFB — Cipher Feedback
• OFB — Output Feedback
• CTR — Counter
• BC — Block Chaining
• PCBC — Propagating Cipher Block Chaining
• CBCC — Cipher Block Chaining with Checksum

### AES

AES is the new standard, replacing DES. It was the winner of the competition (in 2001), where it was submitted under the name Rijndael, beating out RC6, Serpent, MARS, and Twofish.

## Key Exchange

Diffie and Hellman (the 2015 Turing Award Winners) and their friend Merkle showed in 1976 that it was possible for two people to exchange a secret key without having to actually meet in secret:

• Alice picks a prime $n$ and sends this in the clear to Bob
• Alice picks a primitive root mod $n$ (how to find), called $g$, and sends this in the clear to Bob
• Alice picks a secret integer $a$, and sends $g^a \bmod n$ in the clear to Bob
• Bob picks a secret integer $b$, and sends $g^b \bmod n$ in the clear to Alice
• Alice computes $(g^b \bmod n)^a \bmod n$ and Bob computes $(g^a \bmod n)^b \bmod n$. This is the key! (It’s $g^{ab} \bmod n$)

This is probably secure, provided $n$ is very large and $\frac{n-1}{2}$ is also prime, because although Eve knows $g$, $n$, $g^a \bmod n$, and $g^b \bmod n$, there’s no known efficient way to get $a$ or $b$ from these. That’s the discrete logarithm problem, remember?

Example with small $n$:

• Alice picks $n=208799$ and $g=13$ and sends these to Bob
• Alice picks $a=152335$ and Bob picks $b=98133$
• Alice sends Bob $13^{152335} \bmod 208799 = 73033$
• Bob sends Alice $13^{98133} \bmod 208799 = 147540$
• Alice computes $147540^{152335} \bmod 208799 = 8435$
• Bob computes $73033^{98133} \bmod 208799 = 8435$
• The secret key is $8435$.
Do not actually do this with small values of $n$.

In general, unless you get some kind of certification, don’t try to secure any real-world systems on your own. But of course do go ahead and learn the algorithms and practice coding for now.
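
In that spirit, here is a practice-only Python sketch of the exchange. It uses the toy $n$ and $g$ from the example, picks the secrets at random with the standard `secrets` module, and relies on the built-in three-argument `pow` for modular exponentiation. The function name is made up.

```python
import secrets

def diffie_hellman_demo(n: int, g: int):
    """Run both sides of the exchange and return the key each side computes."""
    a = secrets.randbelow(n - 3) + 2  # Alice's secret, in [2, n-2]
    b = secrets.randbelow(n - 3) + 2  # Bob's secret, in [2, n-2]
    A = pow(g, a, n)                  # sent in the clear from Alice to Bob
    B = pow(g, b, n)                  # sent in the clear from Bob to Alice
    alice_key = pow(B, a, n)          # (g^b)^a mod n
    bob_key = pow(A, b, n)            # (g^a)^b mod n
    return alice_key, bob_key

alice_key, bob_key = diffie_hellman_demo(208799, 13)
print(alice_key == bob_key)  # True
```

Eve sees `A` and `B` but neither exponent, which is exactly the discrete logarithm problem mentioned above.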

## Public Key Cryptography

Public key ciphers solve the key management nightmare of secret key ciphers, at the cost of speed. In a group of $n$ people one needs only $n$ public keys and $n$ private keys.

### RSA Cryptosystem

Diffie-Hellman doesn’t do encryption; it just exchanges a key. RSA can encrypt and decrypt. Here’s how. Each person

• Generates two large primes, $p$ and $q$.
• Chooses a value $e$ relatively prime to $(p-1)(q-1)$.
• Publishes his or her public key $(N,e)$ where $N = pq$.
• Computes $d$ = modular inverse of $e$ relative to $(p-1)(q-1)$, keeping it secret.
• Destroys $p$ and $q$.

Now check this out:

• For Alice to send a message $m$ to Bob, she sends $c = m^e \bmod N$.
• Bob decrypts this easily because $m = c^d \bmod N$.
Exercise: Research the mathematics behind RSA. Why does it work, in detail? Your answer will make use of theorems underlying Euler’s totient function; make sure you show how $\varphi(pq)$ reduces to $(p-1)(q-1)$ when $p$ and $q$ are distinct primes, among other things.
Exercise: Diffie-Hellman (DH) is sometimes considered a part of public key cryptography, even though it deals with key exchange and is not itself an encryption algorithm. Why, then, do some people consider it public key? (The answer to this requires some research.)

For a trivial example, just so you can see the math in action, let’s do an RSA with 32-bit keys:

WARNING!

This example is for illustration only. Never implement your own crypto algorithm. Also make sure you understand how awful public key cryptography is with such tiny keys. Real keys should have thousands of bits.

First we generate two random 16-bit primes (16 because that’s half the key size, which is 32):

$p = 36469$
$q = 50929$

Generate a 16-bit prime for the encryption exponent (or just use 65537):

$e = 65537$

Now:

$N = pq = 1857329701$
$d = \mathsf{modInverse}(e, (p-1)(q-1)) = 395695169$

Let’s encrypt the string ¿Dónde está ud.?. Here it is in UTF-8:

    c2 bf 44 c3 b3 6e 64 65 20 65 73 74 c3 a1 20 75 64 2e 3f


We want to divide our message up into blocks. The recommended block size is $\lfloor \frac{k-1}{8} \rfloor$ bytes, where $k$ is the key size in bits. Since we are doing RSA-32 (32-bit keys), we want $\lfloor \frac{31}{8} \rfloor$, or 3. So let’s group our bytes into blocks of three:

    c2bf44 c3b36e 646520 657374 c3a120 75642e 3f0202


The 02 02 at the end is PKCS#7 padding to make the message a multiple of 3 bytes.

In decimal, our blocks are 12762948, 12825454, 6579488, 6648692, 12820768, 7693358, 4129282.

Now let’s apply the encrypt function to each:

$12762948^{65537} \bmod 1857329701 = 1674934738$
$12825454^{65537} \bmod 1857329701 = 920121142$
$6579488^{65537} \bmod 1857329701 = 703310795$
$6648692^{65537} \bmod 1857329701 = 1740932196$
$12820768^{65537} \bmod 1857329701 = 512101030$
$7693358^{65537} \bmod 1857329701 = 1327283085$
$4129282^{65537} \bmod 1857329701 = 1468977038$

That’s the ciphertext in decimal, namely 1674934738, 920121142, 703310795, 1740932196, 512101030, 1327283085, 1468977038.

To decrypt:

$1674934738^{395695169} \bmod 1857329701 = 12762948$
$920121142^{395695169} \bmod 1857329701 = 12825454$
$703310795^{395695169} \bmod 1857329701 = 6579488$
$1740932196^{395695169} \bmod 1857329701 = 6648692$
$512101030^{395695169} \bmod 1857329701 = 12820768$
$1327283085^{395695169} \bmod 1857329701 = 7693358$
$1468977038^{395695169} \bmod 1857329701 = 4129282$

We get the original message back!
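
The whole toy example can be replayed in a few lines of Python, as a sketch for checking the arithmetic and nothing more. The helper names `pad` and `unpad` are made up, the message is entered as the UTF-8 bytes listed above, and `pow(e, -1, m)` for the modular inverse needs Python 3.8 or later.

```python
p, q, e = 36469, 50929, 65537
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse of e

BLOCK = 3  # floor((32 - 1) / 8) bytes per block

def pad(data: bytes) -> bytes:
    """PKCS#7-style padding: append n bytes, each with value n."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n

def unpad(data: bytes) -> bytes:
    """The last byte says how many padding bytes to strip."""
    return data[:-data[-1]]

message = bytes.fromhex("c2 bf 44 c3 b3 6e 64 65 20 65 73 74 c3 a1 20 75 64 2e 3f")
padded = pad(message)
blocks = [int.from_bytes(padded[i:i + BLOCK], "big") for i in range(0, len(padded), BLOCK)]

ciphertext = [pow(m, e, N) for m in blocks]     # c = m^e mod N
decrypted = [pow(c, d, N) for c in ciphertext]  # m = c^d mod N
recovered = unpad(b"".join(m.to_bytes(BLOCK, "big") for m in decrypted))

print(N, d)
print(blocks)
print(ciphertext)
print(recovered == message)  # True
```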

By the way

Since symmetric encryption is so much faster, you can first generate a secret key and transmit it over a line secured by public key cryptography. Now all future communication can use the secret key.
Exercise: Why would it be wrong to use a block size of 4 bytes in the example above?

## Cryptographic Hashing

A hash (a.k.a. fingerprint, checksum, or message digest) is a fixed-size bit pattern (typically 128 to 512 bits) generated from a message by a cryptographic hash function. For the hash to be secure, or cryptographic, it must be computationally infeasible to

• Find a message that hashes to a given value (onewayness)
• Find two messages that hash to the same value (collision-resistance)

Mathematically, a cryptographic hash function $H$ produces a hash from a message, $H(m) = c$, such that $m$ cannot be efficiently determined from $c$, and one cannot efficiently find $m_1 \neq m_2$ such that $H(m_1) = H(m_2)$.

Usually the change of just a single bit in the message will cause the digest to look completely and totally different.

    $ cat will
    This is my will.
    I leave 1000 dollars to Alice
    and everything else to Bob.
    Signed, Eve.
    $ md5sum will
    c18feb890752c9e680c99d1e909fd761  will
    $ sed "s/1/9/g" will > Will
    $ cat Will
    This is my will.
    I leave 9000 dollars to Alice
    and everything else to Bob.
    Signed, Eve.
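
A rough Python version of the same experiment, as a sketch. It swaps in SHA-256 from the standard `hashlib` module in place of MD5 (MD5 is no longer collision-resistant), and the text of the will is retyped from the session above, so the exact bytes, including the newlines, are an assumption.

```python
import hashlib

original = (
    b"This is my will.\n"
    b"I leave 1000 dollars to Alice\n"
    b"and everything else to Bob.\n"
    b"Signed, Eve.\n"
)
altered = original.replace(b"1", b"9")  # same edit as the sed command

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(altered).hexdigest()

# count how many of the 256 digest bits differ between the two hashes
diff_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")

print(h1)
print(h2)
print(diff_bits, "of 256 bits differ")
```

One changed character in the message should flip roughly half of the digest bits, which is the avalanche behavior described above.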

## Cryptanalysis

This is a large topic and won’t be covered here. Instead, here’s a list of techniques.

• Frequency Analysis
• Known plaintext attack
• Known ciphertext attack
• Chosen plaintext attack
• Chosen key attack
• Linear cryptanalysis
• Differential cryptanalysis
• Theft
• Bribery
• Blackmail
Exercise: Do some self-study on cryptanalysis or find a fun online course.

## Programming Examples

Heh, we are not going to show how to roll your own crypto. We are going to look at some actual existing libraries.

Are you transmitting data over an IP network?

Use TLS.

### JavaScript Examples

Built in module: Node crypto.

### Python Examples

Third-party: cryptography

### Java Examples

Reference: Java Cryptography Architecture (JCA)

## Security Best Practices for Crypto

If you need to use crypto in an application

• Never ever ever ever ever ever ever ever ever roll your own
• Use a well tested library that is up-to-date, looked at by experts, and deemed to be safe for now
• Don't use an algorithm that has been cracked
• Don't use an algorithm that theoretically can be cracked
• Don't use an algorithm that has an exploitable flaw
• Use the right kind of algorithm for your needs, e.g., password hashing should be slow, so use something like bcrypt.
• Use key sizes that are long enough
• If storing hashed passwords, always use a salt
• Use initialization vectors and nonces rather than ignoring them
• Be humble: you’re probably not an expert in this
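
To make the salt and slow-hash advice concrete, here is a minimal sketch using only Python's standard library. It uses PBKDF2 (`hashlib.pbkdf2_hmac`) as a stand-in for bcrypt, which would need a third-party package. The function names and the iteration count are assumptions chosen for illustration, not recommendations.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor (an assumption); tune for your hardware

def hash_password(password: str):
    """Return (salt, digest); store both alongside the user record."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

Because each call draws a new salt, hashing the same password twice gives two different digests, so identical passwords do not show up as identical database rows.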

## Summary

We’ve covered:

• Background and Definitions
• Kinds of Ciphers
• Difference between secret key and public key cryptography
• Key exchange
• Hashing and digital signatures
• Cryptanalysis
• Programming Examples