Values are the meaningful units of information.
Some values are atomic (meaning they cannot be decomposed); these include numbers, symbols, and characters. Nonatomic (decomposable) values include tuples, sequences, strings, records, sets, dictionaries, and references.
Let’s take a tour.
Numbers describe quantity, order, and measure. The simplest kinds of numbers are one-dimensional (scalar). One-dimensional numbers come in various forms, such as:
What? If you’re not familiar with these terms, or need a refresher, see these notes.
A numeral is a representation of a number. Many systems of numerals exist in various human cultures. The most widely used numeral systems are positional. Positional numerals can be written in binary, octal, decimal, hexadecimal, and perhaps in other bases. Some programming languages allow underscores in numerals to make them more readable. Some languages have lots of different sizes for numbers and some don’t.
Here are some examples taken from various programming languages:
21, 1_000_000, 3838021212885321800888128, 0x17F, 0b101111111, 0o577, 223u8, 32767i16, 0x7FFF_0000u32, 3i32, 55u32, 3i64, 3.14159, 3.14159f32, 88.3f64, 60.2e22, 1.602176634e-19, 3E+11, 9355E-22, 16#5FDE3.A33#e+80, 13#A339#.
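To make the numeral/number distinction concrete, here is a minimal sketch in Python, used here only as one convenient example language. It writes the same number with four different numerals. Python has no sized suffixes such as u8 or i16, so those forms from the list above are not shown; the variable names are purely illustrative.

```python
# Four numerals, one number: 0x17F, 0b101111111, and 0o577 all denote 383.
decimal = 383
hexadecimal = 0x17F
binary = 0b101111111
octal = 0o577

# Underscores are ignored by the language; they only help human readers.
million = 1_000_000

# Python integers are unbounded, so a very large numeral needs no suffix.
big = 3838021212885321800888128

# Scientific notation produces a floating-point number.
elementary_charge = 1.602176634e-19

print(decimal == hexadecimal == binary == octal)
```

The comparison holds because the numerals differ only in how the number is written, not in the number itself.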
Multidimensional numbers, like ratios, complex numbers, quaternions, and octonions, are generally defined with tuples or records, which we’ll cover later.
Symbols, also known as atoms, are indivisible things assigned a meaning by their creator. Most programming languages have the atoms true and false (they might be called True and False or T and F) to represent truth and falsity. Other common atoms are null, None, and nil (for the absence of information), and undefined (for the absence of knowledge—unknown, don’t care, or none-of-your-business). In many languages, you can create your own atoms, for example: left, right, up, down, red, green, blue, ready, sent, received.
Some languages require atoms to be prefixed with a colon, e.g., :sent. Sometimes you need an apostrophe as a prefix, e.g., 'sent. There are so many variations.
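Python is one of the languages without a dedicated symbol literal such as :sent. As a rough stand-in, and only as an assumption about how one might model atoms there, the sketch below uses the standard enum module; the class name Status is hypothetical.

```python
from enum import Enum, auto

class Status(Enum):
    """User-defined atoms: each member is a distinct, indivisible value."""
    READY = auto()
    SENT = auto()
    RECEIVED = auto()

current = Status.SENT

# Atoms are compared by identity, not by inspecting any internal structure.
print(current is Status.SENT)
print(current is Status.READY)

# The built-in atoms for truth, falsity, and absence of information.
builtin_atoms = (True, False, None)
```

The values that auto() assigns are incidental; what matters is that each member is distinguishable from every other.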
A character is a unit of textual information. A character has a name. Examples:
A grapheme is a minimally distinctive unit of writing in some writing system. It is what a person usually thinks of as a character. However, it may take more than one character to make up a grapheme. For example, the grapheme:
is made up of two characters (1) LATIN CAPITAL LETTER R and (2) COMBINING RING ABOVE. The grapheme:
is made up of two characters (1) TAMIL LETTER NA and (2) TAMIL VOWEL SIGN I. This grapheme:
is made up of two characters (1) BICYCLIST and (2) EMOJI MODIFIER FITZPATRICK TYPE-5. This grapheme:
is made up of four characters (1) SURFER, (2) EMOJI MODIFIER FITZPATRICK TYPE-1-2, (3) ZERO WIDTH JOINER, (4) FEMALE SIGN. And this grapheme:
requires two characters: (1) REGIONAL INDICATOR SYMBOL LETTER C and (2) REGIONAL INDICATOR SYMBOL LETTER V. It’s the flag for Cape Verde (CV).
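The grapheme/character distinction can be checked directly. The following Python sketch is only an illustration: it builds two of the graphemes described above from their component characters and lists each component by name, using the standard unicodedata module. The helper character_names is a hypothetical name introduced for this example.

```python
import unicodedata

def character_names(text):
    """Return the Unicode name of each character (code point) in text."""
    return [unicodedata.name(ch) for ch in text]

# One grapheme on screen, two characters underneath.
r_with_ring = "R\u030A"
cape_verde_flag = "\U0001F1E8\U0001F1FB"

print(len(r_with_ring), character_names(r_with_ring))
print(len(cape_verde_flag), character_names(cape_verde_flag))
```

Note that Python’s len counts characters, not graphemes, which is exactly why a string that looks like one symbol can report a length of two or more.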
When characters are included in a character set, each is assigned a code point. In Unicode, well over a hundred thousand characters have been mapped to code points already, and more get added from time to time. Traditionally, code points are written in hex (but they don’t have to be). Here are some examples:
25 PERCENT SIGN
2C COMMA
54 LATIN CAPITAL LETTER T
5D RIGHT SQUARE BRACKET
B0 DEGREE SIGN
C9 LATIN CAPITAL LETTER E WITH ACUTE
2AD LATIN LETTER BIDENTAL PERCUSSIVE
39B GREEK CAPITAL LETTER LAMDA
446 CYRILLIC SMALL LETTER TSE
543 ARMENIAN CAPITAL LETTER CHEH
5E6 HEBREW LETTER TSADI
635 ARABIC LETTER SAD
71D SYRIAC LETTER YUDH
784 THAANA LETTER BAA
94A DEVANAGARI VOWEL SIGN SHORT O
9D7 BENGALI AU LENGTH MARK
BEF TAMIL DIGIT NINE
D93 SINHALA LETTER AIYANNA
F0A TIBETAN MARK BKA- SHOG YIG MGO
11C7 HANGUL JONGSEONG NIEUN-SIOS
1293 ETHIOPIC SYLLABLE NAA
13CB CHEROKEE LETTER QUV
2023 TRIANGULAR BULLET
20A4 LIRA SIGN
20B4 HRYVNIA SIGN
2105 CARE OF
213A ROTATED CAPITAL Q
21B7 CLOCKWISE TOP SEMICIRCLE ARROW
2226 NOT PARALLEL TO
2234 THEREFORE
2248 ALMOST EQUAL TO
265E BLACK CHESS KNIGHT
30FE KATAKANA VOICED ITERATION MARK
4A9D HAN CHARACTER LEATHER THONG WOUND AROUND THE HANDLE OF A SWORD
7734 HAN CHARACTER DAZZLED
99ED HAN CHARACTER TERRIFY, FRIGHTEN, SCARE, SHOCK
AAB9 TAI VIET VOWEL UEA
1201F CUNEIFORM SIGN AK TIMES SHITA PLUS GISH
1D111 MUSICAL SYMBOL FERMATA BELOW
1D122 MUSICAL SYMBOL F CLEF
1F08E DOMINO TILE VERTICAL-06-01
1F991 SQUID
1F0CE PLAYING CARD KING OF DIAMONDS
1F382 BIRTHDAY CAKE
1F353 STRAWBERRY
1F4A9 PILE OF POO
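The mapping between characters and code points can be explored programmatically. This short Python sketch, offered only as an illustration, converts in both directions with the built-in chr and ord functions and looks up names with the standard unicodedata module; the variable names are made up for the example.

```python
import unicodedata

# Code point -> character.
percent = chr(0x25)

# Character -> code point.
code_of_t = ord("T")

# Code point -> character name.
therefore_name = unicodedata.name(chr(0x2234))

# Character name -> character.
poo = unicodedata.lookup("PILE OF POO")

print(percent, hex(code_of_t), therefore_name, hex(ord(poo)))
```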
When representing character values in a programming language, we are sometimes, but not always, able to use graphemes directly, but we can always use code points. To distinguish character values from symbols, apostrophes are generally required:
'A'
'\x41'
'\u0041'
'\U00000041'
'\u{41}'
You will definitely prefer to use code points for “invisible characters” such as HAIR SPACE, EM SPACE, EN SPACE, FOUR-PER-EM SPACE, THIN SPACE, NO-BREAK SPACE, ZERO WIDTH SPACE, LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK, WORD JOINER, INVISIBLE TIMES, BACKSPACE, HORIZONTAL TABULATION, END OF LINE, FORM FEED, CARRIAGE RETURN, END OF TRANSMISSION BLOCK, ESCAPE, FILE SEPARATOR, GROUP SEPARATOR, etc. Otherwise people looking at your code will be really confused. Many languages provide alternatives for some characters; the most common are:
\n for \u{a} (LINE FEED, a.k.a. “newline”)
\t for \u{9} (CHARACTER TABULATION, a.k.a. “tab”)
\r for \u{d} (CARRIAGE RETURN)
\0 for \u{0} (NULL)
\b for \u{8} (BACKSPACE)
\f for \u{c} (FORM FEED)
\v for \u{b} (LINE TABULATION, a.k.a. “vertical tab”)
\a for \u{7} (BELL, a.k.a., “alert”)
Here are more extensive notes on characters, that even venture into how characters are encoded into bits for storage and transmission.
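As one concrete instance of these notations, the Python sketch below writes the same character several ways and checks each short escape against its code point. Python happens to accept the \x, \u, \U, and \N{...} forms but not the \u{41} form, which belongs to other languages. The variable names are illustrative only.

```python
# Five spellings of the same one-character value.
a_forms = ["A", "\x41", "\u0041", "\U00000041", "\N{LATIN CAPITAL LETTER A}"]
print(all(form == "A" for form in a_forms))

# Each short escape paired with the code point it stands for.
escapes = {
    "\n": 0xA, "\t": 0x9, "\r": 0xD, "\0": 0x0,
    "\b": 0x8, "\f": 0xC, "\v": 0xB, "\a": 0x7,
}
print(all(ord(ch) == code for ch, code in escapes.items()))

# An invisible character is far easier to recognize as an escape.
hair_space = "\u200A"
```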
A tuple is a value that is a finite, ordered collection of values:
(3, true)
(1, 5, 2)
(9.3, (8, false, null), (true, 222), dog, 'c')
()
A sequence, also called a list, is a possibly infinite ordered collection of values. In practice, we usually think of lists as being collections of elements all “of the same type” but this is not strictly necessary. Conventionally, lists are delimited with square brackets while tuples use parentheses. Here are some lists:
[3, 5, 7, 11]
[(9, 3), (5.5, 8), (3, 0)]
[true, false, dog, 55.22e7, (2, true), [[(1,1)]]]
[0..] (infinite sequence)
[]
A string is a sequence of characters. Strings are used to represent text. Here are some examples:
"Hello, world! \u{263a}""1\t0\t0\n0\t1\t0\n0\t0\t1\n""∀α β. α⊥β ⇔ (α•β = 0)"""In some programming languages, characters and strings-of-one-character are indistinguishable. In others, they are completely distinct things which cannot be mixed up at all. This is interesting. It is one of the reasons why learning programming languages is both 😵💫 and 🤗.
Briefly, a record is a tuple whose components are named. Examples:
(name: "Rex", breed: "G-SHEP", colors: ["black", "tan"])
(shape: circle, radius: 3, color: green, center: (2, 2))
(name: "Jewel Loyd", team: "SEA", games: 38, points: 939)
(id: 7, electric: false, launched: (year: 2021, month: 2, day: 3))
Sometimes the delimiters are braces instead of parentheses. Sometimes the separator is an equal sign instead of a colon. There are many variations. Record components go by many names, including fields, slots, attributes, or members. There may be other names. The vocabulary can get pretty rich.
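Since a record is a tuple with named components, Python’s typing.NamedTuple is a fairly direct way to sketch one. This is just one of several options in that language (dataclasses and plain dictionaries are others), and the class name Player is hypothetical.

```python
from typing import NamedTuple

class Player(NamedTuple):
    """A record: a tuple whose components have names."""
    name: str
    team: str
    games: int
    points: int

loyd = Player(name="Jewel Loyd", team="SEA", games=38, points=939)

# A component can be reached by its name or by its position.
by_name = loyd.points
by_position = loyd[3]
print(by_name == by_position)
```

Because the value is still a tuple underneath, it compares equal to the plain tuple with the same components in the same order.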
It’s sometimes nice to view records pictorially. Here’s one of the records we saw above:
Interestingly, lists, and even tuples, can be viewed as records, because they are ordered:
In most programming languages, lists and records are quite distinct. In JavaScript, they are blurred together into the concept of an “object”; in Lua, they are blurred together into the concept of a “table.” Regardless of what a programming language does, you should know and understand the language-independent concepts.
A set is an unordered collection of unique values. Examples:
{3, 5, 7, 11}
{true, false, dog, 55.22e7, (2, true), {{(1,1)}}}
{"Courtney Love", "Eric Erlandson", "Kristen Pfaff", "Patty Schemel"}
{}
{1, 2}, {2, 1}, and {1, 2, 1} are all the same set.
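The claim that order and repetition do not matter is easy to check. In the Python sketch below, which is only an illustration, note two language-specific quirks: {} means an empty dictionary rather than an empty set, and a set nested inside another set must be a frozenset.

```python
a = {1, 2}
b = {2, 1}
c = {1, 2, 1}

# Order and duplicates are irrelevant, so all three are the same set.
print(a == b == c, len(c))

# The empty set must be written set(); {} is an empty dictionary in Python.
empty = set()

# Elements must be immutable, so an inner set is written as a frozenset.
nested = {frozenset({(1, 1)})}

band = {"Courtney Love", "Eric Erlandson", "Kristen Pfaff", "Patty Schemel"}
print("Patty Schemel" in band)
```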
A dictionary, also known as a map, is an unordered collection of key-value pairs, designed for looking up the value associated with a given key, in which all of the keys are distinct. Here are some examples:
{CA => "Sacramento", HI => "Honolulu", NM => "Santa Fe"}
{north => (0, -1), east => (1,0), south => (0, 1), west => (-1, 0)}
{3 => 1, 4 => 2, 5 => 5, 6 => 8, 7 => 11}
{"the" => 2088, "a" => 2115, "so" => 1022, "I" => 888}
In practice, dictionary key types are often limited to symbols, numbers, and strings (we used all three above). Some programming languages allow additional kinds of keys, though usually still from a restricted set. Typically, languages require keys to be “hashable”.
Dictionaries look like records, but the intent of a record is to represent a thing, while a dictionary represents a collection of things.
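Here is a small Python sketch of dictionary lookup and of the hashability requirement. Python has no bare symbols, so strings stand in for the symbolic keys; the helper is_usable_as_key is a hypothetical name introduced for this example.

```python
capitals = {"CA": "Sacramento", "HI": "Honolulu", "NM": "Santa Fe"}
directions = {"north": (0, -1), "east": (1, 0), "south": (0, 1), "west": (-1, 0)}

# Lookup: given a key, get its associated value.
capital_of_hawaii = capitals["HI"]
step = directions["east"]

def is_usable_as_key(value):
    """Report whether Python would accept value as a dictionary key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

# An immutable tuple is hashable; a mutable list is not.
print(is_usable_as_key((1, 2)), is_usable_as_key([1, 2]))
```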
Here are two records:
Nice, but wait—do these two kids have the same pet, or two different pets that coincidentally happen to have the same name, breed, weight, and colors? The picture suggests two distinct pets. If the kids share the same pet, we’d want this picture:
This picture illustrates a new kind of value, called a reference. A reference value is pictured as an arrow that refers to another value, called its referent. Here is a reference referring to the string value "Hi!":
In some, but by no means all, languages, you make a reference to a value with &, for instance &"Hi!". Given a reference r, you get its referent with *r. These are by far the most common notations, but as always, beware, as many syntactic variations do exist.
It gets worse, though. Some programming languages make it hard to tell whether you even have a reference or not! Some languages implicitly create references for you and implicitly dereference them (that is, get the referent). It can get really confusing. That’s why you should really learn this stuff at a deep level. It is imperative that you understand how each language you program in handles references. We’ll have much more to say about these things later.
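Python is one of the languages that creates and follows references implicitly, with no & or * in sight. The sketch below revisits the shared-pet question under that model. The kids’ names and the exact pet fields are hypothetical, chosen only for illustration.

```python
# Two separate pet values that merely happen to look alike.
rex = {"name": "Rex", "breed": "G-SHEP"}
other_rex = {"name": "Rex", "breed": "G-SHEP"}
looked_alike = rex == other_rex

# Hypothetical kids: the first two hold references to the SAME pet,
# the third refers to a distinct pet.
kid_a = {"name": "A", "pet": rex}
kid_b = {"name": "B", "pet": rex}
kid_c = {"name": "C", "pet": other_rex}

# A change made through one reference is visible through every other
# reference to the same referent, and invisible elsewhere.
kid_a["pet"]["breed"] = "MIX"

# "is" compares references (same referent?); "==" compares the values.
print(kid_a["pet"] is kid_b["pet"], kid_a["pet"] is kid_c["pet"])
print(kid_b["pet"]["breed"], kid_c["pet"]["breed"])
```

Nothing in the syntax distinguishes the shared case from the distinct one, which is why it pays to know how a given language treats references.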
Something wicked
Here is something just awful:
It is a reference with no referent. It is called the null reference, known also as The Billion Dollar Mistake, and The Worst Mistake of Computer Science. Avoid this demon 👹 at all costs. It is beyond disgusting. It has caused great pain 😖 and economic loss 💸. It should never, ever, have been allowed to exist 🤮😢.
A billion dollars was the estimate in 2009:
null has to be closer to a trillion dollar mistake at this point
— ThePrimeagen (@ThePrimeagen) June 23, 2023
Did you notice that when looking at values just now, we couldn’t help but classify them into numbers, characters, symbols, sequences, records, etc? The classification was very informal, but...it was hard to miss. There is, though, a very rigorous notion behind this classification. Every value, it turns out, has a type.
Types are one of the most important concepts in Programming Language Theory. They feel obvious and simple, but the theory of types is so vast, so deep, and so fundamental to computing and programming languages that we’ll be covering it later.
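As a small preview, many languages let you ask a value for its type. The Python sketch below does this for one value of each kind seen on this page. The type names are Python’s own and will differ in other languages.

```python
values = [
    21, 3.14159, True, None, "Hello",
    (3, True), [3, 5, 7], {1, 2}, {"CA": "Sacramento"},
]

# Every value has a type; ask each value for the name of its type.
type_names = [type(v).__name__ for v in values]
print(type_names)
```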
Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.
'c', '\x63', '\u0063', '\U00000063', '\u{63}'
&5
We’ve covered: