cpp-logo.png

Introduction to C++

C++ is an interesting language — a systems language with a large number of features to make it useful in higher-level applications.

Why C++

“C++ was designed so that the author and his friends would not have to program in Assembler, C, or various modern high-level languages. Its main purpose is to make writing good programs easier and more pleasant for the individual programmer.” — Bjarne Stroustrup

Overview

C++

Hello, World

Our obligatory first program:

hello.cpp
#include <iostream>

int main() {
  std::cout << "Hello, World\n";
}

To compile, just enter:

$ g++ -std=c++14 hello.cpp

This produces a.exe on Windows systems and a.out on most other systems. While developing small programs and running through your program-and-run cycle, use (on non-Windows shells):

$ g++ -std=c++14 hello.cpp && ./a.out
Hello, World

More Examples

Let’s get a feel for what C++ programs “look like.” This program prompts you for an integer then displays it with roman numerals:

roman.cpp
#include <iostream>
#include <string>
#include <vector>
#include <utility>
#include <stdexcept>

using namespace std;

string to_roman(int n) {
  if (n <= 0) {
    throw out_of_range("Romans don't have negatives");
  } else if (n > 1E6) {
    throw out_of_range("Number is too big to bother with");
  }

  vector<pair<int, string>> mapping = {
    {1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"}, {100, "C"}, {90, "XC"},
    {50, "L"}, {40, "XL"}, {10, "X"}, {9, "IX"}, {5, "V"}, {4, "IV"}, {1, "I"}
  };

  string numeral;                 // initialized to empty string
  for (auto p: mapping) {         // auto nicer than pair<int, string>
    while (n >= p.first) {
      numeral += p.second;        // Yes, strings are mutable and += concats
      n -= p.first;
    }
  }
  return numeral;
}

int main() {
  int n;
  cout << "Enter an integer:\n";
  cin >> n;
  try {
    cout << n << " is " << to_roman(n) << '\n';
  } catch (out_of_range e) {
    cout << "unprocessable: " << e.what() << '\n';
    return -1;
  }
}

Compile and run:

$ g++ -std=c++14 roman.cpp && ./a.out
Enter an integer:
9489
9489 is MMMMMMMMMCDLXXXIX

Let’s print some prime numbers:

primes.cpp
// This program displays all the prime numbers up to and including 1000,
// using the famous algorithm of Erathostenes.

#include <iostream>
#include <iomanip>
#include <vector>

using namespace std;

// The sieve is an array of 1001 bool values, indexed from 0 to 1000, all
// initialized to the value true.
vector<bool> sieve(1001, true);

// Write the value false in each slot of the vector corresponding to
// a composite number.  First, 0 and 1 are composite by definition. Then
// for each value starting with 2, if the value is still thought to be
// prime, we write false in each slot corresponding to its multiples.
void check_off_composites(vector<bool>& s) {
  s[0] = false;
  s[1] = false;
  for (auto i = 2; i * i < s.size(); i++) {
    if (s[i]) {
      for (auto j = i + i; j < s.size(); j += i) {
        s[j] = false;
      }
    }
  }
}

// Writes out all the values which correspond to positions in a vector
// containing the value true. Each value is written to standard output in
// a field of 8 characters.
void display_true_indices(const vector<bool>& s) {
  for (auto i = 0; i < s.size(); i++) {
    if (s[i]) {
      cout << setw(8) << i;
    }
  }
  cout << '\n';
}

int main() {
  check_off_composites(sieve);
  display_true_indices(sieve);
}

Here’s the useful map class from the Standard Library:

colors.cpp
// This program asks the user how to say some primary colors in Spanish
// then checks the user's answers to see if they are correct.

#include <iostream>
#include <string>
#include <map>

using namespace std;

// We need a dictionary to store the word mappings. The English word will
// be the key and the Spanish word will be the value.
map<string, string> dictionary = {
  {"blue", "azul"}, {"red", "rojo"}, {"green", "verde"}
};

// Returns the lower case version of a given string. Don't actually use this
// method; it's only here to illustrate that you CAN iterate through characters
// of a string if you want to.
string lower_case_string(string s) {
  string lower_cased;
  for (char c: s) {
    lower_cased += tolower(c);
  }
  return lower_cased;
}

// Asks the question and writes whether the answer was correct or not.
void test_user(string english_word) {
  string answer;
  cout << "How do you say " << english_word << " in Spanish?\n";
  cin >> answer;
  cout << "\"" << answer << "\" is ";
  if (lower_case_string(answer) == dictionary[english_word]) {
    cout << "correct\n";
  } else {
    cout << "is wrong; it's " << dictionary[english_word] << "\n";
  }
}

int main() {
  // Here is an illustration of a reference so we don't copy the key-value pair.
  for (auto& pair: dictionary) {
    test_user(pair.first);
  }
}

Files and streams and command line arguments:

average.cpp
// This program displays the average of a bunch of floating point numbers.
// If no commandline arguments are given, it prompts the user for a file
// name and averages the numbers from that file. If exactly one command line
// argument is given, it takes the argument to be a file name and averages
// the numbers in that file. If more than one command line argument is
// given, it assumes they are all numbers and displays the average of them.
//
// Files of numbers must contain ONLY numbers (and whitespace), if any other
// character appears the program will fail with an error message. When
// averaging command line arguments, each argument must be a legal floating
// point number without trailing whitespace.
//
// The program returns 0 on success, and -1 on failure.

#include <fstream>
#include <iostream>
#include <string>
#include <sstream>

using namespace std;

// Returns the average of the numbers in the file named by filename. Assumes
// the file contains ONLY numbers (and whitespace). If it finds any other
// characters it throws an exception.
double average_from_file(string filename) {

  // Open the file if you can. If you can't, throw a string stating that
  // the file that could not be opened.
  ifstream file;
  file.open(filename.c_str());
  if (!file) {
    throw string("File '" + filename + "' not found");
  }

  // Accumulate the sum and count the values throughout the file.
  // Immediately throw if any problems are found.
  double sum = 0.0;
  int number_of_values = 0;
  double value;
  while (true) {
    file >> value;                      // get the next float value
    if (file.bad()) {                   // always check for trashed files
      throw string("Corrupted file");
    }
    if (file.fail()) {                  // couldn't read a float
      if (file.eof()) {                 // if it's because you're at eof
        break;                          // ... that's actually fine
      }
      throw string("Garbage in file");
    }
    sum += value;
    number_of_values++;
  }

  // Return the answer if you can compute it, otherwise throw.
  if (number_of_values == 0) {
    throw string("Nothing to average");
  }
  return sum / number_of_values;

  // Note there is no need to explicitly close the ifstream in C++ because of
  // something called RAII. The destructor for f will automatically be called
  // at the end of this function, and the destructor closes the stream.
}

// Computes the average of the C-strings in positions 1 .. argc-1 in the array
// argv interpreted as floats.
double average_of_command_line_arguments(int argc, char** argv) {
  double sum = 0.0;
  for (auto i = 1; i < argc; i++) {
    istringstream argument(argv[i]);
    double value;
    argument >> value;

    // Ensure it is a legal float without trailing characters.
    if (argument.bad() || argument.fail() || !argument.eof()) {
      throw string("Non-numeric command line argument ") + argv[i];
    }
    sum += value;
  }
  return sum / (argc - 1);
}

// The main function interprets the command line, defers to one of the helper
// routines to compute the average, then displays either the average or any
// error messages that were thrown.
int main(int argc, char** argv) {
  double average;
  try {
    if (argc == 1) {
      string filename;
      cout << "What file do you want to average the values in? ";
      cin >> filename;
      average = average_from_file(filename);
    } else if (argc == 2) {
      average = average_from_file(argv[1]);
    } else {
      average = average_of_command_line_arguments(argc, argv);
    }
    cout << "The average is " << average << '\n';
  } catch (string s) {
    cerr << "Error: " << s << "\n";
    return -1;
  }
}

This example uses a lambda:

countlowercases.cpp
#include <iostream>
#include <algorithm>
using namespace std;

int main(int argc, char** argv) {
  if (argc != 2) {
    cerr << "This program requires exactly one command line argument\n";
    return -1;
  }
  char* word = argv[1];
  int lowercase_count = 0;
  for_each(word, word + strlen(word), [&lowercase_count] (char c) {
    if (islower(c)) lowercase_count++;
  });
  cout << lowercase_count << " lowercase letters in: " << word << endl;
}

Declarations and Definitions

A C++ program resides in a set of files. Each file is a sequence of top-level declarations. A declaration introduces a name into a program. A declaration can be definitional or non-definitional. A definition actually creates a thing. Here are examples of definitional declarations:

definitions.cpp
// Examples of C++ declarations that are ALSO definitions.

#include <cassert>

// Object declarations + definitions
int x = 100;
int y;

// Function declaration + definition
double triple(double x) {
  return x * 3;
}

// Type declaration + definition
struct Point {
  double x;
  double y;
};

// Namespace declaration + definition
namespace Example {
  int x = 200;
  Point p {3, 5};
  Point q;
}

int main() {
  assert(x == 100 && y == 0 && Example::x == 200);
  assert(triple(2.5) == 7.5);
  assert(Example::p.x == 3 && Example::p.y == 5);
  assert(Example::q.x == 0 && Example::q.y == 0);
}

There are only five kinds of non-definitional declarations: which can be (1) type declarations, (2) object declarations, (3) function declarations, (4) namespace declarations, or (5) directives.

DeclarationExampleExplanation
Incomplete Class
class Dog;
This says a class called Dog exists, and you can use it to declare variables of type Dog* for instance. The class has to be eventually defined somewhere else in the program, of course.
External Variable
extern int x;
This says a variable x is defined elsewhere. You can start using it now though. It must be eventually defined and available at run time. There can only be one global variable named x. If it is used in multiple compilation units, all but one must have an extern declaration.
Function Prototype
double mag(Vector v);
This specifies how the function mag is called and what it returns. It has to be defined elsewhere, of course.
Typedef
typedef struct {
  double x;
  double y;
} point;
Simply means that the word point is an abbreviation for the type expression.
Static Data Member
class C {
  static int x;
};
Somewhat surprisingly, the static x member must be defined outside of the class.

Entities need to be declared before they are used. (That’s why the #include directive is so important). Every top level name must have exactly one definition, so you can’t duplicate a global variable definition, and you can’t declare something extern in one file and forget to supply it in another file when the program is linked.

Identifiers have:

Details here.

Types

There are a lot (More information):

Built-in
bool,
char, unsigned char, signed char, wchar_t, char16_t, char32_t,
short int, signed short int, unsigned short int,
int, signed int, unsigned int,
long int, signed long int, unsinged long int,
float, double, long double, long long double
Pseudo type
void
Type formers
enum
*                      (plain pointer)
[ ]                    (array)
&                      (reference - not really a new type)
( )                    (function)
[](){}                 (lambda function)
union
struct
class

Basic Types

Nothing much surprising here, save for a few C++ syntax quirks to learn by example:

basictypes.cpp
#include <cassert>

int main() {
  bool young = (1 == 2) || true && false;

  char c = '%';                     // Old initialization syntax
  char d('#');                      // Modern initialization syntax
  char e {'$'};                     // Hypermodern initialization syntax

  int x = 50 + int(c);              // Type conversion
  long y = 999999999999999999;
  double z(353.13411);
  short q;                          // Initializer not required

  char16_t thorn = u'Þ';            // L prefix required, codepoint > 127
  char16_t infinity = u'∞';         // Here too
  char32_t winky = U'😉';           // And here

  assert(young == false);
  assert(c == '%' && d == '#');
  assert(q == 0);                   // If no initializer, zero value is used
  assert(double(x) == 87);
  assert(thorn == 0xde && infinity == 0x221e && winky == 0x1f609);
}

The sizes of the types, other than for char, which is defined to be represented in one byte, are not guaranteed. You can write code to find the type range, though, making use of definitions in the climits module. If you like, you can use some fixed-size types from the stdint module, too:

numerictypes.cpp
#include <cstdint>
#include <climits>
#include <cassert>
#include <iostream>

using namespace std;

#define show_type_range(t, min, max)\
    cout << #t << " (" << sizeof(t) << " bytes): " << min << ".." << max << '\n';

int main() {
  show_type_range(signed char, SCHAR_MIN, SCHAR_MAX)
  show_type_range(unsigned char, 0, UCHAR_MAX)
  show_type_range(char, CHAR_MIN, CHAR_MAX)
  show_type_range(short, SHRT_MIN, SHRT_MAX)
  show_type_range(unsigned short, 0, USHRT_MAX)
  show_type_range(int, INT_MIN, INT_MAX)
  show_type_range(unsigned int, 0, UINT_MAX)
  show_type_range(long, LONG_MIN, LONG_MAX)
  show_type_range(unsigned long, 0, ULONG_MAX)
  show_type_range(long long, LLONG_MIN, LLONG_MAX)
  show_type_range(unsigned long long, 0, ULLONG_MAX)

  show_type_range(int8_t, INT8_MIN, INT8_MAX)
  show_type_range(uint8_t, 0, UINT8_MAX)
  show_type_range(int16_t, INT16_MIN, INT16_MAX)
  show_type_range(uint16_t, 0, UINT16_MAX)
  show_type_range(int32_t, INT32_MIN, INT32_MAX)
  show_type_range(uint32_t, 0, UINT32_MAX)
  show_type_range(int64_t, INT64_MIN, INT64_MAX)
  show_type_range(uint64_t, 0, UINT64_MAX)
  show_type_range(intmax_t, INTMAX_MIN, INTMAX_MAX)
  show_type_range(uintmax_t, 0, UINTMAX_MAX)
}

That code used a macro. Woooooaaaahhh. On my MacBook Pro, I got:

$ g++ -std=c++14 numerictypes.cpp && ./a.out
signed char (1 bytes): -128..127
unsigned char (1 bytes): 0..255
char (1 bytes): -128..127
short (2 bytes): -32768..32767
unsigned short (2 bytes): 0..65535
int (4 bytes): -2147483648..2147483647
unsigned int (4 bytes): 0..4294967295
long (8 bytes): -9223372036854775808..9223372036854775807
unsigned long (8 bytes): 0..18446744073709551615
long long (8 bytes): -9223372036854775808..9223372036854775807
unsigned long long (8 bytes): 0..18446744073709551615
int8_t (1 bytes): -128..127
uint8_t (1 bytes): 0..255
int16_t (2 bytes): -32768..32767
uint16_t (2 bytes): 0..65535
int32_t (4 bytes): -2147483648..2147483647
uint32_t (4 bytes): 0..4294967295
int64_t (8 bytes): -9223372036854775808..9223372036854775807
uint64_t (8 bytes): 0..18446744073709551615
intmax_t (8 bytes): -9223372036854775808..9223372036854775807
uintmax_t (8 bytes): 0..18446744073709551615

Structs

Lots to see here. IMPORTANT: structs, like bools and numbers, have value semantics, meaning that on assignment, passing to functions, and returning from functions, copies are made!

structs.cpp
#include <cassert>

struct Point {
  double x;
  double y;
};

void demo_struct_assignment_is_a_copy() {
  Point p {5, 8};                 // Allocate whole struct here in stack
  Point q;                        // Allocated as {0, 0}
  q = p;                          // Assignment makes a COPY
  q.x = 34;                       // So if we change a field of q...
  assert(p.x == 5);               // ...nothing changed in p
}

void some_function(Point p, double dx, double dy) {
  p.x += dx;                      // p is a local variable whose fields were...
  p.y += dy;                      // ...copied, so these change the local only
}

void demo_structs_are_passed_by_value() {
  Point myPoint {13, 55};
  some_function(myPoint, 10, 20); // Because a copy is made on passing...
  assert(myPoint.x == 13);        // ...local point is unchanged
}

int main() {
  demo_struct_assignment_is_a_copy();
  demo_structs_are_passed_by_value();
}

Are there ways to prevent these copies? Yep, you can use references or pointers.

References

References are strange beasts. Basically a reference is an alias. A reference is not a new object; rather, it’s just another name for an existing object. Therefore, you can use it to write a function to modify its argument (if you are into that kind of thing 😨), or pass a (possibly massive) struct without making an expensive copy:

references.cpp
#include <cassert>

void demo_simple_references() {
  int x;                          // Typical variable
  int& y = x;                     // x and y alias the exact same memory
  y = 100;                        // So this "changes x also"
  assert(x == 100);               // And here's the proof
  x = 5;                          // Works the "other way around" of course
  assert(y == 5);                 // No real magic, x and y are one
}

void f(int x, int& y) {
  x = 100;                        // arg was copied to x, so arg is safe
  y = 200;                        // y aliases it arg, so arg will change
}

void demo_reference_parameter_changes_its_argument() {
  int a = 1, b = 2;
  f(a, b);
  assert(a == 1);                 // Unchanged, copy of a was passed
  assert(b == 200);               // Changed, b and the parameter are one
}

// References matter a LOT in the world of huge objects
struct Thing { int x; double y[1000000]; int z[5000]; };

void g(const Thing& thing) {      // Reference parameter avoids the copy
  assert(thing.y[500] == 0);
}

void demo_pass_a_huge_struct_by_reference() {
  Thing t;                        // Tons of memory allocated here
  g(t);                           // No copy made, yay
}

int main() {
  demo_simple_references();
  demo_reference_parameter_changes_its_argument();
  demo_pass_a_huge_struct_by_reference();
}

References come in two flavors, & and &&. The latter is for rvalues only; the former will take either lvalues or rvalues. If overloaded, the rvalue-only one takes precedence for rvalues:

referencesinparameters.cpp
#include <cassert>

// f has only the lvalue version defined
int f(const int& x) { return 10; }

// g had both lvalue and rvalue versions defined
int g(const int& x) { return 20; }
int g(const int&& x) { return 30; }

// h has only the rvalue version defined
int h(const int&& x) { return 40; }

int main() {
  int x = 42;

  assert(f(x) == 10);             // ok, x is an lvalue
  assert(f(42) == 10);            // ok, rvalue can be passed to &
  assert(g(x) == 20);             // ok, lvalue prefers &
  assert(g(42) == 30);            // ok, rvalue prefers &&
  // assert(h(x) == 40);          // won't compile, cannot pass lvalue to &&
  assert(h(42) == 40);            // ok, 42 is an rvalue
}

We’ll see what the big deal is soon.

Pointers

To understand C++ you must master the concept of static storage, stack storage, and heap storage.

Raw Pointers

First, we’ll go over this example, featuring the built-in “raw” pointers, in class in excruciating detail:

raw_pointers.cpp
#include <iostream>
#include <cassert>
using namespace std;

void demo_pointer_basics() {
  int x = 10;
  int* p = &x;                    // p "points to" x
  *p = 7;                         // this changes x
  assert(x == 7);                 // we say *p and x are aliases of each other

  p = nullptr;                    // p pointes nowhere, *p will probably crash!

  p = new int;                    // allocate memory at run time
  *p = 12;                        // write to allocated memory
  int* q = p;                     // p and q point to same place
  *q = 10;                        // changes *p, too
  assert(p == q && *p == 10);

  q = new int {20};               // allocate and initialize at same time
  assert(p != q && *q == 20);     // p and q no longer point to same place
  *q = 10;
  assert(p != q && *p == *q);     // obviously p was not affected

  delete p;                       // the pointer vars were local, must clean up
  delete q;

  // If we didn't free up p and q we would have a memory leak
}

void f(int a, int* b, int* c) {
  a = 101;                        // does not affect the argument
  *b = 102;                       // this change can be seen through the argument
  c = &a;                         // no visible effect at all from the outside
}

void demo_pointer_arguments() {
  int x = 1;
  int y = 2;
  int* p = new int {50};
  f(x, &y, p);
  assert(x == 1);                 // unchanged
  assert(y == 102);               // visible change
  assert(*p == 50);               // neither p nor *p affected
}

int* pointer_disaster() {
  // Horrible C or C++ coding here: returning a pointer to a local (NOOOOO!)
  int x = 8;
  int* p = &x;
  return p;
}

void demo_pointer_disaster() {
  int* z = pointer_disaster();    // Shudder, awful, ...
  cout << *z << '\n';             // this might be 8, but you NEVER KNOW
  demo_pointer_arguments();       // overwrites pointer_disaster stack frame! Yow!
  cout << *z << '\n';             // almost certainly not 8
}

void demo_pointer_to_struct() {
  struct Point {double x; double y;};
  Point* p = new Point {5, 8};
  assert((*p).x == 5);            // technically okay but ugly
  assert(p->x == 5);              // use this beautiful operator instead
  delete p;                       // Important! *p is on the heap!
  Point q {1, 2};
  assert(q.x == 1 && q.y == 2);   // No need to delete! q on stack, not heap
}

int main() {
  demo_pointer_basics();
  demo_pointer_arguments();
  demo_pointer_disaster();
  demo_pointer_to_struct();
}

Here’s a handy picture for what we’ve seen so far:

pointer.png

If the pointer
is called
Then the referent
is called
and the field
is called
p*p(*p).x or p->x
If the referent
is called
Then the pointer
is called
and the field
is called
p&pp.x

Pointers as Members

Wait wait wait wait wait. What if a member of a struct is a pointer? At least three huge problems surface that we need to be aware of:

  1. If the struct goes away (because, say, its enclosing function ends), we have to remember to deallocate its pointer member’s referents otherwise we’d have a memory leak.
  2. When the struct is copied (via initialization, argument passing, function return, or assignment), only the pointer (address), not the referent, is copied. This may result in unintended sharing of state!. Ideally, we must either (1) prohibit copying and assignment, or (2) cause copying and assignment to (magically) make copies of the referents behind the scenes.
  3. If we do arrange for copying to happen automatically, we might introduce efficiency problems! Returning a value and immediately assigning it, for example, would generate an extra copy. So would swapping. In some cases, it is better to move than copy.
Copy
When copying source to target, copies are made, that is, new resources are created, of all of the source’s internal resources and these new resources are assigned to target. The source is unchanged.
Move
When moving source to target, the target can simply pilfer the resources of the source, leaving the source empty. This is done by just setting the pointers inside target to the resources owned by source, then unhooking source’s pointers. Moves are faster because nothing new is created! (IRL moves are common: money is moved between accounts, batteries are moved between devices, physical things that are gifted or borrowed are moved between people.)

Let’s first look at the built-in, low-level mechanisms for dealing with these situations.

Constructors, Destructors, and Assignment

In either case, the existence of pointers is hidden. In the latter case, you have made the entire object look like it has value semantics! To pull the latter scenario off, you need to:

And furthermore, to be a good citizen, you should also:

Learn by example:

managedstructs.cpp
#include <cassert>
#include <iostream>

// This script is for exploration and study; here are some trackers that help.
int destructors_called = 0;
int copy_constructors_called = 0;
int move_constructors_called = 0;

enum Color {RED, GREEN, BLUE};

// A box has a color, and optionally, some contents. Since the box may be empty,
// we represent the contents by storing a pointer in the box representation. A
// null pointer indicates an empty box.
template<typename T>
struct Box {
private:
  Color color;
  T* content;
public:
  // Constructors
  Box(Color color = RED): color(color), content(nullptr) {}
  Box(Color color, T data): color(color), content(new T(data)) {}

  // Copy constructor - note how it makes a new content object. Called when
  // initializing with another box, passing by value, or returning.
  Box(const Box& other): color(other.color), content(new T(*other.content)) {
    copy_constructors_called++;
  }

  // Move constructor - steals ownership of the resource from the other box
  Box(Box&& other): color(other.color), content(other.content) {
    other.content = nullptr;
    move_constructors_called++;
  }

  // Destructor: automatically called when a box's lifetime ends. This could
  // be when a box is a local variable and its enclosing function ends, or the
  // box is in dynamic memory that is being deallocated.
  ~Box() {
    destructors_called++;
    delete content;
  }

  // Copy assignment operator. Assignment differs from construction because in
  // assignment, an object already exists, which we have to clean up first.
  Box& operator=(const Box& other) {
    if (&other == this) return *this;
    delete content;                         // Avoid memory leak
    color = other.color;
    content = other.content ? new T(*other.content) : nullptr;
    return *this;
  }

  // Move assignment operator.
  Box& operator=(Box&& other) {
    if (&other == this) return *this;
    delete content;                         // Avoid memory leak
    color = other.color;
    content = other.content;
    other.content = nullptr;
    return *this;
  }

  // Since we are creating a box that has value semantics, here’s an equality
  // operator that works “by value”
  bool operator==(const Box& other) {
    return color == other.color && (
      (content == nullptr && other.content == nullptr) ||
      (content && other.content && *content == *other.content)
    );
  }

  Color get_color() {return color;}
  bool is_empty() {return content == nullptr;}
  T get_content(T default_value) {return content ? *content : default_value;}
  void set_content(T data) {if (content) *content = data; else content = new T(data);}
  void remove_content() {content = nullptr;}
};

void demo_destructors_called() {
  Box<double> b1;
  Box<double> b2(GREEN);
  Box<double> b3(RED, 23.0);
  // Three destructors automatically called here, no memory leaks
}

void demo_copy_constructor_on_initialization() {
  Box<double> b4(GREEN, 55.0);            // local, defined in stack frame
  Box<double> b5(b4);                     // initializes b5 with copy of b4
  b5.set_content(88.0);                   // This copy should not affect the original
  assert(b4.get_content(0) == 55.0);      // Oh good
}

Box<int> f(Box<int> b) {
  assert(copy_constructors_called == 2);  // Copy constructor called when passing in
  b.set_content(20);
  return b;
}

void demo_constructors_on_calls_and_returns() {
  Box<int> b6(GREEN, 925.0);
  assert(copy_constructors_called == 1);  // Just the one from before
  Box<int> b7 = f(b6);                    // Move constructor on the return
  assert(copy_constructors_called == 2);
  assert(move_constructors_called == 1);
  assert(b7.get_content(0) == 20);
}

int main() {
  Box<double> b0(BLUE, 1.0);              // Just a regular costructor
  assert(copy_constructors_called == 0);  // We didn't copy
  assert(destructors_called == 0);        // And it lives as long as main does

  demo_destructors_called();              // This function will create 3 local boxes
  assert(copy_constructors_called == 0);  // without making any copies
  assert(destructors_called == 3);        // They all got destroyed at end of function

  demo_copy_constructor_on_initialization();
  assert(copy_constructors_called == 1);
  assert(destructors_called == 5);

  demo_constructors_on_calls_and_returns();
}

Just Saying No To Copies

Sometimes, instead of copying, you want to prohibit copying. In this case, you would mark both the copy costructor and the assignment operator deleted:

managedstructswithoutcopy.cpp
#include <cassert>

enum Color {RED, GREEN, BLUE};

template<typename T>
struct Box {
private:
  Color color;
  T* content;
public:
  Box(Color color): color(color), content(nullptr) {}
  Box(Color color, T data): color(color), content(new T(data)) {}
  ~Box() { delete content; }
  Box(const Box&) = delete;
  Box& operator=(const Box&) = delete;

  Color get_color() {return color;}
  bool is_empty() {return content == nullptr;}
  T get_or_default(T default_value) {return content ? *content : default_value;}
  void add_or_replace_content(T data) {if (content) *content = data; else content = new T(data);}
  void remove_content() {content = nullptr;}
};

int main() {
  Box<double> box1(GREEN, 21.0);
  Box<double> box2(BLUE);
  // Box<double> box3(box1);  //  ------> Will not compile, constructor deleted
  // box1 = box2;             //  ------> Will not compile, assignment operator is deleted
}
Exercise: We prohibited copying of boxes with this code. But can these boxes be moved?

Smart Pointers

It’s a hell of a lot of work to track pointers in structs ourselves. So error prone, too! We might forget to delete or override one or more of the copy constructors, move constructors, or assignment operator! We might even forget the destructor and leak memory all over the place. Shouldn’t a resource be able to deallocate itself when it’s unreachable?

Yep. There are two kinds of smart pointers:, unique_ptr and shared_ptr.

unique_ptr
When a unique pointer can’t be referenced anymore, its associated referent will be deallocated. Only one unique_ptr can point to a resource at a time, so it is a compile-time error to copy a unique_ptr. They can be moved, though.
shared_ptr
When the “last” shared_ptr referencing a resource can no longer be accessed, the resource will be deallocated.
smartpointers.cpp
#include <cassert>
#include <memory>
using namespace std;

int destructors_called = 0;

struct Box {
  ~Box() { destructors_called++; }
};

void unique_pointer_demo() {
  unique_ptr<Box> p(new Box);
  // unique_ptr<Box> q(p);            // ------> Compile-time error,
}                                     //  no unique_ptr copy constructor

void shared_pointer_demo() {
  shared_ptr<Box> p(new Box);
  shared_ptr<Box> q(p);               // q & p both reference first box
  if (true) {
    shared_ptr<Box> r(new Box);       // second box
    shared_ptr<Box> s(q);             // now three pointers to first box
  }                                   // second box deallocated here
  assert(destructors_called == 2);    // but irst box Not deleted yet
}

int main() {
  unique_pointer_demo();
  assert(destructors_called == 1);
  shared_pointer_demo();
  assert(destructors_called == 3);
}

Arrays

It takes some effort to learn exactly what is meant by an array in C++. The term is messy because C++ inherited this array thing from C, then provided fancier classes in its standard library.

C-style Arrays

C++ inherited “raw arrays” from the C language:

arrays.cpp
// Illustration of arrays. NOTE: Arrays in C++ are low-level objects and
// should only be used in low-level code, such as when implementing useful
// data structures or which maps to the hardware. They are tricky and do not
// behave like the other types. But you have to learn about them as everyone
// uses them and you have to read other people's code, and even write some
// low-level code yourself. In general, you will use vector and valarray
// from the standard library for application-level code.

#include <cassert>
#include <algorithm>              // for copy and fill

int a[10];                        // an array of 10 ints (global)
typedef int triple[3];            // a type declaration, NOT an object
triple t1, t2;                    // two more global arrays, each {0,0,0}
triple ones = {1, 1, 1};          // global array, explicitly initialized

void demo_local_arrays() {
  // Space for a and b allocated in the stack frame
  int a[50];
  int b[20][10];
  assert(b[2][5] == 0);
  // At the end of this function a and b are automatically gone
}

void demo_dynamically_allocated_arrays() {
  int* a = new int[5]{2, 3, 5, 7, 11};
  assert(a[3] == 7);              // Surprise!! (*a)[2] is actually wrong!
  assert(*a == 2);                // So arrays are like...pointers?
  assert(*(a + 3) == 7);          // Pointer arithmetic, amazing!
  assert(*(3 + a) == 7);          // And it is commutative, too
  assert(3[a] == 7);              // Surprised?
  delete[] a;                     // Must do this to free up the 'new' memory
}

void demo_global_and_local_arrays_are_like_pointers_too() {
  a[7] = 20;
  assert(a[7] == 20);
  assert(*(a + 7) == 20);
  assert(*(7 + a) == 20);
  assert(7[a] == 20);
  bool b[] = {true, false, false, true};
  assert(*(b + 2) == false);
}

void demo_array_assignment() {
  // Arrays are not assignable
  triple a = {10, 20, 30};
  triple b = {1, 2, 3};
  // a = b; ---> Compiler error: "array type is not assignable"

  // But playing with pointers can get you into trouble
  int* p = a;
  p[2] = 100;
  assert(a[2] == 100);            // Well, ugh, perhaps.
  const triple c = {1, 2, 3};
  // p = c; ---> Compiler error: assigning to 'int *' from incompatible type 'const triple'

  // There is a way to make an array copy though, using the copy function,
  // however, this only works if you know the array's length. You better
  // know the length because this will not be checked for you.
  std::copy(b, b + 3, a);
  assert(a[0] == 1 && a[1] == 2 && a[2] == 3);
}

// Hmmm, what do you think this does?
void move(triple t, int dx, int dy, int dz) {
  t[0] += dx;
  t[1] += dy;
  t[2] += dz;
}

void demo_arrays_act_like_pointers_when_passed() {
  triple c = {1, 2, 3};
  move(c, 100, 100, 100);
  assert(c[0] == 101 && c[1] == 102 && c[2] == 103);
  // Why did c change even though the first parameter to move was not a
  // reference parameter?  Because the array name is actually treated
  // as a pointer to the first element of the array!

  // Here's something interesting:
  std::fill(a, a + 10, 100);
  move(a, 10, 10, 10);
  assert(a[0] == 110 && a[1] == 110 && a[2] == 110);
  move(&a[7], 10, 20, 30);
  assert(a[7] == 110 && a[8] == 120 && a[9] == 130);
}

int main() {
  // Show that globals are initialized to zero-values
  assert(a[9] == 0);
  assert(ones[1] == 1);

  demo_local_arrays();
  demo_dynamically_allocated_arrays();
  demo_global_and_local_arrays_are_like_pointers_too();
  demo_array_assignment();
  demo_arrays_act_like_pointers_when_passed();
}

Arrays in the Standard Library

It’s super important to understand the raw arrays in order to feel close to the machine. In practice, you will more often use sequence classes from the Standard Library that are built on top of the raw arrays. All of these classes represent efficiently indexable mutable sequences that occupy contiguous memory blocks:

stlarrays.cpp
#include <cassert>
#include <array>
#include <vector>
#include <valarray>
using namespace std;

void demo_arrays() {
  array<int, 2> a = { 5, 8 };
  array<int, 2> b = { 5, 9 };
  array<int, 5> c = { 1, 2, 55, 2, 8 };
  assert(b[1] == 9);
  assert(a < b && b >= a && a != b);        // comparison of same-sized arrays
  b.fill(2);                                // 2 2
  assert(b[0] == 2 && b[1] == 2);
  b = a;                                    // assignment is a copy
  assert(&a != &b && a == b);               // prove that it's a copy
}

void demo_vectors() {
  vector<double> v = { 3.0, 5.0, 8.0 };
  v.push_back(13.0);                        // 3 5 8 13
  assert(v.size() == 4);
  assert(v[2] == 8.0);
  assert(v.front() == 3.0);
  assert(v.back() == 13.0);
  v.insert(v.begin() + 2, 144.0);           // 3 5 144 8 13
  v.pop_back();                             // 3 5 144 8
  vector<double> w = {3.0, 5.0, 144.0, 8.0};
  assert(&v != &w);                         // do not point to same place
  assert(v == w);                           // equality is by value
  w.pop_back();                             // 3 5 144
  v = w;                                    // assignment COPIES VALUES
  assert(v.size() == 3);                    // copied three elements over
  v.clear();                                // blow away v
  assert(v.size() == 0);                    // nothing left in v
  assert(v.empty());                        // another way to say that
  assert(w.size() == 3);                    // show that w not affected, yay!
}

void demo_valarrays() {
  valarray<int> a;                          // empty array
  valarray<int> b(5);                       // 0 0 0 0 0
  valarray<int> c(9, 5);                    // 9 9 9 9 9
  int primes[] = {2, 3, 5, 7, 11, 13};
  valarray<int> d(primes, 5);               // 2 3 5 7 11
  valarray<int> e = d * 3;                  // 6 9 15 21 33
  e += c;                                   // 15 18 23 30 42
  assert(e.size() == 5);
  assert(e.sum() == 129);
  e.shift(2);                               // 23 30 42 15 18
  e.shift(-1);                              // 18 23 30 42 15
  assert(e.max() == 42);
  e[4] = -2;                                // 18 23 30 -2 15
  assert(e.max() == 30);
  b = -e * d - c;                           // -39 -63 -129 -219 13
  assert(b.max()-b.min() == 232);
  b.resize(3, 10);                          // 10 10 10
  assert(b.size() == 3 && b.sum() == 30);
}

int main() {
  demo_arrays();
  demo_vectors();
  demo_valarrays();
}

Strings

In a managed language, you would think strings would be really hard. Well yeah, they are, at the basic level....

C-style Strings

As in C, byte arrays terminated with a zero-byte are called strings. Annoyingly, because arrays act like pointers these things have reference semantics. Ouch.

strings.cpp
// Illustration of the C-style strings in C++.

#include <cassert>
#include <cstring>
#include <iostream>
#include <iomanip>
using namespace std;

void inspect_8(const char* s) {
  cout << '\n' << s << "\nUTF-8:  ";
  while (*s) {
    cout << ' ' << hex << setfill('0') << setw(2) << (unsigned)(uint8_t)(*s++);
  }
  cout << "\n";
}

void inspect_16(const char16_t* s) {
  cout << "UTF-16: ";
  while (*s) {
    cout << ' ' << hex << setfill('0') << setw(4) << (unsigned)(uint16_t)(*s++);
  }
  cout << "\n";
}

void inspect_32(const char32_t* s) {
  cout << "UTF-32: ";
  while (*s) {
    cout << ' ' << hex << setfill('0') << setw(8) << (unsigned)(uint32_t)(*s++);
  }
  cout << "\n";
}

void demo_the_basic_string_as_byte_array() {
  char just_a_char_array[3] = {'d', 'o', 'g'};
  char actually_a_string[4] = {'d', 'o', 'g', (char)0};
  char this_is_also_a_string[4] = {'d', 'o', 'g', '\0'};
  char s[4] = "dog";

  // String literals are arrays with the zero at the end
  assert(s[0] == 'd' && s[1] == 'o' && s[2] == 'g' && s[3] == char(0));

  // The string takes up one byte per character + one for the zero
  assert(sizeof s == 4);

  // strlen does not count the zero
  assert(strlen(s) == 3);
}

void demo_char_stars_are_utf8() {
  const char* bad = "a\xf1o";         // DISASTER! Code point of ñ is F1 but...
  assert(strlen(bad) == 3);
  inspect_8(bad);                     // ...Prints wrong because invalid UTF-8

  const char* correct = "año";        // C++ knows what to do
  assert(strlen(correct) == 4);       // Because ñ is encoded as C3 B1
  assert(correct[0] == (char)0x61);   // 'a'
  assert(correct[1] == (char)0xc3);   // First byte of 2-byte encoding of ñ
  assert(correct[2] == (char)0xb1);   // Second byte of 2-byte encoding of ñ
  assert(correct[3] == (char)0x6f);   // 'o'
  inspect_8(correct);                 // Prints beautifully!
}

void show_strings_as_different_sizes() {
  inspect_8("año");
  inspect_16(u"año");
  inspect_32(U"año");
  inspect_8("kōpa`a");
  inspect_16(u"kōpa`a");
  inspect_32(U"kōpa`a");
  inspect_8("привет");
  inspect_16(u"привет");
  inspect_32(U"привет");
  inspect_8("世界");
  inspect_16(u"世界");
  inspect_32(U"世界");
  inspect_8("🐕🍱");
  inspect_16(u"🐕🍱");
  inspect_32(U"🐕🍱");
  inspect_8("🇦🇽🏃🏽‍♀️");
  inspect_16(u"🇦🇽🏃🏽‍♀️");
  inspect_32(U"🇦🇽🏃🏽‍♀️");
}

int main() {
  demo_the_basic_string_as_byte_array();
  demo_char_stars_are_utf8();
  show_strings_as_different_sizes();
}

Strings in the Standard Library

But good news, the standard library has a string type. It has value semantics and manages its own memory. Good times:

stlstrings.cpp
#include <cassert>
#include <string>
#include <iostream>
using namespace std;

void demo_the_basics() {
  string s1 = "Hello", s2("Goodbye"), s3, s4(s2, 4, 3);

  assert(s4 == "bye");
  s3 = s1;                                  // Assignment
  s3[1] = 'u';                              // Strings are...mutable?
  assert(s3 == "Hullo");                    // Yep, prove that they are mutable
  assert(s1 == "Hello");                    // Prove that assignment was a copy
  assert(s2.length() == 7);                 // Show the length method
  string s5 = s1 + ',' + " then " + s2;     // + is concatentation
  assert(s5 == "Hello, then Goodbye");
  s5.replace(5, 6, " and");                 // At pos 5, delete 6 chars, insert " and"
  assert(s5 == "Hello and Goodbye");
  assert(s1 > s2);                          // Lexicographic compare FTW
}

void demo_additional_functionality() {
  string s = "abcdefghij";
  assert(s.front() == 'a');
  assert(s.back() == 'j');
  assert(s.at(5) == 'f');
  assert(s[5] == 'f');
  s.insert(5, "OOO");
  assert(s == "abcdeOOOfghij");
  s.push_back('k');
  s.erase(5, 3);
  assert(s == "abcdefghijk");
  s.pop_back();
  s += "ZZZ";
  assert(s == "abcdefghijZZZ");
  assert(s.find('Z') == 10);
  assert(s.rfind('Z') == 12);
}

void demo_lengths() {
  string s1 = "José is a 🏄🏽 from 🇲🇽";
  u16string s2 = u"José is a 🏄🏽 from 🇲🇽";
  u32string s3 = U"José is a 🏄🏽 from 🇲🇽";
  assert(s1.length() == 33 && s2.length() == 24 && s3.length() == 20);
}

int main() {
  demo_the_basics();
  demo_additional_functionality();
  demo_lengths();
}

Functions

Functions are values in C++. You can use (1) regular functions, (2) function objects, and (3) lambdas.

Defined Functions as Values

functions.cpp
#include <cassert>

int square(int x) { return x * x; }
int times_two(int x) { return x * 2; }

typedef int fun(int);

int twice(fun f, int x) {       // more readable than twice(int f(int), x)
  return f(f(x));
}

int main() {
  assert(twice(square, 3) == 81);
  assert(twice(times_two, 10) == 40);
}

Function Objects

Any object of a class overloading operator() is called a function object and is callable:

functionobjects.cpp
#include <cassert>

class Line {
  double m;
  double b;
public:
  Line(double slope, double intercept): m(slope), b(intercept) {}
  double operator()(double x) { return m * x + b; }
};

int main() {
  Line line1(5, 3);
  Line line2(-2, 8);
  assert(line1(3) == 18 && line1(-2) == -7);
  assert(line2(3) == 2 && line2(-2) == 12);
}

Lambdas

A quick example first:

lambdas.cpp
#include <cassert>

auto do_nothing = [](){};
auto times_two = [](auto x){return x * 2;};

double square(double x) {return x * x;}

auto compose = [](auto f, auto g) {
  return [f, g](double x){return f(g(x));};
};

int main() {
  do_nothing();
  assert(times_two(21) == 42);
  auto square_then_times_two = compose(times_two, square);
  auto times_two_then_square = compose(square, times_two);
  assert(square_then_times_two(10) == 200);
  assert(times_two_then_square(10) == 400);
}

:Lambda expressions (in their basic form) look like this:

[ capture ]( parameters ) -> type { body }
[ capture ]( parameters ) { body }

The capture clause can be [] to capture nothing, [=] to capture everything by copy, [&] to capture everything by reference. You can also specify exactly what to capture, e.g., [x, &y] to capture x by copy and y by reference.

There’s more to it, but this will do for now.

Standard Library Algorithms and Functionals

TODO

Class Inheritance

Here’s a pretty standard example. A lot is going on, so we’ll go over it in class. One good thing to know: the C++ uses the terms base class and derived class instead of superclass and subclass.

animals.cpp
#include <sstream>
#include <string>
#include <cassert>
using namespace std;

class Animal {
protected:
  string name;
  virtual string sound() = 0;
public:
  Animal(string name): name(name) {}
  string speak() {
    return (stringstream() << name << " says " << sound()).str();
  }
};

class Cow: public Animal {
protected:
  string sound() override {
    return "moooo";
  }
public:
  Cow(string name): Animal(name) {}
};

class Horse: public Animal {
protected:
  string sound() override {
    return "neigh";
  }
public:
  Horse(string name): Animal(name) {}
};

class Sheep: public Animal {
protected:
  string sound() override {
    return "baaaa";
  }
public:
  Sheep(string name): Animal(name) {}
};

int main(int, char**) {
  Animal* h = new Horse("CJ");
  assert(h->speak() == "CJ says neigh");
  Animal* c = new Cow("Bessie");
  assert(c->speak() == "Bessie says moooo");
  assert((new Sheep("Little Lamb"))->speak() == "Little Lamb says baaaa");
}

Macros

Macros are evaluated during a preprocessing phase, before the program is even compiled. While they have quite a few uses, we’ll just look at how to use them for lazy evaluation:

ifthenelse.cpp
#include <cassert>

int f_calls = 0;
int g_calls = 0;

int f() {
  f_calls++;
  return 1;
}

int g() {
  g_calls++;
  return 2;
}

template <class T> bool if_then_else(bool x, T y, T z) {
  return x ? y : z;
}

#define IF_THEN_ELSE(x, y, z) ((x)?(y):(z))

int main() {
  if_then_else(true, f(), g());
  if_then_else(false, f(), g());
  assert(f_calls == 2 && g_calls == 2);
  IF_THEN_ELSE(true, f(), g());
  assert(f_calls == 3 && g_calls == 2);
  IF_THEN_ELSE(false, f(), g());
  assert(f_calls == 3 && g_calls == 3);
}

Templates

TODO

Initialization

Initialization is a big topic in C++. To fully understand it, you have to study the reference manual. And probably study it a long time. As a programmer, you can try to stay sane by always providing initial values in your definitional declarations, e.g.,

Point p {5, 8};
int x = 200;
Stack< s(100);
Player* p = new Player(name, max_health, category);

If you do not specify an initial value (e.g., int x;, new Player()), then you will need to be very keenly aware of (among others!):

While you’d have to see the official language standard for full details, remember these big picture ideas:

Exercise: Research Zero Initialization and Default Initialization. Also research Constant Initialization, Value Initialization, Direct Initialization, Copy Initialization, List Initialization, Aggregate Initialization, Ordered Dynamic Initialization, and Unorderd Dynamic Initialization.

C++ Program Structure

Good to know:

Example:

multiplecppfiles.png

To run:

$ g++ -std=c++14 util.cpp app.cpp && ./a.out
6
Exercise: Make sure you understand why .h files are not compiled.

Standard Library Modules

The standard library modules are:

algorithm array atomic bitset cassert ccomplex cerrno cfenv ciso646 climits clocale complex condition_variable csetjmp csignal cstdalign cstdarg cstdbool cstddef cstdint cstdio cstdlib cstring ctgmath ctime cuchar cwchar deque exception forward_list fstream functional initializer_list iomanip ios iosfwd iostream istream iterator limits list locale map memory mutex new numeric ostream queue random regex scoped_allocator set shared_mutex sstream stack stdexcept streambuf string thread tuple typeinfo unordered_map unordered_set utility valarray vector

Notes on the Standard Library.

Things We Did Not Cover

C++ is a massive language. This introduction skipped a lot. A whole lot. Things not covered include, but are not limited to:

More Information

Here are some fine sources for more information: