Classes

What is a class? Why are classes important? How do you go about designing and writing classes?

Types vs. Classes

Software systems manipulate different kinds of things like accounts, calendars, cards, contacts, rational numbers, dates, windows, animals, carrots, buildings, countries, players, toolbars, menus, songs, artists, and playlists. Humans naturally think about these things in terms of what they can and cannot do. You can sing songs but not windows; players can move but numbers cannot; accounts have a balance but carrots don’t. What things can and cannot do gives rise to the notion of type.

In many (but not all!) programming languages, there’s a related notion of class. Every object, in these languages, has a unique class, although it may have several types (because it can have multiple behaviors). The class defines a structure, or implementation, for the objects of the class, and serves as a factory for creating objects with that structure and behavior. A type, on the other hand, refers only to behavior.

Type

Behavior. The allowed operations. An object can have multiple types.

Class

A factory for creating objects. An object is created by, and “has”, exactly one class.

A class defines the properties (state) and operations (behavior) for its instances, and may include constructors for creating instances. It may even include some metadata, too. A convenient way to show off a class, in a language-independent fashion, is to diagram it (here I’ve used a notation from UML):

Here the intent is to make points immutable and polygons mutable.

Security
Secure Software Development stresses we should favor immutability when at all possible. When something needs to be mutable, we must control the mutability, either by prohibiting copies or by always making defensive copies. Also, we should validate the arguments to every method. In the examples that follow, we will try to follow those principles.

What is the UML, you may ask?

A good resource is Scott Ambler's; check it out.

In most languages, a class gives rise to a type. In Java, for example, given:

interface Printable { ... }
interface Runner { ... }
class Animal implements Printable, Serializable { ... }
class Dog extends Animal implements Runner { ... }
var winner = new Dog(...);

the object bound to winner has exactly one class, namely Dog, but it has many types: Dog, Runner, Animal, Printable, Serializable, Object.
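A rough Python analogue of the Java snippet, as a minimal sketch: the abstract base classes Printable and Runner stand in for the Java interfaces, and the method names print_me and run are invented for illustration. The object has exactly one class, reported by type, but passes isinstance for several types.

```python
from abc import ABC, abstractmethod


class Printable(ABC):
    @abstractmethod
    def print_me(self) -> str: ...


class Runner(ABC):
    @abstractmethod
    def run(self) -> str: ...


class Animal(Printable):
    def __init__(self, name: str):
        self.name = name

    def print_me(self) -> str:
        return f"<{type(self).__name__} {self.name}>"


class Dog(Animal, Runner):
    def run(self) -> str:
        return f"{self.name} runs"


winner = Dog("Rex")

# Exactly one class...
winner_class = type(winner)

# ...but many types.
winner_types = [t for t in (Dog, Runner, Animal, Printable, object)
                if isinstance(winner, t)]
```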

Classes are unrelated to Object-Oriented Programming
You can do object-oriented programming very well without classes. You can even have classes without object-orientation. People often get them confused, since most explanations of OOP feature classes prominently, but again, this need not be the case.

So Many Kinds of Classes

A class is a factory for objects, and every object it creates has a given structure and behavior. Thus the class serves as a type for the instances that it creates. Classes come in many variations:

An enumeration class has a fixed set of instances.
A singleton class has exactly one instance.
An abstract class has no instances of its own, and must be subclassed.
A sealed class has a fixed set of subclasses.
A final class is not allowed to have any subclasses.
The instances of a data class are immutable, tested for equality by value, and usable as keys of a dictionary.
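Several of these variations have direct Python counterparts. The sketch below is illustrative only: the names Suit, Shape, and Circle are invented, and typing.final is a hint for static type checkers rather than something enforced at run time.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import final
import math


class Suit(Enum):
    """Enumeration class: a fixed set of instances."""
    CLUBS = "clubs"
    DIAMONDS = "diamonds"
    HEARTS = "hearts"
    SPADES = "spades"


class Shape(ABC):
    """Abstract class: no instances of its own, must be subclassed."""
    @abstractmethod
    def area(self) -> float: ...


@final  # Tells type checkers that Circle should not be subclassed
class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


def try_to_instantiate_shape() -> str:
    """Shows that the abstract class refuses to act as a factory."""
    try:
        Shape()
    except TypeError:
        return "abstract classes have no instances of their own"
    return "unexpectedly created a Shape"
```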

Naturally some programming languages have abused the idea of a class, using it for other things. For example, Java’s desire to make everything a class led to the idea of a utility class, which isn’t a factory or a type at all, but rather a big namespace for functions, which it calls “static methods.” Java’s designers probably thought they had a great idea at the time, but no, it’s not good. And it gets worse perhaps: the inability of Java to house any code outside of a class means any app you write, even a short command line script, must be housed in a class, with code launched from a static method called main. And you may have guessed it, such a class is called an application class.

Let’s See Some Code

Let’s implement our point and polygon classes, which we diagrammed above, in a few languages.

Note that Point is a data class—immutable, with value semantics—while Polygon is your average plain old mutable everyday class. Because we care about security, WE WILL MAKE DEFENSIVE COPIES both on construction and when retrieving vertices.

Always handle mutable objects securely

If you are making a class for mutable objects, make sure you either (1) prevent these objects from being copied at all, or (2) make defensive copies of their fields. Otherwise you will end up with unintended sharing.
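To see what unintended sharing looks like, here is a minimal Python sketch. The class names LeakyRoster and SafeRoster are invented for illustration, and the shallow copies made with list(...) are enough here only because the elements (strings) are themselves immutable.

```python
class LeakyRoster:
    """Stores and returns the caller's list directly: unintended sharing."""

    def __init__(self, names):
        self._names = names

    @property
    def names(self):
        return self._names


class SafeRoster:
    """Copies on the way in and on the way out."""

    def __init__(self, names):
        self._names = list(names)      # defensive copy on construction

    @property
    def names(self):
        return list(self._names)       # defensive copy on query


original = ["Ada", "Grace"]
leaky = LeakyRoster(original)
safe = SafeRoster(original)

original.append("Mallory")   # the caller mutates its own list afterwards
safe.names.append("Eve")     # the caller mutates the copy it was handed
```

After these two appends, the leaky roster has silently gained a member while the safe roster still holds only the two names it was built with.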

Also, in our implementations, we’ll write very conventional code, paying attention to whether camelCase or snake_case should be used, and whether we need extra methods such as computing custom equality and/or hash codes.

Ruby

Ruby does classes pretty cleanly! Instance fields are marked with @ and are scoped entirely to the class. You need to write methods to access them, or use attr_reader to automatically generate the accessor methods. Class fields are named beginning with @@. Instance methods and class methods are easy to identify.

polygons.rb

class Point
  attr_reader :x, :y
  def initialize(x, y)
    @x = x
    @y = y
  end
  @@origin = Point.new(0, 0)
  def self.ORIGIN = @@origin
  def self.midpoint_of(p, q) = Point.new((p.x + q.x) / 2.0, (p.y + q.y) / 2.0)
  def distance_from_origin = Math.hypot(@x, @y)
  def reflection_about_origin = Point.new(-@x, -@y)
end

class Polygon
  def initialize(*points)
    raise 'Need at least three points' if points.length < 3
    @points = Array.new(points)
  end

  def perimeter
    result = 0
    @points.each_with_index do |p, i|
      q = @points[(i + 1) % @points.length]
      result += Math.hypot(p.x - q.x, p.y - q.y)
    end
    result
  end

  def area
    result = 0
    @points.each_with_index do |p, i|
      q = @points[(i + 1) % @points.length]
      result += p.x * q.y - q.x * p.y
    end
    result / 2.0
  end

  def vertices
    @points.dup
  end

  def add_vertex(index, x, y)
    raise "Cannot add at #{index}" if index < 0 or index > @points.length
    @points.insert(index, Point.new(x, y))
  end

  def update_vertex(index, x, y)
    raise "Cannot update at #{index}" if index < 0 or index >= @points.length
    @points[index] = Point.new(x, y)
  end

  def remove_vertex(index)
    raise "Cannot remove at #{index}" if index < 0 or index >= @points.length
    raise 'Removal would make this polygon degenerate' if @points.length == 3
    @points.delete_at(index)
  end
end

Exercise: In the code above, (a) explain why points are immutable, (b) note how class methods vs. instance methods are defined, (c) explain why @@origin exists, and (d) check that defensive copies are made for the polygon vertices both on construction and on query.

You can add new methods to Ruby classes after the fact, and existing instances will pick them up. 😮😮😮😳

JavaScript

In JavaScript, you can add new fields and methods to an existing object after the fact, and fields are always accessible to the outside unless you mark them private with #. You can prevent the addition or deletion of fields and methods via freeze:

polygons.js

class Point {
  constructor(x, y) { Object.assign(this, { x, y }); Object.freeze(this) }
  get distanceFromOrigin() { return Math.hypot(this.x, this.y) }
  get reflectionAboutOrigin() { return new Point(-this.x, -this.y) }
  static ORIGIN = new Point(0, 0)
  static midpointOf(p, q) { return new Point((p.x + q.x) / 2, (p.y + q.y) / 2.0) }
}

class Polygon {
  #points

  constructor(...points) {
    if (points.length < 3) {
      throw new Error('Need at least three points')
    }
    this.#points = points.slice()
    Object.freeze(this)
  }

  get perimeter() {
    let result = 0
    for (let i = 0; i < this.#points.length; i += 1) {
      const [p, q] = [this.#points[i], this.#points[(i + 1) % this.#points.length]]
      result += Math.hypot(p.x - q.x, p.y - q.y)
    }
    return result
  }

  get area() {
    let result = 0
    for (let i = 0; i < this.#points.length; i += 1) {
      const [p, q] = [this.#points[i], this.#points[(i + 1) % this.#points.length]]
      result += (p.x * q.y) - (q.x * p.y)
    }
    return result / 2
  }

  get vertices() {
    return this.#points.slice()
  }

  addVertex(index, x, y) {
    if (index < 0 || index > this.#points.length) {
      throw new Error(`Cannot add at index: ${index}`)
    }
    this.#points.splice(index, 0, new Point(x, y))
  }

  updateVertex(index, x, y) {
    if (index < 0 || index >= this.#points.length) {
      throw new Error(`Cannot update at index: ${index}`)
    }
    this.#points[index] = new Point(x, y)
  }

  removeVertex(index) {
    if (index < 0 || index >= this.#points.length) {
      throw new Error(`Cannot remove at index: ${index}`)
    }
    if (this.#points.length === 3) {
      throw new Error('Removal would make this polygon degenerate')
    }
    this.#points.splice(index, 1)
  }
}

Notice that we made a couple of the properties getters so they look like data properties.

We’ve prevented the addition of fields and methods to the point and polygon objects, but we can still modify the static (class) properties (try changing Point.ORIGIN) and even add new properties to each class. If we want to prevent even that, we can add:

Object.freeze(Point)
Object.freeze(Polygon)

Java

We start with the point class. We want points to be immutable (and have value semantics), just the kind of thing Java has the record keyword for:

Point.java

public record Point(double x, double y) {

    public static final Point ORIGIN = new Point(0, 0);

    public Point {
        if (Double.isNaN(x) || Double.isNaN(y)) {
            throw new IllegalArgumentException("Coordinates can not be NaN");
        }
    }

    public double distanceFromOrigin() {
        return Math.hypot(x, y);
    }

    public Point reflectionAboutOrigin() {
        return new Point(-x, -y);
    }

    public static Point midpointOf(Point p, Point q) {
        return new Point((p.x + q.x) / 2.0, (p.y + q.y) / 2.0);
    }
}

When using records, Java safely generates equals, hashCode, and toString methods (though we can override if we want). Because you should know what is going on behind the scenes, here is roughly the class that Java generates for the record above (the real one is final and extends java.lang.Record):

// Don't write this yourself, we're only showing what the record produces.

import java.util.Objects;

public final class Point {
    public static final Point ORIGIN = new Point(0, 0);

    private final double x;
    private final double y;

    public Point(double x, double y) {
        if (Double.isNaN(x) || Double.isNaN(y)) {
            throw new IllegalArgumentException("Coordinates can not be NaN");
        }
        this.x = x;
        this.y = y;
    }

    public double x() {
        return x;
    }

    public double y() {
        return y;
    }

    public double distanceFromOrigin() {
        return Math.hypot(x, y);
    }

    public Point reflectionAboutOrigin() {
        return new Point(-x, -y);
    }

    public static Point midpointOf(Point p, Point q) {
        return new Point((p.x + q.x) / 2.0, (p.y + q.y) / 2.0);
    }

    @Override
    public boolean equals(Object o) {
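        // Records compare double components as if by Double.compare, not ==
        return (o instanceof Point other)
                && Double.compare(x, other.x) == 0
                && Double.compare(y, other.y) == 0;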
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    @Override
    public String toString() {
        return "Point[x=" + x + ", y=" + y + "]";
    }
}

A polygon will be mutable, but only through its own methods. As always, we have to be super, super careful here because the internal state contains a collection. We have to prevent users of our class from modifying our points by obtaining a reference to our internal list of points. So just like in previous examples:

On accepting points from the caller, we have to make a defensive copy of the initial points.
When a caller asks for the list of points, we have to either give them a defensive copy, or provide a completely read-only view.

Polygon.java

import java.util.List;
import java.util.ArrayList;

/**
 * A mutable polygon containing at least three vertices, where the vertices are
 * assumed to be listed in counter-clockwise order.
 */
public class Polygon {

    private ArrayList<Point> vertices;

    public Polygon(List<Point> vertices) {
        if (vertices == null || vertices.size() < 3) {
            throw new IllegalArgumentException("Need at least 3 vertices");
        }
        for (var point : vertices) {
            if (point == null) {
                throw new IllegalArgumentException("Null points are not allowed");
            }
        }
        // Important to make a defensive copy!
        this.vertices = new ArrayList<>(vertices);
    }

    public double getPerimeter() {
        var result = 0.0;
        for (var i = 0; i < vertices.size(); i++) {
            Point p = vertices.get(i), q = vertices.get((i + 1) % vertices.size());
            result += Math.hypot(q.x() - p.x(), q.y() - p.y());
        }
        return result;
    }

    public double getArea() {
        var result = 0.0;
        for (var i = 0; i < vertices.size(); i++) {
            Point p = vertices.get(i), q = vertices.get((i + 1) % vertices.size());
            result += p.x() * q.y() - q.x() * p.y();
        }
        return result / 2.0;
    }

    public List<Point> getVertices() {
        // Give the callers an immutable fixed-size list
        return List.copyOf(vertices);
    }

    public void addVertex(int index, double x, double y) {
        if (index < 0 || index > vertices.size()) {
            throw new IllegalArgumentException("Cannot add at index: " + index);
        }
        vertices.add(index, new Point(x, y));
    }

    public void updateVertex(int index, double x, double y) {
        if (index < 0 || index >= vertices.size()) {
            throw new IllegalArgumentException("Cannot update at index: " + index);
        }
        vertices.set(index, new Point(x, y));
    }

    public void removeVertex(int index) {
        if (index < 0 || index >= vertices.size()) {
            throw new IllegalArgumentException("Cannot remove at index: " + index);
        }
        if (vertices.size() == 3) {
            throw new IllegalStateException("Removal would make the polygon degenerate");
        }
        vertices.remove(index);
    }
}

Kotlin

Here are the same classes in Kotlin:

Point.kt

data class Point(val x: Double, val y: Double)

TODO

Python

Here are the same classes in Python. Python is interesting because (1) there’s really no serious concept about hiding the “fields” from the outside (though there are conventions to do so), and (2) the receiver of the method is an explicit parameter to the constructors and instance methods:

polygons.py

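import math
from dataclasses import dataclass

# frozen=True makes points immutable and gives them value equality and hashing
@dataclass(frozen=True)
class Point:
    x: float
    y: float
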
    @property
    def distance_from_origin(self):
        return math.hypot(self.x, self.y)

    @property
    def reflection_about_origin(self):
        return Point(-self.x, -self.y)

    @staticmethod
    def midpoint_of(p, q):
        return Point((p.x + q.x) / 2, (p.y + q.y) / 2)

Point.ORIGIN = Point(0, 0)


class Polygon:

    def __init__(self, *points):
        if len(points) < 3:
            raise ValueError('Need at least three points')
        self.points = list(points)

    @property
    def perimeter(self):
        result = 0
        for i in range(len(self.points)):
            p, q = self.points[i], self.points[(i + 1) % len(self.points)]
            result += math.hypot(p.x - q.x, p.y - q.y)
        return result

    @property
    def area(self):
        result = 0
        for i in range(len(self.points)):
            p, q = self.points[i], self.points[(i + 1) % len(self.points)]
            result += (p.x * q.y) - (q.x * p.y)
        return result / 2

    @property
    def vertices(self):
        return list(self.points)

    def add_vertex(self, index, x, y):
        if index < 0 or index > len(self.points):
            raise ValueError(f"Cannot add at index: {index}")
        self.points.insert(index, Point(x, y))

    def update_vertex(self, index, x, y):
        if index < 0 or index >= len(self.points):
            raise ValueError(f"Cannot update at index: {index}")
        self.points[index] = Point(x, y)

    def remove_vertex(self, index):
        if index < 0 or index >= len(self.points):
            raise ValueError(f"Cannot remove at index: {index}")
        if len(self.points) == 3:
            raise ValueError('Removal would make this polygon degenerate')
        del self.points[index]

Aspects of Classes

When we design classes we pay attention to three aspects:

Specification

The protocol, interface, "contract", or behavior. Given primarily by constructor and method signatures.

Representation

(Should be hidden) The low-level structural details. Given by the field declarations.

Implementation

(Should be hidden) The bodies of the constructors and methods.

Exercise: Identify the specification, representation, and implementation in the Point class above.

Structure of Classes

The most visible structural components of classes (in most languages) are:

Properties (also called fields, slots, attributes, or member variables)
Operations (also called methods or member functions). Some people like to talk about different kinds of operations like
- Constructors
- Destructors
- Accessors (a.k.a. selectors): read-only operations for reading state
- Modifiers (a.k.a. mutators): read-write operations for modifying state
- Iterators (for classes representing some kind of container): perform an operation on each of the elements of the container object.
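As a small illustration of these kinds of operations, here is a Python sketch of a hypothetical Playlist class (the class and its member names are invented for this example). Python's closest thing to a destructor, __del__, is rarely needed and is left out.

```python
class Playlist:
    def __init__(self, name, songs=()):   # constructor
        self._name = name
        self._songs = list(songs)

    @property
    def name(self):                        # accessor: reads state only
        return self._name

    def __len__(self):                     # accessor: reads state only
        return len(self._songs)

    def add(self, song):                   # modifier: changes state
        self._songs.append(song)

    def __iter__(self):                    # iterator over the contained songs
        return iter(self._songs)


playlist = Playlist("Road trip", ["Song A"])
playlist.add("Song B")
titles = [song for song in playlist]
```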

Exercise: Identify and describe the properties and operations of the Point class above.

Multiple Inheritance

Usually the term multiple inheritance refers to a class being derived from, or extending, multiple classes. This is a much-debated feature—some languages have it, some do not. A lot of people hate it! The problem is that classes can have implementation, and implementation inheritance has a lot of ambiguities associated with it; it turns out to be very complex and messy. Java does not have this: a class can only extend one superclass.

Consider

and assume that d is an object of class D. We have quite a few decisions to make.

There is a name clash inheriting the field y from classes B and C, making us wonder what the expression "d.y" might mean. We could make it a compile time error to have the multiple inheritance because of the name clash. We could allow the extension, but throw out the field y. We could also envision d having either one or two fields called y (that is we can "merge" the fields). If we don't merge them, we might need some syntax like d.B::y and d.C::y (that is, require "qualification").
There is a name clash inheriting the field z from classes B and C. Like in the previous item we could disallow the extension or simply throw out the field z. Or we can envision having either one or two fields called z. If one, WHICH one, B's or C's? That is, would "d.z" have type Integer or String? Do we resolve the ambiguity by alphabetical order or by the name of the class that appeared first in the "extends" clause of class D's declaration? If two, we might need some syntax like d.B::z and d.C::z (that is, require qualification).
There is a name clash inheriting the operation g from classes B and C. Like in the previous items we could disallow the extension or simply throw out g. If we allow the extension what would "d.g(c)" mean? Which operation would it call, B's g or C's g? We could resolve this as in the previous item. Or maybe both should be called, but which one FIRST? Or maybe we inherit both bodies but require explicit qualification in the call, such as d.B::g(c) and d.C::g(c).
Note that there is no name clash in the inheritance of h from B and C; d obviously has two h's that overload each other.
Does d have two fields called x or just one?
Does d have two methods called f or just one?
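For comparison, Python is one language that permits multiple implementation inheritance, and it settles the method clash with a fixed lookup order (the method resolution order, or MRO) derived from the order in which the base classes are listed. A minimal sketch follows; the class names match the discussion above, but the method bodies are invented for illustration.

```python
class A:
    def f(self):
        return "A.f"


class B(A):
    def g(self):
        return "B.g"


class C(A):
    def g(self):
        return "C.g"


class D(B, C):
    pass


d = D()

# Lookup follows the MRO: D, B, C, A, object. B is listed first, so B's g wins.
mro_names = [cls.__name__ for cls in D.__mro__]
chosen = d.g()

# Explicit qualification is still possible when the other body is wanted.
other = C.g(d)
```

Here d has a single f, inherited from A through both paths. Ordinary instance attributes in Python live in one per-object namespace, so two inherited fields with the same name are effectively merged into one.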

Pure interface inheritance doesn't have these ambiguities. Since the methods in an interface don't have bodies, a class implementing multiple interfaces has only one implementation for a method even when there appears to be a name clash!

Does Java have pure interface inheritance? Well, not really. A Java interface may contain:

static fields and methods
abstract methods
default methods

Some of these are susceptible to name clashes.

Exercise: Which ones? And how?

Some Design Considerations

A few tips for class designers follow.

The most important thing about operations

Think: every operation should succeed or fail. If it succeeds, great; if it fails, throw an exception or return an optional. If you can help it, do not merely send failure codes back to clients. Doing so puts the burden of checking on the client, and many times the client programmer forgets the check.
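The three styles can be contrasted in a short Python sketch. The price table and the function names are hypothetical, chosen only to show the shapes of the alternatives.

```python
from typing import Optional

PRICES = {"apple": 0.50, "pear": 0.75}   # hypothetical data


def price_of(item: str) -> float:
    """Succeeds with a price or fails loudly with an exception."""
    try:
        return PRICES[item]
    except KeyError:
        raise LookupError(f"No such item: {item}") from None


def find_price(item: str) -> Optional[float]:
    """Succeeds with a price or returns None; the return type says so."""
    return PRICES.get(item)


def price_or_error_code(item: str) -> float:
    """Discouraged: -1.0 is a failure code that callers can forget to check."""
    return PRICES.get(item, -1.0)


# A forgotten check silently produces a nonsense total.
bad_total = price_or_error_code("apple") + price_or_error_code("kiwi")
```

With the exception or the optional, a missing item cannot quietly flow into later arithmetic the way the -1.0 does in bad_total.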

Exercise: Describe various ways to implement an operation to get the integer value of a string, knowing that an arbitrary string might not look like an integer at all. Talk about the consequences of each of the design alternatives.

Hide the Representation

Hiding the representation is a really good idea, for these four primary reasons:

It is easier to write the client: the client code will not be cluttered with low level details about the type.
The representation may be more general than necessary, and some protocol is required to restrict values to a legal range. For example you would not want to expose a balance field in an account class, since someone could just assign a negative value. You should guard field updates with methods that can check these kinds of things.
Field updates may require side effects that must be done every time, so the updates should only be permitted through ADT operations that can ensure that they are done. For example, we don't want direct updates to a balance field since this would bypass security checks and transaction logging.
It allows the representation to change without having to rewrite the client! For example if a bank maintained a list of accounts, and we later wanted to change to a map instead, nothing in the client programs would have to change since they never referenced the list directly.

In practice this means:

Always keep the data members (fields) private, and always initialize them if it makes sense to do so.
Note that you don't have to, and often don't want to, supply getter and setter methods for every field. Tell, don’t ask, as they say. In general, don't "expose" too much; private methods are sometimes great!
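The account example can be sketched in Python as follows. This is only an illustration: the method names are invented, amounts are assumed to be whole numbers of cents, and Python hides fields by convention (a leading underscore) rather than by enforcement.

```python
class Account:
    def __init__(self, opening_balance: int = 0):
        if opening_balance < 0:
            raise ValueError("Opening balance cannot be negative")
        self._balance = opening_balance   # private by convention
        self._log = []                    # transaction log, also hidden

    @property
    def balance(self) -> int:
        return self._balance              # read-only: there is no setter

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("Deposit must be positive")
        self._balance += amount
        self._log.append(("deposit", amount))

    def withdraw(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("Withdrawal must be positive")
        if amount > self._balance:
            raise ValueError("Insufficient funds")
        self._balance -= amount
        self._log.append(("withdraw", amount))

    def transaction_count(self) -> int:
        return len(self._log)


account = Account(100)
account.deposit(50)
account.withdraw(30)
```

Every change to the balance goes through a method that validates the amount and records the transaction, and an attempt to assign to account.balance directly raises AttributeError.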

Keep Interfaces Small

Don't stuff the class full of too many fields or too many methods. For example if your Customer class has fields called street, city, state and zip as well as name and account number, you should introduce a new class called Address. If you have way too many methods, you may need to think about factoring the responsibilities of the class into two or more classes.
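A minimal Python sketch of that refactoring, with field names assumed from the example in the text:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class Customer:
    name: str
    account_number: str
    address: Address   # one field instead of four


customer = Customer(
    name="Pat Doe",
    account_number="A-1001",
    address=Address("1 Main St", "Springfield", "CA", "90000"),
)
```

The address can now be validated, formatted, or reused (say, for a shipping address) without touching Customer at all.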

Consider Immutability

Immutable objects (objects whose values never change after they have been created) can be awesome. They are:

Simpler than mutable ones
More secure
Inherently thread-safe
Able to be shared freely (you only need one instance per value)
Never in need of defensive copying
Able to have their internals freely shared
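In Python, a frozen dataclass is one convenient way to get such objects. The Money class below is a made-up example, not part of the point and polygon code; it shows that "changing" a value means building a new object, and that equality and hashing by value come along for free.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Money:
    amount: int
    currency: str

    def plus(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("Currency mismatch")
        # "Changing" an immutable value means building a new one.
        return replace(self, amount=self.amount + other.amount)


five = Money(5, "USD")
ten = five.plus(five)

# Value equality and hashing are generated, so instances work as dict keys.
labels = {Money(5, "USD"): "a fiver"}

try:
    five.amount = 500          # any attempt to mutate is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```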

Exercise: Read the section on immutability in Bloch's text.

Exercise: The point class above is already immutable, which is nice. Modify the polygon above so that it is immutable too. (You won't be able to have mutating methods to add and remove points, but if you learn about persistent data structures you will be able to do some pretty cool things! Some people [citation needed] think persistent data structures are the best.)

Consider Factory Methods

Sometimes you'll want to hide constructors and instead expose methods that return new objects (called static factory methods). Advantages:

You can get around the problem of not being able to have two constructors with the same signature: use two methods with different names.
They don't have to create new objects; they can return a cached copy.
They can return an object of a subtype.
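The first advantage can be sketched in Python with class methods. The Temperature class is a hypothetical example; Python cannot truly hide a constructor, so the convention here is simply that clients call the named factories.

```python
class Temperature:
    """Both factories take a single float, so they could not be two
    overloaded constructors; distinct names remove the ambiguity."""

    def __init__(self, kelvin: float):
        if kelvin < 0:
            raise ValueError("Temperature below absolute zero")
        self._kelvin = kelvin

    @classmethod
    def from_celsius(cls, degrees: float) -> "Temperature":
        return cls(degrees + 273.15)

    @classmethod
    def from_fahrenheit(cls, degrees: float) -> "Temperature":
        return cls((degrees - 32) * 5 / 9 + 273.15)

    @property
    def kelvin(self) -> float:
        return self._kelvin


boiling_c = Temperature.from_celsius(100)
boiling_f = Temperature.from_fahrenheit(212)
```

A call such as Temperature.from_fahrenheit(212) also reads more clearly at the call site than a bare constructor taking an unlabeled number.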

Exercise: Read the section on static factory methods in Bloch's text Effective Java. You can also find some rebuttals to this advice online.

Exercise: (Somewhat contrived) Rewrite the point class above to use a private constructor and add a factory method. Add a caching mechanism so that separate point objects with the same value are never created.

Summary

We’ve covered:

Types specify behavior, classes are factories for creating objects
UML Diagrams for Classes
Many kinds of classes (enumeration, singleton, abstract, sealed, final, data)
Examples in several languages
Specification, representation, implementation
Kinds of class members (properties, constructors, destructors, methods)
The issue of multiple inheritance
Best practices in class design