Secure Coding Constructs

Don’t forget that programming is part of the software development lifecycle, too. And it’s the part where 90% of the vulnerabilities come from. What can developers do to stay safe out there?

Goals

Building secure software needn’t be any harder or take any longer than building insecure software. With training and disciplined practice, programmers will:

Secure Programming Best Practices

Developers must (1) know technology-specific risks and weak points in the software, and (2) program carefully (and defensively) to make the software as secure as possible. There are plenty of style guides and collections of security guidelines out there from consortiums, private companies, governmental agencies, non-governmental organizations, or even individuals publishing their own blog articles or course notes. Some are language-specific and some are general. Some are super famous, like the CWE and the OWASP Projects.

While there are hundreds of secure coding guidelines (remember CERT-C, CERT-C++, CERT-Java, OWASP), these notes focus on six extremely general programming principles that help maintain confidentiality, integrity, and availability. They apply to most if not all programming languages:

Manage Sharing

Avoid unintended sharing by using immutable objects, defensively copying, or preventing copying

Define Contracts

Modules and functions should maintain integrity through preconditions and postconditions

Control Failures

Handle failures securely so state does not get corrupted

Validate Everything

Validate callers, inputs, and the current context

Reduce Complexity

Complexity is the enemy of security (as someone once said)

Tighten Visibility

Design for least privilege and don’t leak anything

Immutability

Ah, immutability, often a programmer’s best friend. When used properly, this idea can improve correctness, efficiency, and security. How so? Since immutable objects cannot be changed:

No surprise errors from things changing out from under you, no race conditions, less memory required, programs running faster without copies. Faster means less waiting and fewer timeouts. Sounds good! How do we do this?

Immutable Variables vs. Immutable Objects

First, be very clear what you mean by immutable. It’s contextual. An immutable variable cannot be reassigned; an immutable object cannot have any of its properties reassigned (nor can you add properties, remove properties, nor reconfigure it in any way by making properties writable).

Here is a JavaScript example of an immutable variable holding a mutable object:

const a = [1, 2, 3]
a[1] = 55             // This is legal!
console.log(a)        // prints [1, 55, 3]
a = []                // NOT legal, throws a TypeError
console.log(a)        // still prints [1, 55, 3]
Exercise: Is this necessarily bad?
Exercise: Give the equivalent example in Java. In C++. In C#. In Kotlin.

How can we make objects immutable? This varies from language to language.

JavaScript

Given an object x, invoke Object.freeze(x). Freezing an object:

Normally, you place Object.freeze(this) at the end of constructors.

$ node --use-strict
> class Point { constructor(x, y) {this.x=x; this.y=y; Object.freeze(this)} }
> let p = new Point(3, 5)
> p.x = 2
Uncaught TypeError: Cannot assign to read only property 'x' of object '#<Point>'

Java

In a class, make each field private and final. You can use a record, which automatically makes the fields private and final.

$ jshell
|  Welcome to JShell -- Version 16.0.1
|  For an introduction type: /help intro

jshell> record Point(double x, double y) {}
|  created record Point

jshell> var p = new Point(3, 5)
p ==> Point[x=3.0, y=5.0]

jshell> p.x()
$3 ==> 3.0

jshell> p.x = 100.0
|  Error:
|  x has private access in Point
|  p.x = 100.0
|  ^-^

Python

Many of the built-in types of Python are already immutable: all the numeric types are, as are bool, str, bytes, tuple, frozenset, and range. (A few are mutable, including list, set, dict, and bytearray.)

If you want to create a type of your own that produces only immutable objects, check out this StackOverflow question and several of its answers for various approaches. TL;DR you want namedtuple.

Here’s the normal, mutable form of a user-defined class:

$ python
>>> import math
>>> class MutablePoint:
...   def __init__(self, x, y):
...     self.x = x
...     self.y = y
...   def distance_to_origin(self):
...     return math.hypot(self.x, self.y)
... 
>>> p = MutablePoint(-4, 3)
>>> p.x
-4
>>> p.y
3
>>> p.distance_to_origin()
5.0
>>> p.y = 23
>>> p.distance_to_origin()
23.345235059857504
>>> type(p)
<class '__main__.MutablePoint'>
>>> isinstance(p, MutablePoint)
True
>>> type(MutablePoint)
<class 'type'>

Now, let’s use namedtuple to make immutable points:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> type(Point)
<class 'type'>
>>> p = Point(x=3, y=5)
>>> type(p)
<class '__main__.Point'>
>>> isinstance(p, Point)
True
>>> p.x
3
>>> p.y
5
>>> p.x = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
Exercise: (Research) Can methods be added to a named tuple?

PHP

I don’t know PHP well enough, but I could not resist the opportunity to include a screenshot from my favorite blog, Melissa Elliott’s PHP Manual Masterpieces:

[Screenshot: phpdatetimeimmutable.png]

Persistent Data Structures

There are languages in which all (or almost all) objects are immutable. Wait what? ALL objects? Okay some lists and trees can be immutable, but what if we want to keep track of data that grows and shrinks at runtime? How would we add to a list? Insert into a tree? Remove an element from a set? Replace a value in a dictionary?

If these structures are immutable, YOU DON’T! Add, delete, and update operations return a new data structure! Adding to a list returns a new list. Inserting into a tree returns a new tree. Removing an element from a set returns a new set. Even replacing (updating) returns a new object. More formally:

    add_to_front([3, 5, 7], 1) ==> [1, 3, 5, 7]
    add_to_set(3, {5, 1}) ==> {5, 3, 1}
    update({'x': 1, 'y': 3}, 'y', 10) ==> {'x': 1, 'y': 10}
    delete({7, 3, 2, 1}, 3) ==> {1, 7, 2}

If we return a new structure, the old one still hangs around; we have what are called persistent data structures. But is it efficient to keep the old ones around? What do you think happens here?

a = make_persistent_list_with_filled_value("dog", 50)
b = add_to_front(a, "cat")
print(a)
print(b)

Do you think we copy all the nodes of list a? Actually no, the add_to_front operator is efficient. This happens:

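[Figure: catdogdog.png, showing b as one new "cat" node that points at the existing list a, with none of a’s nodes copied]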

But, but, aren’t a and b sharing internal data? What would happen if we mutate a? Wouldn’t b get mutated too? Isn’t this bad?

ANSWER: Calm down, we are talking about immutable data structures, remember? Since both a and b are immutable, their contents can’t change, so this sharing is quite alright.
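To make the sharing concrete, here is a minimal Python sketch of a persistent singly linked list. The Node class and the items helper are illustrative assumptions, not part of any library; the two function names simply mirror the pseudocode above.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Node:
    """One immutable cell of a singly linked list (hypothetical helper)."""
    value: str
    rest: Optional["Node"] = None  # None marks the end of the list


def make_persistent_list_with_filled_value(value: str, count: int) -> Optional[Node]:
    """Build a list holding `count` copies of `value`."""
    head = None
    for _ in range(count):
        head = Node(value, head)
    return head


def add_to_front(lst: Optional[Node], value: str) -> Node:
    """Return a new list whose tail is the old list, uncopied."""
    return Node(value, lst)


def items(lst: Optional[Node]) -> Iterator[str]:
    """Walk the list from front to back."""
    while lst is not None:
        yield lst.value
        lst = lst.rest


a = make_persistent_list_with_filled_value("dog", 50)
b = add_to_front(a, "cat")

print(len(list(items(a))))  # 50: a is unchanged
print(len(list(items(b))))  # 51: b has one more element
print(b.rest is a)          # True: b's tail is the very same object as a
```

Because Node is frozen, an assignment such as b.value = "bird" raises an error, which is exactly what makes the shared tail safe.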

CLASSWORK
We’re going to do successive insertions and deletions on a binary search tree on a whiteboard. It will be fun. Also, we’ll discuss advantages and disadvantages of these things. Surely immutability has advantages, but do they outweigh the cost of the extra storage needed to support them?
You do need to be aware of these inner workings, because efficiency matters. Inserting at the front of a list is fine, but appending to the end is a totally different story: the assignment b = a.append("cat") would need a complete (and expensive) copy of a, provided we actually used a or b later.
Exercise: Explain why.

Non-extensibility

Just a quick note: related to the idea of immutability is the idea that in some languages you can lock down more than just objects. In Java, for example, you can make methods final, meaning they cannot be overridden in subclasses, and classes final, meaning they cannot be subclassed.

Exercise: Think of how final classes can aid security.

If you can’t have immutability...

Sometimes you need mutable data, right? Of course. Not every language will give you persistent data structures; in fact well over 99% are going to let you mutate. It’s simpler! If you do use mutable structures, though, you need to manage and control these objects. Don’t share them on different threads. Don’t accept them from untrusted sources, and don’t leak them to untrusted targets. What programming constructs should you employ to do these things?
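One common construct is the defensive copy: copy mutable data on the way in and hand out only copies (or immutable snapshots) on the way out. The sketch below is a minimal illustration; the Roster class is hypothetical, and the shallow copies are enough here only because the elements are immutable strings.

```python
class Roster:
    """Hypothetical class guarding a mutable list of names."""

    def __init__(self, names):
        # Copy on the way in: later changes to the caller's list cannot reach us.
        self._names = list(names)

    def add(self, name):
        self._names.append(name)

    def names(self):
        # Hand out an immutable snapshot rather than the internal list.
        return tuple(self._names)


callers_list = ["alice", "bob"]
roster = Roster(callers_list)
callers_list.append("mallory")  # does not affect the roster
snapshot = roster.names()
roster.add("carol")             # does not affect the earlier snapshot

print(roster.names())  # ('alice', 'bob', 'carol')
print(snapshot)        # ('alice', 'bob')
```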

Design by Contract

Ensure that each function, each method, each module, each operator, each everything states precisely:

Writing good contracts takes some skill. You need to write these at the right level of abstraction.

Some languages allow contracts to be written declaratively.
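Python has no built-in declarative contracts, so the sketch below spells out a precondition and a postcondition by hand. The integer_sqrt function is just an assumed example; explicit raises are used instead of assert because assertions can be disabled when Python runs with -O.

```python
import math


def integer_sqrt(n: int) -> int:
    """Return the largest integer r such that r * r <= n."""
    # Precondition: the caller must supply a non-negative integer.
    if not isinstance(n, int) or n < 0:
        raise ValueError("precondition violated: n must be a non-negative integer")

    root = math.isqrt(n)

    # Postcondition: the result really is the floor of the square root.
    if not (root * root <= n < (root + 1) * (root + 1)):
        raise RuntimeError("postcondition violated: wrong integer square root")
    return root


print(integer_sqrt(17))  # 4
```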

Exercise: The enforcement of preconditions and postconditions can lead to simplifying code by taking advantage of layers of trust. Give an example of how this can be the case.

Validation

Although techniques like using immutable objects and employing languages with static program analysis tools detect possible errors before the program is ever run, they can’t check everything. Errors may still happen at run time. Validation is the checking for problems at run time. During validation, you can either reject (preferred) or sanitize (only if you know what you are doing and then only 0.00000000001% of the time) input.

There are three broad categories of validation:

Validate Callers

Is this caller allowed to do this?

Validate Inputs

Does this parameter make any sense?

Validate State

Can this operation be done now (in the current context)?

The general approach is:

  1. At the beginning of an operation, make sure objects are in the proper state to carry out the operation (i.e., that all preconditions are satisfied). For example, in an add operation for a container, make sure the container isn’t already full.
  2. Also at the beginning, validate the inputs (the arguments) entering the system, as they cannot be trusted. For example, don’t accept negative quantities, null things, objects that are too large, or malformed things.
  3. During the operation, ensure any internal consistency rules or other invariants do not get violated.
  4. On exit, check that all postconditions are satisfied.
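The four steps above can be sketched on a small container. BoundedStack and StackFullError are hypothetical names chosen for illustration; a real class would probably keep only the checks that pay for themselves.

```python
class StackFullError(Exception):
    """Raised when pushing onto a stack that is already at capacity."""


class BoundedStack:
    def __init__(self, capacity: int):
        if not isinstance(capacity, int) or capacity < 1:
            raise ValueError("capacity must be a positive integer")
        self._capacity = capacity
        self._items = []

    def push(self, item):
        # 1. State: is the container able to accept anything right now?
        if len(self._items) >= self._capacity:
            raise StackFullError("stack is full")
        # 2. Input: reject arguments that make no sense.
        if item is None:
            raise ValueError("item must not be None")

        size_before = len(self._items)
        self._items.append(item)

        # 3. Invariant: the size must always stay within [0, capacity].
        if not 0 <= len(self._items) <= self._capacity:
            raise RuntimeError("invariant violated: size out of bounds")
        # 4. Postcondition: exactly one element was added, and it is on top.
        if len(self._items) != size_before + 1 or self._items[-1] is not item:
            raise RuntimeError("postcondition violated: push did not take effect")

    def size(self) -> int:
        return len(self._items)


stack = BoundedStack(2)
stack.push("a")
stack.push("b")
print(stack.size())  # 2
```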

There are three best practices for validation:

Whitelisting and Blacklisting

A whitelisting approach “enumerates” (possibly with patterns) which inputs are valid and a blacklisting approach says which ones will be rejected. In practice you might use a combination of both. Probably you want to start with a whitelisting approach where possible.
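Here is a small sketch contrasting the two approaches. The username rule and the set of blocked characters are made-up assumptions; the point is only that the blacklist has to anticipate every bad character, while the whitelist rejects anything it does not recognize.

```python
import re

# Hypothetical rule: 3 to 16 lowercase letters, digits, or underscores.
ALLOWED_USERNAME = re.compile(r"[a-z0-9_]{3,16}")

# Hypothetical (and incomplete!) list of characters someone thought were dangerous.
BLOCKED_CHARACTERS = set("<>'\"")


def passes_whitelist(text: str) -> bool:
    # fullmatch insists that the ENTIRE string fits the pattern.
    return isinstance(text, str) and ALLOWED_USERNAME.fullmatch(text) is not None


def passes_blacklist(text: str) -> bool:
    return isinstance(text, str) and not (set(text) & BLOCKED_CHARACTERS)


sneaky = "alice`whoami`"             # backticks were never put on the blacklist
print(passes_whitelist("alice_01"))  # True
print(passes_blacklist(sneaky))      # True: the blacklist lets it through
print(passes_whitelist(sneaky))      # False: the whitelist rejects it
```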

Ordering Validations

Why do we validate in a specific order? It’s more important than you might think. Layer the validation so that the easiest checks come first, so you might not even need to get to the expensive checks.

Here’s a way to approach validation. What do we validate? We validate operations. When an operation is called, we validate the (1) source, the (2) requestor, the (3) context, and the (4) operands. In more detail, the order is generally:

  1. ORIGIN Is the request coming from a legal source/sender?

    Many requests are tagged with an origin, or source (can be an IP address in a network-based application, or a requesting object or module can be tracked even in a single application). Certain origins can be whitelisted or blacklisted, often with the help of security groups, virtual private clouds (VPCs) or access control lists (ACLs). Origin checks happen before you even read the data!

  2. AUTHENTICATION Is the caller properly authenticated (who they say they are)?

    Caller may be required to present valid credentials, or a valid access key, or a valid token. Keys and tokens should generally be signed, and the signature has to be verified against a secret, and must not be expired, etc.

    Admittedly, there is a fine line between origin checks and authentication checks, but basically origin checks are done outside of your app (either on the router or some kind of gateway), while authentication checks are normally done in-app, albeit well factored out into some nice middleware.

  3. AUTHORIZATION Does the caller have permission to do this operation?

    It is often important to set up a permission system for every operation. Permissions can be things like CanDeleteRecord, CanUpdateTaskMetadata, CanCreateUser, etc. There can be tens of thousands of these in large systems. Sometimes we group big sets of permissions into user roles.

  4. STATE Is the system in a state in which the operation can even be called?

    You should not push onto a full stack, nor pop from an empty stack, nor initialize something that has already been initialized, nor send a message over a connection that has not been established, nor pause a thread that is not already running, nor close a file that has not already been opened.

  5. SIZE Is an argument negative, zero, too small, too large?

    Negative quantities or prices can mess up an e-commerce application, and massive data payloads can cause a denial of service. PUT BOUNDS ON ALL THE THINGS.

    Sometimes do a size check even if a regex subsumes it. If you are taking in SSNs or ISBNs or custom IDs of any kind with a known length, check the length first before even engaging the regex engine.

  6. LEXICAL Is an argument lexically well-formed?

    Usually these are quick checks, e.g., is the data from the right character set, is the encoding valid (e.g., some byte sequences are not valid UTF-8). Regex matching is common here. Oh, and if you are taking in XML, don’t expand entities. In general, whatever you are taking in, make sure it can’t be executed. Don’t allow injection attacks of any kind!

  7. SYNTACTIC Is an argument syntactically well-formed?

    These checks often require parsing, though you may get away with a simple checksum or hash computation.

  8. SEMANTIC Is an argument semantically well-formed?

    These checks apply formal business rules, which are often programmatic in nature and might have to hit external resources like databases.

Some of these concerns may appear to overlap a little, but that’s fine.

The ordering of state and semantic can be messy. If state validation requires a database lookup, do it last. Only do early state checks on in-memory objects.
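As a rough sketch of the cheap-checks-first idea, here are the last four layers (size, lexical, syntactic, semantic) applied to an ISBN-10. The CATALOG set is a hypothetical stand-in for a database lookup, and returning the name of the failing layer is only a convenience for the demo; real code would more likely raise or return an error object.

```python
import re

ISBN10_PATTERN = re.compile(r"[0-9]{9}[0-9X]")
CATALOG = {"0306406152"}  # hypothetical stand-in for an expensive database lookup


def validate_isbn10(text):
    """Return None if the ISBN is accepted, else the name of the first failing check."""
    # SIZE: cheapest check first, before the regex engine is even involved.
    if not isinstance(text, str) or len(text) != 10:
        return "size"
    # LEXICAL: only ASCII digits, with an optional X as the check character.
    if ISBN10_PATTERN.fullmatch(text) is None:
        return "lexical"
    # SYNTACTIC: the weighted sum (weights 10 down to 1) must be divisible by 11.
    digits = [10 if ch == "X" else int(ch) for ch in text]
    if sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 != 0:
        return "syntactic"
    # SEMANTIC: the business rule, which may have to hit an external resource.
    if text not in CATALOG:
        return "semantic"
    return None


print(validate_isbn10("0306406152"))  # None (accepted)
print(validate_isbn10("030640615"))   # size
print(validate_isbn10("03064O6152"))  # lexical (letter O instead of zero)
print(validate_isbn10("0306406153"))  # syntactic (bad check digit)
print(validate_isbn10("0131103628"))  # semantic (well-formed, but not in the catalog)
```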

Validating In the Domain Model

Define constraints within an object where possible; it’s generally better for an object to check itself (or better, not even get created in a bad state). Example: use a UserId class, rather than passing around user ids as strings. All legality rules on these ids are managed inside the UserId class, never elsewhere.
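A minimal sketch of such a class in Python follows. The specific rule (a lowercase letter followed by 2 to 15 lowercase letters or digits) is an assumption made up for the example; what matters is that the rule lives in exactly one place and that an invalid UserId can never exist.

```python
import re
from dataclasses import dataclass

# Hypothetical legality rule for user ids.
_USER_ID_PATTERN = re.compile(r"[a-z][a-z0-9]{2,15}")


@dataclass(frozen=True)
class UserId:
    value: str

    def __post_init__(self):
        # Runs on every construction, so a bad id is rejected before it exists.
        if not isinstance(self.value, str):
            raise TypeError("user id must be a string")
        if _USER_ID_PATTERN.fullmatch(self.value) is None:
            raise ValueError("malformed user id")


def greeting_for(user_id: UserId) -> str:
    # No checks needed here: holding a UserId already proves the id is legal.
    return f"Hello, {user_id.value}"


print(greeting_for(UserId("alice")))  # Hello, alice
```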

This is a powerful idea because:

Validating without if-statements

One way to validate is to pollute your code with a whole bunch of if-statements all over the place checking for this and that and every little thing. This is bad because it clutters the code and focuses on the negative cases...sometimes leading you to write with double negatives. These are BAD:

void searchPhotos() {
    if (!loggedIn()) {
        throw new AuthenticationError("Not logged in");
    }
    // do the search ...
}
void chooseItems(int quantity) { 
    if (quantity < 1) {
        throw new IllegalArgumentException("Too small");
    } else if (quantity > 50) {
        throw new IllegalArgumentException("Too big");
    }
    // choose the items ...
}

Better is to be more positive, using a validator (which internally can throw), something like:

void searchPhotos() {
    Validate.loggedIn();
    // do the search ...
}
void chooseItems(int quantity) { 
    Validate.inInclusiveBounds(1, 50, quantity, "Quantity is out of range");
    // choose the items ...
}

We can still do so much better!! We could use a decorator:

@Authenticated
void searchPhotos(Query query) {
    // do the search ...
}

and we can encapsulate type and bounds checks inside a domain object (here the Quantity class ensures that quantity objects can’t even represent values out of the accepted range):

void chooseItems(Quantity quantity) {
    // choose the items ...
}

These are powerful ideas.
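For a feel of how the decorator version might work under the hood, here is a sketch in Python. The session dictionary and the AuthenticationError class are hypothetical stand-ins for whatever your framework provides.

```python
import functools


class AuthenticationError(Exception):
    """Raised when an operation requires a logged-in user."""


# Hypothetical session state; a real app would get this from its framework.
session = {"user": None}


def authenticated(func):
    """Decorator that refuses to run func unless someone is logged in."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        if session["user"] is None:
            raise AuthenticationError("Not logged in")
        return func(*args, **kwargs)
    return wrapper


@authenticated
def search_photos(query):
    # The body contains only the happy path.
    return f"results for {query!r}"


try:
    search_photos("cats")
except AuthenticationError as e:
    print(e)  # Not logged in

session["user"] = "alice"
print(search_photos("cats"))  # results for 'cats'
```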

Exercise: Discuss why these approaches are so powerful.

Try to come up with good names for your validation functions. Some examples:

  Validate.isNotNull(expression)
  Validate.isTrue(expression)
  Validate.matchesPattern(pattern, expression)
  Validate.isGreaterThan(threshold, expression)
  Validate.isLessThan(threshold, expression)
  Validate.isInclusivelyBetween(low, high, expression)
  Validate.isExclusivelyBetween(low, high, expression)
  Validate.isNotEmpty(expression)
  Validate.isNotFull(expression)

Don’t be afraid of writing validations that are highly domain-specific! In a compiler, you might have validators like:

  isNumeric
  isNumericOrString
  isBoolean
  isInteger
  isAType
  isAnOptional
  isAnArray
  hasSameTypeAs
  allHaveSameType
  isNotRecursive
  isAssignableTo
  isNotReadOnly
  areAllDistinct
  isInTheObject
  isInsideALoop
  isInsideAFunction
  isCallable
  returnsNothing
  returnsSomething
  isReturnableFrom
  matches
  matchesParametersOf
  matchesFieldsOf

Sanitizing Inputs?

One more note about validation. We validate for many reasons, but one specific one is to ensure that an input can never be executed (or interpreted). If an input is ever passed into an executable context, it must be sanitized or defanged.

Actually, it’s best not to pass input into executable contexts in the first place, e.g., use textContent rather than innerHTML in a browser, or whitelist the heck out of inputs so that no characters with special meaning to an interpreter are even allowed. But sometimes you can’t, so you have to sanitize. After all, SQL uses apostrophes for strings and D'Andre and O'Brien are indeed person names.
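For SQL in particular, keeping input out of the executable context usually means parameterized queries: the statement text is fixed and the values travel separately. A minimal sketch with Python’s built-in sqlite3 module (the people table is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")

names = ["D'Andre", "O'Brien", "Robert'); DROP TABLE people;--"]
for name in names:
    # The ? placeholder passes the value as data; it is never parsed as SQL.
    conn.execute("INSERT INTO people (name) VALUES (?)", (name,))

rows = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY rowid")]
print(rows)  # all three names stored exactly as given, apostrophes and all
```

No escaping, no stripping of apostrophes, and the table is still there afterward.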

But there is a huge warning:

Never sanitize input yourself

That’s right, if you have to sanitize, get a library that’s been tested to death. Sanitization is ridiculously hard: how do you know you covered every case? How do you know you’ve called the sanitizer everywhere? How can you avoid over-sanitizing and messing up valid data? How do you know if your sanitization has changed a user’s intended meaning?

Your first thought should be how to avoid the need for sanitization. To recap:

Further reading on sanitization.

Speaking of Names

All developers need to read Falsehoods Programmers Believe About Names and Why Your Form Only Needs One Name Field. Please. Be a good person. Do the right thing. Inclusion matters.

Secure Error Handling

If an operation cannot be carried out, for whatever reason, it is said to fail. What should the programmer arrange to do on failure?

The problem with the first two approaches is that a system is likely left in an inconsistent (corrupted) state, and something terrible is going to occur long after the initial problem occurred—by that time it’ll be hard to go back and figure out what happened. The cost at this point might be catastrophic.

Exercise: Discuss whether a completely shut-down, stopped system is better or worse than an inconsistent or corrupted system that keeps running. Or perhaps it depends on the kind of system? Under what situations is a hard crash acceptable?

Some thoughts to start your discussion: a banking system that is down cannot clear out your account, and flight control software that isn’t running cannot, um, ... (does that help?)

We must:

Other best practices:

There’s another big part of error handling we have to mention:

Okay, now let’s look at how to manage control flow in the presence of error handling.

Error Objects

An error object represents, well, an error. You can use error objects just like any other object: you can assign them to variables and return them from functions. But in many languages, you can also throw them. Throwing disrupts the caller, forcing it to abandon its normal control flow and transfer control to an error handler.

If you can’t throw (some languages *cough* C *cough* don’t allow this), you can return a structure with both an error object and the success value, or a union of the error object and success value. The language Swift allows both throwing and union-style result objects:

song_errors.swift
import Foundation

struct Song { 
    let songId: String
    let lyrics: String
    let price: Int
}

struct User {
    let userId: String
    var name: String
    var currency: Int
}

let songs = [
    "1": Song(songId: "1", lyrics: "Lalala", price: 10),
    "2": Song(songId: "2", lyrics: "Dumdumdum", price: 50),
]

let validTokens = [
    "ABC": User(userId: "alice", name: "Alice", currency: 30),
    "XYZ": User(userId: "bob", name: "Bob", currency: 15),
]

enum SongFetchError: Error {
    case illegalToken
    case noSuchSong
    case insufficientCurrency(shortBy: Int)
}

/*
 * Example of a fetch that throws an error on failure.
 */
func retrieveSong(token: String, songId: String) throws -> Song {
    guard var user = validTokens[token] else {
        throw SongFetchError.illegalToken
    }
    guard let song = songs[songId] else {
        throw SongFetchError.noSuchSong
    }
    guard user.currency >= song.price else {
        throw SongFetchError.insufficientCurrency(shortBy: song.price - user.currency)
    }
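    // User is a struct (a value type), so this deducts from the local copy only;
    // a real system would write the updated user back to its store.
    user.currency -= song.price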
    return song
}

/*
 * Example of a fetch that returns a Result instance.
 */
func getSong(token: String, songId: String) -> Result<Song, SongFetchError> {
    guard var user = validTokens[token] else {
        return .failure(.illegalToken)
    }
    guard let song = songs[songId] else {
        return .failure(.noSuchSong)
    }
    guard user.currency >= song.price else {
        return .failure(.insufficientCurrency(shortBy: song.price - user.currency))
    }
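    // User is a struct (a value type), so this deducts from the local copy only;
    // a real system would write the updated user back to its store.
    user.currency -= song.price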
    return .success(song)
}

// ---------------------------------------------------------------------------------------
// HANDLING ERRORS THAT ARE THROWN
// ---------------------------------------------------------------------------------------

// General way: do-catch statement with try
do {
    let song = try retrieveSong(token: "ABC", songId: "2")
    print("Sing along: \(song.lyrics)")
} catch SongFetchError.illegalToken {
    print("Sorry, bad token")
} catch SongFetchError.noSuchSong {
    print("Sorry, that's not there")
} catch SongFetchError.insufficientCurrency(let need) {
    print("You need \(need) more currency units")
} catch {
    print("I don't know what's going on")
}

// If you don't care about the error, you can make it an optional with "try?"
let lyrics = (try? retrieveSong(token: "ABC", songId: "2"))?.lyrics ?? "Nothing to sing"
print(lyrics)

// If you KNOW IT WILL WORK you can force it with "try!" but beware
let song = try! retrieveSong(token: "XYZ", songId: "1")
print(song.lyrics)

// ---------------------------------------------------------------------------------------
// HANDLING ERRORS IN RESULT INSTANCES
// ---------------------------------------------------------------------------------------

// General way: switch on .success and .failure
switch getSong(token: "ABC", songId: "2") {
case .success (let song):
    print("Sing along: \(song.lyrics)")
case .failure(SongFetchError.illegalToken):
    print("Sorry, bad token")
case .failure(SongFetchError.noSuchSong):
    print("Sorry, that's not there")
case .failure(SongFetchError.insufficientCurrency(let need)):
    print("You need \(need) more currency units")
}

// You can ignore failure by just binding to .success
if case .success(let song) = getSong(token: "XYZ", songId: "1") {
    print("Sing along: \(song.lyrics)")
}

// Invoking .get() on a result makes it throw on error, so
//   try result.get() can be used in a do-catch
//   try? result.get() ---> makes an optional
//   try! result.get() ---> will force 

When might you prefer returning errors to throwing them? One case is async code, where throwing an exception would go...where?
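Python’s asyncio gives one flavor of this: asyncio.gather accepts return_exceptions=True, which delivers each failure as an ordinary value in the result list instead of raising it. The song data and the NoSuchSong class below are assumptions made up for this sketch.

```python
import asyncio


class NoSuchSong(Exception):
    """Hypothetical error for a missing song."""


SONGS = {"1": "Lalala", "2": "Dumdumdum"}


async def fetch_lyrics(song_id):
    await asyncio.sleep(0)  # stand-in for real asynchronous I/O
    if song_id not in SONGS:
        raise NoSuchSong(song_id)
    return SONGS[song_id]


async def main():
    # With return_exceptions=True, failures come back as values in the list.
    return await asyncio.gather(
        fetch_lyrics("1"),
        fetch_lyrics("99"),
        fetch_lyrics("2"),
        return_exceptions=True,
    )


results = asyncio.run(main())
for result in results:
    if isinstance(result, Exception):
        print("Failed:", repr(result))
    else:
        print("Sing along:", result)
```

One failed fetch does not stop the other two, and the caller decides what to do with each error object.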

The Let-It-Crash Philosophy

The programming language Erlang has been used to build systems with seven-nines availability: down only three seconds per year, or two minutes every forty years. How?! A big part of the reason is the Let-It-Crash philosophy. Instead of polluting the code with a lot of error checking and handling, a process just dies. This works because (1) Erlang processes are independent of each other (they share nothing and communicate only via message passing) and (2) the Erlang runtime is such that processes are extremely cheap to restart.

This whole concept is fascinating so learn more:

When are errors errors?

Consider these scenarios:

The first two scenarios are not errors at all, as they are not unexpected. Often, things may or may not exist (not every employee has a supervisor, and sometimes you might go to the store and they are out of what you are looking for). But whether the scenario is expected or not: your code must never attempt to operate on the missing value!

When something is truly “optional,” its absence is not an error, and you should take advantage of your language’s support for optionals. Most modern languages (even Java!) directly support optionals via a generic type, usually with a nice syntax. JavaScript doesn’t have a type, but it does have the common ?. and ?? operators. Python has type hints for optionals but no dynamic optional type.
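Here is a small Python sketch using the Optional type hint. The SUPERVISORS table is invented for the example. Note that the hint is only checked by static tools such as type checkers; at run time the "missing" value is plain None, so the caller still has to test for it before operating on the result.

```python
from typing import Optional

# Hypothetical data: alice is at the top and has no supervisor.
SUPERVISORS = {"bob": "alice", "carol": "alice"}


def supervisor_of(employee: str) -> Optional[str]:
    """Return the supervisor's name, or None if there isn't one (not an error)."""
    return SUPERVISORS.get(employee)


def describe(employee: str) -> str:
    boss = supervisor_of(employee)
    if boss is None:
        # Never operate on the missing value; handle absence explicitly.
        return f"{employee} has no supervisor"
    return f"{employee} reports to {boss.title()}"


print(describe("bob"))    # bob reports to Alice
print(describe("alice"))  # alice has no supervisor
```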

CLASSWORK
We are going to explore the use of optionals in Swift, Java, and JavaScript.

Reducing Complexity

Complexity is an enemy of security. If you have many moving parts, many interacting subsystems, many computation paths through a function (too many ifs and switches and loops), really clever dense code (that you don’t understand immediately), instances of duplicate code, functions with way too many parameters, or you accept massive or intricate inputs, you have more ways for something to go wrong, not to mention a bigger attack surface. Readability and understandability and maintainability are crucial! If you can’t read, understand, or maintain your code, how can you argue it is secure? How can you even go about fixing it if something goes wrong?

Exercise: Draw a State Diagram for an Order with all of the above states.
Exercise: Do an image search on the web for “state diagram” to get a sense of how to think in terms of states.

There are ways to measure the complexity of code. One way is cyclomatic complexity. Give the Wikipedia article on it a read. How can we reduce this kind of complexity? The short answer appears to be “fewer if-statements”! How do we do that?

Answer: Think in terms of state diagrams.

Thinking in terms of state

Too many if-statements in various places throughout the code is generally not a good sign. This code is horrifying:

def add_item_to_order(item, order):
    if order.get_status() == "SHIPPED":  # (1) WAIT, is this the right check?
        raise OrderAlreadyShippedError()
    else:
        order.add_item(item)             # (2) Also, did we miss anything?

Ugh! First of all, this “service method” is asking about the order’s status then trying to apply business logic. Did we get all the right checks in place? What if our order rules change? Did we forget any other checks? These checks belong in the Order class, where all of the knowledge of how orders behave is kept, or, if you require a multithreaded environment with immutable order snapshots so all writes are on a single thread, you can encapsulate business logic in a special order service.

But orders can be in so many states: new, open, paid, payment_rejected, cancelled, shipping, delivered, lost, returned, credited_after_return, etc. How can we handle all these states while minimizing the risk of getting things wrong?

If you have too many checks, create a state diagram, giving a name to each state, and include outgoing transitions only for the legal operations in each state. You might end up making a state object with all the transitions. Here’s a description of the famous State Design Pattern in Java.
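A lightweight way to start, short of the full State pattern, is a transition table: one entry per state, listing only the legal operations. The transitions below are invented for illustration (your business rules will differ), and only some of the states mentioned above are included.

```python
# Hypothetical transition table: state -> {operation: next state}.
TRANSITIONS = {
    "new": {"add_item": "new", "pay": "paid", "cancel": "cancelled"},
    "paid": {"ship": "shipping", "cancel": "cancelled"},
    "shipping": {"deliver": "delivered", "lose": "lost"},
    "delivered": {"return": "returned"},
    "returned": {"credit": "credited_after_return"},
    "cancelled": {},
    "lost": {},
    "credited_after_return": {},
}


class IllegalTransitionError(Exception):
    """Raised when an operation is not allowed in the current state."""


class Order:
    def __init__(self):
        self._state = "new"
        self._items = []

    @property
    def state(self):
        return self._state

    def apply(self, operation):
        # A single lookup replaces the scattered status checks.
        try:
            self._state = TRANSITIONS[self._state][operation]
        except KeyError:
            raise IllegalTransitionError(
                f"cannot {operation} an order that is {self._state}"
            ) from None

    def add_item(self, item):
        self.apply("add_item")  # rejected unless the table allows it
        self._items.append(item)


order = Order()
order.add_item("book")
order.apply("pay")
order.apply("ship")
print(order.state)  # shipping
```

Adding an item to this order now fails with IllegalTransitionError, and the rule lives in the table rather than in every service method.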

Subclasses

The state pattern is great when there are more than a couple states an entity can go between. But sometimes an object’s state is inherent. For example, a shape object may be a circle or a rectangle or a polygon and never change. In this case, subclasses defining their own method implementations are often much cleaner and less error-prone than if-statements using run-time type checking code.
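A quick Python sketch of the shape example follows; the class names and the choice of area as the operation are assumptions for illustration. Notice that total_area contains no isinstance checks at all, so adding a new shape never requires touching it.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Each concrete shape supplies its own formula."""


class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Rectangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def total_area(shapes) -> float:
    # Dynamic dispatch picks the right formula for each shape.
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(2, 3), Rectangle(1, 1)]))  # 7
```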

Clean Code

There’s another angle to “simplicity.” We want our code to be easy to read, otherwise it is confusing and hence complex. If readers and coworkers don’t understand the code, or are confused by its lies, ambiguities, and sloppy structure, exploitable flaws might be introduced. Therefore, good programming practice is demanded for security reasons. Do all the good things, including:

These items were taken from this larger list which is worth a look, even though it comes from a book that is opinionated, controversial, and not intended to be complete.

Many language-specific style guides define rules and heuristics for clean code, and many static analyzers enforce these rules.

Summary

We’ve covered:

  • Goals of Secure Programming
  • Secure Programming Best Practices
  • Immutability
  • Design by Contract
  • Validation
  • Secure Error Handling
  • Reducing Complexity