Security and Testing

We have to consider security throughout the software development process, and testing is part of the process.

Testing Goals

There’s a lot of literature out there about testing correctness requirements. But what about testing security requirements? That seems harder, because there are infinitely many ways for a system to fail.

But there are some strategies:

Design attack trees and build automated tests that traverse all the attacks in the tree
Maintain a list of risks (remember that each is marked with a likelihood and a severity) together with mitigations for each

    Risk            │ Likelihood │ Severity │ Mitigations
    ────────────────┼────────────┼──────────┼───────────────────
                    │            │          │
                    │            │          │
                    │            │          │

Always test with security in mind. Look for defects, flaws, bugs, vulnerabilities, weaknesses.
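
As a rough sketch of the attack-tree strategy, the tree can be stored as data and walked to list every concrete attack that needs a test. The AttackNode class, the leaf_attacks function, and the sample tree are all hypothetical names and contents invented for illustration, not part of any standard tool.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class AttackNode:
    """A node in a hypothetical attack tree; leaves are concrete attacks."""
    goal: str
    children: List["AttackNode"] = field(default_factory=list)


def leaf_attacks(node: AttackNode, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path of goals from the root down to every leaf attack."""
    path = path + (node.goal,)
    if not node.children:
        yield path
    for child in node.children:
        yield from leaf_attacks(child, path)


# Made-up example tree; a real one comes out of threat modeling.
tree = AttackNode("Read another user's data", [
    AttackNode("Steal credentials", [
        AttackNode("Phishing"),
        AttackNode("Password guessing"),
    ]),
    AttackNode("Bypass authorization", [
        AttackNode("Tamper with record ID in URL"),
    ]),
])

# Each printed path is a candidate for one automated security test.
for attack_path in leaf_attacks(tree):
    print(" -> ".join(attack_path))
```

Keeping the tree as data means a new branch added during threat modeling shows up immediately as a leaf that still needs a test.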

Static Analysis

We often think of testing in terms of running the code (against a comprehensive test suite) and checking that it does what it is supposed to do (it is correct) and does not do what it is not supposed to do (it is secure).

Running the code checks its dynamic properties. But there is value in just looking at the code before running it, that is, checking its static properties. These code checks are also a kind of testing. A lot of flaws (and therefore exploitable weaknesses) can be found just by analyzing the source code.

Type Checkers

The simplest kind of static analysis is ensuring the type safety of all expressions. For some languages, this is mostly built in. For many languages, a type checker can examine the code and report on possible errors.

Static Types vs. Dynamic Types

An expression’s static type is its type at compile time, tracked by a type checker. Its dynamic type is the type tracked and available at runtime. These can differ! In TypeScript, for example, these are the static types:

(every value gives rise to a type containing only that value)
boolean
number
bigint
symbol
string
$T\,[]$
$[T_1,\:...,\:T_n]$
$T_1\:|\:T_2$
$T_1\:\&\:T_2$
$(T_1,\:...,\:T_n) \rightarrow T_0$
(any interface or class defines a static type)
$\{ x_1:T_1;\; ... ;\; x_n:T_n \}$
unknown
void
never

But there are only 8 dynamic types:

Undefined
Null
Boolean
Number
BigInt
Symbol
String
Object
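
The same split between static and dynamic types can be seen in Python, used here only as an illustration (the lists above are for TypeScript and JavaScript). In this minimal sketch the function name describe is hypothetical: the annotation is the static type that only a type checker sees, while type() reports the dynamic type of the actual object.

```python
from typing import Union


def describe(value: Union[int, str]) -> str:
    # Static type of `value`: Union[int, str], seen only by a type checker.
    # Dynamic type of `value`: whatever class the object has at run time.
    return type(value).__name__


print(describe(42))       # dynamic type is int
print(describe("hello"))  # dynamic type is str
```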

Types in “Static” Languages

Languages such as Java, Rust, Go, Swift, C#, Haskell, Standard ML, and OCaml are called “static languages” because so many language rules (including type consistency) are enforced at compile time. If the checks don’t pass, or the types don’t match up, the program does not even get to run. In a “dynamic language”, many language rules, including type checks, are enforced at run time.

The static (compile-time) and dynamic (run-time) types in “static” languages are almost always the same.

By the way, just because a language is static does not mean ALL type checking is done at compile-time. It may defer SOME typechecking to run-time. For example, in Java, the following can lead to run-time typechecking and therefore the possibility of run-time type errors:

Explicit casting
instanceof
Arrays (yep, believe it or not!)

Exercise: Research Java’s ArrayStoreException. How in the heck did the language designers allow array element types to get past the compile-time typechecker?

Type Annotations and Gradual Typing

Many languages are dynamically typed: they do all typechecking at run-time. Python, Julia, and JavaScript are examples of these. However, these languages also allow type hints, or type annotations (for JavaScript, through TypeScript), that allow a compile-time typechecker to analyze the source code for type errors. If the checks don’t pass, you just get a warning, and the program still gets to run.

The cool thing is you don’t have to annotate everything in these languages: you only add the type hints you want and leave others off. You can start off your code with no hints and gradually add as many hints as you like as development proceeds. Hence the term gradual typing.
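
A minimal sketch of gradual typing in Python, assuming Python 3.9 or later for the list[float] syntax. The function names are made up for illustration. The first version has no hints; the second adds them without changing behavior, which is the "gradual" part.

```python
def total_price(prices, tax_rate):
    # No hints: a type checker treats both parameters as "any type".
    return sum(prices) * (1 + tax_rate)


def total_price_typed(prices: list[float], tax_rate: float) -> float:
    # Hints added later; a checker such as mypy can now flag bad calls.
    return sum(prices) * (1 + tax_rate)


# A static checker would be expected to report the call below. Python itself
# ignores the hints, so the call would fail only at run time (TypeError in sum).
# total_price_typed(["9.99"], 0.08)

print(total_price_typed([10.0, 20.0], 0.5))
```

Both functions behave identically at run time; the hints matter only to tools that read the source.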

TODO Example in Python

TODO Example in Julia

TODO Example in JavaScript (TypeScript)

Linters

A linter is a tool that can perform hundreds of static checks on your code. These checks enforce good lexical style (formatting), good naming, and good programming practices (e.g., not having too many parameters); violating such practices raises the likelihood of programming defects. Here are links to lists of the checks made by a number of popular linters:

The Rules of ESLint
For pylint, run pylint --list-msgs
The Java Rules in PMD
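
To give a feel for what a single lint rule does, here is a toy check written with Python’s standard ast module. It is only a sketch, not how ESLint, pylint, or PMD are implemented; the name too_many_params and the limit of 3 are assumptions made for this example. Note that the sample code is parsed but never executed, which is what makes the check static.

```python
import ast

MAX_PARAMS = 3  # hypothetical threshold chosen for this sketch


def too_many_params(source: str, limit: int = MAX_PARAMS) -> list:
    """Return a message for every function with more than `limit` parameters."""
    messages = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if count > limit:
                messages.append(
                    f"line {node.lineno}: {node.name} has {count} parameters"
                )
    return messages


sample = '''
def ok(a, b):
    return a + b

def busy(a, b, c, d, e):
    return a
'''

for message in too_many_params(sample):
    print(message)
```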

Static Analyzers

Static Code Analysis Tools are basically linters that do some extra things, like producing reports such as dependency graphs, various metrics, and well, just about anything you can think of.

Guess what? Here is a list of 100 Java Static Analysis Tools. 😮 And several dozen for Python. 😮

Many of these tools are multi-lingual. Check out this large catalog.

Code Reviews

A code review is a manual check of source code by a person or group of people, not including the author of the code. Code reviews are done in addition to the automated checks performed by static analysis tools, because the more checking the better. Manual code reviews are also important for knowledge transfer, mentoring, and shared responsibility in a team.

Besides, humans sometimes find things that automated tools miss (though the other way around is more common, of course).

Start with the Wikipedia article then search the web for more resources, including best practices and standards for code reviews. An excellent resource is the OWASP Code Review Guide.

No matter how good of a programmer you are, you won’t get everything right:

[Image: code-review-wtf]

But don’t be like Ponytail:

[Image: xkcd code review comic]

Types of Software Testing

There are many dimensions of testing.

How much of a system are you testing?

Unit Testing helps ensure individual components function as advertised. Unit tests must be extremely fast and never interact with the filesystem, databases, networks, and so on; all external systems must be mocked.
Integration Testing helps ensure components can communicate with each other properly. Okay to interact with filesystem, databases, and networks.
System Testing (sometimes called Acceptance Testing) helps ensure a fully-built system has no defects from a user’s point of view. Usually done on a production-like environment. If automated, it will script or mock the GUI.
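
A minimal sketch of a unit test that mocks an external system, using Python’s standard unittest and unittest.mock modules. The function is_account_locked, the failed_login_count method, and the three-failure threshold are hypothetical, invented only to have something security-flavored to test.

```python
import unittest
from unittest.mock import Mock


def is_account_locked(user_id, db):
    """Hypothetical unit under test: locked after 3 or more failed logins."""
    return db.failed_login_count(user_id) >= 3


class AccountLockTest(unittest.TestCase):
    def test_locked_after_three_failures(self):
        db = Mock()  # stands in for a real database client
        db.failed_login_count.return_value = 3
        self.assertTrue(is_account_locked("alice", db))
        db.failed_login_count.assert_called_once_with("alice")

    def test_not_locked_below_threshold(self):
        db = Mock()
        db.failed_login_count.return_value = 2
        self.assertFalse(is_account_locked("alice", db))
```

Saved in a file, these tests can be run with python -m unittest. Because the database is a Mock, they touch no external system and run in milliseconds; an integration test would pass in a real database connection instead.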

What are you testing?

Performance Testing checks to see if the system runs within specified temporal and spatial constraints.
Regression Testing checks to see if the latest fixes and enhancements broke something that used to work.
Stress Testing sees how the system holds up under excessive load, or when run in an environment without reasonable resources.

Do you care about implementation or just the functional results?

Black Box Testing looks only at inputs and outputs and makes sure the outputs are as expected, caring nothing at all for what goes on in between.
White Box Testing “exercises” an implementation, attempting to get complete “code coverage” (access to source code is needed for this).

When testing for security, there are specific strategies:

Fuzz Testing
Penetration Testing

Fuzz Testing

Fuzzing is a testing technique that uses a fuzzer to automatically generate test cases that primarily try to find vulnerabilities or crashes. A fuzzer will make inputs that are valid, invalid, partially-valid, and even completely random, with the goal of finding memory leaks, overflows, injection opportunities, violations of preconditions, violations of postconditions, invariant breaks, and other such things. Fuzzers might know about the source code structure and specifically execute edge cases.
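
Here is a toy fuzzer to make the idea concrete. It is a sketch only; real fuzzers are far more sophisticated. The target parse_percentage and the driver fuzz are hypothetical names, and the target contains a deliberately planted bug. The fuzzer tries a few hand-picked edge cases first, then random printable strings, and records any exception other than the expected ValueError.

```python
import random
import string


def parse_percentage(text: str) -> int:
    """Hypothetical target: parse strings like '42%' into an int in 0..100."""
    if text[-1] != "%":       # planted bug: IndexError on empty input
        raise ValueError("missing % sign")
    value = int(text[:-1])    # ValueError on non-numeric text is expected
    if not 0 <= value <= 100:
        raise ValueError("out of range")
    return value


def fuzz(target, trials=500, seed=0):
    """Call target with edge-case and random inputs; collect unexpected errors."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    corpus = ["", "%", "50%", "-1%", "101%"]  # valid and invalid edge cases
    findings = []
    for i in range(trials):
        if i < len(corpus):
            data = corpus[i]
        else:
            length = rng.randint(0, 8)
            data = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            target(data)
        except ValueError:
            pass  # documented rejection of bad input, not a finding
        except Exception as exc:
            findings.append((data, type(exc).__name__))
    return findings


for data, error in fuzz(parse_percentage):
    print(repr(data), error)
```

The design choice worth noticing is the split between expected and unexpected failures: a fuzzer needs some oracle for what counts as a finding, and here it is simply "any exception the target does not document".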

Fuzz Testing article at Wikipedia.

Penetration Testing

A pentest (penetration test) is an authorized set of attacks by an ally to help assess weaknesses and risks in a system so they can be patched by the system’s owner. Pentests can be black box or white box. They typically follow the same steps that adversarial attackers go through.

Pen Testing article at Wikipedia

Summary

We’ve covered:

Testing Goals
Static Analysis
Code Reviews
Types of Testing
Fuzz Testing
Penetration Testing