Squid

Squid is a little IR I made up myself.

Introduction

Squid stands for Simple Quadruple-based Intermediate Description. It is a fairly useful and extensible intermediate language for imperative, block-structured languages.

A Squid description consists of a collection of subroutines, each of which is either an internal subroutine or an external subroutine. A subroutine has a list of tuples. Tuples consist of an operator and zero to three arguments. Arguments are literals, variables, temporaries, subroutines, or labels.

Tuples

The set of Squid tuples is not fixed. Anyone can add more if desired. The core set is as follows:

Tuple Rendered as... Description
(COPY,x,y) y := x Simple copy
(COPY_FROM_DEREF,x,y) y := *x Copy the contents of memory at address x into y
(COPY_TO_DEREF,x,y) *y := x Copy x into the memory at address y
(COPY_FROM_OFS,x,y,z) z := *(x+y) Copy the contents of memory at address x+y into z
(COPY_TO_OFS,x,y,z) *(y+z) := x Copy x into the memory at address y+z
(ADD,x,y,z) z := x + y Sum
(SUB,x,y,z) z := x - y Difference
(MUL,x,y,z) z := x * y Product
(DIV,x,y,z) z := x / y Quotient
(MOD,x,y,z) z := x mod y Modulo
(REM,x,y,z) z := x rem y Remainder
(POWER,x,y,z) z := pow x, y Exponentiation
(SHL,x,y,z) z := x << y Left shift
(SHR,x,y,z) z := x >> y Logical right shift
(SAR,x,y,z) z := x >>> y Arithmetic right shift
(AND,x,y,z) z := x & y Bitwise conjunction
(OR,x,y,z) z := x | y Bitwise disjunction
(XOR,x,y,z) z := x xor y Bitwise exclusive or
(NOT,x,y) y := !x Logical complement
(NEG,x,y) y := -x Negation
(COMP,x,y) y := ~x Bitwise complement
(ABS,x,y) y := abs x Absolute value
(SIN,x,y) y := sin x Sine
(COS,x,y) y := cos x Cosine
(ATAN,x,y,z) z := atan x,y Arctangent
(LN,x,y) y := ln x Natural logarithm
(SQRT,x,y) y := sqrt x Square root
(INC x) inc x Increment
(DEC,x) dec x Decrement
(INC_DEREF x) inc *x Increment a memory location given its address
(DEC_DEREF,x) dec *x Decrement a memory location given its address
(LT,x,y,z) z := x < y 1 if x is less than y,0 otherwise
(LE,x,y,z) z := x <= y 1 if x is less than or equal to y; 0 otherwise
(EQ,x,y,z) z := x == y 1 if x is equal to y; 0 otherwise
(NE,x,y,z) z := x != y 1 if x is not equal to y; 0 otherwise
(GE,x,y,z) z := x >= y 1 if x is greater than or equal to y; 0 otherwise
(GT,x,y,z) z := x > y 1 if x is greater than y; 0 otherwise
(LABEL,L) L: Label
(JUMP,L) goto L Unconditional jump to a label
(JZERO,x,L) if x == 0 goto L Jump if zero / Jump if false
(JNZERO,x,L) if x != 0 goto L Jump if zero / Jump if true
(JLT,x,y,L) if x < y goto L Jump if less / Jump if not greater or equal
(JLE,x,y,L) if x <= y goto L Jump if less or equal / Jump if not greater
(JEQ,x,y,L) if x == y goto L Jump if equal
(JNE,x,y,L) if x != y goto L Jump if not equal
(JGE,x,y,L) if x >= y goto L Jump if greater or equal / Jump if not less
(JGT,x,y,L) if x > y goto L Jump if greater / Jump if not less
(IJE,x,y,L) if ++x == y goto L Increment and jump if equal. Useful at end of for-loops that count up to a given value.
(DJNZ,x,L) if --x != 0 goto L Decrement and jump if not zero. Great for loops that count down to zero.
(STARTCALL,f) startcall f Begin the call to function f: a STARTCALL tuple should be followed by zero or more PARAM tuples and then a CALLP or CALLF tuple. This operation is provided for languages that allow nested functions and normally pass static links before parameters.
(PARAM,x,p) param x, p Pass x as an argument to parameter p
(CALLP,f) call f Call procedure (non-value returning function) f
(CALLF,f,x) x := call f Call function f, catching return value in x
(RETP) ret Return from procedure
(RETF,x) ret x Return x from this function
(INT_TO_STRING,x,y) y := to_string x String representation of an integer (one can add a new tuple to take a base)
(FLOAT_TO_STRING,x,y) y := to_string x String representation of a float (one can add a new tuple taking a format string)
(BOOL_TO_STRING,x,y) y := to_string x String representation of a boolean (localized?)
(CHAR_TO_STRING,x,y) y := to_string x String representation of a character
(ALLOC,x,y) y := alloc x Allocate x bytes of memory, returning the address in y (or 0 if the memory could not be allocated).
(TO_FLOAT,x,y) y := to_float x Integer to real
(NULL_CHECK,x) assert x != 0 Do something if x is null
(ASSERT_POSITIVE,x) assert x > 0 Do something if x is not positive
(BOUND,x,y,z) assert y <= x < z Do something if x is not between y (inclusive) and x (inclusive)
(NO_OP) nop Do nothing
(EXIT) exit terminate the program

Squid is designed to be pretty flexible, so some aspects of this definition are left unspecified. For example, what happens if an assertion fails? That depends on the source language; therefore, a given translator will fill in these details. How big is an integer, or a float? How are characters encoded? Also left up to the implementation.

Tuple Arguments

Tuple arguments in Squid are limited to:

By non-structured we mean items that fit into a single fixed-size machine unit. Strings, arrays, and structs are always referenced through a pointer.

Squid does not specify the kinds or sizes of the value data types. It assumes, of course, integers and floating-point values, but allows for a several different types of various sizes of these. A specific translation would define these types, and a back end would arrange for type conversions. For example:

    (ADD, x, i, y)

where x and y are floating point and i is an integer, is an allowable tuple, and would be turned into backend code that explicitly or implicitly convertes i to a floating-point value (of the right size) before doing a floating-point addition.

Similarly, a particular translation may include unsigned integer types (as one would in C). A backend has to take care of these cases. For example, with an x86 backend, a JGE tuple would become a jge instruction with signed arguments, but a jae with unsigned arguments.

Because the set of data types is not specified in Squid, many tuples take arguments that represent sizes in bytes. It is up to a specific translation to compute these values.

Literals

Squid has integer, floating-point, and string literals. Enumeration, character, and boolean literals in a source language should be turned into simple integer literals by a front-end.

Squid has no other literals, such as for arrays or structs.

Variables

Squid variables represent run-time storage locations. Squid does not care if these locations are in static memory, in a stack frame, or the heap, but it does keep track of the static nesting level for a variable, which is useful for languages with arbitrarily nested subroutines. In languages without subroutine nesting, such as C, global varaibles would have level 0 and all others would have level 1.

Squid allows two types of variables, integer variables and floating point variables, since most hardware makes the same distinction. As of right now, these are the only two variable types. Any further interpretation is up to the compiler front or back end using Squid.

Temporaries

Temporaries differ from variables in that variables come from source language variables that are actually declared, while temporaries are the result of evaluating expressions. For example, if a source language featured the expression

    x + y * z

the corresponding Squid tuple sequence would be:

   (MUL, y, z, r0)
   (ADD, x, r0, r1)

where r0 and r1 are temporaries. Temporaries come in two flavors: value temporaries and address temporaries. An address temporary would arise from evaluating an expression like:

   a[i].x

since the address of the corresponding storage location has to be computed at run-time (assuming the value of i is not known at compile-time).

Value temporaries can be either integer or floating-point, while address temporaries are always themselves integer-valued, since their value is an address, but internally they track whether they are referencing an integer or floating-point object.

Subroutines

Squid distinguishes between subroutines defined in the current translation unit (user subroutines) and those defined elsewhere (external subroutines).

User Subroutines

A user subroutine denotes a function in the current translation unit. It contains the following properties:

External Subroutines

External subroutines represent top-level functions callable by the current translation unit but defined elsewhere. They are known only by name. Example:

print(x+4);
    add i0, 4, r3
    param r3
    param s0
    call __print

Labels

Labels are used to hold the structured control graph in a linearized list of tuples.

while (x < 3) {
    y = 10;
    x = x * 5;
}
L1:
    if i0 >= 3 goto L2
    i1 := 10
    r3 := i0 * 5
    i0 := r3
    goto L1
L2:

Comprehensive Examples

A simple example:

string f(real i) {
    return "dog" unless i >= 3.3;
}
string test = f(cos(2.2));
p0:
  ; level=0 parent=null params=[] locals=[i0]
  f0 := cos 2.2
  startcall p1
  f1 := cos 2.2
  param f1
  r1 := call p1
  i0 := r1
  exit
p1:
  ; level=1 parent=p0 params=[x1] locals=[]
  r0 := x1 >= 3.3
  if r0 != 0 goto L0
  ret s0
L0:
  call __terminate_thread
s0:
  [100, 111, 103]

An example with heap-allocated arrays and structures:

int[] a;
a[6] = 20;
real x = a[80];
a = new int[]{4, 2, 1};
struct Point {
  int x; int y;
}
Point[] polygon;
print(polygon[5].y);
p0:
  ; level=0 parent=null params=[] locals=[i0, x1, i1]
  assert i0 != 0
  r0 := *(i0+-4)
  assert 0 <= 6 < r0
  r1 := 6 * 4
  r2 := i0 + r1
  *r2 := 20
  assert i0 != 0
  r3 := *(i0+-4)
  assert 0 <= 80 < r3
  r4 := 80 * 4
  r5 := i0 + r4
  r6 := *r5
  x1 := r6
  r7 := alloc 16
  *r7 := 3
  r7 := r7 + 4
  *(r7+0) := 4
  *(r7+4) := 2
  *(r7+8) := 1
  i0 := r7
  assert i1 != 0
  r8 := *(i1+-4)
  assert 0 <= 5 < r8
  r9 := 5 * 4
  r10 := i1 + r9
  r11 := *r10
  assert r11 != 0
  r12 := r11 + 4
  r14 := *r12
  r13 := int_to_str r14
  param r13
  call __print
  exit

An example with threads:

void f() {
    for (int i = 0; i < 20000; i++) {
        print (i + " ");
    }
}
thread t1 = start f();
thread t2 = start f();
join(t1);
join(t2);
p0:
  ; level=0 parent=null params=[] locals=[i1, i2]
  param 0
  param p1
  r3 := call __create_thread
  i1 := r3
  param 0
  param p1
  r4 := call __create_thread
  i2 := r4
  param i1
  call __join
  param i2
  call __join
  exit
p1:
  ; level=1 parent=p0 params=[] locals=[i0]
  i0 := 0
L0:
  if i0 >= 20000 goto L1
  r1 := int_to_str i0
  param s0
  param r1
  r2 := call __concat_string
  param r2
  call __print
  inc i0
  goto L0
L1:
  ret
s0:
  [32]