Tuple-Based Intermediate Representations

Instead of a graph for the intermediate representation, you can create some flat code that looks virtual machine like. Tuples fit the bill.

What are Tuples?

Tuples are instruction-like entities consisting of an operator and zero to three arguments. Arguments are literals, variables, temporaries, subroutines, or labels.

Typical tuples are:

Tuple Rendered as... Description
(COPY,x,y) y := x Simple copy
(COPY_FROM_DEREF,x,y) y := *x Copy the contents of memory at address x into y
(COPY_TO_DEREF,x,y) *y := x Copy x into the memory at address y
(COPY_FROM_OFS,x,y,z) z := *(x+y) Copy the contents of memory at address x+y into z
(COPY_TO_OFS,x,y,z) *(y+z) := x Copy x into the memory at address y+z
(ADD,x,y,z) z := x + y Sum
(SUB,x,y,z) z := x - y Difference
(MUL,x,y,z) z := x * y Product
(DIV,x,y,z) z := x / y Quotient
(MOD,x,y,z) z := x mod y Modulo
(REM,x,y,z) z := x rem y Remainder
(POWER,x,y,z) z := pow x, y Exponentiation
(SHL,x,y,z) z := x << y Left shift
(SHR,x,y,z) z := x >> y Logical right shift
(SAR,x,y,z) z := x >>> y Arithmetic right shift
(AND,x,y,z) z := x & y Bitwise conjunction
(OR,x,y,z) z := x | y Bitwise disjunction
(XOR,x,y,z) z := x xor y Bitwise exclusive or
(NOT,x,y) y := !x Logical complement
(NEG,x,y) y := -x Negation
(COMP,x,y) y := ~x Bitwise complement
(ABS,x,y) y := abs x Absolute value
(SIN,x,y) y := sin x Sine
(COS,x,y) y := cos x Cosine
(ATAN,x,y,z) z := atan x,y Arctangent
(LN,x,y) y := ln x Natural logarithm
(SQRT,x,y) y := sqrt x Square root
(INC x) inc x Increment
(DEC,x) dec x Decrement
(INC_DEREF x) inc *x Increment a memory location given its address
(DEC_DEREF,x) dec *x Decrement a memory location given its address
(LT,x,y,z) z := x < y 1 if x is less than y,0 otherwise
(LE,x,y,z) z := x <= y 1 if x is less than or equal to y; 0 otherwise
(EQ,x,y,z) z := x == y 1 if x is equal to y; 0 otherwise
(NE,x,y,z) z := x != y 1 if x is not equal to y; 0 otherwise
(GE,x,y,z) z := x >= y 1 if x is greater than or equal to y; 0 otherwise
(GT,x,y,z) z := x > y 1 if x is greater than y; 0 otherwise
(LABEL,L) L: Label
(JUMP,L) goto L Unconditional jump to a label
(JZERO,x,L) if x == 0 goto L Jump if zero / Jump if false
(JNZERO,x,L) if x != 0 goto L Jump if zero / Jump if true
(JLT,x,y,L) if x < y goto L Jump if less / Jump if not greater or equal
(JLE,x,y,L) if x <= y goto L Jump if less or equal / Jump if not greater
(JEQ,x,y,L) if x == y goto L Jump if equal
(JNE,x,y,L) if x != y goto L Jump if not equal
(JGE,x,y,L) if x >= y goto L Jump if greater or equal / Jump if not less
(JGT,x,y,L) if x > y goto L Jump if greater / Jump if not less
(IJE,x,y,L) if ++x == y goto L Increment and jump if equal. Useful at end of for-loops that count up to a given value.
(DJNZ,x,L) if --x != 0 goto L Decrement and jump if not zero. Great for loops that count down to zero.
(STARTCALL,f) startcall f Begin the call to function f: a STARTCALL tuple should be followed by zero or more PARAM tuples and then a CALLP or CALLF tuple. This operation is provided for languages that allow nested functions and normally pass static links before parameters.
(PARAM,x,p) param x, p Pass x as an argument to parameter p
(CALLP,f) call f Call procedure (non-value returning function) f
(CALLF,f,x) x := call f Call function f, catching return value in x
(RETP) ret Return from procedure
(RETF,x) ret x Return x from this function
(INT_TO_STRING,x,y) y := to_string x String representation of an integer (one can add a new tuple to take a base)
(FLOAT_TO_STRING,x,y) y := to_string x String representation of a float (one can add a new tuple taking a format string)
(BOOL_TO_STRING,x,y) y := to_string x String representation of a boolean (localized?)
(CHAR_TO_STRING,x,y) y := to_string x String representation of a character
(ALLOC,x,y) y := alloc x Allocate x bytes of memory, returning the address in y (or 0 if the memory could not be allocated).
(TO_FLOAT,x,y) y := to_float x Integer to real
(NULL_CHECK,x) assert x != 0 Do something if x is null
(ASSERT_POSITIVE,x) assert x > 0 Do something if x is not positive
(BOUND,x,y,z) assert y <= x < z Do something if x is not between y (inclusive) and x (inclusive)
(NO_OP) nop Do nothing
(EXIT) exit terminate the program

Different authors of tuple systems will make different design choices. For example, what happens if an assertion fails? That depends on the source language; therefore, a given translator will fill in these details. How big is an integer, or a float? How are characters encoded? Also left up to the implementation.

Tuple Arguments

Tuple arguments are generally low-level entities, such as:

By non-structured we mean items that fit into a single fixed-size machine unit. Strings, arrays, and structs are always referenced through a pointer.

It is not necessary to specify the kinds or sizes of the value data types. Integers and floating-point values are generally assumed, but a system will generally allow for a several different types of various sizes of these. A specific translation would define these types, and a back end would arrange for type conversions. For example:

    (ADD, x, i, y)

where x and y are floating point and i is an integer, is an allowable tuple, and would be turned into backend code that explicitly or implicitly converts i to a floating-point value (of the right size) before doing a floating-point addition.

Similarly, a particular translation may include unsigned integer types (as one would in C). A backend has to take care of these cases. For example, with an x86 backend, a JGE tuple would become a jge instruction with signed arguments, but a jae with unsigned arguments.

In practice, tuple arguments can sometimes take arguments that represent sizes in bytes. It is up to a specific translation to compute these values.

Literals

In tuple-base representations, source language features like enumeration, character, and boolean literals should be turned into simple integer literals by a compiler front-end. Ditto for arrays and structs: tuples generally maniuplate individual word-like entities; large structures are flattened into smaller things.

Variables

Variables represent run-time storage locations. It’s not important, really. if these locations are in static memory, in a stack frame, or the heap, but it’s good to keep track of the static nesting level for a variable, as this is useful for languages with arbitrarily nested subroutines. In languages without subroutine nesting, such as C, global variables would have level 0 and all others would have level 1.

You might often see two types of variables, integer variables and floating point variables, since most hardware makes the same distinction.

Temporaries

Temporaries differ from variables in that variables come from source language variables that are actually declared, while temporaries are the result of evaluating expressions. For example, if a source language featured the expression

    x + y * z

the corresponding tuple sequence would be:

   (MUL, y, z, r0)
   (ADD, x, r0, r1)

where r0 and r1 are temporaries. Temporaries come in two flavors: value temporaries and address temporaries. An address temporary would arise from evaluating an expression like:

   a[i].x

since the address of the corresponding storage location has to be computed at run-time (assuming the value of i is not known at compile-time).

Value temporaries can be either integer or floating-point, while address temporaries are always themselves integer-valued, since their value is an address, but internally they track whether they are referencing an integer or floating-point object.

Subroutines

It might be helpful to distinguish between subroutines defined in the current translation unit (user subroutines) and those defined elsewhere (external subroutines).

User Subroutines

A user subroutine denotes a function in the current translation unit. It contains the following properties:

External Subroutines

External subroutines represent top-level functions callable by the current translation unit but defined elsewhere. They are known only by name. Example:

print(x+4);
    add i0, 4, r3
    param r3
    param s0
    call __print

Labels

Labels are used to hold the structured control graph in a linearized list of tuples.

while (x < 3) {
    y = 10;
    x = x * 5;
}
L1:
    if i0 >= 3 goto L2
    i1 := 10
    r3 := i0 * 5
    i0 := r3
    goto L1
L2:

Comprehensive Examples

A simple example:

string f(real i) {
    return "dog" unless i >= 3.3;
}
string test = f(cos(2.2));
p0:
  ; level=0 parent=null params=[] locals=[i0]
  f0 := cos 2.2
  startcall p1
  f1 := cos 2.2
  param f1
  r1 := call p1
  i0 := r1
  exit
p1:
  ; level=1 parent=p0 params=[x1] locals=[]
  r0 := x1 >= 3.3
  if r0 != 0 goto L0
  ret s0
L0:
  call __terminate_thread
s0:
  [100, 111, 103]

An example with heap-allocated arrays and structures:

int[] a;
a[6] = 20;
real x = a[80];
a = new int[]{4, 2, 1};
struct Point {
  int x; int y;
}
Point[] polygon;
print(polygon[5].y);
p0:
  ; level=0 parent=null params=[] locals=[i0, x1, i1]
  assert i0 != 0
  r0 := *(i0+-4)
  assert 0 <= 6 < r0
  r1 := 6 * 4
  r2 := i0 + r1
  *r2 := 20
  assert i0 != 0
  r3 := *(i0+-4)
  assert 0 <= 80 < r3
  r4 := 80 * 4
  r5 := i0 + r4
  r6 := *r5
  x1 := r6
  r7 := alloc 16
  *r7 := 3
  r7 := r7 + 4
  *(r7+0) := 4
  *(r7+4) := 2
  *(r7+8) := 1
  i0 := r7
  assert i1 != 0
  r8 := *(i1+-4)
  assert 0 <= 5 < r8
  r9 := 5 * 4
  r10 := i1 + r9
  r11 := *r10
  assert r11 != 0
  r12 := r11 + 4
  r14 := *r12
  r13 := int_to_str r14
  param r13
  call __print
  exit

An example with threads:

void f() {
    for (int i = 0; i < 20000; i++) {
        print (i + " ");
    }
}
thread t1 = start f();
thread t2 = start f();
join(t1);
join(t2);
p0:
  ; level=0 parent=null params=[] locals=[i1, i2]
  param 0
  param p1
  r3 := call __create_thread
  i1 := r3
  param 0
  param p1
  r4 := call __create_thread
  i2 := r4
  param i1
  call __join
  param i2
  call __join
  exit
p1:
  ; level=1 parent=p0 params=[] locals=[i0]
  i0 := 0
L0:
  if i0 >= 20000 goto L1
  r1 := int_to_str i0
  param s0
  param r1
  r2 := call __concat_string
  param r2
  call __print
  inc i0
  goto L0
L1:
  ret
s0:
  [32]