Subroutine Implementation

What are some of the ways we can implement subroutines?

Activation Records (Stack Frames)

Each invocation of a subroutine has a record associated with it at runtime, holding some or all of the following:

Arguments (parameters)
A link to the caller's stack frame (the dynamic link)
In a language with nested subroutines, a mechanism for accessing non-locals, which could be one of:
- A static link, or
- A pointer to the most recent frame at this level, or
- The whole display
The return address
Local variables
Copies of callee-save registers, if needed
Temporaries
Spilled registers

Note that some of the stack frame may be in registers and some of it may be in memory.

Activation records are also called stack frames. When the runtime supports closures or concurrency, something like a spaghetti stack is required.

Calling Conventions

Calling conventions specify:

Order of evaluation of arguments (left-to-right, right-to-left, in parallel, unspecified...)
Where the arguments go (run-time stack, registers, ...)
Which registers are allowed for arguments, which registers are to be saved by the caller and which by the callee
Whether the caller or callee has to clean up the stack

Example: C on the IA-32

The C-calling convention on the IA-32 is:

Arguments are evaluated right to left and pushed on the stack
Callee must save registers ebx, ebp, esi, edi if it uses them
Caller must save every other register that it wants preserved
Return value goes in eax, edx:eax, or st[0]
Register ebp is used to point to the previous stack frame, so the old value of ebp is (almost) always saved.
Space for local variables in the callee is allocated above the saved ebp.

int example(int x, int y) {
  int a, b, c;
  ...
  f(y, a, b, b, x);
  ...
}

The prolog/epilog for this function would look like:

    push    ebp         ; must save old ebp
    mov ebp, esp        ; point ebp to this frame
    sub esp, ___        ; make space for locals
    ...
    mov esp, ebp        ; clean up locals
    pop ebp         ; restore old ebp
    ret

with stack frame

                +---------+
         ebp-12 |    a    |
                +---------+
         ebp-8  |    b    |
                +---------+
         ebp-4  |    c    |
                +---------+
         ebp    | old ebp |
                +---------+
         ebp+4  | retaddr |
                +---------+
         ebp+8  |    x    |
                +---------+
         ebp+12 |    y    |
                +---------+

Example: SPARC

The SPARC uses register windows

! ----------------------------------------------------------------------------
! writeint
!
! A small program that contains a function to write an unsigned integer value
! to standard output.
! ----------------------------------------------------------------------------

        .global _start

        .text

! writeint(unsigned int n) writes n to standard output as text.  It is
! inefficient since it is recursive and invokes a system call to write
! each digit.
!
! It is equivalent to the C function
!
!   void writeint(unsigned n) {
!       if (n >= 10) writeint(n / 10);
!       putchar(n + '0');
!   }

writeint:
    save    %sp, -96, %sp

    mov 10, %l0         ! Keep 10 in %l0
    cmp %i0, %l0        ! Is it less than 10?
    bl  writedigit              ! If so just write the digit
    nop
    mov %g0, %y
    udiv    %i0, %l0, %l1       ! n / 10 --> %l1
    umul    %l1, %l0, %l2       ! n * (n/10) --> %l2
    sub %i0, %l2, %l2       ! n mod 10 --> %l2

    call    writeint
    mov %l1, %o0        ! call writeint(n / 10)

    mov %l2, %i0        ! Put digit into %i0 before writing

writedigit:             ! Writes the digit in %i0
    add 48, %i0, %l1        ! Character value of the digit
    set digit, %o1
    stb %l1, [%o1]      ! Must be in memory to write
    mov 4, %g1          ! System call write
    mov 1, %o0          ! stdout
    mov 1, %o2          ! number of characters to write
    ta  0

    ret
    restore

! A starting point for the program that just makes a couple of calls
! to the writeint function.

_start:
    call    writeint        ! writeint(5)
    mov 5, %o0

    call    writeint        ! writeint(1312)
    mov 1312, %o0

        sethi   %hi(7308841), %o0   ! writeint(7308841)
    call    writeint
    or  %o0, %lo(7308841), %o0

    mov 1, %g1
    ta  0

    .data
    .align  4
digit:  .skip   4

Leaf Subroutines

A leaf subroutine makes no calls. A good compiler can generate code to make a leaf subroutine's activation use the stack frame of its caller. The SPARC has direct support for this.

! write_stars(n) writes n stars followed by a newline

write_stars:
    mov %o0, %l0        ! save the argument in %l0
    mov 4, %g1          ! syscall write
    mov 1, %o0          ! stdout
    set star, %o1       ! message to write
    mov 1, %o2          ! one character in message
write_star:
    tst %l0             ! no more characters to write
    be  write_newline   ! write terminating newline if so
    nop
    ta  0
    ba  write_star      ! go see if more to write
    dec %l0             ! count off one more char as written
write_newline:
    set newline, %o1
    retl
    ta  0               ! write the newline (before returning)
star:
    .ascii  "*"
newline:
    .ascii  "\n"

Inlining

A compiler inlines a subroutine call when it replaces the call with the body of the called subroutine (with all argument-parameter bindings resolved, of course). If all calls are inlined, there is no need to generate separate code for the subroutine. Inlining should be faster in general, since there is no call/return overhead (which is substantial for short subroutines). But if there are thousands of such calls you can end up with more compiled code.

And compilers have to be careful when inlining recursive subroutines....

Non Local Lexical Variables

Given

function f(a, b) {
    var x = 1;

    function g(a, y) {
        var z = 3;

        function h(z) {
            print x;
        }

        function k(c) {
            var x = c;
            if (x < 5) k(c+1); else h(1);
        }

        if (a == 0) g(1, 0); else k(2);
    }

    g(0, 0);
}

The call sequence goes f → g → g → k → k → k → k → h. What gets printed? It depends on whether we have static or dynamic scoping.

If dynamically scoped, use a-lists or central reference tables.

If statically scoped, use static links (or displays). In this case we rely on the compiler to generate code that says:

"To find x from the body of k, take two steps down the static chain and look in offset -4".