LMU ☀️ CMSI 2210
COMPUTER SYSTEMS ORGANIZATION
Practice
  1. Numeric Encoding
    1. You know that an 8-bit register can represent signed decimal values from -128 to 127, and a 32-bit register can hold values from -2147483648 to 2147483647.
      1. What about a 24-bit register?
      2. What about a 2-bit register?
      3. What about a 128-bit register?
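The two ranges quoted in problem 1 can be checked mechanically. The sketch below assumes two's complement encoding (which matches the -128 to 127 range quoted for 8 bits). The function names `signed_min` and `signed_max` are hypothetical, and the code only covers widths up to 63 bits so that results fit in an `int64_t`; wider registers need wider arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest value an n-bit two's complement register can hold: -2^(n-1). */
int64_t signed_min(int n) {
    assert(n >= 1 && n <= 63);  /* this sketch only handles widths that fit in int64_t */
    return -((int64_t)1 << (n - 1));
}

/* Largest value an n-bit two's complement register can hold: 2^(n-1) - 1. */
int64_t signed_max(int n) {
    assert(n >= 1 && n <= 63);
    return ((int64_t)1 << (n - 1)) - 1;
}
```

Calling `signed_min(8)` and `signed_max(8)` gives -128 and 127, and the 32-bit calls give -2147483648 and 2147483647, agreeing with the values stated in the problem.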
    2. Assuming 8-bit words:
      1. Express -35 in hex
      2. Express D3 as a signed decimal value
      3. Express D3 as an unsigned decimal value
      4. Express 0 in hex
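For checking hand conversions like those in problem 2, here is a minimal sketch of the two readings of an 8-bit word, assuming two's complement for the signed reading. The names `as_signed8`, `as_unsigned8`, and `encode8` are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Signed reading: bit patterns 0x80..0xFF stand for (pattern - 256). */
int as_signed8(uint8_t byte) {
    return byte >= 0x80 ? (int)byte - 256 : (int)byte;
}

/* Unsigned reading: the bit pattern is the value. */
int as_unsigned8(uint8_t byte) {
    return (int)byte;
}

/* Encode a value as an 8-bit pattern by reducing it modulo 256. */
uint8_t encode8(int value) {
    assert(value >= -128 && value <= 255);  /* must fit one of the two readings */
    return (uint8_t)value;  /* conversion to an unsigned type is defined modulo 2^8 */
}
```

For instance, the pattern F0 reads as -16 signed and 240 unsigned, and `encode8(-16)` returns 0xF0.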
    3. Assume 16-bit signed words x=8000, y=0010, z=1000, w=FFFE. Compute:
      1. -x
      2. -y
      3. -z
      4. -w
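Negation of a 16-bit word, as in problem 3, can be modeled as "complement and add one" with the result reduced modulo 2^16. This is a sketch under the assumption of two's complement words; `negate16` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement negation of a 16-bit word: flip every bit, then add 1. */
uint16_t negate16(uint16_t x) {
    uint16_t result = (uint16_t)(~x + 1u);
    /* Sanity check: a word plus its negation is 0 modulo 2^16. */
    assert((uint16_t)(x + result) == 0);
    return result;
}
```

As a check on values not used in the problem, `negate16(0x0001)` is 0xFFFF and `negate16(0x7FFF)` is 0x8001.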
    4. Explain how to tell, just by looking at an 8-digit hexadecimal number (representing a 32-bit signed word), whether or not the value is negative.
    5. Prove that in signed modular addition overflow occurs if and only if the carry into the most significant bit differs from the carry out of the most significant bit.
    6. Do the following 16-bit sums, assuming signed arithmetic, for both saturated and modular addition. Indicate for each whether the modular arithmetic produced a carry and/or an overflow.
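                    | Sum, Saturated | Sum, Modular | Carry (y/n) | Overflow (y/n)
        ------------+----------------+--------------+-------------+---------------
        FEF3 + 99AA |                |              |             |
        0007 + 7FFE |                |              |             |
        8FFF + 20E0 |                |              |             |
        8000 + 8000 |                |              |             |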
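The quantities asked for in problems 5 and 6 can be modeled in C for checking hand computations. The following is a sketch, not a proof: the struct `Add16Result` and the function `add16` are hypothetical names, and words are assumed to be two's complement. Overflow is computed by the carry-in versus carry-out rule from problem 5, and an assertion cross-checks it against the more familiar sign rule (both operands share a sign that the result lacks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t modular;    /* sum reduced modulo 2^16 */
    uint16_t saturated;  /* sum clamped to the signed range 8000..7FFF */
    bool carry;          /* carry out of bit 15 */
    bool overflow;       /* signed overflow in the modular sum */
} Add16Result;

Add16Result add16(uint16_t a, uint16_t b) {
    Add16Result r;
    uint32_t wide = (uint32_t)a + (uint32_t)b;
    r.modular = (uint16_t)wide;
    r.carry = ((wide >> 16) & 1u) != 0;

    /* The carry into bit 15 is produced by adding the low 15 bits. */
    bool carry_in = ((((uint32_t)(a & 0x7FFF) + (uint32_t)(b & 0x7FFF)) >> 15) & 1u) != 0;
    r.overflow = (carry_in != r.carry);

    /* Cross-check against the sign rule. */
    assert(r.overflow == ((((a ^ r.modular) & (b ^ r.modular)) & 0x8000) != 0));

    if (!r.overflow) {
        r.saturated = r.modular;
    } else {
        /* Overflow needs operands of equal sign, so a's sign picks the clamp. */
        r.saturated = (a & 0x8000) ? 0x8000 : 0x7FFF;
    }
    return r;
}
```

For example, `add16(0x7FFF, 0x0001)` yields a modular sum of 8000 with no carry but an overflow, and a saturated sum of 7FFF.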
    7. Give the single precision IEEE-754 encoding of -365.25 and explain your answer.
    8. Give the single precision IEEE-754 encoding of 514.625 and explain your answer.
    9. Complete the following table, in which each row expresses the interpretation of a particular 32-bit value.
      Hexadecimal | Signed Decimal | IEEE-754 Single Precision (Decimal)
      ------------+----------------+------------------------------------
                  |                | 49.25
                  |                | -1.5 × 2^-131
      FFFFFFCC    |                |
      0CC00000    |                |
                  | 100            |
                  | -100           |
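Hand encodings for problems 7 through 9 can be checked by reinterpreting the same 32 bits in different ways. The sketch below assumes the platform's `float` is an IEEE-754 single precision value (true on mainstream hardware, but not guaranteed by the C standard). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the 32-bit pattern that encodes the given float. */
uint32_t float_to_bits(float value) {
    uint32_t bits;
    assert(sizeof value == sizeof bits);  /* assumes a 32-bit float */
    memcpy(&bits, &value, sizeof bits);   /* copy the bytes, no numeric conversion */
    return bits;
}

/* Returns the float encoded by the given 32-bit pattern. */
float bits_to_float(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Returns the two's complement (signed decimal) reading of the pattern. */
int32_t bits_to_signed(uint32_t bits) {
    return bits >= 0x80000000u ? (int32_t)((int64_t)bits - 4294967296LL)
                               : (int32_t)bits;
}
```

As a small check, 1.0 has sign 0, biased exponent 127 and a zero fraction, so `float_to_bits(1.0f)` is 0x3F800000, and the signed reading of FFFFFFFF is -1.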
  2. Character Encoding
    1. Complete the following table, in which each row gives the encoding of a particular character.
      UTF-32   | UTF-16 | UTF-8
      ---------+--------+------
      000000CA |        |
      000004F5 |        |
      0000E188 |        |
      0010E6F2 |        |
    2. Convert the following UTF-32 encoded codepoints to UTF-8 (Express your answer in hex):
      1. 000369F5
      2. 0000E09A
      3. 00000080
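The UTF-8 column in problem 1 and the conversions in problem 2 follow the standard length rules: one byte up to U+007F, two up to U+07FF, three up to U+FFFF, and four up to U+10FFFF. A minimal sketch of an encoder for checking hand work follows; `utf8_encode` is a hypothetical name, and rejecting surrogate code points is this sketch's choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-8 encoding of code point cp into out and returns the
   number of bytes written, or 0 if cp cannot be encoded. */
size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    assert(out != NULL);
    if (cp <= 0x7F) {                       /* 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates are not characters */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

For example, U+20AC (the euro sign) encodes as the three bytes E2 82 AC.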
  3. Logic
    1. Consider the function with three inputs (A, B, C) and two outputs (X, Y) that works like this:
           A  B  C | X  Y
          ---------+------
           0  0  0 | 0  1
           0  0  1 | 0  0
           0  1  0 | 1  0
           0  1  1 | 1  0
           1  0  0 | 0  1
           1  0  1 | 1  1
           1  1  0 | 1  0
           1  1  1 | 1  0
       

      Design two logic circuits for this function, one using AND, OR and NOT gates only, and one using NAND gates only.

    2. Draw a logic circuit that compares two 2-bit signed numbers as follows. It should have four inputs a1, a0, b1 and b0. a1a0 is a 2-bit signed number (call it a) and b1b0 is a 2-bit signed number (call it b). The circuit has one output, which is 1 if a > b and 0 otherwise.
  4. Basic C Programming
    1. Write a C program that writes, to standard output, the names of the 88 piano keys and their frequencies. The program needs to actually compute the values; you cannot hardcode them. Display all frequencies with four places after the decimal point.
      $ piano_keys
      A          27.5000
      A#         29.1352
      B          30.8677
      C          32.7032
      C#         34.6478
      D          36.7081
      D#         38.8909
      .
      .
      .
      A#       3729.3101
      B        3951.0664
      C        4186.0090
      
    2. Write a C program that takes a command line argument which is the name of a piano key, and writes to standard output the major and minor scales for that key.
      $ piano_scales F#
      F# major: F# G# A# B  C# D# E#
      F# minor: F# G# A  B  C# D  E
      
    3. Write a C function which takes in two "strings" (pointers to bytes in memory with a zero at the end), and returns a new string containing, in each position i, the maximum of the two characters at corresponding positions of the two input strings. If the two strings have different lengths, your result should have the length of the longer string, and all of the extra characters in the longer string will simply appear in the result at their original position.
    4. Write a C function that takes in a string s and an int k and returns a newly allocated string which is the k-fold left rotation of s. For example, performing this operation on "doghouse" and 3 will return "housedog". More examples:
      rotate("doghouse", 0)  ⇒  "doghouse"
      rotate("doghouse", 1)  ⇒  "oghoused"
      rotate("doghouse", 2)  ⇒  "ghousedo"
      rotate("doghouse", 3)  ⇒  "housedog"
      rotate("doghouse", 4)  ⇒  "ousedogh"
      rotate("doghouse", 5)  ⇒  "usedogho"
      
    5. Write a C function that takes in a string s and an int k and returns a newly allocated string containing s^k, that is, s repeated k times. In case you haven't seen that notation before, here's an example: ho^3 = hohoho. (Hint: Be very careful about the '\0' at the end.)
    6. Write a C function from scratch, meaning without any string library functions (you can use malloc, though), that returns a substring of a string, given the first index (inclusive) and the last index (exclusive). Be careful about allocating the right amount of space and making sure your result ends with a zero byte. Examples:
      substring("snoopdog", 2, 6) ==> "oopd"
      substring("snoopdog", 32, 456) ==> ""
      substring("snoopdog", -5, 2) ==> "sn"
      
    7. Write a C program that takes on the command line the name of a file, and prints to standard output the index in the file of the first zero byte. By index we mean that the first byte in the file is at index 0, the second at index 1, etc. I'll start the code for you (I'm using C99 here):
      #include <stdio.h>
      int main(int argc, char** argv) {
          if (argc != 2) {
              printf("Exactly one argument required\n");
              return 1;
          }
          FILE* f = fopen(argv[1], "rb");
          if (f == NULL) {
              printf("File does not exist\n");
              return 2;
          }
          // Okay you do the rest.....
      
  5. Basic Assembly Language Programming
    1. Give x86 logic instructions to perform the following operations on the edx register:
      1. Clear bits 8, 3, 4, 11, 20 and 16
      2. Complement bits 31 through 22
      3. Replace it with the remainder of itself divided by 64
      4. Set the middle 16 bits
      5. Zero out the low-order 16 bits
      6. Set bits 1, 3, 16, and 30
      7. Complement bits 20 through 24
      8. Change its value to 1 if it is odd or zero if it is even
      9. Complement every odd-numbered bit
      10. Replace it with its value mod 32
      11. Replace it with the largest multiple of 512 less than or equal to itself
    2. Give x86 logic instructions to perform the following operations:
      1. Clear all even numbered bits of ECX.
      2. Set the last three bits of EAX.
      3. Replace EBX with the remainder of itself divided by 8.
      4. Move -1 into EAX.
    3. Give x86 logic operations to perform the following operations on the esi register (assuming that it contains an unsigned number).
      1. Complement the two highest order bits
      2. Replace it with its value mod 64
      3. Replace it with the largest multiple of 8 less than or equal to itself
    4. Write an assembly language program, using Linux system calls only, that writes (the glyphs for) Unicode characters 32 through 126 to standard output, 16 characters per line.
    5. Write an assembly language program, using a C library, that writes (the glyphs for) Unicode characters 32 through 126 to standard output, 16 characters per line.
    6. Write an assembly language program that displays its command line arguments in reverse order, one per line, to standard output.
    7. Write the following:
      1. An assembly language function, in its own file, that computes the GCD of its two input arguments. Assume the arguments are UNSIGNED numbers that have been pushed on the stack. Return the result in EAX. Use Euclid's algorithm, which says that
        gcd(x, y) = (y == 0) ? x : gcd(y, x mod y)
        
      2. An assembly language program that calls the gcd function with its command line arguments. It is up to you to make the program behave sensibly when presented with missing or garbage arguments.
      3. A C program that calls your gcd function with its command line arguments. It is up to you to make the program behave sensibly when presented with missing or garbage arguments.
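When testing the assembly version from problem 7, a C reference for Euclid's recurrence is handy for comparing results. The following is a minimal iterative sketch of gcd(x, y) = (y == 0) ? x : gcd(y, x mod y); the name `gcd_reference` is hypothetical and chosen so it does not clash with the assembly function.

```c
#include <assert.h>

/* Iterative form of Euclid's algorithm on unsigned arguments. */
unsigned gcd_reference(unsigned x, unsigned y) {
    const unsigned x0 = x, y0 = y;
    while (y != 0) {
        unsigned r = x % y;  /* the pair (x, y) becomes (y, x mod y) */
        x = y;
        y = r;
    }
    /* Sanity check: a nonzero result divides both original arguments. */
    assert(x == 0 || (x0 % x == 0 && y0 % x == 0));
    return x;
}
```

For example, `gcd_reference(48, 18)` is 6, and `gcd_reference(9, 0)` is 9 because the loop body never runs.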
    8. Write an assembly language function that reverses the byte order of a 4 byte integer, for example 0x3d744b26 would be turned into 0x264b743d. The function should accept a pointer to the integer to be converted. Also write a C program that calls the function on its first command line argument and writes the result to stdout.
    9. Write a complete assembly language program that takes in two integer command line arguments, x and y, and displays x^y (x raised to the power y) to stdout.
    10. Write the following in assembly language (use the C calling convention). It is supposed to compute a * log3(b) + c, where log3 is the base-3 logarithm.
      double f(double a, double b, double c);
      
    11. Write the following in assembly language (use the C calling convention):
      double logBaseThreeOf(double x);
      
    12. Write, in assembly language, a function to exchange each of the characters at even numbered positions (the first position is 0) with their successors, that is intended to be called from C. The function should turn "abcdefghij" into "badcfehgji". Actually swap the characters within the string object itself; don't return a new string.
    13. Show how to multiply the contents of ecx by 5 with a single lea instruction.
    14. Write two versions of an assembly language program that takes in two command line arguments, a double-precision number x and an integer y, and displays x^y (x raised to the power y) to stdout. One version should use the XMM registers, the other should use the old FPU registers.
    15. Suppose in a 32-bit assembly language program you had the variable x pointing to three consecutive signed doublewords in memory. Write assembly language fragments to place in eax the median of the three values (1) using no conditional jumps, and (2) using conditional jumps.
    16. Implement the following in assembly language:
      int spfft(int a, int b, int* c, int d) {
        if (&b < c)
          return a / b / *c % d;
        else
          return d * *c;
      }
      
    17. Suppose mm0 contained 7895A2C5FF2A99AF and mm1 contained 33F390BBC34AAAC2.
      1. After paddb mm0,mm1 mm0 would be __________________________
      2. After paddw mm0,mm1 mm0 would be __________________________
      3. After paddsb mm0,mm1 mm0 would be __________________________
      4. After paddusb mm0,mm1 mm0 would be __________________________
      5. After psubsw mm0,mm1 mm0 would be __________________________
    18. Write a standalone assembly language program that writes the "output" of the CPUID instruction to standard output.
    19. Write in assembly language the function
      void replaceAllValuesWithTheirSquareRoots(float a[], int length);
      

      using the SQRTPS instruction. That is, you should do the square root computations four at a time.

    20. Write the following function in assembly language, using the PMAXUB instruction:
      void f(char* a, char* b) {
        int i;
        for (i = 0; i < 8; i++)
          if (a[i] < b[i])
              a[i] = b[i];
      }
      
    21. Implement the following function in assembly language:
      void run(char* s, int n) {
          // Executes the n bytes of machine code starting at address s,
          // then gracefully returns to the caller.  For example, if the string
          // "\x53\x66\x68\x68\x69\x89\xE1\x31\xC0\x40\x89\xC3\x40\x89\xC2\xD1
          // \xE0\xCD\x80\x44\x44\x5B" were passed in, the function should
          // write the string "hi" to standard output.
          ...
      }
      
    22. Write an assembly language fragment to move ECX bytes of data from [ESI] to [EDI] using the XMM registers to move 16 bytes at a time. If any of ECX, ESI or EDI is not a multiple of 16, make the program segfault. That'll teach those users to align their data!
    23. What is the difference if any between MOVAPS and MOVDQA?
    24. Write, in assembly language, a function that is intended to be called from C, that takes in an array of ints (and its length) and returns the maximum value in the array. Do not use any conditional jumps other than one to determine when you are "done" iterating through the array. Implement the function the easy way, with cmovg.
    25. What well-known mathematical function is this?
      mystery:  cdq
                xor     eax, edx
                sub     eax, edx
                ret
      

      Explain why it works.

    26. Write the following in assembly language (use the C calling convention). It is supposed to compute a*log10(b). Use the fyl2x and fldl2t instructions.
      double f(double a, double b);
      
    27. Explain in detail how the following computes the minimum of the two signed 32-bit numbers in %eax and %ebx, leaving the result in %ebx. (Hint: give your explanation in two parts — one in which %eax starts off less than %ebx and the other where it doesn't.)
          sub %ebx, %eax
          cltd
          and %eax, %edx
          add %edx, %ebx
      

      The cltd instruction is the same as the two instruction sequence:

          mov  %eax, %edx
          sar  $31, %edx
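For experimenting with the fragment in problem 27, here is a line-by-line C model of the four instructions. It is a sketch for tracing values, not an explanation: `min_via_mask` is a hypothetical name, the local variables stand in for the registers, and the assertion records an assumption of this sketch, namely that the difference of the two inputs fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

int32_t min_via_mask(int32_t a, int32_t b) {
    /* Assumption of this sketch: a - b does not overflow 32 bits. */
    int64_t exact = (int64_t)a - (int64_t)b;
    assert(exact >= INT32_MIN && exact <= INT32_MAX);

    uint32_t eax = (uint32_t)a, ebx = (uint32_t)b, edx;
    eax -= ebx;                                    /* sub %ebx, %eax */
    edx = (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;  /* cltd: edx is all sign bits of eax */
    edx &= eax;                                    /* and %eax, %edx */
    ebx += edx;                                    /* add %edx, %ebx */

    /* Read the final ebx as a signed 32-bit value. */
    return ebx >= 0x80000000u ? (int32_t)((int64_t)ebx - 4294967296LL)
                              : (int32_t)ebx;
}
```

Tracing `min_via_mask(3, 7)` and `min_via_mask(7, 3)` covers the two cases the hint asks about; both return 3.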
      
  6. Instruction Encoding and Decoding
    1. Describe why, on the x86, opcodes beginning with 80 and 82 are necessarily the same instruction.
    2. Give all possible encodings for sub eax, -5.
    3. Give all possible encodings for each of the following instructions:
      1. add esi, dword [edi+eax+8]
      2. xor esp, dword [edi+esp+8]
      3. and eax, 2
      4. sbb edi, ecx
      5. sar dword [ebx+ebp+20], 1
    4. Explain the difference between B804000000, A104000000, and C7050400000000000000.
    5. What does a Pentium III processor do when executing 0F012F?
    6. What does an "original" Pentium processor do when executing F00FC7C8? Show a C program and a complete NASM program that will cause this machine code to be executed. If possible, execute these programs on a computer with an original processor.
    7. What does this code fragment do?
          xor eax, ebx
          xor ebx, eax
          xor eax, ebx
      

      Show the machine language for it.

    8. Answer for a 32-bit x86 architecture:
      1. Why does the instruction "push 205" require 5 bytes but "push 105" require only 2?
      2. Why does the instruction "add esi, 128" require 6 bytes but "sub esi, -128" require only 3?
      3. Why does the instruction "add edi, [ebp]" require more bytes than "add edi, [esi]"? (give a general answer)
      4. Why does the instruction "lea edx, [ebx+ebx]" require fewer bytes than "lea edx, [ebx*2]"? (give a general answer)
    9. This code fragment produces the same quotient as "idiv n" for what number n? Explain.
      cdq
      shr edx, 22
      add eax, edx
      sar eax, 10
      
    10. Explain in great detail why the decoding of the IA-32 machine instruction F0F2660FC634B5C8000000FE is lock repnz shufpd xmm6, [esi*4+200], -2.
    11. Explain each byte of the following machine language instructions. That is, for each byte, state if it is a prefix (and if so, which one), opcode (and if so, which one or part of one), modrm byte (and if so, show and describe each part in binary), SIB byte (and if so, show and describe each part in binary), displacement, or immediate. For example, given 0F 02 75 00, you would write:
      0f 02         => opcode for LAR
      75 (01110101) => modr/m, 01:reg+disp8, 110:spare is 6 => esi, 101: ebp
      00            => 8 bit displacement with value 0
      Result        => LAR esi, [ebp+0]
      1. imul edi, dword [esi*8+20], 256 (693CF51400000000010000)
      2. rep cvttps2pi mm2, [ss: esp+2*eax] (F3360F2C1444)
      3. lock repne invlpg [3000] (F0F20F013DB80B0000)
      4. lock repe unpcklps xmm2, [8*eax-127] (F0F30F1414C581FFFFFF)
      5. lock repe punpckhdq xmm6, [esp+8*ebp+1024] (F0F3660F6AB4EC00040000)
      6. rep cvttsd2si esp, [4*ecx+2] (F3F20F2C248D02000000)
      7. imul ebx, [ebp], 257 (695D0001010000)
    12. Explain why the 4-byte hexadecimal octet sequence 0F 72 13 50 comprises an illegal IA-32 instruction. That is, first "decode" it and based on the combination of opcodes and operands you see, explain why your decoding makes an instruction that would be out of line with the design of the processor's MMX subsystem. Note that "the instruction is illegal because the entry in the opcode map is empty" is a flippant answer and is worth zero points.
    13. A fun thing to do in C is embed machine code in a string and assign that string to a function variable. Calling the function executes the machine code in the string. For example:
      #include <stdio.h>
      int (*f)() = "\xb0\x64\x0f\xb6\xc0\xc3";
      int main() {printf("%d\n", f());}
      

      On an x86, what does this program do and why?

  7. System Calls
    1. Write a standalone program, using Linux system calls and no external libraries at all, that copies standard input to standard output but skips all space characters (codepoint 20 hex). Remember, reading is done with system call 3, which accepts, in order, the following parameters: the file handle, the address of the buffer to read into, and the number of bytes to read. It returns the number of bytes actually read. The file handle for standard input is 0. You already know about the write system call.
    2. Write a standalone program, using Linux system calls and no external libraries at all, that applies the Caesar cipher with a key of 3 to standard input, writing the ciphertext to standard output. Reading is done with system call 3, which accepts, in order, the following parameters: the file handle, the address of the buffer to read into, and the number of bytes to read. It returns the number of bytes actually read. The file handle for standard input is 0. You already know about the write system call.
    3. Write an assembly language program for Linux, using system calls only, that displays the contents of a file 16 octets per line, in hexadecimal. Show the file offset at the beginning of each line.

      For example

        00000000   9d cc 23 00 90 87 33 11  34 56 c1 dd 9c aa 9b bc
        00000010   11 23 5b 6b 89 89 89 89  32 2a ca fe e4 4e 32 10
        00000020   2e e6
      

      The name of the file should come from the command line, and the output should be written to stdout.

    4. Recall that in Linux, the exit system call is made by placing 1 in eax, the return code in ebx, and invoking int 80h. Define a C function that invokes the exit system call with return code 0, using embedded machine language, like this:
      void (*exit)() = "........";
      

      You fill in the string. Hand assemble the code that invokes the exit system call. Make sure you don't have any zeros in the machine code. Does it really matter if the machine code does have zeros?

  8. Linking and Loading
    1. Write a C program that takes in the name of an ELF-formatted file and displays a human-readable description of its contents.
    2. I wrote a program and saw that the machine code in the object file was 31 c9 8d 41 08 bb 13 00 00 00 cd 80 31 c0 89 c3 40 cd 80 64 6f 67 00. When it was loaded into memory, however, I noticed it was 31 c9 8d 41 08 bb 93 80 04 08 cd 80 31 c0 89 c3 40 cd 80 64 6f 67 00. Explain the difference. What memory location was the program loaded at? For extra credit, what program is this?
    3. I wrote a program and saw that the machine code in the object file was 31 DB 43 89 D8 C1 E0 01 89 C2 C1 E0 01 B9 16 00 00 00 CD 80 EB EA 79 0A. When it was loaded into memory, however, I noticed it was 31 DB 43 89 D8 C1 E0 01 89 C2 C1 E0 01 B9 8A 80 04 08 CD 80 EB EA 79 0A. Explain the difference. What memory location was the program loaded at? For extra credit, what program is this?
    4. What is the intent of the following function?
      mystery:  call    helper
                ret
      helper:   mov     eax, [esp]
                sub     eax, 5
                ret

      Note that superficial, obvious observations like "it calls a function that returns 5 less than the value on the stack top" will earn you zero points. Tell me the intent of the code, then tell me a much simpler way to do that.

    5. Write an assembly language program for Linux, using system calls only, that does exactly the same thing as the Unix "yes" program without command line parameters. Make the program small, but don't do any tricks like shoving code in the ELF header.
    6. If you can, write an assembly language program that installs a handler for the #UD interrupt and invokes it by actually attempting to execute an invalid instruction. Why are you having a hard time with this?
  9. Compression
    1. Encode, by hand, the string "aababaabbabaa" in LZ78.
    2. Write a C function that compresses a file using LZ78. The parameters to the function are the input filename and the output filename. The function should return a double indicating the percentage by which the file was compressed (that is, return 0% if the file size stayed the same, 50% if the output file is half the size of the input, etc.). Note: Although I did not specifically ask you in the question to write a decompress function, you really have to do so in order to test.
  10. Cryptology
    1. Vigenere-encode, by hand, the string "bad dog" with the key "¿sí?". Assume the text and the key are encoded with ISO8859-1 charset.
    2. Write an assembly language function to encrypt and decrypt a string using Vigenere encoding. The parameters should be (1) a character array for the data to be encoded, (2) the length of the character array you want to encode, (3) the encryption key and (4) a flag indicating whether you are encrypting or decrypting. The function should overwrite its input. Include in the comments the reason why you are not encoding a C "string". Also include a driver program written in C that calls your function. The driver program should have command line parameters for input and output file names, a key, as well as "command line options" such as "-d" for decrypt and "-e" for encrypt.
    3. One benefit of public key cryptosystems is that some measure of message authentication is possible: if I send you a message then you can be (reasonably) sure it's from me and not from someone pretending to be me (as long as secret keys are indeed secret). I can do this by encrypting my message twice using two different keys. Which keys? And how can you decrypt my message? Why does this allow you to be sure the message is from me?
    4. Suppose you got your hands on an encrypted Windows EXE file and you knew that the encoder used a Vigenere encoding scheme with a two-character key. How do you break the code (without trying all possible keys)?
    5. Could you encrypt a message by ANDing it with the key? Why or why not? Assume that "encrypt" means to encode in such a way that it is possible to decode later knowing only the key.
  11. Security