x86 Assembly Language Programming

The x86 architecture is the most popular architecture for desktop and laptop computers. Let’s see how we can program in assembly language for processors in this family.

Introduction

This document contains very brief examples of assembly language programs for the x86. The topic of x86 assembly language programming is messy because:

There are many different assemblers out there: MASM, NASM, gas, as86, TASM, a86, Terse, etc. All use radically different assembly languages.
There are differences in the way you have to code for Linux, macOS, Windows, etc.
Many different object file formats exist: ELF, COFF, Win32, OMF, a.out for Linux, a.out for FreeBSD, rdf, IEEE-695, as86, etc.
You generally will be calling functions residing in the operating system or other libraries so you will have to know some technical details about how libraries are linked, and not all linkers work the same way.
Modern x86 processors run in either 32-bit or 64-bit mode; there are quite a few differences between these.

We’ll give examples written for NASM, MASM and gas for both Win32 and Linux. We will even include a section on DOS assembly language programs for historical interest. These notes are not intended to be a substitute for the documentation that accompanies the processor and the assemblers, nor are they intended to teach you assembly language. Their only purpose is to show how to assemble and link programs using different assemblers and linkers.

Assemblers and Linkers

Regardless of the assembler, object file format, linker or operating system you use, the programming process is always the same:

Each assembly language file is assembled into an object file and the object files are linked with other object files to form an executable. A "static library" is really nothing more than a collection of (probably related) object files. Application programmers generally make use of libraries for things like I/O and math.

Assemblers you should know about include

MASM, the Microsoft Assembler. It outputs OMF files (but Microsoft’s linker can convert them to win32 format). It supports a massive and clunky assembly language. Memory addressing is not intuitive. The directives required to set up a program make programming unpleasant.
GAS, the GNU assembler. This uses the rather ugly AT&T-style syntax so many people do not like it; however, you can configure it to use and understand the Intel-style syntax. It was designed to be part of the back end of the GNU compiler collection (gcc).
NASM, the "Netwide Assembler." It is free, small, and best of all it can output zillions of different types of object files. The language is much more sensible than MASM in many respects.

This document does not cover how to use all the different assemblers; you need to read the documentation that comes with them. We will, however, give step-by-step instructions and complete examples for all three of these assemblers for a few extremely simple programs.

There are many object file formats. Some you should know about include

OMF: used in DOS but has 32-bit extensions for Windows. Old.
AOUT: used in early Linux and BSD variants
COFF: "Common object file format"
Win, Win32: Microsoft’s version of COFF, not exactly the same! Replaces OMF.
Win64: Microsoft’s format for Win64.
ELF, ELF32: Used in modern 32-bit Linux and elsewhere
ELF64: Used in 64-bit Linux and elsewhere
macho32: NeXTstep/OpenStep/Rhapsody/Darwin/macOS 32-bit
macho64: NeXTstep/OpenStep/Rhapsody/Darwin/macOS 64-bit

The NASM documentation has great descriptions of these.

You’ll need to get a linker that (1) understands the object file formats you produce, and (2) can write executables for the operating systems you want to run code on. Some linkers out there include

LINK.EXE, for Microsoft operating systems.
ld, which exists on all Unix systems; Windows programmers get this in any gcc distribution.

Programming for Linux

Programming Using System Calls

64-bit Linux installations use the processor’s SYSCALL instruction to jump into the portion of memory where operating system services are stored. To use SYSCALL, first put the system call number in RAX, then the arguments, if any, in RDI, RSI, RDX, R10, R8, and R9, respectively. In our first example we will use system calls for writing to a file (call number 1) and exiting a process (call number 60). Here it is in the NASM assembly language:

hello.asm

; ----------------------------------------------------------------------------------------
; Writes "Hello, World" to the console using only system calls. Runs on 64-bit Linux only.
; To assemble and run:
;
;     nasm -felf64 hello.asm && ld hello.o && ./a.out
; ----------------------------------------------------------------------------------------

          global    _start

          section   .text
_start:   mov       rax, 1                  ; system call for write
          mov       rdi, 1                  ; file handle 1 is stdout
          mov       rsi, message            ; address of string to output
          mov       rdx, 13                 ; number of bytes
          syscall                           ; invoke operating system to do the write
          mov       rax, 60                 ; system call for exit
          xor       rdi, rdi                ; exit code 0
          syscall                           ; invoke operating system to exit

          section   .data
message:  db        "Hello, World", 10      ; note the newline at the end

Here’s the same program in gas:

hello.s

# ----------------------------------------------------------------------------------------
# Writes "Hello, World" to the console using only system calls. Runs on 64-bit Linux only.
# To assemble and run:
#
#     gcc -c hello.s && ld hello.o && ./a.out
#
# or
#
#     gcc -nostdlib hello.s && ./a.out
# ----------------------------------------------------------------------------------------

        .global _start

        .text
_start:
        # write(1, message, 13)
        mov     $1, %rax                # system call 1 is write
        mov     $1, %rdi                # file handle 1 is stdout
        mov     $message, %rsi          # address of string to output
        mov     $13, %rdx               # number of bytes
        syscall                         # invoke operating system to do the write

        # exit(0)
        mov     $60, %rax               # system call 60 is exit
        xor     %rdi, %rdi              # we want return code 0
        syscall                         # invoke operating system to exit
message:
        .ascii  "Hello, World\n"

Since gas is the "native" assembler under Linux, assembling and linking is automatic with gcc, as explained in the program’s comments. If you just enter "gcc hello.s" then gcc will assemble and then try to link with a C library. You can suppress the link step with the -c option to gcc, or do the assembly and linking in one step by telling the linker not to use the C library with -nostdlib. On toolchains where gcc builds position-independent executables by default, the one-step form may also need -no-pie, because the program loads the absolute address of message.

System Calls in 32-bit Linux

There are still some systems running 32-bit builds of Linux. On these systems you invoke operating system services through the INT 0x80 instruction, and use different registers for system call arguments (specifically EAX for the call number and EBX, ECX, EDX, ESI, and EDI for the arguments). Although it might be interesting to show some examples for historical reasons, this introduction is probably better kept short.
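
For the curious, here is a minimal sketch of the earlier hello program rewritten for 32-bit Linux. It is not part of the original notes: the file name hello32.asm is made up for illustration, and the code assumes the classic i386 call numbers (4 for write, 1 for exit), which differ from the 64-bit ones, and a linker that accepts -m elf_i386.

```nasm
; ----------------------------------------------------------------------------------------
; hello32.asm (hypothetical file name)
; Writes "Hello, World" to the console using only system calls. A sketch for 32-bit Linux.
; To assemble and run (assuming ld supports the elf_i386 emulation):
;
;     nasm -felf32 hello32.asm && ld -m elf_i386 hello32.o && ./a.out
; ----------------------------------------------------------------------------------------

          global    _start

          section   .text
_start:   mov       eax, 4                  ; system call 4 is write on i386
          mov       ebx, 1                  ; file handle 1 is stdout
          mov       ecx, message            ; address of string to output
          mov       edx, 13                 ; number of bytes
          int       0x80                    ; invoke operating system to do the write
          mov       eax, 1                  ; system call 1 is exit on i386
          xor       ebx, ebx                ; exit code 0
          int       0x80                    ; invoke operating system to exit

          section   .data
message:  db        "Hello, World", 10      ; note the newline at the end
```

Apart from the call numbers, the registers, and INT 0x80 in place of SYSCALL, the structure is the same as in the 64-bit version.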

Programming with a C Library

Sometimes you might like to use your favorite C library functions in your assembly code. This should be trivial because the C library functions are all stored in a C library, such as the static library libc.a or, more commonly these days, the shared library libc.so. Either way, all we have to do is place calls to C functions in our assembly language program and link with the C library, and we are set.

Before looking at an example, note that the C library already defines _start, which does some initialization, calls a function named main, does some clean up, then calls the system function exit! So if we link with a C library, all we have to do is define main and end with a ret instruction! Here is a simple example in NASM, which illustrates calling puts.

hola.asm

; ----------------------------------------------------------------------------------------
; Writes "Hola, mundo" to the console using a C library. Runs on Linux.
;
;     nasm -felf64 hola.asm && gcc hola.o && ./a.out
; ----------------------------------------------------------------------------------------

          global    main
          extern    puts

          section   .text
main:                                       ; This is called by the C library startup code
          mov       rdi, message            ; First integer (or pointer) argument in rdi
          call      puts                    ; puts(message)
          ret                               ; Return from main back into C library wrapper
message:
          db        "Hola, mundo", 0        ; Note strings must be terminated with 0 in C

And the equivalent program in GAS:

hola.s

# ----------------------------------------------------------------------------------------
# Writes "Hola, mundo" to the console using a C library. Runs on Linux or any other system
# that does not use underscores for symbols in its C library. To assemble and run:
#
#     gcc hola.s && ./a.out
# ----------------------------------------------------------------------------------------

        .global main

        .text
main:                                   # This is called by C library's startup code
        mov     $message, %rdi          # First integer (or pointer) parameter in %rdi
        call    puts                    # puts(message)
        ret                             # Return to C library code
message:
        .asciz "Hola, mundo"            # asciz puts a 0 byte at the end

The previous example shows that the first argument to a C function, if it’s an integer or pointer, goes in register RDI. The next five go in RSI, RDX, RCX, R8, and R9, and any further arguments (which no sane programmer would ever use) go "on the stack" (more about this stack thing later). If you have floating point arguments, they’ll go in XMM0, XMM1, etc. There is quite a bit more to calling functions; we’ll see this later.
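
As a rough illustration of these rules, the sketch below calls printf with two integer arguments and one floating point argument. The file name args.asm is hypothetical, and the code makes a few assumptions the earlier examples did not need: it uses default rel and wrt ..plt so that it also links on toolchains that build position-independent executables, it keeps the stack 16-byte aligned at the call, and it sets EAX to the number of vector registers used, which variadic functions such as printf expect.

```nasm
; ----------------------------------------------------------------------------------------
; args.asm (hypothetical file name)
; Calls printf("%d %d %f\n", 1, 2, 2.5) from the C library. A sketch for 64-bit Linux.
;
;     nasm -felf64 args.asm && gcc args.o && ./a.out
; ----------------------------------------------------------------------------------------

          global    main
          extern    printf
          default   rel                     ; RIP-relative addressing for memory operands

          section   .text
main:     sub       rsp, 8                  ; realign the stack to 16 bytes for the call
          lea       rdi, [format]           ; 1st integer/pointer argument: format string
          mov       esi, 1                  ; 2nd integer argument
          mov       edx, 2                  ; 3rd integer argument
          movsd     xmm0, [value]           ; 1st floating point argument
          mov       eax, 1                  ; variadic call: one vector register in use
          call      printf wrt ..plt        ; call through the PLT (assumed PIE-friendly build)
          xor       eax, eax                ; return 0 from main
          add       rsp, 8                  ; undo the alignment adjustment
          ret

          section   .data
format:   db        "%d %d %f", 10, 0       ; C strings end with a zero byte
value:    dq        2.5                     ; a 64-bit double
```

Note that the integer and floating point arguments are counted separately: the double is the fourth argument overall but still travels in XMM0.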

Programming for macOS

Rather than getting into macOS system calls, let’s just show the simple hello program using the C library. We’ll assume a 64-bit OS, and we’ll also assume you’ve installed gcc (usually obtained via downloading Xcode).

hola.asm

; ----------------------------------------------------------------------------------------
; This is a macOS console program that writes "Hola, mundo" on one line and then exits.
; It uses puts from the C library.  To assemble and run:
;
;     nasm -fmacho64 hola.asm && gcc hola.o && ./a.out
; ----------------------------------------------------------------------------------------

          global    _main
          extern    _puts

          section   .text
_main:    push      rbx                     ; Call stack must be aligned
          lea       rdi, [rel message]      ; First argument is address of message
          call      _puts                   ; puts(message)
          pop       rbx                     ; Fix up stack before returning
          ret

          section   .data
message:  db        "Hola, mundo", 0        ; C strings need a zero byte at the end

There are some differences here! C library functions have underscores, and we had to load the address of message with rel (as in lea rdi, [rel message]) rather than as an absolute address, for reasons you can read about in the NASM documentation.

Programming for Win32

Win32 is the primary operating system API found in most of Microsoft’s 32-bit operating systems including Windows 9x, NT, 2000 and XP. We will follow the plan of the previous section and first look at programs that just use system calls and then programs that use a C library.

For historical reference only.

These notes are pretty old. I’ve never learned Win64.

Calling the Win32 API Directly

Win32 defines thousands of functions! The code for these functions is spread out in many different dynamic libraries, but the majority of them are in KERNEL32.DLL, USER32.DLL and GDI32.DLL (which exist on all Windows installations). The interrupt to execute system calls on the x86 processor is hex 2E, with EAX containing the system call number and EDX pointing to the parameter table in memory. However, according to z0mbie, the actual system call numbers are not consistent across different operating systems, so, to write portable code you should stick to the API calls in the various system DLLs.

Here is the "Hello, World" program in NASM, using only Win32 calls.

hello.asm

; ----------------------------------------------------------------------------
; hello.asm
;
; This is a Win32 console program that writes "Hello, World" on one line and
; then exits.  It uses only plain Win32 system calls from kernel32.dll, so it
; is very instructive to study since it does not make use of a C library.
; Because system calls from kernel32.dll are used, you need to link with
; an import library.  You also have to specify the starting address yourself.
;
; Assembler: NASM
; OS: Any Win32-based OS
; Other libraries: Use gcc's import library libkernel32.a
; Assemble with "nasm -fwin32 hello.asm"
; Link with "ld -e go hello.obj -lkernel32"
; ----------------------------------------------------------------------------

	global	go
	extern	_ExitProcess@4
	extern	_GetStdHandle@4
	extern	_WriteConsoleA@20

        section .data
msg:	db	'Hello, World', 10
handle: dd	0			; 32-bit handle returned by GetStdHandle
written:
	dd	0			; 32-bit count filled in by WriteConsoleA

	section .text
go:
	; handle = GetStdHandle(-11)
	push	dword -11
	call	_GetStdHandle@4
	mov	[handle], eax

        ; WriteConsole(handle, &msg[0], 13, &written, 0)
	push	dword 0
        push    written
	push	dword 13
	push	msg
	push	dword [handle]
	call	_WriteConsoleA@20

	; ExitProcess(0)
	push	dword 0
	call	_ExitProcess@4

Here you can see that the Win32 calls we are using are

GetStdHandle
WriteConsoleA
ExitProcess

and parameters are passed to these calls on the stack. The comments instruct us to assemble into an object format of "win32" (not "coff"!) then link with the linker ld. Of course you can use any linker you want, but ld comes with gcc and you can download a whole Win32 port of gcc for free. We pass the starting address to the linker, and specify the static library libkernel32.a to link with. This static library is part of the Win32 gcc distribution, and it contains the right calls into the system DLLs.

The gas version of this program looks very similar:

hello.s

/*****************************************************************************
* hello.s
*
* This is a Win32 console program that writes "Hello, World" on one line and
* then exits.  It uses only plain Win32 system calls from kernel32.dll, so it
* is very instructive to study since it does not make use of a C library.
* Because system calls from kernel32.dll are used, you need to link with
* an import library.  You also have to specify the starting address yourself.
*
* Assembler: gas
* OS: Any Win32-based OS
* Other libraries: Use gcc's import library libkernel32.a
* Assemble with "gcc -c hello.s"
* Link with "ld -e go hello.o -lkernel32"
*****************************************************************************/

         .global go

         .data
msg:     .ascii  "Hello, World\n"
handle:  .int    0
written: .int    0

         .text
go:
         /* handle = GetStdHandle(-11) */
         pushl   $-11
         call    _GetStdHandle@4
         mov     %eax, handle

         /* WriteConsole(handle, &msg[0], 13, &written, 0) */
         pushl   $0
         pushl   $written
         pushl   $13
         pushl   $msg
         pushl   handle
         call    _WriteConsoleA@20

         /* ExitProcess(0) */
         pushl   $0
         call    _ExitProcess@4

In fact the differences between the two programs are really only syntactic. Another minor point is that gas doesn’t really care if you declare external symbols with some sort of "extern" directive or not.

As in the NASM version, we’ve specified our entry point, and will be passing it to the linker in the -e option. To assemble this code, do

gcc -c hello.s

The -c option is important! It tells gcc to assemble but not link. Without the -c option, gcc will try to link the object file with a C runtime library. Since we are not using a C runtime library, and in fact are specifying our own starting point, and cleaning up ourselves with ExitProcess we definitely want to link ourselves. The linking step is the same as the NASM example; the only difference is that gcc produces win32 object files with extension .o rather than .obj.

If you really want to pay a vendor for an assembler and linker you can use Microsoft’s MASM assembler. Anything less than version 6.14 will be extremely painful to use. Here is the version of the hello program in MASM:

hello.asm

; ----------------------------------------------------------------------------
; hello.asm
;
; This is a Win32 console program that writes "Hello, World" on one line and
; then exits.  It uses only plain Win32 system calls from kernel32.dll, so it
; is very instructive to study since it does not make use of a C library.
; Because system calls from kernel32.dll are used, you need to link with
; an import library.
;
; Processor: 386 or later
; Assembler: MASM
; OS: Any Win32-based OS
; Other libraries: Use Microsoft's import library kernel32.lib
; Assemble with "ml hello.asm /c"
; Link with "link hello kernel32.lib /subsystem:console /entry:go"
; ----------------------------------------------------------------------------

        .386P
        .model    flat
        extern    _ExitProcess@4:near
        extern    _GetStdHandle@4:near
        extern    _WriteConsoleA@20:near
        public    _go

        .data
msg     byte      'Hello, World', 10
handle  dword     ?
written dword     ?

        .stack

        .code
_go:        
        ; handle = GetStdHandle(-11)
        push      -11
        call      _GetStdHandle@4
        mov       handle, eax

        ; WriteConsole(handle, &msg[0], 13, &written, 0)
        push      0
        push      offset written
        push      13
        push      offset msg
        push      handle
        call      _WriteConsoleA@20

        ; ExitProcess(0)
        push      0
        call      _ExitProcess@4

        end

The processor (.386P) and model (.model) directives are an annoyance, but they have to be there and the processor directive must precede the model directive or the assembler will think the processor is running in 16-bit mode (*sigh*). As before we have to specify an entry point and pass it to the linker. Assemble with

    ml hello.asm /c

The /c option is required since ml will try to link. Not only is the MASM assembler, ml, not free, but neither is Microsoft’s linker, link.exe, nor are static versions of the Win32 libraries, such as kernel32.lib. After you buy those you link your code with

    link hello.obj kernel32.lib /subsystem:console /entry:go

To get this to work, kernel32.lib needs to be in a known library path or additional options must be passed to the linker. You might find the /subsystem option interesting; leave it out to see a ridiculous error message when running the linked executable (at least under Win9x).

Most of MASM’s syntactic weirdness, like using the "offset" keyword to get the address of a variable, is not present in NASM. While NASM is probably gaining popularity, there is far more MASM code out there, and it is a good idea to have at least a passing acquaintance with MASM, since most publications use it. It is the closest thing to a "standard" x86 assembly language there is.

Using a C Runtime Library for Win32 Programming

As under Linux, using a C runtime library makes it very easy to write simple assembly language programs. Here is one in NASM:

powers.asm

; ----------------------------------------------------------------------------
; powers.asm
;
; Displays powers of 2 from 2^0 to 2^30, one per line, to standard output.
;
; Assembler: NASM
; OS: Any Win32-based OS
; Other libraries: Use gcc's C runtime library
; Assemble with "nasm -fwin32 powers.asm"
; Link with "gcc powers.obj" (C runtime library linked automatically)
; ----------------------------------------------------------------------------

        extern _printf
	global _main

	section .text
_main:
	push	esi			; callee-save registers
	push	edi

        mov     esi, 1                  ; current value
        mov     edi, 31                 ; counter
L1:
	push	esi			; push value to print
        push    format			; push address of format string
	call	_printf
	add	esp, 8			; pop off parameters passed to printf
	add	esi, esi		; double value
	dec	edi			; keep counting
	jne	L1

	pop	edi
	pop	esi
	ret

format:	db	'%d', 10, 0

The same program in gas looks like this:

powers.s

/*****************************************************************************
* powers.s
*
* Displays powers of 2 from 2^0 to 2^30, one per line.  It should be linked
* with a C runtime library.  The C runtime library contains startup code
* so you do not have to specify a starting label.  The startup code in
* the C library eventually calls main.
*
* Assembler: gas
* OS: Any Win32-based OS
* Other libraries: Use gcc's C runtime library
* Assemble and link: "gcc powers.s" (gcc links the C library automatically)
*****************************************************************************/

        .global   _main

        .text
format: .asciz    "%d\n"

_main:
        pushl     %esi                  /* callee save registers */
        pushl     %edi
        
        movl      $1, %esi              /* current value */
        movl      $31, %edi             /* counter */
L1:
        pushl     %esi                  /* push value of number to print */
        pushl     $format               /* push address of format */
        call      _printf
        addl      $8, %esp

        addl      %esi, %esi            /* double value */
        decl      %edi                  /* keep counting */
        jnz       L1
        
        popl      %edi
        popl      %esi
        ret

Note you can assemble and link with

    gcc powers.s

For the MASM version of this program, you can go purchase C Runtime Libraries from Microsoft as well. There are many versions of the library, but for single threaded programs, libc.lib is fine. Here is the powers program in MASM:

powers.asm

; ----------------------------------------------------------------------------
; powers.asm
;
; Displays powers of 2 from 2^0 to 2^30, one per line, to standard output.
;
; Processor: 386 or later
; Assembler: MASM
; OS: Any Win32-based OS
; Other libraries: Use a Microsoft-compatible C library (e.g. libc.lib).
; Assemble with "ml powers.asm /c"
; Link with "link powers libc.lib"
;
; By default, the linker uses "/subsystem:console /entry:mainCRTStartup".
; The function "mainCRTStartup" is inside libc.lib.  It does some
; initialization, calls a function "_main" (which will end up in powers.obj)
; then does more work and finally calls ExitProcess.
; ----------------------------------------------------------------------------

        .386P
        .model    flat
        extern    _printf:near
        public    _main

        .code
_main:
        push      esi                  ; callee-save registers
        push      edi
        
        mov       esi, 1               ; current value
        mov       edi, 31              ; counter                
L1:
        push      esi                  ; push value to print
        push      offset format        ; push address of format string
        call      _printf
        add       esp, 8               ; pop off parameters passed to printf
        add       esi, esi             ; double value
        dec       edi                  ; keep counting
        jnz       L1

        pop       edi
        pop       esi
        ret
        
format: byte      '%d', 10, 0

        end

When linking with libc.lib you get nice linker defaults. To assemble and link:

    ml powers.asm /c
    link powers.obj libc.lib

You’ll have to make sure the linker knows where to find libc.lib by setting some environment variables, of course, but you get the idea.

OpenGL Programming in NASM for Win32

For fun, here is a complete assembly language program that implements an OpenGL application running under GLUT on Windows systems:

triangle.asm

; ----------------------------------------------------------------------------
; triangle.asm
;
; A very simple *Windows* OpenGL application using the GLUT library.  It
; draws a nicely colored triangle in a top-level application window.  One
; interesting thing is that the Windows GL and GLUT functions do NOT use the
; C calling convention; instead they use the "stdcall" convention which is
; like C except that the callee pops the parameters.
; ----------------------------------------------------------------------------

	global	_main
	extern	_glClear@4
	extern	_glBegin@4
	extern	_glEnd@0
	extern	_glColor3f@12
	extern	_glVertex3f@12
	extern	_glFlush@0
	extern	_glutInit@8
	extern	_glutInitDisplayMode@4
	extern	_glutInitWindowPosition@8
	extern	_glutInitWindowSize@8
	extern	_glutCreateWindow@4
	extern	_glutDisplayFunc@4
	extern	_glutMainLoop@0

	section	.text
title:	db	'A Simple Triangle', 0
zero:	dd	0.0
one:	dd	1.0
half:	dd	0.5
neghalf:dd	-0.5

display:
	push	dword 16384
	call	_glClear@4		; glClear(GL_COLOR_BUFFER_BIT)
	push	dword 9
	call	_glBegin@4		; glBegin(GL_POLYGON)
	push	dword 0
	push	dword 0
	push	dword [one]
	call	_glColor3f@12		; glColor3f(1, 0, 0)
	push	dword 0
	push	dword [neghalf]
	push	dword [neghalf]
	call	_glVertex3f@12		; glVertex(-.5, -.5, 0)
	push	dword 0
	push    dword [one]
	push	dword 0
	call	_glColor3f@12		; glColor3f(0, 1, 0)
	push	dword 0
	push	dword [neghalf]
	push	dword [half]
	call	_glVertex3f@12		; glVertex(.5, -.5, 0)
	push	dword [one]
	push	dword 0
	push	dword 0
	call	_glColor3f@12		; glColor3f(0, 0, 1)
	push	dword 0
	push	dword [half]
	push	dword 0
	call	_glVertex3f@12		; glVertex(0, .5, 0)
	call	_glEnd@0		; glEnd()
	call	_glFlush@0		; glFlush()
	ret

_main:
	push	dword [esp+8]		; push argv
	lea	eax, [esp+8]		; get addr of argc (offset changed :-)
	push	eax
	call	_glutInit@8		; glutInit(&argc, argv)
	push	dword 0
	call	_glutInitDisplayMode@4
	push	dword 80
	push	dword 80
	call	_glutInitWindowPosition@8
	push	dword 300
	push	dword 400
	call	_glutInitWindowSize@8
	push	title
	call	_glutCreateWindow@4
	push	display
	call	_glutDisplayFunc@4
	call	_glutMainLoop@0
	ret
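
The difference between the two conventions mentioned in the comments above can be sketched in a few lines. The program below is an illustration only, not part of the original notes: the file name callconv.asm and the functions sum_std and sum_c are hypothetical. It defines the same two-argument function twice, once in the stdcall style (the callee pops the parameters with ret 8) and once in the C style (the caller pops them with add esp, 8), and prints both results with printf.

```nasm
; ----------------------------------------------------------------------------
; callconv.asm (hypothetical file name)
;
; Contrasts the stdcall and C calling conventions on Win32.
; Assemble with "nasm -fwin32 callconv.asm", link with "gcc callconv.obj".
; ----------------------------------------------------------------------------

        global  _main
        extern  _printf

        section .text

; sum_std(a, b), stdcall style: the callee pops its 8 bytes of parameters
sum_std:
        mov     eax, [esp+4]            ; a
        add     eax, [esp+8]            ; a + b
        ret     8                       ; return and discard both parameters

; sum_c(a, b), C style: the caller pops the parameters
sum_c:
        mov     eax, [esp+4]            ; a
        add     eax, [esp+8]            ; a + b
        ret

_main:
        push    dword 2
        push    dword 1
        call    sum_std                 ; no cleanup needed after a stdcall call
        push    eax
        push    format
        call    _printf                 ; printf uses the C convention ...
        add     esp, 8                  ; ... so we pop its two parameters

        push    dword 4
        push    dword 3
        call    sum_c
        add     esp, 8                  ; C convention: caller pops a and b
        push    eax
        push    format
        call    _printf
        add     esp, 8

        xor     eax, eax                ; return 0 from main
        ret

        section .data
format: db      '%d', 10, 0
```

This is why the triangle program never adjusts ESP after its GL and GLUT calls, while the powers program does so after every call to printf.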

Programming for DOS

Both MASM and NASM can create DOS executables. DOS is a primitive operating system (indeed, many people, perhaps correctly, refuse to call it an operating system), which runs in real mode only. Real mode addresses are 20-bit values written in the form SEGMENT:OFFSET where the segment and offset are each 16 bits wide and the physical address is SEGMENT * 16 + OFFSET.
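
To make the address arithmetic concrete, here is a small sketch in NASM. It is an illustration under stated assumptions rather than part of the original notes: the file name seg.asm is hypothetical, the program is built as a flat COM file (a simpler format than the segmented programs described next), and it assumes DOS is in a color text mode, where the screen buffer starts at physical address B8000h. It reaches two neighboring bytes of that buffer through two different SEGMENT:OFFSET pairs.

```nasm
; ----------------------------------------------------------------------------
; seg.asm (hypothetical file name)
;
; Shows that different SEGMENT:OFFSET pairs can name the same region of
; physical memory. Assumes a color text mode (screen buffer at B8000h).
; Assemble with "nasm -fbin seg.asm -o seg.com"
; ----------------------------------------------------------------------------

        org     100h                    ; COM programs are loaded at offset 100h

        mov     ax, 0B800h
        mov     es, ax
        mov     byte [es:0000h], 'A'    ; B800h * 16 + 0000h = B8000h (character cell)

        mov     ax, 0B000h
        mov     es, ax
        mov     byte [es:8001h], 1Fh    ; B000h * 16 + 8001h = B8001h (its color attribute)

        mov     ax, 4C00h               ; DOS function 4ch: exit with code 0
        int     21h
```

Both stores land in the first character cell of the screen even though the segment registers hold different values, because only the computed 20-bit sum matters.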

A DOS program is a collection of segments. When the program is loaded, DS:0 and ES:0 point to a 256-byte section of memory called the program segment prefix and this is immediately followed by the segments of the program. CS:0 will point to the code segment and SS:0 to the stack segment. SP will be loaded with the size of the stack specified by the programmer, which is perfect because on the x86 a PUSH instruction decrements the stack pointer and then moves the pushed value into the memory addressed by SS:SP. The length of the command line argument string is placed in the byte at offset 80h of the prefix and the actual argument string begins at offset 81h.

Here is a simple DOS program to echo the command line argument string:

echo.asm

; ----------------------------------------------------------------------------
; echo.asm
;
; Echoes the command line to standard output.  Illustrates DOS system calls
; 40h = write to file, and 4ch = exit process.
;
; Processor: 386 or later
; Assembler: MASM
; OS: DOS 2.0 or later only
; Assemble and link with "ml echo.asm"
; ----------------------------------------------------------------------------

        .model  small
        .stack  64                      ; 64 byte stack
        .386
        .code
start:  movzx   cx,byte ptr ds:[80h]    ; size of parameter string
        mov     ah, 40h                 ; write
        mov     bx, 1                   ; ... to standard output
        mov     dx, 81h                 ; ... the parameter string
        int     21h                     ; ... by calling DOS
        mov     ah, 4ch
        int     21h
        end     start

Note that with the MASM assembler you have to place the .model directive before the processor directive to get the 16-bit segments required for DOS.

Note that all "operating system services" such as input/output are accessible through the processor’s interrupt instruction so there is no need to link your program to a special library. Of course if you wanted to link to a 16-bit C runtime library you certainly can.

The echo program defines only a code and stack segment; an example of a program with a programmer-defined data segment is:

hello1.asm

; ----------------------------------------------------------------------------
; hello1.asm
;
; Displays a silly message to standard output.  Illustrates user-defined data.
; The easiest way to do this is to put the data in a data segment, separate 
; from the code, and access it via the ds register.  Note that you must have
; ds:0 pointing to your data segment (technically to your segment's GROUP) 
; before you reference your data.  The predefined symbol @data refers to 
; the group containing the segments created by .data, .data?, .const, 
; .fardata, and .fardata?.
;
; Processor: 386 or later
; Assembler: MASM
; OS: DOS 2.0 or later only
; Assemble and link with "ml hello1.asm"
; ----------------------------------------------------------------------------

	.model	small

        .stack  128

        .code
start:  mov     ax, @data
        mov     ds, ax
        mov     ah, 9
        lea     dx, Msg
        int     21h
	mov	ah, 4ch
	int	21h

        .data
Msg     byte    'Hello, there.', 13, 10, '$'

	end	start

Although DOS has been obsolete for many years, a brief study of DOS systems and the x86 real-addressing mode is somewhat interesting. First, real-mode addresses correspond to real, physical memory, so one can watch exactly what is happening in the machine very easily with a good debugger. Second, most embedded microprocessors work in a kind of "real mode": fewer than 1% of microprocessors run desktop PCs, servers and workstations; most are simple embedded processors. Finally, a lot of DOS applications still exist, so it might be useful to know what kind of technology underlies it all.

Writing Optimized Code

Assembly language programmers and compiler writers should take great care in producing efficient code. This requires a fairly deep understanding of the x86 architecture, especially the behavior of the cache(s), pipelines and alignment bias. These specifics are well beyond the scope of this little document, but an excellent place to begin your study of this material is Agner Fog’s Optimization Guide or even Intel’s.

Differences between NASM, MASM, and GAS

The complete syntactic specification of each assembly language can be found elsewhere, but you can learn 99% of what you need to know by looking at a comparison table:

Operation	NASM	MASM	GAS
Move contents of esi into ebx	mov ebx, esi		movl %esi, %ebx
Move contents of si into dx	mov dx, si		movw %si, %dx
Clear the eax register	xor eax, eax		xorl %eax, %eax
Move immediate value 10 into register al	mov al, 10		movb $10, %al
Move contents of address 10 into register ecx	mov ecx, [10]	I DON’T KNOW	movl 10, %ecx
Move contents of variable dog into register eax	mov eax, [dog]	mov eax, dog	movl dog, %eax
Move address of variable dog into register eax	mov eax, dog	mov eax, offset dog	movl $dog, %eax
Move immediate byte value 10 into memory pointed to by edx	mov byte [edx], 10	mov byte ptr [edx], 10	movb $10, (%edx)
Move immediate 16-bit value 10 into memory pointed to by edx	mov word [edx], 10	mov word ptr [edx], 10	movw $10, (%edx)
Move immediate 32-bit value 10 into memory pointed to by edx	mov dword [edx], 10	mov dword ptr [edx], 10	movl $10, (%edx)
Compare eax to the contents of memory 8 bytes past the cell pointed to by ebp	cmp eax, [ebp+8]		cmpl 8(%ebp), %eax
Add into esi the value in memory ecx quadwords past the cell pointed to by eax	add esi, [eax+ecx*8]		addl (%eax,%ecx,8), %esi
Add into esi the value in memory ecx doublewords past 128 bytes past the cell pointed to by eax	add esi, [eax+ecx*4+128]		addl 128(%eax,%ecx,4), %esi
Add into esi the value in memory ecx doublewords past eax bytes past the beginning of the variable named array	add esi, [eax+ecx*4+array]		addl array(%eax,%ecx,4), %esi
Add into esi the value in memory ecx words past the beginning of the variable named array	add esi, [ecx*2+array]		addl array(,%ecx,2), %esi
Move the immediate value 4 into the memory cell pointed to by eax using selector fs	mov byte [fs:eax], 4	mov byte ptr fs:[eax], 4	movb $4, %fs:(%eax)
Jump into another segment	?	jump far S:O	ljmp $S, $O
Call to another segment	?	call far S:O	lcall $S, $O
Return from an intersegment call	retf V	ret far V	lret $V
Sign-extend al into ax	cbw		cbtw
Sign-extend ax into eax	cwde		cwtl
Sign-extend ax into dx:ax	cwd		cwtd
Sign-extend eax into edx:eax	cdq		cltd
Sign-extend bh into si	movsx si, bh		movsbw %bh, %si
Sign-extend bh into esi	movsx esi, bh		movsbl %bh, %esi
Sign-extend cx into esi	movsx esi, cx		movswl %cx, %esi
Zero-extend bh into si	movzx si, bh		movzbw %bh, %si
Zero-extend bh into esi	movzx esi, bh		movzbl %bh, %esi
Zero-extend cx into esi	movzx esi, cx		movzwl %cx, %esi
100 doublewords, all initialized to 8192	times 100 dd 8192	dd 100 dup (8192)	I DON’T KNOW
Reserve 64 bytes of storage	resb 64	db 64 dup (?)	.space 64
Hello World	db 'Hello, World'		.ascii "Hello, World"
Hello World with a newline, and zero-terminated	db 'Hello, World', 10, 0		.asciz "Hello, World\n"

Good to know:

NASM and MASM use what is sometimes called the Intel syntax, while GAS uses what is called the AT&T syntax.
GAS uses % to prefix registers
GAS is source(s) first, destination last; MASM and NASM go the other way.
GAS denotes operand sizes on instructions (with b, w, l suffixes), rather than on operands
GAS uses $ for immediates, but also for addresses of variables.
GAS puts rep/repe/repne/repz/repnz prefixes on separate lines from the instructions they modify
MASM tries to simplify things for the programmer but makes headaches instead: it tries to "remember" segments, variable sizes and so on. The result is a requirement for stupid ASSUME directives, and the inability to tell what an instruction does by looking at it (you have to go look for declarations; e.g. dw vs. equ).
MASM writes FPU registers as ST(0), ST(1), etc.
NASM treats labels case-sensitively; MASM is case-insensitive.