x86-64 Architecture Overview

The IA-32 is the instruction set architecture (ISA) of Intel’s most successful line of 32-bit processors, and the Intel 64 ISA is its extension into 64-bit processors. Actually, Intel 64 was invented by AMD, who called it x86-64, which is the more general moniker. These notes summarize a few items of interest about these two ISAs. They in no way serve as a substitute for reading Intel’s manuals.

IA-32 and x86-64

The two massively popular architectures IA-32 and x86-64 are so common, they are described in a single set of manuals. The flagship manual is the Software Developer’s Manual, which is over 5,000 pages long and broken up into four volumes (1=Basic Architecture, 2=Instruction Set Reference, 3=System Programming Guide, 4=Model-Specific Registers). For convenience, you can get it in 10 parts: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4.

The following notes briefly summarize the x86-64 architecture only.

x86-64 Architecture Diagram

The basic architecture of the x86-64 is described in Volume 1 of the Software Developer’s Manual, and Chapter 3 of that volume includes a diagram of the basic execution environment.

Registers

Application programmers generally use only the general purpose registers, the floating point registers, and the XMM and YMM registers.

General Purpose Registers

These are 64 bits wide and used for integer arithmetic and logic, and to hold both data and pointers to memory. The registers are called R0...R15. Also:

You can access the low-order 32 bits of each register using the names R0D...R15D. The “D” stands for “doubleword” because, strangely, the word “word” on this platform refers to a 16-bit quantity. Why? Backward compatibility! The x86-64 grew out of a 16-bit processor family created in the 1970s.
You can access the low-order 16 bits of each register using the names R0W...R15W.
You can access the low-order 8 bits of each register using the names R0B...R15B.
R0...R7 have aliases RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, respectively.
R0D...R7D have aliases EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, respectively.
R0W...R7W have aliases AX, CX, DX, BX, SP, BP, SI, DI, respectively.
R0B...R7B have aliases AL, CL, DL, BL, SPL, BPL, SIL, DIL, respectively.
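
The narrower names are just views of the low-order bits of the same 64-bit register. The following Python sketch models a register as a plain integer and reads those views with masks. The helper name `low_bits` and the sample register contents are made up for illustration, and the sketch only covers reading a sub-register, not what happens when one is written.

```python
MASK64 = (1 << 64) - 1

def low_bits(value, width):
    """Return the low-order `width` bits of a 64-bit register value."""
    return (value & MASK64) & ((1 << width) - 1)

rax = 0x1122334455667788   # hypothetical contents of R0 (RAX)
eax = low_bits(rax, 32)    # the R0D (EAX) view
ax = low_bits(rax, 16)     # the R0W (AX) view
al = low_bits(rax, 8)      # the R0B (AL) view

print(f"RAX={rax:016X} EAX={eax:08X} AX={ax:04X} AL={al:02X}")
```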

RIP and RFLAGS

RIP is the instruction pointer and RFLAGS is the flags register.

Segment Registers

These are CS, DS, SS, ES, FS, and GS. I haven’t used them in 64-bit programming.

XMM Registers

These are 128 bits wide. They are named XMM0...XMM15. Use them for floating-point and integer arithmetic. You can do operations on 128-bit integers, but you can also take advantage of their ability to do operations in parallel:

Two 64-bit integer operations in parallel
Four 32-bit integer operations in parallel
Eight 16-bit integer operations in parallel
Sixteen 8-bit integer operations in parallel
Two 64-bit floating-point operations in parallel
Four 32-bit floating-point operations in parallel
Eight 16-bit floating-point operations in parallel
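
To get a feel for what “in parallel” means here, the sketch below treats a 128-bit value as a row of independent lanes and adds two such values lane by lane. This is only a Python model, not processor code. The function names are invented for illustration, and the sketch assumes plain wraparound addition in each lane (no saturation).

```python
def split_lanes(value, lane_bits, total_bits=128):
    """Split an integer into lanes of lane_bits each, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, lane_bits)]

def join_lanes(lanes, lane_bits):
    """Pack a list of lanes (lowest first) back into one integer."""
    mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * lane_bits)
    return value

def packed_add(a, b, lane_bits, total_bits=128):
    """Add lane by lane; a carry out of one lane never reaches the next."""
    mask = (1 << lane_bits) - 1
    xs = split_lanes(a, lane_bits, total_bits)
    ys = split_lanes(b, lane_bits, total_bits)
    return join_lanes([(x + y) & mask for x, y in zip(xs, ys)], lane_bits)

# Four 32-bit integer additions in one step; the last lane wraps to 0.
a = join_lanes([1, 2, 3, 0xFFFFFFFF], 32)
b = join_lanes([10, 20, 30, 1], 32)
result = split_lanes(packed_add(a, b, 32), 32)
print(result)
```

Changing `lane_bits` to 64, 16, or 8 gives the two-, eight-, and sixteen-lane integer cases from the list.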

YMM Registers

These are 256 bits wide. They are named YMM0...YMM15. Use them for floating-point arithmetic. You can do:

Four 64-bit floating-point operations in parallel
Eight 32-bit floating-point operations in parallel

and some other crazy things.

FPU Registers

There are eight registers used for computing with 80-bit floating point values. The registers don’t have fixed names because they are used in a stack-like fashion; instructions refer to them relative to the top of the stack, as ST(0) through ST(7).

Other Registers

Application programmers can remain oblivious of the rest of the registers:

The 8 32-bit processor control registers: CR0, CR1, CR2, CR3, CR4, CR5, CR6, CR7. The lower 16 bits of CR0 are called the Machine Status Word (MSW).
The 4 16-bit table registers: GDTR, IDTR, LDTR and TR.
The 8 32-bit debug registers: DR0, DR1, DR2, DR3, DR4, DR5, DR6 and DR7.
The 5 test registers: TR3, TR4, TR5, TR6 and TR7.
The memory type range registers
The machine specific registers
The machine check registers

Instruction Set

See the SDM Volume 1, Chapter 5 for a nice overview of all of the processor instructions and Volume 2 for complete information.

The following table shows most of the available instructions, using the instruction names as specified in the Intel syntax. Not every processor supports every instruction, of course.

This table is incomplete and may also be out of date. Check out official sources for complete information on every instruction.

The vertical bar means OR, the square brackets mean OPTIONAL, and parentheses are used for grouping. For example:

SH(L|R)[D] stands for SHL, SHR, SHLD, SHRD.
PUSH[A[D]] stands for PUSH, PUSHA, PUSHAD.
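
If the notation is unclear, a small expander can spell out every name a pattern stands for. The following Python sketch is a hypothetical helper written for these notes, not part of any assembler. It assumes patterns are well formed (balanced brackets and parentheses) and contain no spaces.

```python
def expand(pattern):
    """Expand a pattern such as SH(L|R)[D] into a sorted list of names."""
    pos = 0

    def parse_alternatives(closers):
        # alternatives := sequence ('|' sequence)*
        nonlocal pos
        results = set(parse_sequence(closers))
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            results |= parse_sequence(closers)
        return results

    def parse_sequence(closers):
        # sequence := (letter | '(' alternatives ')' | '[' alternatives ']')*
        nonlocal pos
        results = {""}
        while (pos < len(pattern) and pattern[pos] != "|"
               and pattern[pos] not in closers):
            ch = pattern[pos]
            pos += 1
            if ch == "(":
                options = parse_alternatives(")")
                pos += 1  # skip the closing ')'
            elif ch == "[":
                options = parse_alternatives("]") | {""}  # optional part
                pos += 1  # skip the closing ']'
            else:
                options = {ch}
            results = {prefix + opt for prefix in results for opt in options}
        return results

    return sorted(parse_alternatives(""))

print(expand("SH(L|R)[D]"))
print(expand("PUSH[A[D]]"))
```

The same function can be pointed at any entry in the table, for example `expand("BT[S|R|C]")`.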

Category	Instructions
INTEGER	MOV CMOV[N]((L|G|A|B)[E]|E|Z|S|C|O|P) XCHG BSWAP XADD CMPXCHG[8B] PUSH[A[D]] | POP[A[D]] CBW | CWDE | CWD | CDQ MOVSX | MOVZX ADD | ADC | AD(C|O)X SUB | SBB [I]MUL [I]DIV INC | DEC NEG CMP DAA | DAS AAA | AAS | AAM | AAD AND | OR | XOR | NOT SH(L|R)[D] SA(L|R) RO(L|R) RC(L|R) BT[S|R|C] BS(F|R) SET[N]((L|G|A|B)[E]|E|Z|S|C|O|P) TEST CRC32 POPCNT JMP J[N]((L|G|A|B)[E]|E|Z|S|C|O|P) J[E]CXZ LOOP[N][Z|E] CALL | RET INT[O] | IRET ENTER | LEAVE BOUND MOVS[B|W|D] CMPS[B|W|D] SCAS[B|W|D] LODS[B|W|D] STOS[B|W|D] REP[N][Z|E] IN | OUT INS[B|W|D] OUTS[B|W|D] STC | CLC | CMC STD | CLD STI | CLI LAHF | SAHF PUSHF[D] | POPF[D] LDS | LES | LFS | LGS | LSS LEA NOP UD XLAT[B] CPUID
FPU	F[I]LD F[I]ST[P] FBLD FBSTP FXCH FCMOV[N](E|B|BE|U) FADD[P] FIADD FSUB[R][P] FISUB[R] FMUL[P] FIMUL FDIV[R][P] FIDIV[R] FPREM[1] FABS FCHS FRNDINT FSCALE FSQRT FXTRACT F[U]COM[P][P] FICOM[P] F[U]COMI[P] FTST FXAM FSIN FCOS FSINCOS FPTAN FPATAN F2XM1 FYL2X FYL2XP1 FLD1 FLDZ FLDPI FLDL2E FLDLN2 FLDL2T FLDLG2 FINCSTP FDECSTP FFREE F[N]INIT F[N]CLEX F[N]STCW FLDCW F[N]STENV FLDENV F[N]SAVE FRSTOR F[N]STSW FWAIT | WAIT FNOP FXSAVE FXRSTOR
SSE	MOV(A|U)PS MOV(H|HL|L|LH)PS MOVSS MOVMSKPS ADD(P|S)S SUB(P|S)S MUL(P|S)S DIV(P|S)S RCP(P|S)S SQRT(P|S)S RSQRT(P|S)S MAX(P|S)S MIN(P|S)S CMP(P|S)S [U]COMISS ANDPS ANDNPS ORPS XORPS SHUFPS UNPCK(H|L)PS CVTPI2PS CVT[T]PS2PI CVTSI2SS CVT[T]SS2SI PAVG(B|W) PEXTRW PINSRW P(MIN|MAX)(UB|SW) PMOVMSKB PMULHUW PSADBW PSHUFW LDMXCSR STMXCSR MASKMOVQ MOVNT(Q|PS) PREFETCHT(0|1|2) PREFETCHNTA SFENCE
SSE2	MOV(A|U)PD MOV(H|L)PD MOVSD MOVMSKPD ADD(P|S)D SUB(P|S)D MUL(P|S)D DIV(P|S)D SQRT(P|S)D MAX(P|S)D MIN(P|S)D CMP(P|S)D [U]COMISD ANDPD ANDNPD ORPD XORPD SHUFPD UNPCK(H|L)PD CVT(PI|DQ)2PD CVT[T]PD2(PI|DQ) CVTSI2SD CVT[T]SD2SI CVTPS2PD CVTPD2PS CVTDQ2PS CVT[T]PS2DQ CVTSS2SD CVTSD2SS MOVDQ(A|U) MOVQ2DQ MOVDQ2Q PUNPCK(H|L)QDQ PADDQ PSUBQ PMULUDQ PSHUF(LW|HW|D) PS(L|R)LDQ MASKMOVDQU MOVNT(PD|DQ|I) CLFLUSH LFENCE MFENCE PAUSE
SYSTEM	LGDT | SGDT LLDT | SLDT LTR | STR LIDT | SIDT LMSW | SMSW CLTS ARPL LAR LSL VERR | VERW INVD | WBINVD INVLPG LOCK HLT RSM RDMSR | WRMSR RDPMC RDTSC SYSENTER SYSEXIT
MMX	MOVD MOVQ PACKSS(WB|DW) PACKUSWB PUNPCK(H|L)(BW|WD|DQ) PADD(B|W|D) PADD(S|US)(B|W) PSUB(B|W|D) PSUB(S|US)(B|W) PMUL(H|L)W PMADDWD PCMP(EQ|GT)(B|W|D) PAND PANDN POR PXOR PS(L|R)L(W|D|Q) PSRA(W|D) EMMS
SSE3	FISTTP LDDQU ADDSUBP(S|D) HADDP(S|D) HSUBP(S|D) MOVS(H|L)DUP MOVDDUP MONITOR MWAIT
SSE4	PMUL(LD|DQ) DPP(D|S) MOVNTDQA BLEND[V](PD|PS) PBLEND(VB|W) PMIN(UW|UD|SB|SD) PMAX(UW|UD|SB|SD) ROUND(P|S)(S|D) EXTRACTPS INSERTPS PINSR(B|D|Q) PEXTR(B|W|D|Q) PMOV(S|Z)X(BW|BD|WD|BQ|WQ|DQ) MPSADBW PHMINPOSUW PTEST PCMPEQQ PACKUSDW PCMP(E|I)STR(I|M) PCMPGTQ CRC32 POPCNT
64-BIT MODE	CDQE CMPSQ CMPXCHG16B LODSQ MOVSQ MOVZX STOSQ SWAPGS SYSCALL SYSRET
VIRTUAL MACHINE	VMPTRLD VMPTRST VMCLEAR VMREAD VMWRITE VMCALL VMLAUNCH VMRESUME VMXOFF VMXON INVEPT INVVPID
SSSE3	PHADD(W|SW|D) PHSUB(W|SW|D) PABS(B|W|D) PMADDUBSW PMULHRSW PSHUFB PSIGN(B|W|D) PALIGNR
AESNI	AESDEC[LAST] AESENC[LAST] AESIMC AESKEYGENASSIST PCLMULQDQ

Addressing Memory

In protected mode, applications can choose a flat or segmented memory model (see the SDM Volume 1, Chapter 3 for details); in real mode only a 16-bit segmented model is available. Most programmers will only use protected mode and a flat-memory model, so that’s all we’ll discuss here.

A memory reference has four parts and is often written as:

[SELECTOR : BASE + INDEX * SCALE + OFFSET]

The selector is one of the six segment registers; the base is any of the sixteen general purpose registers; the index is any of the general purpose registers except RSP; the scale is 1, 2, 4, or 8; and the offset is any 32-bit number. (Example: [fs:rcx+rsi*8+93221].) The minimal reference consists of only a base register or only an offset; a scale can only appear if there is an index present.
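
The arithmetic behind a memory reference can be sketched in a few lines of Python. The function below is illustrative only: its name and the sample register values are made up, and it ignores the selector entirely, which assumes a flat memory model in which the segment contributes nothing to the address.

```python
def effective_address(base=0, index=0, scale=1, offset=0):
    """Compute BASE + INDEX * SCALE + OFFSET, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + offset) & ((1 << 64) - 1)

# Hypothetical register contents for the base and index in the example.
base_value = 0x1000
index_value = 3
address = effective_address(base=base_value, index=index_value,
                            scale=8, offset=93221)
print(hex(address))
```

With scale 8, stepping the index by one moves the address by eight bytes, which is why this form is handy for walking arrays of quadwords.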

Sometimes the memory reference is written like this:

selector:offset(base,index,scale)

Data Types

The integer data types are:

Type name	Number of bits	Bit indices	Notes
Byte	8	7..0	Both signed and unsigned
Word	16	15..0	Both signed and unsigned
Doubleword	32	31..0	Both signed and unsigned
Quadword	64	63..0	Both signed and unsigned
Doublequadword	128	127..0	Both signed and unsigned
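
“Both signed and unsigned” means the same bit pattern can be read either way. The Python sketch below shows the two readings for each width in the table. The helper names are invented for illustration, and the signed reading assumes two’s complement.

```python
SIZES = {"byte": 8, "word": 16, "doubleword": 32,
         "quadword": 64, "doublequadword": 128}

def as_unsigned(bits, type_name):
    """Read a bit pattern as an unsigned integer of the given type."""
    return bits & ((1 << SIZES[type_name]) - 1)

def as_signed(bits, type_name):
    """Read the same bit pattern as a two's complement signed integer."""
    n = SIZES[type_name]
    value = bits & ((1 << n) - 1)
    # If the top bit (bit n-1) is set, the value is negative.
    return value - (1 << n) if value >> (n - 1) else value

print(as_unsigned(0xFF, "byte"), as_signed(0xFF, "byte"))
```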

The floating-point data types are:

Type name	Number of bits	Sign	Exponent	Mantissa
Half	16	15	14..10	9..0
Single	32	31	30..23	22..0
Double	64	63	62..52	51..0
Extended	80	79	78..64	63,62..0
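
The bit ranges in this table can be checked by pulling a value apart. The Python sketch below uses the standard `struct` module to get the raw bits of a half, single, or double and then slices out the three fields. The function name is made up, and the extended format is left out because `struct` has no 80-bit format code.

```python
import struct

# type name -> (struct format code, exponent bits, mantissa bits)
LAYOUTS = {"half": ("e", 5, 10), "single": ("f", 8, 23), "double": ("d", 11, 52)}

def fields(x, type_name):
    """Return (sign, exponent, mantissa) as raw, unbiased-as-stored integers."""
    code, exp_bits, frac_bits = LAYOUTS[type_name]
    # Pack big-endian so the sign bit is the highest bit of the integer.
    raw = int.from_bytes(struct.pack(">" + code, x), "big")
    sign = raw >> (exp_bits + frac_bits)
    exponent = (raw >> frac_bits) & ((1 << exp_bits) - 1)
    mantissa = raw & ((1 << frac_bits) - 1)
    return sign, exponent, mantissa

print(fields(1.0, "double"))
print(fields(-2.5, "single"))
```

Note that the exponent comes back in its stored (biased) form, so 1.0 as a double shows an exponent field of 1023 rather than 0.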

Little Endianness

The x86-64, like the IA-32, is little endian, meaning the least significant byte of a value comes first (at the lowest address) in memory. For example:

    0    12       byte @ 2 = CB
    1    31       byte @ 9 = 1F
    2    CB       word @ B = FE06
    3    74       word @ 6 = 230B
    4    67       word @ 1 = CB31
    5    45       dword @ A = 7AFE0636
    6    0B       qword @ 6 = 7AFE06361FA4230B
    7    23       word @ 2 = 74CB
    8    A4       qword @ 3 = 361FA4230B456774
    9    1F       dword @ 9 = FE06361F
    A    36  
    B    06  
    C    FE  
    D    7A  
    E    12

Note that if you draw memory with the lowest addresses at the bottom, then it is easier to read these values!
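
The values in the example can be reproduced with Python’s `int.from_bytes`, which takes the byte order as an argument. The sketch below loads the fifteen bytes from the dump into a `bytes` object and reads a few of the same locations. The `read` helper and the `SIZES` table are names made up for this illustration.

```python
# The fifteen bytes at addresses 0 through E, copied from the dump above.
memory = bytes.fromhex("12 31 CB 74 67 45 0B 23 A4 1F 36 06 FE 7A 12")

SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}

def read(kind, address):
    """Read a little-endian value of the given kind starting at address."""
    chunk = memory[address:address + SIZES[kind]]
    return int.from_bytes(chunk, "little")

for kind, address in [("byte", 0x2), ("word", 0xB), ("dword", 0xA), ("qword", 0x6)]:
    print(f"{kind} @ {address:X} = {read(kind, address):X}")
```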

Flags Register

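Many instructions cause the flags register to be updated. For example, if you execute an add instruction and the sum is too big to fit into the destination register, the Carry flag is set (when the operands are viewed as unsigned) or the Overflow flag is set (when they are viewed as signed).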

    3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
    1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
   ┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
   │ │ │ │ │ │ │ │ │ │ │I│V│V│A│V│R│ │N│ I │O│D│I│T│S│Z│ │A│ │P│ │C│
   │ │ │ │ │ │ │ │ │ │ │D│I│I│C│M│F│ │T│ P │F│F│F│F│F│F│ │F│ │F│ │F│
   │ │ │ │ │ │ │ │ │ │ │ │P│F│ │ │ │ │ │ L │ │ │ │ │ │ │ │ │ │ │ │ │
   └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘

The flags are described in Section 3.4.3 of Volume 1 of the SDM. To determine how each instruction affects the flags, see Appendix A of Volume 1 of the SDM.
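
As a rough model of how an add updates the flags, the Python sketch below computes four of them (CF, ZF, SF, OF) for a 64-bit addition and packs them into a value using the bit positions from the diagram. It is a simplified illustration with made-up function names: it leaves out PF and AF, and it treats the carry flag as the unsigned “did not fit” signal and the overflow flag as the signed one.

```python
FLAG_BITS = {"CF": 0, "ZF": 6, "SF": 7, "OF": 11}  # positions from the diagram

def add_with_flags(a, b, width=64):
    """Add two width-bit values; return (result, flags dictionary)."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "CF": int(total > mask),               # unsigned sum did not fit
        "ZF": int(result == 0),                # result is zero
        "SF": int(bool(result & sign_bit)),    # top bit of the result
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(bool((a ^ result) & (b ^ result) & sign_bit)),
    }
    return result, flags

def pack_flags(flags):
    """Place each flag at its bit position in the flags register."""
    value = 0
    for name, bit in flags.items():
        value |= bit << FLAG_BITS[name]
    return value

result, flags = add_with_flags(0x7FFFFFFFFFFFFFFF, 1)
print(hex(result), flags, hex(pack_flags(flags)))
```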

The Software Developer’s Manual

The Software Developer’s Manual contains vast amounts of important information and is required reading for all assembly language programmers and backend compiler writers. The manual is split into several volumes. Highlights from Volumes 1 and 2:

Volume 1:
- Chapter 1: About this manual
- Chapter 2: History of the IA-32 and Intel 64 architectures, a description of many of the microarchitectures and processors and technologies
- Chapter 3: Basic execution environment
- Chapter 4: Data types
- Chapter 5: Instruction set summary. Lists all instructions and a brief (but not precise) description of each. Instructions are grouped into convenient categories.
- Chapter 6: Details on calls and returns, and exceptions
- Chapter 7: All about general purpose instructions
- Chapter 8: All about FPU instructions
- Chapter 9: All about MMX instructions
- Chapter 10: All about SSE instructions
- Chapter 11: All about SSE2 instructions
- Chapter 12: All about SSE3, SSSE3, SSE4 and AESNI instructions
- Chapter 13: XSAVE
- Chapter 14: All about AVX, FMA, and AVX2 instructions
- Chapter 15: AVX-512
- Chapter 16: All about transactional synchronization instructions
- Chapter 17: Memory protection extensions
- Chapter 18: All about I/O instructions
- Chapter 19: How to determine what processor you have, and what its features are
- Appendix A: Shows which instructions affect which flags
- Appendix B: Condition codes
- Appendix C: Floating-point exceptions
- Appendix D: Guidelines for writing x87 exception handlers
- Appendix E: Guidelines for writing SIMD exception handlers
Volume 2:
- Chapter 1: About this manual
- Chapter 2: Instruction formats
- Chapters 3–5: Instruction set reference: full description, and encodings, of every instruction
- Chapter 6: Safer Mode Extensions Reference
- Appendix A: Opcode map
- Appendix B: Encoding summary
- Appendix C: Compiler intrinsics

Summary

We’ve covered:

IA-32 vs x86-64
x86-64 Diagram
Registers
Instruction Set
Memory Addressing
What’s in the Software Developer’s Manual