The two massively popular architectures, IA-32 and x86-64, have so much in common that they are described in a single set of manuals.
The following notes briefly summarize the latter architecture only.
The basic architecture of the x86-64 is described in Volume 1 of the Software Developer’s Manual (SDM). The following diagram is taken directly from Chapter 3 in this volume:
Application programmers generally use only the general-purpose registers, the floating-point registers, and the XMM and YMM registers.
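The general-purpose registers are 64 bits wide and are used for integer arithmetic and logic, and to hold both data and pointers to memory. The registers are called R0...R15; the first eight are also known by their traditional names RAX, RCX, RDX, RBX, RSP, RBP, RSI, and RDI.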
RIP is the instruction pointer and RFLAGS is the flags register.
The segment registers are CS, DS, SS, ES, FS, and GS. I haven’t used them in 64-bit programming.
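The XMM registers are 128 bits wide and are named XMM0...XMM15. Use them for floating-point and integer arithmetic. You can do operations on 128-bit integers, but you can also take advantage of their ability to do operations in parallel on two 64-bit or four 32-bit floating-point values, or on two 64-bit, four 32-bit, eight 16-bit, or sixteen 8-bit integers.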
The YMM registers are 256 bits wide and are named YMM0...YMM15. Use them for floating-point arithmetic. You can operate on four 64-bit or eight 32-bit floating-point values in parallel, and do some other crazy things.
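To make the idea of parallel lanes concrete, here is a minimal Python sketch that models a 128-bit register as 16 bytes holding four single-precision floats and adds two such values lane by lane, which is roughly what the `ADDPS` instruction computes. The helper names `pack_ps` and `addps` are made up for this illustration; real code would simply use the instruction.

```python
import struct

def pack_ps(a, b, c, d):
    """Pack four single-precision floats into 16 bytes, lowest lane first."""
    return struct.pack("<4f", a, b, c, d)

def addps(x, y):
    """Lane-wise add of two 16-byte values, mimicking what ADDPS computes."""
    lanes_x = struct.unpack("<4f", x)
    lanes_y = struct.unpack("<4f", y)
    sums = [p + q for p, q in zip(lanes_x, lanes_y)]
    # Re-packing rounds each sum back to single precision.
    return struct.pack("<4f", *sums)

x = pack_ps(1.0, 2.0, 3.0, 4.0)
y = pack_ps(10.0, 20.0, 30.0, 40.0)
print(struct.unpack("<4f", addps(x, y)))  # (11.0, 22.0, 33.0, 44.0)
```

The point of the sketch is only that one operation produces four independent sums; the hardware does all four at once rather than looping.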
There are eight registers used for computing with 80-bit floating point values. The registers don’t have names because they are used in a stack-like fashion.
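The stack-like behaviour can be pictured with a toy Python model. This is a sketch under simplifying assumptions: the class `X87Stack` and its methods are hypothetical names, values are ordinary Python floats rather than 80-bit values, and a full stack raises an exception instead of setting status flags as the hardware would.

```python
class X87Stack:
    """Toy model of the eight-entry x87 register stack (values only)."""

    def __init__(self):
        self.regs = []  # regs[-1] plays the role of ST(0)

    def fld(self, value):
        # FLD pushes a value; the old ST(0) becomes ST(1), and so on.
        if len(self.regs) == 8:
            raise OverflowError("x87 register stack is full")
        self.regs.append(float(value))

    def st(self, i):
        # ST(i) is addressed relative to the current top of the stack.
        return self.regs[-1 - i]

    def faddp(self):
        # FADDP adds ST(0) into ST(1) and pops, leaving the sum in the new ST(0).
        top = self.regs.pop()
        self.regs[-1] += top

fpu = X87Stack()
fpu.fld(1.5)
fpu.fld(2.0)
fpu.faddp()
print(fpu.st(0))  # 3.5
```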
Application programmers can remain oblivious of the rest of the registers, such as the control, debug, and memory-management registers.
See the SDM Volume 1, Chapter 5 for a nice overview of all of the processor instructions and Volume 2 for complete information.
The following table shows most of the available instructions, using the instruction names as specified in the Intel syntax. Not every processor supports every instruction, of course.
The vertical bar means OR, the square brackets mean OPTIONAL, and parentheses are used for grouping. For example:
SH(L|R)[D] stands for SHL, SHR, SHLD, SHRD.
PUSH[A[D]] stands for PUSH, PUSHA, PUSHAD.
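The notation can be expanded mechanically. The following Python sketch is a small recursive-descent expander written for this note (the function name `expand` is an assumption, and it expects a single well-formed pattern with no spaces); it returns every mnemonic a pattern stands for.

```python
def expand(pattern):
    """Return every mnemonic matched by a pattern in the table's notation."""
    pos = 0

    def parse_alt():
        # alt := seq ('|' seq)*
        nonlocal pos
        results = parse_seq()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            results += parse_seq()
        return results

    def parse_seq():
        # seq := (literal | '(' alt ')' | '[' alt ']')*
        nonlocal pos
        results = [""]
        while pos < len(pattern) and pattern[pos] not in "|)]":
            ch = pattern[pos]
            pos += 1
            if ch == "(":
                choices = parse_alt()
                pos += 1  # skip the closing ')'
            elif ch == "[":
                choices = [""] + parse_alt()  # optional: empty string is allowed
                pos += 1  # skip the closing ']'
            else:
                choices = [ch]
            results = [r + c for r in results for c in choices]
        return results

    return parse_alt()

print(expand("SH(L|R)[D]"))   # ['SHL', 'SHLD', 'SHR', 'SHRD']
print(expand("PUSH[A[D]]"))   # ['PUSH', 'PUSHA', 'PUSHAD']
```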
INTEGER: MOV CMOV[N]((L|G|A|B)[E]|E|Z|S|C|O|P) XCHG BSWAP XADD CMPXCHG[8B] PUSH[A[D]] | POP[A[D]] IN | OUT CBW | CWDE | CWD | CDQ MOVSX | MOVZX ADD | ADC SUB | SBB [I]MUL [I]DIV INC | DEC NEG CMP DAA | DAS AAA | AAS | AAM | AAD AND | OR | XOR | NOT SH(L|R)[D] SA(L|R) RO(L|R) RC(L|R) BT[S|R|C] BS(F|R) SET[N]((L|G|A|B)[E]|E|Z|S|C|O|P) TEST JMP J[N]((L|G|A|B)[E]|E|Z|S|C|O|P) J[E]CXZ LOOP[N][Z|E] CALL | RET INT[O] | IRET ENTER | LEAVE BOUND MOVS[B|W|D] CMPS[B|W|D] SCAS[B|W|D] LODS[B|W|D] STOS[B|W|D] INS[B|W|D] OUTS[B|W|D] REP[N][Z|E] STC | CLC | CMC STD | CLD STI | CLI LAHF | SAHF PUSHF[D] | POPF[D] LDS | LES | LFS | LGS | LSS LEA NOP UD2 XLAT[B] CPUID

FPU: F[I]LD F[I]ST[P] FBLD FBSTP FXCH FCMOV[N](E|B|BE|U) FADD[P] FIADD FSUB[R][P] FISUB[R] FMUL[P] FIMUL FDIV[R][P] FIDIV[R] FPREM[1] FABS FCHS FRNDINT FSCALE FSQRT FXTRACT F[U]COM[P][P] FICOM[P] F[U]COMI[P] FTST FXAM FSIN FCOS FSINCOS FPTAN FPATAN F2XM1 FYL2X FYL2XP1 FLD1 FLDZ FLDPI FLDL2E FLDLN2 FLDL2T FLDLG2 FINCSTP FDECSTP FFREE F[N]INIT F[N]CLEX F[N]STCW FLDCW F[N]STENV FLDENV F[N]SAVE FRSTOR F[N]STSW FWAIT | WAIT FNOP FXSAVE FXRSTOR

SSE: MOV(A|U)PS MOV(H|HL|L|LH)PS MOVSS MOVMSKPS ADD(P|S)S SUB(P|S)S MUL(P|S)S DIV(P|S)S RCP(P|S)S SQRT(P|S)S RSQRT(P|S)S MAX(P|S)S MIN(P|S)S CMP(P|S)S [U]COMISS ANDPS ANDNPS ORPS XORPS SHUFPS UNPCK(H|L)PS CVTPI2PS CVT[T]PS2PI CVTSI2SS CVT[T]SS2SI PAVG(B|W) PEXTRW PINSRW P(MIN|MAX)(UB|SW) PMOVMSKB PMULHUW PSADBW PSHUFW LDMXCSR STMXCSR MASKMOVQ MOVNT(Q|PS) PREFETCHT(0|1|2) PREFETCHNTA SFENCE

SSE2: MOV(A|U)PD MOV(H|L)PD MOVSD MOVMSKPD ADD(P|S)D SUB(P|S)D MUL(P|S)D DIV(P|S)D SQRT(P|S)D MAX(P|S)D MIN(P|S)D CMP(P|S)D [U]COMISD ANDPD ANDNPD ORPD XORPD SHUFPD UNPCK(H|L)PD CVT(PI|DQ)2PD CVT[T]PD2(PI|DQ) CVTSI2SD CVT[T]SD2SI CVTPS2PD CVTPD2PS CVTDQ2PS CVT[T]PS2DQ CVTSS2SD CVTSD2SS MOVDQ(A|U) MOVQ2DQ MOVDQ2Q PUNPCK(H|L)QDQ PADDQ PSUBQ PMULUDQ PSHUF(LW|HW|D) PS(L|R)LDQ MASKMOVDQU MOVNT(PD|DQ|I) CLFLUSH LFENCE MFENCE PAUSE

SYSTEM: LGDT | SGDT LLDT | SLDT LTR | STR LIDT | SIDT LMSW | SMSW CLTS ARPL LAR LSL VERR | VERW INVD | WBINVD INVLPG LOCK HLT RSM RDMSR | WRMSR RDPMC RDTSC SYSENTER SYSEXIT

MMX: MOVD MOVQ PACKSS(WB|DW) PACKUSWB PUNPCK(H|L)(BW|WD|DQ) PADD(B|W|D) PADD(S|US)(B|W) PSUB(B|W|D) PSUB(S|US)(B|W) PMUL(H|L)W PMADDWD PCMP(EQ|GT)(B|W|D) PAND PANDN POR PXOR PS(L|R)L(W|D|Q) PSRA(W|D) EMMS

SSE3: FISTTP LDDQU ADDSUBP(S|D) HADDP(S|D) HSUBP(S|D) MOVS(H|L)DUP MOVDDUP MONITOR MWAIT

SSE4: PMUL(LD|DQ) DPP(D|S) MOVNTDQA BLEND[V](PD|PS) PBLEND(VB|W) PMIN(UW|UD|SB|SD) PMAX(UW|UD|SB|SD) ROUND(P|S)(S|D) EXTRACTPS INSERTPS PINSR(B|D|Q) PEXTR(B|W|D|Q) PMOV(S|Z)X(BW|BD|WD|BQ|WQ|DQ) MPSADBW PHMINPOSUW PTEST PCMPEQQ PACKUSDW PCMP(E|I)STR(I|M) PCMPGTQ CRC32 POPCNT

64-BIT MODE: CDQE CMPSQ CMPXCHG16B LODSQ MOVSQ MOVZX STOSQ SWAPGS SYSCALL SYSRET

VIRTUAL MACHINE: VMPTRLD VMPTRST VMCLEAR VMREAD VMWRITE VMCALL VMLAUNCH VMRESUME VMXOFF VMXON INVEPT INVVPID

SSSE3: PHADD(W|SW|D) PHSUB(W|SW|D) PABS(B|W|D) PMADDUBSW PMULHRSW PSHUFB PSIGN(B|W|D) PALIGNR

AESNI: AESDEC[LAST] AESENC[LAST] AESIMC AESKEYGENASSIST PCLMULQDQ
In protected mode, applications can choose a flat or segmented memory model (see the SDM Volume 1, Chapter 3 for details); in real mode only a 16-bit segmented model is available. Most programmers will only use protected mode and a flat-memory model, so that’s all we’ll discuss here.
A memory reference has four parts and is often written as
[SELECTOR : BASE + INDEX * SCALE + OFFSET]
The selector is one of the six segment registers; the base is one of the general-purpose registers (eight in 32-bit code, sixteen in 64-bit code); the index is any of the general-purpose registers except the stack pointer (ESP or RSP); the scale is 1, 2, 4, or 8; and the offset is any 32-bit number. (Example: [fs:ecx+esi*8+93221].) The minimal reference consists of only a base register or only an offset; a scale can only appear if there is an index present.
Sometimes, as in AT&T-style assemblers, the memory reference is written like this:
selector offset(base,index,scale)
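As a rough illustration of the arithmetic behind a memory reference, the Python sketch below computes base + index * scale + offset and wraps the result to 64 bits. It assumes a flat memory model in which the segment contributes nothing, and the function name and register contents are invented for the example.

```python
MASK64 = (1 << 64) - 1

def effective_address(base=0, index=0, scale=1, offset=0):
    """Compute base + index * scale + offset, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + offset) & MASK64

# [ecx+esi*8+93221] with hypothetical register contents ecx = 0x1000, esi = 2
print(effective_address(base=0x1000, index=2, scale=8, offset=93221))  # 97333
```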
The data types are
Type name | Number of bits | Bit indices |
---|---|---|
Byte | 8 | 7..0 |
Word | 16 | 15..0 |
Doubleword | 32 | 31..0 |
Quadword | 64 | 63..0 |
Doublequadword | 128 | 127..0 |
Both IA-32 and x86-64 are little endian, meaning the least significant bytes come first in memory. For example:
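Address | Byte |
---|---|
0 | 12 |
1 | 31 |
2 | CB |
3 | 74 |
4 | 67 |
5 | 45 |
6 | 0B |
7 | 23 |
8 | A4 |
9 | 1F |
A | 36 |
B | 06 |
C | FE |
D | 7A |
E | 12 |

Access | Value |
---|---|
byte @ 9 | 1F |
word @ B | FE06 |
word @ 6 | 230B |
word @ 1 | CB31 |
dword @ A | 7AFE0636 |
qword @ 6 | 7AFE06361FA4230B |
word @ 2 | 74CB |
qword @ 3 | 361FA4230B456774 |
dword @ 9 | FE06361F |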
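The values in this example can be checked with a few lines of Python. This is only a sketch: `memory` holds the fifteen bytes from the example in address order, and the helper `read` (a name chosen here) assembles a little-endian value of the requested size.

```python
# The fifteen bytes at addresses 0 through E, in address order.
memory = bytes.fromhex("12 31 CB 74 67 45 0B 23 A4 1F 36 06 FE 7A 12")

def read(addr, size):
    """Read `size` bytes starting at `addr` as a little-endian value, shown in hex."""
    value = int.from_bytes(memory[addr:addr + size], "little")
    return f"{value:0{size * 2}X}"

print(read(0x9, 1))  # 1F
print(read(0xB, 2))  # FE06
print(read(0xA, 4))  # 7AFE0636
print(read(0x6, 8))  # 7AFE06361FA4230B
```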
Note that if you draw memory with the lowest bytes at the bottom, then it is easier to read these values!
Many instructions cause the flags register to be updated. For example, if you execute an add instruction and the signed sum is too big to fit into the destination register, the Overflow flag is set; if the unsigned sum is too big, the Carry flag is set.
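To see how an addition drives the flags, here is a small Python sketch that models a 64-bit add and derives four of the status flags from the result. The function `add64` is a made-up helper for illustration, and it covers only CF, OF, ZF, and SF.

```python
def add64(a, b):
    """Add two 64-bit values and return (result, flags) for CF, OF, ZF, SF."""
    mask = (1 << 64) - 1
    sign = 1 << 63
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    same_sign_in = (a & sign) == (b & sign)
    flags = {
        "CF": total > mask,                                     # unsigned carry out of bit 63
        "OF": same_sign_in and (result & sign) != (a & sign),   # signed overflow
        "ZF": result == 0,                                      # result is zero
        "SF": bool(result & sign),                              # top bit of the result
    }
    return result, flags

# Largest positive signed value plus one: signed overflow, no carry.
print(add64(2**63 - 1, 1)[1])  # {'CF': False, 'OF': True, 'ZF': False, 'SF': True}
# Largest unsigned value plus one: carry out, result wraps to zero.
print(add64(2**64 - 1, 1)[1])  # {'CF': True, 'OF': False, 'ZF': True, 'SF': False}
```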
Bits | Flag |
---|---|
31..22 | reserved |
21 | ID |
20 | VIP |
19 | VIF |
18 | AC |
17 | VM |
16 | RF |
15 | reserved |
14 | NT |
13..12 | IOPL |
11 | OF |
10 | DF |
9 | IF |
8 | TF |
7 | SF |
6 | ZF |
5 | reserved |
4 | AF |
3 | reserved |
2 | PF |
1 | reserved |
0 | CF |
The flags are described in Section 3.4.3 of Volume 1 of the SDM. To determine how each instruction affects the flags, see Appendix A of Volume 1 of the SDM.
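Given a raw flags value, such as one printed by a debugger, the bit positions in the layout can be used to list which flags are set. The sketch below is one way to do that in Python; the names `FLAG_BITS` and `decode_flags` and the sample value 0x246 are chosen for illustration only.

```python
# Single-bit flags keyed by bit position; IOPL is a two-bit field handled separately.
FLAG_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
    19: "VIF", 20: "VIP", 21: "ID",
}

def decode_flags(value):
    """Return (names of set single-bit flags, IOPL field) for a flags value."""
    names = [name for bit, name in FLAG_BITS.items() if (value >> bit) & 1]
    iopl = (value >> 12) & 3
    return names, iopl

print(decode_flags(0x246))  # (['PF', 'ZF', 'IF'], 0)
```

Bit 1 is also set in 0x246, but it is a reserved bit and so does not appear in the output.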
The Software Developer’s Manual contains vast amounts of important information and is required reading for all assembly language programmers and backend compiler writers. The manual is split into several volumes; links to all volumes are here. Highlights from Volumes 1 and 2: