IA-32

IA-32, sometimes generically called x86-32, is the instruction set architecture of Intel's most successful microprocessors. In various programming language directives it is also referred to as "i386". The term may refer either to the 32-bit extensions to the original x86 architecture or to the architecture as a whole.

This architecture defines the instruction set for the family of microprocessors installed in the vast majority of personal computers in the world.

The term stands for Intel Architecture, 32-bit, which distinguishes it from the 16-bit versions of the architecture that preceded it and from the 64-bit architecture IA-64 (also known as the Itanium architecture), which is very different, although it has an IA-32 compatibility mode. The more generic name for this architecture is x86.

Intel refers to the 64-bit mode of its newer 64-bit processors, such as the later Pentium 4 and Pentium D, the Core 2 Duo, and their Xeon derivatives, as "IA-32e". The instruction set of those processors is called EM64T; it is based on the AMD64 instruction set and also includes a "compatibility mode" for 32-bit use. The developer's manuals that Intel publishes for the IA-32 architecture cover IA-32 and IA-32e together.

Intel invented this instruction set and is the biggest supplier of processors compatible with it, but it is not the only one. The second biggest supplier is AMD, and there are numerous smaller, more specialized suppliers.

This instruction set was introduced with the Intel 80386 microprocessor in 1985. Twenty years later, in 2005, it is still the basis of most PC microprocessors. Although the base instruction set has remained intact, successive generations of microprocessors have become much faster at running it.

The IA-32 instruction set is usually described as a CISC (Complex Instruction Set Computer) architecture, though such classifications have become less meaningful with advances in microprocessor design.

Two memory management models

IA-32 supports two memory access models, Real mode and Protected mode. In Real mode, the processor is limited to accessing a total of just over 1 MB of memory, while in Protected mode it can access all of its memory (up to 4 GB in one address space).
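The "just over 1 MB" figure can be checked with a little arithmetic. The sketch below assumes the usual Real mode rule that a physical address is a 16-bit segment shifted left by four bits plus a 16-bit offset, and that addresses above 1 MB are not wrapped around (which on real hardware depends on the A20 address line being enabled). The function name is illustrative only.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address formed from a 16-bit segment and a 16-bit offset.

    Assumes the Real mode rule: address = segment * 16 + offset,
    with no wraparound at the 1 MB boundary.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


ONE_MB = 1 << 20

highest = real_mode_address(0xFFFF, 0xFFFF)
print(hex(highest))           # 0x10ffef
print(highest - ONE_MB + 1)   # 65520 bytes reachable above the 1 MB mark
```

Under these assumptions the highest reachable address is 0x10FFEF, which is 65,520 bytes beyond the 1 MB boundary.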

Real mode

The old DOS operating system required Real mode to work, while newer operating systems such as OS/2, Windows and Linux usually require Protected mode. Upon power-on (booting), the processor initializes itself in Real mode and begins executing startup code from ROM, which in turn loads programs from disk into RAM. A program inserted somewhere along the boot sequence may then put the processor into Protected mode.

Protected mode

Protected mode activates a number of advantages beyond the ability to address memory above the DOS 1 MB limit. One is protected memory, which prevents programs from corrupting one another. Another is virtual memory, which uses hard disk space so that programs can use more memory than is physically installed in the machine. The third is task switching (multitasking), which lets a computer juggle multiple programs so that they all appear to run at the same time.

Memory in Protected mode is usually limited to 4 GB. However, this is not the ultimate limit for IA-32 processors. Through extensions to the processor's paging and segmentation mechanisms, IA-32 operating systems may be able to access more than 32 bits of physical address space without switching to a 64-bit architecture. One such extension is known as Physical Address Extension.
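The effect of widening the physical address can be illustrated with simple arithmetic. The sketch below assumes the 36-bit physical address width of the original Physical Address Extension; that width is not stated in the text above and is used here only as an example figure.

```python
GB = 1 << 30  # bytes in one gigabyte (binary)


def addressable_gb(address_bits: int) -> int:
    """Number of gigabytes reachable with the given physical address width."""
    return (1 << address_bits) // GB


print(addressable_gb(32))  # 4
print(addressable_gb(36))  # 64
```

Each program still works within a 32-bit address space; the wider address only increases how much physical memory the operating system can map into those address spaces.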

Virtual 8086 mode

Protected mode also has a sub-mode of operation called virtual 8086 mode. This is a special hybrid operating mode that allows old DOS programs and operating systems to run under the control of a Protected mode supervisor operating system, which gives a great deal of flexibility in running Protected mode programs and DOS programs simultaneously. This mode was added with the IA-32 version of Protected mode; it did not exist in the 16-bit Protected mode of the 80286.

Registers

The 386 has eight 32-bit general purpose registers for application use. There are also eight floating point stack registers. Later processors added new registers along with their various SIMD instruction sets, such as MMX, 3DNow!, SSE, SSE2 and SSE3.

There are also system registers, which are used mostly by operating systems and usually not by applications. They are known as segment, control, debug, and test registers. There are six segment registers, used mainly for memory management. The number of control, debug and test registers varies from model to model.

General Purpose registers

The x86 general purpose registers are not as general purpose as their name implies, because some highly specialized tasks can only be done with one or two specific registers. In many other architectures, any general purpose register can be used for any purpose. The x86 general purpose registers further subdivide into registers specializing in data and others specializing in addressing.

In addition, many operations can be performed either on a register or directly on a memory operand, without the data having to be loaded into a register first. The 1970s heritage of the architecture shows through in this behaviour.

Note: with the advent of the 64-bit extensions to x86 in AMD64, much of this odd behaviour has been cleaned up (at least in 64-bit mode). General purpose registers there are more uniform and can be used more interchangeably. This does not affect the 32-bit architecture, however.

8-bit and 16-bit register subsets

8-bit and 16-bit subsets of these registers are also accessible. For example, the lower 16 bits of the 32-bit EAX register can be accessed under the name AX. Some of the 16-bit registers can be further subdivided into 8-bit subsets; for example, the upper 8-bit half of AX is called AH, and the lower half is called AL. Similarly, EBX is subdivided into BX (16-bit), which in turn is divided into BH and BL (8-bit).
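The way these names overlap can be modelled in a few lines. The following is a toy sketch, not a description of any real API: the class `Reg32` and its attribute names are invented for illustration, with `value32`, `low16`, `high8` and `low8` standing in for EAX, AX, AH and AL respectively.

```python
class Reg32:
    """Toy model of one 32-bit register with its 16-bit and 8-bit views."""

    def __init__(self, value: int = 0) -> None:
        self.value32 = value & 0xFFFFFFFF          # e.g. EAX

    @property
    def low16(self) -> int:                        # e.g. AX
        return self.value32 & 0xFFFF

    @low16.setter
    def low16(self, v: int) -> None:
        # Writing the 16-bit view leaves the upper 16 bits untouched.
        self.value32 = (self.value32 & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def high8(self) -> int:                        # e.g. AH
        return (self.value32 >> 8) & 0xFF

    @high8.setter
    def high8(self, v: int) -> None:
        self.value32 = (self.value32 & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def low8(self) -> int:                         # e.g. AL
        return self.value32 & 0xFF

    @low8.setter
    def low8(self, v: int) -> None:
        self.value32 = (self.value32 & 0xFFFFFF00) | (v & 0xFF)


eax = Reg32(0x12345678)
print(hex(eax.low16), hex(eax.high8), hex(eax.low8))  # 0x5678 0x56 0x78

eax.high8 = 0xAB          # like writing AH
print(hex(eax.value32))   # 0x1234ab78

eax.low16 = 0xBEEF        # like writing AX
print(hex(eax.value32))   # 0x1234beef
```

The point of the model is that the smaller names are views onto the same storage, so a write through one view is visible through the others.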

General data registers

All four of the following registers may be used as general purpose registers, but each also has a specialized purpose. Each has 16-bit and 8-bit subset names.

EAX (encoding 000) The accumulator, which is the implicit operand of many arithmetic instructions.
ECX (encoding 001) The loop counter, which loop and repeat instructions interpret as an iteration count.
EDX (encoding 010) The data register, an extension to the accumulator that holds data relevant to the operation applied to the accumulator.
EBX (encoding 011) Now used for free storage, but originally used as a pointer in 16-bit mode.

General address registers

Used mainly for addressing. They have 16-bit subset names, but no 8-bit subsets.

ESP (encoding 100) Stack pointer. Holds the address of the top of the stack.
EBP (encoding 101) Base pointer. Holds the address of the current stack frame. It is also sometimes used as free storage.
ESI (encoding 110) Source index. Commonly used for string operations. The one-byte LODS instruction loads data from the memory it points to into the accumulator.
EDI (encoding 111) Destination index. Commonly used for string operations. The one-byte STOS instruction writes data from the accumulator to the memory it points to.

EIP Instruction pointer. Holds the address of the next instruction to be executed.
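The three-bit codes listed beside each general purpose register above are the numbers the processor uses to identify a register inside an instruction. As a hedged illustration, the sketch below encodes two common 32-bit instructions whose opcode byte is a base value plus the register code: PUSH r32 (base 0x50) and MOV r32, imm32 (base 0xB8, followed by a little-endian 32-bit immediate). The helper names are invented for this example, and only these two instruction forms are covered.

```python
# Three-bit register codes, as listed in the register tables.
REG32 = {
    "eax": 0b000, "ecx": 0b001, "edx": 0b010, "ebx": 0b011,
    "esp": 0b100, "ebp": 0b101, "esi": 0b110, "edi": 0b111,
}


def encode_push(reg: str) -> bytes:
    """Machine code for PUSH r32: a single byte, 0x50 plus the register code."""
    return bytes([0x50 + REG32[reg]])


def encode_mov_imm32(reg: str, imm: int) -> bytes:
    """Machine code for MOV r32, imm32: 0xB8 plus the register code,
    followed by the immediate in little-endian byte order."""
    return bytes([0xB8 + REG32[reg]]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


print(encode_push("ebx").hex())          # 53
print(encode_mov_imm32("ecx", 1).hex())  # b901000000
```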

Floating point stack registers

Initially, x86 processors offered floating-point capabilities only through add-on coprocessors (the 8087, 80287 and 80387). With the introduction of the 80486, the eight x87 floating point registers, known as ST(0) through ST(7), were built into the CPU. Each register is 80 bits wide and stores numbers in the extended precision format of the IEEE floating-point standard.

These registers are not addressed by fixed numbers but are organized as a LIFO stack. Register names are relative to the top of the stack: ST(0) is the top of the stack, ST(1) is the next register below it, ST(2) is two below the top, and so on. Data is always pushed onto the top of the stack, and operations always involve the top of the stack, so the registers cannot be used as freely as a flat register file; access follows the stack order.
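The relative numbering can be made concrete with a small model. This is a simplified sketch: the class `X87Stack` is invented for illustration, it stores Python floats rather than 80-bit values, and it omits the tag bits and the overflow and underflow checks of a real FPU. It assumes the usual arrangement in which a top-of-stack index is decremented on a push and ST(i) names the register i places past that index, wrapping around modulo 8.

```python
class X87Stack:
    """Toy model of the eight-entry floating point register stack."""

    def __init__(self) -> None:
        self.regs = [0.0] * 8   # the eight physical registers
        self.top = 0            # index of the current top of stack

    def push(self, value: float) -> None:
        """Load a value onto the stack (in the spirit of FLD)."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def pop(self) -> float:
        """Remove and return the top of the stack (in the spirit of FSTP)."""
        value = self.regs[self.top]
        self.top = (self.top + 1) % 8
        return value

    def st(self, i: int) -> float:
        """Read ST(i), counted relative to the current top of stack."""
        return self.regs[(self.top + i) % 8]


fpu = X87Stack()
fpu.push(1.0)
fpu.push(2.0)
print(fpu.st(0), fpu.st(1))   # 2.0 1.0

fpu.push(3.0)                 # every earlier value moves one name down
print(fpu.st(0), fpu.st(1), fpu.st(2))   # 3.0 2.0 1.0

print(fpu.pop())              # 3.0
print(fpu.st(0))              # 2.0
```

Note that the value 1.0 never moves in storage; only the name used to reach it changes as the top of the stack moves.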

MMX registers

MMX added eight new "registers" to the architecture, known as MM0 through MM7 (henceforth MMn). In reality, these were aliases for the existing x87 FPU stack registers, so anything done to the floating point stack also affects the MMX registers. Unlike the FP stack, the MMn registers are fixed rather than relative, and are therefore randomly accessible.

Each MMn register holds 64 bits of integer data. One of the main concepts of the MMX instruction set is the packed data type: instead of using the whole register for a single 64-bit integer (quadword), it may hold two 32-bit integers (doublewords), four 16-bit integers (words) or eight 8-bit integers (bytes).
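Packed data can be sketched with ordinary integer arithmetic. The example below treats a Python integer as a 64-bit register holding four 16-bit words and adds two such registers lane by lane, with each lane wrapping around independently, in the spirit of the MMX packed word add. The function names are invented for illustration, and the choice of placing the first list element in the lowest 16 bits is an assumption of this sketch.

```python
def pack_words(words: list[int]) -> int:
    """Pack four 16-bit values into one 64-bit integer, words[0] lowest."""
    if len(words) != 4:
        raise ValueError("expected exactly four words")
    value = 0
    for i, w in enumerate(words):
        value |= (w & 0xFFFF) << (16 * i)
    return value


def unpack_words(value: int) -> list[int]:
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(4)]


def packed_add_words(a: int, b: int) -> int:
    """Add lane by lane; a carry out of one lane never reaches the next."""
    sums = [(x + y) & 0xFFFF for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)


a = pack_words([1, 2, 3, 0xFFFF])
b = pack_words([10, 20, 30, 1])
print(unpack_words(packed_add_words(a, b)))   # [11, 22, 33, 0]
```

The last lane shows the difference from an ordinary 64-bit add: 0xFFFF + 1 wraps to 0 within its own lane instead of carrying into a neighbour.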

Because the 64-bit MMn registers are aliased onto the FPU stack, and each stack register is 80 bits wide, the upper 16 bits of the stack registers go unused in MMX. These bits are set to all ones, which makes the contents look like NaNs or infinities in the floating point view, and makes it easier to tell whether a register holds floating point data or MMX data.
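The NaN-or-infinity appearance follows from the layout of the 80-bit extended precision format, in which the top bit is the sign, the next 15 bits are the exponent, and the low 64 bits are the significand. An exponent field of all ones is reserved for infinities and NaNs. The sketch below builds the 80-bit image that results when the upper 16 bits are all ones; the function names are invented for illustration.

```python
MASK64 = (1 << 64) - 1


def fpu_image_of_mmx(mm_value: int) -> int:
    """80-bit register image: 64 bits of MMX data, upper 16 bits all ones."""
    return (0xFFFF << 64) | (mm_value & MASK64)


def sign_bit(reg80: int) -> int:
    """Bit 79 of the 80-bit image."""
    return (reg80 >> 79) & 1


def exponent_field(reg80: int) -> int:
    """Bits 64 to 78 of the 80-bit image (15 bits)."""
    return (reg80 >> 64) & 0x7FFF


image = fpu_image_of_mmx(0x0001000200030004)
print(hex(image))                        # 0xffff0001000200030004
print(exponent_field(image) == 0x7FFF)   # True: reads as NaN or infinity
```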

3DNow! registers

3DNow! was designed to be the natural evolution of MMX from integers to floating point. As such, it uses exactly the same register naming convention as MMX, that is, MM0 through MM7. The only difference is that instead of packing integers ranging from bytes to quadwords into these registers, one packs single precision floating point values into them.
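Since a single precision value occupies 32 bits, two of them fit in one 64-bit MMn register. The sketch below uses Python's standard `struct` module to show such a packing; the helper names and the choice of putting the first value in the low half are assumptions made for illustration.

```python
import struct


def pack_two_floats(low: float, high: float) -> int:
    """Pack two single precision floats into one 64-bit integer."""
    return int.from_bytes(struct.pack("<2f", low, high), "little")


def unpack_two_floats(value: int) -> tuple[float, float]:
    """Recover the two single precision floats from a 64-bit integer."""
    return struct.unpack("<2f", value.to_bytes(8, "little"))


mm0 = pack_two_floats(1.0, 2.0)
print(hex(mm0))                 # 0x400000003f800000
print(unpack_two_floats(mm0))   # (1.0, 2.0)
```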

The advantage of aliasing the FPU registers is that the same instructions and data structures used to save the state of the FPU registers can also save the 3DNow! register state. Thus no special modifications are required to operating systems that would otherwise not know about the new registers.

SSE registers

SSE discarded all legacy connections to the FPU stack. This also meant that this instruction set discarded all legacy connections to previous generations of SIMD instruction sets like MMX. But it freed the designers up, allowing them to use larger registers, not limited by the size of the FPU registers. The designers created eight 128-bit registers, named XMM0 through XMM7. (Note: in AMD64, the number of SSE XMM registers has been increased from 8 to 16.)

The downside is that operating systems had to be aware of the new instruction set in order to save its register state. Intel therefore made SSE depend on a control flag (the OSFXSR bit in control register CR4) that the operating system must set before SSE instructions can be used; otherwise they stay disabled. An OS that is aware of SSE sets this flag and saves the XMM registers using the FXSAVE and FXRSTOR instructions, whereas an unaware OS leaves the flag clear.

Like 3DNow!, SSE is a SIMD instruction set that works on floating point values, but unlike 3DNow! it severs all legacy connection to the FPU stack. Because its registers are larger, SSE can pack twice as many single precision floats into each register. The original SSE was limited to single precision numbers, like 3DNow!. SSE2 introduced the ability to pack double precision numbers too, which 3DNow! could not usefully do, since a 64-bit double precision number would fill an entire 3DNow! MMn register. At 128 bits, an SSE XMMn register can hold two double precision floats. SSE2 is therefore much more suitable for scientific calculations than either the original SSE or 3DNow!, which were limited to single precision.
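The capacity comparison comes down to dividing the register width by the element size. A minimal sketch, using the standard `struct` module to obtain the sizes of single and double precision values (the constant and function names are illustrative):

```python
import struct

MM_BYTES = 8     # 64-bit MMn register, as used by 3DNow!
XMM_BYTES = 16   # 128-bit XMMn register, as used by SSE and SSE2


def lanes(register_bytes: int, fmt: str) -> int:
    """How many values of the given struct format fit in the register."""
    return register_bytes // struct.calcsize(fmt)


print(lanes(MM_BYTES, "<f"))    # 2 single precision floats per MMn register
print(lanes(XMM_BYTES, "<f"))   # 4 single precision floats per XMMn register
print(lanes(MM_BYTES, "<d"))    # 1 double fills an entire MMn register
print(lanes(XMM_BYTES, "<d"))   # 2 doubles per XMMn register
```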

Instructions

The full listing of x86 machine language mnemonics, including integer, floating point, and SIMD instructions, can be found in the x86 instruction listings. They are arranged chronologically and hierarchically, showing when each instruction first became available and which category it belongs to.

The original IA-32 instruction set has evolved over time with the addition of the multimedia instruction updates. The ultimate evolution of IA-32 was its extension to 64 bits, at which point it can no longer be called IA-32; the 64-bit extension is called x86-64. It could not be called IA-64, as Intel had already used this label for the Itanium design (which is not really an evolution of the IA-32 architecture). AMD's AMD64 was the first x86-64 instruction set designed. Intel later followed AMD's design with what it calls EM64T.

Next-generation 64-bit Instruction Sets

Two new instruction sets can claim to be the 64-bit successor to IA-32. One of them builds on top of IA-32 but has a different name, while the other one discards IA-32 completely but has a similar name.

IA-64

Intel's IA-64 architecture is not directly compatible with the IA-32 instruction set. It discards all IA-32 instructions and starts from scratch with a completely different instruction set, using a VLIW-style design instead of out-of-order execution. IA-64 is the architecture used by Intel's Itanium line of processors. The Itanium has hardware support for IA-32, though it is very slow because of the different approach. IA-32 execution mode is set by the EFI program loaded at boot-up. The name "IA-64" means "Intel Architecture, 64-bit", but the connection with IA-32 is only in the name.

AMD64

AMD's AMD64 instruction set, also known as x86-64, is largely built on top of IA-32, and thus maintains the x86 family heritage. While extending the instruction set, AMD took the opportunity to clean up, for 64-bit mode, some of the odd behaviour that the instruction set has had since its earliest 16-bit days.

Further improvements are:

Twice as many general purpose registers (now 16)
Twice as many SSE registers (now 16)
The general purpose registers are more truly general purpose, with fewer restrictions on their use.
Most of the functionality of the segment registers has been deprecated, since their usage had steadily declined even during the IA-32 days.

EM64T

In February 2004, Intel announced the EM64T instruction set, formerly code-named Yamhill. It was derived from AMD's AMD64. EM64T is generally compatible with code written for AMD64, though it lacks some AMD64 features; for more details, see the EM64T article. Intel first used the set in the Xeon Nocona core in late 2004, and introduced it to the desktop market with the Pentium 4 E0 revision in early 2005.

External links

Free IA-32 documentation, provided by Intel