Multi Media Extension
The Multi Media Extension (MMX for short) is a SIMD extension of the IA-32 instruction set that Intel brought to market at the beginning of 1997. Its instructions are always applied to several data items at the same time.
The abbreviation MMX originally stood for Matrix Math Extensions, but Intel renamed it Multi Media Extension for marketing reasons.
MMX does not use any new physical processor registers, but instead reuses the registers of the math coprocessor (FPU). Intel designed 57 new integer instructions for MMX and introduced four new vector data formats. MMX instructions support saturation arithmetic.
After its introduction, MMX was adopted only hesitantly by the software industry, and after just three years it had already been made obsolete by Intel's own successor SSE and by AMD's 3DNow!. Benchmarks of its performance showed a wide spread.
Multimedia application requirements
The multimedia and communications fields place new and different demands on a computer system and thus on its processor. The data can usually be processed with a high degree of parallelism. In video editing, for example, the operations on the many individual pixels are identical. In theory, the optimum would be a single instruction that is applied to all pixels at once. The operations required are often not simple, individual instructions but extensive instruction sequences. Fading in an image in front of a background, for example, is a complex process consisting of mask formation using XOR, preparation of the background using AND and NOT, and superimposition of the partial images using OR. These requirements are met by providing new, more complex instructions. The MMX instruction PANDN, for example, combines an inversion and an AND operation of the form
x = y AND (NOT x).
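The mask-based superimposition described above can be sketched in a few lines. This is only an illustrative model in Python, not MMX code: the function names pandn and overlay and the sample pixel values are assumptions made for this example, and a 64-bit integer stands in for one MMX register.

```python
MASK64 = (1 << 64) - 1  # an MMX register holds 64 bits


def pandn(x: int, y: int) -> int:
    """Model of PANDN x, y: returns y AND (NOT x), limited to 64 bits."""
    return y & (~x & MASK64)


def overlay(foreground: int, background: int, mask: int) -> int:
    """Keep the foreground where mask bits are 1 and the background elsewhere."""
    kept_foreground = foreground & mask        # corresponds to an AND step
    kept_background = pandn(mask, background)  # AND with the inverted mask
    return kept_foreground | kept_background   # superimpose with OR


# Hypothetical data: two 32-bit pixels per 64-bit value.
fg = 0x1111111111111111
bg = 0xAAAAAAAAAAAAAAAA
mask = 0xFFFFFFFF00000000  # upper pixel from fg, lower pixel from bg
print(hex(overlay(fg, bg, mask)))
```

The point of the combined instruction is that the inversion of the mask needs no separate step and no temporary register.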
With MMX, Intel created a new concept for using existing registers, new data formats, an extended instruction set and a choice between two arithmetic modes (saturation mode and wrap-around mode). Minor internal differences that do not affect the instruction set exist between the (not officially named) versions MMX 1.0 and 2.0 in the various Pentium processors. The MMX approach can be found in further developed form in ASICs (where it originally comes from), in the AltiVec units of modern PowerPC CPUs and on graphics cards.
New data formats
Four new data formats were established for MMX: PackedByte, PackedWord, PackedDoubleWord and QuadWord. They make it possible to process 64-bit packets of integer data in one step. In principle, these formats are just other names for existing formats. The new nomenclature indicates that MMX does not process individual data items or numbers, but data fields. In principle, a QuadWord is just a 64-bit field, which could also have been called DoubleLongInt; a ShortPackedWord is actually a ShortPackedInteger.
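That the packed formats are only different views of the same 64 bits can be illustrated with a short Python sketch. The function name views and the sample value are assumptions for this example; little-endian lane order is used, as on IA-32.

```python
import struct


def views(quadword: int) -> dict:
    """Split one 64-bit value into the packed formats (lowest lane first)."""
    raw = struct.pack("<Q", quadword)  # the 8 bytes as they lie in memory
    return {
        "PackedByte": list(struct.unpack("<8B", raw)),        # 8 x 8 bits
        "PackedWord": list(struct.unpack("<4H", raw)),        # 4 x 16 bits
        "PackedDoubleWord": list(struct.unpack("<2I", raw)),  # 2 x 32 bits
        "QuadWord": quadword,                                 # 1 x 64 bits
    }


sample = views(0x0807060504030201)
for name, lanes in sample.items():
    print(name, lanes)
```

Which view applies is decided by the instruction that is executed, not by the register contents.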
Additional 64-bit registers MM0 to MM7 were created for data manipulation, but these are physically identical to the 80-bit registers R0 to R7 of the FPU. MMX uses only eight of the ten bytes of each FPU register (i.e. only the mantissa part of an FPU value). Under MMX, the two remaining bytes are set to the hexadecimal value FFFF. The other FPU registers, such as the 16-bit control, status and tag registers, the 11-bit opcode register and the two 48-bit pointer registers, have no meaning in MMX applications or, in rare cases, a limited or differently interpreted one.
Switching between FPU and MMX
Before switching to an MMX application, a program should first check whether the system supports SIMD extensions and MMX in particular. This can be done with the CPUID instruction, which has existed since the Pentium, by passing the value 1 in the EAX register.
    MOV  eax, 1            ; request the feature flags
    CPUID                  ; execute the CPUID instruction
    TEST edx, 00800000h    ; is bit 23 in register EDX set?
    JNZ  MMX_kompatibel    ; if so, the processor is MMX-compatible
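The constant 00800000h in the test is simply bit 23 of EDX. The following Python sketch shows the same check on an already obtained EDX value; it does not execute CPUID itself, and the function name has_mmx and both sample values are made up for illustration.

```python
MMX_FEATURE_BIT = 23  # position of the MMX flag in EDX after CPUID with EAX = 1


def has_mmx(edx: int) -> bool:
    """Return True if the MMX feature bit is set in the given EDX value."""
    return bool(edx & (1 << MMX_FEATURE_BIT))


# Hypothetical EDX values, not read from a real processor.
print(hex(1 << MMX_FEATURE_BIT))  # the mask used by the TEST instruction
print(has_mmx(0x0183FBFF))        # bit 23 set
print(has_mmx(0x000001BF))        # bit 23 clear
```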
If MMX is to be used after a positive test for MMX capability, the next step is to save the FPU data in a 512-byte memory area using the FXSAVE instruction. The two bytes of each register that MMX does not use mark the register contents as belonging to an MMX application rather than an FPU application. There is, however, no explicit instruction for switching to an MMX application. Any FPU instructions that occur during an MMX application see these register contents as NaN (Not a Number). This means that stray FPU instructions usually remain largely without consequences.
After the application has ended, the FPU data that was previously backed up with FXSAVE should be restored with FXRSTOR. To signal to pending FPU applications that MMX has been released, there is also the MMX instruction EMMS, which is not mandatory and not always necessary. It may also be needed within an MMX application, for example when the MMX application calls an API that in turn uses FPU instructions.
Use in operating systems
In multitasking operating systems, all register contents have to be saved in a special memory area on a context switch. Since a change to this memory area would have had to be supported by every operating system, a "trick" was used that allows MMX to be used without operating system support: the MMX registers were mapped onto the eight floating-point registers of the FPU. This means that the actual FPU registers are no longer available as soon as a program uses MMX. Newer instruction set extensions such as SSE use completely separate registers and therefore absolutely require operating system support. In newer processors, the overlap of the floating-point registers and the MMX registers can also be switched off.
Saturation mode and wrap-around mode
The MMX instruction set contains instructions that use saturation mode and instructions that work in wrap-around mode. For example, the MMX instruction PADDB adds packed bytes in wrap-around mode, while PADDSB does the same in saturation mode.
The saturation mode means that a number does not overflow when it exceeds its largest or smallest value, but rather assumes this largest or smallest possible value.
An application example: in a fade-out effect for images, two pixels with 32-bit color depth could always be darkened by a certain amount at the same time. With saturation there is no need to check whether the pixels are already black. This and the parallel processing of several values can significantly increase the speed of the calculations.
In wrap-around mode, the carry is discarded in the event of an overflow or underflow. Starting from the maximum value of a byte (decimal 255), the addition 255 + 2 gives the result 1. Expressed in binary, 11111111 + 00000010 = (1)00000001; the most significant bit (here in brackets) is not taken into account, which leads to the result 00000001 (i.e. decimal 1).
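The difference between the two modes can be made concrete with a small Python model. It is a sketch of the arithmetic only: each list stands for eight packed bytes, the function names follow the MMX mnemonics PADDB, PADDUSB, PSUBUSB and PADDSB, and the pixel values are invented for the example.

```python
def paddb(a, b):
    """Wrap-around addition of packed bytes: the carry out of bit 7 is dropped."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def paddusb(a, b):
    """Unsigned saturating addition: results are clamped to 255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]


def psubusb(a, b):
    """Unsigned saturating subtraction: results are clamped to 0."""
    return [max(x - y, 0) for x, y in zip(a, b)]


def paddsb(a, b):
    """Signed saturating addition; lanes are given as integers in -128..127."""
    return [max(-128, min(127, x + y)) for x, y in zip(a, b)]


print(paddb([255] * 8, [2] * 8))    # wraps around to 1 in every lane
print(paddusb([255] * 8, [2] * 8))  # stays at 255 in every lane

# Fade-out: darken eight colour components by 16 without testing for black.
pixels = [10, 200, 0, 255, 3, 128, 64, 5]
print(psubusb(pixels, [16] * 8))
```

In the fade-out case, components that are already dark simply stay at 0, so no per-pixel comparison or branch is needed.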
Specification of the operands
An essential difference between FPU and MMX applications is the form in which the instructions receive their operands. Many FPU instructions have no explicit operands. They obtain their operands via a stack pointer (top of stack) held in bits 11 to 13 of the status register. MMX instructions, on the other hand, work like CPU instructions with operands that are explicitly specified after the instruction.
An MMX instruction can have zero, one or two source and destination operands. These can be MMX registers (MMX), general-purpose registers (Reg), memory locations (Mem) or constants (Const) of different sizes (8, 16, 32 or 64 bits). Which operands are permitted differs from instruction to instruction and is noted in reference works. A specification such as
    Instruction Mem32, MMX
    Instruction MMX, Reg32
    Instruction Reg32, MMX
would say, for example, that the operation (Instruction) is possible from an MMX register to a 32-bit memory location, from a 32-bit general-purpose register to an MMX register, and vice versa.
Most MMX commands are processed in just one processor cycle. The multiplication instructions take three cycles until the result is available; however, a new multiplication can be pushed into the pipeline after each cycle (Pentium MMX to Pentium III).
A total of 24 new commands can handle the various data types, resulting in the 57 commands specified by Intel. Many of these 24 commands differ only in the fact that the signs and the type of overflow are taken into account differently, so that in principle only 15 basic operations remain.
Since MMX works with packed data, most instructions start with a P, which distinguishes them from the FPU instructions beginning with F. In addition to the leading P, MMX mnemonics consist of a CPU-like instruction word (such as ADD or CMP), optionally S for signed or US for unsigned saturation mode, and one of the letters B, W, D or Q for the data format. The mnemonic PADDSW, for example, says: P for packed, ADD for addition, S for signed saturation mode, applied to the data type Word. The MMX instruction set includes instructions for:
- Arithmetic manipulation of data
- Logical manipulation of data
- Data exchange
- Data comparison
- Data conversion
- MMX status
Details of the instruction set can be found in the Intel Architecture Software Developer's Manual, Volume 2 - Instruction Set (see the literature list).
For addition in wrap-around mode there are three instructions (PADDB, PADDW, PADDD) for the data types PackedByte, PackedWord and PackedDoubleWord. In saturation mode there are instructions for signed (PADDSB, PADDSW) and unsigned (PADDUSB, PADDUSW) addition of PackedBytes and PackedWords. A saturating instruction for the addition of DoubleWords is not available. In neither mode is an overflow or underflow of the value range indicated, e.g. by setting flags.
The commands for subtraction are designed in the same way as for addition.
When multiplying, the problem is that the results can exceed the 64-bit size of the registers: each product of two 16-bit words is 32 bits wide. This was solved by storing the high-order and low-order parts of the results in two different registers. PMULLW (Multiply Packed Word and Store Low) multiplies and keeps the low-order part, and PMULHW (Multiply Packed Word and Store High) multiplies and keeps the high-order part. PMADDWD multiplies four pairs of 16-bit words and adds up the results in pairs.
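How the two halves of a product fit together can be checked with a short Python model. It is a sketch under simplifying assumptions: the input lanes are plain signed integers in the range -32768 to 32767, the helper join_halves is invented for the example, and the model of PMADDWD ignores the single corner case in which all four factors of a pair are -32768.

```python
def pmullw(a, b):
    """Low 16 bits of each signed 16 x 16 bit product."""
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]


def pmulhw(a, b):
    """High 16 bits of each signed 16 x 16 bit product."""
    return [((x * y) >> 16) & 0xFFFF for x, y in zip(a, b)]


def pmaddwd(a, b):
    """Multiply four word pairs and add neighbouring products to two sums."""
    p = [x * y for x, y in zip(a, b)]
    return [p[0] + p[1], p[2] + p[3]]


def join_halves(high, low):
    """Rebuild the signed 32-bit product from its two 16-bit halves."""
    value = (high << 16) | low
    return value - (1 << 32) if value & 0x80000000 else value


a = [1000, -2, 300, 7]
b = [1000, 3, -300, 7]
low, high = pmullw(a, b), pmulhw(a, b)
print([join_halves(h, l) for h, l in zip(high, low)])  # the full products
print(pmaddwd(a, b))
```

PMADDWD is the typical building block for dot products, since it delivers two 32-bit partial sums from a single instruction.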
With the exception of the carry flag, which is not set under MMX, the shift instructions work in the same way as the shift instructions of the CPU, e.g. SHL, SHR and SAR. They are applicable only to Words, DoubleWords and QuadWords, not to Bytes. For the logical shift to the left, PSLLW, PSLLD and PSLLQ are used. For the reverse direction, PSRAW and PSRAD are available for arithmetic shifting and PSRLW, PSRLD and PSRLQ for logical shifting; only the logical shifts support QuadWords.
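The difference between logical and arithmetic shifting can be seen in a small Python model of the word variants PSLLW, PSRLW and PSRAW. It is an illustrative sketch: each list stands for packed 16-bit words given as unsigned values, and all lanes are shifted by the same count.

```python
def psllw(words, count):
    """Logical left shift of each 16-bit lane; bits shifted out are lost."""
    if count > 15:
        return [0] * len(words)
    return [(w << count) & 0xFFFF for w in words]


def psrlw(words, count):
    """Logical right shift: zeros are shifted in from the left."""
    if count > 15:
        return [0] * len(words)
    return [w >> count for w in words]


def psraw(words, count):
    """Arithmetic right shift: the sign bit is replicated from the left."""
    count = min(count, 15)  # larger counts fill the lane with the sign bit
    result = []
    for w in words:
        signed = w - 0x10000 if w & 0x8000 else w
        result.append((signed >> count) & 0xFFFF)
    return result


words = [0x8000, 0x0001, 0x00F0, 0x4000]
print([hex(w) for w in psllw(words, 1)])
print([hex(w) for w in psrlw(words, 4)])
print([hex(w) for w in psraw(words, 4)])
```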
The bit manipulation instructions PAND, POR and PXOR are identical to the CPU instructions AND, OR and XOR, except that 64 bits, i.e. one QuadWord, are processed at once. There is no MMX equivalent to the CPU instruction NOT. The only MMX instruction of this group without a counterpart in the CPU instruction set is PANDN, which negates the first operand and then ANDs it with the second operand in the following form:
x = y AND (NOT x)
Analogous to the CPU instruction MOV, there are two instructions for data exchange, MOVD and MOVQ, for DoubleWords and QuadWords. Due to the computer architecture (the different sizes of the 64-bit MMX registers, the 32-bit general-purpose registers and the 32-bit address bus), both instructions are subject to certain restrictions with regard to the permissible operands.
MOVD cannot be used to exchange data between two MMX registers, because MMX registers only hold 64-bit data. It therefore only allows exchange between an MMX register and 32-bit general-purpose registers or memory locations, in both directions. The possible forms are:
    MOVD MMX, Mem32
    MOVD Mem32, MMX
    MOVD MMX, Reg32
    MOVD Reg32, MMX
Only the lower bits 0 to 31 of the MMX register are affected. This means that only these bits are used when moving data out of an MMX register. When data is moved into an MMX register, the higher-order part (bits 32 to 63) is cleared, i.e. set to zero.
MOVQ allows bidirectional data exchange of all 64 bits between MMX registers and memory locations. Data exchange with the 32-bit general-purpose registers is not provided for. The possible forms are thus:
    MOVQ MMX, MMX
    MOVQ MMX, Mem64
    MOVQ Mem64, MMX
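The effect of the two move instructions on the register bits can be summarized in a few lines of Python. This is a simplified model with assumed function names: a 64-bit integer stands for an MMX register, and the sample values are arbitrary.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def movd_to_mmx(value32: int) -> int:
    """MOVD MMX, Reg32/Mem32: bits 0-31 are loaded, bits 32-63 become zero."""
    return value32 & MASK32


def movd_from_mmx(mm: int) -> int:
    """MOVD Reg32/Mem32, MMX: only bits 0-31 of the register are copied."""
    return mm & MASK32


def movq(mm: int) -> int:
    """MOVQ: all 64 bits are copied unchanged."""
    return mm & MASK64


mm0 = 0x1122334455667788
print(hex(movd_from_mmx(mm0)))       # lower half only
print(hex(movd_to_mmx(0xCAFEBABE)))  # upper half cleared
print(hex(movq(mm0)))                # complete register
```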
The MMX instructions for data comparison are less flexible and powerful than the corresponding CPU and FPU instructions. The only options are to test both operands for equality or to check whether the value in the first operand is greater than that in the second. Both comparison options are available for the three formats Byte, Word and DoubleWord. This yields the instructions PCMPEQB, PCMPEQW, PCMPEQD, PCMPGTB, PCMPGTW and PCMPGTD (EQ stands for equal, GT for greater than). Only an MMX register is permitted as the first operand, and an MMX register or a 64-bit memory location as the second.
A major difference to the CPU and FPU is the way in which the result of the comparison is delivered. It is not indicated by setting flags or individual bits (e.g. in the status register of the FPU), but is stored in the first operand, i.e. an MMX register. If the comparison is true, the hexadecimal value FF, FFFF or FFFFFFFF (depending on the data format) is entered there. Otherwise, zeros are inserted. A comparison of two DoubleWords for equality by PCMPEQD MMx, MMy could be expressed as follows:
    IF MMx[31..00] = MMy[31..00] THEN MMx[31..00] := $FFFFFFFF
                                 ELSE MMx[31..00] := $00000000;
    IF MMx[63..32] = MMy[63..32] THEN MMx[63..32] := $FFFFFFFF
                                 ELSE MMx[63..32] := $00000000;
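Because the comparison result is a mask rather than a flag, it is typically combined with the logical instructions to select values without branching. The following Python sketch models this for two packed DoubleWords; the helpers select and packed_max are assumptions for the example, and the greater-than comparison is modelled as a signed comparison.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed number."""
    return value - (1 << 32) if value & 0x80000000 else value


def pcmpeqd(a, b):
    """All ones in every lane where the operands are equal, else zeros."""
    return [MASK32 if x == y else 0 for x, y in zip(a, b)]


def pcmpgtd(a, b):
    """All ones in every lane where a is greater than b (signed), else zeros."""
    return [MASK32 if to_signed32(x) > to_signed32(y) else 0 for x, y in zip(a, b)]


def select(mask, a, b):
    """Take a where the mask is set and b elsewhere, using only AND, NOT, OR."""
    return [(x & m) | (y & ~m & MASK32) for m, x, y in zip(mask, a, b)]


def packed_max(a, b):
    """Lane-wise signed maximum without any conditional jump."""
    return select(pcmpgtd(a, b), a, b)


a = [9, 0xFFFFFFFF]  # the second lane is -1 when read as signed
b = [7, 1]
print([hex(m) for m in pcmpeqd([5, 7], [5, 8])])
print([hex(m) for m in pcmpgtd(a, b)])
print(packed_max(a, b))
```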
MMX instructions allow a datum to be converted to a smaller or a larger format; conversion to a smaller format naturally always entails a loss of data.
- The instructions PACKSSWB, PACKSSDW and PACKUSWB are available for converting to a smaller format, from Word to Byte and from DoubleWord to Word. With PACKSSWB and PACKSSDW, the most significant bit of the target datum is used to preserve the sign, so only half of the unsigned value range is available. The instructions therefore saturate values that exceed or fall below this range. PACKSSWB, for example, sets all values below −128 to −128 and all values above 127 to 127. PACKUSWB (Pack with Unsigned Saturation Word to Byte) produces unsigned bytes instead and saturates to the range 0 to 255.
- Conversion to a larger format is possible from Byte to Word, Word to DoubleWord, and DoubleWord to QuadWord. There is one instruction each for converting the low-order and the high-order part of the data: the former is covered by the three instructions PUNPCKLBW, PUNPCKLWD and PUNPCKLDQ, the latter by PUNPCKHBW, PUNPCKHWD and PUNPCKHDQ.
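Both directions of conversion can be illustrated with a short Python model of PACKSSWB, PACKUSWB, PUNPCKLBW and PUNPCKHBW. It is a sketch under stated assumptions: lists stand for packed lanes with the lowest lane first, words are given as signed integers, and the first argument plays the role of the destination register.

```python
def packsswb(a, b):
    """Pack 4 + 4 signed words into 8 signed bytes, saturating to -128..127."""
    return [max(-128, min(127, w)) for w in a + b]


def packuswb(a, b):
    """Pack 4 + 4 signed words into 8 unsigned bytes, saturating to 0..255."""
    return [max(0, min(255, w)) for w in a + b]


def punpcklbw(a, b):
    """Interleave the four low-order bytes of a and b."""
    return [lane for pair in zip(a[:4], b[:4]) for lane in pair]


def punpckhbw(a, b):
    """Interleave the four high-order bytes of a and b."""
    return [lane for pair in zip(a[4:], b[4:]) for lane in pair]


print(packsswb([300, -300, 5, -5], [127, 128, -128, -129]))
print(packuswb([300, -300, 5, 255], [256, 0, -1, 100]))

# Interleaving with zeros widens unsigned bytes to words (byte pairs b, 0).
data = [1, 2, 3, 4, 5, 6, 7, 8]
zero = [0] * 8
print(punpcklbw(data, zero))
print(punpckhbw(data, zero))
```

Unpacking with a zeroed register is the usual way to widen unsigned data before arithmetic that would overflow the narrow format.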
Three instructions concern the MMX status: EMMS, FXSAVE and FXRSTOR. EMMS has no operands and is a kind of cleanup instruction after the end of an MMX application. FXSAVE and FXRSTOR take only the address of the save area and are used to back up and restore FPU-specific data, flags and registers; see also the section on switching between FPU and MMX.
Since MMX instructions are not fundamentally different from CPU instructions, they can in principle raise the same exceptions. FPU-specific exceptions related to floating-point numbers, such as those caused by denormalized values, cannot occur when the registers are used by MMX.
CPUs with MMX
Since MMX was the first SIMD extension of the x86 architecture, practically all CPUs of recent years support MMX. A complete list of all CPUs with MMX would go beyond the scope of this article; see the list of microprocessors instead.
Below is an overview of the CPU family from which the respective manufacturers have integrated MMX:
- AMD: from AMD K6
- Centaur Technology: from IDT WinChip C6
- Cyrix: from Cyrix 6x86MX
- Intel: from Intel Pentium MMX
- Rise Technology: Rise mP6
- Transmeta: from Transmeta Crusoe
To exploit the expanded potential of a new processor concept such as MMX in optimized application software, the extended capabilities of the machine language must also be supported by new versions of the various higher-level programming languages, at their various levels of abstraction, and by their compilers.
On the one hand, a language can confine itself to exploiting MMX during compilation without extending its own instruction set. This changes very little for the programmer. For reasons of backward compatibility, the programmer only has to specify before compilation whether or not MMX should be used in the target code.
On the other hand, a language can extend its instruction set and introduce new concepts and statements for writing source code that support the strengths of MMX. For example, Free Pascal provides predefined array types specifically for MMX and 3DNow!. Vector Pascal enables parallel operations on data.
Among the system-level languages, the Microsoft Macro Assembler supported the new capabilities in version 6.12, just nine months after the market launch of MMX. The flat assembler and NASM later supported MMX as well. Intel supported MMX relatively quickly in its own C compilers and later in its C++ compilers. The VectorC compiler from Codeplay also supports vectorization and optimizes C source code for MMX during translation. Other programming languages followed later. However, MMX support in Microsoft's C++ compiler was not carried over to 64-bit applications.
Use in software
MMX, like AMD's 3DNow!, was not used by the software industry to the extent that Intel had hoped for. Only a few products carry an explicit note such as "Optimized for MMX". It was most likely to be used in games and in video applications such as Ulead VideoStudio. One of the applications that implemented the MMX capabilities relatively quickly was Adobe Photoshop (see also the discussion of performance below).
Performance figures depend heavily on the overall system, the application areas and applications tested, the algorithms used, the test method or testing company, and many other boundary conditions. Intel itself promised 10–20% more performance from MMX processors with conventional software and up to 60% more with MMX-optimized software. However, especially in 3D graphics with many floating-point calculations, MMX hardly brings any increase in performance, since switching between MMX and FPU arithmetic ("context switch") can take a relatively long time, up to 50 clock cycles.
In 2000, Sreraman and Govindarajan determined performance increases by a factor of 2 to 6.5 for MMX with regard to vectorization in the C language. When Intel's own program libraries for signal and image processing are used, MMX improves performance by a factor of 1.5 to 2, and for graphics applications by a factor of 4 to 6. According to other studies, the use of MMX brings performance advantages of factors between 1.2 and 1.75. For MPEG decoding, according to Intel, the performance gain through MMX is limited to 40 percent. Thus, MMX can bring significant performance advantages over non-optimized software only for certain tasks.
Test results can vary significantly even when comparing different versions of the same software. A test of Adobe Photoshop Version 4.0, which has been optimized for MMX, showed performance gains of between 5 and 20 percent for most filters . In version 4.0.1, however, some actions under MMX ran surprisingly slower than without MMX support.
After MMX
MMX was soon no longer sufficient for the increased demands of rapidly changing high-resolution graphics, such as those in games. That is why Intel introduced SSE technology with the Pentium III processor in early 1999. Eight physically new 128-bit registers, independent of the CPU and FPU, were created. The MMX instruction set was expanded and completely new instructions were added. SSE also extended the integer-only processing of MMX to floating-point numbers. Later successor versions continued to expand the capabilities of SSE.
3DNow!, introduced by AMD in 1998 with the AMD K6-2, used the registers of the FPU just like MMX, but in an FPU-appropriate way, for processing floating-point numbers. The subsequent versions of 3DNow! eliminated incompatibilities with Intel's SSE concept.
Extension of the MMX instruction set under SSE
With SSE, twelve new instructions were introduced for MMX mode. They do not work with the new XMM registers of SSE, but exclusively with the old MMX or FPU registers.
- PAVGB and PAVGW form the rounded mean value of two operands.
- PEXTRW and PINSRW are used to extract and insert words.
- PMAXSW, PMAXUB, PMINSW and PMINUB calculate the maxima and minima of two signed words or unsigned bytes.
- PMOVMSKB creates a mask from the most significant bits of the packed bytes.
- PMULHUW works like the old instruction PMULHW, but uses two unsigned words.
- PSADBW calculates the absolute values of the differences between the individual bytes of two values and then adds these differences up.
- PSHUFW rearranges the individual words of a 64-bit value according to a rule that is passed via a third instruction operand.
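Some of these additions can be modelled in a few lines of Python. This is a sketch of the arithmetic only, with lists standing for packed unsigned lanes (lowest lane first); the functions are named after the mnemonics PAVGB, PSADBW, PMOVMSKB and PSHUFW, and the sample values are invented.

```python
def pavgb(a, b):
    """Rounded mean of unsigned bytes: (x + y + 1) divided by 2."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]


def psadbw(a, b):
    """Sum of the absolute differences of the eight byte pairs."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pmovmskb(a):
    """Collect the most significant bit of each byte into an 8-bit mask."""
    return sum(((byte >> 7) & 1) << i for i, byte in enumerate(a))


def pshufw(words, order):
    """Result word i is the source word selected by bits 2i+1..2i of order."""
    return [words[(order >> (2 * i)) & 3] for i in range(4)]


print(pavgb([1, 255], [2, 255]))
print(psadbw([10, 0, 5, 5, 0, 0, 0, 0], [0, 10, 5, 7, 0, 0, 0, 1]))
print(bin(pmovmskb([0x80, 0, 0xFF, 0x7F, 0, 0, 0, 0x80])))
print(pshufw([10, 20, 30, 40], 0x1B))  # 0x1B reverses the word order
```

The sum of absolute differences is the core operation of block matching in video encoding, which is why it received its own instruction.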
SSE2 to SSE4
With SSE2, a unified instruction set was implemented that can be used on the 128-bit XMM registers as well as on the 64-bit MMX registers. Some instructions even allow both register groups to be used at the same time, e.g. the conversion instruction CVTPD2PI MMX, XMM. With SSE4, however, support for MMX was ended.
- David Bistry, Carole Delong, Mickey Gutman: Complete Guide to MMX Technology. McGraw-Hill, 1997, ISBN 0-07-006192-0
- Richard Blum: Professional Assembly Language. Wiley Publishing, 2005, ISBN 0-7645-7901-0
- Paul Cockshott, Kenneth Renfrew: SIMD Programming Manual for Linux and Windows. Springer, Berlin 2004, ISBN 1-85233-794-X
- Rohan Coelho, Maher Hawash: DirectX, RDX, RSX, and MMX Technology - A Jumpstart Guide to High Performance APIs. Addison-Wesley, Amsterdam 1997, ISBN 0-201-30944-0
- Randall Hyde: The Art of Assembly Language. No Starch Press, 2003, ISBN 1-886411-97-2
- Intel: Intel Architecture Software Developer's Manual, Volume 1 - Basic Architecture. Order Number 243190, 1999
- Intel: Intel Architecture Software Developer's Manual, Volume 2 - Instruction Set. Order Number 243191, 1999
- Intel: Intel Architecture Software Developer's Manual, Volume 3 - System Programming Guide. Order Number 243192, 1999
- Trutz Eyke Podschun: The assembler book - basics, introduction and high-level language optimization. Addison-Wesley, 2002, ISBN 3-8273-2026-7
- Shreekant S. Thakkar: Programmer's Guide for Internet Streaming SIMD Extensions. Wiley & Sons, 2000, ISBN 0-471-37524-1
- Bliss Sloan: Developing for MMX Technology. Que, 1997, ISBN 0-7897-1302-0
- Ralf Weber: Configure Pentium, MMX, AMD. Sybex, 1997, ISBN 3-8155-7106-5
- Joachim Rohde: Assembler GE-PACKT. mitp-Verlag, 2001, ISBN 3-8266-0786-4; 2nd edition: mitp-Verlag, 2007, ISBN 978-3-8266-1756-0
- Rasmus Hahn, Bernd Peterson, Andreas Micklei: Processor extensions for multimedia - workstation architectures for multimedia systems WS 96/97
- Andreas Roskosch: Processors . Elaboration of a proseminar at the Technical University of Chemnitz, 1997
- Overview of Intel Pentium MMX processors
- Bernd Leitenberger: SIMD and VLIW . Overview of some SIMD technologies
- Jens Hohmuth: MMX tutorial . Instructions for using the MMX at the Westsächsische Hochschule Zwickau, dated January 2, 1999
- Pei Qi, Yang Wang: Accelerating 3D Geometry Transformation with Intel MMX Technology . (PDF; 195 kB)
- Controversy brews over use of MMX moniker ( Memento from July 19, 2012 in the web archive archive.today ). Bnet, January 6, 1997
- Richard Blum: Professional Assembly Language . Wiley Publishing, 2005, p. 482
- Intel Architecture Software Developer's Manual, Volume 1 - Basic Architecture , Order No. 243190, 1999, Chapter 8: Programming with the Intel MMX Technology , p. 216 f.
- Trutz Eyke Podschun: The assembler book - basics, introduction and high-level language optimization. Addison-Wesley, 2002, pp. 274 f.
- Trutz Eyke Podschun: The assembler book - basics, introduction and high-level language optimization . Addison-Wesley, 2002, pp. 276-278
- Randall Hyde: The Art of Assembly Language , No Starch Press, 2003, pp. 710-712
- Jens Hohmuth: MMX tutorial (memento of the original from February 8, 2009 in the Internet Archive). Instructions for using the MMX at the Westsächsische Hochschule Zwickau, dated January 2, 1999
- Randall Hyde: The Art of Assembly Language . No Starch Press, 2003, p. 734
- Don Brumm, Leo J. Scanlon: 80486 Programming . Windcrest, 1991, p. 24
- Intel Architecture Software Developer's Manual, Volume 1 - Basic Architecture , Order No. 243190, 1999, Chapter 8: Programming with the Intel MMX Technology , page 221 ff.
- Trutz Eyke Podschun: The assembler book - basics, introduction and high-level language optimization. Addison-Wesley, 2002, p. 281
- Klaus Wüst: Microprocessor Technology - Basics, Architectures and Programming of Microprocessors, Microcontrollers and Signal Processors . vieweg, 2006, pp. 214-218
- Richard Blum: Professional Assembly Language . Wiley Publishing, 2005, p. 488 ff.
- Richard Blum: Professional Assembly Language . Wiley Publishing, 2005, p. 494
- Trutz Eyke Podschun: The assembler book - basics, introduction and high-level language optimization. Addison-Wesley, 2002, p. 296 ff.
- Randall Hyde: The Art of Assembly Language . No Starch Press, 2003, p. 718 ff.
- David Bistry, Carole Delong, Mickey Gutman: Complete Guide to MMX Technology . McGraw-Hill, 1997, p. 138
- Paul Herrmann: Computer architecture , vieweg, 2002, page 417
- Intel Pentium MMX processors on cpu-collection.de
- Shreekant S. Thakkar: Programmer's Guide for Internet Streaming SIMD Extensions . Wiley & Sons, 2000, p. 72
- Free Pascal Programmer's Guide , Section 5.1: Intel MMX support - What is it about? FreePascal.org
- Larry Carter, Jeanne Ferrante: Languages and Compilers for Parallel Computing . P. 400
- Introduction to MMX Programming The Code Project, examples for using MMX with C ++
- Codeplay VectorC Compiler Technology (memento of the original from May 9, 2009 in the Internet Archive). Codeplay
- MMX Technology Microsoft Developer Network
- Klaus Dembowski: PC workshop - boards, memory, processors . Markt + Technik, 2005, p. 711
- David J. Lilja: Measuring Computer Performance - A Practitioner's Guide. Cambridge University Press, 2000, pp. 2 ff.
- Andreas Roskosch: Processors, section "MMX in a performance comparison" (memento of the original from July 12, 2010 in the Internet Archive)
- Intel Introduces 11TH Microprocessor with MMX Technology (memento of the original from September 8, 2008 in the Internet Archive). Berkeley Wireless Research Center
- David Bistry, Carole Delong, Mickey Gutman: Complete Guide to MMX Technology . McGraw-Hill, 1997, p. 291
- Paul Cockshott, Kenneth Renfrew: SIMD Programming Manual for Linux and Windows . Springer, Berlin 2004, p. 23
- Alan Conrad Bovik: Handbook of Image and Video Processing . 2005, p. 636
- R. Bhargava, R. Radhakrishnan, B. L. Evans, L. John: Characterization of MMX-enhanced DSP and Multimedia Applications on a General Purpose Processor. Digest of the Workshop on Performance Analysis and Its Impact on Design, held in conjunction with ISCA98 (page no longer available). (PDF) University of Texas at Austin
- Jennis Meyer-Spradow, Andreas Stiller: Großspurig - A critical look at MMX . ( Memento of July 8, 2001 in the Internet Archive ) In: c't issue, 1/97, p. 228
- Less MMX in Photoshop . heise online, June 9, 1997
- Trutz Eyke Podschun: The assembler reference - coding, decoding and reference . Addison-Wesley, 2002, pp. 231-249
- Intel Architecture Software Developer's Manual, Volume 2 - Instruction Set . Section 9.3.6: Additional SIMD Integer Instructions , page. 246
- Trutz Eyke Podschun: The assembler book - basics, introduction and high-level language optimization. Addison-Wesley, 2002, p. 345 ff.