Streaming SIMD Extensions 4
SSE4 (Streaming SIMD Extensions 4) is an instruction set extension for x86 processors. Intel has supported its first part, SSE4.1, since the Penryn variant of the Core 2 processors; the second part, SSE4.2, was introduced with the Intel Nehalem microarchitecture. AMD has supported SSE4.1 and SSE4.2 since the Bulldozer microarchitecture.
Intel SSE4 consists of 54 instructions. The first 47 appeared under the name SSE4.1. Seven more followed as SSE4.2 with the Nehalem-based Core i7.
AMD instead first added four instructions of its own with the K10 architecture and published them under the name SSE4a. The processors of the Bulldozer microarchitecture, released in October 2011, fully support SSE4.1 and SSE4.2 in addition to SSE4a. Intel processors, conversely, do not support the SSE4a instructions to this day.
Instructions
The following is an incomplete list of the newly introduced instructions and their areas of application.
SSE4.1
- Scalar (dot) product: DPPS, DPPD
  - e.g. for 3D graphics, games
- Conditional blending: BLENDPS/-D, BLENDVPS/-D, PBLENDVB, PBLENDW
  - e.g. for image processing, multimedia, games
- Minimum and maximum: PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD
  - e.g. for image processing, multimedia, games
- Integer conversion (sign and zero extension): PMOVSXBW/-D/-Q, PMOVZXBW/-D/-Q, PMOVSXWD/-Q, PMOVZXWD/-Q, PMOVSXDQ, PMOVZXDQ
  - e.g. for image processing, multimedia, games
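To make the element-wise behaviour of a few of these SSE4.1 instructions concrete, the following Python sketch models DPPS, BLENDPS, PMOVZXBW, and PMOVSXBW in software. It is an illustrative reference model only, not how the instructions are invoked in practice: the function names and the list-based register representation are assumptions made for this example, and Python floats are double precision, so DPPS on real hardware (single precision) may round differently.

```python
def dpps(a, b, imm8):
    """Software model of DPPS on two 4-element float vectors (hypothetical helper)."""
    total = 0.0
    for i in range(4):
        # Bits 4..7 of imm8 select which element products enter the sum.
        if imm8 & (1 << (4 + i)):
            total += a[i] * b[i]
    # Bits 0..3 select which result elements receive the sum; the others are zeroed.
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]


def blendps(dst, src, imm8):
    """Software model of BLENDPS: bit i of imm8 picks src[i], otherwise dst[i] is kept."""
    return [src[i] if imm8 & (1 << i) else dst[i] for i in range(4)]


def pmovzxbw(src_bytes):
    """Model of PMOVZXBW: zero-extend the low 8 bytes to eight 16-bit words."""
    return [b & 0xFF for b in src_bytes[:8]]


def pmovsxbw(src_bytes):
    """Model of PMOVSXBW: sign-extend the low 8 bytes to eight 16-bit words."""
    words = []
    for b in src_bytes[:8]:
        b &= 0xFF
        # A set top bit (0x80) is replicated into the upper byte of the word.
        words.append(b | 0xFF00 if b & 0x80 else b)
    return words


# 3D dot product: multiply elements 0..2 (mask 0x7_), store the sum in element 0 (mask 0x_1).
print(dpps([1.0, 2.0, 3.0, 99.0], [4.0, 5.0, 6.0, 99.0], 0x71))  # [32.0, 0.0, 0.0, 0.0]

# Take elements 0 and 2 from the second operand, keep elements 1 and 3.
print(blendps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 0b0101))  # [10.0, 2.0, 30.0, 4.0]
```

The byte 0xFF shows the difference between the two conversion families: the zero-extending model yields 0x00FF (255), while the sign-extending model yields 0xFFFF (that is, -1 as a signed 16-bit value).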
SSE4.2
- Cyclic redundancy check: CRC32
  - Accelerated checksum calculation. The instruction implements the Castagnoli variant (CRC-32C) and is therefore incompatible with the CRC-32 variant standardized in IEEE 802.3, which is used in network protocols (such as Ethernet and V.42), SATA, MPEG-2, PNG, and the UNIX cksum command. CRC-32C is used by iSCSI and the Linux file system Btrfs.
- Extended string operations: PCMPESTRI, PCMPESTRM, PCMPISTRI, PCMPISTRM
  - Performance gains for virus scanners, databases, and word processing. Because the operations work on 128-bit operands, as is usual in SSE, a single instruction processes at most 16 bytes or 8 UCS-2 characters of a string. The supported functions are:
    - Compare strings
    - Find characters from a specified set
    - Find characters from specified ranges
    - Search for a string within another string
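The incompatibility between CRC-32C and the IEEE 802.3 CRC-32 can be checked with a small software model. The sketch below is a plain bitwise Python implementation, not the hardware instruction; the function name `crc32c` is an assumption for this example. It applies the conventional initial value and final inversion around the polynomial division, steps that software using the CRC32 instruction would typically perform itself. Python's standard `zlib.crc32` serves as the IEEE 802.3 counterpart.

```python
import zlib

# Castagnoli polynomial 0x1EDC6F41 in bit-reversed (reflected) form.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C over `data` (slow reference model, hypothetical helper)."""
    crc = 0xFFFFFFFF  # conventional initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial when the dropped bit was 1.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # conventional final inversion


message = b"123456789"
print(hex(crc32c(message)))      # 0xe3069283 (CRC-32C, Castagnoli)
print(hex(zlib.crc32(message)))  # 0xcbf43926 (CRC-32, IEEE 802.3)
```

The two results differ for the same input, so a checksum produced with the SSE4.2 instruction cannot be used to validate formats that expect the IEEE 802.3 variant, such as PNG.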