x86 virtualization

from Wikipedia, the free encyclopedia

x86 virtualization refers to hardware- and software-based mechanisms that support virtualization on processors based on the x86 architecture. Using a hypervisor, several operating systems can run in parallel on one x86 processor, with its resources divided among them in an isolated and efficient manner. With full virtualization, a guest operating system should not be able to detect any difference between virtualized (parallel) operation and (exclusive) operation directly on the hardware.

Development of x86 virtualization

Since the late 1990s, virtualization for x86 processors has been achieved through complex software implementations, which were necessary because the processor models of that time lacked hardware support for virtualization. Only in 2005 and 2006 did Intel (VT-x) and AMD (AMD-V) introduce hardware support for virtualization. However, the first versions of these implementations offered only very slight speed advantages over virtualization solutions implemented purely in software. Better hardware support for virtualization was only achieved later, with newer processor models.

Software-based virtualization

In order to be able to allocate resources exclusively to the guest systems running in parallel, only the host operating system or the hypervisor may be granted direct access to the processor hardware, while the guest systems, like all other applications, may only have limited access rights to the hardware. In particular, this prevents the guest systems from seeing or changing memory areas that the hypervisor needs for management.

With the 80286 processor, the so-called protected mode was introduced in the x86 world. It brought four different protection or privilege levels, known as rings, which grant different rights to the code segments running in them. Only with the introduction of this concept did it become possible to implement virtualization on the basis of the x86 architecture: in protected mode, the operating system kernel runs in a more privileged ring, referred to as ring 0, and applications run in a less privileged ring, usually ring 3.

The hypervisor or the host operating system runs in ring 0 because of its privileged role in resource management. To protect the hypervisor's resources, guest systems must therefore run either in ring 1 (the so-called 0/1/3 model) or in ring 3 (the so-called 0/3/3 model).

Ring deprivileging

Operating systems for the x86 architecture (which, as guest systems, must not see any difference between virtualized operation and operation directly on the hardware) are written on the assumption that they run in ring 0, and they only function correctly there. The virtualization solution must therefore implement two features, namely ring deprivileging and ring aliasing:

  • Ring deprivileging ensures that the guest system can execute all instructions as if it had ring 0 privileges on the hardware, even though virtualization leaves it with lower privileges.
  • Ring aliasing ensures that when the guest system takes an action, it always receives the response it would receive if the instruction had been executed with ring 0 privileges. For example, there is an instruction for querying the privilege level that can be executed at any privilege level. If a guest system executed this instruction without ring aliasing, it would see ring 1 or ring 3; with ring aliasing it sees ring 0.
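
The privilege-level query mentioned in the last item can be illustrated with a minimal Python sketch. On x86, the current privilege level is held in the low two bits of the CS segment selector, which code at any privilege level can read. The selector value and the function names `cpl_of` and `aliased_selector` are assumptions chosen for illustration and do not come from any real hypervisor.

```python
RPL_MASK = 0b11  # the low two bits of a segment selector hold the privilege level


def cpl_of(cs_selector: int) -> int:
    """Return the privilege level encoded in a CS selector value."""
    return cs_selector & RPL_MASK


def aliased_selector(real_cs: int, expected_ring: int = 0) -> int:
    """Hypothetical ring-aliasing fix-up: report the selector the guest
    kernel expects (ring 0) instead of the deprivileged one it really has."""
    return (real_cs & ~RPL_MASK) | expected_ring


# Example selector: descriptor index 2, privilege level 1,
# i.e. a guest kernel that was deprivileged to ring 1.
real_cs = 0x0011
print("without aliasing, guest sees ring", cpl_of(real_cs))
print("with aliasing, guest sees ring", cpl_of(aliased_selector(real_cs)))
```

In this toy model the fix-up is a plain function call; in a software hypervisor, the instruction that reads the selector would have to be rewritten or emulated so that the guest receives the aliased value.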

Primary and shadow structures

The hypervisor also needs its own memory areas in which it can store management data, for example the states of the various VMs. The memory areas belonging to a VM must be visible to that VM and, where necessary, modifiable by it, but the hypervisor's management data must not be visible to the VM, let alone modifiable. Rather, memory must appear as if it were used exclusively by the respective VM. To guarantee this, the corresponding memory areas are kept in several copies: the primary structure holds the hypervisor's data, while secondary or shadow structures, one set for each VM, hold the data of the VMs. For processor registers, accesses are normally intercepted (trapped) by the hypervisor and the state of the processor is emulated via the shadow structure. Every time a VM accesses memory, the hypervisor must check whether the access targets a specially protected memory area and, if so, supply the data from the shadow structure of the respective VM instead of the primary structure, without the VM being able to notice this. This technique is also known as tracing.
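
The relationship between the primary structure and the per-VM shadow structures can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class `ShadowRegisters`, its method names and the register name `cr3` are chosen for illustration only, and real hypervisors shadow far more state than a single dictionary.

```python
class ShadowRegisters:
    """Toy model: one primary register set plus one shadow set per VM."""

    def __init__(self, primary):
        self.primary = dict(primary)  # state the hypervisor really runs with
        self.shadow = {}              # vm_id -> values that this VM believes in

    def add_vm(self, vm_id, initial):
        self.shadow[vm_id] = dict(initial)

    def trapped_write(self, vm_id, reg, value):
        # The write was intercepted: only the VM's shadow copy changes.
        self.shadow[vm_id][reg] = value

    def trapped_read(self, vm_id, reg):
        # The read is answered from the shadow copy, never from the primary.
        return self.shadow[vm_id][reg]


regs = ShadowRegisters(primary={"cr3": 0x1000})
regs.add_vm("vm1", {"cr3": 0})
regs.add_vm("vm2", {"cr3": 0})
regs.trapped_write("vm1", "cr3", 0x5000)
print(hex(regs.trapped_read("vm1", "cr3")))  # vm1 sees its own value
print(hex(regs.trapped_read("vm2", "cr3")))  # vm2 is unaffected
print(hex(regs.primary["cr3"]))              # the primary structure is unchanged
```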

Software-based full virtualization for the x86 architecture

These functions are normally realized with hardware support in the processor, following the trap-and-emulate method. Until the introduction of VT-x and AMD-V in 2005 and 2006 (see the section on hardware-assisted virtualization), the x86 architecture offered no hardware support for virtualization, so the functions described above had to be implemented in software. Because the trap-and-emulate method cannot be implemented in software without hardware support in the processor, a different approach had to be taken for software-based virtualization:

  • So-called binary translation translates instructions of the guest system, at the processor-instruction level, from ring 3/ring 1 instructions into corresponding ring 0 instructions of the host system, in a way that compensates for ring deprivileging and implements ring aliasing.
  • A number of important data structures used by the processor must be shadowed. Most guest operating systems use paged virtual memory, and giving them direct access to the page tables would let them overwrite data belonging to the hypervisor or to other VMs. To prevent this, some of the work normally done by the processor's memory management unit (MMU) must be reimplemented in the hypervisor's software. In particular, guest systems must be prevented from directly accessing the primary page tables; all such accesses are intercepted and emulated in software. The x86 architecture also uses hidden state to hold segment descriptors inside the processor: once a descriptor has been loaded from memory into the processor, the memory it was loaded from can be reused immediately, for example overwritten by application processes, and the descriptor cannot be read back from the processor. This is why shadow descriptor tables must be implemented, so that changes the VMs make to these memory areas can be traced.
  • I/O devices of the guest systems that are not supported by the host operating system must be emulated by appropriate software emulators on the host operating system.
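
The shadowing of page tables described in the list above can be sketched as follows. This is a deliberately simplified model, not the design of any particular product: page tables are flat dictionaries from page numbers to page numbers, and the class `ShadowedGuest` and its method names are hypothetical.

```python
class ShadowedGuest:
    """Toy shadow page table: the guest edits its own table, while the
    hypervisor maintains the table that is actually used for translation."""

    def __init__(self, gpa_to_hpa):
        self.gpa_to_hpa = dict(gpa_to_hpa)  # hypervisor's guest-physical -> host-physical map
        self.guest_pt = {}                  # what the guest believes: virtual -> guest-physical
        self.shadow_pt = {}                 # what is really used: virtual -> host-physical

    def guest_writes_pte(self, gva_page, gpa_page):
        # Intercepted page-table write: validate it, then resync the shadow entry.
        if gpa_page not in self.gpa_to_hpa:
            raise PermissionError(f"guest-physical page {gpa_page} not owned by this VM")
        self.guest_pt[gva_page] = gpa_page
        self.shadow_pt[gva_page] = self.gpa_to_hpa[gpa_page]

    def translate(self, gva_page):
        # Translation uses only the shadow table.
        return self.shadow_pt[gva_page]


guest = ShadowedGuest(gpa_to_hpa={0: 100, 1: 101})
guest.guest_writes_pte(gva_page=7, gpa_page=1)
print(guest.guest_pt[7], guest.translate(7))
```

Every page-table write by the guest passes through `guest_writes_pte`, which is the synchronization cost that later hardware support for page table virtualization was designed to remove.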

In order to implement these complex tasks in software, the first virtualization products were designed as type 2 hypervisors for use on workstation computers. The hypervisor ran in a kernel module on the host operating system. Given the great effort already required to implement the methods described above, this at least avoided having to develop drivers for the host hardware.

This type of hypervisor implementation led to lower relative performance of the VMs (in relation to the performance of the host processor), in particular because parts of the MMU were reimplemented in software. VMs on CPU architectures that already virtualize the MMU in hardware, such as the IBM System/370 architecture, perform better by comparison.

There was also a controversial scientific discussion about whether the x86 architecture, without hardware-supported virtualization features as described here, even meets the requirements for virtualization according to the criteria established by Popek and Goldberg. VMware researchers showed in an ASPLOS article in 2006 that the techniques presented above make the x86 platform virtualizable in the sense of the three criteria established by Popek and Goldberg, though not by means of the classic “trap-and-emulate” technique that Popek and Goldberg describe.

Software-based paravirtualization for the x86 architecture

Another approach to implementing virtualization has been taken by hypervisors such as Denali, L4 and Xen. To simplify the implementation, they drop one basic requirement, namely that the guest operating system must be able to run unchanged on both a virtualized and a non-virtualized system. Instead, special versions of the guest operating systems were developed and adapted to the respective hypervisor. These hypervisors do not virtualize aspects of the x86 architecture that are particularly difficult to implement and that impede performance, for example I/O virtualization. This approach, known as paravirtualization, can bring significant performance gains, as demonstrated in the 2003 SOSP paper on Xen. Paravirtualization still plays an important role today, especially in embedded environments.

Software-based full virtualization for the x86-64 architecture

The first version of x86-64 from AMD (AMD64) no longer allowed purely software-based full virtualization, because it no longer supported segmentation in long mode (i.e. for 64-bit addressing) and thus offered no way to protect the memory of a purely software-based hypervisor. Revision D and all subsequent 64-bit AMD processors (roughly speaking, all chips manufactured in 90 nm technology and below) added basic support for segmentation in long mode, which meant that 64-bit guest systems could be virtualized on 64-bit host systems via binary translation.

Intel also did not offer any support for segmentation in long mode for its x86-64 processors, which, as with AMD's first chips, did not allow software-based 64-bit virtualization. In contrast to AMD, however, Intel was already offering hardware-supported virtualization for its 64-bit processors with VT-x at this point in time.

Hardware-assisted virtualization

In 2005 and 2006, respectively, Intel and AMD launched (independently of each other) processor models with instruction set extensions for virtualization support. The first generation of these processors primarily addressed the problem of deprivileging. Improvements to virtualized system memory management for VMs were added in later processor models. These include, in particular, hardware extensions of certain memory registers that enable virtual machines to access these resources directly without going through the Virtual Machine Manager (VMM). In the years that followed, this technology was extended under various names, mainly to server chipsets and server network cards.

Processors (CPU)

Virtual 8086 mode

Due to the great difficulties with the protected mode of the 80286, which itself was only partially suitable for running several MS-DOS applications in parallel, Intel introduced the Virtual 8086 mode with the 80386 processor, which enabled a virtualized 8086 environment. Hardware support for the virtualization of protected mode itself was implemented by Intel in the processor instruction set a good 20 years later.

The Virtual 8086 mode can, however, be detected by software, and it allows programs access to the extensions that were introduced with the 286 and later processor generations.

AMD virtualization (AMD-V)

AMD developed the first generation of instruction set extensions for virtualization support under the code name "Pacifica" and initially brought them to market as AMD Secure Virtual Machine (SVM). The technology was later renamed and is still marketed as AMD Virtualization, or AMD-V for short.

On May 23, 2006, AMD launched the Athlon 64, Athlon 64 X2 and Athlon 64 FX as the first processors with AMD-V support.

AMD-V is also available on the Athlon 64 and Athlon 64 X2 processor families with revision numbers "F" and "G" based on the AM2 socket, on Turion 64 X2 processors, on Opteron processors of the 2nd and 3rd generation, and on Phenom and Phenom II processors. The AMD Fusion processor family also supports AMD-V. The only Sempron processors that support AMD-V are the Huron and Sargas versions. AMD-V is not supported by any processors with a 939 socket.

AMD Opteron CPUs of the family 0x10 (Barcelona) line and Phenom II CPUs support an advanced virtualization technology that AMD calls "Rapid Virtualization Indexing" (during development it was called "Nested Page Tables"). Intel later introduced a similar technology called Extended Page Tables (EPT). The technology, generally known as “Second Level Address Translation” (SLAT for short), supports page table virtualization in hardware, which largely eliminates the need to synchronize shadow structures for the VMs in software.
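
The idea behind second level address translation can be sketched as two lookups performed at access time, with no software-maintained shadow table in between. The following Python sketch is a simplified model: page tables are flat dictionaries, and the function names `slat_translate` and `nested_walk_refs` are chosen for illustration. The second function gives a rough count of the memory references a full two-dimensional page walk needs in this model, ignoring TLBs and page-walk caches.

```python
def slat_translate(gva_page, guest_pt, nested_pt):
    """Two-stage lookup: the guest controls stage 1, the hypervisor stage 2."""
    gpa_page = guest_pt[gva_page]  # stage 1: guest-virtual -> guest-physical
    return nested_pt[gpa_page]     # stage 2: guest-physical -> host-physical


def nested_walk_refs(guest_levels, nested_levels):
    """Memory references for one uncached two-dimensional page walk.

    Each guest page-table entry lives at a guest-physical address, so reading
    it costs one nested walk (nested_levels references) plus the read itself.
    The final guest-physical address needs one more nested walk.
    """
    return guest_levels * (nested_levels + 1) + nested_levels


guest_pt = {7: 1}      # the guest maps virtual page 7 to guest-physical page 1
nested_pt = {1: 101}   # the hypervisor maps guest-physical page 1 to host page 101
print(slat_translate(7, guest_pt, nested_pt))
print(nested_walk_refs(4, 4))  # four-level tables on both stages
```

The guest can change `guest_pt` freely without any interception, because the hypervisor's protection rests entirely on `nested_pt`. The price is the longer page walk on a TLB miss.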

AMD A4 series notebook processors, such as the A-9120, incorporate AMD virtualization.

Intel Virtualization Technology (VT-x)

Initially known under the code name “Vanderpool”, the technology ultimately named “VT-x” provides hardware support for virtualization on Intel x86 processors. On November 13, 2005, Intel introduced the first two processors with VT-x support, models 662 and 672 of the Pentium 4 series. At the same time, a comparable technology for the Itanium processor family was presented under the name “VT-i”.

One of the most important innovations of VT-x is the introduction of a further privilege concept, intended exclusively for virtualization, in addition to the ring concept. Two new modes of operation are introduced, “VMX root operation” and “VMX non-root operation”. The hypervisor runs in VMX root operation, whereas VMs run in VMX non-root operation. Rings 0 to 3 are available in both modes. However, ring 0 instructions executed by VMs in VMX non-root operation can now be trapped by the hypervisor in VMX root operation, so VT-x is an implementation of the “trap-and-emulate” method. This solves the problem of deprivileging, which no longer has to be handled in software using binary translation.
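
The trap-and-emulate cycle between the two modes can be modeled in a few lines of Python. This is a toy simulation, not VT-x code: the instruction names, the set `TRAPPING` and the functions `emulate` and `run_guest` are assumptions made for illustration. On real hardware, the hypervisor configures which events cause an exit.

```python
TRAPPING = {"mov_cr3", "hlt"}  # illustrative choice of instructions that cause an exit


def emulate(op, arg, vm_state):
    """Hypervisor side (stands in for root operation): emulate one trapped instruction."""
    if op == "mov_cr3":
        vm_state["virtual_cr3"] = arg
    elif op == "hlt":
        vm_state["halted"] = True
    else:
        raise ValueError(f"no emulation for {op}")


def run_guest(program, vm_state):
    """Guest side (stands in for non-root operation). Returns the exits taken."""
    exits = []
    for op, arg in program:
        if vm_state["halted"]:
            break
        if op in TRAPPING:
            exits.append(op)            # control passes to the hypervisor
            emulate(op, arg, vm_state)  # the hypervisor emulates, then resumes the guest
        else:
            vm_state["acc"] += arg      # ordinary instruction runs without any exit
    return exits


state = {"acc": 0, "virtual_cr3": 0, "halted": False}
program = [("add", 2), ("mov_cr3", 0x3000), ("add", 5), ("hlt", None), ("add", 100)]
print(run_guest(program, state), state)
```

Only the two trapping instructions involve the hypervisor; the `add` instructions run at full speed, which is the property that makes trap-and-emulate attractive.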

However, still not all Intel processors support VT-x. Whether VT-x is supported can even vary between different versions (identifiable by Intel's sSpec number) of the same processor model. Even in May 2011, the P6100, which was primarily designed for laptop use, did not support VT-x. A complete list of all Intel processors with VT-x support can be found on Intel's own website.

With some motherboards, Intel's VT-x feature must also be activated explicitly via the BIOS settings.
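
On a Linux system, one common way to check for these extensions is to look at the CPU flags in `/proc/cpuinfo`, where `vmx` indicates Intel VT-x and `svm` indicates AMD-V. The following Python sketch parses that text. The function name `virtualization_flags` is an assumption for illustration, and the `ept` and `npt` flag names reflect what current Linux kernels typically report and may vary between kernel versions. A reported flag does not by itself guarantee that the firmware has enabled the feature.

```python
def virtualization_flags(cpuinfo_text):
    """Report virtualization-related CPU flags found in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return {
        "vt-x": "vmx" in flags,   # Intel VT-x
        "amd-v": "svm" in flags,  # AMD-V
        "ept": "ept" in flags,    # Intel Extended Page Tables
        "npt": "npt" in flags,    # AMD nested page tables
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(virtualization_flags(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```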

With the Nehalem processor family, Intel introduced a technology that it calls Extended Page Tables (EPT). The technology, generally known as “Second Level Address Translation” (SLAT for short), supports page table virtualization in hardware, which largely eliminates the need to synchronize shadow structures for the VMs in software.

With the Westmere processor series, Intel has added a feature that allows logical processors to be started directly in "real mode". The feature is called “Unrestricted Guest” by Intel and requires the previously introduced EPT feature.

A technology known as VMCS shadowing has enabled hardware-supported nested virtualization since it was introduced with the Haswell processor family. The Virtual Machine Control Structure (VMCS) is a memory structure that exists exactly once for each VM and is managed by the VMM: each time the execution context changes from one VM to another, the respective VMCS is restored, and it defines the state of the virtual machine. As soon as VMMs are nested, that is, a VMM runs inside another VMM, a problem arises that is similar to the one with page table access (which was solved with EPT, RVI, or second level address translation in general): the VMCS structure must now be shadowed several times (within the guest VMM, within the VMM, and again on the actual processor or in memory). To reduce this effort, hardware-supported shadow VMCSs were introduced with the Haswell generation.

VIA virtualization (VIA VT)

With the VIA Nano 3000 processors and later processors, VIA introduced hardware support for virtualization, known as "VIA VT", which is compatible with Intel's VT-x extension.

Interrupt virtualization (AMD AVIC, Intel APICv)

In 2012, AMD announced an instruction set extension called Advanced Virtual Interrupt Controller (AVIC), which aims to make the management and virtualization of interrupts more efficient through hardware support. At the time, no AMD processors implementing this extension were available yet.

Also in 2012, Intel announced a comparable technology for interrupt virtualization, which initially had no name of its own. It was later named APIC virtualization (APICv) and was first implemented in the Ivy Bridge processor family, in the models sold as Xeon E5-26xx v2 (available since late 2013) and Xeon E5-46xx v2 (available since early 2014).

Graphics processors (GPU)

Graphics processor virtualization technology (Intel GVT-d, GVT-g, GVT-s)

On January 1, 2014, Intel announced a technology (referred to as Intel GVT-d, GVT-g and GVT-s) that provides hardware-based support for virtualizing graphics processors, starting with the integrated Iris Pro GPU. The Iris Pro GPU can be assigned exclusively to one VM using GVT-d, shared between several VMs on a timesharing basis while each VM uses the native graphics driver (GVT-g), or shared between several VMs through a virtual graphics driver (GVT-s).

PC chipset

Memory and I/O virtualization must be supported by the chipset, as it provides the corresponding hardware functions for the processor. Normally this feature has to be activated in the BIOS, and the BIOS has to be able to support and use these functions. This also means that the BIOS must be in a version that is adapted to the virtualization functions of the chipset.

I/O MMU virtualization (AMD-Vi and VT-d)

An input/output memory management unit (IOMMU) allows guest VMs to use peripheral devices such as network cards, graphics cards and hard disk controllers directly, by remapping memory accesses and interrupts. This technique is sometimes referred to as PCI passthrough.

An IOMMU also lets operating systems and VMs avoid bounce buffers when using peripheral devices whose addressable memory range is smaller than that of the operating system or VM. The necessary address translation is performed by the IOMMU and therefore does not have to be implemented by the operating systems or VMs.

Both AMD and Intel have released corresponding specifications:

  • AMD's I/O virtualization technology, AMD-Vi, originally called IOMMU.
  • Intel's Virtualization Technology for Directed I/O (VT-d), which is supported by most (but not all) high-end Nehalem and newer Intel processors.

In addition to the support from the CPU, the mainboard, the chipset as well as the BIOS or UEFI must support the IOMMU virtualization functions in order to actually make them usable.
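
The remapping that an IOMMU performs for device memory accesses can be illustrated with a toy model in Python. The class `ToyIommu`, its methods and the device names are hypothetical and chosen for illustration; real IOMMUs use multi-level translation tables and also remap interrupts.

```python
class ToyIommu:
    """Toy model: each device sees only the pages mapped for its own domain."""

    def __init__(self):
        self.domains = {}  # device id -> {io_page: host_page}

    def map(self, device, io_page, host_page):
        self.domains.setdefault(device, {})[io_page] = host_page

    def dma(self, device, io_page):
        """Translate a device access, or fault if no mapping exists."""
        try:
            return self.domains[device][io_page]
        except KeyError:
            raise PermissionError(
                f"DMA fault: {device} has no mapping for page {io_page:#x}"
            ) from None


iommu = ToyIommu()
iommu.map("nic0", io_page=0x10, host_page=0x8000)  # NIC passed through to one VM
print(hex(iommu.dma("nic0", 0x10)))
try:
    iommu.dma("nic0", 0x11)  # outside the pages granted to this device
except PermissionError as err:
    print(err)
```

Because a passed-through device can only reach the pages mapped for it, a guest that controls the device cannot use it to read or overwrite memory belonging to the hypervisor or to other VMs.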

Network virtualization (VT-c)

Intel's "Virtualization Technology for Connectivity" (VT-c) is a generic term for several technologies (especially VMDq and SR-IOV) that simplify network management and accelerate network access for the hypervisor or guest VMs.

Virtual Machine Device Queues (VMDQ)

In order to assign network traffic to the correct virtual machine, the hypervisor needs a function comparable to a network switch that distributes the traffic to the guest VMs. To support this function in hardware, Intel implemented VMDq in its Ethernet controllers; it takes over this distribution for the hypervisor and thus simplifies and accelerates packet handling. Each VM is assigned a separate queue for “its” network packets within the network adapter, which simplifies and accelerates the source and destination identification for network packets. The source and destination identification as well as the necessary copying of the network packets in memory between the queues and the VMs are done by a virtual switch within the hypervisor.
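
The per-VM queues can be illustrated with a short Python sketch that sorts incoming frames by destination MAC address. It models only the sorting step. The function name `sort_into_queues`, the MAC addresses and the `"default"` queue for unknown destinations are assumptions made for illustration.

```python
from collections import defaultdict, deque


def sort_into_queues(frames, mac_to_vm):
    """Place each (dst_mac, payload) frame into the queue of the owning VM."""
    queues = defaultdict(deque)
    for dst_mac, payload in frames:
        vm = mac_to_vm.get(dst_mac, "default")  # unknown destinations go to a default queue
        queues[vm].append(payload)
    return queues


mac_to_vm = {"52:54:00:00:00:01": "vm1", "52:54:00:00:00:02": "vm2"}
frames = [
    ("52:54:00:00:00:01", "a"),
    ("52:54:00:00:00:02", "b"),
    ("52:54:00:00:00:01", "c"),
    ("ff:ff:ff:ff:ff:ff", "d"),
]
for vm, queue in sort_into_queues(frames, mac_to_vm).items():
    print(vm, list(queue))
```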

PCI-SIG Single Root I/O Virtualization (SR-IOV)

PCI-SIG Single Root I/O Virtualization (SR-IOV) provides a set of I/O virtualization methods (not specific to x86) that are based on PCI Express (PCIe) hardware and standardized by the PCI-SIG. The technology enables the parallel use of a single Intel Ethernet server adapter port through several virtual functions; IT administrators can use the virtual ports provided in this way to establish several separate connections to virtual machines. The PCI-SIG I/O virtualization (IOV) specifications comprise:

  • Address Translation Services (ATS) supports native IOV over PCI Express through address translation. Use by software requires support for a new type of transaction that enables address translation.
  • Single-root IOV (SR-IOV or SRIOV) supports native IOV in existing single-root PCI Express topologies. Use by software requires support for new device properties in order to manage virtualized configurations.
  • Multi-root IOV (MR-IOV) supports native IOV in new topologies (e.g. blade servers).

The SR-IOV functionality is implemented in different layers, which can be divided as follows:

  1. Virtual machine with network card based on virtual functions (VM)
  2. Interface with the virtual machine (VM)
  3. Management application (hypervisor)
  4. Management virtual machine (hypervisor)
  5. Hardware functions (network card with activated SR-IOV support)
  6. Virtual functions, derived from hardware functions (network card with activated SR-IOV support)
  7. External network
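
The relationship between the hardware function and the virtual functions derived from it (items 5 and 6 above) can be modeled with a small Python sketch. The class `PhysicalFunction`, its method and the naming scheme for virtual functions are hypothetical and serve only to illustrate the idea that one physical port exposes a limited number of virtual functions, each handed to one VM.

```python
class PhysicalFunction:
    """Toy model of an SR-IOV capable port with a fixed number of virtual functions."""

    def __init__(self, name, total_vfs):
        self.name = name
        self.total_vfs = total_vfs
        self.assigned = {}  # virtual function index -> VM name

    def assign_vf(self, vm):
        """Hand the first free virtual function to a VM and return its name."""
        for index in range(self.total_vfs):
            if index not in self.assigned:
                self.assigned[index] = vm
                return f"{self.name}-vf{index}"
        raise RuntimeError("no free virtual function on " + self.name)


port = PhysicalFunction("port0", total_vfs=2)
print(port.assign_vf("vm1"))
print(port.assign_vf("vm2"))
```

A third call to `assign_vf` would raise an error in this model, mirroring the fact that the number of virtual functions is bounded by what the hardware provides.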
