Slab allocator
The slab allocator is a method of managing memory that is used by many operating systems and also by applications. The aim of the algorithm is to use the available main memory as efficiently as possible, i.e. with as little waste as possible, when small memory areas are reserved frequently.
Note: Some details of this technical description of the slab allocator are based on the implementation in Linux kernel 2.6.10-rc2. They may differ in other systems and may no longer be technically up to date.
History
The slab allocator was developed in 1994 by Jeff Bonwick for SunOS 5.4. Because he documented his approach publicly, it was adopted by many other operating systems (e.g. Linux) and applications (e.g. Perl). In 2001, Bonwick released a significantly improved version, which he again documented publicly.
In 1999 the slab allocator found its way into the Linux kernel. The current implementation already contains some elements of Bonwick's 2001 version, but does not yet implement it fully.
Idea
The aim of memory management is to use the available memory as efficiently as possible while allowing the required objects to be reserved and released quickly. The problem of fragmentation must not be neglected either: during operation, as few small unused memory areas as possible should arise between the used ones. For technical reasons, memory is organized in large memory pages (on IA-32: 4096 bytes), which is not particularly efficient for dynamically requested small amounts of memory.
Properties of kernel objects
Most of the objects that an operating system uses are relatively small and have a limited lifespan, so they are returned to the system after a short period of time. In addition, multiple objects of the same type are often required.
Private caches
In his paper, Bonwick rejects the idea of private caches, i.e. buffers that each process or kernel module in the operating system manages for itself. Although a process has the best overview of its own memory requirements, it has almost no view of the memory situation of the system as a whole, and none of the requirements of other processes.
Clients & Central Allocator
Bonwick therefore divides memory management into two parts. The central allocator provides functions for managing the memory and ensures that it is used as efficiently as possible. Opposite it stand the clients: these only describe the objects they need and do not have to worry about the details.
Structure of the central allocator
The following structure is now proposed by Bonwick:
- Objects of the same type are grouped into slabs.
- These slabs are organized under a cache, so that further objects of the same type are kept in stock.
- This caching reduces expensive trips to the underlying buddy memory management, which only supplies whole memory pages.
- The memory pages needed for slabs and caches must be fetched from the buddy system.
In other words: the slab allocator subdivides the memory provided by the buddy system. The central allocator offers interfaces (APIs) through which the clients can request memory.
Caches
Tasks
The caches have three tasks:
- They keep stocks of frequently used objects in already-initialized form.
- They provide general-purpose stores of objects in certain fixed sizes (for objects that are needed less frequently, or for individual memory areas); under Linux these are the size-N caches shown below.
- They should make the best possible use of the hardware caches (TLB, L1 cache).
Construction
Each cache is represented by a data structure (under Linux of type kmem_cache_s). It contains three doubly linked lists (within the substructure lists of type kmem_list3) on which the slabs are strung: one list holds the slabs consisting only of unused objects ("all free"), one the slabs whose objects are all in use ("all used"), and the third the slabs with both used and currently unused objects ("mixed free/used").
There has also been, since the 2001 version of Bonwick's design, a cache (of type array_cache) for every processor in the system that remembers the most recently freed objects. These objects are handed out again first, since the probability is high that they are still present in one of the processor caches, which makes better use of those caches.
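The construction just described can be sketched in C. The type names follow Linux 2.6 (kmem_cache_s, kmem_list3, array_cache), but the fields shown here are a simplified illustration, not the kernel's exact definitions:

```c
#include <stddef.h>

/* Simplified sketch of the cache structures; illustrative only. */

struct slab;  /* per-slab header, see the "Slabs" section below */

struct kmem_list3 {                /* the three doubly linked slab lists */
    struct slab *slabs_free;       /* "all free": only unused objects   */
    struct slab *slabs_full;       /* "all used": completely occupied   */
    struct slab *slabs_partial;    /* "mixed free/used"                 */
};

struct array_cache {               /* per-CPU store of recently freed objects */
    unsigned int avail;            /* number of cached object pointers  */
    unsigned int limit;            /* capacity of entries               */
    void **entries;                /* most recently freed handed out first */
};

struct kmem_cache_s {              /* one cache, one object type */
    size_t objsize;                /* rounded-up object size in bytes */
    unsigned int objperslab;       /* objects per slab */
    struct kmem_list3 lists;       /* the three slab lists */
    struct array_cache *cpu_cache; /* in reality, one per processor */
};
```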
Example: Information about the caches
This concept becomes clearer with an example. Under Linux, runtime information on the slab allocator is available in the file /proc/slabinfo. The following sample output has been heavily abridged.
| name | <active_objs> | <num_objs> | <objsize> | <objperslab> | <pagesperslab> | <batchcount> |
|---|---|---|---|---|---|---|
| raid5/md0 | 256 | 264 | 364 | 11 | 1 | 54 |
| reiser_inode_cache | 2955 | 5650 | 376 | 10 | 1 | 54 |
| scsi_cmd_cache | 2 | 11 | 352 | 11 | 1 | 54 |
| sgpool-128 | 32 | 32 | 2048 | 2 | 1 | 24 |
| sgpool-64 | 32 | 32 | 1024 | 4 | 1 | 54 |
| sgpool-32 | 32 | 32 | 512 | 8 | 1 | 54 |
| sgpool-16 | 32 | 45 | 256 | 15 | 1 | 120 |
| sgpool-8 | 32 | 62 | 128 | 31 | 1 | 120 |
| clip_arp_cache | 0 | 0 | 160 | 25 | 1 | 120 |
| rpc_buffers | 8 | 8 | 2048 | 2 | 1 | 24 |
| rpc_tasks | 8 | 25 | 160 | 25 | 1 | 120 |
| rpc_inode_cache | 0 | 0 | 416 | 9 | 1 | 54 |
| unix_sock | 243 | 270 | 384 | 10 | 1 | 54 |
| size-128 (DMA) | 0 | 0 | 128 | 31 | 1 | 120 |
| size-128 | 4282 | 4402 | 128 | 31 | 1 | 120 |
| size-64 (DMA) | 0 | 0 | 64 | 61 | 1 | 120 |
| size-64 | 856 | 1403 | 64 | 61 | 1 | 120 |
| size-32 (DMA) | 0 | 0 | 32 | 119 | 1 | 120 |
| size-32 | 2176 | 8211 | 32 | 119 | 1 | 120 |
Meaning of the columns:
- name: name of the cache
- <active_objs>: Number of objects currently in use
- <num_objs>: Number of all objects
- <objsize>: size of an object (in bytes)
- <objperslab>: Number of objects per slab
- <pagesperslab>: Number of memory pages per slab
- <batchcount>: Number of objects that are created or released at once (when creating or releasing a slab)
The first part of the table shows the caches for specific objects already mentioned; the second part shows the caches of general sizes, each in a variant for DMA-capable memory and one for ordinary memory.
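As a small worked example of reading this table, the number of unused objects in a cache follows directly from the first two columns. The helper names below are made up for illustration:

```c
/* Objects that are allocated in slabs but currently not in use. */
unsigned int unused_objs(unsigned int num_objs, unsigned int active_objs)
{
    return num_objs - active_objs;
}

/* Fraction of a cache's objects that are unused
 * (the external fragmentation of that cache). */
double unused_fraction(unsigned int num_objs, unsigned int active_objs)
{
    return num_objs ? (double)(num_objs - active_objs) / num_objs : 0.0;
}
```

For the size-64 row above, 1403 - 856 = 547 objects are reserved but unused, about 39% of that cache.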
Fragmentation
Fragmentation can be divided into two types: internal and external fragmentation.
Internal fragmentation
Bonwick defines internal fragmentation as "per-buffer wasted space", i.e. the space that is wasted within each slab. Unused gaps arise from:
- Rounding the object size up to an integer multiple of the word size (sizeof(void *)) of the processor used. Aligned addresses speed up access on most architectures (the processor offers optimized instructions for them), but cost memory.
- Waste at the end of the slab, because the objects only rarely add up to exactly its size.
The waste at the end is smaller than one object size, so it can be influenced via the slab size: the larger the slab, the more objects it holds and the smaller the relative waste. This offcut is also used for so-called coloring, which is discussed below.
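The arithmetic behind both sources of waste can be sketched as follows, assuming one 4096-byte page per slab and ignoring the management header that the kernel also places inside the slab:

```c
#include <stddef.h>

/* Round an object size up to a multiple of the processor word size. */
size_t align_obj(size_t size)
{
    size_t word = sizeof(void *);
    return (size + word - 1) / word * word;
}

/* Number of (aligned) objects that fit into one slab. */
unsigned int objs_per_slab(size_t slab_size, size_t objsize)
{
    return (unsigned int)(slab_size / align_obj(objsize));
}

/* Waste ("scrap") left over at the end of the slab. */
size_t slab_waste(size_t slab_size, size_t objsize)
{
    return slab_size % align_obj(objsize);
}
```

With 128-byte objects, one page holds 32 objects with no scrap by this count; the table above shows only 31 per slab because the slab header and the free-object list also live inside the page.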
External fragmentation
Bonwick defines external fragmentation as "unused buffers in the freelist", i.e. unused objects in the slabs. As the table above shows, external fragmentation does occur with this method, but it is comparatively small.
Because objects of the same type have similar lifetimes, there are few only partially used slabs: most slabs are either fully occupied or completely free. The exception are the caches of general sizes, but these are used less often.
Furthermore, unlike other methods (e.g. the buddy method), the memory is not subdivided further. The more objects a slab contains, the higher the external fragmentation, since the probability of a slab becoming completely empty decreases.
Coloring
Coloring tries to make better use of the CPU caches by aligning identical objects differently in different slabs.
To do this, part of the scrap at the end of the slab is moved in front of the first object, so that the objects are offset relative to those in another slab. By cleverly aligning with multiples of the cache line size, mutual eviction of the objects from the cache and of their addresses from the TLB can be reduced.
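A minimal sketch of such an offset calculation: each new slab of a cache cycles through the available offsets in cache-line-sized steps, all of which fit within the scrap. The function name and interface are made up; Linux computes its colour values in a similar round-robin fashion.

```c
/* Colour offset (in bytes) for the n-th slab of a cache.
 * scrap: unused bytes at the end of the slab,
 * cache_line: cache line size in bytes (e.g. 64). */
unsigned int color_offset(unsigned int slab_index,
                          unsigned int scrap,
                          unsigned int cache_line)
{
    /* Number of distinct offsets: 0, cache_line, 2*cache_line, ...
     * up to the largest multiple that still fits into the scrap. */
    unsigned int colors = scrap / cache_line + 1;
    return (slab_index % colors) * cache_line;
}
```

With 200 bytes of scrap and 64-byte cache lines, successive slabs start their objects at offsets 0, 64, 128, 192, then wrap around to 0 again.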
Sequence of a memory request
The typical sequence of a memory request follows from the above:
- An attempt is made to take an object from the per-CPU cache.
- If this fails, an object from an already existing slab is handed out.
- If there are no free objects left in the slabs, a new slab must be created, taking the detour via the underlying buddy system.
The negative impact on the caches, and thus on performance, increases from step to step.
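The three steps can be sketched as a fall-through chain. The types and helpers below are simplified stand-ins, not the kernel API; only the cheapest step, the per-CPU cache, is shown in full, while the slower paths are stubbed out:

```c
#include <stddef.h>

/* Hypothetical, simplified per-CPU cache of free object pointers. */
struct per_cpu_cache {
    unsigned int avail;   /* number of cached pointers */
    void *entries[8];     /* most recently freed at the top */
};

/* Step 1: take the most recently freed object (likely still in L1/TLB). */
void *from_cpu_cache(struct per_cpu_cache *ac)
{
    return ac->avail ? ac->entries[--ac->avail] : NULL;
}

/* Step 2 would take an object from an existing partial/free slab;
 * step 3 would grow the cache via the buddy system. Stubbed here. */
void *from_slab(void)       { return NULL; }
void *grow_and_alloc(void)  { return NULL; }

void *cache_alloc(struct per_cpu_cache *ac)
{
    void *obj = from_cpu_cache(ac);   /* step 1: per-CPU cache   */
    if (!obj) obj = from_slab();      /* step 2: existing slab   */
    if (!obj) obj = grow_and_alloc(); /* step 3: new slab (buddy) */
    return obj;
}
```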
Slabs
Construction
A slab consists of
- a management header that contains information such as a pointer to the first object in the slab, the index of the first free object, the number of occupied objects, the coloring offset (color space) of this slab, and links to the previous and next slab in the list of free / occupied / partially used slabs,
- an administration structure for the free objects,
- the objects, rounded up to an integer multiple of the word size of the processor,
- the remaining waste, if any.
The management of free objects
In the Linux kernel, the free objects are managed with an array of integers that contains as many entries as the slab has objects. An entry is only meaningful while the corresponding object is unused; it then holds the index of the next free object. Starting from the index of the first free object stored in the slab header, the next free object can thus be determined and handed out quickly.
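This index chain can be sketched with a hypothetical miniature slab. In the kernel the array entries are of type kmem_bufctl_t; here they are plain unsigned int, with a sentinel marking the end of the chain:

```c
#define SLAB_OBJS    4
#define FREELIST_END ((unsigned int)-1)   /* end-of-chain sentinel */

struct mini_slab {
    unsigned int free;             /* index of the first free object */
    unsigned int next[SLAB_OBJS];  /* next[i]: index of the next free
                                      object, valid only while i is free */
};

/* Initially all objects are free: 0 -> 1 -> 2 -> 3 -> end. */
void slab_init(struct mini_slab *s)
{
    for (unsigned int i = 0; i < SLAB_OBJS; i++)
        s->next[i] = i + 1;
    s->next[SLAB_OBJS - 1] = FREELIST_END;
    s->free = 0;
}

/* Reserve: take the first free index, or FREELIST_END if full. */
unsigned int slab_take(struct mini_slab *s)
{
    unsigned int i = s->free;
    if (i != FREELIST_END)
        s->free = s->next[i];
    return i;
}

/* Release: push object index i back onto the head of the chain. */
void slab_put(struct mini_slab *s, unsigned int i)
{
    s->next[i] = s->free;
    s->free = i;
}
```

Note that releasing pushes the index onto the head of the chain, so the most recently freed object is handed out again first.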
Literature
- Wolfgang Mauerer: Linux kernel architecture. Hanser, 2004. ISBN 3-446-22566-8
- Andrew S. Tanenbaum: Modern Operating Systems. Prentice Hall, 2001. English, ISBN 0-13-092641-8
Web links
- Bonwick's description of the slab allocator, 1994 ( PostScript )
- Bonwick's description of the slab allocator, 2001 ( PDF ; 131 kB)
- Lecture on the Linux slab allocator by Sven Krohlas, University of Karlsruhe, WS 2004/2005