Fragmentation (file system)

from Wikipedia, the free encyclopedia
[Figure: Visualization of the fragmentation and the subsequent defragmentation process]

In file systems, fragmentation refers to the scattered storage of logically related data blocks of the file system on a disk; it can be regarded as a special case of general memory fragmentation. On storage media with sequential access, such as hard drives, fragmentation can lead to a noticeable slowdown of read and write operations.

Defragmentation is the reorganization of fragmented data blocks on the storage medium, carried out with special programs, so that logically related data blocks are stored as sequentially as possible on the disk. This can speed up sequential access and thus increase the operating speed of the entire system.

Emergence

A data carrier (on which reading and writing is permitted) is not a static structure; read, write and delete operations take place continuously. Some operations free up storage space, others require new storage space. This constant release and re-allocation of memory blocks means that, on the one hand, the unused memory blocks become scattered across the data carrier (free space fragmentation) and, on the other hand, the logically related data blocks of files and metafiles are no longer stored one after the other (file fragmentation). Another cause of fragmentation is the extension of existing files.

In simple terms: when the operating system stores a file on the storage medium, the file may end up not contiguous but scattered across the data carrier. Some file systems do not check whether the data can be stored contiguously within the free storage space, but simply begin writing in the first free area. Often, however, this area is not large enough to hold the entire file. In that case, the file system fills the first free area and stores the remaining part of the file in the next free area.

The following diagram illustrates this process:

Process                  | Memory blocks 1–6
file.odt is created      | file.odt       | free
film.avi is created      | file.odt       | film.avi | free
file.odt grows           | file.odt (1/2) | film.avi | file.odt (2/2) | free
after defragmentation    | file.odt       | film.avi | free
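The first-fit behaviour in the table can be sketched in a few lines of Python. This is a toy model for illustration only, not any real file system's allocator; the file names and block counts are taken from the table above.

```python
# Toy model of the table above: a disk of six blocks and an allocator
# that simply writes into the first free blocks it finds (first fit).

def allocate(disk, name, n_blocks):
    """Write n_blocks for `name` into the first free blocks found."""
    written = 0
    for i, block in enumerate(disk):
        if block is None and written < n_blocks:
            disk[i] = name
            written += 1
    return written == n_blocks

disk = [None] * 6
allocate(disk, "file.odt", 2)   # file.odt is created
allocate(disk, "film.avi", 2)   # film.avi is created
allocate(disk, "file.odt", 1)   # file.odt grows

# file.odt is now split around film.avi:
print(disk)
# ['file.odt', 'file.odt', 'film.avi', 'film.avi', 'file.odt', None]
```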

Types

There are several types of fragmentation (the term usually refers to intrafile fragmentation):

  • Intrafile fragmentation is the scattering of a single file's data across the data carrier, so that small additional delays can occur when reading the file sequentially.
  • Interfile fragmentation is the scattering of files and metafiles that are usually read together. It matters particularly when many small files are read in a correlated manner (for example, directories with many small images).
  • Metafile fragmentation is the fragmentation of directories, block allocation tables and similar meta information.
  • Fragmentation of free space; it is not in itself a disadvantage for accessing files, but it is one of the causes of file fragmentation (when new files are created).

Effects

When reading and writing fragmented data, there may be a loss of speed, depending on the nature of the storage medium. With modern file systems (ReiserFS, XFS, NTFS) the amount of metadata describing where the data is located on the storage medium also grows.

The negative effect of fragmentation can be illustrated particularly well with a hard disk: when reading a fragmented file that is stored scattered all over the magnetic disk, the read head has to be repositioned very often, at short intervals and over long distances. This causes many small delays in the range of a few milliseconds. Depending on the size of the file, the degree of fragmentation and the access time of the hard disk, these delays can add up to a noticeable slowdown of the entire read process.
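The magnitude of these delays can be estimated with simple arithmetic. The figures below are illustrative assumptions, not measurements:

```python
# Rough estimate: each fragment beyond the first costs roughly one head
# repositioning. Fragment count and seek time are assumed values.

fragments = 500        # number of pieces the file is split into
seek_time_ms = 9.0     # assumed average repositioning delay per fragment

extra_delay_s = (fragments - 1) * seek_time_ms / 1000
print(f"{extra_delay_s:.1f} s of added delay")  # 4.5 s of added delay
```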

In contrast, there are storage media without moving parts, which are therefore not affected by mechanically caused delays in read and write operations. These include data carriers based on memory chips (primarily flash memory), such as USB sticks, memory cards or SSDs. Hybrid hard disks, which use both magnetic disks and memory chips for permanent storage of data, are only partially affected by this problem.

However, such storage media can also be slowed down by a fragmented file system. Because of the different internal organization of the storage, the effects of fragmentation vary greatly by device and manufacturer and cannot be compared to a hard drive. Depending on the model, the controller of the storage medium does not store the data on the memory chips in the physical order presented by the file system: since memory chips can only be written a limited number of times, frequent writes to the same physical address must be avoided by means of wear leveling. As a result, no reliable statement can be made about the effects of fragmentation on flash storage media. What is certain, however, is that performing a defragmentation shortens the service life of flash storage media, as they cannot be written to an unlimited number of times.

Avoidance and reduction

There are several strategies for reducing file system fragmentation. These strategies relate primarily to hard disks as the data carrier and may have little or no beneficial effect on other storage media.

For all types of file systems, leaving 5 to 20 percent of the space free reduces fragmentation. The fuller a file system is, the more likely fragmentation becomes, as the free areas grow smaller and smaller. As soon as a file to be stored is larger than the largest “gap”, contiguous storage is no longer possible and the file inevitably fragments.
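The "largest gap" condition can be stated precisely: a file can be stored contiguously only if it fits into the largest free region. A minimal sketch with made-up gap sizes:

```python
def fits_contiguously(file_blocks, free_gaps):
    """True if the file fits into the largest free region."""
    return file_blocks <= max(free_gaps, default=0)

free_gaps = [8, 3, 5]                     # sizes of free regions, in blocks
print(fits_contiguously(6, free_gaps))    # True: fits into the 8-block gap
print(fits_contiguously(10, free_gaps))   # False: the file must fragment
```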

Avoiding fragmentation

A tried and tested approach is to separate programs from user data. The file system itself makes no such distinction, but user data fragments continuously, whereas program files fragment only with each update. Such a separation therefore requires distributing the data over different partitions. Suitable tools exist for this, especially for relocating user data. As a result, defragmenting the partitions also takes less time.

Parameterizing the partitions

A few simple measures reduce fragmentation, depending on the volume of the various file classes:

  • Skilful distribution of the operating system, directory trees and user data over the hard drives using suitable software
  • Use a larger block size of the file system (but this sometimes wastes disk space)

Parameterization of the applications and the system services

Further measures relate to the parameterization of the applications:

  • Preallocation (blocks are reserved as a precaution, although they are not yet needed)
  • Deferring the assignment of memory blocks until they are actually needed (late allocation) instead of assigning them immediately (early allocation)
  • Multi-level allocation systems (dividing a hard disk into clusters, block groups, blocks).
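Preallocation, the first item above, can be sketched with a toy allocator that reserves a contiguous run of blocks before the data exists. This is a simplified illustration, not a real file system API (on Linux, real programs can request something similar with os.posix_fallocate):

```python
def preallocate(disk, name, reserve):
    """Reserve `reserve` contiguous free blocks for `name`; return start index."""
    run = 0
    for i, block in enumerate(disk):
        run = run + 1 if block is None else 0
        if run == reserve:
            start = i - reserve + 1
            for j in range(start, i + 1):
                disk[j] = name
            return start
    return None  # no contiguous run large enough

disk = [None] * 8
start = preallocate(disk, "log.txt", 4)   # reserve 4 blocks up front
print(start, disk[:4])   # 0 ['log.txt', 'log.txt', 'log.txt', 'log.txt']
```

Because the blocks are reserved in one piece, later growth of the file up to the reserved size cannot fragment it.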

Degree of fragmentation of a file system

There are several ways to specify the fragmentation of a file system:

  • Ratio of the read or write speed of the fragmented (real) file system to a non-fragmented (optimal) file system
  • Ratio of the number of fragmented files to the total number of files
  • Ratio of the space occupied by fragmented files to the total space occupied
  • Ratio of the space used by fragmented files to the total available space
  • Ratio of the number of contiguous blocks to the total number of occupied blocks
  • Ratio of the number of contiguous blocks to the total number of available blocks.
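Two of these ratios can be computed directly from per-file extent lists. The data below is invented for illustration:

```python
# Each file maps to its extents (start block, length); a file counts as
# fragmented if it occupies more than one extent.

files = {
    "a.txt": [(0, 4)],              # contiguous
    "b.avi": [(4, 2), (10, 3)],     # fragmented
    "c.odt": [(6, 1), (8, 1)],      # fragmented
}

def total_blocks(extents):
    return sum(length for _, length in extents)

fragmented = {name for name, ext in files.items() if len(ext) > 1}

ratio_files = len(fragmented) / len(files)                      # 2 of 3 files
ratio_space = (sum(total_blocks(files[n]) for n in fragmented)
               / sum(total_blocks(e) for e in files.values()))  # 7 of 11 blocks
print(round(ratio_files, 3), round(ratio_space, 3))
```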

The degrees of fragmentation determined by different methods are not comparable; not even values determined by the same method are always comparable, because the real effects also depend on the block size, the average file size, and the speed and internal storage organization of the medium.

In addition, the degree of fragmentation alone is not a reliable measure of the performance of a file system. On the one hand, this is because in some file systems the metadata can also become fragmented, but not in others (the MFT in NTFS, for example, is affected). On the other hand, different file systems use different mechanisms to mitigate the performance drop caused by fragmentation.

The extended file system family of Linux implements several mechanisms to reduce fragmentation. These include grouping blocks into block groups; files are then distributed as evenly as possible across the block groups, resulting in an even distribution of files (and free areas) on the storage medium. In contrast to FAT, files are placed in free areas that fit as loosely as possible, i.e. small files go into large gaps, so that the file system can react to size changes of the files concerned without fragmenting them. ext2 also tries to leave enough space behind each file so that small size changes do not immediately lead to fragmentation. This mechanism was further improved by extents in ext4. Because the data is distributed over the entire usable area, fragmentation matters less in server operation than when reading in "burst" mode. Modern hard disk schedulers arrange read and write requests so that seek times (e.g. of the disk's read/write head) are kept to a minimum.
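The "loose fit" placement idea can be sketched as a worst-fit search. This is a deliberate simplification for illustration, not the actual ext2 allocation code:

```python
def worst_fit(gap_sizes, size):
    """Index of the largest free gap that can hold `size` blocks, else None."""
    candidates = [i for i, g in enumerate(gap_sizes) if g >= size]
    if not candidates:
        return None
    return max(candidates, key=lambda i: gap_sizes[i])

gaps = [3, 12, 5]          # free-region sizes in blocks
print(worst_fit(gaps, 2))  # 1: a small file goes into the 12-block gap,
                           # leaving room for it to grow without fragmenting
```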

A further reduction in fragmentation is achieved, among others, by the XFS file system, which delays write operations (delayed write) and buffers them as completely as possible in RAM. For small files, the complete file size is therefore known to the file system before the file is written to the data carrier, which allows XFS to find an optimal location for it. If the file does not fit completely into RAM, the buffer is of course written to the data carrier beforehand.
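Delayed allocation can be sketched as buffering writes until the final size is known. This is a simplification of the idea, not XFS's implementation; the 4096-byte block size is an assumption:

```python
class DelayedFile:
    """Buffers writes in memory; allocates blocks only on flush."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.buffer = bytearray()
        self.allocated_blocks = None   # nothing allocated yet

    def write(self, data: bytes):
        self.buffer += data            # no on-disk allocation happens here

    def flush(self):
        # The final size is now known, so one contiguous allocation suffices.
        self.allocated_blocks = -(-len(self.buffer) // self.block_size)
        return self.allocated_blocks

f = DelayedFile()
f.write(b"x" * 5000)
f.write(b"y" * 3000)
print(f.flush())   # 2: 8000 bytes fit into two 4096-byte blocks
```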

When assessing the fragmentation of file systems, a distinction must be made between performance and throughput: performance is the maximum data rate that a single user process can achieve from the file system (in MB/s); throughput is the data rate that the file system can deliver across all users and processes (with several competing I/O requests). In multitasking or multiuser systems, the throughput is often much greater than the performance. The Linux operating system has algorithms that increase the throughput, not the performance, of the system. Severe fragmentation has a greater impact on performance; in systems with many competing requests to the file system, a (larger) file is not read in one piece anyway, since other requests are served at the same time.

Defragmentation

The available methods of defragmentation depend on the file system. Older file systems in particular have no built-in defragmentation methods, whereas defragmentation programs are available for modern file systems.

The following methods are common:

Copy the data to an empty partition
This method is more complex, but more reliable. Since each file is copied individually, all its parts are gathered from the source partition and saved in consecutive memory blocks on the destination partition. Alternatively, the files can be packed into an archive on the same partition; the original files are then deleted and the archive unpacked again. The disadvantage is the double storage requirement, in the form of an additional partition or free space on the partition. This method can be used with any file system.
Use of defragmentation programs
A distinction must be made between offline and online defragmentation. In the former case, the file system must not be mounted; it is therefore not possible, for example, to defragment the system partition of a running system. Online defragmentation typically does not have this restriction. An example of a file system that supports online defragmentation is XFS, which can be defragmented with the xfs_fsr program.
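The copy method described above can be scripted. The sketch below copies every file individually with Python's standard library; the mount-point paths in the comment are placeholders, and on a real system the target should be an empty volume:

```python
import pathlib
import shutil

def copy_tree(src: pathlib.Path, dst: pathlib.Path):
    """Copy every file individually so each lands contiguously on `dst`."""
    for path in src.rglob("*"):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)   # copies file contents and metadata

# Hypothetical mount points, for illustration only:
# copy_tree(pathlib.Path("/mnt/full"), pathlib.Path("/mnt/empty"))
```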

On Windows

New fragmentation continues to arise through the various autonomous system services, and every system update creates numerous new fragments. Defragmentation should therefore be repeated if system performance drops noticeably after updates or new installations of large application packages such as office suites.

MS-DOS-based operating systems (historical)

Under MS-DOS the file directory and the file contents were not stored separately, so defragmentation could only be achieved by writing the data to another location. The defrag.exe program was introduced in MS-DOS version 6 and was a licensed, limited version of Norton SpeedDisk. Like the full version of SpeedDisk, it could in certain cases lead to a significant increase in speed on the computers common at the time (386/486): after defragmentation, the computer no longer had to gather the individual fragments of a file from across the entire hard drive. On these older computers the effect could even be "followed" by the hard drive noises (clacking from one side to the other; afterwards, a continuous rattling).

Up to Windows Me (the last DOS-based Windows, see Windows 9x), defrag.exe under DOS allowed this process to be followed visually in detail. Blocks of files were first marked green for read access, then a free space was searched for on the hard drive and highlighted in red for writing. If a sufficiently large free space was not found within the segment being processed, the data was mostly moved to the end of the hard disk; after the segment had been cleared, the block was moved again.

Defragmentation programs from third-party providers (Norton, Digital Research) were also widespread under DOS (DISKOPT.EXE / DSKSPEED.EXE). What these programs have in common is that they optionally offer sorting by file name, size or date, although this has little or no effect on the efficiency of the file system. Depending on the option selected (e.g. sorting by modification date or by specific file name extensions), certain system files can be moved to the beginning of the partition so that this area is hardly affected by renewed fragmentation. This can slow future fragmentation, and system performance remains stable for longer.

NT-based operating systems

With the newer NTFS file system, the file directory and the file contents are written separately. Defragmentation therefore takes place in several phases, for the directories (folders) and for the files. A noticeable gain in time results when the file directories are defragmented; this is possible with products from third-party suppliers.

In 1995, Executive Software licensed the Windows NT source code and, in April, published the disk defragmenter Diskeeper for Windows NT 3.5 on that basis, later also for its successor Windows NT 3.51. However, this caused major problems as soon as the user installed a service pack, whereupon Microsoft decided to work with Executive Software to integrate a standardized API for defragmentation into Windows NT 4.0. Although this API was undocumented and Windows NT 4.0 did not ship with a defragmentation program, numerous third-party vendors published defragmentation programs for Windows NT.

Windows 2000 includes a defragmentation program from Microsoft. In contrast to Windows 9x, however, the program's exact progress is only shown for the hard disk as a whole. Individual files being processed are displayed in the status bar, but internally only a timer is used to show the progress. With Windows Vista, the graphic display was replaced by a percentage display.

Windows 7 defragments itself from time to time when the system load is low. According to measurements, this leads to a noticeable improvement in performance during start-up and operation.

On macOS

The file systems used by macOS, APFS and HFS+ (historically also HFS), are designed to search for the largest free memory block on the hard drive in which to save a file. Only when a file does not fit into the largest free block is it split up (fragmented), with the part not yet written saved in a further block.

Under Unix and Linux

It is sometimes claimed that Unix-based operating systems do not need defragmentation. This claim goes back to the paper A Fast File System for UNIX by Bill Joy, Samuel Leffler and Kirk McKusick. However, the statements made there refer only to UFS, which indeed does not noticeably lose speed even under extremely unfavorable use, and cannot be generalized to other file systems.

As mentioned above, countermeasures are implemented in the file systems used, but fragmentation still occurs in practice. Various defragmentation programs also exist for Unix and similar operating systems.

Tools

Under Windows NT operating systems from Windows 2000 onwards, the supplied defragmenter is available only to administrators; normal users are denied access. Compared to third-party products, the Diskeeper-based defragmenter has only a limited range of functions.

As of Windows 7, Windows' own defragmentation service runs in the background once a week by default. This can also be deactivated if the use of other programs is preferred.

In the freeware sector, Ultimate Defrag, MyDefrag and Defraggler, for example, are well known. They support defragmentation according to various criteria, which can save a considerable amount of time.

The file systems used under Linux and Unix variants mostly prevent fragmentation, so defragmentation is less important. However, such file systems are not immune to fragmentation. ext2 can be defragmented with defrag, XFS with xfs_fsr.

Other popular file systems such as ReiserFS and ext3 do not provide such defragmentation programs of their own; there, the only option for defragmenting is copying the data to an empty partition (see above). ext4 later gained the e4defrag tool for online defragmentation.

To defragment the HFS+ file system used by Apple in macOS, a defragmentation program runs in the background; it presents no output to the user and automatically pauses when the hard disk is accessed, much as Microsoft implemented with Windows Vista.
