Sparse file
A sparse file is a file that can be stored compactly in a file system because it contains less data than its nominal file size indicates: it contains sections with undefined content. In a sparse file, areas that already hold written data alternate with areas that have not yet been written to. No space needs to be allocated in the file system for these unwritten areas.
Basics
It is a space-saving form of storage for files that contain (large) areas with undefined content. This type of storage comes from the world of inode-based file systems. In general, the file system specifies that these undefined areas are returned as a sequence of null bytes on read access. Modern file systems of Unix-like operating systems, NTFS, and APFS support this feature.
In a sparse file, only those parts to which data has actually been written are stored on the backing storage. Such files can arise when blocks are written at different positions within the file so that the blocks do not adjoin one another. This leaves areas in between that have no defined content. For example, a file with a nominal length of 100 GiB effectively occupies only one logical block in the file system if data has been written at only one position in the file.
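How such a file arises can be sketched in Python. This is a minimal illustration, assuming a POSIX-like system; the function names `write_at_offset` and `read_range` and any file names are hypothetical. Whether the skipped area is really left unallocated depends on the file system, but on any system it reads back as null bytes.

```python
import os

def write_at_offset(path, offset, payload):
    """Create a file and write payload at offset; nothing before offset is written."""
    with open(path, "wb") as f:
        f.seek(offset)    # move the write position past the unwritten area
        f.write(payload)  # only this region needs backing storage
    return os.stat(path).st_size  # logical size includes the unwritten area

def read_range(path, offset, length):
    """Read length bytes starting at offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Calling `write_at_offset("example.bin", 1 << 30, b"x")` would give a file with a logical size of 1 GiB plus one byte, while `read_range("example.bin", 0, 16)` returns sixteen null bytes from the unwritten area.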
Not all file systems and operating systems support sparse files that end with an undefined area.
This form of storage can be very useful for some kinds of binary databases, as well as for mapping partitions into a file (e.g. for virtualization).
Problems in using it
Sparse files can cause problems when copying. When copying to a file system that does not support sparse files, the target file cannot be created in compact form. The undefined areas of the source file are therefore written to the target file as null bytes, which produces a significantly larger target file, and the target file system must have sufficient capacity for it. In addition, both the copy or backup program and the target's operating system must support sparse files and be able to recognize them as such.
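What "recognizing" a sparse file can mean for a copy program is sketched below. This is a minimal illustration, not how any particular copy or backup tool works. It assumes a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (for example Linux) and a target file system that supports holes; the function name `copy_sparse` is hypothetical.

```python
import errno
import os

def copy_sparse(src_path, dst_path, chunk=1 << 20):
    """Copy only the data regions of src_path; holes are skipped, not written."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        offset = 0
        while offset < size:
            try:
                # Start of the next region that contains data.
                start = os.lseek(src, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            # Start of the hole that follows (end of file counts as a hole).
            end = os.lseek(src, start, os.SEEK_HOLE)
            pos = start
            while pos < end:
                buf = os.pread(src, min(chunk, end - pos), pos)
                if not buf:
                    break
                pos += os.pwrite(dst, buf, pos)  # advance by bytes actually written
            offset = end
        # Restore the logical length so that a trailing hole is preserved.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)
    return size
```

On a file system that does not report holes, the whole file is treated as one data region, so the copy is still correct but no longer compact.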
The operating system generally does not create sparse files automatically when files are saved; the application that creates or uses the files must call dedicated operating system routines for this.
Due to their inherent fragmentation, reading a sparse file sequentially can be slower.
NTFS Sparse
In contrast to Unix-based file systems, the Windows file system NTFS, from version 3 onwards, has a special file attribute which causes the Windows input/output subsystem not to allocate any space on the volume for contiguous areas of a file that consist only of zero values.
Both normal and compressed data can be treated as sparse files by NTFS. Under Windows Server 2003 and Windows XP, a file that has been declared as a sparse file can no longer be converted by NTFS into a normal file. With later Windows versions this is only possible if there are no more holes.
The problems mentioned for Unix-based file systems exist in principle in the same way with NTFS, although the file attribute ensures that programs written according to the general programming guidelines can transparently copy sparse files without losing the sparse property.
Handling of sparse files under Unix and similar systems
Generation of sparse files
Sparse files can be created, for example, with the Unix command dd:
dd if=/dev/zero of=sparsefile bs=1 count=1 seek=9999999
This example command creates a sparse file of 10,000,000 bytes (10 MB): the seek operand moves the write position to offset 9999999, and dd then writes a single byte.
Creating sparse files that end in a "hole" is only possible indirectly with some implementations of dd. To do this, a file must first be created that ends with written data, as in the example above. The last data part of the file can then be removed using the system call truncate() or ftruncate(). This is the case for Solaris, for example. On Linux it is enough to set count=0 to prevent data from being written after the "hole". With count=0, dd on Linux only performs an ftruncate() without a write operation, which creates a sparse file that contains nothing but zero bytes.
With GNU dd, an identical file can also be created with the following shortened call:
dd of=sparsefile bs=1 count=0 seek=10000000
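A file that consists only of a hole can also be produced directly with ftruncate(). The following Python sketch assumes a POSIX-like system; the function name `create_hole_only_file` is hypothetical, and whether the file actually occupies no data blocks depends on the file system.

```python
import os

def create_hole_only_file(path, size):
    """Create a file of the given logical size without writing any data."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, size)  # extends the file; the new range reads as zeros
    finally:
        os.close(fd)
    return os.stat(path).st_size
```

For example, `create_hole_only_file("sparsefile", 10000000)` yields a file with the same logical size as the dd calls above.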
Detection of sparse files
Sparse files have a different logical and physical file size. While the logical file size also includes the zero bytes, the physical file size describes the space that the file actually requires in the file system.
The -s option of the Unix command ls additionally shows the physical file size, but in blocks. With -k, the block counts are reported in units of 1024 bytes; with -h, both sizes are displayed in human-readable format:
ls -lhs sparse-file
ls -lks sparse-file
Alternatively, the Unix command du can be used to display the physical file size, by default also in blocks. With GNU du, the option --block-size 1 shows the physical size in bytes, while --bytes shows the logical size in bytes:
du --block-size 1 sparse-file
du --bytes sparse-file
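A program can make the same comparison itself. The sketch below assumes a Unix-like system where `st_blocks` is available and counted in 512-byte units, as on Linux; the function names are hypothetical. The check is only a heuristic, because some file systems also report fewer blocks than the logical size for other reasons, for example when small files are stored inline or compressed.

```python
import os

def file_sizes(path):
    """Return (logical_size, physical_size) in bytes."""
    st = os.stat(path)
    logical = st.st_size           # includes holes
    physical = st.st_blocks * 512  # assumption: st_blocks counts 512-byte units
    return logical, physical

def looks_sparse(path):
    """Heuristic: fewer bytes are allocated than the logical size suggests."""
    logical, physical = file_sizes(path)
    return physical < logical
```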
Application example
In the following, a 10 MB sparse file is created and compared with a 3 MB file. In the ls listing the sparse file looks like an ordinary file; only a simple call to du reveals that it is a sparse file, which requires only 10 blocks on the hard disk.
> dd if=/dev/zero of=sparsefile bs=1 count=0 seek=10M
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.9615e-05 s, 0.0 kB/s
> dd if=/dev/urandom of=normalfile bs=1M count=3
3+0 records in
3+0 records out
3145728 bytes (3.1 MB) copied, 1.71034 s, 1.8 MB/s
> ls -lh
total 3.1M
-rw-r--r-- 1 sven users 3.0M May 18 03:08 normalfile
-rw-r--r-- 1 sven users 10M May 18 03:06 sparsefile
> du *
3075 normalfile
10 sparsefile
Handling of sparse files under Microsoft Windows
Generation of sparse files
A file can be marked as a sparse file with the Windows command fsutil:
fsutil sparse setflag <filename>
This means that unwritten areas of the file will not be allocated on the volume in future write operations. The command can also be used to release existing areas of a file marked as a sparse file:
fsutil sparse setrange <filename> <offset in bytes> <length in bytes>
This will deallocate the specified area. It should be noted that only complete blocks whose length is a multiple of 64 KiB and whose starting position is a multiple of 64 KiB can be released.
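The alignment rule can be illustrated with a little arithmetic. The following Python sketch computes which part of a requested range consists of complete 64 KiB blocks; the function name `releasable_range` is hypothetical, and the sketch only models the rule stated above rather than the actual behavior of NTFS.

```python
GRANULARITY = 64 * 1024  # 64 KiB, as stated above

def releasable_range(offset, length, granularity=GRANULARITY):
    """Return (start, length) of the complete blocks inside the range, or None."""
    start = -(-offset // granularity) * granularity       # round start up
    end = (offset + length) // granularity * granularity  # round end down
    if end <= start:
        return None  # the range covers no complete block
    return start, end - start
```

For a request with offset 1000 and length 200000, only the two complete blocks starting at offset 65536 qualify, so the result is `(65536, 131072)`.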
The Windows API function DeviceIoControl with the control codes FSCTL_SET_SPARSE and FSCTL_SET_ZERO_DATA can be used to perform these operations programmatically. The latter code also works with files that are not sparse files; in that case the data areas are not released but filled with zero bytes.
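A programmatic call might look like the following Python sketch using ctypes. It is an untested illustration under stated assumptions: the control code values are derived with the CTL_CODE bit layout from the Windows headers, the input of FSCTL_SET_ZERO_DATA is taken to be two 64-bit integers (start offset and the first offset after the range), and the helper names `ctl_code`, `zero_data_buffer` and `set_sparse_and_zero` are hypothetical.

```python
import ctypes
import struct
import sys

def ctl_code(device_type, function, method, access):
    """Combine the four fields of a Windows I/O control code (CTL_CODE layout)."""
    return (device_type << 16) | (access << 14) | (function << 2) | method

FILE_DEVICE_FILE_SYSTEM = 0x9
METHOD_BUFFERED = 0
FILE_SPECIAL_ACCESS = 0
FILE_WRITE_DATA = 0x2

FSCTL_SET_SPARSE = ctl_code(FILE_DEVICE_FILE_SYSTEM, 49, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)
FSCTL_SET_ZERO_DATA = ctl_code(FILE_DEVICE_FILE_SYSTEM, 50, METHOD_BUFFERED, FILE_WRITE_DATA)

def zero_data_buffer(offset, length):
    """Pack the range as two little-endian int64: start and first offset beyond it."""
    return struct.pack("<qq", offset, offset + length)

def set_sparse_and_zero(path, offset, length):
    """Mark path as sparse, then zero (and thereby release) the given range."""
    if sys.platform != "win32":
        raise OSError("DeviceIoControl is only available on Windows")
    import msvcrt
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.DeviceIoControl.argtypes = [
        wintypes.HANDLE, wintypes.DWORD,
        wintypes.LPVOID, wintypes.DWORD,   # input buffer and its size
        wintypes.LPVOID, wintypes.DWORD,   # output buffer and its size
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
    ]
    kernel32.DeviceIoControl.restype = wintypes.BOOL
    buf = zero_data_buffer(offset, length)
    requests = ((FSCTL_SET_SPARSE, None, 0), (FSCTL_SET_ZERO_DATA, buf, len(buf)))
    returned = wintypes.DWORD(0)
    with open(path, "r+b") as f:
        handle = msvcrt.get_osfhandle(f.fileno())
        for code, in_buf, in_len in requests:
            ok = kernel32.DeviceIoControl(handle, code, in_buf, in_len,
                                          None, 0, ctypes.byref(returned), None)
            if not ok:
                raise ctypes.WinError(ctypes.get_last_error())
```

The sketch sends FSCTL_SET_SPARSE without an input buffer, which is assumed here to set the sparse attribute; error handling is reduced to raising the last Windows error.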
Detection of sparse files
The fsutil command can also be used to determine whether a file is a sparse file:
fsutil sparse queryflag <filename>
To list the areas actually allocated, the command is called as follows:
fsutil sparse queryrange <filename>
Generation of sparse files with MSSQL
From version 2005 onwards, MSSQL can create sparse files in the form of database snapshots. The following SQL statements create a sparse file with a size of 2 gigabytes under the name C:\UnCompressed\Dummy_Snap.mdf:
CREATE DATABASE [Dummy] ON PRIMARY
    (NAME=N'Dummy', FILENAME=N'C:\UnCompressed\Dummy.mdf', SIZE=2097152KB)
LOG ON
    (NAME=N'Dummy_log', FILENAME=N'C:\UnCompressed\Dummy_log.ldf')
GO
CREATE DATABASE [Dummy_Snap] ON PRIMARY
    (NAME=N'Dummy', FILENAME=N'C:\UnCompressed\Dummy_Snap.mdf')
AS SNAPSHOT OF [Dummy]
Literature
- Dominic Giampaolo: Practical File System Design with the Be File System. (PDF) Morgan Kaufmann, 1999, ISBN 1-55860-497-9.
Web links
- SEEK_HOLE or FIEMAP? - Technical article on discovering holes in sparse files
- Fsutil: sparse. Microsoft Developer Network, Microsoft Windows Server Center: explanation of the command-line commands for generating sparse files under Windows
- Understanding the size of sparse files in database snapshots. Microsoft Developer Network: technical article on the importance of sparse files in database backups
- mkfile - Sun documentation: creating sparse files under Solaris (English)
References
- FSCTL_SET_SPARSE control code (Windows). Microsoft, accessed January 17, 2013.