Sparse file

from Wikipedia, the free encyclopedia

A sparse file is a file that can be stored compactly in a file system because it contains less data than its nominal file size indicates: it contains sections with undefined content. In a sparse file, areas that already hold written data alternate with areas that have not yet been written to. No space needs to be allocated in the file system for these unwritten areas.

Basics

Principle of a sparse file: undefined areas of the file do not need to be stored; instead, only information about their size is kept in the file's metadata

This is a space-saving form of storage for files that contain (large) areas with undefined content. This type of storage originated in inode-based file systems. In general, the file system specifies that these undefined areas are returned as a sequence of null bytes on read access. Modern file systems of Unix-like operating systems, NTFS, and APFS support this feature.

In a sparse file, only those parts to which data has actually been written are stored on the storage medium. Such files can arise when blocks are written at different, non-adjacent positions within the file. This leaves areas in between that have no defined content. For example, a file with a nominal length of 100 GiB effectively occupies only one logical block in the file system if data has been written at only one position in the file.
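The behaviour described above can be tried out with a short Python sketch. The helper name make_sparse_file, the offsets, and the file name are illustrative assumptions; whether the gap is really left unallocated depends on the file system in use.

```python
import os
import tempfile

def make_sparse_file(path, offsets, payload=b"x"):
    """Write `payload` at each offset; the gaps in between are never written."""
    with open(path, "wb") as f:
        for offset in offsets:
            f.seek(offset)      # move the write position past unwritten space
            f.write(payload)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparsefile")
    # Two single-byte writes, 64 MiB apart (arbitrary illustrative offsets).
    make_sparse_file(path, [0, 64 * 1024 * 1024])

    logical_size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(1024)
        hole_sample = f.read(16)    # read from the unwritten area

    # st_blocks counts 512-byte units on POSIX systems; it is absent on Windows.
    allocated = getattr(os.stat(path), "st_blocks", 0) * 512

print("logical size:", logical_size)
print("hole reads as zero bytes:", hole_sample == bytes(16))
print("allocated bytes:", allocated)
```

On a file system with sparse file support, the allocated size reported here would typically be far smaller than the logical size.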

Not all file systems and operating systems support sparse files that end with an undefined area.

This form of storage can be very useful for some kinds of binary databases, as well as for mapping partitions into a file (e.g. for virtualization).

Problems in use

Sparse files can cause problems when copying. If the target file system does not support sparse files, the target file cannot be created in compact form. The undefined areas of the source file are then written to the target file as null bytes, which results in a significantly larger target file, so the target file system must have sufficient capacity. To preserve the compact form, both the copy or backup program and the operating system of the target must support sparse files and be able to recognize them as such.

The operating system generally does not create sparse files automatically when files are saved; the application that creates or uses the files must call dedicated operating system routines for this.

Because of their inherent fragmentation, sequential reading of a sparse file can be slower.
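What a sparse-aware copy has to do can be illustrated with a minimal Python sketch. The function copy_preserving_holes and the chunk size are assumptions made for this illustration, not the method of any particular copy or backup program. The sketch also cannot tell real holes from written zero bytes, so it leaves a gap for every all-zero chunk.

```python
import os
import tempfile

def copy_preserving_holes(src, dst, chunk_size=64 * 1024):
    """Copy src to dst, seeking over all-zero chunks instead of writing them."""
    zero_chunk = bytes(chunk_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            if chunk == zero_chunk[:len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a gap in the target
            else:
                fout.write(chunk)
        # Set the final length explicitly in case the source ends with zeros.
        fout.truncate(fin.tell())

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source")
    dst = os.path.join(tmp, "target")
    with open(src, "wb") as f:
        f.write(b"head")
        f.seek(1024 * 1024)
        f.write(b"tail")
        f.truncate(2 * 1024 * 1024)     # the source ends in a hole
    copy_preserving_holes(src, dst)

    same_size = os.path.getsize(src) == os.path.getsize(dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same_content = a.read() == b.read()

print(same_size, same_content)
```

A program that simply reads and writes every byte would produce the same content, but with all zero areas fully allocated in the target.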

NTFS Sparse

In contrast to Unix-based file systems, the Windows file system NTFS, from version 3 onwards, has a special file attribute that causes the input/output subsystem not to use any space on the storage medium for contiguous areas of a file that consist only of zero values.

Both normal and compressed data can be treated as sparse files by NTFS. Under Windows Server 2003 and Windows XP, a file that has been marked as sparse can no longer be converted back into a normal file by NTFS. In later Windows versions this is possible only if the file no longer contains any holes.

The problems mentioned for Unix-based file systems exist in principle in the same way with NTFS, although the file attribute ensures that programs written according to the general programming guidelines can transparently copy sparse files without losing the sparse property.

Handling of sparse files under Unix and similar systems

Generation of sparse files

Sparse files can be created, for example, with the Unix command dd:

dd if=/dev/zero of=sparsefile bs=1 count=1 seek=9999999

This example command creates a 10-megabyte sparse file by using seek to move the write pointer to position 9999999 and then writing one byte.

With some implementations of dd, sparse files that end in a "hole" can only be created indirectly. To do this, a file must first be created that ends with written data, as in the example above. The last data part of the file can then be removed using the system call truncate() or ftruncate(). This is the case for Solaris, for example. On Linux it is enough to set count=0 to prevent data from being written after the "hole". With count=0 set, Linux performs only an ftruncate() without a write operation, which creates a sparse file that contains nothing but zero bytes.

With GNU dd, an identical file can also be created with the following shortened call:

dd of=sparsefile bs=1 count=0 seek=10000000
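The same result, a file that consists only of a "hole", can be sketched in Python by extending an empty file without writing any data. The helper name create_hole_only_file is an assumption made for this illustration; on POSIX systems the truncate() method of a Python file object maps to ftruncate().

```python
import os
import tempfile

def create_hole_only_file(path, size):
    """Create an empty file and extend it to `size` bytes without writing data."""
    with open(path, "wb") as f:
        f.truncate(size)    # extends the file; no data blocks are written

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparsefile")
    create_hole_only_file(path, 10_000_000)

    size = os.path.getsize(path)
    with open(path, "rb") as f:
        only_zeros = f.read() == bytes(size)

print(size, only_zeros)
```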

Detection of sparse files

Sparse files have a different logical and physical file size. While the logical file size also includes the zero bytes, the physical file size describes the space that the file actually requires in the file system.

The -s option of the Unix command ls additionally shows the physical file size, but in blocks. With -k, the block size used for this is 1024 bytes; with -h, both sizes are displayed in human-readable format:

 ls -lhs sparse-file
 ls -lks sparse-file

Alternatively, the Unix command du can be used to display the physical file size, by default also in blocks. The option --block-size 1 shows the physical size in bytes, while --bytes shows the logical size in bytes:

 du --block-size 1 sparse-file
 du --bytes sparse-file
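The comparison of logical and physical size can also be made programmatically. The following Python sketch assumes a POSIX system, where os.stat() reports st_blocks in units of 512 bytes; the function names file_sizes and looks_sparse are illustrative. The check is only a heuristic, because compression or inline storage of small files can also make the allocated size smaller than the logical size.

```python
import os
import tempfile

def file_sizes(path):
    """Return (logical size, allocated size) in bytes for a file."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512   # st_blocks is in 512-byte units

def looks_sparse(path):
    """Heuristic: fewer bytes are allocated than the logical size suggests."""
    logical, physical = file_sizes(path)
    return physical < logical

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse-file")
    with open(path, "wb") as f:
        f.truncate(10_000_000)      # a file consisting only of a hole

    logical, physical = file_sizes(path)
    sparse = looks_sparse(path)

print("logical:", logical, "physical:", physical, "looks sparse:", sparse)
```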

Application example

In the following, a 10 MB sparse file is created. When it is compared with a 3 MB file, only a simple call to du reveals that it is a sparse file, which requires only 10 blocks on the hard disk.

> dd if=/dev/zero of=sparsefile bs=1 count=0 seek=10M
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.9615e-05 s, 0.0 kB/s
> dd if=/dev/urandom of=normalfile bs=1M count=3
3+0 records in
3+0 records out
3145728 bytes (3.1 MB) copied, 1.71034 s, 1.8 MB/s
> ls -lh
total 3.1M
-rw-r--r-- 1 sven users 3.0M May 18 03:08 normalfile
-rw-r--r-- 1 sven users 10M May 18 03:06 sparsefile
> du *
3075 normalfile
10 sparsefile

Handling of sparse files under Microsoft Windows

Generation of sparse files

A file can be marked as a sparse file with the Windows command fsutil:

fsutil sparse setflag <filename>

This means that unwritten areas of the file will not be allocated on the volume in future write operations. The command can also be used to release existing areas of a file marked as a sparse file:

fsutil sparse setrange <filename> <offset in bytes> <length in bytes>

This will deallocate the specified area. Note that only complete blocks can be released, that is, blocks whose length is a multiple of 64 KiB and whose starting position is a multiple of 64 KiB.
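The alignment rule can be made concrete with a little arithmetic. The following Python sketch computes which part of a requested range consists of complete 64 KiB blocks, assuming the granularity stated above. The function releasable_range is a hypothetical helper for illustration only and is not part of fsutil or the Windows API.

```python
GRANULARITY = 64 * 1024     # release granularity as stated in the text

def releasable_range(offset, length, granularity=GRANULARITY):
    """Return (start, length) of the complete aligned blocks inside the range,
    or None if the range does not cover a single complete block."""
    end = offset + length
    start = -(-offset // granularity) * granularity     # round start up
    stop = (end // granularity) * granularity           # round end down
    if stop <= start:
        return None
    return start, stop - start

print(releasable_range(0, 131072))      # (0, 131072)
print(releasable_range(1000, 200000))   # (65536, 131072)
print(releasable_range(100, 1000))      # None
```

In the second call, only the two complete blocks between 65536 and 196608 lie fully inside the requested range; the partial blocks at both ends remain allocated under this rule.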

The function DeviceIoControl with the control codes FSCTL_SET_SPARSE and FSCTL_SET_ZERO_DATA can be used to perform these operations programmatically. The latter code also works with files that are not sparse files; in that case the data areas are not released but filled with zero bytes.

Detection of sparse files

The fsutil command can also be used to determine whether a file is a sparse file:

fsutil sparse queryflag <filename>

To list the areas actually allocated, the command is called as follows:

fsutil sparse queryrange <filename>

Generation of sparse files with MSSQL

MSSQL from version 2005 onwards can create sparse files in the form of a database snapshot. The following SQL statements create a sparse file with a size of 2 gigabytes under the name C:\UnCompressed\Dummy_Snap.mdf:

 CREATE DATABASE [Dummy]
 ON PRIMARY (NAME=N'Dummy',FILENAME=N'C:\UnCompressed\Dummy.mdf',SIZE=2097152KB)
 LOG ON  (NAME=N'Dummy_log',FILENAME=N'C:\UnCompressed\Dummy_log.ldf')
 GO
 CREATE DATABASE [Dummy_Snap]
 ON PRIMARY (NAME=N'Dummy',FILENAME=N'C:\UnCompressed\Dummy_Snap.mdf')
 AS SNAPSHOT OF [Dummy]
