Network block device

from Wikipedia, the free encyclopedia

A network block device (abbreviated NBD) is a kind of virtual hard drive that a computer can access over the Internet Protocol. The NBD is provided by an NBD server, which offers one of its own hard drives, a hard drive partition, or a file as an NBD to certain other computers (clients). Another computer (or the same computer) can connect to the NBD server over a TCP connection and then use the NBD like its own local hard drive.

There is currently a full NBD implementation only for Linux. Linux addresses all mass storage devices as so-called block devices. If a Linux computer is to use a network block device, NBD support must be enabled in the Linux kernel configuration or the kernel module nbd.ko must be loaded. A user-space utility called nbd-client then establishes the TCP connection to the NBD server, hands the established connection over to the kernel, and exits. This has the advantage that the kernel does not have to deal with establishing the connection (or with possible authentication, etc.).

The NBD server is independent of the operating system. It can also run on a non-Linux system, since no Linux-specific functions are required. There is a program called nbd-server that does nothing more than make a given file (or partition, etc.) available on a given TCP port.

In principle, a computer without hard drives can be operated with an NBD as its only mass storage device. However, since an external program (nbd-client) is required to establish the connection, this can only be implemented with concepts such as an initial ramdisk, a virtual file system that is held in RAM and stored in the kernel itself so that it is available immediately after booting.

Since the original version of NBD has some weaknesses (e.g. the limitation to 4 gigabytes per NBD), there are various extensions, some of which are referred to as "enhanced NBD". However, these are incompatible with the original NBD.

NBD protocol (from version 2.6)

The protocol is a binary protocol. All multibyte values are sent in network byte order.
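As a small illustration of network byte order (big-endian), the following sketch packs and unpacks a 64-bit value with Python's struct module. The value is the "IHAVEOPT" magic number used later in this article; the helper name decode_u64 is only illustrative.

```python
import struct

# ">" selects big-endian (network byte order); "Q" is an unsigned 64-bit integer.
value = 0x49484156454F5054
wire = struct.pack(">Q", value)


def decode_u64(data: bytes) -> int:
    """Decode one unsigned 64-bit integer sent in network byte order."""
    return struct.unpack(">Q", data)[0]
```

Because the most significant byte is sent first, the eight bytes on the wire spell out the ASCII text of the magic number.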

Handshake

First there is an initialization phase in which data is exchanged between the NBD server and the NBD client program. This protocol is independent of the NBD driver in the Linux kernel and varies with different NBD implementations.

Version ≤2.9.16

The old handshake protocol supports exactly one block device per port. As soon as a client has connected to the NBD server, the server sends the following data structure:

NBD initialization packet (server → client)
Offset  Data type  Name           Description
0       char[8]    INIT_PASSWD    Identification string {'N', 'B', 'D', 'M', 'A', 'G', 'I', 'C'}
8       uint64_t   cliserv_magic  Magic number 0x00420281861253
16      uint64_t   export_size    Size of the exported block device (in bytes)
24      uint32_t   flags          Flags:
  • Bit 0: Flags are present
  • Bit 1: Device is read-only
  • Bit 2: Device supports the "FLUSH" command for flushing write caches
  • Bit 3 and 4: unused
  • Bit 5: Device supports the TRIM command, with which the file system can tell the block layer which blocks have been released
28      char[124]  reserved       Reserved (currently filled with zero bytes)

If the client does not accept the identification string or the magic number, it closes the connection. Otherwise the connection is considered to have been successfully established.
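The client-side check described above can be sketched as follows. This is a minimal example that assumes every field is sent in network byte order, as the protocol description states; the function name parse_old_init and the FLAG_* constants are illustrative and not taken from the nbd sources.

```python
import struct

# Layout from the table above: char[8], uint64, uint64, uint32, char[124].
# ">" means network byte order with no padding between fields.
OLD_INIT = struct.Struct(">8sQQI124s")
INIT_PASSWD = b"NBDMAGIC"
OLD_CLISERV_MAGIC = 0x00420281861253

# Flag bits as listed in the table (illustrative names).
FLAG_READ_ONLY = 1 << 1
FLAG_FLUSH = 1 << 2
FLAG_TRIM = 1 << 5


def parse_old_init(packet: bytes) -> dict:
    """Validate and decode an old-style initialization packet."""
    if len(packet) != OLD_INIT.size:
        raise ValueError(f"expected {OLD_INIT.size} bytes, got {len(packet)}")
    passwd, magic, export_size, flags, _reserved = OLD_INIT.unpack(packet)
    if passwd != INIT_PASSWD or magic != OLD_CLISERV_MAGIC:
        # A real client would close the connection at this point.
        raise ValueError("identification string or magic number not accepted")
    return {
        "export_size": export_size,
        "read_only": bool(flags & FLAG_READ_ONLY),
        "flush": bool(flags & FLAG_FLUSH),
        "trim": bool(flags & FLAG_TRIM),
    }
```

The whole packet is 152 bytes long (8 + 8 + 8 + 4 + 124), so a client can read it with a single fixed-size receive before deciding whether to continue.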

Version ≥ 2.9.17

The new handshake protocol uses the IANA-registered port 10809 and a different message format that allows the server to offer several block devices on a single TCP port, from which the client can select one by name. In addition, the 32-bit flags field has been split into two 16-bit parts, which makes it possible to separate server-global flags from device-dependent flags.

Server Init packet (server → client)
Offset  Data type  Name           Description
0       char[8]    INIT_PASSWD    Identification string {'N', 'B', 'D', 'M', 'A', 'G', 'I', 'C'}
8       uint64_t   cliserv_magic  Magic number 0x49484156454F5054 (= "IHAVEOPT")
16      uint16_t   server_flags   Flags that apply to the entire server. The flags usually have the value 0003 hex. The individual bits mean:
  • Bit 0: NBD_FLAG_FIXED_NEWSTYLE: set if a certain handshake bug has been fixed in the server
  • Bit 1: NBD_FLAG_NO_ZEROES: set if the header message is not padded with 124 zero bytes

The client responds with its own flags, sent as a 32-bit value:

Client Init packet (client → server)
Offset  Data type  Name          Description
0       uint32_t   client_flags  So far the same meaning as server_flags; also usually 0000'0003 hex.
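The first exchange of the new handshake could be sketched like this. It assumes network byte order for both packets and assumes that the client simply echoes the server flags it understands; that echoing policy, like the function names, is an illustration and not something the description above prescribes.

```python
import struct

SERVER_INIT = struct.Struct(">8sQH")  # 8 + 8 + 2 = 18 bytes
INIT_PASSWD = b"NBDMAGIC"
IHAVEOPT = 0x49484156454F5054

NBD_FLAG_FIXED_NEWSTYLE = 1 << 0
NBD_FLAG_NO_ZEROES = 1 << 1


def parse_server_init(packet: bytes) -> int:
    """Return server_flags after checking both magic values."""
    passwd, magic, server_flags = SERVER_INIT.unpack(packet)
    if passwd != INIT_PASSWD or magic != IHAVEOPT:
        raise ValueError("not a new-style NBD server")
    return server_flags


def build_client_init(server_flags: int) -> bytes:
    """Build the 32-bit client_flags field (assumption: echo known bits)."""
    known = NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES
    return struct.pack(">I", server_flags & known)
```

With the usual server_flags value of 0003 hex, this produces the client_flags value 0000'0003 hex mentioned in the table.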

The client then sends one or more options, each of which the server either accepts or rejects:

Option packet (client → server)
Offset  Data type  Name           Description
0       uint64_t   cliserv_magic  Magic number 0x49484156454F5054 (= "IHAVEOPT")
8       uint32_t   option_number  Identification number / type of option
12      uint32_t   option_length  Length of the option data (in bytes)
16      variable   option_data    Option data (depending on option type)

So far, 3 options have been defined:

NBD options
Name                 Value  Meaning
NBD_OPT_EXPORT_NAME  1      Client selects the block device by name; the name follows in the option_data field. This option automatically ends the option negotiation. The server then sends the device-dependent part of the initialization (see below).
NBD_OPT_ABORT        2      Client wants to end the connection
NBD_OPT_LIST         3      Client wants a list of the names of the exported block devices
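Building an option packet from the layout and option numbers above might look like the following sketch. The helper name build_option is hypothetical, and the export name "alfa" in the usage note is just sample data.

```python
import struct

IHAVEOPT = 0x49484156454F5054
NBD_OPT_EXPORT_NAME = 1
NBD_OPT_ABORT = 2
NBD_OPT_LIST = 3


def build_option(option_number: int, option_data: bytes = b"") -> bytes:
    """Serialize one option packet: magic, option number, length, data."""
    header = struct.pack(">QII", IHAVEOPT, option_number, len(option_data))
    return header + option_data
```

A client that wants the list of exports would send build_option(NBD_OPT_LIST); one that selects a device by name would send, for example, build_option(NBD_OPT_EXPORT_NAME, b"alfa").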

The server responds to an option packet with a reply packet:

Reply packet (server → client)
Offset  Data type  Name           Description
0       uint64_t   reply_magic    Magic number 0x0003e889045565a9
8       uint32_t   option_number  Identification number / type of the option being answered
12      uint32_t   reply_type     Type of reply
16      uint32_t   reply_length   Length of the reply data
20      variable   reply_data     Reply data, if reply_length > 0

The following response types have been defined so far:

Reply types
Name                  Value          Meaning
NBD_REP_ACK           1              The server accepts the option or has no further reply data (with NBD_OPT_LIST)
NBD_REP_SERVER        2              Description of a block device. This is followed by the length of the name as a 32-bit number, the name and, if there is still space in the reply packet, any further descriptive details in plain text.
NBD_REP_ERR_UNSUP     8000 0001 hex  Client sent an unknown option
NBD_REP_ERR_POLICY    8000 0002 hex  The server understood the option but is not allowed to accept it (e.g. NBD_OPT_LIST can be allowed or forbidden in the configuration file)
NBD_REP_ERR_INVALID   8000 0003 hex  The server understood the option, but it was syntactically invalid
NBD_REP_ERR_PLATFORM  8000 0004 hex  The option is not supported by the platform on which the server is running (currently unused)

Figure: Hexdump of the data traffic in the initialization phase between NBD client and server. The client asks for the list of devices offered and the server lists two devices (named "alfa" and "bravo").
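The replies to NBD_OPT_LIST can be decoded along the following lines. This is a sketch over an in-memory byte buffer; the function names are illustrative, and decoding the export name as UTF-8 is an assumption, since the text above only speaks of a name and plain-text details.

```python
import struct

REPLY_HEADER = struct.Struct(">QIII")  # magic, option, type, length: 20 bytes
REPLY_MAGIC = 0x0003E889045565A9
NBD_REP_ACK = 1
NBD_REP_SERVER = 2


def parse_reply(buf: bytes):
    """Split one reply packet off the front of buf.

    Returns (option_number, reply_type, reply_data, remaining_bytes).
    """
    magic, option_number, reply_type, length = REPLY_HEADER.unpack_from(buf)
    if magic != REPLY_MAGIC:
        raise ValueError("bad reply magic")
    end = REPLY_HEADER.size + length
    if len(buf) < end:
        raise ValueError("truncated reply")
    return option_number, reply_type, buf[REPLY_HEADER.size:end], buf[end:]


def parse_server_entry(reply_data: bytes):
    """Decode NBD_REP_SERVER data: 32-bit name length, name, optional details."""
    (name_len,) = struct.unpack_from(">I", reply_data)
    name = reply_data[4:4 + name_len].decode("utf-8")  # encoding is assumed
    details = reply_data[4 + name_len:].decode("utf-8")
    return name, details


def list_exports(buf: bytes) -> list:
    """Collect export names until the server sends NBD_REP_ACK."""
    names = []
    while buf:
        _option, reply_type, data, buf = parse_reply(buf)
        if reply_type == NBD_REP_ACK:
            break
        if reply_type == NBD_REP_SERVER:
            names.append(parse_server_entry(data)[0])
        elif reply_type & 0x80000000:
            # All error reply types in the table have the top bit set.
            raise RuntimeError(f"server rejected option: {reply_type:#010x}")
    return names
```

For a server exporting "alfa" and "bravo", the stream consists of two NBD_REP_SERVER replies followed by one NBD_REP_ACK, and list_exports returns both names in order.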

The negotiation phase is completed as soon as the server has positively acknowledged the NBD_OPT_EXPORT_NAME option. It then sends the identification data of the exported block device to the client:

Device Init packet (server → client)
Offset  Data type     Name          Description
0       uint64_t      device_size   Size of the exported block device (in bytes)
8       uint16_t      device_flags  Flags that apply to the exported device:
  • Bit 0: Flags are present
  • Bit 1: Device is read-only
  • Bit 2: Server & device support the NBD_CMD_FLUSH command
  • Bit 3: Server & device support the NBD_CMD_FLAG_FUA flag
  • Bit 4: NBD_FLAG_ROTATIONAL: the exported data are on a rotating medium (classic hard disk), which the client can take into account in its block access pattern
  • Bit 5: NBD_FLAG_SEND_TRIM: server & device support the NBD_CMD_TRIM command
10      uint8_t[124]  padding       Unused, all 0
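Decoding the device-dependent part of the handshake could be sketched as follows. Only the first 10 bytes are interpreted, so the same code works whether or not the 124 padding bytes are present; the short feature names in FLAG_NAMES are illustrative labels for the bits in the table.

```python
import struct

DEVICE_INIT = struct.Struct(">QH")  # device_size and device_flags: 10 bytes
PADDING_LEN = 124

# Bit position -> illustrative feature name, following the table above.
FLAG_NAMES = {
    0: "has_flags",
    1: "read_only",
    2: "flush",
    3: "fua",
    4: "rotational",
    5: "trim",
}


def parse_device_init(packet: bytes):
    """Return (device_size, set of feature names) from a Device Init packet."""
    device_size, device_flags = DEVICE_INIT.unpack_from(packet)
    features = {
        name for bit, name in FLAG_NAMES.items() if device_flags & (1 << bit)
    }
    return device_size, features
```

A read-only export with TRIM support would, for example, yield the feature set {"has_flags", "read_only", "trim"}.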

Data phase

The NBD client forwards the information about the size of the block device, any flags and the open socket to the kernel via special system calls and terminates itself. The kernel then takes over the further communication via this socket.

The kernel on the client side now sends read and write requests to the server. These have the following packet structure:

NBD Request (client → server)
Offset  Data type  Name    Description
0       uint32_t   magic   Magic number 0x25609513
4       uint32_t   type    0: read access; 1: write access; 2: controlled end of connection; 3: flush cache; 4: TRIM command
8       char[8]    handle  8 bytes that are sent back unchanged in the reply so that the reply can be matched to its request
16      uint64_t   from    Offset (in bytes) from which to read / write
24      uint32_t   len     Length of the data block

In the case of write access, the data to be written follow immediately. The server replies to every request with a reply. This has the following structure:
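A request in this format can be assembled as in the sketch below. The command constants follow the type values in the table; the function names and the choice of raising ValueError for a handle of the wrong size are illustrative.

```python
import struct

REQUEST = struct.Struct(">II8sQI")  # magic, type, handle, from, len: 28 bytes
NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_DISC, NBD_CMD_FLUSH, NBD_CMD_TRIM = range(5)


def build_request(cmd: int, handle: bytes, offset: int, length: int) -> bytes:
    """Serialize the fixed 28-byte request header."""
    if len(handle) != 8:
        raise ValueError("handle must be exactly 8 bytes")
    return REQUEST.pack(NBD_REQUEST_MAGIC, cmd, handle, offset, length)


def build_write(handle: bytes, offset: int, data: bytes) -> bytes:
    """A write request is the header immediately followed by the data."""
    return build_request(NBD_CMD_WRITE, handle, offset, len(data)) + data
```

The handle is opaque to the server, so a client is free to use, for instance, a running counter packed into 8 bytes.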

NBD Reply (server → client)
Offset  Data type  Name    Description
0       uint32_t   magic   Magic number 0x67446698
4       uint32_t   error   0 = OK (no error occurred)
8       char[8]    handle  Copy of the handle from the associated request

In the case of responses to read requests, the requested data follow immediately.
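Matching a reply to its request by handle might be sketched as follows. The pending dictionary, which maps each outstanding handle to the number of data bytes expected after the reply header, is a hypothetical bookkeeping structure for this example and not part of the protocol.

```python
import struct

REPLY = struct.Struct(">II8s")  # magic, error, handle: 16 bytes
NBD_REPLY_MAGIC = 0x67446698


def parse_reply_header(header: bytes):
    """Return (error, handle) from a 16-byte reply header."""
    magic, error, handle = REPLY.unpack(header)
    if magic != NBD_REPLY_MAGIC:
        raise ValueError("bad reply magic")
    return error, handle


def match_reply(header: bytes, pending: dict):
    """Look up the request this reply belongs to.

    pending maps handle -> number of data bytes expected after the header
    (the read length for read requests, 0 otherwise).
    """
    error, handle = parse_reply_header(header)
    expected = pending.pop(handle)
    if error != 0:
        raise OSError(error, "NBD request failed")
    return handle, expected
```

After a successful match for a read request, the caller would read exactly the returned number of bytes from the socket to obtain the requested data.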

References

  1. From the source code of nbd-2.9.13 / cliserv.h
  2. From the source code of nbd-3.2 / proto.txt
  3. Sent in host byte order contrary to the protocol description. Value from client ignored when reading.
  4. From the header /usr/include/linux/nbd.h