Network block device
A Network Block Device (NBD) is a kind of virtual hard drive that a computer can access over the Internet Protocol. It is provided by an NBD server, which offers one of its own hard drives, a hard drive partition, or a file as an NBD to certain other computers (clients). Another computer (or the same computer) can connect to the NBD server over a TCP connection and then use the NBD like a local hard drive.
A complete NBD implementation currently exists only for Linux. Linux addresses all mass storage devices as block devices. For a Linux computer to use a network block device, NBD support must be enabled in the kernel configuration or the kernel module nbd.ko must be loaded. A user-space utility called nbd-client establishes the TCP connection to the NBD server, hands the established connection over to the kernel, and then exits. This has the advantage that the kernel does not have to deal with setting up the connection (or with any authentication and the like).
The NBD server is independent of the operating system. It can also run on a non-Linux system, since no Linux-specific functions are required. There is a program called nbd-server that does nothing more than make a given file (or partition, etc.) available on a given TCP port.
In principle, a computer without hard drives can be operated with an NBD as its only mass storage device. However, since an external program (nbd-client) is required to establish the connection, this can only be implemented with concepts such as an init ramdisk, a virtual file system that is held in RAM and stored in the kernel image itself so that it is available immediately after booting.
Since the original version of NBD has some weaknesses (e.g. the limitation to 4 gigabytes per NBD), there are various extensions, some of which are referred to as "enhanced NBD". However, these are incompatible with the original NBD.
NBD protocol (from version 2.6)
The protocol is a binary protocol. All multi-byte values are sent in network byte order (big-endian).
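As a small illustration of what network byte order means on the wire, the following sketch packs one 32-bit value with Python's `struct` module. The value is the request magic number from the data phase described later; the variable names are chosen only for this example.

```python
import struct

# ">" selects big-endian (network byte order) without padding;
# "<" selects little-endian and is shown only for contrast.
value = 0x25609513
wire = struct.pack(">I", value)
little = struct.pack("<I", value)

print(wire.hex())    # 25609513
print(little.hex())  # 13956025
```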
Handshake
First there is an initialization phase in which data is exchanged between the NBD server and the NBD client program. This protocol is independent of the NBD driver in the Linux kernel and varies with different NBD implementations.
Version ≤ 2.9.16
The old handshake protocol supports exactly one block device per port. As soon as a client has connected to the NBD server, the server sends the following data structure:
Offset | Data type | Name | Description |
---|---|---|---|
0 | char [8] | INIT_PASSWD | Identification string {'N', 'B', 'D', 'M', 'A', 'G', 'I', 'C'} |
8 | uint64_t | cliserv_magic | Magic number 0x00420281861253 |
16 | uint64_t | export_size | Size of the exported block device (in bytes) |
24 | uint32_t | flags | Flags |
28 | char [124] | reserved | Reserved (currently filled with zero bytes) |
If the client does not accept the identification string or the magic number, it closes the connection. Otherwise the connection is considered to have been successfully established.
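The old-style greeting can be decoded with a single fixed-size structure. The following is a minimal sketch, assuming the 152-byte layout given in the table; the function name `parse_old_handshake` is made up for this example and is not part of any NBD tool.

```python
import struct

# Layout from the table: char[8], uint64, uint64, uint32, char[124],
# all big-endian and without padding (152 bytes in total).
OLD_HANDSHAKE = struct.Struct(">8sQQI124s")
INIT_PASSWD = b"NBDMAGIC"
CLISERV_MAGIC = 0x00420281861253


def parse_old_handshake(data: bytes) -> tuple:
    """Return (export_size, flags); raise ValueError on a bad greeting."""
    if len(data) != OLD_HANDSHAKE.size:
        raise ValueError("old-style greeting must be 152 bytes")
    passwd, magic, export_size, flags, _reserved = OLD_HANDSHAKE.unpack(data)
    # A client that does not recognize either value closes the connection.
    if passwd != INIT_PASSWD or magic != CLISERV_MAGIC:
        raise ValueError("not an old-style NBD greeting")
    return export_size, flags
```

A client would read exactly `OLD_HANDSHAKE.size` bytes from the socket and pass them to this function before handing the connection to the kernel.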
Version ≥ 2.9.17
The new handshake protocol uses the IANA-registered port 10809 and a different message format that allows the server to offer several block devices on one TCP port, from which the client selects one by name. In addition, the 32-bit flags field has been split into two 16-bit parts, which makes it possible to separate server-global flags from device-dependent flags.
Offset | Data type | Name | Description |
---|---|---|---|
0 | char [8] | INIT_PASSWD | Identification string {'N', 'B', 'D', 'M', 'A', 'G', 'I', 'C'} |
8 | uint64_t | cliserv_magic | Magic number 0x49484156454F5054 (= "IHAVEOPT") |
16 | uint16_t | server_flags | Flags that apply to the entire server. The flags usually have the value 0x0003. |
The client responds with its flags. Since no flags have been defined so far, they only consist of 32 zero bits:
Offset | Data type | Name | Description |
---|---|---|---|
0 | uint32_t | client_flags | So far the same meaning as server_flags, also usually 0x00000003. |
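The first exchange of the new handshake can be sketched as follows. The code assumes only the two layouts given in the tables (an 18-byte server greeting and a 4-byte client answer). Because the text describes the client flags both as all zero and as usually 3, the value is left to the caller. The function names are illustrative.

```python
import struct

SERVER_GREETING = struct.Struct(">8sQH")  # INIT_PASSWD, cliserv_magic, server_flags
CLIENT_FLAGS = struct.Struct(">I")        # client_flags
INIT_PASSWD = b"NBDMAGIC"
IHAVEOPT = 0x49484156454F5054             # ASCII "IHAVEOPT" read as a 64-bit number


def parse_server_greeting(data: bytes) -> int:
    """Validate the 18-byte greeting and return the 16-bit server flags."""
    if len(data) != SERVER_GREETING.size:
        raise ValueError("new-style greeting must be 18 bytes")
    passwd, magic, server_flags = SERVER_GREETING.unpack(data)
    if passwd != INIT_PASSWD or magic != IHAVEOPT:
        raise ValueError("not a new-style NBD greeting")
    return server_flags


def build_client_flags(flags: int) -> bytes:
    """Encode the client's 32-bit flags word in network byte order."""
    return CLIENT_FLAGS.pack(flags)
```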
The client then sends one or more options, each of which the server answers by accepting or rejecting it:
Offset | Data type | Name | Description |
---|---|---|---|
0 | uint64_t | cliserv_magic | Magic number 0x49484156454F5054 (= "IHAVEOPT") |
8 | uint32_t | option_number | Identification number / type of the option |
12 | uint32_t | option_length | Length of the option data (in bytes) |
16 | variable | option_data | Option data (depending on the option type) |
So far, 3 options have been defined:
Name | Value | Meaning |
---|---|---|
NBD_OPT_EXPORT_NAME | 1 | Client selects the block device by name; the name follows in the option_data field. This option automatically ends the option negotiation. The server then sends the device-dependent part of the initialization (see below). |
NBD_OPT_ABORT | 2 | Client wants to end the connection |
NBD_OPT_LIST | 3 | Client wants a list of the names of the exported block devices |
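An option packet is a 16-byte header followed by the option data. The sketch below builds such packets from the layout and option numbers above. The helper name `build_option` and the export name `disk0` are hypothetical, and sending the name as UTF-8 bytes is an assumption of this example.

```python
import struct

OPTION_HEADER = struct.Struct(">QII")  # cliserv_magic, option_number, option_length
IHAVEOPT = 0x49484156454F5054

NBD_OPT_EXPORT_NAME = 1
NBD_OPT_ABORT = 2
NBD_OPT_LIST = 3


def build_option(option_number: int, data: bytes = b"") -> bytes:
    """Return one option packet: header plus option_data."""
    return OPTION_HEADER.pack(IHAVEOPT, option_number, len(data)) + data


# Ask for the list of exports, then select one by name.
list_packet = build_option(NBD_OPT_LIST)
select_packet = build_option(NBD_OPT_EXPORT_NAME, "disk0".encode("utf-8"))
```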
The server responds to an option packet with a reply packet:
Offset | Data type | Name | Description |
---|---|---|---|
0 | uint64_t | reply_magic | Magic number 0x0003e889045565a9 |
8 | uint32_t | option_number | Identification number / type of the option being answered |
12 | uint32_t | reply_type | Type of reply |
16 | uint32_t | reply_length | Length of the reply data (in bytes) |
20 | variable | reply_data | Reply data, if reply_length > 0 |
The following reply types have been defined so far:
Name | Value | Meaning |
---|---|---|
NBD_REP_ACK | 1 | The server accepts the option or has no further reply data (with NBD_OPT_LIST) |
NBD_REP_SERVER | 2 | Description of a block device. The reply data consist of the length of the name as a 32-bit number, the name and, if there is still space in the reply packet, any further descriptive details in plain text. |
NBD_REP_ERR_UNSUP | 0x80000001 | Client sent an unknown option |
NBD_REP_ERR_POLICY | 0x80000002 | The server understood the option but is not allowed to accept it (e.g. NBD_OPT_LIST can be allowed or forbidden in the configuration file) |
NBD_REP_ERR_INVALID | 0x80000003 | The server understood the option, but it was syntactically invalid |
NBD_REP_ERR_PLATFORM | 0x80000004 | The option is not supported by the platform on which the server is running (currently unused) |
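The reply header and the NBD_REP_SERVER payload can be decoded as sketched below. The function names are illustrative. Treating the set top bit as the error marker is an inference from the listed values, all of which start with 0x8000, and decoding the name as UTF-8 is an assumption.

```python
import struct

REPLY_HEADER = struct.Struct(">QIII")  # reply_magic, option_number, reply_type, reply_length
REPLY_MAGIC = 0x0003E889045565A9

NBD_REP_ACK = 1
NBD_REP_SERVER = 2
NBD_REP_ERR_UNSUP = 0x80000001
NBD_REP_ERR_POLICY = 0x80000002
NBD_REP_ERR_INVALID = 0x80000003
NBD_REP_ERR_PLATFORM = 0x80000004


def parse_reply_header(header: bytes) -> tuple:
    """Return (option_number, reply_type, reply_length) from the 20-byte header."""
    magic, option_number, reply_type, reply_length = REPLY_HEADER.unpack(header)
    if magic != REPLY_MAGIC:
        raise ValueError("bad option reply magic")
    return option_number, reply_type, reply_length


def is_error(reply_type: int) -> bool:
    """All error codes in the table have bit 31 set."""
    return bool(reply_type & 0x80000000)


def parse_rep_server(reply_data: bytes) -> tuple:
    """Split NBD_REP_SERVER data into (name, details)."""
    (name_length,) = struct.unpack_from(">I", reply_data, 0)
    if 4 + name_length > len(reply_data):
        raise ValueError("name length exceeds reply data")
    name = reply_data[4:4 + name_length]
    details = reply_data[4 + name_length:]  # optional free-text description
    return name.decode("utf-8"), details.decode("utf-8")
```

When listing exports, a client would keep reading replies of type NBD_REP_SERVER until it receives NBD_REP_ACK or an error.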
The negotiation phase is completed as soon as the server has positively acknowledged the NBD_OPT_EXPORT_NAME option. It then sends the identification data of the exported block device to the client:
Offset | Data type | Name | Description |
---|---|---|---|
0 | uint64_t | device_size | Size of the exported block device (in bytes) |
8 | uint16_t | device_flags | Flags that apply to the exported device |
10 | uint8_t [124] | padding | Unused, all 0 |
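The device-dependent block that ends the negotiation is again a fixed-size structure. A minimal sketch, assuming the 134-byte layout from the table; `parse_export_info` is a name invented for this example.

```python
import struct

EXPORT_INFO = struct.Struct(">QH124s")  # device_size, device_flags, padding


def parse_export_info(data: bytes) -> tuple:
    """Return (device_size, device_flags) from the 134-byte export description."""
    if len(data) != EXPORT_INFO.size:
        raise ValueError("export description must be 134 bytes")
    device_size, device_flags, _padding = EXPORT_INFO.unpack(data)
    return device_size, device_flags
```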
Data phase
The NBD client forwards the information about the size of the block device, any flags and the open socket to the kernel via special system calls and terminates itself. The kernel then takes over the further communication via this socket.
The kernel on the client side now sends read and write requests to the server. These have the following packet structure:
Offset | Data type | Name | Description |
---|---|---|---|
0 | uint32_t | magic | Magic number 0x25609513 |
4 | uint32_t | type | 0: read access; 1: write access; 2: controlled end of connection; 3: flush cache; 4: TRIM command |
8 | char [8] | handle | 8 bytes that are sent back unchanged in the reply so that it can be matched to its request |
16 | uint64_t | from | Offset (in bytes) from which to read / write |
24 | uint32_t | len | Length of the data block |
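A request packet is a 28-byte header, followed by the data for write requests. The sketch below encodes requests from the table. The `NBD_CMD_*` constant names and the helper `build_request` are labels chosen for this example; only the numeric values come from the table.

```python
import struct

REQUEST = struct.Struct(">II8sQI")  # magic, type, handle, from, len
REQUEST_MAGIC = 0x25609513

NBD_CMD_READ = 0
NBD_CMD_WRITE = 1
NBD_CMD_DISC = 2   # controlled end of connection
NBD_CMD_FLUSH = 3
NBD_CMD_TRIM = 4


def build_request(cmd: int, handle: bytes, offset: int,
                  length: int = 0, payload: bytes = b"") -> bytes:
    """Encode one request; for writes the payload follows the header."""
    if len(handle) != 8:
        raise ValueError("handle must be exactly 8 bytes")
    if cmd == NBD_CMD_WRITE:
        length = len(payload)  # len describes the data that follows
    elif payload:
        raise ValueError("only write requests carry a payload")
    return REQUEST.pack(REQUEST_MAGIC, cmd, handle, offset, length) + payload


read_req = build_request(NBD_CMD_READ, b"\x00" * 7 + b"\x01", 4096, 512)
write_req = build_request(NBD_CMD_WRITE, b"ABCDEFGH", 0, payload=b"xyz")
```

The handle is opaque to the server; the sender only needs to pick a value it can recognize again when the reply arrives.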
In the case of write access, the data to be written follow immediately. The server replies to every request with a reply. This has the following structure:
Offset | Data type | Name | Description |
---|---|---|---|
0 | uint32_t | magic | Magic number 0x67446698 |
4 | uint32_t | error | 0 = OK (no error occurred) |
8 | char [8] | handle | Copy of the handle in the associated request |
In the case of responses to read requests, the requested data follow immediately.
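Reading a reply means reading the 16-byte header and, for a read request, the data that follows. The following sketch works on any connected stream socket. The helper names are invented here, and skipping the data when the error field is nonzero is an assumption of this example; the text above does not say what follows an error reply.

```python
import socket
import struct

REPLY = struct.Struct(">II8s")  # magic, error, handle
REPLY_MAGIC = 0x67446698


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver them in several pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return bytes(buf)


def read_reply(sock: socket.socket, read_len: int = 0) -> tuple:
    """Return (error, handle, data); read_len is the len of the matching read request."""
    magic, error, handle = REPLY.unpack(recv_exact(sock, REPLY.size))
    if magic != REPLY_MAGIC:
        raise ValueError("bad reply magic")
    # Assumption: data follows only when the request succeeded.
    data = recv_exact(sock, read_len) if error == 0 and read_len else b""
    return error, handle, data
```

The caller uses the returned handle to look up which outstanding request the reply belongs to.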
See also
- Loop device: the same idea with a local device
- iSCSI: a competing system
- Network File System: operates at a different level, but is also far more widely known