BeeGFS

Basic data

Developer: ThinkParQ / Fraunhofer ITWM
Initial release: 2007
Current version: 7.1 (December 11, 2018)
Operating system: Linux
Category: Distributed file system
License: BeeGFS End-User License Agreement, open source
Website: https://www.beegfs.io/

BeeGFS (formerly FhGFS) is an open source parallel file system developed and optimized for data throughput in high-performance computing (HPC). Its development placed particular emphasis on ease of use, flexibility and scalability.

BeeGFS was originally implemented at the Fraunhofer Center for High Performance Computing under the direction of Sven Breuner, who later took over the management of ThinkParQ. ThinkParQ GmbH was founded as a spin-off in 2014 to maintain the file system professionally and to offer services such as support.

The BeeGFS software can be downloaded free of charge from the project website.

History

BeeGFS started in 2005 as an internally developed file system at the Fraunhofer Center for High Performance Computing to replace the previously used file system on the institute's new cluster.

In 2007, the first beta version of the software was announced at ISC 07 in Dresden and presented to the public at SC 07 in Reno, Nevada. The first major release followed a year later.

In 2014, the Fraunhofer spin-off ThinkParQ was founded. It took over sales, customer service and professional support for the software, and also supports development. At the same time, FhGFS was renamed BeeGFS. While ThinkParQ has mainly been responsible for sales and support since then, Fraunhofer ITWM continues to develop and optimize the software in cooperation with ThinkParQ.

Another milestone was reached in early 2016 with the announcement that BeeGFS was available as open source.

Because BeeGFS is available to users free of charge, the exact number of installations is not known. However, more than 250 customers are professionally supported by ThinkParQ. These include numerous scientific institutions such as universities and research institutes around the world, as well as commercial companies from the life sciences, finance, automotive and energy sectors.

BeeGFS is used in several supercomputing facilities, including some of the fastest high-performance computers in the world (according to the TOP500 list). Examples include the Loewe-CSC cluster at Goethe University Frankfurt, Germany (#22 at the time of installation), the Vienna Scientific Cluster of the Technical University of Vienna, Austria (#56 at the time of installation), and the Abel cluster of the University of Oslo, Norway (#96 at the time of installation).

Concept & features

In developing BeeGFS, the developers focused on three main goals: ease of use, high flexibility and high scalability.

BeeGFS runs on Linux systems and consists of four main components: the client service, the metadata servers, the storage servers and the management service.

BeeGFS architecture overview

To use BeeGFS, at least one instance each of the metadata server and the storage server is required. Any number of additional metadata and storage server instances can be started to distribute the load when there are many clients.

Access to user data is parallelized by dividing each file into so-called chunks. The chunks are stored independently of one another on several storage servers, and the chunk size can be set by the administrator. Dedicated metadata servers manage the data and map each file to its chunks. BeeGFS can distribute metadata over several servers, which makes file access highly scalable. The servers communicate either via RDMA (e.g. InfiniBand, Omni-Path, RoCE) or via TCP/IP (e.g. Ethernet).
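The chunking described above can be illustrated with a minimal Python sketch. It assumes a simple round-robin placement of chunks across storage targets, which is an illustrative assumption and not necessarily how BeeGFS places chunks internally. The names `ChunkLocation`, `locate` and `targets_for_range` are hypothetical and not part of any BeeGFS API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ChunkLocation:
    chunk_index: int      # position of the chunk within the file
    target: str           # storage target assumed to hold the chunk
    offset_in_chunk: int  # byte offset inside that chunk


def locate(offset: int, chunk_size: int, targets: Sequence[str]) -> ChunkLocation:
    """Map a byte offset in a file to a chunk and a storage target.

    Assumption: chunks are assigned to targets round-robin.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not targets:
        raise ValueError("at least one storage target is required")
    chunk_index = offset // chunk_size
    target = targets[chunk_index % len(targets)]
    return ChunkLocation(chunk_index, target, offset % chunk_size)


def targets_for_range(offset: int, length: int, chunk_size: int,
                      targets: Sequence[str]) -> List[str]:
    """List the targets touched by an access of `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = locate(offset, chunk_size, targets).chunk_index
    last = locate(offset + length - 1, chunk_size, targets).chunk_index
    return [targets[i % len(targets)] for i in range(first, last + 1)]


if __name__ == "__main__":
    chunk_size = 512 * 1024                # hypothetical chunk size of 512 KiB
    targets = ["t1", "t2", "t3", "t4"]     # hypothetical storage targets
    # A 2 MiB read from the start of the file touches all four targets,
    # so the four chunks can be fetched in parallel.
    print(targets_for_range(0, 4 * chunk_size, chunk_size, targets))
```

Under this assumption, a sequential access that spans several chunks is spread over several storage servers, which is why throughput can grow with the number of servers.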

Clients as well as metadata and storage servers can be added to an existing system without interruption. The client service is a lightweight Linux kernel module that does not require any kernel patches. The servers can run on top of any existing local Linux file system (e.g. ext4, XFS, ZFS), as long as it supports POSIX. The recommended choice is ext4 for the metadata server and XFS for the storage server. Both types of server run in user space.

There are no strict hardware specifications, so the software design allows the administrator the freedom to use the servers in any combination on the machines. A very popular option among BeeGFS users is therefore to run the metadata server and storage server on the same machine in order to save hardware costs.

BeeGFS supports various network connections with dynamic failover, such as Ethernet or InfiniBand, as well as various Linux distributions and Linux kernels (from Linux kernel 2.6.18 up to the latest available kernel versions). BeeGFS uses init scripts for easy setup and startup. Alternatively, a Java-based graphical interface called AdMon ("Administration & Monitoring") can be used to monitor and manage BeeGFS and to identify performance problems.

BeeOND (BeeGFS on-demand)

BeeOND creates a BeeGFS instance across a set of nodes with a single command. Possible uses range from a dedicated file system for a specific cluster job to cloud computing and the quick and easy creation of test environments.

Benchmarks

The benchmarks were run on the internal SSDs of the compute nodes of the Fraunhofer Seislab. The Fraunhofer Seislab is a development cluster of the Fraunhofer ITWM with 25 nodes (20 compute + 5 storage) and three storage tiers: 1 TB RAM, 20 TB SSD and 120 TB HDD. For comparison, the internal SSDs of a single node reach 1,332 MB/s (write) and 1,317 MB/s (read) on the local file system without BeeGFS.

Each node is equipped with 2 × Intel Xeon X5660, 48 GB RAM, 4 × Intel 510 Series SSD (RAID 0) with ext4, and QDR InfiniBand, and runs Scientific Linux 6.3 with kernel 2.6.32-279 and FhGFS 2012.10-beta1.
