IBM General Parallel File System

GPFS is an abbreviation for General Parallel File System, a cluster file system from IBM. It arose from several research projects on parallel file systems and has been sold under several trade names:

  • IBM General Parallel File System
  • Elastic Storage
  • Spectrum Scale

History

GPFS emerged from the IBM research projects "Tiger Shark File System" and "Vesta File System" and was originally referred to as a "multimedia" file system, a designation that can still be found in internal names today. It quickly became apparent that GPFS was particularly suitable for high-performance computers because of its parallel architecture. In 1998 GPFS appeared as an official IBM product and the successor to Vesta/PIOFS, as a POSIX-compliant file system.

It became the file system behind the ASCI White and ASCI Purple supercomputers at the Lawrence Livermore National Laboratory. It was later ported to other operating systems, including Linux and Windows.

Other network protocols such as Windows CIFS were supported over time. Originally the file system behind large storage installations, GPFS was later sold as a software product independently of the hardware. Capabilities such as shared-nothing clusters have more recently been added. On July 14, 2014, IBM announced a cloud service called Elastic Storage. On February 17, 2015, IBM renamed GPFS to Spectrum Scale.

GPFS in supercomputing

GPFS is used as a cluster file system with high read/write bandwidth in several installations on the TOP500 list of supercomputers.

Functions

IBM also offers integrated storage systems, consisting of hardware and software, that run GPFS under the Linux operating system.

GPFS/Spectrum Scale has the following functional properties:

  • Several NAS computers can mount a cluster volume for writing at the same time (in parallel), so the file system scales to a large number of clients.
  • Striping, and thus parallel reading and writing, is supported at the level of both the mass storage devices and individual files. This parallelism enables very high throughput rates.
  • Distributed lock manager: parallel writing to the file system is coordinated so that a file can only be written by one process at a time (a minimal application-level sketch follows this list).
  • Metadata and data can be distributed across different disks to improve performance.
  • Several GPFS servers (also called nodes) work together as a highly available cluster, so failures are tolerated.
  • As of version 4.1, GPFS can also operate as a shared-nothing cluster (FPO, File Placement Optimizer) and can thus serve as a Hadoop Distributed File System (HDFS).
  • Very large limits for file size (8 EB), directory size, file system size (8 YB), and number of files per file system (2^64).
  • Support for HSM (Hierarchical Storage Management).
  • Volumes can be shared via the CIFS and NFS protocols at the same time, and from version 4.1 also as a Hadoop Distributed File System.
  • Access control works with POSIX file permissions for NFS (Unix systems) and with ACLs for CIFS (Windows systems). These access rights can be managed independently of one another.
  • The file system works according to the copy-on-write principle. Similar to Windows "shadow copies", snapshots can be accessed from any exported directory, both via NFS and via CIFS.
  • Asynchronous replication between different GPFS volumes is possible (Active File Management).
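
Because GPFS presents a POSIX-compliant interface, the coordination of concurrent writes described above is visible to applications through ordinary POSIX calls. The following minimal sketch is not GPFS-specific: it only illustrates how cooperating processes could each lock and write their own byte range of one shared file, while on GPFS the distributed lock manager performs the equivalent coordination across cluster nodes. The mount point, file name, and record size below are hypothetical placeholders.

```python
import fcntl
import os

# Hypothetical file on a GPFS/Spectrum Scale mount (placeholder path).
PATH = "/gpfs/fs1/shared.dat"
RECORD_SIZE = 4096  # each writer owns one 4 KiB region of the file


def write_region(rank: int, payload: bytes) -> None:
    """Lock and write the byte range belonging to writer `rank`."""
    offset = rank * RECORD_SIZE
    fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive POSIX byte-range lock on this writer's region only;
        # writers targeting other regions are not blocked.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset, os.SEEK_SET)
        os.pwrite(fd, payload.ljust(RECORD_SIZE, b"\0"), offset)
        fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Example: the process acting as writer 0 fills the first region.
    write_region(0, b"hello from writer 0")
```

Since each writer locks only its own region, processes writing disjoint regions can proceed in parallel, which is the application-level behaviour that GPFS's striping and distributed locking are designed to serve.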
