High availability

High availability (HA) refers to the ability of a system to remain in operation with a high probability (often 99.99% or better) despite the failure of one of its components. In contrast to fault tolerance, operation may be interrupted when a fault occurs.

Availability and high availability

A system is said to be available when it is able to perform the tasks for which it is intended. Availability is the probability that a system is functional (available) within a specified period. It is calculated from the unplanned (error-related) downtime and the total production time of a system:

${\displaystyle \mathrm{Availability\ (in\ percent)} = \left(1 - \frac{\mathrm{downtime}}{\mathrm{production\ time} + \mathrm{downtime}}\right) \cdot 100}$

or:

${\displaystyle \mathrm{Availability\ (in\ percent)} = \left(\frac{\mathrm{production\ time\ (uptime)}}{\mathrm{production\ time\ (uptime)} + \mathrm{downtime}}\right) \cdot 100}$
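
As a minimal sketch, the two forms of the formula can be checked against each other in Python. The function names are illustrative only, and the figures in the example are assumed; times may be given in any unit as long as both arguments use the same one.

```python
def availability_from_downtime(production_time: float, downtime: float) -> float:
    """Availability in percent, using the form 1 - downtime / total time."""
    total = production_time + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return (1 - downtime / total) * 100


def availability_from_uptime(production_time: float, downtime: float) -> float:
    """Availability in percent, using the form uptime / total time."""
    total = production_time + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return production_time / total * 100


# Assumed example: 8,750 hours of operation and 10 hours of unplanned downtime.
a1 = availability_from_downtime(8750, 10)
a2 = availability_from_uptime(8750, 10)
print(f"{a1:.3f}% / {a2:.3f}%")
```

Both functions return the same value because the two forms are algebraically equivalent.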

The exact definition of high availability can vary. The Institute of Electrical and Electronics Engineers (IEEE) gives the following definition:

"High Availability (HA for short) refers to the availability of resources in a computer system, in the wake of component failures in the system."

Another definition of high availability is:

“A system is considered highly available if an application remains available in the event of an error and can continue to be used without direct human intervention. As a consequence, the user perceives no interruption or only a short one. High availability (HA for short) describes the ability of a system to guarantee unrestricted operation if one of its components fails.”

- Andrea Held: Oracle 10g high availability

High availability and availability classes

The availability class from which a system counts as highly available depends on the definition of availability that is used.

An availability of 99% is generally not regarded as high availability; today it is considered basic or normal, at least for high-quality IT equipment. High availability is therefore usually spoken of only at 99.9% or higher. Whether three nines are already sufficient, or whether only four or five nines make a system highly available, depends on the source and the manufacturer and has to be assessed for the respective application scenario. In general, a system can be classified as highly available if its annual downtime is in the range of a few minutes (about 99.999%, or AEC-2) or less. This level is also called dial-tone availability, since it is achieved in landline telephony.

If the above formula is used to calculate the availability over a period of one year, an availability of 99.99% corresponds, for example, to a downtime of 52.6 minutes. The number of nines in the percentage is usually used to identify the availability class: the above example with 99.99% means availability class 4.

The following overview gives the maximum downtime for the relevant classes 2 to 6, with the year taken as an average of 365.25 days and the month as 1/12 of a year:

Availability class 2: 99% ≡ 438 minutes / month or 7:18:18 hours / month = 87.7 hours / year, i.e. 3 days and 15:39:36 hours
Availability class 3: 99.9% ≡ 43:50 minutes / month or 8:45:58 hours / year
Availability class 4: 99.99% ≡ 4:23 minutes / month or 52:36 minutes / year
Availability class 5: 99.999% ≡ 26.3 seconds / month or 5:16 minutes / year
Availability class 6: 99.9999% ≡ 2.63 seconds / month or 31.6 seconds / year
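
The figures in this overview follow directly from the number of nines. The sketch below recomputes them under the same assumptions (a year of 365.25 days, a month as 1/12 of a year); the function name `allowed_downtime_minutes` is illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, as assumed in the overview


def allowed_downtime_minutes(availability_class: int) -> tuple[float, float]:
    """Return (minutes per month, minutes per year) for a class with that many nines."""
    unavailability = 10.0 ** -availability_class  # class 4 -> 0.0001
    per_year = MINUTES_PER_YEAR * unavailability
    return per_year / 12, per_year


for cls in range(2, 7):
    per_month, per_year = allowed_downtime_minutes(cls)
    availability = 100 * (1 - 10.0 ** -cls)
    print(f"class {cls}: {availability:.4f}%  {per_month:10.4f} min/month  {per_year:10.4f} min/year")
```

The output is in decimal minutes; multiplying the fractional part by 60 gives the seconds used in the minutes:seconds notation.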

With a total downtime of one day per year, the calculated availability would be 99.73% (almost class 3); with one hour, 99.989% (practically class 4); with one minute, 99.99981% (almost class 6); and with one second, 99.9999968% (class 7). These values correspond roughly to the 3σ, 4σ, 5σ, and 6σ levels of the standard normal distribution.
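
To judge how close this correspondence is, the following sketch prints both sets of values side by side. It assumes the same 365.25-day year and uses the standard identity that the share of a normal distribution within k standard deviations of the mean is erf(k/√2); the function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 seconds


def availability_percent(downtime_seconds: float) -> float:
    """Availability over one year for a given total downtime."""
    return (1 - downtime_seconds / SECONDS_PER_YEAR) * 100


def within_sigma_percent(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2)) * 100


downtimes = {"one day": 86400, "one hour": 3600, "one minute": 60, "one second": 1}
for (label, seconds), k in zip(downtimes.items(), (3, 4, 5, 6)):
    print(f"{label:>10}: {availability_percent(seconds):.7f}%   "
          f"{k} sigma: {within_sigma_percent(k):.7f}%")
```

The match is very close for one day and 3σ; for the higher levels the two columns agree only in order of magnitude.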

Availability Environment Classification

The Harvard Research Group (HRG) divides high availability into six classes in its Availability Environment Classification (AEC).

HRG class	Designation	Explanation
AEC-0	Conventional	Function can be interrupted, data integrity is not essential
AEC-1	Highly Reliable	Function can be interrupted, but data integrity must be guaranteed
AEC-2	High Availability	Function may only be minimally interrupted within specified times or during main operating hours
AEC-3	Fault Resilient	Function must be maintained without interruption within specified times or during main operating hours
AEC-4	Fault Tolerant	Function must be maintained without interruption, 24/7 operation (24 hours, 7 days a week) must be guaranteed
AEC-5	Disaster Tolerant	Function must be available under all circumstances

Agreed period of availability

In companies, high availability is often defined as part of service level agreements (SLAs) and is an essential evaluation criterion for IT services.

Many high-availability systems have to be online 24 hours a day, 7 days a week, all year round. Some, however, only need to be highly available during certain periods: the trading systems of Deutsche Börse, for example, do not need to be highly available at night or on non-trading days. For such systems, high availability relates only to the times of day and the working days on which it is required.
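
When availability is agreed only for certain periods, downtime outside those periods does not count against it. The sketch below illustrates one way to compute this. The service window (Monday to Friday, 08:00 to 20:00), the outages, and all function names are assumptions made for the example and do not describe any real trading system or SLA.

```python
from datetime import datetime, time, timedelta

# Assumed service window for illustration: Monday to Friday, 08:00 to 20:00.
WINDOW_START = time(8, 0)
WINDOW_END = time(20, 0)


def window_seconds(start: datetime, end: datetime) -> float:
    """Seconds between start and end that fall inside the agreed service window."""
    total = 0.0
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            lo = max(start, datetime.combine(day, WINDOW_START))
            hi = min(end, datetime.combine(day, WINDOW_END))
            if hi > lo:
                total += (hi - lo).total_seconds()
        day += timedelta(days=1)
    return total


def agreed_availability(period, outages) -> float:
    """Availability in percent, counting only time inside the service window."""
    agreed = window_seconds(*period)
    if agreed <= 0:
        raise ValueError("period contains no agreed service time")
    down = sum(window_seconds(max(s, period[0]), min(e, period[1])) for s, e in outages)
    return (1 - down / agreed) * 100


# One week (Monday 2024-01-01 to Monday 2024-01-08) with two hypothetical outages.
period = (datetime(2024, 1, 1), datetime(2024, 1, 8))
outages = [
    (datetime(2024, 1, 2, 22, 0), datetime(2024, 1, 3, 2, 0)),  # at night: not counted
    (datetime(2024, 1, 4, 9, 0), datetime(2024, 1, 4, 9, 30)),  # 30 minutes inside the window
]
print(f"{agreed_availability(period, outages):.3f}%")
```

In this example only the 30-minute outage inside the window reduces the agreed availability; the four-hour outage at night has no effect on it.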

Requirements for high availability

In general, HA systems strive to eliminate single points of failure (SPOFs). A SPOF is a single component whose failure causes the entire system to fail.

A manufacturer of a high-availability system must equip it with the following features:

redundancy of critical system components
fault-tolerant and robust behavior of the overall system

Typical examples of components used to increase fault tolerance are uninterruptible power supplies (UPS), multiple power supply units, ECC memory, and RAID systems. Server mirroring and redundant clusters are also used.
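
The effect of redundancy on availability can be estimated with standard reliability arithmetic, which is not taken from the sources above. The sketch assumes that components fail independently of each other, which real systems often violate (for example through a shared power feed); the availability figures and function names are hypothetical.

```python
from math import prod


def series_availability(components: list[float]) -> float:
    """All components are required: the system is up only if every one is up."""
    return prod(components)


def parallel_availability(components: list[float]) -> float:
    """Redundant components: the system is down only if all of them are down."""
    return 1 - prod(1 - a for a in components)


# Hypothetical figures: a server (99.9%) that depends on a single power supply (99.5%).
single = series_availability([0.999, 0.995])
# The same server with two independent, redundant power supplies.
redundant = series_availability([0.999, parallel_availability([0.995, 0.995])])
print(f"single PSU: {single:.4%}   redundant PSUs: {redundant:.4%}")
```

With a single power supply, that component is a single point of failure and limits the whole system; duplicating it moves the bottleneck to the next weakest component.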

The higher the required availability, the more effort the operator has to invest in:

quickly accessible specialist staff
spare parts availability
preventive maintenance
qualified fault reporting and a fast communication system

Highly specialized systems with the highest availability include, for example:

the Continuum series from Stratus
the Integrity NonStop series from HP, which came to HP via Compaq following the acquisitions of Tandem (1997) and Digital Equipment Corporation (1998)
mainframes in general, e.g. those of the System z series from IBM
telephone exchanges

Literature

Martin Wieczorek, Uwe Naujoks, Bob Bartlett (eds.): Business Continuity. Springer, 2003, ISBN 3-540-44285-5.
Evan Marcus, Hal Stern: Blueprints for High Availability: Designing Resilient Distributed Systems. John Wiley & Sons, 2000, ISBN 0-471-35601-8.
Floyd Piedad, Michael Hawkins: High Availability: Design, Techniques and Processes. Prentice Hall PTR, 2000, ISBN 0-13-096288-0.

Web links

BSI high availability compendium

References

↑ High Availability (HA). (No longer available online.) IEEE Task Force on Cluster Computing, archived from the original on July 14, 2010; accessed on October 26, 2010.
↑ Andrea Held: Oracle 10g high availability. Addison-Wesley, 2004, ISBN 3-8273-2163-8.
↑ Matthew Portnoy: Virtualization for Beginners. Wiley-VCH Verlag, Weinheim, 1st edition 2012, ISBN 978-3-527-76023-7.
↑ HRG 2002, see also Andrea Held: High availability: key figures and metrics. (No longer available online.) In: TEC Channel. June 6, 2005, archived from the original on April 20, 2008; accessed on October 26, 2010.
