Scalability refers to the ability of a system, network, or process to change its size. It usually denotes the ability of the system to grow.
In electronic data processing, scalability means the ability of a system of hardware and software to increase its performance proportionally (or linearly) within a defined range by adding resources, e.g. additional hardware.
However, a generally applicable definition of this term is not trivial. It is always necessary to specify a range for the particular case (e.g. a system does not necessarily scale as well with 100 simultaneous accesses as with 100,000). Resources can be, for example, CPU, RAM, hard disks, or network bandwidth.
The scalability of a system is indicated by the scaling factor, also called speedup.
In business administration, the term is used quite generally to denote the expandability of a business model through capacity expansion in order to achieve higher efficiency and profitability. The scalability of business models without (high) additional investments and fixed costs is especially interesting for investors; it is particularly achievable in the internet economy. Scalability is also used in relation to capital markets, provided that efficiency increases with growing trading volumes.
Vertical vs. horizontal scaling
There are two ways to improve the performance of a system:
Vertical scaling (scale up)
Vertical scaling means increasing performance by adding resources to a single node/computer in the system. Examples are increasing memory, adding a CPU, or installing a more powerful graphics card.
A characteristic of this type of scaling is that a system can be made faster regardless of the implementation of the software. In other words, you don't have to change a line of code to see a performance boost from scaling up vertically. The big disadvantage here, however, is that sooner or later you will reach a limit at which you cannot upgrade the computer if you are already using the best hardware that is currently on the market.
Horizontal scaling (scale out)
In contrast to vertical scaling, there are no limits to horizontal scaling (from the hardware perspective). Horizontal scaling means increasing the performance of a system by adding additional computers / nodes. However, the efficiency of this type of scaling is heavily dependent on the implementation of the software, since not all software can be parallelized equally well.
Types of scalability
There are basically four types of scalability:
Load scalability
Load scalability stands for constant system behavior over larger load ranges: under low, medium, or high load, the system should neither exhibit excessive delays nor fail to process requests promptly.
Example: museum cloakroom
A cloakroom in a museum, where visitors drop off their jackets and pick them up again, operates on the first-come-first-served principle. There is a limited number of coat hooks and a larger number of visitors. The cloakroom, where the visitors line up, is a carousel. To find a free hook or a jacket, every visitor searches linearly.
Our goal now is to maximize the time a visitor can actually spend in the museum.
Under heavy load, the performance of this system is dramatically poor. First, the search for free hooks becomes ever more laborious the fewer free hooks are available. Second, a deadlock is inevitable under high load (e.g. in winter): all visitors hand in their jackets in the morning and all pick them up again in the evening. A deadlock is likely to occur around noon and in the early afternoon, when no free hooks remain and ever more visitors join the end of the line to pick up their jackets.
Visitors who want to pick up their jackets could resolve this deadlock by asking arriving visitors to let them go ahead in the line. Since they will only ask for this after a certain timeout, this system is extremely inefficient.
Increasing the number of coat hooks would only delay the problem, but not solve it. The load scalability is consequently very poor.
Spatial scalability
A system or application exhibits spatial scalability if its storage requirements do not grow to an unacceptable level with an increasing number of elements to be managed. Since "unacceptable" is a relative term, in this context one usually speaks of acceptable if the storage requirement grows at most sub-linearly. To achieve this, a sparse matrix or data compression can be applied, for example. Since data compression takes a certain amount of time, however, this often conflicts with load scalability.
Temporal-spatial scalability
A system has temporal-spatial scalability if increasing the number of objects it comprises does not significantly affect its performance. For example, a search engine with linear complexity has no temporal-spatial scalability, while a search engine with indexed or sorted data, e.g. using a hash table or a balanced tree, can very well exhibit temporal-spatial scalability.
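The difference between linear and indexed access can be sketched in a few lines of Python. The collection size and measurement counts below are arbitrary illustration values, not from the source:

```python
import timeit

# Build a collection of n objects; membership tests illustrate the two cases.
n = 100_000
items = list(range(n))   # linear search: O(n) per lookup
index = set(items)       # hash-based index: O(1) average per lookup

# Worst case for the list: searching for the last element.
linear_time = timeit.timeit(lambda: (n - 1) in items, number=100)
hashed_time = timeit.timeit(lambda: (n - 1) in index, number=100)

print(f"linear search: {linear_time:.4f}s, hash lookup: {hashed_time:.6f}s")
```

Growing `n` makes the linear search proportionally slower while the hash lookup stays essentially constant, which is exactly the property temporal-spatial scalability describes.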
Structural scalability
Structural scalability characterizes a system whose implementation does not significantly hinder increasing the number of objects within a self-defined range.
Dependency between the types of scalability
Since a system can of course have several types of scalability, the question arises as to how and whether these are related to one another. See the table above .
The load scalability of a system is not necessarily impaired by poor spatial or structural scalability. However, systems with poor spatial or temporal-spatial scalability may also have poor load scalability because of memory-management overhead or high search effort. Systems with good temporal-spatial scalability may have poor load scalability if, for example, they were not sufficiently parallelized.
The relationship between structural scalability and load scalability is as follows: while load scalability has no effect on structural scalability, the reverse may very well be the case.
So the different types of scalability are not entirely independent of one another.
The scaling factor (speedup) describes the actual performance gain per additional resource unit. For example, a second CPU can provide 90% additional performance.
Super-linear scalability is present when the scaling factor increases as resources are added.
Linear scalability means that the scaling factor of a system remains the same per added resource unit.
Sub-linear scalability, in contrast, represents the decrease in the scaling factor as resources are added.
Negative scalability occurs when performance actually deteriorates as resources/computers are added. This problem arises when the administrative overhead caused by the additional computer is greater than the performance gain it delivers.
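The four cases can be made concrete with a small calculation. The throughput numbers below are made up for illustration; the helper computes each added unit's scaling factor relative to the single-unit baseline:

```python
# Hypothetical throughput measurements (requests/s) for 1..4 servers.
throughput = {1: 100, 2: 190, 3: 265, 4: 250}

def scaling_factor(t, n):
    """Performance gain of the n-th resource unit relative to one unit."""
    return (t[n] - t[n - 1]) / t[1]

for n in range(2, 5):
    f = scaling_factor(throughput, n)
    kind = "negative" if f < 0 else "sub-linear" if f < 1 else "linear or better"
    print(f"unit {n}: scaling factor {f:+.2f} ({kind})")
```

Here the second server delivers 90% extra performance (matching the example above), the third only 75%, and the fourth actually reduces throughput, i.e. negative scalability.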
System as a layer model
To build a system that is as scalable as possible, it has proven useful in practice to implement it as a layer model, since with this approach the individual layers are logically separated from one another and each layer can be scaled individually.
A very popular architecture in the web area is the 3-tier architecture. In order to achieve a high level of scalability, a decisive factor is that each of these 3 layers scales well.
While the presentation layer can be scaled horizontally relatively easily, the logic layer requires a specially designed implementation of the code: as large a proportion of the logic as possible must be parallelizable (see Amdahl's law and Gustafson's law). Most interesting, however, is the horizontal scaling of the data storage layer, to which a separate section is devoted (see horizontal scaling of the data storage layer below).
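Amdahl's law quantifies why the parallelizable proportion matters: if a fraction p of the work can be parallelized, the speedup on n nodes is 1 / ((1 - p) + p/n). A minimal sketch (the 95% figure is an arbitrary example):

```python
def amdahl_speedup(p, n):
    """Speedup with n processors when fraction p of the work is parallelizable."""
    return 1 / ((1 - p) + p / n)

# Even with 95% parallelizable code, speedup saturates as nodes are added:
for n in (2, 8, 64, 1024):
    print(f"{n:5d} nodes: speedup {amdahl_speedup(0.95, n):.2f}")
```

With p = 0.95 the speedup can never exceed 1 / (1 - p) = 20, no matter how many nodes are added, which is why the serial portion of the logic layer limits horizontal scaling.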
Practical ways to improve the scalability of websites
The scalability of websites can be improved by increasing the performance, as this enables a server to serve more clients at the same time.
Martin L. Abbott and Michael T. Fisher have created 50 rules to keep in mind when it comes to scalability. The following rules, among others, are relevant for websites:
Reduce DNS lookups and number of objects
[Figure: object download time]
Browsers download the individual objects of a web page over a limited number of parallel connections per hostname. Ideally, this parallelization in the browser is used to the full while keeping the number of DNS lookups as low as possible. The best way to achieve this is to distribute a website over several subdomains (e.g. serving images from one subdomain while loading videos from another). With this procedure a considerable performance gain can be achieved relatively easily. However, there is no general answer to how many subdomains yield the best performance; simple performance tests of the site to be optimized should provide quick answers.
Horizontal scaling of the data storage layer
Scaling in terms of database access
The part of a system that is most difficult to scale is usually the database or the data storage layer (see above). The origin of this problem can be traced back to the paper A Relational Model of Data for Large Shared Data Banks by Edgar F. Codd, which introduces the concept of a Relational Database Management System (RDBMS) .
One way to scale databases is to take advantage of the fact that most applications and databases have significantly more reads than writes. A very realistic scenario, which is described in the book by Martin L. Abbott and Michael T. Fisher, is a book reservation platform, which has a ratio between read and write accesses of 400: 1. Systems of this type can be scaled relatively easily by making several read-only duplicates of this data.
There are several ways to distribute the copies of this data, depending on how up to date the data of the duplicates really needs to be. Basically, it should not be a problem if this data is only synchronized every 3, 30, or 90 seconds. In the book platform scenario there are 1,000,000 books, 10% of which are reserved daily. Assuming the reservations are evenly distributed over the entire day, there is approximately one reservation per second (one every 0.86 seconds). The probability that, within the 90-second window of a reservation, another customer wants to reserve the same book is (90 / 0.86) / 100,000 = 0.104%. Of course, this case can and will occur at some point, but the problem can easily be countered by a final re-check against the database.
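The arithmetic behind the 0.104% figure from the book's scenario can be reproduced directly:

```python
books = 1_000_000
reserved_per_day = books * 0.10        # 100,000 reservations per day
seconds_per_day = 24 * 60 * 60

interval = seconds_per_day / reserved_per_day  # ~0.86 s between reservations
stale_window = 90                              # acceptable replica lag in seconds

# Reservations arriving within the stale window, spread over the
# 100,000 books reserved that day:
collision_prob = (stale_window / interval) / reserved_per_day
print(f"one reservation every {interval:.2f}s; "
      f"conflict probability ~ {collision_prob:.3%}")
```

About 104 reservations fall into any 90-second window, so the chance that one of them targets the same book is roughly one in a thousand, small enough that a final re-check on write handles the rare conflict.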
One way to implement this method is to cache the data, e.g. in a key-value store such as Redis. The cache only has to be renewed after its validity has expired, which relieves the database enormously. The simplest way to implement this cache is to place it in an already existing layer (e.g. the logic layer). For better performance and scalability, however, a dedicated layer or server between the logic layer and the data storage layer is used.
The next step is now to replicate the database. Most known database systems already have such a function. MySQL accomplishes this with the master-slave principle, whereby the master database is the actual database with write permissions and the slave databases are the duplicated read-only copies. The master database records all updates, inserts, deletes etc. in the so-called binary log, and the slaves reproduce them. These slaves are now placed behind a load balancer (see below) in order to distribute the load accordingly.
This type of scaling makes the number of transactions relatively easy to scale: the more duplicates of the database are used, the more transactions can be handled in parallel. In other words, any number of users (depending, of course, on the number of servers) can access the database at the same time. However, this method does not help to scale the amount of data itself. For that, a further step is necessary, which is covered in the next section.
Scaling with regard to database entries - denormalization
The goal here is to distribute a database across several computers and to expand its capacity with additional computers as required. To do this, the database must be denormalized to a certain extent. Denormalization means deliberately undoing normalization in order to improve the runtime behavior of a database application.
In the course of denormalization, the database must be fragmented .
A distinction is made between horizontal and vertical fragmentation.
In horizontal fragmentation (sharding), the totality of all records of a relation is divided into several tables. If these tables are on the same server, this is usually a matter of partitioning. However, the individual tables can also be located on different servers; for example, the data for business in the USA can be stored on a server in the USA and the data for business in Europe on a server in Germany. This division is also known as regionalization.
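A common way to decide which shard a record belongs to is hashing its key. The following sketch is illustrative; the shard names are invented, and a regionalized setup like the US/Europe example above would route on a region attribute instead of a hash:

```python
import hashlib

SHARDS = ["db-us", "db-eu", "db-asia"]  # hypothetical shard servers

def shard_for(key):
    """Route a record to a shard by hashing its key (deterministic)."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Because the mapping is deterministic, every read and write for the same key lands on the same server; adding shards, however, remaps most keys, which is why production systems often use consistent hashing instead of a plain modulo.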
Horizontal fragmentation does not create redundancy in the stored data, but in the structures. If a relation has to be changed, not just one table but all tables across which the data of that relation is distributed must be changed. There is a risk of anomalies in the data structures here.
With vertical fragmentation , the dependent attributes (non-key attributes) of a table are divided into two or more groups. Each group becomes its own table, which is supplemented by all key attributes of the original table. This can be useful if the attributes of a relation result in data records with a very large record length. If the accesses mostly only concern a few attributes, then the few frequently accessed attributes can be combined in one group and the rest in a second group. Frequent accesses become faster because a smaller amount of data has to be read from the hard disk . The infrequent accesses to the remaining attributes are not faster, but neither are they slower.
The length of the record from which it makes sense to split into several smaller tables also depends on the database system. Many database systems save the data in the form of blocks with a size of 4 KiB , 8 KiB or 16 KiB. If the average record length is a little larger than 50% of a data block, then a lot of storage space remains unused. If the average record length is larger than the block size used, data access becomes more complex. If BLOBs appear in a relation with other attributes, vertical fragmentation is almost always an advantage.
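The storage-waste argument above is simple arithmetic. With illustrative numbers (an 8 KiB block and a record just over half that size, both made up for the example):

```python
block_size = 8 * 1024   # 8 KiB blocks, a common database page size
record_len = 4500       # bytes: a little more than half a block

records_per_block = block_size // record_len   # how many records fit per block
wasted = block_size - records_per_block * record_len

print(f"{records_per_block} record(s) per block, "
      f"{wasted} bytes ({wasted / block_size:.0%}) unused per block")
```

Only one 4,500-byte record fits into each 8,192-byte block, so roughly 45% of every block is wasted; splitting the relation vertically into two narrower tables would let several records share each block again.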
Partitioning is a special case of horizontal fragmentation.
Large data stocks can be administered more easily if the data of a relation is divided into several small parts (= partitions) and these are saved separately. If a partition in a table is being updated, other partitions in the table can be reorganized at the same time. If an error is discovered in one partition, this single partition can be restored from a data backup, while programs can continue to access the other partitions. Most established database vendors offer partitioning, see e.g. partitioning in DB2 and partitioning in MySQL.
Most database systems offer the option of either addressing individual partitions or addressing all partitions under a uniform table name.
The data access can be accelerated by partitioning. The main advantage, however, is the easier administration of the entire table.
Scalability in business administration
The scalability of a business model is defined as the ability to achieve growth in capacity and sales through additional resources without a corresponding expansion of investments and fixed costs. For founders and investors, this form of scalability is particularly attractive.
In the case of business start-ups aimed at the local market, scalability is seldom given, as the trade is tied to one location. Even with start-ups that are heavily dependent on the individual professional competence of the founder (e.g. in consulting and other service professions), the limits of one's own working hours mark the limits of scalability. In both cases, sales cannot simply be increased, so that one has to use additional resources and invest in a new location or hire new employees and thereby cause new fixed costs.
For production units with limited capacity, scaling beyond the maximum capacity requires large investments to set up a second, third, etc. production unit. In the digital economy, for example in an online business, investments must first be made in the website, software, advertising, etc.; after that, however, sales increases can be achieved without additional resources, if logistics costs are disregarded.
The following characteristics of a scalable business model are cited in general:
- Low fixed assets
- Low fixed costs (in relation to the total costs)
- High proportion of variable costs
- Effective marketing and sales activities in order to be able to sell the products and services quickly when capacity increases
- Expansion into neighboring markets and countries
Assessing the scalability of a business model is important for professional investors as it increases the likelihood of a high return on their investments and / or a rapid increase in the value of the company with a decreasing need for large additional capital injections. This is interesting for venture capitalists, but also for founders who avoid diluting their own shares and have the prospect of increasing profit distributions.
Business models based on franchising are also more easily scalable, as the investments for setting up new locations and capacities are taken over by the franchisees. This also makes it possible to scale local business models that otherwise have tight capacity limits.
Vertical scaling can be described as extending the value chain in order to increase sales, while horizontal scaling means marketing existing products and services in neighboring markets, expanding the portfolio with similar products and services, or transferring a proven business model to other markets.
While the importance of the innovative character of a business model is often overestimated, inexperienced entrepreneurs tend to neglect scalability.
- Mark D. Hill: What is scalability? In: ACM SIGARCH Computer Architecture News. Vol. 18, No. 4, December 1990, ISSN 0163-5964, pp. 18-21.
- Leticia Duboc, David S. Rosenblum, Tony Wicks: A Framework for Modeling and Analysis of Software Systems Scalability. In: Proceedings of the 28th International Conference on Software Engineering (ICSE '06). ACM, New York, NY, USA 2006, ISBN 1-59593-375-1, pp. 949-952, doi:10.1145/1134285.1134460.
- M. Michael, J. E. Moreira, D. Shiloach, R. W. Wisniewski: Scale-up x Scale-out: A Case Study using Nutch/Lucene. In: Parallel and Distributed Processing Symposium, 2007. IEEE, March 30, 2007, pp. 1-8, doi:10.1109/IPDPS.2007.370631.
- André B. Bondi: Characteristics of Scalability and Their Impact on Performance. In: Proceedings of the 2nd International Workshop on Software and Performance (WOSP '00). ACM, New York, NY, USA 2000, ISBN 1-58113-195-X, pp. 195-203, doi:10.1145/350391.350432.
- Martin L. Abbott, Michael T. Fisher: Scalability Rules: 50 Principles for Scaling Web Sites. Addison-Wesley, Indiana 2011, ISBN 978-0-321-75388-5, pp. 12-34.
- Edgar F. Codd: A Relational Model of Data for Large Shared Data Banks. In: Communications of the ACM. ACM Press, June 1970, ISSN 0001-0782, pp. 377-387 (PDF at eecs.umich.edu, archived in the Internet Archive, January 30, 2012).
- Patrick Stähler: Business Models in the Digital Economy: Features, Strategies and Effects. Josef Eul Verlag, Cologne-Lohmar 2001.
- Urs Fueglistaller, Christoph Müller, Susan Müller, Thierry Volery: Entrepreneurship: Models - Implementation - Perspectives. Springer Verlag, 2015, p. 116.