Online analytical processing

from Wikipedia, the free encyclopedia

Online analytical processing (OLAP) is, alongside data mining, one of the methods used in analytical information systems. OLAP is classified as a hypothesis-driven analysis method: before the actual investigation, the analyst must know which queries to put to the OLAP system, and the result of the analysis then confirms or refutes the hypothesis. In this context, OLAP systems often form the technological basis for current business intelligence applications. Typical application scenarios include reporting and analysis, as well as planning and budgeting, in areas such as controlling, finance, sales, production, human resources, and corporate management.

OLAP systems obtain their data either from a company's operational databases or from a data warehouse. Using a data warehouse keeps analytical queries away from the transaction-oriented databases, so that analysis does not impair their performance. The performance of an OLAP system also depends on the type of data storage used and on how it is connected to the analysis client.

In contrast to online transaction processing (OLTP), the focus here is on carrying out complex analyses that involve very large volumes of data. The aim is to obtain an analysis result that supports decision-making through a multidimensional view of this data. Management, in its role as decision-maker, is cited as a particular target group.

The structure underlying OLAP is the OLAP cube, which is built from the operational database. It follows a multidimensional, data-point-oriented logic, in contrast to the row-oriented logic of OLTP.
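The contrast between row-oriented records and data points addressed by coordinates can be illustrated with a minimal sketch. The sample rows, the dimension names, and the helper `build_cube` are hypothetical and chosen only for illustration; real OLAP engines use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical OLTP-style rows: one record per sale (row-oriented).
sales_rows = [
    {"customer": "Acme", "region": "North", "year": 2023, "revenue": 100.0},
    {"customer": "Acme", "region": "North", "year": 2024, "revenue": 150.0},
    {"customer": "Bolt", "region": "South", "year": 2023, "revenue": 80.0},
    {"customer": "Bolt", "region": "South", "year": 2023, "revenue": 20.0},
]

# Assumed dimensions of the illustrative cube.
DIMENSIONS = ("customer", "region", "year")


def build_cube(rows, dimensions, measure):
    """Aggregate rows into cells, each addressed by a coordinate tuple."""
    cube = defaultdict(float)
    for row in rows:
        coordinate = tuple(row[d] for d in dimensions)
        cube[coordinate] += row[measure]
    return dict(cube)


cube = build_cube(sales_rows, DIMENSIONS, "revenue")

# A single data point is looked up by its coordinates, not by scanning rows.
print(cube[("Bolt", "South", 2023)])
```

Here the two Bolt records for 2023 collapse into one cell, so the four input rows yield three data points.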

Types

A distinction is made between ROLAP ("relational OLAP"), which accesses a relational database, and MOLAP ("multidimensional OLAP"), which accesses a multidimensional database. HOLAP ("H" for "hybrid") is an intermediate form between ROLAP and MOLAP. Each type has advantages and disadvantages.

MOLAP stores numbers in the form of data points. This gives MOLAP a performance advantage over ROLAP systems, which store data on a relational basis as data records.

Pre-calculated OLAP systems perform better than OLAP systems that calculate at runtime.
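The difference between pre-calculation and calculation at runtime can be sketched as follows. The cell values and both function names are assumptions made for this illustration; the sketch shows the principle only and makes no claim about how any particular product works.

```python
# Hypothetical base cells of a small cube: (region, year) -> revenue.
cells = {
    ("North", 2023): 100.0,
    ("North", 2024): 150.0,
    ("South", 2023): 100.0,
}


def total_at_runtime(cells, region):
    """Scan every cell on each query (calculation at runtime)."""
    return sum(value for (r, _year), value in cells.items() if r == region)


def precompute_region_totals(cells):
    """Compute all region totals once, before any query arrives."""
    totals = {}
    for (region, _year), value in cells.items():
        totals[region] = totals.get(region, 0.0) + value
    return totals


# Pre-calculated variant: a query becomes a single dictionary lookup.
region_totals = precompute_region_totals(cells)

print(total_at_runtime(cells, "North"))
print(region_totals["North"])
```

Both approaches return the same total. The pre-calculated variant answers each query with one lookup, at the cost of having to rebuild `region_totals` whenever the base cells change.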

In-memory systems perform better than disk-based systems, but they must use the available memory sparingly.

ROLAP scales better but, depending on the performance of the underlying relational sources, is slower than MOLAP. The reason is that ROLAP keeps the data, together with any aggregations calculated in advance, in a versatile but potentially slower relational database, whereas MOLAP holds the data as data points in a purpose-built, quickly accessible form. A further advantage of ROLAP is that it requires less storage space, because the data is queried from existing databases. This is particularly useful when evaluating mass data in complex data warehouse environments.
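The ROLAP idea of answering an analytical query directly from relational records can be sketched with Python's built-in `sqlite3` module. The table `sales`, its columns, and the sample rows are hypothetical; an actual ROLAP server would generate comparable SQL against a far larger warehouse schema.

```python
import sqlite3

# In-memory relational database standing in for an existing data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", 2023, 100.0),
        ("North", 2024, 150.0),
        ("South", 2023, 100.0),
    ],
)

# The aggregation is computed by the relational engine at query time;
# no separate multidimensional copy of the data is kept.
rolap_result = conn.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(rolap_result)
```

Because the query runs against the existing table, no extra storage is needed for a cube, but every query pays the cost of the relational aggregation.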

HOLAP often offers a good compromise between ROLAP and MOLAP.

A fourth type of architecture is called DOLAP ("D" for "desktop"). The base data is first imported into the analysis client so that the analysis can be carried out locally. A possible disadvantage is that the client hardware may be underpowered for this task. With DOLAP, the time-consuming step is not evaluating the data but creating and refreshing the cubes.

Another, increasingly popular type is memory-based OLAP. Here, all data is stored in RAM and all values are calculated in real time. In the past, this technology was limited in the amount of data it could handle. Due to the increasing prevalence of 64-bit computer architectures (see 4 GB limit), large amounts of data can now be analyzed with memory-based OLAP.

OLAP tools are typically characterized by multidimensionality. It allows relevant business indicators (e.g. sales or cost figures) to be viewed and evaluated along different dimensions (e.g. customers, regions, time). OLAP cubes are used for visual representation. These cubes are divided into dimensions, which in turn are divided into elements. The elements form a consolidation tree or, more generally, a directed acyclic graph that represents the aggregations.
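How the elements of a dimension form a graph that defines the aggregations can be shown with a minimal sketch. The time dimension below, including the element `YTD-Feb`, and the function `rollup` are hypothetical; months that feed both a quarter and the year-to-date element are what make the structure a directed acyclic graph rather than a plain tree.

```python
# Hypothetical time dimension: each consolidated element lists its children.
children = {
    "2023": ["2023-Q1", "2023-Q2"],
    "2023-Q1": ["2023-01", "2023-02", "2023-03"],
    "2023-Q2": ["2023-04"],
    "YTD-Feb": ["2023-01", "2023-02"],  # shares children with 2023-Q1
}

# Values exist only for the base (leaf) elements.
leaf_values = {
    "2023-01": 10.0,
    "2023-02": 20.0,
    "2023-03": 30.0,
    "2023-04": 5.0,
}


def rollup(element, children, leaf_values, cache=None):
    """Return the value of an element by summing along the hierarchy."""
    if cache is None:
        cache = {}
    if element in cache:
        return cache[element]
    if element in children:
        value = sum(
            rollup(child, children, leaf_values, cache)
            for child in children[element]
        )
    else:
        value = leaf_values.get(element, 0.0)
    cache[element] = value
    return value


print(rollup("2023-Q1", children, leaf_values))
print(rollup("2023", children, leaf_values))
print(rollup("YTD-Feb", children, leaf_values))
```

The cache ensures that an element reachable through several parents is computed only once per call.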

Requirements for an OLAP system

12 rules according to Codd

The term OLAP was coined in 1993 by Edgar F. Codd. He initially formulated 12 rules, which he later expanded to 18. These evaluation rules were the first list of requirements for an OLAP system. Today their importance for evaluating an OLAP system is no longer rated particularly highly, mainly because they are strongly application-oriented and some of them are controversial. The rules arose from a collaboration with the company Arbor, which had shortly before introduced the OLAP database Essbase; Essbase is now further developed and marketed by Oracle under the product name Hyperion Solutions.

Because of their pioneering status, the rules are often quoted:

  1. Multidimensional conceptual view of the data (most important criterion for OLAP)
  2. Transparency (clear separation between user interface and the underlying architecture)
  3. Access options (obtaining basic data from external or operational databases)
  4. Consistent reporting performance (reporting functionality as fast as possible)
  5. Client-server architecture (load distribution optimized for the purpose)
  6. Generic dimensionality (all dimensions uniform in their structure and functionality)
  7. Dynamic handling of sparsely populated matrices (dynamic memory structure adaptation)
  8. Multi-user support
  9. Unconstrained cross-dimensional operations
  10. Intuitive data analysis (direct navigation within the data cubes)
  11. Flexible reporting (results can be freely arranged in the report)
  12. Unlimited number of dimensions and consolidation levels (15 to 20 dimensions with any number of aggregation levels)
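Rule 7 concerns sparsely populated matrices: in most cubes only a small fraction of the possible cells holds a value. One simple way to handle this, shown here purely as an illustration and not as the technique of any particular product, is to store only the populated cells. The dimension sizes, the sample cells, and the helper names are assumptions.

```python
# Hypothetical cube with three dimensions and their number of elements.
dimension_sizes = {"customer": 1000, "product": 500, "day": 365}

# Only populated cells are stored, keyed by their coordinates.
sparse_cells = {
    (17, 42, 3): 99.5,
    (17, 42, 4): 12.0,
    (803, 7, 200): 250.0,
}


def get_cell(cells, coordinate, default=0.0):
    """Return the stored value, or a default for an empty cell."""
    return cells.get(coordinate, default)


def density(cells, dimension_sizes):
    """Fraction of all possible cells that actually hold a value."""
    possible = 1
    for size in dimension_sizes.values():
        possible *= size
    return len(cells) / possible


print(get_cell(sparse_cells, (17, 42, 3)))
print(get_cell(sparse_cells, (1, 1, 1)))
print(density(sparse_cells, dimension_sizes))
```

A dense array for the same cube would reserve 182,500,000 cells, whereas the dictionary grows only with the number of populated cells.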

FASMI rules according to Pendse and Creeth

In 1995, Pendse and Creeth (see Pendse, Creeth in the literature list) presented five vendor-independent evaluation rules under the acronym FASMI in order to describe the OLAP concept. FASMI stands for "Fast Analysis of Shared Multidimensional Information" and states in detail:

  1. Fast: Queries should take no more than five seconds on average. Simple queries should not take longer than one second, and only a few, more complex queries should need up to 20 seconds of processing time.
  2. Analysis: An OLAP system should be able to cope with any required logic. The user should be able to define even a more complex analysis query with little programming effort.
  3. Shared: An OLAP system should be designed for multi-user operation. This requires suitable access protection mechanisms.
  4. Multidimensional: As the main criterion, Pendse and Creeth demand a multidimensional structuring of the data with full support for dimension hierarchies.
  5. Information: During the analysis, all necessary data should be transparently available to the user. An analysis must not be constrained by limitations of the OLAP system.
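The response-time thresholds of the "Fast" rule can be expressed as a small check. This is only a sketch: the function name and the split of measurements into "simple" and "complex" queries are assumptions, and the requirement that only a few queries be complex is not modelled.

```python
from statistics import mean


def meets_fast_criterion(simple_times, complex_times):
    """Check measured query times (in seconds) against the FASMI thresholds.

    Assumed reading of the rule: the average over all queries is at most
    5 s, every simple query takes at most 1 s, and every complex query
    takes at most 20 s.
    """
    all_times = list(simple_times) + list(complex_times)
    if not all_times:
        return False  # nothing measured, nothing demonstrated
    return (
        mean(all_times) <= 5.0
        and all(t <= 1.0 for t in simple_times)
        and all(t <= 20.0 for t in complex_times)
    )


print(meets_fast_criterion([0.2, 0.8], [12.0]))
print(meets_fast_criterion([0.2, 1.5], [3.0]))
```

The first call satisfies all three thresholds; the second fails because one simple query exceeds one second.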

In summary, the FASMI rules address user requirements more than technical requirements. Overall, however, they are less specific than Codd's rules, which is why considerably more systems qualify as OLAP under this definition.

Market overview

The OLAP Report 2006 divides the international OLAP market as follows:

Provider                  Market share   Comment
Microsoft Corporation     31.6%
Hyperion Solutions        18.9%          since taken over by Oracle
Cognos                    12.9%          since taken over by IBM
Business Objects          7.3%           since taken over by SAP
MicroStrategy             7.3%
SAP AG                    5.8%
Cartesis                  3.7%           since taken over by Business Objects, which was then taken over by SAP
Applix                    3.6%           since taken over by Cognos, which was then taken over by IBM
Infor Global Solutions    3.5%           after the takeover of MIS GmbH ("ALEA")
Oracle Corporation        2.8%
Digital Equipment         0.2%           since taken over by HP

In addition, the open source software field includes the competitors Mondrian from Pentaho and Palo from Jedox, a company based in Freiburg im Breisgau.

Literature

  • Nils Clausen: OLAP - Multidimensional Databases. Addison-Wesley-Longman, Bonn 1998, ISBN 3-8273-1402-X.
  • Edgar F. Codd, S. B. Codd, C. T. Salley: Providing OLAP to User-Analysts: An IT Mandate. Codd & Associates, Ann Arbor, Michigan 1993 (uni-jena.de [PDF; 124 kB]).
  • Bernd Held, Hartmut Erb: Advanced Controlling with Excel. Corporate management with OLAP and PALO. Franzis, Poing 2006, ISBN 3-7723-7585-5.
  • Hartmut Messerschmidt, Kai Schweinsberg: OLAP with the SQL server. An introduction to theory and practice. dpunkt, Heidelberg 2003, ISBN 3-89864-240-2.
  • Nigel Pendse, Richard Creeth: The OLAP Report. In: Business Intelligence. 1995 (bi-verdict.com).
  • Carsten Bange et al.: OLAP & BI - 8 multidimensional databases and 17 reporting and analysis tools in comparison. Oxygon Verlag, Munich 2005, ISBN 3-937818-05-7.


References

  1. Peter Gluchowski, Peter Chamoni: Lines of Development and Architectural Concepts of On-Line Analytical Processing. In: Analytical Information Systems: Business Intelligence Technologies and Applications. 4th, completely revised edition, 2010, pp. 200–202
  2. http://www.minet.uni-jena.de/dbis/lehre/ss2005/sem_dwh/lit/Cod93.pdf
  3. http://www.mendeley.com/research/providing-olap-online-analytical-processing-to-useranalysts-an-it-mandate/
  4. E. F. Codd, S. B. Codd, C. T. Salley: Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate. In: Codd and Date. Vol. 32, 1993 (English).
  5. A. Bauer, H. Günzel: Data Warehouse Systems. 2nd edition. dpunkt-Verlag, Heidelberg 2004, ISBN 3-89864-251-8.
  6. Nigel Pendse: Market share analysis. The OLAP market grew faster than predicted in 2006. In: The OLAP Report. April 10, 2007, accessed May 10, 2007.