Formal concept analysis

from Wikipedia, the free encyclopedia

The Formal Concept Analysis (abbreviated FBA ) (English Formal Concept Analysis , abbreviated FCA ) is a mathematical theory. It provides a method to recognize relationships in given amounts of data. Hierarchical relationships between terms are typical . The formal conceptual analysis can be understood as applied order and association theory.

The formal concept analysis is used in practice e.g. B. in data and text mining , knowledge management , semantic web , software engineering , economics and bioinformatics .

introduction

The formal concept analysis examines relationships in data collections and makes structures in the data clear. Here are objects (eg. As described by records) combined into groups based on common characteristics. Such groups are then further subdivided on the basis of further characteristics. This results in a hierarchical structure that can be illustrated in the form of a network-like order diagram. The aim is a mathematically based methodology that suits the conceptual thinking of humans.

Scope, content and concept

Every set of objects determined by common features is interpreted as a conceptual scope, the associated set of all common features as conceptual content . Both parts together, i.e. a scope and the associated content, form a formal term , whereby the addition “formal” indicates that it is a mathematical construction. A formal term is therefore always clearly defined both by its scope and by its content.

General and sub-concept

A formal term is a sub- term of a second formal term if its scope is entirely contained in the scope of the second. Then the content of the generic term (i.e. the term with the larger scope) is contained in the content of the sub-term.

Relation to the mathematical association theory

This sub-term-generic term order of the formal terms turns out to be an order structure . As a rule, it is network-like, i.e. not usually tree-like or even linear. But it can be proven that these systems have specific and well-studied characteristics: it always involves so-called complete lattices (ger .: complete lattices).

A term can have several generic terms, e.g. B. the term raptor combines the characteristics of both its generic term bird and those of its second generic term predator .

Emergence

The theory in its current form goes back to the Darmstadt research group around Rudolf Wille , Bernhard Ganter and Peter Burmeister , in which the formal concept analysis was developed in the early 1980s. However, the mathematical foundations were already created by Garrett Birkhoff in the 1930s within the framework of general association theory . Before the work of the Darmstadt group there were already approaches in various French groups. The writings of Charles S. Peirce and Hartmut von Hentig influenced the development of the formal conceptual analysis .

Motivation and philosophical background

In the article Restructuring Lattice Theory (1982), which established the formal conceptual analysis as a discipline, the discomfort with association theory and pure mathematics in general is named as a motivation: The production of theoretical results often achieved through "intellectual high-performance sport" is impressive, the links between neighboring ones However, areas and even parts of a theory would become weaker.

"The restructuring of association theory is an attempt to re-strengthen links to our general culture by interpreting the theory as concretely as possible and thereby promoting better communication between association theorists and potential users of association theory."

- Rudolf Wille : Restructuring lattice theory: An approach based on hierarchies of concepts

This goal relates to Hartmut von Hentig, who in 1972 called for a restructuring of the sciences “in order to make them easier to learn, mutually available and more generally (ie beyond professional competence) open to criticism.“ democratic control of research.

While a concept is reduced to its scope as a single-digit predicate in formal logic , formal concept analysis makes the conceptual theory less abstract again by also taking into account the conceptual content. Thus, the formal concept analysis is based on the categories of extension and intension of linguistics and the classical conceptual logic .

Clarity of terms is striven for in the sense of Charles S. Peirce's Pragmatic Maxime in that observable, elementary properties of the subsumed objects are developed. In his late philosophy, Peirce assumed that logical thinking aims to grasp reality through the three steps of concept, judgment and conclusion . Mathematics abstracts logical thinking, develops forms of possible reality and can therefore support rational communication . Against this background, Rudolf Wille defines:

“The aim and meaning of formal concept analysis as a mathematical theory of concepts and concept hierarchies is to support the rational communication of people by developing mathematically suitable conceptual structures that can be logically activated.”

- Rudolf Wille : Formal Concept Analysis as Mathematical Theory of Concepts and Concept Hierarchies

Historically, however, a comparable motivation was the basis for the projects to develop analytical planned languages, cf. also universal language .

example

The following example, which is deliberately kept small, serves to explain the basics of formal concept analysis. It is part of a more extensive word field study in which bodies of water were systematized on the basis of characteristics. The example dealt with there has been reduced somewhat for the purposes here.

In the following, in addition to the table, the graphic that can be constructed from it, the line diagram , is shown.

Example of a formal context: "Waters"
Waters features
temporarily fluently Naturally standing constant maritime
G
e
g
e
n
s
t
like
n
d
e
Brook X X X
flow X X X
Haff X X X X
channel X X
laugh X X X
Puddle X X X
puddle X X X
trickle X X X
electricity X X X
Maar X X X
sea X X X X
lake X X X
pond X X X
Pool X X X
Pond X X

 

Line diagram according to the table of waters on the left.

Output data in tabular form

For an analysis, the data to be examined must be available in tabular form or converted into such a form.

In the table Example for a formal context: "Waters" , various types of water are listed as formal objects in rows. The corresponding formal characteristics determine the columns of this table.

If an object has a certain characteristic, there is a mark at the intersection in the respective row and column, usually an "X". If it does not have this characteristic or it is unclear whether it has this characteristic, then this marking is missing.

Real world and formal structures

If one wants to emphasize the difference between real objects and features on the one hand and their abstractions (the data in the table) on the other hand, one speaks of formal objects and formal features when referring to abstractions . Similarly, the entirety of the entire table is referred to as a formal context . Formal terms are introduced in more detail later .

Often the formal objects correspond to real objects in the world and the formal characteristics correspond to their real qualitative or quantitative properties. A formal object can also represent an abstraction - like the types of water in the example above. A formal feature can also represent an abstraction.

Line diagram

The above line graph includes circles and connecting lines. Circles represent formal terms. The sub-term-generic term hierarchy can be read from the lines.

With the reduced labeling used here , each item name and each feature name is entered exactly once in the diagram, with items below and features above concept circles. This is done in such a way that a feature can be reached from an object via an ascending line if the object has the characteristic.

In the diagram shown, e.g. B. the subject pond has the features standing and constant , but not the features temporary, natural, flowing and maritime . Accordingly, pools and puddles have exactly the characteristics temporary, standing and natural .

For each term you can read its scope and content on the line diagram. The scope of a concept consists of the objects from which an ascending line leads to the circle of the concept. The term, which is immediately to the left of the pond in the diagram , has the content standing and natural and the scope of pool, puddle, pond, maar, lake, pond, pond, lagoon and sea .

Mathematical basics

Associations of terms are well suited to organize and present data in such a way that they can be easily understood even without prior mathematical training. The mathematical fundamentals are briefly presented here.

Formal contexts and formal terms

Let two sets and a relation be given . The triple is then referred to as a formal context , as its set of objects and as its set of features ; for an object and a characteristic means "the object has the characteristic ". Often written instead of . The amount is called the incidence relation of the formal context.

If the quantities and finite, formal contexts can be good in the form of "cross-tabs" represent. Please note that the objects and features in this representation can be arranged arbitrarily. This order is then not part of the formal context, but only its representation.

If a set of objects is in a formal context , it is designated with

the set of common characteristics of the objects in . Is defined similarly for a lot of characteristics of the quantity

all objects that possess all characteristics . The sets and are called the derivatives of the corresponding sets and and the functions, which are both named with , are called derivative operators of .

The derivative operators have a number of very basic properties. If there are quantities of objects and quantities of features, then applies

  • and dual ,
  • and dual ,
  • and ,
  • .

In fact, the derivative operators thus define an antitonic Galois connection between the power set lattices of the object set and the feature set. Conversely, every such Galois connection between power set lattices can be represented as a pair of derivative operators of a formal context.

In a formal context , a pair is now called a formal concept of , if

  • a set of objects of is
  • a set of features of is
  • and
  • applies.

The amount is then called the scope and the amount is called the content of the term . The set of all terms is denoted by. If formal contexts are presented as crosstabs, formal terms can be understood as maximum, completely filled rectangles in this crosstab, given a suitable order of the objects and features .

Are now , so can be with

to define a partial order . This order then makes the structure a complete association . In fact, conversely, according to the main theorem of formal concept analysis, every complete lattice is order isomorphic to a concept lattice.

Associations of terms can be displayed as order diagrams (line diagrams) and thus unfold the data in terms of their structure and context. The objects have all features (connected by edges) above them; In the example opposite, 4 is straight, compound and square.

Mathematically more precisely, the simplified labeling of terms can be justified first. If one considers the set of all concepts for an object that have in its scope, then this set has a main filter in the concept lattice. Therefore , the subject is only noted below the smallest term contained in the scope . Dual to this , the characteristic is noted above the largest term that has a given characteristic in its content . A term in the order diagram has an object in its scope exactly when it is above the term that is labeled with the object. Correspondingly, a term in the order diagram has a feature in its content if it is below the term that is labeled with the feature.

Main Law of Formal Concept Analysis

It is a formal context and its conceptual association. One can for articles and features then the terms

consider. The concept of the object of and the concept of the feature of are mentioned. Furthermore applies

If now is a complete lattice, then is isomorphic to if and only if there are mappings such that

applies. In particular is isomorphic to .

The main clause is also known as Rudolf Willes main clause on concept lattices. Among other things, it states that every complete lattice is isomorphic to a concept lattice.

Implication theory of formal contexts

For a formal context , its implication theory can be examined. There is an implication of simply a pair with what is usually written with. It is said that in holds if every object that has all the features of also has all the features of , so if holds. This condition is equivalent to that .

If a set of implications is and is , then one designates with the smallest set that contains and is closed under . A set is called closed under if it is always or holds for all implications, i.e. if it is always implied. You can then see that the mapping is an envelope operator on the power set of .

Is an implication of so follows from case applies. This is equivalent to the fact that in every formal context in which all implications from apply, the implication always applies.

A basis for is then a set of valid implications from , so that every ( semantically ) valid implication from already follows from , by applying suitable syntactic inference rules such as Armstrong's rules . The set of all implications of closed in this new sense is a theory , since it can also be fulfilled according to the construction, for example with regard to the underlying context .

The basis is called irredundant if it is -minimal with this property. An example of an irredundant basis is the canonical basis (see also feature exploration ), which also has the property of being minimal with regard to the size of the basis.

It is true that a set of implications is a basis of a context if and only if the set of the closed sets under is exactly that of the contents of .

Feature exploration

It is possible to present the implication theory of a certain subject area with the help of a formal context. In particular, this means that one can do this with the help of a sufficient number of examples, which then become the objects of the formal context. In theory, such a set of examples can be given by a human expert or even a machine.

However, the problem arises here that it is neither guaranteed from the outset that a sufficient number of examples is given, nor that some generated examples are redundant, since examples already given are sufficient. From the point of view that generating good examples is difficult, questioning experts or even new experiments are expensive, and literature searches or algorithms can be time-consuming, this is a problem to be taken seriously.

The feature exploration algorithm can help here. Based on an already known set of implications and an already known set of examples from the subject area, the algorithm suggests implications that can then be accepted or rejected by an expert (human or not). An implication should be accepted precisely if it is valid in the subject area in question. If an implication is rejected, the expert has to generate a counterexample, which can then be accepted or rejected by an expert (human or not). An implication should be accepted precisely if it is valid in the subject area in question. By an accepted counterexample, the implication is refuted and thus the smallest possible amount of accepted implications is generated, which in the end fully describes the topic. In addition, the set of examples is also completed.

Application experience with the formal concept analysis

The formal concept analysis can be used as a qualitative method for data analysis. Since the early beginnings of formal concept analysis at the beginning of the 1980s, the research group for formal concept analysis at the TU Darmstadt has gained experience from more than 200 projects in which formal concept analysis was applied (as of 2005). Including from the areas: medicine and cell biology , genetics , ecology , software technology , ontology (computer science) , information and library science , office organization , law , linguistics , political science .

Many other application examples are z. B. described in: Formal Concept Analysis. Foundations and Applications , the proceedings of regularly held conferences such as: International Conference on Formal Concept Analysis (ICFCA), Concept Lattices and their Applications (CLA) or International Conference on Conceptual Structures (ICCS).

literature

Web links

References and comments

  1. ^ Rudolf Wille: Restructuring Lattice Theory: An Approach Based on Hierarchies of Concepts . In: I. Rival (Ed.): Ordered Sets: Proceedings of the NATO Advanced Study Institute Held at Banff, Canada . 1982.
  2. See section # Application experiences with the formal concept analysis for examples
  3. A sub-term arises when a formal term is further specified by additional features. It follows that the objects that belong to the sub-term also belong to the generic term. Conversely, all features of the generic term are also features of the subordinate term.
  4. a b Rudolf Wille: Restructuring lattice theory: An approach based on hierarchies of concepts. Reprinted in: ICFCA '09: Proceedings of the 7th International Conference on Formal Concept Analysis, Berlin, Heidelberg, 2009, p. 314. "Geistiger Hochleistungssport": "elaborate mental gymnastics".
  5. Hartmut von Hentig: Magician or Magister? About the unity of science in the process of understanding . 1st edition. Suhrkamp-Taschenbuch-Verlag, Frankfurt am Main 1974, ISBN 3-518-06707-9 . Quoted from Karl Erich Wolff: Order, Will and Concept . Ernst Schröder Center for Conceptual Knowledge Processing, Darmstadt 2003 ( fbmn.h-da.de [MS WORD; 2.0 MB ]). fbmn.h-da.de ( Memento of the original from September 12, 2015 in the Internet Archive ) Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice.  @1@ 2Template: Webachiv / IABot / www.fbmn.h-da.de
  6. ^ A b Johannes Wollbold: Attribute Exploration of Gene Regulatory Processes. (PDF; 4.6 MB) Doctoral thesis, University of Jena 2011. Digital Library Thuringia, p. 9 , accessed on November 14, 2015 (English).
  7. ^ Rudolf Wille: Formal Concept Analysis as Mathematical Theory of Concepts and Concept Hierarchies . In: B. Ganter et al .: Formal Concept Analysis. Foundations and Applications , 2005, pp. 1f.
  8. Peter Rolf Lutzeier: Word and field: word semantic issues with special consideration of the word field concept . Dissertation (=  linguistic work . Volume 103 ). Niemeyer, Tübingen 1981, OCLC 8205166 , doi : 10.1515 / 9783111678726.fm .
  9. a b Bernhard Ganter, Rudolf Wille: chap. 1 "Conceptual associations of contexts" . In: Formal Concept Analysis. Mathematical basics . Springer, Heidelberg 1996, ISBN 3-540-60868-0 , urn : nbn: de: 1111-20111002812 .
  10. Bernhard Ganter: Discrete Mathematics: Ordered Sets. 2013, p. 66, pp. 167-168.
  11. ^ WW Armstrong: Dependency structures of data base relationships . In: IFIP Congress . Geneva 1974, p. 580-583 .
  12. a b Bernhard Ganter, Gerd Stumme, Rudolf Wille (eds.): Formal Concept Analysis. Foundations and Applications . Springer Science & Business Media, Berlin Heidelberg 2005, ISBN 3-540-27891-5 , doi : 10.1007 / 978-3-540-31881-1 ( books.google.de [accessed on November 14, 2015]).
  13. Susanne Motameny, Beatrix Versmold, Rita Schmutzler: Formal Concept Analysis for the Identification of Combinatorial Biomarkers in Breast Cancer . In: Raoul Medina, Sergei Obiedkov (ed.): ICFCA 2008 (=  LNAI ). tape 4933 . Springer, Berlin Heidelberg 2008, ISBN 978-3-540-78136-3 , p. 229–240 ( springer.com [accessed January 29, 2016]).
  14. Dominik Endres, Ruth Adam, Martin A. Giese, Uta Noppeney: Understanding the Semantic Structure of Human fMRI Brain Recordings with Formal Concept Analysis . In: Florent Domenach, Dmitry I. Ignatov, Jonas Poelmans (eds.): ICFCA 2012 (=  LNCS ). tape 7278 . Springer, 2012, ISBN 978-3-642-29891-2 , ISSN  0302-9743 , p. 96-111 , doi : 10.1007 / 978-3-642-29892-9 .
  15. Denis Ponomaryov, Nadezhda Omelianchuk, Victoria Mironova, Eugene Zalevsky, Nikolay Podkolodny, Eric Mjolsness, Nikolay Kolchanov: From Published Expression and Phenotype Data to Structured Knowledge: The Arabidopsis Gene Net Supplementary Database and Its Applications . In: Karl Erich Wolff, Dmitry E. Palchunov, Nikolay G. Zagoruiko, Urs Andelfinger (eds.): KONT 2007, KPP 2007 (=  LNCS ). tape 6581 . Springer, 2011, ISBN 978-3-642-22139-2 , ISSN  0302-9743 , p. 101-120 , doi : 10.1007 / 978-3-642-22140-8 .
  16. Mehdi Kaytoue, Sergei Kuznetsov, Amedeo Napoli, Sébastien Duplessis: Mining gene expression data with pattern structures in formal concept analysis . In: Information Sciences . tape 181 , no. 10 . Elsevier, 2011, p. 1989–2001 , doi : 10.1016 / j.ins.2010.07.007 ( hse.ru [PDF; accessed February 13, 2016]).
  17. ^ Aurélie Bertaux, Florence Le Ber, Agnès Braud, Michèle Trémolières: Identifying Ecological Traits: A Concrete FCA-Based Approach . In: Sébastien Ferré, Sebastian Rudolph (Ed.): ICFCA 2009 (=  LNAI ). tape 5548 . Springer-Verlag, Berlin Heidelberg 2009, ISBN 978-3-642-01814-5 , p. 224-236 , doi : 10.1007 / 978-3-642-01815-2 .
  18. Gregory Snelting, Frank Tip: Reengineering class hierarchies using concept analysis . In: Proceeding. SIGSOFT '98 / FSE-6 . tape 23 , no. 6 . ACM, New York 1998, ISBN 1-58113-108-9 , pp. 99–110 , doi : 10.1145 / 291252.288273 ( dl.acm.org [accessed February 4, 2016]).
  19. Gerd Stumme, Alexander Maedche: FCA-Merge: Bottom-up merging of ontologies . In: Uni Leipzig (Ed.): IJCAI . Leipzig 2001, p. 225–230 ( se-pubs.dbs.uni-leipzig.de [PDF; accessed on February 13, 2016]).
  20. ^ Uta Priss: Formal Concept Analysis in Information Science . In: American Documentation Institute (Ed.): Annual Review of Information Science and Technology . tape 40 , no. 1 . Information Today Inc., 2006, ISSN  0066-4200 , p. 521–543 , doi : 10.1002 / aris.1440400120 ( upriss.org.uk [PDF; accessed February 4, 2016]).
  21. Jens Illig, Andreas Hotho, Robert Jäschke, Gerd Stumme: A Comparison of Content-Based Tag Recommendations in Folksonomy Systems . In: Karl Erich Wolff, Dmitry E. Palchunov, Nikolay G. Zagoruiko, Urs Andelfinger (eds.): KONT 2007, KPP 2007 (=  LNCS ). tape 6581 . Springer, 2011, ISBN 978-3-642-22139-2 , ISSN  0302-9743 , p. 136-149 , doi : 10.1007 / 978-3-642-22140-8 .
  22. ^ Claudio Carpineto, Giovanni Romano (Eds.): Concept Data Analysis: Theory and Applications . John Wiley & Sons, 2004, ISBN 0-470-85055-8 ( eu.wiley.com [accessed February 4, 2016]).
  23. ^ Richard Cole, Gerd Stumme: CEM - A Conceptual Email Manager . In: Bernhard Ganter, Guy W. Mineau (eds.): Conceptual Structures: Logical, Linguistic, and Computational Issues (=  LNAI ). tape 1867 . Springer-Verlag, Berlin Heidelberg 2000, ISBN 3-540-67859-X , p. 438-452 , doi : 10.1007 / 10722280 .
  24. Dieter Eschenfelder, Wolfgang Kollewe, Martin Skorsky, Rudolf Wille: An exploration system for building law: Methods of developing a TOSCANA system . In: Gerd Stumme, Rudolf Wille (Hrsg.): Conceptual knowledge processing - methods and applications . Springer, Berlin Heidelberg 2000, ISBN 3-540-66391-6 , pp. 254-272 , doi : 10.1007 / 978-3-642-57217-3_12 .
  25. Nada Mimouni, Adeline Nazarenko; Sylvie Salotti: A Conceptual Approach for Relational IR: Application to Legal Collections . In: Jaume Baixeries, Christian Sacarea, Manuel Ojeda-Aciego (Ed.): ICFCA 2015 (=  LNAI ). tape 9113 . Springer, 2015, ISBN 978-3-319-19544-5 , ISSN  0302-9743 , p. 303-318 , doi : 10.1007 / 978-3-319-19545-2_19 .
  26. ^ Uta Priss: Linguistic Applications of Formal Concept Analysis . In: Bernhard Ganter , Gerd Stumme, Rudolf Wille (Eds.): Formal Concept Analysis - Foundations and Applications (=  LNCS ). tape 3626 . Springer, 2005, ISBN 3-540-27891-5 , ISSN  0302-9743 , p. 149-160 , doi : 10.1007 / 978-3-540-31881-1 .
  27. Beate Kohler-Koch , Frank Vogt: Norms and Rules Guided International Cooperation . "Quoted from: Peter Becker et al. The ToscanaJ Suite for Implementing Conceptual Information Systems". In: Gerhard Stumme, Rudolf Wille (Ed.): Conceptual knowledge processing - methods and applications . Springer, Berlin, Heidelberg, New York 2000, ISBN 978-3-540-66391-1 , pp. 325-340 .
  28. ^ International Conference on Formal Concept Analysis. dblp , accessed February 14, 2016 .
  29. ^ CLA: Concept Lattices and Their Applications. Conference homepage with open access articles from all conferences since 2004. CLA, accessed on November 14, 2015 (English).
  30. International Conferences On Conceptual Structures - Conferences and Workshops. New Mexico State University, accessed February 14, 2016 .