Feature exploration

The attribute exploration is an interactive algorithm Formal Concept Analysis for finding implications between characteristics of data . The algorithm calculates a complete and minimal root base from which all the implications applicable in the area under investigation can be inferred.

idea

The aim of the feature exploration is to describe all possible feature combinations in a special subject area, i.e. H. to find the feature logic. This subject area is given by a formal context, the basic data structure of the formal concept analysis . This crosstab shows relationships between items and features. With the help of implications - in an extension also clauses - dependencies and connections on the part of the characteristics are explained.

The main task is to determine a root base . This is based on a collection of examples that are given in a partial context. In a clearly defined order, the algorithm proposes to an expert (or a supporting program) the implications that apply in the sub-context. If the expert accepts them, they are included in the trunk base. Otherwise, the expert has to provide a counterexample, i.e. H. a new line of context. In this way even infinite sets of objects (with a finite set of features) can be examined.

It is also possible to automatically calculate the root base for a given data set; then all suggested implications are accepted.

Mathematical basics

Pseudo shells

Let be a hull operator on the (finite) feature set M. A subset is called a pseudo hull if and only if it fulfills the following conditions: ${\ displaystyle X \; \ mapsto \; X ''}$ ${\ displaystyle P \; \ subseteq \; M}$

${\ displaystyle P \; \ neq \; P ''}$ ,
${\ displaystyle P}$ contains the hull of each pseudo-hull that is a proper subset of . ${\ displaystyle Q ''}$ ${\ displaystyle Q}$ ${\ displaystyle P}$

Implications and stem basis

One implication on a set M is a pair of subsets of M. Notation: . ${\ displaystyle A \; \ rightarrow \; B}$

A subset respects an implication if or holds. ${\ displaystyle T \; \ subseteq \; M}$ ${\ displaystyle A \; \ rightarrow \; B}$ ${\ displaystyle A \; \ not \ subseteq \; T}$ ${\ displaystyle B \; \ subseteq \; T}$

${\ displaystyle A \; \ rightarrow \; B}$ applies in a formal context if every subject matter (i.e. every line of the context) respects the implication. ${\ displaystyle (G, \; M, \; I)}$ ${\ displaystyle g '}$

An implication follows from a set of implications by definition if and only if holds in every formal context in which every implication from is also valid. ${\ displaystyle A \; \ rightarrow \; B}$ ${\ displaystyle {\ mathcal {L}}}$ ${\ displaystyle A \; \ rightarrow \; B}$ ${\ displaystyle {\ mathcal {L}}}$

If one looks at the set of all implications that apply to a context, the question arises whether there is a subset that represents all these implications. Such a base must meet the following conditions: ${\ displaystyle {\ mathcal {L}}}$

Every implication from applies in the formal context ( correctness ). ${\ displaystyle {\ mathcal {L}}}$

Any implication that holds in context follows from ( completeness ). ${\ displaystyle {\ mathcal {L}}}$

No implication follows from the other implications, i.e. from ( irredundancy ). ${\ displaystyle A \ rightarrow B \ in {\ mathcal {L}}}$ ${\ displaystyle {\ mathcal {L}} \ setminus \ {A \ rightarrow B \}}$

Duquenne and Guigues have shown that such a basis exists for finite sets of features. It is even uniquely determinable and of minimal cardinality under all possible bases of implications. There are different names for this: root base , canonical base or DG base.

The root base of a formal context is the set of implications . ${\ displaystyle (G, \; M, \; I)}$ ${\ displaystyle \ left \ {P \ mapsto P '' \ mid P {\ text {is pseudo content of}} (G, M, I) \ right \}}$

example

Formal context (incomplete) for the position of two squares in the plane.

Term association after completion of feature exploration.

The starting point is a mostly incomplete or empty sub-context.
The algorithm suggests valid implications in the sub-context . The expert can accept it or give a counterexample: ${\ displaystyle P \; \ longrightarrow \; P ''}$

ce → pa, cv, cs: accepted
cs → pa: accepted
pa, cv, cs → ce: accepted
ov, cv → pa, cs, ce: counterexample - new object with features ov and cv
ov, pa, cs → cv, ce: counterexample - new object with features ov, pa and cs
ov, pa, cv → cs, ce: accepted
di, cv → all characteristics (contradicting premise, the corresponding set of objects is empty): accepted
di, pa, cs → all characteristics: accepted
di, ov → all characteristics: accepted

The implications accepted by the expert form the base of the roots.
The term content of the extended context and thus the term association are clearly determined by this.

Exploration with background knowledge

Formal context for passing a driving test.

The trunk base for the example opposite is

${\ displaystyle P ^ {-} \ longrightarrow \; G ^ {-}}$
${\ displaystyle T ^ {-} \ longrightarrow \; G ^ {-}}$
${\ displaystyle G ^ {-} P ^ {+} \ longrightarrow \; T ^ {-}}$
${\ displaystyle G ^ {-} T ^ {+} \ longrightarrow \; P ^ {-}}$
${\ displaystyle T ^ {+} P ^ {+} \ longrightarrow \; G ^ {+}}$
${\ displaystyle G ^ {+} \ longrightarrow \; T ^ {+} P ^ {+}}$
${\ displaystyle G ^ {-} T ^ {-} P ^ {+} P ^ {-} \ longrightarrow \; \ bot}$
${\ displaystyle G ^ {-} T ^ {-} T ^ {+} P ^ {-} \ longrightarrow \; \ bot}$

However, this is a somewhat complicated way of expressing the simple equivalence :

{\ displaystyle G ^ {+} \ longleftrightarrow \; T ^ {+} P ^ {+}}

Only these two implications are obtained as a base base if one transfers the negation of the pass / fail characteristics through the implication and the completeness of the information through the clause to the algorithm as starting knowledge, analogously for the other characteristics P and T. ${\ displaystyle G ^ {+} \ wedge \; G ^ {-} \ longrightarrow \; \ bot}$ ${\ displaystyle \ top \ multimap \; G ^ {+} \ vee G ^ {-}}$

So if you have background knowledge, the feature exploration can be shortened and the base base can be reduced. For example, a set of cumulative clauses can be used that are more expressive than implications. The root base then consists of a set of implications, each with premises P , which are concluded under the background knowledge, that is to say contain all the features that are required by the background knowledge. For implications as background knowledge, this means a repeated application of the operator defined as follows to P : ${\ displaystyle {\ mathcal {L}}}$ ${\ displaystyle \ bigwedge A \ rightarrow \ bigvee \ bigwedge _ {i \ in I} B_ {i}, \; A, \; B_ {i} \; \ in \; M}$ ${\ displaystyle {\ mathcal {B}}}$ ${\ displaystyle {\ mathcal {L}}}$

{\ displaystyle X ^ {\ mathcal {L}}: = X \ cup \ bigcup \ {B \ subseteq M \ mid A \ subseteq X, A \ rightarrow B \ in {\ mathcal {L}} \}}

The valid implications of the underlying context can then be inferred from. However, this master base with regard to a “relative test context” defined by the background knowledge is not clearly determined in every case. ${\ displaystyle {\ mathcal {L}} \ cup {\ mathcal {B}}}$

application areas

With the help of feature exploration, data sets can be checked for completeness, functional dependencies and association rules can be found and presented in compact form, or knowledge bases expressed in description logics can be completed. An application in systems biology aims at process rules for gene regulatory and other networks by integrating knowledge and experimental data . In mathematics, feature exploration is used to structure evidence.

software

Concept Explorer : Easy-to-use, graphical Java program with the most important functions, including generating a lattice of terms. It is possible to enter background implications when calling from your own Java program.

Conexp-ng : Expandable Java implementation of Concept Explorer.

Conexp-clj : New implementation in the Java-based Lisp dialect Clojure . Particularly suitable for command line calls and script-based programming, with extended options such as Luxenburg bases, context automorphisms (symmetries of features), fuzzy FCA or an interface to the computer algebra system Sage .

ConImp : For Linux and DOS / Windows, command line based, limited to formal contexts with 256 objects, but with extended options such as input of background and incomplete knowledge. In contrast to the approach described above and implemented in conexp-clj, background knowledge does not change the algorithm there, but rather suggested implications are decided by inferring a set of background implications, not by the expert.

literature

Bernhard Ganter, Rudolf Wille: Formal concept analysis; Springer, 1996, ISBN 3-540-60868-0
Bernhard Ganter, Sergei Obiedkov: Conceptual Exploration; Springer, 2016, ISBN 978-3-662-49290-1

Web links

Dresden side to the formal concept analysis (English)
There lecture slides (PDF; 2.1 MB) as a detailed introduction to FCA and feature exploration.

Individual evidence

^ Bernhard Ganter, Rudolf Wille: Formal term analysis; Springer, 1996, ISBN 3-540-60868-0 , Theorem 8, p. 85
^ Bernhard Ganter: Finger Exercises in Formal Concept Analysis (PDF; 2.1 MB); Lecture slides, Dresden 2006, pp. 85–87.
↑ Gerd Stumme, Attribute Exploration with Background Knowledge and Exceptions. In: HH Bock u. a., Data Analysis and Information Systems . Springer 1996, pp. 457-469. Preprint (PDF; 219 kB)
^ Bernhard Ganter: Contextual Attribute Logic of Many-Valued Attributes. In Bernhard Ganter, Gerd Stumme, Rudolf Wille (eds.): Formal Concept Analysis. Foundations and Applications ; Springer, 2005, ISBN 3-540-27891-5 , p. 107. Online version
↑ Franz Baader and Barıs Sertkaya: Usability Issues in Description Logic Knowledge Base completion. In S. Ferre and S. Rudolph (Eds.): Formal Concept Analysis: 7th International Conference, ICFCA 2009 , LNAI 5548; Springer, Heidelberg 2009, p. 1-21.
↑ Johannes Wollbold: Attribute Exploration of Gene Regulatory Processes (PDF; 4.6 MB) . Doctoral thesis, University of Jena 2011.

[1] Bernhard Ganter, Rudolf Wille: Formal term analysis; Springer, 1996, ISBN 3-540-60868-0 , Theorem 8, p. 85

[fingerEx-2] Bernhard Ganter: Finger Exercises in Formal Concept Analysis (PDF; 2.1 MB); Lecture slides, Dresden 2006, pp. 85–87.

[3] Gerd Stumme, Attribute Exploration with Background Knowledge and Exceptions. In: HH Bock u. a., Data Analysis and Information Systems . Springer 1996, pp. 457-469. Preprint (PDF; 219 kB)

[4] Bernhard Ganter: Contextual Attribute Logic of Many-Valued Attributes. In Bernhard Ganter, Gerd Stumme, Rudolf Wille (eds.): Formal Concept Analysis. Foundations and Applications ; Springer, 2005, ISBN 3-540-27891-5 , p. 107. Online version

[5] Franz Baader and Barıs Sertkaya: Usability Issues in Description Logic Knowledge Base completion. In S. Ferre and S. Rudolph (Eds.): Formal Concept Analysis: 7th International Conference, ICFCA 2009 , LNAI 5548; Springer, Heidelberg 2009, p. 1-21.

[6] Johannes Wollbold: Attribute Exploration of Gene Regulatory Processes (PDF; 4.6 MB) . Doctoral thesis, University of Jena 2011.