Machine-Aided Proof

from Wikipedia, the free encyclopedia

Machine-aided proofing (or more ambiguously: automatic proofing; a branch of automatic deduction ) is based on the use of computer programs to generate and test mathematical proofs of logical theorems . In contrast to a computer proof, the attempt is made to construct the entire formal proof consisting of steps and intermediate results.

Computer evidence


The term is used in particular for proofs that have the following scheme: First it is shown that the general problem P holds if another property E holds, i.e. if P can be reduced to E. The decisive factor here is that E can be decided by enumerating a finite number of (usually very many) cases. The reduction from P to E is usually a perfectly normal, informal mathematical proof. Only in the next step does the computer come into play: You write a program that enumerates all cases ( generate ) and then checks whether E actually applies to them ( test ). Basically, E is decided by a brute force method . Proposition P follows from both parts .

Objections to Computer Evidence

Computer evidence are e.g. Sometimes controversial among mathematicians. In addition to more psychological or hypothetical objections, there are also very tangible methodological ones.

  • A psychological objection is the ideal of a brief, logical justification that anyone can easily understand. However, such proofs are becoming increasingly rare in mathematical practice. The monster evidence of current mathematical research can no longer be understood by a single person in all parts (including the auxiliary sentences used).
  • The objection that the compiler or the hardware could have an error is somewhat hypothetical, because the errors of hardware or compilers that have become known up to now mostly did not relate to integer functions or logical functions, but were more active in the area of ​​floating point arithmetic such as B. the well-known Pentium FDIV bug of the Intel Pentium processor. However, mostly only logical and discrete mechanisms are used for computer evidence. This risk can also be minimized by repeating it on different computers and in different implementations.
  • Methodologically problematic is the question of whether the program correctly implements the underlying algorithm, whether the algorithm enumerates all cases in the generate phase, and whether the test phase actually guarantees property E for this case. So there is a program verification problem here .

Machine-Aided Proof


A mathematical proof is formalized , i.e. H. broken down into a sequence of individual logical steps so that they can be reproduced by a computer program. Evidence testing is a universal, only logic-dependent problem, while "generate-and-test" algorithms are problem-specific. So there are good reasons why machine-verified formal evidence should be trusted more than computer evidence.

Problems and theoretical limits

The question of whether a formal proof of any theorem can be constructed in a given logic is called a decision problem . Depending on the underlying logic, the decision problem can vary from trivial to unsolvable. In the case of propositional logic , the problem is decidable (i.e. a proof can always be generated if the theorem is logically valid, otherwise the invalidity is found), but NP-complete . In contrast, first-level predicate logic is only semi-decidable, that is, a valid theorem can be proven using unlimited time and memory resources, but an invalid theorem cannot always be recognized as such. Higher level logic (HOL) is neither decidable nor complete (with regard to so-called standard models ).

Automatic theorem provers

Despite these theoretical limits, practical Automatic Theorem Provers (ATPs) have been implemented that can solve many difficult problems in these logics.

There are a number of successful systems especially for the area of ​​first-level predicate logic. Virtually all of these methods generate evidence by contradiction and are theoretically based on a variant of Herbrand's theorem . The instances necessary for the contradiction are generated via unification . A first calculus of this kind is the resolution calculus introduced by John Alan Robinson in 1965 . Even today's evidence is largely based on extensions and refinements of this idea. B. the superposition calculus . In these calculi, new consequences are successively derived from an initial set of clauses by applying inference rules (i.e. the set is saturated ) until the contradiction becomes explicit by deriving the obviously unsatisfiable empty clause.

While theorem proofs of theorems from axioms about inference steps derived and reproduce in any form mathematical proofs are in the model test ( model checking ) mostly refined implemented techniques used evidence states brute-force enumerate and systematically search search spaces of evidence states. Some systems are also hybrids that use both interactive evidence and model testing.

Interactive theorem provers

A simpler but related problem is proof checking , where a given formal proof is checked to see whether it actually proves a given theorem. All that is required is to check every single step in the evidence, which is usually possible through simple functions.

Interactive theorem provers , also known as evidence assistants , require clues for the search for evidence that must be given to the evidence system by a human user. Depending on the degree of automation, a theorem prover can essentially be reduced to an evidence checker or independently perform significant parts of the evidence search automatically. Interactive proofs are now used for a wide range of tasks, the scope of which ranges from pure mathematics to theoretical computer science to practical problems in program verification .

Scientific and industrial applications

Significant mathematical proofs that have been checked by interactive theorem provers are the proof of the four-color theorem by Georges Gonthier (who replaced the older computer proof with Appel and Haken) and the formalized proof of Kepler's conjecture by the Flyspeck project (2014). But also automatic theorem provers have meanwhile been able to solve some interesting and difficult problems, the solution of which was unknown in mathematics for a long time. However, such successes tend to be sporadic, the processing of difficult problems mostly requires interactive theorem provers.

The industrial use of theorem provers or model checkers is currently still mainly concentrated on the verification of integrated circuits and processors . Since bugs with severe economic effects in commercial processors became known (see Pentium FDIV bug ), their processing units are mostly verified. Theorem provers and model checkers have been employed in the latest processor versions from AMD , Intel, and others to prove many critical operations in them. Recently, theorem provers have also increasingly been used for software verification; large projects include partial verification of Microsoft's Hyper-V or extensive verification of the L4 (microkernel) .

Available implementations

Available implementations for automatic theorem provers are e.g. B. Simplify or Alt-Ergo. Like the newer systems Z3 or CVC-4, they show the (non-) fulfillability of formulas based on decidable background theories (satisfiability modulo theories ).

The well-known proofs SPASS, E, and Vampire are based on classical predicate logic. Lean is a newer system based on type theory (CoC).

Examples of interactive theorem provers are Isabelle and Coq , which use higher level logics (HOL and CC, respectively).

The Theorem Prover Museum is an initiative to preserve the evidence as important cultural and scientific artifacts for scientific and historical research. It makes many of the systems mentioned above accessible in source code.

Examples of evidence techniques


  • Thomas C. Hales Formal Proof (PDF; 524 kB), 55:11, Notices of the American Mathematical Society, 1370-1382, 2008.
  • Georges Gonthier: Formal Proof — The Four-Color Theorem , 55:11, Notices of the American Mathematical Society, 1384-1393, 2008.
  • Dirk Leinenbach, Thomas Santen: Verifying the Microsoft Hyper-V Hypervisor with VCC , in: FM 2009: Formal Methods, Lecture Notes in Computer Science Volume 5850 (2009), 806–809, 2009. doi : 10.1007 / 978-3-642 -05089-3_51 .
  • Leonid Ryzhyk, John Keys, Balachandra Mirla, Arun Raghunath, Mona Vij, Gernot Heiser: Improved Device Driver Reliability Through Hardware Verification Reuse (PDF; 225 kB).
  • M. Nakao, M. Plum, Y. Watanabe (2019) Numerical Verification Methods and Computer-Assisted Proofs for Partial Differential Equations (Springer Series in Computational Mathematics).

Individual evidence

  1. See review article by T. Hales Formal Proof (PDF; 524 kB).
  2. ^ John Alan Robinson: A Machine-Oriented Logic Based on the Resolution Principle . In: Journal of the ACM . Volume 12, No. 1 , 1965.
  3. See Formal Proof — The Four-Color Theorem
  4. ^ A formal proof of the Kepler conjecture .
  5. Dirk Leinenbach, Thomas Santen: Verifying the Microsoft Hyper-V Hypervisor with VCC , in: FM 2009: Formal Methods, Lecture Notes in Computer Science Volume 5850 (2009), 806–809, 2009. doi : 10.1007 / 978-3- 642-05089-3_51 .
  6. Improved Device Driver Reliability Through Hardware Verification Reuse ( Memento of the original from February 20, 2011 in the Internet Archive ) Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice. (PDF; 225 kB) @1@ 2Template: Webachiv / IABot /
  7. ^ Theorem Proving in Lean
  8. ^ Lean (fork) web editor
  9. ^ Theorem Prover Museum

Web links

Overview representations