Fast Interactive Verification through Strong Higher-Order Automation

European Research Council (ERC) Starting Grant 2016
Grant agreement No. 713999
March 2017 – February 2022
Principal investigator:   Jasmin Christian Blanchette, Vrije Universiteit Amsterdam
Senior collaborators:   Pascal Fontaine, Université de Lorraine and Inria
Stephan Schulz, DHBW Stuttgart
Uwe Waldmann, Max-Planck-Institut für Informatik
Host institution:   Vrije Universiteit Amsterdam, the Netherlands
Additional beneficiary:   Inria, France
Linked third party:   Université de Lorraine, France

Proof assistants are increasingly used to verify hardware and software and to formalize mathematics. However, despite the success stories, they remain very laborious to use. The situation has improved with the integration of first-order automatic theorem provers—superposition provers and SMT (satisfiability modulo theories) solvers—through middleware such as Sledgehammer for Isabelle/HOL and HOLyHammer for HOL Light and HOL4; but this research has now reached the point of diminishing returns. Only so much can be done when viewing automatic provers as black boxes.

To make interactive verification more cost-effective, we propose to deliver very high levels of automation to users of proof assistants by fusing and extending two lines of research: automatic and interactive theorem proving. This is our grand challenge. Our starting point is that first-order (FO) automatic provers are the best tools available for performing most of the logical work. Our approach will be to enrich superposition and SMT with higher-order (HO) reasoning in a careful manner, to preserve their desirable properties. We will design proof rules and strategies, guided by representative benchmarks from interactive verification.

With higher-order superposition and higher-order SMT in place, we will develop highly automatic provers building on modern superposition provers and SMT solvers, following a novel stratified architecture. To reach end users, these new provers will be integrated in proof assistants and will be available as backends to more specialized verification tools. The users of proof assistants and similar tools stand to experience substantial productivity gains: From 2010 to 2016, the success rate of automatic provers on interactive proof obligations from a representative benchmark suite called Judgment Day has risen from 47% to 77%; with this project, we aim at 90%–95% proof automation.

Proof assistants (also called interactive theorem provers) are interactive tools with a graphical user interface that make it possible to develop computer-checked, formal proofs of theorems, usually expressed in some variant of higher-order (HO) logic. The primary advantage of formal proofs over their pen-and-paper counterparts is the high trustworthiness of the result; but increasingly, proof assistants are also used for their convenience, especially for program verification, where the proof obligations can be very large and unwieldy.

Compared with other formal methods, the hallmark of proof assistants is their wide applicability and expressiveness. They have been employed since the 1990s for hardware and software verification at AMD and Intel. Some mathematical proofs, such as the four-color theorem and the Kepler conjecture, are so complex and require such massive computations that they could be fully trusted only after they had been conducted in a proof assistant. Vladimir Voevodsky, a leading homotopy theorist and Fields medalist, advocates the use of these systems as the only satisfactory answer to the unreliability of modern mathematics.

In computer science, two recent groundbreaking developments are the C compiler CompCert verified using the Coq proof assistant, and the seL4 operating system microkernel, verified using Isabelle. CompCert is entering Airbus's tool chain because it generates trustworthy optimized code; without a formal proof, certification authorities would not condone the use of compiler optimizations. The seL4 microkernel was developed jointly with a formal proof that it meets its specification. By building on the verified kernel, it is now possible to design verified software chains, with formalized safety and security properties. There are ongoing projects in the automotive industry, aviation, space flight, and consumer devices based on seL4.

So why is only a tiny fraction of software verified? There is a misconception that proof assistants can be mastered only by mathematical virtuosi, but this has been repeatedly refuted in practice. A much more serious issue that affects all users, whether novice or expert, is that proof assistants are very laborious to use. The formal verification of seL4 required about 20 person-years, compared with 2.2 person-years to develop the microkernel itself. To formalize one step in an informal mathematical argument, users must often introduce intermediate properties, interacting with the proof assistant through special commands called tactics. The need for users to engage in so many interactions—the lack of automation—stands out as the primary cause of low productivity.

The situation has improved substantially in recent years with the rise of first-order (FO) automatic theorem provers, especially superposition provers and SMT (satisfiability modulo theories) solvers. These are fully automatic command-line tools that take self-contained problems as input and systematically explore the proof space. Systems such as Sledgehammer for Isabelle, as well as HOLyHammer, MizAR, and Why3, provide a one-click connection from proof assistants to first-order provers, relieving the user from having to perform tedious tactic manipulations and to memorize lemma libraries. Using Isabelle without Sledgehammer has been compared to walking instead of running. According to Thomas Hales, the mathematician who proved the Kepler conjecture and who led the Flyspeck project that formalized it,

Sledgehammers and machine learning algorithms have led to visible success. Fully automated procedures can prove … 47% of the HOL Light/Flyspeck libraries, with comparable rates in Isabelle. These automation rates represent an enormous saving in human labor.

When formalizing Gödel's incompleteness theorems, Lawrence Paulson estimated his improvement in productivity in a private communication as a "factor of at least three, maybe five"—an anecdotal but telling remark.

Example  While formalizing some results that depend on multisets, the principal investigator found himself needing the basic property |A| + |B| = |A ∪ B| + |A ∩ B|, where A and B range over finite multisets and |·| denotes cardinality. Within 30 seconds, Sledgehammer produced a proof text invoking a suitable tactic with ten existing lemmas. Without Sledgehammer, proving the property could easily have taken 5 to 15 minutes, and even longer for a novice.
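The identity itself can be justified pointwise: writing A(x) for the multiplicity of x in A, multiset union takes the maximum of the multiplicities and multiset intersection the minimum, so the claim reduces to max + min = sum at each element:

```latex
% Pointwise justification of |A| + |B| = |A ∪ B| + |A ∩ B| for finite multisets,
% where A(x) denotes the multiplicity of x in A.
\begin{align*}
  |A \cup B| + |A \cap B|
    &= \sum_x \max(A(x), B(x)) + \sum_x \min(A(x), B(x)) \\
    &= \sum_x \bigl( \max(A(x), B(x)) + \min(A(x), B(x)) \bigr) \\
    &= \sum_x \bigl( A(x) + B(x) \bigr) \\
    &= |A| + |B|.
\end{align*}
```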

Not all proof goals arising in interactive verification are this easy. All too often, tools like Sledgehammer fail to prove trivial-looking goals. This can happen for a number of reasons.

First-order automatic provers are very useful, but they are greatly hampered by their lack of support for higher-order constructs. More research is necessary to combine the most successful first-order methods with strong higher-order reasoning.

Native higher-order automated reasoning has been researched since the late 1960s. However, this work has not produced a viable alternative to the Sledgehammer-style HO-to-FO translation. We see two reasons for this:

  1. Higher-order reasoning has not yet assimilated the most successful first-order methods, namely superposition and SMT.
  2. The existing provers are designed for small and tricky higher-order problems, whereas typical proof goals are only mildly higher-order but large.

Moreover, these provers do not support polymorphism, which is needed to reason about parameterized theories, and they usually fail to discover even trivial proofs by induction.

Interactive theorem proving has grown considerably in recent years. Proof assistants are employed to build safety- and security-critical systems, and it is not uncommon for research papers to be accompanied by formal proofs. The memory models of Java and C++ have been mechanically verified. Proof assistants are even deployed in the classroom, replacing pen-and-paper proofs. These circumstances point to a future where these tools will be routinely used for critical computing infrastructure, for programming language design, and more broadly for research in computer science and mathematics—contributing to more reliable systems and science. But to make this a reality, we must take the hard, tedious labor out of interactive verification. Despite the success of Sledgehammer-style automation, interactive verification is rarely cost-effective. Much more ought to be done.

A fundamental problem is that automatic provers and proof assistants are developed by two mostly disjoint communities, pursuing their own goals. Sledgehammer tries to compensate for the mismatch between these two worlds, but only so much can be done when using automatic provers as black boxes. A more radical approach is necessary for further progress.

Our grand challenge is to create efficient proof calculi and higher-order provers that will target proof assistants and their applications to software and hardware development.

High levels of automation will arise only by building on the strengths of both communities. Higher-order reasoning, including induction, really belongs in high-performance automatic provers. The principles underlying the tactics of proof assistants, largely based on higher-order rewriting, must be integrated into the proof calculi of automatic provers, to solve problems that require both higher-order rewriting and systematic search. Briefly, we want to enrich state-of-the-art automatic methods with concepts motivated by interactive verification. The first-order provers and calculi are sophisticated artifacts with fragile properties; combinations require careful theoretical and pragmatic considerations.

The grand challenge will be met by pursuing four objectives, presented below. Our starting point is that first-order provers are the best tools available for performing most of the logical work. First-order provers, in turn, may delegate this work to SAT (satisfiability) solvers, in a game of Russian dolls. Most problems that are higher-order are only mildly so—a little higher-order can go a long way. Often, expanding definitions and normalizing λ-terms is all the higher-order reasoning that is needed; but the proof assistants are often too weak to carry out the remaining reasoning steps automatically, and first-order provers cannot manipulate the encoded λs efficiently. Typical proof goals arising in interactive verification are not very difficult, but they require a mixture of undirected proof search and long chains of straightforward reasoning.
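To make the "expanding definitions and normalizing λ-terms" point concrete, the sketch below β-normalizes λ-terms represented with de Bruijn indices. The representation and function names are ours, purely for illustration; production provers use far more efficient term data structures and reduction strategies.

```python
# Illustrative sketch: β-normalization of λ-terms with de Bruijn indices.
# Term syntax: ("var", i) | ("lam", body) | ("app", fun, arg)

def shift(t, d, cutoff=0):
    """Shift the free variable indices of t by d."""
    tag = t[0]
    if tag == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Substitute s for the variable with index j in t."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == j else t
    if tag == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))

def normalize(t):
    """Reduce t to β-normal form (assumes reduction terminates)."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "lam":
        return ("lam", normalize(t[1]))
    f, a = normalize(t[1]), normalize(t[2])
    if f[0] == "lam":  # β-redex: (λ. body) a
        return normalize(shift(subst(f[1], 0, shift(a, 1)), -1))
    return ("app", f, a)
```

For example, applying the K combinator λx. λy. x to two distinct free variables normalizes to the first of them, exactly the kind of routine reduction step that encoded λs make expensive for a first-order prover.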

Objective 1: Extend superposition and SMT to higher-order logic

Superposition and SMT are currently the most successful proof calculi for classical first-order logic. Despite some convergence, they remain very different technologies, with complementary strengths and weaknesses. The experience with Sledgehammer and similar tools is that many goals can be proved with only one of the two. With multi-core processors being the norm, it is beneficial to let different kinds of provers run in parallel. Another reason for pursuing both approaches is that there might be hard-to-predict synergies between higher-order reasoning and each calculus.

The challenge is to preserve the desirable properties of the underlying first-order calculi—whether theoretical (soundness and completeness) or practical (efficiency)—while extending them to perform higher-order rewriting and other higher-order reasoning. This is especially problematic for superposition. The calculus relies on a term ordering to prune the search space. This has sometimes been seen as an insurmountable obstacle for the extension of superposition to higher-order logic, but we are confident that a λ-term ordering with the desired properties can be designed. Furthermore, recent work has shown that polymorphism, induction, and extensionality reasoning are compatible with superposition, all of which are desirable features of higher-order provers.
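To illustrate what a term ordering contributes, the following is a textbook lexicographic path order (LPO) on first-order terms, not the λ-term ordering the project will design. Superposition uses such an ordering to orient equations and discard redundant inferences; the difficulty alluded to above is extending these definitions to λ-terms while keeping their key properties.

```python
# Illustrative sketch: a textbook lexicographic path order (LPO).
# A variable is a string; an application is (symbol, [args]); a constant
# is an application with no arguments. `prec` maps symbols to integers
# (higher value = greater in the precedence).

def occurs(x, t):
    """Check whether variable x occurs in term t."""
    if isinstance(t, str):
        return t == x
    _, args = t
    return any(occurs(x, a) for a in args)

def lpo_gt(s, t, prec):
    """True iff s > t in the LPO induced by `prec`."""
    if isinstance(s, str):
        return False                 # a variable is never greater
    if isinstance(t, str):
        return occurs(t, s)          # s > x iff x occurs in s (and s != x)
    f, ss = s
    g, ts = t
    if any(si == t or lpo_gt(si, t, prec) for si in ss):
        return True                  # some argument of s dominates t
    if prec[f] > prec[g]:
        return all(lpo_gt(s, tj, prec) for tj in ts)
    if f == g and all(lpo_gt(s, tj, prec) for tj in ts):
        for si, ti in zip(ss, ts):   # lexicographic comparison of arguments
            if si != ti:
                return lpo_gt(si, ti, prec)
    return False
```

With the precedence f > g > a, the order ranks f(a) above g(a) and enjoys the subterm property (g(f(a)) > f(a)), which is what lets the prover delete, rather than keep exploring, the smaller side of an oriented equation.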

Proofs about proof calculi tend to be quite tedious. For this reason, we plan to "eat our own dog food" and carry them out in a proof assistant. Our contributions will be part of the IsaFoL (Isabelle Formalization of Logic) repository.

Objective 2: Design practical methods and heuristics based on well-chosen benchmarks

We will develop high-level, technology-agnostic rules, heuristics, and strategies for instantiating higher-order variables (i.e., variables representing functions or formulas) and for reasoning about λ-terms, polymorphic types, (co)datatypes, (co)induction, and arithmetic.

A major part of this objective is to devise strong automation for higher-order logic (also called simple type theory). Given that higher-order automation has been researched for decades, with limited success, relying on a breakthrough is a high-risk aspect of our proposal. Yet several factors play in our favor:

Moreover, first-order provers often fail because induction is necessary. Recent work on automating structural induction in superposition and SMT is a welcome development; but to reach its full potential, this work must be generalized to arbitrary induction schemas, including well-founded induction and rule induction. Codatatypes and coinduction, which are used in the metatheory of programming to model infinite traces or processes, are other examples of general, widely applicable theories that could be automated efficiently. Higher-order logic is also powerful enough to capture mathematical binder notations such as summations and integrals, which arise when studying quantitative properties of programs and protocols; automating these will have practical value in a variety of domains.

It has been argued that to make further progress in automated reasoning, we must introduce a planning layer that governs the proving process at a higher level than proof calculi, performing plausible reasoning, emulating a human. Proof planning has not yet made a strong practical impact on interactive or automatic proving, but the time has come to revisit this idea in conjunction with machine learning, as part of this objective.

Objective 3: Design stratified architectures to build higher-order provers on top of state-of-the-art first-order provers

In the past decade, we have seen the emergence of higher-order provers based on a cooperative architecture. These are full-fledged provers that regularly invoke an external first-order prover as a terminal procedure, in an attempt to finish the proof quickly. The external prover may succeed if all the necessary higher-order instantiations have been performed. But because the less efficient higher-order part of the prover steers the search, this architecture leads to comparatively poor performance on problems with a substantial first-order component.

We propose a radically different stratified architecture for higher-order provers, code-named Matryoshka. At an abstract level, a first-order prover combines three ingredients: (1) a collection of FO formulas derived so far; (2) a set of FO inference rules; and (3) a main loop that applies the rules to derive more formulas. Our envisioned architecture will extend this setup with (1') a collection of HO formulas and (2') HO inference rules that operate on all formulas and put any derived formula in the appropriate collection. The principal modification to the underlying first-order prover is to the main loop, which must interleave FO and HO rule applications. Unlike with the cooperative architecture, a Matryoshka prover is a single program, written in one language—typically, C or C++.
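The envisioned main loop can be sketched as follows. This is a toy model of the stratified architecture, not project code, and it is written in Python for readability even though the text envisions C or C++; the routing criterion and rule interfaces are our own simplifications.

```python
# Illustrative sketch of a stratified main loop: separate FO and HO formula
# collections, FO rules that see only FO formulas, and HO rules that see all
# formulas. Derived formulas are routed to the appropriate collection.
# Formulas are strings here; as a toy criterion, a formula is treated as
# higher-order iff it mentions "lam".

def is_first_order(formula):
    return "lam" not in formula

def stratified_loop(axioms, fo_rules, ho_rules, max_steps=100):
    fo_set, ho_set = set(), set()
    for f in axioms:
        (fo_set if is_first_order(f) else ho_set).add(f)
    for _ in range(max_steps):
        derived = set()
        for rule in fo_rules:            # FO rules operate on FO formulas only
            derived |= rule(fo_set)
        for rule in ho_rules:            # HO rules operate on all formulas
            derived |= rule(fo_set | ho_set)
        new = derived - fo_set - ho_set
        if not new:
            break                        # saturated: nothing new derivable
        for f in new:                    # route each formula to its collection
            (fo_set if is_first_order(f) else ho_set).add(f)
    return fo_set, ho_set
```

On a purely first-order problem, ho_set stays empty and the HO rules contribute nothing, which is the zero-overhead behavior described below; an HO rule that instantiates or normalizes a λ-containing formula feeds its first-order consequences back into fo_set, where the efficient FO machinery takes over.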

[Figure: a first-order prover vs. a Matryoshka prover]

With this novel architecture, the automatic prover behaves exactly like a FO prover on FO problems, performs mostly like a FO prover on HO problems that are mostly first-order, and scales up to arbitrary HO problems, in keeping with the zero-overhead principle ("What you don't use, you don't pay for"). The architecture is reminiscent of the interaction between the SAT solver and quantifier instantiation inside an SMT solver, but there are many open questions because the two logics involved in a stratified higher-order prover are more expressive.

The concrete results of this research project will include at least one higher-order superposition prover, based on E or SPASS, and at least one, perhaps two, higher-order SMT solvers, based on veriT and possibly CVC4. The expressive formalisms and powerful automation offered by these unconventional provers will set them apart from their predecessors.

Objective 4: Integrate our new provers into proof assistants, with proof reconstruction to ensure trustworthiness

Ultimately, the crucial question is whether our methods will bring substantial benefits to end users. Most of these will work with a proof assistant, which offers a convenient interface for controlling automatic provers and for performing manual proof steps when necessary. Thus, it will be vital to integrate our new stratified provers into proof assistants, by developing or extending Sledgehammer-like tools. We plan to target Coq, Isabelle/HOL, and the TLA+ Proof System, which cover the main higher-order formalisms in use today: dependent type theory, higher-order logic, and set theory. Coq and Isabelle/HOL are probably the two most popular proof assistants, with hundreds of users. Coq's native automation is weaker than Isabelle's, and there is no equivalent to Sledgehammer. As for the TLA+ Proof System, it is a newer tool for verifying concurrent and distributed systems based on Leslie Lamport's Temporal Logic of Actions, with potential in industry. The proofs generated by the new provers will be translated, or "reconstructed," to yield self-contained Coq, Isabelle/HOL, and TLA+ proof texts.

[Figure: a Matryoshka-enabled Sledgehammer]

Automated reasoning requires highly optimized data structures and algorithms, written in an imperative language. A separation of concerns means that the same automatic provers can be shared across many proof assistants. We contend that too much work has gone into engineering the individual proof assistants, and too little into developing compositional methods and tools with a broad applicability across systems.

Funded by the project

Jasmin Christian Blanchette  (VU Amsterdam, principal investigator)
Pascal Fontaine  (U. Lorraine and Inria Nancy, senior collaborator)
Alexander Bentkamp  (VU Amsterdam, Ph.D. student)
Johannes Hölzl  (VU Amsterdam, postdoctoral researcher, expected to start in October 2017)
Robert Y. Lewis  (VU Amsterdam, postdoctoral researcher, expected to start in January 2018)

Associated members

Stephan Schulz  (DHBW Stuttgart)
Uwe Waldmann  (MPII Saarbrücken)

Main collaborators

Haniel Barbosa  (Inria Nancy)
Simon Cruanes  (Inria Nancy)
Mathias Fleury  (MPII Saarbrücken)
Stephan Merz  (Inria Nancy)
Anders Schlichtkrull  (DTU Copenhagen)
Daniel Wand  (TU München and MPII Saarbrücken)
Christoph Weidenbach  (MPII Saarbrücken)

Additional collaborators

Heiko Becker  (MPI-SWS Saarbrücken)
Tobias Nipkow  (TU München)
Giles Reger  (U. Manchester)
Andrew Reynolds  (U. Iowa)
Martin Riener  (Inria Nancy)
Cesare Tinelli  (U. Iowa)
Dmitriy Traytel  (ETH Zürich)
Jørgen Villadsen  (DTU Copenhagen)

Invited researchers

Sophie Tourret  (NII Tokyo):  9–13 January 2017 at MPII Saarbrücken and Inria Nancy
Andrew Reynolds  (U. Iowa):  10 July to 8 September 2017 at Inria Nancy
Simon Robillard  (Chalmers Gothenburg):  14 August to 17 November 2017 at VU Amsterdam


Daniel El Ouraoui  (U. Paris Diderot):  March–July 2017 at Inria Nancy


The project is thankful for the support and advice of the following friends and colleagues: Jeremy Avigad, Jean-Pierre Banâtre, Chad Brown, Jip Chong, Fabienne Elbar, Wan Fokkink, Carsten Fuhs, Kris de Jong, Cezary Kaliszyk, Steve Kremer, Leonardo de Moura, Anja Palatzke, Lawrence Paulson, Sylvain Petitjean, Andrei Popescu, Femke van Raamsdonk, Trenton Schulz, Mark Summerfield, Nicolas Tabareau, Dmitriy Traytel, Josef Urban, and Paul Zimmermann.


An efficient proof-producing framework for formula processing
Haniel Barbosa, Jasmin Christian Blanchette, and Pascal Fontaine.
Draft paper (PDF) · Draft report (PDF)

Transfinite Knuth–Bendix orders for lambda-free higher-order terms
Heiko Becker, Jasmin Christian Blanchette, Uwe Waldmann, and Daniel Wand.
Draft paper (PDF) · Draft report (PDF)


Foundational nonuniform (co)datatypes for higher-order logic
Jasmin Christian Blanchette, Fabian Meier, Andrei Popescu, and Dmitriy Traytel. Accepted at 32nd Annual IEEE Symposium on Logic in Computer Science (LICS 2017).
Draft paper (PDF)

Congruence closure with free variables
Haniel Barbosa, Pascal Fontaine, and Andrew Reynolds. Accepted at 23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2017).
Postprint (PDF) · Report (PDF)

A lambda-free higher-order recursive path order
Jasmin Christian Blanchette, Uwe Waldmann, and Daniel Wand. Accepted at 20th International Conference on Foundations of Software Science and Computation Structures (FoSSaCS 2017).
Postprint (PDF) · Report (PDF)

Friends with benefits: Implementing corecursion in foundational proof assistants
Jasmin Christian Blanchette, Aymeric Bouzy, Andreas Lochbihler, Andrei Popescu, and Dmitriy Traytel. Accepted at 26th European Symposium on Programming (ESOP 2017).
Postprint (PDF) · Report (PDF)

An Isabelle formalization of the expressiveness of deep learning (extended abstract)
Alexander Bentkamp, Jasmin Christian Blanchette, and Dietrich Klakow. Accepted at 2nd Conference on Artificial Intelligence and Theorem Proving (AITP 2017).
Abstract (PDF)


An Isabelle formalization of the expressiveness of deep learning
Alexander Bentkamp. M.Sc. thesis, Universität des Saarlandes, 2016.
Thesis (PDF)

Formalization of nested multisets, hereditary multisets, and syntactic ordinals
Jasmin Christian Blanchette, Mathias Fleury, and Dmitriy Traytel. In Klein, G., Nipkow, T., Paulson, L. (eds.), Archive of Formal Proofs, 2016.
Formal proof development

Formalization of Knuth–Bendix orders for lambda-free higher-order terms
Heiko Becker, Jasmin Christian Blanchette, Uwe Waldmann, and Daniel Wand. In Klein, G., Nipkow, T., Paulson, L. (eds.), Archive of Formal Proofs, 2016.
Formal proof development

Expressiveness of deep learning
Alexander Bentkamp. In Klein, G., Nipkow, T., Paulson, L. (eds.), Archive of Formal Proofs, 2016.
Formal proof development

Formalization of recursive path orders for lambda-free higher-order terms
Jasmin Christian Blanchette, Uwe Waldmann, and Daniel Wand. In Klein, G., Nipkow, T., Paulson, L. (eds.), Archive of Formal Proofs, 2016.
Formal proof development

We are looking for outstanding candidates for Ph.D. and intern positions due to start between 2017 and 2019. Candidates should ideally have some experience with automatic or interactive theorem proving and be at ease with theory and engineering. Please contact Jasmin Blanchette and Pascal Fontaine for more information.

The matryoshka-devel mailing list is used by members of the Matryoshka team for project-related discussions. You can browse the archives. If you are interested in the project, please consider subscribing to the list.

The project's public relations staff consists of Jasmin Blanchette and Pascal Fontaine.

      This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 713999.