Automated Theorem Proving: A Retrospection & Applications of Formal Methods

Automated Theorem Proving:A Retrospection&Applications of Formal Methods CS3234 Aquinas Hobor and Martin Henz

Outline • Reflections on Coq • Applications of Formal Methods • How to study for the Final

A proverb “Only those who have been bitten by the snake can understand how it feels.”

Why are theorem provers used? Very high assurance due to mechanical checking • Checkers are very through: don’t get tired, don’t get bored, don’t make mistakes • If anything, the problem is the opposite – trying to convince a checker that a true thing is true can be frustrating.

In our class… • Many times I helped students who were sure that something was true, and just needed a little help to understand how to convince Coq. • Usually it was not more than a line or two…

In our class… • Many times I helped students who were sure that something was true, and just needed a little help to understand how to convince Coq. • Usually it was not more than a line or two… • … several times the difficulty was that the thing that they were sure was true was actually false. (e.g., n – 1 + 1 = n : Not true!)

There are some contexts where bugs have enormous cost. • Proofs about real software are hard to get right; Coq found a previously unknown bug in the proof in the (widely-used, second-edition) textbook.

Disadvantages of Automated Theorem Proving • Developing the hints / proof by hand can be very labor-intensive • It can be very difficult to formalize correctness • “correct” operating system? • “correct” web browser? • “correct” compiler? • Learning curve to use systems can be steep

Disadvantages of Automated Theorem Proving • Developing the hints / proof by hand can be very labor-intensive • It can be very difficult to formalize correctness • “correct” operating system? • “correct” web browser? • “correct” compiler? • Learning curve to use systems can be steep Now you understand…

One more advantage… they are fun to use! • A bit like writing software in a scripting language “Building such scripts is surprisingly addictive, in a videogame kind of way…” - Xavier Leroy • The advantage of never having to worry about bugs in the finished product • Can work on math at 3 AM without fear

One more advantage… they are fun to use! • A bit like writing software in a scripting language “Building such scripts is surprisingly addictive, in a videogame kind of way…” - Xavier Leroy • The advantage of never having to worry about bugs in the finished product • Can work on math at 3 AM without fear Unclear if this is an advantage 

In your own words… • “I found COQ is really an amazing tool. I like to play with it now. haha!” • “I love video games. Coq is the first programming language I use as a game. I really enjoy it haha!” • “[Proof completed.] is the best feeling in the world!” • (and many others) • Of course, it’s harder to give negative feedback, so…

In the words of Nick Benton Senior Researcher, Microsoft Research, Cambridge UK Wrote a paper in 2006 titled, “Machine Obstructed Proof” http://research.microsoft.com/ en-us/um/people/nick/mop.pdf

In the words of Nick Benton “After years doing programming language theory without going near a proof assistant, I was finally convinced by the POPLmark‘buzz’ and conversations at ICFP’05 that it was time to try one.” … [there was a workshop giving an introduction to Coq] … “The workshop description says “the available tools are [...] difficult to learn, inadequately documented, and lacking in specific library facilities required for work in programming languages”. I can confirm that I have rarely felt as stupid and frustrated as I did during my first few weeks using Coq.”

In the words of Nick Benton “Scripts are unreadable by themselves, as one has no idea what the tactics are doing to the proof state, and the documentation for them is incomprehensible to the novice. The only thing that works is lots and lots of trial and error in an interactive environment, and I still couldn’t give a coherent general description of what some of the tactics I’ve used many times actually do, or how they differ from half a dozen apparently similar ones. And basic ones are still missing; I spent days fighting with elim, case, destruct and variations on induction and still kept finding myself having done case splits without the information about which branch I was in. This was so frustrating I gave up on Coq (and spent a week playing with HOL Light) until Georges Gonthier showed me the magic, and frankly bizarre, incantation generalize (refl_equal x); pattern x at -1; case x.”

In the words of Nick Benton “There are bugs. On Windows, CoqIDE falls over, whilst Proof General only works with the original 2004 version of Coq 8. I spent ages defining Setoid structures on everything, only to find Setoidrewriting throws an ‘anomaly’ exception in interesting contexts.” (you may recall a HW assignment (hw5) where I said to use “generalize” instead of “spec”)

In the words of Nick Benton “I had many similar difficulties, but then started to make progress. Just having intermediate stages of the work in a computerized form rather than on many pages of paper proved a major benefit. Far more often than I’d expected, one can alter definitions and then mildly tweak the previous version of a proof to keep it up to date. On paper, I tend to keep going back to the top and doing everything from scratch to be sure everything is still consistent.”

In the words of Nick Benton “Automated proving is not just a slightly more fussy version of paper proving and neither (Curry-Howard notwithstanding) is it really like programming. It’s a strange new skill, much harder to learn than a new programming language or application, or even many bits of mathematics. I’m resistant to investing significant effort in tools (I don’t write clever TeX or Emacs macros), but the payoff really came the second time I used Coq: I was able to prove some elementary but delicate results for a different paper in just a day or so. Coq is worth the bother and it, or something like it, is the future, if only we could make the initial learning experience a few thousand times less painful.”

Of note • There is a somewhat similar course being offered in the past year to undergraduates • Using Coq to examine program semantics • University of Pennsylvania • Princeton University • Harvard University (a bit different) • Probably others • For good or evil, you have been on the bleeding edge of formal methods teaching…

So… • Well done for getting through it! • Hopefully you have learned a lot (we noticed that your pen-and-paper induction proofs improved significantly), and had some fun as well. • With a bit of luck you don’t hate your Profs (too much!) for putting it in the course… If you have suggestions for making the process easier in future years, please let us know.

How do people use formal methods? • Type theory • Program Analysis • Proof-Carrying Code • Certified compilation

Type Theory • You are familiar with languages like Java, where your variables (& function parameters, etc.) have a declared type. • intmyInt; • List<List<Int>>myListofListofInts; • ObjectmyObject; • The idea is to classify values into sets. During compilation, there is a phase called type-checking, where the compiler looks for type errors, such as • myListofListofInts := 5; // Oops… what does this mean?

Type Theory • Why do we do this? Because type errors almost always are mistakes in the program. • Analogy is to units in science (e.g., a Newton of force is 1 kilogram-meter per second squared). • You check your units when you do science for the same reason that it is a good idea to do type checking when programming. • We can (usually) do type-checking in ~ linear time, and can discover many bugs much faster than testing. • This is a form of proof generation (which we have only explained very little of in this course, but have used a lot, e.g., “auto”, “omega”, etc.) • The idea with a type system is to trade off expressivity for automatic generation.

Type Theory • The field that studies program typing is called Type Theory. • It turns out that there is a deep connection between formal logic and type theory, called the Curry-Howard Isomorphism. • Languages with strong type systems include • Java/ C# • SML / oCaml • Haskell • Languages with weak type systems include • C / C++ • Languages without any type systems include • Python/JavaScript • Assembly (some have weak typing)

Program Analysis • Another use for formal methods is automatic program analysis; e.g., bounds checking. • Lots of code uses integers that are supposed to stay within certain bounds • Arrays • Overflow (max_int + 1 – bad idea) • It is possible to develop software that examines source code and automatically proves that (for example) an array access is always in-bounds.

Bounds Checking • These automatic tools usually require a bit of help from the programmer. • But they are very effective: can catch many bugs long before they would have been caught through testing. • Companies that use this kind of technology: • Airbus • Microsoft • etc.

Bounds Checking How it works (one choice) • Programmer puts annotations in his code • int x __inrange(0,10); • int n __lessthan(sizeof(myArray)); • The tool then uses a form of Hoare logic to propagate these bounds in the code • If it is unable to prove that the invariant is obeyed, it complains • To run on real code, tools must be very complex • Pointer arithmetic, aliasing, etc. etc.

Proof-Carrying Code • Problem: we would like to have a way to get code from an untrusted source, which we then want to run on our computer • e.g., Web Browser • This sounds like a bad idea… • But (it turns out) people really want to do it.

Proof-Carrying Code • So, we will require that the untrusted code provider sends us both • The code, which we will run only after checking • A proof that the code obeys some safety policy • Simplest version you are familiar with: • Java web applets

Proof-Carrying Code • How it works: • Modify compiler to take a proof at the source code level (maybe just a proof that the source program is well-typed) and transform that proof as it compiles into a proof at the machine level. • Modify runtime system to accept both code and proof; it must check proof, then run code

Certified Compilation • Bugs in a program that are due to a mistake in the compiler (“compiler bugs”) are some of the most difficult to find – can take weeks or months. • The problem is, the source code is right. • Compilers are some of the most complicated pieces of software there are. • Hundreds of thousands of lines, highly complicated algorithms

Certified Compilation • The goal of a certified compiler is to have a compiler with no bugs. • That is, the source program is always correctly translated into the target language. • We have seen in lecture 12 some of the tools used: you give both the source and target languages a formal semantics, and then prove: • ¾¾’ ) C(¾) * C(¾’) • That is, that the compiler C preserves the semantics of the program.

Final • There is a paper part and a Coq part • Total: 90 points • Paper: 50 points, 60% • Coq: 40 points, 40% • You will have two hours (combined) to do both parts. • You can switch back and forth if you like.

Paper part: Format • Fair game topics: • Everything covered in lectures 1-12, midterm, quizzes 1-3, homework 1-10 • Not multiple choice; instead:

Paper part: Format • Fair game topics: • Everything covered in lectures 1-12, midterm, quizzes 1-3, homework 1-10 • Not multiple choice; instead: • Proofs! (Oh boy!) • Natural deduction • Induction • Semantic proofs • Hoare proofs • etc.

Coq part: Format • Five Questions • First-order logic * • Induction • Modal logic • Modal logic * • Hoare logic * Similar to questions 1 and 2 on the Coq quiz

Coq part: Format • Each problem is worth 9 points (out of 90, so 40%) • Maximum 36 points: pick 4 out of the 5 • Skip one if you get stuck! • No extra credit • Spend the extra time on the paper part • We are assuming that it will take about an hour • Around 15 minutes per question • Quiz: 2 questions, 23 minutes (11.5 minutes each)

Coq part: study strategy • Make sure that you can do the quiz in a reasonable amount of time (again, two questions on the final are similar to the two quiz questions. If you get them done quickly then you have 20% of the points already.) • Review major frameworks (Modal logic in HW5 and Hoare logic in HW10). You will not have enough time to figure them out again during the exam. • You can redo some problems to remind yourself how those parts work.

Test strategy • Remember: you can skip one Coq problem • If you get stuck, move on: maybe another one is easier. • Don’t Panic! If you get stuck then do paper part for awhile, maybe an idea for Coq part will occur. • Watch the time. It’s not a short exam. • Don’t get so absorbed with the Coq part that you neglect paper part, or vise versa.

Last but not least… Good Luck on your finals (including ours) and Thanks for taking CS3234! – Martin and Aquinas

Automated Theorem Proving: A Retrospection & Applications of Formal Methods