
new frontiers in formal software verification






Presentation Transcript


  1. new frontiers in formal software verification Gerard J. Holzmann gholzmann@acm.org

  2. Exploring the Solar System: 23 spacecraft and 10 science instruments (model checking, static analysis)
  Spacecraft: JASON 2, KEPLER, DAWN, WISE, CLOUDSAT, AQUARIUS, MRO, JUNO, VOYAGER 2, VOYAGER 1, CASSINI, EPOXI/DEEP IMPACT, STARDUST, ACRIMSAT, MARS ODYSSEY, GRAIL, SPITZER, JASON 1, GRACE, OPPORTUNITY, MSL, GALEX
  Instruments: Earth Science (ASTER, MISR, TES, MLS, AIRS), Planetary (MIRO, Diviner, MARSIS), Astrophysics (Herschel, Planck)

  3. formal software verification
  • after some ~30 years of development, it is still rarely used on industrial software
  • primary reasons:
  • it is (perceived to be) too difficult
  • it takes too long (months to years)
  • even in safety-critical applications, software verification is often restricted to the verification of models of software, instead of the software itself
  • goal: make software verification as simple as testing and as fast as compilation

  4. verification of the PathStar switch (flashback to 1999)
  • a commercial data/phone switch designed in Bell Labs research (for Lucent Technologies)
  • newly written code for the core call processing engine
  • the first commercial call processing code that was formally verified
  • 20 versions of the code were verified with model checking during development (1998-2000), with a fully automated procedure

  5. software structure
  • PathStar code (C): call processing (~30 KLOC, roughly 10% of the code) on top of a control kernel
  • basic call processing plus a long list of features (call waiting, call forwarding, three-way calling, etc.)
  • traditional hurdles: feature interaction, feature breakage, concurrency problems, race conditions, deadlock scenarios, non-compliance with legal requirements, etc.

  6. complex feature precedence relations: detecting undesired feature interaction is a serious problem

  7. the verification was automated in 5 steps:
  (1) code conversion of the PathStar call processing code
  (2) abstraction map
  (3) context definition
  (4) feature requirements
  (5) verification with Spin, producing message sequence charts and bug reports for property violations

  8. 1. code conversion

      ...
      @dial:  switch(op) {
              default:          /* unexpected input */
                      goto error;
              case Crdtmf:      /* digit collector ready */
                      x->drv->progress(x, Tdial);
                      time = MSEC(16000);   /* set timer */
              @:      switch(op) {
                      default:          /* unexpected input */
                              goto error;
                      case Crconn:
                              goto B@1b;
                      case Cronhook:    /* caller hangs up */
                              x->drv->disconnect(x);
                      @:      if (op != Crconn && op != Crdis)
                                      goto Aidle;
      ...etc...

  (figure: the implied PathStar state machine with control states dial, dial1, dial2, and error; each @ label in the code marks a control state, and each driver event marks a state change)
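  To make the conversion concrete, here is a hypothetical Promela rendering of the implied state machine above; the event names mirror the C fragment, but this is a sketch for illustration, not the actual Modex-generated model:

      /* hypothetical Promela sketch of the implied state machine;
         not the actual extracted PathStar model */
      mtype = { Crdtmf, Crconn, Cronhook, Crdis };
      chan evt = [0] of { mtype };

      active proctype env()            /* nondeterministic environment stub */
      {
              do
              :: evt!Crdtmf
              :: evt!Crconn
              :: evt!Cronhook
              :: evt!Crdis
              od
      }

      active proctype call()
      {
              mtype op;
      Dial:   evt?op;
              if
              :: op == Crdtmf -> goto Dial1    /* digit collector ready */
              :: else -> goto Error            /* unexpected input */
              fi;
      Dial1:  evt?op;
              if
              :: op == Crconn -> goto Idle     /* connected (goto B@1b in the C) */
              :: op == Cronhook -> goto Dial2  /* caller hangs up; disconnect */
              :: else -> goto Error
              fi;
      Dial2:  evt?op;
              if
              :: op != Crconn && op != Crdis -> goto Idle  /* Aidle in the C */
              :: else -> goto Dial                         /* remainder elided */
              fi;
      Error:  goto Dial;                       /* recovery elided in this sketch */
      Idle:   goto Dial                        /* loop back for this sketch */
      }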

  9. 2. defining abstractions
  • to verify the code we convert it into an automaton: a labeled transition system
  • the labels (transitions) are the basic statements from C
  • each statement can be converted via an abstraction, encoded as a lookup table that supports three possible conversions (illustrated in the sketch below):
  • relevant: keep (~60%)
  • partially relevant: map/abstract (~10%)
  • irrelevant to the requirements: hide (~30%)
  • the generated transition system is then checked against the formal requirements (in linear temporal logic) with the model checker
  • the program and the negated requirements are converted into ω-automata, and the model checker computes their intersection
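  As a hedged illustration of the three categories (with invented variable names, not the actual PathStar abstraction table), three statements from the dial handler might convert as follows:

      /* hypothetical illustration of keep / map / hide */
      mtype = { Tdial };
      mtype progress_sent;    /* abstract view of the progress-tone driver call */
      bool  timer_set;        /* the 16-second timer value reduced to one bit   */

      active proctype abstracted_fragment()
      {
              progress_sent = Tdial;  /* keep: x->drv->progress(x, Tdial) is
                                         directly relevant to the requirements */
              timer_set = true;       /* map:  time = MSEC(16000); the exact
                                         timeout value is irrelevant, so it is
                                         abstracted to a boolean */
              skip                    /* hide: a statement with no bearing on
                                         the requirements disappears */
      }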

  10. 3. defining the context
  (toolflow diagram: the C code and an abstraction map feed the model extractor; the extracted Spin model is combined with an environment model and a database of feature requirements; the model checker produces the bug reports)

  11. 4. defining feature requirements
  a sample property: "always when the subscriber goes offhook, a dialtone is generated"
  failure to satisfy this requirement: eventually (<>) the subscriber goes offhook, and (/\) thereafter (X) no dialtone (!dialtone) is generated until (U) the next onhook
  in Linear Temporal Logic this is written: <> (offhook /\ X (!dialtone U onhook))
  the formula is mechanically converted into an ω-automaton (the never claim used by the model checker)
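  As an illustration, the negated property can be encoded by hand as a Spin never claim; the propositions offhook, dialtone, and onhook are hypothetical stand-ins for macros over the extracted model's state, and the stub process merely closes the model so the sketch is runnable:

      /* hypothetical propositions; in the real setup these are
         defined over the extracted model's state variables */
      bool offhook  = false;
      bool dialtone = false;
      bool onhook   = true;

      active proctype stub()   /* stand-in for the extracted call model */
      {
              do
              :: atomic { offhook = true; onhook = false }
              :: atomic { onhook = true; offhook = false; dialtone = false }
              :: dialtone = true
              od
      }

      never {  /* <> (offhook && X (!dialtone U onhook)) */
      T0:     do
              :: true                   /* the witnessing offhook may come later */
              :: offhook -> goto T1     /* guess that the violation starts here */
              od;
      T1:     do
              :: !dialtone && !onhook   /* no dialtone yet, still offhook */
              :: onhook -> goto accept_all
              od;
      accept_all:
              skip                      /* claim matches: property violated */
      }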

  12. 5. verification
  (toolflow diagram: the C code, the abstraction map, and the environment model feed the model extractor; the LTL requirement is logically negated; the model checker then produces a bug report, e.g. "* no dialtone generated")
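  Once the model has been extracted, the mechanical part of such a check runs along the following lines; this is a generic Spin workflow rather than the exact PathStar scripts, and model.pml is an assumed file name:

      $ spin -a model.pml      # generate the verifier pan.c, never claim included
      $ cc -O2 -o pan pan.c    # compile the verifier
      $ ./pan -a               # search for acceptance cycles (property violations)
      $ spin -t -p model.pml   # replay the error trail as a message sequence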

  13. hardware support (1999)
  (diagram: a client/server architecture over sockets distributes the verification code and scripts across the networked machines)

  14. iterative search refinement
  each verification task is run multiple times, with increasing accuracy (figure: successive passes bounded at t=5 min., t=15 min., and t=40 min.)
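  As a hedged sketch of the idea using standard Spin options (not the original 1999 scripts): successive bitstate runs are given larger hash arrays and depth bounds, so the cheap early passes report easy bugs quickly and the later passes add accuracy:

      $ spin -a model.pml
      $ cc -O2 -DBITSTATE -o pan pan.c
      $ ./pan -w22 -m10000     # pass 1: small hash array, shallow search, fast
      $ ./pan -w26 -m50000     # pass 2: larger hash array, deeper search
      $ ./pan -w30 -m200000    # pass 3: highest accuracy of the three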

  15. performance (1999): "bugs per minute"
  (graph: percent of all bugs reported, and number of bugs reported, against minutes since the start of the check; the first bug report arrives in 2 minutes, 15 bug reports (25% of all bugs) in 3 minutes, and 50% of all bugs are reported in 8 minutes)

  16. that was 1999, can we do better now?
  • 1999: 16 networked computers running the plan9 operating system, 500 MHz clock speed (~8 GHz equiv), 16x128 MB of RAM (~2 GB equiv)
  • 2012: one 32-core off-the-shelf system running standard Ubuntu Linux (~$4K USD), 2.5 GHz clock speed (~80 GHz equiv), 64 GB of shared RAM
  • difference: approx. 10x faster, and 32x more RAM
  • does this change the usefulness of the approach?

  17. performance in 2012: "bugs per second"
  (graph: number of bugs found against seconds since the start of the check, comparing the 2012 system (32-core PC, 64 GB RAM, 2.5 GHz per core, Ubuntu Linux) with the 1999 system (16 PCs, 128 MB per PC, 500 MHz, Plan9 OS); in 2012 there are 11 bug reports after 1 second and 50% of all bugs (38 bugs) within 7 seconds, versus roughly 10 minutes in 1999)

  18. side-by-side comparison
  • 1999 (16-CPU networked system): 25% of all bugs reported in 180 seconds (15 bugs); 50% of all bugs reported in 480 seconds (30 bugs)
  • 2012 (1 desktop PC): 15% of all bugs reported in 1 second (11 bugs); 50% of all bugs reported in 7 seconds (38 bugs)

  19. generalization: swarm verification
  • goal: leverage the availability of large numbers of CPUs and/or CPU cores
  • if an application is too large to verify exhaustively, we can define a swarm of verifiers, each of which tries to check a different, randomly chosen part of the code (see the sketch below):
  • using different hash polynomials in bitstate hashing, and different numbers of polynomials
  • using different search algorithms and/or search orders
  • use iterative search refinement to dramatically speed up error reporting ("bugs per second")
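  As a hedged illustration with standard pan options (the values are illustrative, and the swarm front-end on the configuration slide below generates such run scripts automatically), two swarm members might be diversified like this:

      $ spin -a model.pml
      $ cc -O2 -DBITSTATE -DP_RAND -o pan pan.c   # randomized process ordering
      $ ./pan -w28 -h3  -k2 -RS7   &   # member 1: hash function 3, 2 bits per state
      $ ./pan -w28 -h11 -k4 -RS123 &   # member 2: different hash choice and seed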

  20. swarm search

  21. swarm configuration file:

      # range
      k 1 4            # min and max nr of hash functions
      # limits
      depth 10000      # max search depth
      cpus 128         # nr available cpus
      memory 64MB      # max memory to be used; recognizes MB,GB
      time 1h          # max time to be used; h=hr, m=min, s=sec
      vector 512       # bytes per state, used for estimates
      speed 250000     # states per second processed
      file model.pml   # the spin model
      # compilation options (each line defines a search mode)
      -DBITSTATE                    # standard dfs
      -DBITSTATE -DREVERSE          # reversed process ordering
      -DBITSTATE -DT_REVERSE        # reversed transition ordering
      -DBITSTATE -DP_RAND           # randomized process ordering
      -DBITSTATE -DT_RAND           # randomized transition ordering
      -DBITSTATE -DP_RAND -DT_RAND  # both
      # runtime options
      -c1 -x -n

  the user specifies the number of cpus to use, the memory per cpu, and the maximum time; swarm is a front-end for spin (http://spinroot.com/swarm/):

      $ swarm -F config.lib -c6 > script
      swarm: 456 runs, avg time per cpu 3599.2 sec
      $ sh ./script

  22. many small jobs do the work of one large job, but much faster and in less memory
  (graph: states reached against the number of processes, on a log scale, for the DEOS O/S model; swarm search scales linearly and reaches 100% coverage with a swarm of 100 verifiers using 0.06% of the RAM each (8 MB), compared to a single exhaustive run (13 GB))

  23. tools mentioned; thank you!
  • Spin: http://spinroot.com (1989); parallel depth-first search added 2007; parallel breadth-first search added 2012
  • Modex: http://spinroot.com/modex (1999)
  • Swarm: http://spinroot.com/swarm (2008)
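  For reference, a hedged sketch of how the two parallel search modes are typically enabled in Spin (standard compile-time flags; an 8-core machine and a model in model.pml are assumed):

      $ spin -a model.pml
      $ cc -O2 -DNCORE=8 -o pan pan.c    # multi-core depth-first search
      $ ./pan
      $ cc -O2 -DBFS_PAR -o pan pan.c    # parallel breadth-first search (later Spin versions)
      $ ./pan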
