
Perracotta: Mining Temporal API Rules from Imperfect Traces Jinlin Yang David Evans Deepali Bhardwaj Thirumalesh Bhat Manuvir Das

Agenda • Background • Perracotta • Approximate Inference • Contextual Properties • Chaining • Results • Critique

Background
• Software tasks require specifications.
  • What are the intended behaviors of the program?
  • Expected outputs are necessary for testing.
  • What aspects can be modified during maintenance of software?
  • etc.
• The problem:
  • Many programs don't provide precise specifications.
  • Many implementations are not consistent with specifications.
  • As maintenance continues, specifications become increasingly incorrect.
• So?
  • Several researchers have been motivated to study the problem of specification inference.

Background (2)
• Previous work:
  • Proposed an approach to dynamically infer temporal properties of programs.
• “...dynamically”?
  • To infer specifications by analysing sample execution traces of a program.
• “...temporal properties”?
  • Properties that deal with the order of occurrence of program events.
  • ex) Property: acquiring a lock should eventually be followed by a release of the lock.
• This paper addresses only the inference of Alternating Properties, because “It's the strictest of the template patterns and has proven the most useful in practice.”
  • ex) If events A and B obey the Alternating Property, the trace “ABABAB” is valid but “ABABBAAB” is not.
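A minimal sketch of what checking the Alternating Property on a perfect trace could look like, assuming each event is a single character and that the property is read as the regular expression (AB)* over the trace projected onto the two events. The function name `satisfies_alternating` is hypothetical and is not taken from the paper.

```python
import re


def satisfies_alternating(trace: str, a: str = "A", b: str = "B") -> bool:
    """Return True if the trace, restricted to events a and b, matches (ab)*."""
    # Ignore every event other than the two being checked.
    projected = "".join(event for event in trace if event in (a, b))
    pattern = f"(?:{re.escape(a)}{re.escape(b)})*"
    return re.fullmatch(pattern, projected) is not None


print(satisfies_alternating("ABABAB"))    # True
print(satisfies_alternating("ABABBAAB"))  # False
```

Note that this strict check rejects a trace on the first violation, which is why it only suits perfect traces.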

Background (3)
• Current limitations:
  • Inference algorithms scale poorly with the size of the program and input trace.
  • Inference only worked on perfect traces.
    • In other words, it assumed that the implementations of the traced programs were correct.
  • Many of the inferred properties are uninteresting; because uninteresting properties add up, reviewing the output becomes infeasible for large programs.

Background (4)
• Quick summary of background:
  • Specifications tend to be inconsistent with implementations.
  • Researchers have developed techniques to dynamically infer specification properties of programs.
  • However, these techniques suffer from three notable limitations:
    • They only work with small programs.
    • They cannot infer specifications from inconsistent implementations.
    • Many of the inferred properties are uninteresting “noise”.
• Contributions of this paper/Perracotta:
  • Address the above problems.

Perracotta
• Contributions:
  • Approximate Inference
    • Makes it possible to infer a specification from an implementation that is not always consistent with that specification.
  • Contextual Properties
    • Allows for more precise inferences by keeping track of contextual data instead of mere static behaviors.
  • Selection Heuristics
    • Filters out the uninteresting properties, thus greatly reducing the amount of “noise” from the inferred properties.

Approximate Inference
• Imperfect traces:
  • Allocated memory is expected to be freed eventually, to avoid memory leaks.
  • Unfortunately, even skilled programmers fail to follow this temporal property consistently, especially in complex code.
  • A sample execution trace that is not consistent with this property is an example of an “imperfect trace”.
  • Previous algorithms failed to infer properties from imperfect traces, which defeats the purpose of dynamic inference.
• Approximate Inference:
  • Infers a specification from an implementation, even if the implementation is buggy with respect to that specification.

Approximate Inference (2)
• “STSTSTSTSTSSS”
  • Events (functions) S and T are called multiple times in a trace.
  • They alternate five times, but there is no alternation in the last three S's.
  • Will Perracotta (successfully) infer the Alternating Property from this?
• Perracotta's approach:
  • Partition the trace:
    • [S,T][S,T][S,T][S,T][S,T][SSS]
  • Number of alternating partitions = 5
  • Number of total partitions = 6
  • Satisfaction rate of the Alternating Property = 5/6
  • The higher the satisfaction rate, the more likely the property is to be inferred.
  • The lower the predefined threshold value, the more likely the property is to be inferred.
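The partition-and-count idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the partition rule (start a new group whenever the first event follows the second) is inferred from the bracketed partition listed on the slide, and the names `partition`, `satisfaction_rate`, `infer_alternating` and the default threshold of 0.8 are hypothetical.

```python
def partition(trace, first, second):
    """Split the trace (restricted to the two events) into groups.

    Assumed rule: a new group starts whenever `first` directly follows `second`.
    """
    groups, current = [], []
    for event in trace:
        if event not in (first, second):
            continue  # events outside the pair are ignored
        if event == first and current and current[-1] == second:
            groups.append(current)
            current = []
        current.append(event)
    if current:
        groups.append(current)
    return groups


def satisfaction_rate(trace, first, second):
    """Fraction of groups that are exactly [first, second]."""
    groups = partition(trace, first, second)
    if not groups:
        return 0.0
    good = sum(1 for group in groups if group == [first, second])
    return good / len(groups)


def infer_alternating(trace, first, second, threshold=0.8):
    """Accept the property when the rate reaches the (hypothetical) threshold."""
    return satisfaction_rate(trace, first, second) >= threshold


groups = partition("STSTSTSTSTSSS", "S", "T")
print(len(groups))                                   # 6
print(satisfaction_rate("STSTSTSTSTSSS", "S", "T"))  # 5/6, about 0.83
```

For the slide's trace, this partition has six groups, five of which are exactly [S, T], so the property is accepted at a threshold of 0.8 and rejected at 0.9.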

Contextual Properties
• “An acquired lock should eventually be released.”
• But what if there are multiple locks?
  • Context-neutral: “a lock was acquired”
  • Context-sensitive: “Lock#1 was acquired”
• Without context, the inference tool treats all locks as if they were the same lock, and so cannot infer anything about lock behavior.
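To illustrate why context matters, the sketch below records each event as a (name, context) pair, where the context is a lock identifier, and compares a context-neutral check with a per-lock check. The trace contents, the event names `acquire` and `release`, and the helper names are all hypothetical.

```python
from collections import defaultdict


def alternates(events, first, second):
    """True if events is exactly first, second, first, second, ..."""
    expected = first
    for event in events:
        if event != expected:
            return False
        expected = second if expected == first else first
    return expected == first  # must not end on a dangling `first`


def split_by_context(trace):
    """Group event names by their context value (here: a lock id)."""
    per_context = defaultdict(list)
    for name, context in trace:
        per_context[context].append(name)
    return dict(per_context)


# Two locks whose acquire/release calls interleave.
trace = [("acquire", 1), ("acquire", 2), ("release", 1), ("release", 2)]

neutral = alternates([name for name, _ in trace], "acquire", "release")
sensitive = all(
    alternates(names, "acquire", "release")
    for names in split_by_context(trace).values()
)
print(neutral)    # False: the locks look like one misused lock
print(sensitive)  # True: each lock alternates on its own
```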

Selection Heuristics
• Goal: reduce the number of uninteresting properties.
  • ex) “There is always a printf() before a readLine() prompt”. This says nothing about API specifications, so it is uninteresting.
• Reachability:
  • Properties between events with a call relationship are less interesting than those between events without one.
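One way the reachability heuristic could be sketched is as a filter over candidate properties using a call graph. The call graph below, the function names in it, and the choice to test reachability in either direction are assumptions made for illustration; the slides do not specify how the call relationship is determined.

```python
from collections import deque


def reachable(call_graph, src, dst):
    """Breadth-first search: can src call dst, directly or transitively?"""
    seen = {src}
    queue = deque([src])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee == dst:
                return True
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


def filter_by_reachability(properties, call_graph):
    """Keep only properties whose two events have no call relationship."""
    return [
        (a, b)
        for a, b in properties
        if not reachable(call_graph, a, b) and not reachable(call_graph, b, a)
    ]


# Hypothetical call graph: readLine calls prompt, which calls printf.
call_graph = {"readLine": ["prompt"], "prompt": ["printf"]}
candidates = [("printf", "readLine"), ("lock", "unlock")]
print(filter_by_reachability(candidates, call_graph))  # [('lock', 'unlock')]
```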

Selection Heuristics (2)
• Name Similarities:
  • Functions with similar names are likely to be associated with interesting inferences.
  • ex) ExAcquireFastMutexUnsafe vs ExReleaseFastMutexUnsafe
• Chaining:
  • Suppose A->B, B->C, and A->C; that is, A and B satisfy the Alternating Property, as do B and C, and A and C. That makes three inferences.
  • It is correct to chain them into a single inference: A->B->C.
  • This reduces the number of inferences from 3 to 1, reducing “noise”.
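The chaining step can be sketched as follows, assuming a chain is only valid when every earlier event in it has an inferred property with every later event (as in the A, B, C example). The function name `chain` and the ordering trick (sort by how many events each one precedes) are illustrative choices, not the paper's algorithm.

```python
def chain(pairs):
    """Return the events as one ordered chain, or None if they do not form one.

    `pairs` is a collection of (x, y) tuples, each meaning "x -> y was inferred".
    """
    pairs = set(pairs)
    events = {event for pair in pairs for event in pair}
    # In a valid chain, the i-th event precedes all later ones, so sorting by
    # the number of events each one precedes recovers the order.
    ordered = sorted(events, key=lambda e: -sum(1 for x, _ in pairs if x == e))
    for i, earlier in enumerate(ordered):
        for later in ordered[i + 1:]:
            if (earlier, later) not in pairs:
                return None  # a required pairwise property is missing
    return ordered


print(chain({("A", "B"), ("B", "C"), ("A", "C")}))  # ['A', 'B', 'C']
print(chain({("A", "B"), ("B", "C")}))              # None: A->C is missing
```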

Results • Test Programs: • Daisy • JBoss • Windows Kernel APIs

Results (2)

Critique
• Likes:
  • The scope of the paper was narrowed down to the Alternating Property.
  • It tackled problems worth solving in the field of dynamic inference.
  • It actually detected a major bug in Windows.
• Dislikes:
  • Definitions of important keywords were scattered throughout the paper.
  • It's not very clear how they got the algorithm to work with large programs.
  • The “Approach” section merely described the approach of the previous work, not the current one. There was no explicit label to clarify where the overview of Perracotta starts.
