This research proposal outlines necessary improvements to the invariant model used by DIDUCE, a dynamic invariant checker. The current model heavily relies on binary integer representation, limiting its effectiveness, particularly for floating point numbers. We propose using a range-based approach for better handling of variable states and enhancing confidence measurements for observed values. Our strategy includes reporting violations as well as efficiency improvements through range merging and de-instrumentation, aimed at streamlining checks over time and ensuring better performance in dynamic analysis.
Improving the Invariant Model of DIDUCE CS 343 -- Research Proposal 12 June 2002 Katy Innes and Andy Westbrook
Overview • Review of DIDUCE • What’s wrong with DIDUCE’s current model? • How do we propose to fix it? • Related work • Other presentations • Summer break!
Review of DIDUCE • A dynamic invariant checker • Instruments user-specified portions of the Java bytecode for a particular program • Maintains a hypothetical invariant for the value of many variables at selected program points • Does so using a bitmask which is the “meet” of all values seen so far • The meet operator used in this case is the bitwise-or operator
What’s Wrong with DIDUCE? • The invariant model • It is heavily associated with the binary representation of integers • If a variable is allowed to take on values 1 and 4, it must also be allowed to take on value 5 • This model is of little use for floating point numbers • Empirically, this model has been shown to be meaningful with reference types only for distinguishing between null and non-null
For Example • The paper mentions a bug found in MAJC where a state variable takes on a new state • This variable is 0 for empty, 1 for occupied, or 2 for pending • The error occurs when it takes on 2 for the first time • But, if the variable took on 1 for empty, 2 for occupied, and 3 for pending DIDUCE would not find this bug • Would DIDUCE be better if it could handle either case?
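The two encodings above can be checked with a minimal sketch of a bitwise-or invariant. It follows only the description on these slides (the invariant is the bitwise-or of all values seen so far) and is not DIDUCE's actual implementation; `BitmaskInvariant`, `train`, and `violates` are hypothetical names.

```python
class BitmaskInvariant:
    """Hypothetical invariant: the bitwise-or of every value seen in training."""

    def __init__(self):
        self.mask = 0

    def train(self, value):
        """Fold an observed value into the mask."""
        self.mask |= value

    def violates(self, value):
        """A value violates the invariant only if it sets a bit not yet in the mask."""
        return (value | self.mask) != self.mask


def new_state_detected(seen_states, new_state):
    """Train on the states seen so far, then ask whether the new state is flagged."""
    inv = BitmaskInvariant()
    for s in seen_states:
        inv.train(s)
    return inv.violates(new_state)


# Encoding 0/1/2: mask after {0, 1} is 0b01, so state 2 (0b10) is flagged.
print(new_state_detected([0, 1], 2))  # True
# Encoding 1/2/3: mask after {1, 2} is 0b11, so state 3 (0b11) slips through.
print(new_state_detected([1, 2], 3))  # False
```

The same effect explains the earlier claim that allowing 1 and 4 also allows 5: the mask `0b101` already covers every bit of 5.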
Our Improvement (Perhaps) • Rather than use a bit vector for each invariant, we will use a set of ranges • For example, we might associate the range 1-2 with the previous example • We might have multiple ranges, or ranges of width one • To handle reference types, we would assign each class type a number and treat reference types as integers taking on the number corresponding to the type to which they point
Confidence • We developed a measurement of confidence for each range in an invariant • It is [formula not preserved in this transcript] • This rewards small ranges that contain a large number of observed values
Reporting Violations • When we observe a value that does not fall into a range, we report a violation • These violations are sorted by the confidence of the invariant model violated • This confidence is the mean of the confidences of the ranges defining the invariant • We also create a new range for that invariant, containing just the observed value
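The reporting steps above can be sketched as follows. This is an illustration under stated assumptions, not the proposal's implementation: `Range`, `RangeInvariant`, `observe`, and `sort_violations` are hypothetical names, each range's confidence is passed in as a plain number rather than computed, and sorting highest confidence first is an assumed order.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Range:
    lo: int
    hi: int
    confidence: float  # supplied by the caller; no confidence formula is assumed here

    def contains(self, value):
        return self.lo <= value <= self.hi


@dataclass
class RangeInvariant:
    ranges: list = field(default_factory=list)

    def confidence(self):
        """Mean of the confidences of the ranges defining the invariant."""
        return mean(r.confidence for r in self.ranges) if self.ranges else 0.0

    def observe(self, value, new_range_confidence=0.0):
        """Return the violated invariant's confidence, or None if the value fits.

        On a violation, a new width-one range holding just `value` is added.
        """
        if any(r.contains(value) for r in self.ranges):
            return None
        violated = self.confidence()  # measured before the new range is added
        self.ranges.append(Range(value, value, new_range_confidence))
        return violated


def sort_violations(violations):
    """Order (label, confidence) pairs with the most confident invariant first."""
    return sorted(violations, key=lambda v: v[1], reverse=True)
```

For the state-variable example, an invariant holding the single range 1-2 accepts 1 and 2, reports 3 as a violation, and then accepts 3 on later observations because a new range 3-3 now exists.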
Efficiency Improvements • To improve efficiency, ranges are merged • For two ranges to be merged, the difference in confidence between the initial range with higher confidence and the newer range must be less than some empirically determined constant • This will result in merging ranges that are close together and have similar confidence • We will also limit the number of ranges per program point and will drop ranges with low confidence
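A rough sketch of the merge test and the range limit follows. It reads "the newer range" as the range produced by the merge, which is an interpretation of the slide rather than something it states. `try_merge`, `prune`, and `placeholder_confidence` are hypothetical names, and `placeholder_confidence` (observations per unit of width) is only a stand-in chosen because it rewards small, well-populated ranges; it is not the proposal's formula.

```python
def placeholder_confidence(lo, hi, count):
    """Stand-in only: observations per unit of width. NOT the proposal's formula."""
    return count / (hi - lo + 1)


def try_merge(a, b, confidence_fn, max_drop):
    """Merge two (lo, hi, count) ranges if confidence drops by less than max_drop.

    The drop is measured from the more confident input range to the merged range.
    Returns the merged (lo, hi, count) tuple, or None if the merge is rejected.
    """
    merged = (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2])
    best = max(confidence_fn(*a), confidence_fn(*b))
    if best - confidence_fn(*merged) < max_drop:
        return merged
    return None


def prune(ranges, confidence_fn, max_ranges):
    """Keep at most max_ranges ranges, dropping those with the lowest confidence."""
    ranked = sorted(ranges, key=lambda r: confidence_fn(*r), reverse=True)
    return ranked[:max_ranges]
```

With the stand-in measure and `max_drop=1.0`, the adjacent ranges `(1, 2, 10)` and `(3, 3, 4)` merge into `(1, 3, 14)`, while `(1, 2, 10)` and `(100, 100, 4)` do not, because the merged range would be wide and sparsely populated.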
More Efficiency Improvements • Deinstrumentation • When the program has been running for a suitably long period of time and has no high-confidence ranges for a particular invariant, we stop checking that invariant • We hypothesize that this will eliminate checking of variables that hold random or arbitrary values or can take on most of the values allowed by their type
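The deinstrumentation rule can be sketched as a single predicate. This is a sketch under assumptions: "a suitably long period of time" is modeled as a minimum number of observations at the program point, and `should_deinstrument`, `min_observations`, and `high_confidence` are hypothetical names whose values would have to be tuned empirically.

```python
def should_deinstrument(range_confidences, observations,
                        min_observations, high_confidence):
    """Decide whether to stop checking an invariant.

    Returns True only when the program point has been observed at least
    min_observations times and none of its ranges reaches high_confidence.
    """
    if observations < min_observations:
        return False  # not enough evidence yet to give up on this invariant
    return all(c < high_confidence for c in range_confidences)
```

A variable holding effectively random values would tend to accumulate many low-confidence ranges, so under this rule its check would be dropped once the observation threshold is passed.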
Related Work • Daikon: tracks all observed values and then, after completion, determines invariants • Requires extensive training data • This provides better invariants than our proposal but at a much, much higher cost • A number of languages (e.g., Ada) support range-based subtyping • This supports our hypothesis that ranges are meaningful invariants