310 likes | 428 Vues
Data-Flow Frameworks. Lattice-Theoretic Formulation Meet-Over-Paths Solution Monotonicity/ Distributivity. Jeffrey D. Ullman Stanford University. Data-Flow Analysis Frameworks. Generalizes and unifies the DFA examples from previous slide set. Important components :
E N D
Data-Flow Frameworks Lattice-Theoretic FormulationMeet-Over-Paths SolutionMonotonicity/Distributivity Jeffrey D. Ullman Stanford University
Data-Flow Analysis Frameworks • Generalizes and unifies the DFA examples from previous slide set. • Important components: • DirectionD: forward or backward. • DomainV (possible values for IN, OUT). • Meet operator∧(effect of path confluence). • Set of possible transfer functionsF (effect of passing through a basic block in the direction D).
Gary Kildall • This theory was the thesis at U. Wash. of Gary Kildall. • Gary is better known for CP/M, the first real PC operating system. • There is an interesting story. • Google query: [kildallcp/m].
Semilattices • V and ∧form a semilattice if for all x, y, and z in V: • x ∧ y = y ∧ x (commutativity). • x ∧ (y ∧ z) = (x ∧ y) ∧ z (associativity). • x ∧ x = x (idempotence). • Topelement ⊤ such that for all x, ⊤∧ x = x. • Why? Meet is “greatest lower bound,” so meet pushes us downward, and ⊤is above all other elements. • Bottomelement ⊥such that for all x, ⊥∧x = ⊥. • Optional and not generally used in the theory.
Example: Semilattice • V = power set of some set. • ∧ = union. • Union isidempotent, commutative, and associative. • What are the top and bottom elements?
Example: PowersetSemilattice { } Convention: Draw the meet of elements below both. {1} {2} {3} {1,2} {1,3} {2,3 } x {1,2,3} y x ∧ y
Partial Order for a Semilattice x < y means there is a path downward from y to x. • Say x ≤y iff x ∧ y = x. • Also, x < y iff x ≤y and x ≠y. • ≤ is really a partial order: • x ≤y and y ≤z imply x ≤z (proof in text). • x ≤y and y ≤x iff x = y. Proof: • If: Idempotence gives us x ≤x. • Only-if: x ∧ y = x and y ∧ x = y (given). Thus, x = x ∧ y (given) = y ∧ x (commutativity) = y (given). y x
Axioms for Transfer Functions • F includes the identity function. • Why needed? • Constructions often require introduction of an empty block. • F is closed under composition. • Why needed? • The concatenation of two blocks is a block. • Transfer function for a block can be constructed from individual statements.
Good News! • The problems from the last slide set fit the model. • RD’s: Forward, meet = union, transfer functions based on Gen and Kill. • AE’s: Forward, meet = intersection, transfer functions based on Gen and Kill. • LV’s: Backward, meet = union, transfer functions based on Use and Def.
Example: Reaching Definitions • Direction D = forward. • Domain V = set of all sets of definitions in the flow graph. • ∧ = union. • Functions F = all “gen-kill” functions of the form f(x) = (x - K) ∪G, where K and G are sets of definitions (members of V).
Example: RD Satisfies Axioms • Union on a power set forms a semilattice (idempotent, commutative, associative). • Identity function: let K = G = ∅. • Composition: A little algebra. f1(x) = (x – K1) ∪G1 f2(x) = (x – K2) ∪G2 f1(f2(x)) = (((x – K2) ∪G2) – K1) ∪G1 = (x – (K1∪K2 )) ∪ ((G2– K1) ∪ G1) G2, K2 G1, K1
Example: Partial Order • For RD’s, S ≤ T means S ∪ T = S. • Equivalently S ⊇T. • Seems “backward,” but that’s what the definitions give you. • Intuition: ≤ measures “ignorance.” • The more definitions we know about, the less ignorance we have. • ⊤ = “total ignorance.”
Using a DFA Framework • We apply an iterative algorithm to the following information: • DFA framework (D, V, ∧, F). • A flow graph, with an associated function fB in F for each block B. • Important assumption: There are Entry and Exit blocks that do nothing. Entry has no predecessors; Exit has no successors. • Feel free to ignore above important assumption. • A boundary value vENTRY or vEXIT if D = forward or backward, respectively.
Picture of a Flow Graph Entry The interesting stuff happens here Exit
Iterative Algorithm (Forward) OUT[Entry] = vENTRY; for (other blocks B) OUT[B] = ⊤; while (changes to any OUT) for (each block B) { IN(B) = ∧predecessors P of B OUT(P); OUT(B) = fB(IN(B)); }
Iterative Algorithm (Backward) • Same thing – just: • Swap IN and OUT everywhere. • Replace entry by exit.
What Does the Iterative Algorithm Do? • MFP (maximal fixedpoint) = result of iterative algorithm. • MOP = “Meet Over all Paths” (from the entry to a given point) of the transfer function along that path applied to vENTRY. • IDEAL = ideal solution = meet over all executable paths from entry to the point. • Question: why might a path not be “executable”?
Transfer Function of a Path f1 f2 fn-1 . . . B fn-1( . . .f2(f1(vENTRY)). . .)
Maximum Fixedpoint • Fixedpoint= solution to the equations used in iteration: IN(B) = ∧predecessors P of B OUT(P); OUT(B) = fB(IN(B)); • Maximum = IN’s and OUT’s of any other solution are ≤ the result of the iterative algorithm (MFP). • Subtle point: “maximum” = “maximum ignorance” = “we don’t believe any fact that is not justified by the flow graph and the rules.” • This is a good thing.
Example: Reaching Definitions • MFP = solution with maximum “ignorance.” • That is, the smallest sets of definitions that satisfy the equations. • Makes sense: • We need to discover all definitions that really reach. • That’s what “fixedpoint” gives us. • But we want to avoid “imaginary” definitions that are not justified. • That’s what “maximum” gives us.
Example: RD’s – (2) d: x = 10 Here is the MFP d e: x = 20 But if we add d at these two points, we still have a solution, just not the maximal (ignorance) solution. e,f f: y = 30 e,f
MOP and IDEAL • All solutions are really meets of the result of starting with vENTRY and following some set of paths (possibly imaginary) to the point in question. • If we don’t include at least the IDEAL paths, we have an error. • But try not to include too many more.
MOP Versus IDEAL • At each block B, MOP[B] ≤ IDEAL[B]. • Why? The meet over many paths is ≤ the meet over a subset. • Example: x ∧ y ∧ z ≤ x ∧ y because (x ∧ y ∧ z) ∧ (x ∧ y) = x ∧ y ∧ z. • Use commutativity, associativity, idempotence.
MOP Versus IDEAL – (2) • Intuition: Anything not ≤ IDEAL is not safe, because there is some executable path whose effect is not accounted for. • Conversely: any solution that is ≤ IDEAL accounts for all executable paths (and maybe more paths), and is therefore conservative (safe), even if not accurate.
MFP Versus MOP • Is MFP ≤ MOP? • If so, then since MOP ≤ IDEAL, we have MFP ≤ IDEAL, and therefore MFP is safe. • Yes, but … • Requires two assumptions about the framework: • “Monotonicity.” • Finite height(no infinite chains . . . < x2 < x1 < x). • Needed to assure convergence of the algorithm. • Example: OK for power sets of a finite set, and therefore for the frameworks we have seen.
MFP Versus MOP – (2) • Intuition: If we computed the MOP directly, we would compose functions along all paths, then take a big meet: ∧all paths P fP(vENTRY) • But the MFP (iterative algorithm) alternates compositions and meets arbitrarily. • That’s why we need the “monotonicity” assumption.
Monotonicity • A framework is monotoneif the functions respect ≤. That is: • If x ≤ y, then f(x) ≤ f(y). • Equivalently: f(x ∧ y) ≤ f(x) ∧ f(y). Proof that these are equivalent is on p. 625 of the text. But we only need the second; the first justifies the term “monotonicity.”
Good News! • The frameworks we’ve studied so far are all monotone. • Easy proof for functions in Gen-Kill form. • And they have finite height. • Only a finite number of defs, variables, etc. in any program.
In MFP, Values x and y get combined too soon. MOP considers paths independently and and combines at the last possible moment. f(x) OUT = x f OUT = f(x) ∧ f(y) IN = x∧y OUT = f(x∧y) OUT = y f(y) Two Paths to B That Meet Early B Entry Since f(x ∧ y) ≤ f(x) ∧ f(y), we may have imagined nonexistent paths, but we’re safe.
Distributive Frameworks • Strictly stronger than monotonicity is the distributivitycondition: f(x ∧ y) = f(x) ∧ f(y)
Even More Good News! • All the Gen-Kill frameworks are distributive. • If a framework is distributive, then combining paths early doesn’t hurt. • MOP = MFP. • That is, the iterative algorithm computes a solution that takes into account all and only the physical paths.