Merlin is an automated tool that infers information flow specifications for programs by analyzing propagation graphs, ensuring security by identifying potential sources, sinks, and sanitizers. This paper discusses the methodology and algorithms used in the Merlin architecture to address path safety and minimize false positives and negatives in information flow analysis.
Merlin: Specification Inference for Explicit Information Flow Problems
Livshits, Nori, Rajamani (MSR), Banerjee (IMDEA), PLDI’09
Shimin Chen, LBA Reading Group
Explicit Information Flow in a Program
• Given a program, we can construct a propagation graph:
  • Node: method
  • Edge: explicit information flow between methods (through a method call parameter, a return value, or an indirect update through a pointer)
• (Information flow) specification: node labels
  • Regular (default): propagates taint to successors
  • Source: tainted initially
  • Sink: if tainted, an error must be reported
  • Sanitizer: cleanses/untaints/endorses information
• Given a propagation graph and a specification, many existing tools can statically check whether the propagation graph violates the specification.
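To make the check in the last bullet concrete, here is a minimal sketch of a taint check over a propagation graph. It is only an illustration, not one of the existing tools: the graph is hypothetical, `GetParameter` and `WriteLine` are borrowed from the talk's example, and `Concat` and `HtmlEncode` are names invented for this sketch.

```python
from collections import deque

# Hypothetical propagation graph: method -> methods it passes data to.
GRAPH = {
    "GetParameter": ["Concat"],
    "Concat": ["HtmlEncode", "WriteLine"],
    "HtmlEncode": ["WriteLine"],
    "WriteLine": [],
}

# Hypothetical specification: any node not listed here is "regular".
SPEC = {
    "GetParameter": "source",
    "HtmlEncode": "sanitizer",
    "WriteLine": "sink",
}


def find_violations(graph, spec):
    """Return the sinks reachable from a source without crossing a sanitizer."""
    sources = [n for n in graph if spec.get(n) == "source"]
    tainted = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for succ in graph[node]:
            if spec.get(succ) == "sanitizer":
                continue  # sanitizers cleanse the data, so taint stops here
            if succ not in tainted:
                tainted.add(succ)
                queue.append(succ)
    return sorted(n for n in tainted if spec.get(n) == "sink")
```

In this graph `Concat` also writes to `WriteLine` directly, so the sink is reported; dropping that edge leaves only the sanitized path and the check comes back empty.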
Example
• GetParameter and GetHeader are sources
• WriteLine is a sink: it writes to the web page
• S1 or S2 may contain malicious scripts that would run in the web browser
Problem & Solution
• Problem: the user-provided specification is incomplete
  • False positives: incomplete information about sanitizers
  • False negatives: incomplete information about sources and sinks
• Solution: Merlin
  • Automatically infers information flow specifications for programs
  • Intuition: most paths in a propagation graph are secure (data flowing from a source to a sink passes through a sanitizer)
  • Approach (idea):
    • One random variable per node: the node is a source, sink, or sanitizer w/ prob …
    • Compute probabilistic constraints for paths
    • Solve these constraints
Merlin Architecture (diagram not reproduced; its surviving labels are two inputs and one output)
Construct Propagation Graph
• Inter-procedural data flow
  • Limited by the accuracy of pointer analysis
Assumptions
• Most paths in the propagation graph are secure
• The number of sanitizers is small relative to the number of regular nodes
Focus: string-related vulnerabilities
Potential Sources, Sinks and Sanitizers
• Potential sources: methods that produce strings as output
• Potential sanitizers: methods that take a string as input and produce a string as output
• Potential sinks: methods that take a string as input, but do not produce a string as output
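The three rules above amount to a small classifier over method signatures. The sketch below is one possible reading, assuming a method is described only by a list of parameter type names and a return type name; the function `potential_roles` and that representation are hypothetical, not Merlin's actual interface.

```python
def potential_roles(param_types, return_type):
    """Classify a method by the string-based heuristic on the slide."""
    takes_string = "string" in param_types
    returns_string = return_type == "string"
    roles = set()
    if returns_string:
        roles.add("source")  # produces a string as output
    if takes_string and returns_string:
        roles.add("sanitizer")  # string in, string out
    if takes_string and not returns_string:
        roles.add("sink")  # string in, no string out
    return roles
```

Note that under these rules a string-to-string method is both a potential source and a potential sanitizer; deciding between the candidate roles is left to the inference step.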
Constraints
• Path safety: most paths from a source to a sink pass through at least one sanitizer.
  • But the number of paths is exponential.
• Triple safety: constrain triples of nodes rather than whole paths, giving $O(N^3)$ constraints, where N is the number of nodes.
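A sketch of how such triples might be enumerated is shown below. It assumes that a triple consists of a potential source, a potential sanitizer reachable from it, and a potential sink reachable from that sanitizer; the slide itself gives only the cubic bound, so this reading, the helper names, and the example graph are all assumptions.

```python
from itertools import product


def reachable(graph, start):
    """All nodes reachable from start by one or more edges."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def triples(graph, sources, sanitizers, sinks):
    """Enumerate (source, sanitizer, sink) candidates that lie on a common path."""
    reach = {n: reachable(graph, n) for n in graph}
    return sorted(
        (s, w, n)
        for s, w, n in product(sources, sanitizers, sinks)
        if w in reach[s] and n in reach[w]
    )


# Hypothetical chain A -> B -> C -> D.
CHAIN = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
```

Because each triple picks at most one node per role, there are at most N × N × N of them, regardless of how many distinct paths the graph contains.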
Constraints cont’d
• Pairwise minimization: it is unlikely that two sanitizers lie on the same path.
• Sanitizer prioritization: favor nodes with a higher score s(m):
$$ s(m) = \frac{\text{number of source-to-sink paths passing through } m}{\text{total number of paths passing through } m} $$
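The score s(m) can be computed by brute force on a small graph. The sketch below assumes an acyclic propagation graph and takes a "path" to mean a maximal path from a node with no predecessors to a node with no successors; both choices, along with the function names and the example graph, are assumptions made for illustration.

```python
def all_paths(graph):
    """Enumerate maximal paths (entry node to exit node) in a small DAG."""
    has_pred = {succ for succs in graph.values() for succ in succs}
    entries = [n for n in graph if n not in has_pred]

    def extend(path):
        succs = graph[path[-1]]
        if not succs:
            yield tuple(path)  # reached an exit node
        for succ in succs:
            yield from extend(path + [succ])

    for entry in entries:
        yield from extend([entry])


def sanitizer_score(graph, m, sources, sinks):
    """s(m): fraction of paths through m that start at a source and end at a sink."""
    through_m = [p for p in all_paths(graph) if m in p]
    if not through_m:
        return 0.0
    src_to_sink = [p for p in through_m if p[0] in sources and p[-1] in sinks]
    return len(src_to_sink) / len(through_m)


# Hypothetical graph: two entry nodes feed San, which feeds Sink.
EXAMPLE = {"Src": ["San"], "Other": ["San"], "San": ["Sink"], "Sink": []}
```

Here two paths pass through `San` and only the one starting at `Src` is a source-to-sink path, so its score is one half. Explicit enumeration is exponential in general, so this is only suitable for toy graphs.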
Constraints cont’d
• Source wrapper avoidance: it is unlikely that two sources lie on the same path.
• Sink wrapper avoidance: it is unlikely that two sinks lie on the same path.
Ideas of Solving the System
• Each node is assigned a random variable with two possible values, {true, false}
• For each constraint, generate a probability constraint
• For example, if node A and node B are both potential sources and lie on the same path, then:
  • Xa: true if A is a source, false otherwise
  • Xb: true if B is a source, false otherwise
  • Prob(Xa AND Xb = true) = low1
  • low1 is an input constant to the algorithm
• Solve the set of constraints using probabilistic inference on a factor graph
• More details in the paper
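One way to picture the A/B example is as a single factor over the two variables that gives a low weight to the assignment where both are sources. The sketch below normalizes that factor by brute-force enumeration. This is not the inference procedure from the paper: the value of `LOW1`, the weight `1 - LOW1` for the other assignments, and the function names are all assumptions.

```python
from itertools import product

LOW1 = 0.1  # hypothetical value; the slide only says it is an input constant


def both_sources_factor(xa, xb):
    """Soft constraint: A and B both being sources gets a low weight."""
    return LOW1 if (xa and xb) else 1.0 - LOW1


def marginals(factor):
    """Normalize the factor and return P(Xa), P(Xb), P(Xa and Xb)."""
    weights = {
        (xa, xb): factor(xa, xb)
        for xa, xb in product([True, False], repeat=2)
    }
    z = sum(weights.values())  # normalization constant
    p_a = sum(w for (xa, _), w in weights.items() if xa) / z
    p_b = sum(w for (_, xb), w in weights.items() if xb) / z
    p_both = weights[(True, True)] / z
    return p_a, p_b, p_both
```

With many nodes the joint distribution is a product of many such factors, and enumerating all assignments is no longer feasible, which is why a factor-graph inference method is needed.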