Shimin Chen LBA Reading Group

Merlin: Specification Inference for Explicit Information Flow Problems
Livshits, Nori, Rajamani (MSR), Banerjee (IMDEA), PLDI’09

Explicit Information Flow in a Program
• Given a program, we can construct a propagation graph:
  • Node: method
  • Edge: explicit information flow between methods (through a method call parameter, a return value, or an indirect update through a pointer)
• (Information flow) specification: node labels
  • Regular (default): propagates taint to its successors
  • Source: tainted initially
  • Sink: if tainted, an error must be reported
  • Sanitizer: cleanses/untaints/endorses information
• Given a propagation graph and a specification, many existing tools can statically check whether the propagation graph violates the specification.
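The check described in the last bullet can be sketched as a reachability search. This is a minimal illustration, not the algorithm of any particular tool: the edge list, the label strings, and the helper name find_violations are assumptions made for the example, and Concat and HtmlEncode are hypothetical method names.

```python
from collections import deque

def find_violations(edges, spec):
    """Return (source, sink) pairs where taint reaches the sink unsanitized.

    edges: list of (caller, callee) flow edges in the propagation graph.
    spec:  dict mapping node -> "source" | "sink" | "sanitizer";
           nodes missing from spec are treated as "regular".
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)

    violations = set()
    sources = [n for n, label in spec.items() if label == "source"]
    for src in sources:
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in succ.get(node, []):
                if nxt in seen:
                    continue
                seen.add(nxt)
                label = spec.get(nxt, "regular")
                if label == "sanitizer":
                    continue  # taint is cleansed here, do not propagate further
                if label == "sink":
                    violations.add((src, nxt))
                queue.append(nxt)
    return violations

# Hypothetical graph: one flow is sanitized, the other is not.
edges = [
    ("GetParameter", "Concat"),
    ("Concat", "WriteLine"),
    ("GetHeader", "HtmlEncode"),
    ("HtmlEncode", "WriteLine"),
]
spec = {
    "GetParameter": "source",
    "GetHeader": "source",
    "HtmlEncode": "sanitizer",
    "WriteLine": "sink",
}
violations = find_violations(edges, spec)
```

With this specification only the GetParameter flow is reported. Removing the HtmlEncode entry from spec (an incomplete specification) would make the GetHeader flow look like a violation as well.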

Example
• GetParameter and GetHeader are sources
• WriteLine is a sink: it writes to the web page. S1 or S2 may contain malicious scripts that run in web browsers.

Problem & Solution
• Problem: the user-provided specification is incomplete
  • False positives: incomplete information about sanitizers
  • False negatives: incomplete information about sources and sinks
• Solution: Merlin
  • Automatically infers information flow specifications for programs
  • Intuition: most paths in a propagation graph are secure (a path from a source to a sink passes through a sanitizer)
  • Approach (idea):
    • One random variable per node: the node is a source, sink, or sanitizer w/ prob …
    • Compute probabilistic constraints for paths
    • Solve these constraints

Merlin Architecture
[Figure: Merlin architecture diagram, with two inputs and one output]

Construct Propagation Graph
• Inter-procedural data flow:
  • Limited by the accuracy of pointer analysis

Assumptions
• Most paths in the propagation graph are secure
• The number of sanitizers is small relative to the number of regular nodes
Focus: string-related vulnerabilities

Potential Sources, Sinks and Sanitizers
• Potential sources: methods that produce strings as output
• Potential sanitizers: methods that take a string as input and produce a string as output
• Potential sinks: methods that take a string as input, but do not produce a string as output
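The three rules above can be sketched as a classification over method signatures. The signature encoding (a list of parameter type names plus a return type name) and all method names below are assumptions made for illustration; they are not taken from Merlin itself.

```python
def classify_candidates(signatures):
    """Map each method to its set of candidate roles.

    signatures: dict mapping method name -> (param_types, return_type),
                with types given as plain strings (an assumed encoding).
    """
    roles = {}
    for name, (params, ret) in signatures.items():
        takes_string = "string" in params
        returns_string = ret == "string"
        candidate = set()
        if returns_string:
            candidate.add("source")        # produces a string
        if takes_string and returns_string:
            candidate.add("sanitizer")     # string in, string out
        if takes_string and not returns_string:
            candidate.add("sink")          # string in, no string out
        roles[name] = candidate
    return roles

# Hypothetical signatures.
signatures = {
    "ReadInput": ([], "string"),
    "Encode": (["string"], "string"),
    "WriteLine": (["string"], "void"),
    "Count": (["int"], "int"),
}
roles = classify_candidates(signatures)
```

Note that under these rules a string-to-string method such as Encode is both a potential source and a potential sanitizer; the inference step is what decides between the roles.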

Constraints
Path safety: most paths from a source to a sink pass through at least one sanitizer. But the number of paths is exponential.
Triple safety: $O(N^3)$ constraints, where $N$ is the number of nodes.
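One plausible reading of triple safety is sketched below: instead of enumerating paths, generate one constraint per triple (source candidate, sanitizer candidate, sink candidate) in which the middle node lies on some path between the other two. The slide only gives the $O(N^3)$ bound, so the exact shape of the triples, the helper names, and the toy graph are assumptions.

```python
def reachable(succ, start):
    """Return all nodes reachable from start by following one or more edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def safety_triples(edges, sources, sanitizers, sinks):
    """List (s, m, k) with m reachable from s and k reachable from m."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    nodes = set(sources) | set(sanitizers) | set(sinks)
    reach = {n: reachable(succ, n) for n in nodes}

    triples = []
    for s in sorted(sources):
        for m in sorted(sanitizers):
            if m == s or m not in reach[s]:
                continue
            for k in sorted(sinks):
                if k != m and k != s and k in reach[m]:
                    triples.append((s, m, k))
    return triples

# Toy graph: a -> b -> c plus a direct edge a -> c.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
triples = safety_triples(edges, sources={"a"}, sanitizers={"b"}, sinks={"c"})
```

There are at most N choices for each position, so the number of triples is bounded by N^3 regardless of how many distinct paths the graph contains.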

Constraints cont’d
Pairwise minimization: it is unlikely to have two sanitizers on the same path.
Sanitizer prioritization: favor nodes with higher $s(m)$, where
$s(m) = \dfrac{\text{total source-to-sink paths passing through } m}{\text{total paths passing through } m}$
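The score s(m) can be illustrated on an explicit list of paths. This is a sketch under simplifying assumptions: paths are given as lists of node names (a real tool would count paths on the graph rather than enumerate them), a path counts as source-to-sink when it starts at a source candidate and ends at a sink candidate, and all names are hypothetical.

```python
def sanitizer_score(paths, sources, sinks, m):
    """Fraction of the paths through m that run from a source to a sink."""
    through_m = [p for p in paths if m in p]
    if not through_m:
        return 0.0  # m lies on no path, so there is nothing to prioritize
    source_to_sink = [
        p for p in through_m if p[0] in sources and p[-1] in sinks
    ]
    return len(source_to_sink) / len(through_m)

# Hypothetical paths through two candidate sanitizers, "clean" and "fmt".
paths = [
    ["src", "clean", "sink"],
    ["src", "clean", "log"],
    ["cfg", "clean", "sink"],
    ["src", "fmt", "sink"],
]
sources = {"src"}
sinks = {"sink"}
score_clean = sanitizer_score(paths, sources, sinks, "clean")
score_fmt = sanitizer_score(paths, sources, sinks, "fmt")
```

Here fmt sits only on source-to-sink paths, so it scores higher than clean and would be favored as a sanitizer candidate.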

Constraints cont’d
Source wrapper avoidance: it is unlikely to have two sources on the same path.
Sink wrapper avoidance: it is unlikely to have two sinks on the same path.

Ideas of Solving the System
• Each node is assigned a random variable with two possible values, {true, false}
• For each constraint, generate a probability constraint
• For example, if node A and node B are both potential sources, but they are on the same path, then
  • Xa: true if A is a source, false if it is not
  • Xb: true if B is a source, false if it is not
  • Prob(Xa AND Xb = true) = low1
  • low1 is an input constant to the algorithm
• Solve the set of constraints by probabilistic inference on a factor graph (a probabilistic graphical model)
• More details in the paper
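The Xa/Xb example can be made concrete with a tiny factor graph solved by brute-force enumeration. This is only a sketch: a real solver would use message passing rather than enumerating all assignments, and the value 0.1 for low1, the extra prior factor on Xa, and every function name are assumptions chosen for illustration.

```python
from itertools import product

def marginals(variables, factors):
    """Compute P(v = True) for each variable by enumerating all assignments.

    factors: functions mapping an assignment dict to a non-negative weight;
             the joint probability is proportional to the product of weights.
    """
    totals = {v: 0.0 for v in variables}
    z = 0.0  # normalization constant
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = 1.0
        for factor in factors:
            weight *= factor(assignment)
        z += weight
        for v in variables:
            if assignment[v]:
                totals[v] += weight
    return {v: totals[v] / z for v in variables}

LOW1 = 0.1  # assumed value for the input constant low1

def not_both_sources(a):
    # Source wrapper avoidance: A and B both being sources is unlikely.
    return LOW1 if (a["Xa"] and a["Xb"]) else 1.0 - LOW1

def prior_a(a):
    # Assumed prior evidence that A is a source (not from the slides).
    return 0.9 if a["Xa"] else 0.1

result = marginals(["Xa", "Xb"], [not_both_sources, prior_a])
```

Because A is believed to be a source and the two share a path, the marginal for Xb drops well below one half, which is the kind of conclusion the inference is meant to draw.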
