Trend Towards Parallelization

Resolution and Parallelizability: Barriers to the Efficient Parallelization of SAT Solvers
George Katsirelos, MIAT, INRA, Toulouse, France
Ashish Sabharwal, IBM Watson Research Center, USA
Horst Samulowitz
Laurent Simon, Univ. Paris-Sud, LRI/CNRS, Orsay, France

Trend Towards Parallelization
• Focus Shifting From Single-Thread Performance to Multi-Processor Performance
• 100s and even 1000s of compute cores easily accessible
• Classical Algorithm Parallelization, e.g., parallel sort, PRAM model
• Significant Advances in Data Parallelism, e.g., MapReduce, Hadoop, SystemML, R statistics
• Challenge: Search and Optimization on 1000s of Processors
• Tremendous advances in the Sequential case of Combinatorial Search. E.g., SAT solvers can tackle instances with ~2M variables, 10M constraints!
• Exponential search appears to be an “obvious” candidate to parallelize!
• In fact, many SAT/CSP/MIP solvers already support multi-core runs
AAAI 2013 | Katsirelos, Samulowitz, Sabharwal, Simon

Parallelization of Combinatorial Search
• Fact: State-of-the-Art Search Engines Do NOT Parallelize Well
• Brute Force exponential search is, of course, trivial to parallelize
• But sophisticated search engines that adapt (through, e.g., clause learning, impact aggregation, etc.) have inherent sequential aspects
• AAAI 2012 Challenge Paper on the topic [Hamadi & Wintersteiger 2012]
• Rather Disappointing Performance at SAT Competitions. E.g., in 2011:
  • 8-core track: average speedup of best parallel solvers only ~1.8x
  • 32-core track: only ~3x
• Top performing solvers based on little to no communication (CryptoMinisat-MT [Soos 2012], Plingeling [Biere 2012])
• Parallel track winners were “simple” Portfolio solvers (ppfolio [Roussel 2012], pfolioUZK [Wotzlaw et al, 2012])

What makes parallelization of SAT solvers hard? Can we obtain insights into their behavior beyond eventual wall-clock performance?

Contributions of the Paper
• A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
  • Focus on understanding rather than on engineering
  • Are there inherent bottlenecks that may hinder parallelization, irrespective of which heuristics are used to share information?
• A Practical Study: Interesting properties of Actual Proofs
  • Proofs generated by state-of-the-art SAT solvers contain narrow bottlenecks
• Proof-Based Measures that capture Best-Case Parallelizability
  • Coarse measure: “Depth” of the proof graph
  • Refined measure: Makespan of a resource-constrained scheduling problem
• Empirical Findings: Correlations and Parallelization Limits
  • Typical sequential proofs are not very parallelizable even in the best case!
  • “Schedule speedup” / makespan correlates with observed speedup

Approach: Proof Complexity (applied here to Typically Generated Proofs)
• Proof Complexity [Cook & Reckhow, 1979]: Study the nature (e.g., size, depth, width, “shape”, etc.) of Proofs of Unsatisfiability
• Resolution Graph of Conflict-Driven Clause Learning (CDCL) SAT Solvers
  Runtime(any SAT solver, F) ≥ min over proofs of Size(Resolution proof of F)
• Note: Insights applicable also to Satisfiable instances!
  • Solvers prove a lot of sub-formulas to be unsatisfiable before hitting the first solution
  • Formal characterization [Achlioptas et al, 2001 & 2004]
• Study of Proofs has provided strong insights into CDCL SAT solvers (worst-case / best-case results)
  • What does “clause learning” bring?
  • What do “restarts” add?
  [Beame et al, 2004; Buss et al, 2008, 2012; Hertel et al, 2008; Pipatsrisawat et al, 2011]

Underlying Inference Principle: Resolution
• CDCL SAT solvers produce Resolution Derivations
• Proof Graph and Depth:
  • Each initial and derived constraint is a node, annotated with its proof depth
  • proofdepth(initial clause C) = 0
  • proofdepth(derived clause C) = 1 + max over parents of C of proofdepth(parent(C))
[Figure: example proof graph for F, labeled with constraint ID and depth: C1 to C6 at depth 0; C7, C9 at depth 1; C8, C11 at depth 2; C10, C12 at depth 3; C13 at depth 4]
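The depth recurrence on this slide can be sketched in a few lines of Python. This is only an illustration: the slide lists clause IDs and depths but not the edges of the proof graph, so the parent lists below are assumed, chosen so that the resulting depths match the slide. The names `PARENTS` and `proof_depths` are hypothetical.

```python
# Hypothetical proof graph: derived clause -> clauses it was resolved from.
# The edges are ASSUMED for illustration; only the depths come from the slide.
PARENTS = {
    "C7": ["C1", "C2"],
    "C9": ["C3", "C4"],
    "C8": ["C7", "C9"],
    "C11": ["C9", "C5"],
    "C10": ["C8", "C6"],
    "C12": ["C11", "C8"],
    "C13": ["C10", "C12"],
}


def proof_depths(parents):
    """Return {clause: proofdepth}. Clauses with no parents are initial (depth 0)."""
    depth = {}

    def visit(clause):
        if clause not in depth:
            ps = parents.get(clause, [])
            # proofdepth = 0 for initial clauses, else 1 + max over parents
            depth[clause] = 0 if not ps else 1 + max(visit(p) for p in ps)
        return depth[clause]

    for clause in parents:
        visit(clause)
    return depth


depths = proof_depths(PARENTS)
```

The recursion is fine for a toy graph; a proof with millions of clauses would need an iterative pass in topological order instead.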

How Parallelizable are Resolution Refutations?
• Refutation(F) = Resolution Proof that derives the empty (“false”) clause
• Depth of the proof clearly limits the amount of potential parallelization
  • Chain of dependencies
  • Theorem: Certain “pebbling” style instances have large depth
• However, the proofdepth bound on parallelization is very crude
  • Does not explain poor performance with small k (e.g., 8, 32, … processors)
What does a typical sequential SAT solver proof look like?
• Setup for Experiments:
  • Sequential Glucose 2.1 extended with proof output
  • GluSatX10: using SatX10 to run a k-processor version of Sequential Glucose
  • Working Assumption: Proofs produced by GluSatX10 on k cores look “similar” to proofs produced by Sequential Glucose
Note: simplified statements; see paper for more formal notions
http://x10-lang.org/satx10 [IBM Teams: X10 and SAT/CSP]

Proof Graph Example: Very Complex Structure [Easy sequential case, solved in ~30 seconds]

Bottlenecks in Typical SAT Proofs
• Proofs Generated by SAT Solvers Exhibit Surprisingly Narrow “Bottlenecks”, i.e., Depths with Very Few (~1) Clauses!
• Nothing deeper can be derived before bottleneck clauses ⇒ Sequentiality
[Figure: number of clauses derived at each depth (log scale) vs. depth in the proof]
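One way to look for such bottlenecks, given a depth annotation of the derived clauses, is to count how many clauses sit at each depth and report the depths where that count is very small. The sketch below assumes a plain dictionary from clause ID to depth; the function name, the threshold parameter, and the toy data are all hypothetical and do not come from the paper.

```python
from collections import Counter


def bottleneck_depths(depths, max_width=1):
    """Return the sorted depths (> 0) at which at most max_width clauses were derived."""
    # Width of a depth level = number of derived clauses annotated with that depth.
    width = Counter(d for d in depths.values() if d > 0)
    return sorted(d for d, w in width.items() if w <= max_width)


# Toy depth annotation (made up): three clauses at depth 1, one at depth 2,
# two at depth 3, one at depth 4.
toy_depths = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 3, "F": 3, "G": 4}
narrow = bottleneck_depths(toy_depths)
```

In this toy case depths 2 and 4 each hold a single clause, so every clause deeper than depth 2 has to wait for clause D to be derived.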

Best-Case Parallelization with k Processors
• Given Proof P and k Processors, Best-Case Parallelization of P = Resource-Constrained Scheduling Problem with Precedences
• Let Mk(P) = makespan of the optimal schedule of P on k processors
• Even approximating Mk(P) within 4/3 is NP-hard, but a (2 – 1/k) approximation is easy
• Best-Case k-processor speedup on P: Sk(P) = M1(P) / Mk(P)
Example: M1(P) = 8, M2(P) = 5, M3(P) = 4, M4(P) = 4, …; depth = 4
[Figure: example proof graph labeled with constraint ID, depth, and schedule step: C1 to C6 at depth 0; C7, C9, C’9 at depth 1; C8, C11 at depth 2; C10, C12 at depth 3; C13 at depth 4]
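A simple way to get an upper bound on Mk(P) is greedy list scheduling: at every time step, run up to k clauses whose parents are all available. Graham-style list scheduling is the classical way to obtain a (2 – 1/k) guarantee, so it is presumably the kind of "easy" approximation the slide refers to, but that link is an assumption here. The sketch also assumes unit cost per derived clause and free initial clauses, and the edges of the example graph are made up so that the depths match the slide (with `C9p` standing for C’9).

```python
# Hypothetical edges (ASSUMED); depths match the slide's example figure.
PROOF = {
    "C7": ["C1", "C2"],
    "C9": ["C3", "C4"],
    "C9p": ["C5", "C6"],
    "C8": ["C7", "C9"],
    "C11": ["C9", "C9p"],
    "C10": ["C8", "C4"],
    "C12": ["C11", "C8"],
    "C13": ["C10", "C12"],
}


def greedy_makespan(parents, k):
    """Unit-time list scheduling on k processors; returns the number of steps used.

    The result is an upper bound on the optimal makespan, not the optimum itself.
    """
    # Initial clauses (never derived) are available at time 0.
    done = {p for ps in parents.values() for p in ps if p not in parents}
    pending = list(parents)
    steps = 0
    while pending:
        # A clause is ready once all of its parents were finished in earlier steps.
        ready = [c for c in pending if all(p in done for p in parents[c])]
        batch = ready[:k]
        if not batch:
            raise ValueError("cyclic or inconsistent proof graph")
        done.update(batch)
        pending = [c for c in pending if c not in batch]
        steps += 1
    return steps


makespans = {k: greedy_makespan(PROOF, k) for k in (1, 2, 3, 4)}
speedups = {k: makespans[1] / m for k, m in makespans.items()}
```

On this assumed graph the greedy schedule happens to reproduce the makespans quoted in the slide's example (8, 5, 4, 4), and adding a fourth processor gains nothing because the depth-4 chain already fixes the makespan at 4.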

Makespan vs. Proof Depth
• Schedule makespan yields a finer-grained lower bound, Sk(P), on best-case parallelization than proof depth
• proofdepth(P): limit of parallelization of P with “infinite” processors
• Mk(P) ≥ proofdepth(P)
• Mk(P) → proofdepth(P) as k → ∞

Empirical Findings

Even Best-Case Parallelization Efficiency is Low Beyond 100 Processors
Best-Case Efficiency of parallelizing P with k processors = 100 * (Sk(P) / k)
E.g., 100% = full utilization of k processors, i.e., speedup = k
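As a small worked example of this efficiency formula, the sketch below plugs in the makespans from the earlier toy example (M1 = 8, M2 = 5, M3 = 4, M4 = 4). The function name is hypothetical, and these numbers describe that eight-clause example only, not the experimental results behind this slide.

```python
def best_case_efficiency(m1, mk, k):
    """Efficiency in percent: 100 * (Sk / k), with Sk = M1 / Mk."""
    return 100 * m1 / (mk * k)


# Makespans of the small example proof (k -> Mk), taken from the earlier slide.
example_makespans = {1: 8, 2: 5, 3: 4, 4: 4}
efficiency = {
    k: best_case_efficiency(example_makespans[1], mk, k)
    for k, mk in example_makespans.items()
}
```

Efficiency falls from 100% on one processor to 80% on two and 50% on four, even for this tiny proof, because the makespan stops shrinking once it reaches the proof depth.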

Proofs of Some Instances Exhibit Very Low Best-Case Schedule Speedup
A) Even with 1024 processors, best-case speedup ~50-100
B) 128 processors insufficient to achieve a speedup of ~90

Best-Case Schedule Speedup Correlates With Actual Observed Runtime Speedup
(Makes the study of the best-case schedule speedup relevant)
[Figure: average over a sliding window]

Summary
• A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
  • Focus on understanding rather than on engineering
• Main Findings:
  • Typical Sequential Refutations Contain Surprisingly Narrow Bottlenecks
  • Typical Sequential Refutations are Not Parallelizable Beyond a Few Processors, even in the best case of offline “schedule speedup” produced in hindsight
  • Observed Runtime Speedup with k processors weakly correlates with Best-Case Schedule Speedup of a Sequential Proof produced in hindsight
• Open Question: Can we design SAT solvers that generate Proofs that are inherently More Parallelizable?
Caveat: assumption that proofs generated by GluSatX10 on k cores look “similar” to proofs generated by Sequential Glucose
