This resource explores various parallel algorithms for finding the greatest common divisor of positive integers using matrix formulations, parallelism, and sublinear time complexity techniques.
Solving the Greatest Common Divisor Problem in Parallel
Derrick Coetzee
University of California, Berkeley
CS 273, Fall 2010, Prof. Satish Rao
The problem and serial solutions
• Given two positive integers (represented in binary), find the largest positive integer that divides both
• Let n be the total number of bits in the inputs
• Solvable in O(n) divisions and O(n²) bit operations with the Euclidean algorithm:
  while b ≠ 0: (a, b) ← (b, a mod b)
• Matrix formulation: each step multiplies the column vector (a, b) by ((0, 1), (1, −q)) with q = ⌊a/b⌋
• Schönhage uses this to solve the problem in O(n log² n log log n) time
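The serial baseline is easy to state in code. Below is a minimal Python sketch (mine, not from the original slides): the Euclidean loop, plus a check that a single step (a, b) ← (b, a mod b) is exactly the matrix ((0, 1), (1, −q)) applied to the column vector (a, b).

# Sketch (not from the slides): the serial Euclidean algorithm, and one step
# written as a matrix-vector product with q = a // b.

def euclid_gcd(a, b):
    """Classic Euclidean algorithm: O(n) divisions on n-bit inputs."""
    while b != 0:
        a, b = b, a % b
    return a

assert euclid_gcd(105, 30) == 15

# one step of the iteration as a 2x2 matrix applied to (a, b)
a, b = 105, 30
q = a // b
step = ((0, 1), (1, -q))
new_pair = (step[0][0] * a + step[0][1] * b,
            step[1][0] * a + step[1][1] * b)
assert new_pair == (b, a % b) == (30, 15)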
Adding parallelism: Algorithm PM
• Brent & Kung (1982): reduce the time of each step to O(1) (O(n) bit complexity)
• α ← n; β ← n  (where a, b ≤ 2^n and a is odd)
  while b ≠ 0:
      while b ≡ 0 (mod 2) { b ← b/2; β ← β − 1 }
      if α ≥ β { swap(a, b); swap(α, β) }
      if a + b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋
• The test "a + b ≡ 0 (mod 4)" uses only the 2 lowest bits of the sum, which enables effective pipelining of the steps
• The number of steps is still linear, and n processors are required
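To make the arithmetic concrete, here is a sequential Python sketch of the PM iteration (mine, not from the slides); it checks the worked example on the next two slides. The parallel payoff, namely that each step inspects only the two lowest bits of a + b and can therefore be pipelined, is not visible in a serial loop.

# Sketch (not from the slides): a sequential reference version of Brent &
# Kung's PM iteration. Intermediate values of b (and, after swaps, a) may go
# negative; the gcd is |a| once b reaches 0.

def pm_gcd(a, b):
    assert a > 0 and b > 0 and a % 2 == 1, "a must be odd (strip shared factors of 2 first)"
    n = max(a.bit_length(), b.bit_length())
    alpha, beta = n, n                 # invariants: |a| <= 2**alpha, |b| <= 2**beta
    while b != 0:
        while b % 2 == 0:              # strip trailing zeros of b
            b //= 2
            beta -= 1
        if alpha >= beta:
            a, b = b, a
            alpha, beta = beta, alpha
        if (a + b) % 4 == 0:           # a, b both odd, so exactly one of
            b = (a + b) // 2           # a + b, a - b is divisible by 4
        else:
            b = (a - b) // 2
    return abs(a)

assert pm_gcd(105, 30) == 15           # the worked example on the next slides
assert pm_gcd(143, 221) == 13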
Algorithm PM: Example
• GCD(105, 30) (a = 1101001₂, b = 11110₂)
• α ← 7; β ← 7
• b ← 1111₂; β ← 6
• swap (a ← 1111₂, b ← 1101001₂, α ← 6, β ← 7)
• a + b ≡ 11₂ + 01₂ ≡ 00₂ (mod 4)
• Compute the trailing zeroes of the new b and 2 more bits:
  b ← ⌊(a+b)/2⌋ = (01111₂ + …01001₂)/2 = …1100₂
• b ← …11₂; β ← 5
• swap (a ← …11₂, b ← 1111₂, α ← 5, β ← 6)
Algorithm PM: Example (continued)
• a = …11₂, b = 1111₂, α = 5, β = 6
• a + b ≡ 11₂ + 11₂ ≡ 10₂ (mod 4)
• Compute the LSB of the new b while the previous iteration computes the next bit of a in parallel:
  b′ = ⌊(a−b)/2⌋ = …0
  a = …111₂
• With another bit of a, the next bit of b′ can be computed
• Final result: a = 1111₂, b′ = 0, so gcd(105, 30) = 1111₂ = 15
Sublinear time GCD
• Kannan et al. (1987): break the problem into n/log n steps, each of which takes log log n time and eliminates log n bits of each input
• CRCW PRAM, O(n log log n / log n) time, O(n² log² n) processors
• Idea: instead of a mod b, compute pa mod b for all 0 ≤ p ≤ n
• If gcd(a, b) = d and 0 ≤ p ≤ n, then gcd(pa, b)/d ≤ n
• There are only n possible values for the first log n bits, so the pigeonhole principle gives at least two values of p whose residues agree on those bits
• gcd(a, b) = gcd(b, (p₁a mod b) − (p₂a mod b)) (ignoring factors ≤ n)
• Only the first log n bits are needed to identify p₁, p₂, and they can be computed in log log n time
Sublinear time GCD: Example
• GCD(143, 221), n = 8 (the leading 3 bits of each residue, written as an 8-bit number, are shown)
• (0×143) mod 221 = 000…₂
• (1×143) mod 221 = 100…₂
• (2×143) mod 221 = 010…₂
• (3×143) mod 221 = 110…₂
• (4×143) mod 221 = 100…₂
• (5×143) mod 221 = 001…₂
• (6×143) mod 221 = 110…₂
• (7×143) mod 221 = 011…₂
• (8×143) mod 221 = 001…₂
• |(4×143) mod 221 − (1×143) mod 221| = |130 − 143| = 13 = gcd(143, 221)
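The elimination step can be checked directly on this example. The Python sketch below is mine, not from the slides (the helper name elimination_step is invented): it finds two multipliers whose residues share their leading log n bits and takes the absolute difference, recovering 13 here.

# Sketch (not from the slides): one bit-elimination step of the sublinear
# algorithm. Compute p*a mod b for 0 <= p <= n, find two residues that agree
# in their leading log2(n) bits (a collision is guaranteed by pigeonhole),
# and subtract. The result is a small number whose gcd with b equals
# gcd(a, b) up to a factor <= n.

from math import gcd

def elimination_step(a, b, n):
    assert b < (1 << n)
    lead = n.bit_length() - 1                    # log2(n) leading bits
    seen = {}
    for p in range(n + 1):
        r = (p * a) % b
        prefix = format(r, f'0{n}b')[:lead]      # leading bits of the n-bit residue
        if prefix in seen:
            return abs(r - seen[prefix])
        seen[prefix] = r

a, b, n = 143, 221, 8
d = elimination_step(a, b, n)
print(d, gcd(a, b))                              # 13 13 on this example
assert gcd(b, d) % gcd(a, b) == 0                # gcd preserved up to a factor <= n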
Sublinear time GCD: Using the matrix form
• Problem: (p₁a mod b) − (p₂a mod b) takes log n time
• A carry-lookahead adder needs log n time to add two n-bit numbers
• Idea: delay updates to (a, b) for log n steps, collecting the updates in a 2 × 2 matrix T, then compute T(a, b) in log n time
• This results in n/log² n phases, each taking log n time
• Each step performs a constant number of operations on log² n-bit numbers, so each step still takes only log log n time
• The total time is dominated by the cost of the O(n/log n) steps
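The delayed-update trick is just linear algebra. The sketch below is mine, not from the slides, and uses plain Euclidean steps for concreteness: each step's 2 × 2 matrix is folded into T, and T is applied to the original pair once at the end of the phase. In the real algorithm the per-step decisions come from leading bits only, so the expensive full-length arithmetic happens once per phase.

# Sketch (not from the slides): collect a phase of steps into one 2x2 matrix T
# and update (a, b) with a single matrix-vector product at the end.

def phase_matrix(a, b, steps):
    """Accumulate `steps` Euclidean steps into a single 2x2 matrix T."""
    t00, t01, t10, t11 = 1, 0, 0, 1            # T starts as the identity
    for _ in range(steps):
        if b == 0:
            break
        q = a // b
        # fold the step matrix ((0, 1), (1, -q)) into T from the left
        t00, t01, t10, t11 = t10, t11, t00 - q * t10, t01 - q * t11
        a, b = b, a - q * b                     # stand-in for the leading-bit bookkeeping
    return (t00, t01), (t10, t11)

a0, b0 = 105, 30
T = phase_matrix(a0, b0, steps=2)
# one delayed update: T applied to the original pair gives the pair after 2 steps
delayed = (T[0][0] * a0 + T[0][1] * b0, T[1][0] * a0 + T[1][1] * b0)
assert delayed == (15, 0)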
Getting rid of the log log n factor
• Chor & Goldreich (1990): parallelize Brent & Kung's PM algorithm instead of the Euclidean algorithm
• α ← n; β ← n
  while b ≠ 0:
      while b ≡ 0 (mod 2) { b ← b/2; β ← β − 1 }
      if α ≥ β { swap(a, b); swap(α, β) }
      if a + b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋
• Idea: pack log n iterations of the PM algorithm into each phase; each phase can be done in constant time
• Requires O(n/log n) time on O(n^(1+ε)) processors
• Best known parallel algorithm
log n iterations in constant time
• Replace the α, β bookkeeping in PM by a single counter δ = α − β (initially δ ← 0):
  while b ≠ 0:
      while b ≡ 0 (mod 2) { b ← b/2; δ ← δ + 1 }
      if δ ≥ 0 { swap(a, b); δ ← −δ }
      if a + b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋
• δ together with the last log n + 1 bits of a and b determines the transformation matrix of the next log n iterations
• Precompute a table of these matrices; each phase is then a lookup followed by a matrix multiplication
• Technicality: the cases |δ| ≤ log n and |δ| > log n need to be treated differently
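The sketch below is mine, not from the slides; it computes what one table entry would contain. Running k PM iterations while accumulating their combined effect gives an integer 2 × 2 matrix M and an exponent e such that the new pair equals M(a, b)/2^e. Here the control decisions are read from the full numbers for simplicity; the point of the lemma above is that δ and the low k + 1 bits alone determine them, so in the parallel algorithm M comes from a lookup and is applied with one multiplication.

# Sketch (not from the slides): k PM iterations expressed as a single matrix.
# Each elementary operation is a linear map with dyadic entries:
#   b <- b/2        is ((2, 0), (0, 1)) / 2
#   swap(a, b)      is ((0, 1), (1, 0))
#   b <- (a + b)/2  is ((2, 0), (1, 1)) / 2
#   b <- (a - b)/2  is ((2, 0), (1, -1)) / 2
# so the composition is an integer matrix M divided by 2**e.

def mat_mul(X, Y):
    return ((X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]),
            (X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]))

def pm_phase(a, b, delta, k):
    """Run up to k outer PM iterations; return (M, e, delta, resulting pair)."""
    M, e = ((1, 0), (0, 1)), 0
    for _ in range(k):
        if b == 0:
            break
        while b % 2 == 0:
            b //= 2; delta += 1
            M, e = mat_mul(((2, 0), (0, 1)), M), e + 1
        if delta >= 0:
            a, b, delta = b, a, -delta
            M = mat_mul(((0, 1), (1, 0)), M)
        if (a + b) % 4 == 0:
            b = (a + b) // 2
            M, e = mat_mul(((2, 0), (1, 1)), M), e + 1
        else:
            b = (a - b) // 2
            M, e = mat_mul(((2, 0), (1, -1)), M), e + 1
    return M, e, delta, (a, b)

a0, b0 = 105, 30
M, e, delta, direct = pm_phase(a0, b0, 0, k=3)
# applying the accumulated matrix to the original pair reproduces the direct result
assert ((M[0][0]*a0 + M[0][1]*b0) >> e, (M[1][0]*a0 + M[1][1]*b0) >> e) == direct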
Multiplication in constant time
• How can an n-bit number be multiplied by a log n-bit number in constant time?
• Represent both numbers in base 2^(log n)
• Precompute a multiplication table for this base
• Separate the larger number into a sum of two numbers, each having zero in every other digit, e.g. 1234 = 1030 + 204
• Multiply both by the smaller number; no carries are required
• Example: 1234 × 7 = (1030 + 204) × 7 = 7210 + 1428 = 8638
• Finally, Chandra et al. (1983) show that two n-bit numbers can be added in O(1) time with O(n²) processors
• But this requires concurrent writes
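A small Python sketch (mine, not from the slides) of the digit-splitting idea: in base B = 2^k, each digit-times-m product spans exactly two digit positions, so within each half the partial products never collide and no carries occur; only the final addition of the two halves carries, which is where the O(1) CRCW adder comes in.

# Sketch (not from the slides): carry-free multiplication by digit splitting.
# The digit-by-m products would come from a precomputed table in the real
# algorithm; here they are computed directly.

def split_multiply(x, m, k):
    """Multiply x by m (< 2**k) using base-2**k digit splitting."""
    B = 1 << k
    assert 0 <= m < B
    digits = []                        # digits of x in base B, least significant first
    while x:
        digits.append(x % B)
        x //= B
    halves = [0, 0]
    for i, d in enumerate(digits):
        # d * m < B**2 occupies exactly digit positions i and i+1, so products
        # landing in the same half are two positions apart and never overlap
        halves[i % 2] += (d * m) << (k * i)
    return halves[0] + halves[1]       # the single carrying addition

assert split_multiply(1234, 7, k=3) == 8638      # the slide's example, in base 8
assert split_multiply(12345678, 13, k=4) == 12345678 * 13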
Complexity perspective: an analogy
• P = NP?
• Open problem, but most practical problems have been shown to be either in P or NP-hard
• Exceptions: integer factorization, graph isomorphism
• If integer factorization is NP-complete, then NP = coNP
• If graph isomorphism is NP-complete, then the polynomial hierarchy collapses to the 2nd level
• NC = P?
• Open problem, but most practical problems have been shown to be either in NC or P-hard
• Exception: GCD
• What would GCD being P-complete imply?