This resource explores various parallel algorithms for finding the greatest common divisor of positive integers using matrix formulations, parallelism, and sublinear time complexity techniques.
Solving the Greatest Common Divisor Problem in Parallel
Derrick Coetzee
University of California, Berkeley
CS 273, Fall 2010, Prof. Satish Rao
The problem and serial solutions
• Given two positive integers (represented in binary), find the largest positive integer that divides both
• Let n be the total number of bits in the inputs
• Solvable in O(n) divisions and O(n²) bit operations with the Euclidean algorithm:
  while b ≠ 0: (a, b) ← (b, a mod b)
• Matrix formulation: each step multiplies the column vector (a, b) by ((0, 1), (1, −q)) with q = ⌊a/b⌋
• Schönhage uses this to solve the problem in O(n log² n log log n) time
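The serial baseline is easy to state in code. Below is a minimal Python sketch (mine, not from the original slides): the Euclidean loop, plus a check that a single step (a, b) ← (b, a mod b) is exactly the matrix ((0, 1), (1, −q)) applied to the column vector (a, b).

# Sketch (not from the slides): the serial Euclidean algorithm, and one step
# written as a matrix-vector product with q = a // b.

def euclid_gcd(a, b):
    """Classic Euclidean algorithm: O(n) divisions on n-bit inputs."""
    while b != 0:
        a, b = b, a % b
    return a

assert euclid_gcd(105, 30) == 15

# one step of the iteration as a 2x2 matrix applied to (a, b)
a, b = 105, 30
q = a // b
step = ((0, 1), (1, -q))
new_pair = (step[0][0] * a + step[0][1] * b,
            step[1][0] * a + step[1][1] * b)
assert new_pair == (b, a % b) == (30, 15)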
Adding parallelism: Algorithm PM
• Brent & Kung (1982): reduce the time of each step to O(1) (O(n) bit complexity)
• α ← n; β ← n  (where a, b ≤ 2^n and a is odd)
  while b ≠ 0:
      while b ≡ 0 (mod 2) { b ← b/2; β ← β − 1 }
      if α ≥ β { swap(a, b); swap(α, β) }
      if a + b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋
• The test "a + b ≡ 0 (mod 4)" uses only the 2 lowest bits of the sum, which enables effective pipelining of the steps
• The number of steps is still linear, and n processors are required
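To make the arithmetic concrete, here is a sequential Python sketch of the PM iteration (mine, not from the slides); it checks the worked example on the next two slides. The parallel payoff, namely that each step inspects only the two lowest bits of a + b and can therefore be pipelined, is not visible in a serial loop.

# Sketch (not from the slides): a sequential reference version of Brent &
# Kung's PM iteration. Intermediate values of b (and, after swaps, a) may go
# negative; the gcd is |a| once b reaches 0.

def pm_gcd(a, b):
    assert a > 0 and b > 0 and a % 2 == 1, "a must be odd (strip shared factors of 2 first)"
    n = max(a.bit_length(), b.bit_length())
    alpha, beta = n, n                 # invariants: |a| <= 2**alpha, |b| <= 2**beta
    while b != 0:
        while b % 2 == 0:              # strip trailing zeros of b
            b //= 2
            beta -= 1
        if alpha >= beta:
            a, b = b, a
            alpha, beta = beta, alpha
        if (a + b) % 4 == 0:           # a, b both odd, so exactly one of
            b = (a + b) // 2           # a + b, a - b is divisible by 4
        else:
            b = (a - b) // 2
    return abs(a)

assert pm_gcd(105, 30) == 15           # the worked example on the next slides
assert pm_gcd(143, 221) == 13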
Algorithm PM: Example
• GCD(105, 30) (a = 1101001₂, b = 11110₂)
• α ← 7; β ← 7
• b ← 1111₂; β ← 6
• swap (a ← 1111₂, b ← 1101001₂, α ← 6, β ← 7)
• a + b ≡ 11₂ + 01₂ ≡ 00₂ (mod 4)
• Compute the trailing zeroes of the new b and 2 more bits:
  b ← ⌊(a+b)/2⌋ = (01111₂ + …01001₂)/2 = …1100₂
• b ← …11₂; β ← 5
• swap (a ← …11₂, b ← 1111₂, α ← 5, β ← 6)
Algorithm PM: Example (continued)
• a = …11₂, b = 1111₂, α = 5, β = 6
• a + b ≡ 11₂ + 11₂ ≡ 10₂ (mod 4)
• Compute the LSB of the new b while the previous iteration computes the next bit of a in parallel:
  b′ = ⌊(a−b)/2⌋ = …0
  a = …111₂
• With another bit of a, the next bit of b′ can be computed
• Final result: a = 1111₂, b′ = 0, so gcd(105, 30) = 1111₂ = 15
Sublinear time GCD
• Kannan et al. (1987): break the problem into n/log n steps, each of which takes log log n time and eliminates log n bits of each input
• CRCW PRAM, O(n log log n / log n) time, O(n² log² n) processors
• Idea: instead of a mod b, compute pa mod b for all 0 ≤ p ≤ n
• If gcd(a, b) = d and 0 ≤ p ≤ n, then gcd(pa, b)/d ≤ n
• There are only n possible values for the first log n bits, so the pigeonhole principle gives at least two values of p whose residues agree on those bits
• gcd(a, b) = gcd(b, (p₁a mod b) − (p₂a mod b)) (ignoring factors ≤ n)
• Only the first log n bits are needed to identify p₁, p₂, and they can be computed in log log n time
Sublinear time GCD: Example
• GCD(143, 221), n = 8 (the leading 3 bits of each residue, written as an 8-bit number, are shown)
• (0×143) mod 221 = 000…₂
• (1×143) mod 221 = 100…₂
• (2×143) mod 221 = 010…₂
• (3×143) mod 221 = 110…₂
• (4×143) mod 221 = 100…₂
• (5×143) mod 221 = 001…₂
• (6×143) mod 221 = 110…₂
• (7×143) mod 221 = 011…₂
• (8×143) mod 221 = 001…₂
• |(4×143) mod 221 − (1×143) mod 221| = |130 − 143| = 13 = gcd(143, 221)
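The elimination step can be checked directly on this example. The Python sketch below is mine, not from the slides (the helper name elimination_step is invented): it finds two multipliers whose residues share their leading log n bits and takes the absolute difference, recovering 13 here.

# Sketch (not from the slides): one bit-elimination step of the sublinear
# algorithm. Compute p*a mod b for 0 <= p <= n, find two residues that agree
# in their leading log2(n) bits (a collision is guaranteed by pigeonhole),
# and subtract. The result is a small number whose gcd with b equals
# gcd(a, b) up to a factor <= n.

from math import gcd

def elimination_step(a, b, n):
    assert b < (1 << n)
    lead = n.bit_length() - 1                    # log2(n) leading bits
    seen = {}
    for p in range(n + 1):
        r = (p * a) % b
        prefix = format(r, f'0{n}b')[:lead]      # leading bits of the n-bit residue
        if prefix in seen:
            return abs(r - seen[prefix])
        seen[prefix] = r

a, b, n = 143, 221, 8
d = elimination_step(a, b, n)
print(d, gcd(a, b))                              # 13 13 on this example
assert gcd(b, d) % gcd(a, b) == 0                # gcd preserved up to a factor <= n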
Sublinear time GCD: Using the matrix form
• Problem: (p₁a mod b) − (p₂a mod b) takes log n time
• A carry-lookahead adder needs log n time to add two n-bit numbers
• Idea: delay updates to (a, b) for log n steps, collecting the updates in a 2 × 2 matrix T, then compute T(a, b) in log n time
• This results in n/log² n phases, each taking log n time
• Each step performs a constant number of operations on log² n-bit numbers, so each step still takes only log log n time
• The total time is dominated by the cost of the O(n/log n) steps
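The delayed-update trick is just linear algebra. The sketch below is mine, not from the slides, and uses plain Euclidean steps for concreteness: each step's 2 × 2 matrix is folded into T, and T is applied to the original pair once at the end of the phase. In the real algorithm the per-step decisions come from leading bits only, so the expensive full-length arithmetic happens once per phase.

# Sketch (not from the slides): collect a phase of steps into one 2x2 matrix T
# and update (a, b) with a single matrix-vector product at the end.

def phase_matrix(a, b, steps):
    """Accumulate `steps` Euclidean steps into a single 2x2 matrix T."""
    t00, t01, t10, t11 = 1, 0, 0, 1            # T starts as the identity
    for _ in range(steps):
        if b == 0:
            break
        q = a // b
        # fold the step matrix ((0, 1), (1, -q)) into T from the left
        t00, t01, t10, t11 = t10, t11, t00 - q * t10, t01 - q * t11
        a, b = b, a - q * b                     # stand-in for the leading-bit bookkeeping
    return (t00, t01), (t10, t11)

a0, b0 = 105, 30
T = phase_matrix(a0, b0, steps=2)
# one delayed update: T applied to the original pair gives the pair after 2 steps
delayed = (T[0][0] * a0 + T[0][1] * b0, T[1][0] * a0 + T[1][1] * b0)
assert delayed == (15, 0)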
Getting rid of the log log n factor
• Chor & Goldreich (1990): parallelize Brent & Kung's PM algorithm instead of the Euclidean algorithm
• α ← n; β ← n
  while b ≠ 0:
      while b ≡ 0 (mod 2) { b ← b/2; β ← β − 1 }
      if α ≥ β { swap(a, b); swap(α, β) }
      if a + b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋
• Idea: pack log n iterations of the PM algorithm into each phase; each phase can be done in constant time
• Requires O(n/log n) time on O(n^(1+ε)) processors
• Best known parallel algorithm
log n iterations in constant time
• Replace the α, β bookkeeping in PM by a single counter δ = α − β (initially δ ← 0):
  while b ≠ 0:
      while b ≡ 0 (mod 2) { b ← b/2; δ ← δ + 1 }
      if δ ≥ 0 { swap(a, b); δ ← −δ }
      if a + b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋
• δ together with the last log n + 1 bits of a and b determines the transformation matrix of the next log n iterations
• Precompute a table of these matrices; each phase is then a lookup followed by a matrix multiplication
• Technicality: the cases |δ| ≤ log n and |δ| > log n need to be treated differently
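The sketch below is mine, not from the slides; it computes what one table entry would contain. Running k PM iterations while accumulating their combined effect gives an integer 2 × 2 matrix M and an exponent e such that the new pair equals M(a, b)/2^e. Here the control decisions are read from the full numbers for simplicity; the point of the lemma above is that δ and the low k + 1 bits alone determine them, so in the parallel algorithm M comes from a lookup and is applied with one multiplication.

# Sketch (not from the slides): k PM iterations expressed as a single matrix.
# Each elementary operation is a linear map with dyadic entries:
#   b <- b/2        is ((2, 0), (0, 1)) / 2
#   swap(a, b)      is ((0, 1), (1, 0))
#   b <- (a + b)/2  is ((2, 0), (1, 1)) / 2
#   b <- (a - b)/2  is ((2, 0), (1, -1)) / 2
# so the composition is an integer matrix M divided by 2**e.

def mat_mul(X, Y):
    return ((X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]),
            (X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]))

def pm_phase(a, b, delta, k):
    """Run up to k outer PM iterations; return (M, e, delta, resulting pair)."""
    M, e = ((1, 0), (0, 1)), 0
    for _ in range(k):
        if b == 0:
            break
        while b % 2 == 0:
            b //= 2; delta += 1
            M, e = mat_mul(((2, 0), (0, 1)), M), e + 1
        if delta >= 0:
            a, b, delta = b, a, -delta
            M = mat_mul(((0, 1), (1, 0)), M)
        if (a + b) % 4 == 0:
            b = (a + b) // 2
            M, e = mat_mul(((2, 0), (1, 1)), M), e + 1
        else:
            b = (a - b) // 2
            M, e = mat_mul(((2, 0), (1, -1)), M), e + 1
    return M, e, delta, (a, b)

a0, b0 = 105, 30
M, e, delta, direct = pm_phase(a0, b0, 0, k=3)
# applying the accumulated matrix to the original pair reproduces the direct result
assert ((M[0][0]*a0 + M[0][1]*b0) >> e, (M[1][0]*a0 + M[1][1]*b0) >> e) == direct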
Multiplication in constant time
• How can an n-bit number be multiplied by a log n-bit number in constant time?
• Represent both numbers in base 2^(log n)
• Precompute a multiplication table for this base
• Separate the larger number into a sum of two numbers, each having zero in every other digit, e.g. 1234 = 1030 + 204
• Multiply both by the smaller number; no carries are required
• Example: 1234 × 7 = (1030 + 204) × 7 = 7210 + 1428 = 8638
• Finally, Chandra et al. (1983) show that two n-bit numbers can be added in O(1) time with O(n²) processors
• But this requires concurrent writes
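A small Python sketch (mine, not from the slides) of the digit-splitting idea: in base B = 2^k, each digit-times-m product spans exactly two digit positions, so within each half the partial products never collide and no carries occur; only the final addition of the two halves carries, which is where the O(1) CRCW adder comes in.

# Sketch (not from the slides): carry-free multiplication by digit splitting.
# The digit-by-m products would come from a precomputed table in the real
# algorithm; here they are computed directly.

def split_multiply(x, m, k):
    """Multiply x by m (< 2**k) using base-2**k digit splitting."""
    B = 1 << k
    assert 0 <= m < B
    digits = []                        # digits of x in base B, least significant first
    while x:
        digits.append(x % B)
        x //= B
    halves = [0, 0]
    for i, d in enumerate(digits):
        # d * m < B**2 occupies exactly digit positions i and i+1, so products
        # landing in the same half are two positions apart and never overlap
        halves[i % 2] += (d * m) << (k * i)
    return halves[0] + halves[1]       # the single carrying addition

assert split_multiply(1234, 7, k=3) == 8638      # the slide's example, in base 8
assert split_multiply(12345678, 13, k=4) == 12345678 * 13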
Complexity perspective: an analogy
• P = NP?
• Open problem, but most practical problems have been shown to be either in P or NP-hard
• Exceptions: integer factorization, graph isomorphism
• If integer factorization is NP-complete, then NP = coNP
• If graph isomorphism is NP-complete, then the polynomial hierarchy collapses to the 2nd level
• NC = P?
• Open problem, but most practical problems have been shown to be either in NC or P-hard
• Exception: GCD
• What would GCD being P-complete imply?