CS137: Electronic Design Automation
This document summarizes key findings in electronic design automation related to parallel computation, emphasizing the NC (Nick's Class) complexity class. It covers techniques for adding N-bit numbers, sorting elements, evaluating finite state machines, and computing instructions in polylogarithmic time using polynomial hardware. The discussion extends to transitive closure, all pairs shortest paths, and the relationship between NC and other complexity classes, highlighting potential limitations in fully parallelizing polynomial-time algorithms and exploring foundational theories such as P=NP.
Presentation Transcript
CS137: Electronic Design Automation • Day 13: February 8, 2006 • NC
Things we’ve seen • Add two N-bit numbers in O(log(N)) time on O(N) processors (gates) • Sort N elements in O(log(N)) time on O(N) processors • Evaluate an FSM on N inputs in O(log(N)) time on O(N) processors • Find the I’th element in a collection of N items in O(log^2(N)) time on O(N) processors • Compute issuable instructions in O(log(N)) time with O(N) hardware
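As a reminder of how the first item works, here is a minimal sketch of log-depth addition using a parallel-prefix scan over (generate, propagate) pairs. The function name `prefix_add`, the little-endian bit-list encoding, and the choice of a Hillis-Steele-style scan are assumptions made for illustration, not necessarily the adder construction used earlier in the course. Each pass of the `while` loop stands for one parallel time step in which all N positions update at once.

```python
def prefix_add(a_bits, b_bits):
    """Add two equal-length little-endian bit lists with a log-depth carry scan.

    Returns (sum_bits_including_carry_out, number_of_parallel_rounds).
    """
    n = len(a_bits)
    # Per-bit (generate, propagate) signals.
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    scan = list(gp)
    rounds = 0
    d = 1
    while d < n:
        # One parallel step: position i combines with position i - d.
        nxt = list(scan)
        for i in range(d, n):
            g_hi, p_hi = scan[i]
            g_lo, p_lo = scan[i - d]
            nxt[i] = (g_hi | (p_hi & g_lo), p_hi & p_lo)
        scan = nxt
        d *= 2
        rounds += 1
    # Carry into bit i is the group generate of bits 0..i-1.
    carries = [0] + [g for g, _ in scan[:-1]]
    sum_bits = [p ^ c for (_, p), c in zip(gp, carries)]
    return sum_bits + [scan[-1][0]], rounds
```

For 4-bit inputs the scan finishes in 2 rounds; in general it takes ceil(log2(N)) rounds with one processor per bit position.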
Complexity Class • What are the complexity classes for parallelism? • Suggested not all tasks have perfect area-time tradeoffs • How well can we parallelize problems? • Differentiate things which parallelize well • …things that don’t parallelize so well
If we use enough space… • Exponential space: P=NP • NTM runs in time f(N) • Use 2^{f(N)} PEs • Each evaluates with a different choice sequence • Prefix on completion • Solve problem in f(N) time • Of course, ignores 3-space wire delays
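A minimal sketch of the idea above, using subset sum as a stand-in NP problem (the problem choice and the names `accepts_with_choices` and `exponential_hardware_solve` are assumptions for illustration). Each choice sequence plays the role of one PE, and the final `any` plays the role of the reduction over completed PEs. The list comprehension runs sequentially here; on real hardware all 2^N evaluations would happen at once.

```python
from itertools import product

def accepts_with_choices(weights, target, choices):
    """One PE: follow a fixed take/skip choice sequence and check the result."""
    return sum(w for w, take in zip(weights, choices) if take) == target

def exponential_hardware_solve(weights, target):
    """Model 2^N PEs, one per choice sequence, then OR their results together.

    Returns (answer, number_of_PEs_used).
    """
    results = [accepts_with_choices(weights, target, c)
               for c in product((0, 1), repeat=len(weights))]
    return any(results), len(results)
```

With three weights this models 8 PEs; the PE count doubles with every extra nondeterministic choice, which is why the next slide asks about a "reasonable" amount of hardware.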
So, we really want to know how fast something can be run with a “reasonable” number of processors (amount of hardware)
NC • Class of problems that can be: • Computed in polylogarithmic time • Polynomial in log(N), i.e. O(log^k(N)) • E.g. 3log^2(N)+2log(N)+234 • Using polynomial hardware • NC for Nick’s Class • Named after Nick Pippenger
All in NC • Can do • Add two N-bit numbers in O(log(N)) time on O(N) processors (gates) • Sort N elements in O(log(N)) time on O(N) processors • Evaluate an FSM on N inputs in O(log(N)) time on O(N) processors • Find the I’th element in a collection of N items in O(log^2(N)) time on O(N) processors • Compute issuable instructions in O(log(N)) time with O(N) hardware
Open Question • NC ?= P • Are all polynomial-time algorithms computable in parallel? • Polylog time • Polynomial processors • Suspected they are not • More at end
Transitive Closure • Given a Graph: G=(V,E) • Compute G*=(V,E*) • E* contains an edge e=(V_i,V_j) • Iff there is a path from V_i to V_j in G • Transitive Closure ∈ NC
Basic Sequential Algorithm • N=|V| • Think of M=N×N connectivity matrix for G • M^2=G^2 • M^2[i,j]=OR(all k)(M[i,k] & M[k,j]) • M^{2n}[i,j]=OR(all k)(M^n[i,k] & M^n[k,j]) • M^N represents G^N=G* • Compute in log steps • O(N^3 log(N))
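The repeated-squaring recipe above can be sketched as follows. The names `bool_square` and `transitive_closure` are hypothetical. One assumption is made explicit in the code: self-loops are added to M first, so that the squared matrix covers paths of length at most 2 rather than exactly 2 (the resulting closure is therefore reflexive).

```python
def bool_square(m):
    """M^2[i][j] = OR over k of (M[i][k] AND M[k][j]); O(N^3) work."""
    n = len(m)
    return [[any(m[i][k] and m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(adj):
    """Return (closure_matrix, number_of_squarings) for a 0/1 adjacency matrix."""
    n = len(adj)
    # Assumption: include self-loops so powers accumulate shorter paths too.
    m = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    covered = 1          # current matrix covers paths of up to this many edges
    squarings = 0
    while covered < n:
        m = bool_square(m)
        covered *= 2
        squarings += 1
    return m, squarings
```

Each squaring costs O(N^3) sequential work and ceil(log2(N)) squarings suffice, matching the O(N^3 log(N)) bound on the slide.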
Parallel Algorithm • Use N^3 processors • N processors per element M^n[i,j] • N^2 processors to compute all elements of M^n • Group of N processors for M^n[i,j] perform an associative reduce in O(log(N)) time • Still takes log(N) steps to compute M^N • O(log^2(N)) with N^3 processors ⇒ in NC • [this construct may be weak?]
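To make the per-element step concrete, here is a small sketch of an associative tree reduce that also counts parallel rounds. The names `tree_reduce` and `closure_element` are hypothetical, and the input list is assumed to be non-empty. Each pass of the `while` loop corresponds to one time step in which all pairs are combined simultaneously.

```python
def tree_reduce(values, op):
    """Pairwise (tree) reduction of a non-empty list.

    Returns (result, number_of_parallel_rounds).
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # All pairs at this level would be combined in the same time step.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # odd element passes through unchanged
        level = nxt
        rounds += 1
    return level[0], rounds

def closure_element(m, i, j):
    """One element of the squared matrix: N ANDs (one per processor), then an OR reduce."""
    products = [m[i][k] and m[k][j] for k in range(len(m))]
    return tree_reduce(products, lambda a, b: a or b)
```

Reducing 8 values takes 3 rounds, so one squaring costs O(log(N)) time with N processors per element, and the log(N) squarings give the O(log^2(N)) total.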
All Pairs Shortest Paths • Given a Graph: G=(V,E) • Edge weight on each edge e ∈ E • Compute G’=(V,E’) • E’ contains an edge e’=(V_i,V_j) • Iff there is a path from V_i to V_j in G • Edge weight is shortest path from V_i to V_j in G • All Pairs Shortest Path ∈ NC • Slight modification on transitive closure
Basic Sequential Algorithm • As before • N=|V| • Think of M=N×N connectivity matrix for G • M^2=G^2 • Change • OR to MIN • & to + • So • M^2[i,j]=OR(all k)(M[i,k] & M[k,j]) • Becomes: M^2[i,j]=MIN(all k)(M[i,k] + M[k,j]) • M^N represents G^N=G’
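The OR-to-MIN, AND-to-PLUS substitution can be sketched as below. The names `min_plus_square` and `all_pairs_shortest_paths` are hypothetical. Two assumptions are made for the sketch: missing edges are encoded as `math.inf`, and the graph has no negative cycles, so setting the diagonal to 0 is safe.

```python
import math

def min_plus_square(d):
    """D^2[i][j] = MIN over k of (D[i][k] + D[k][j])."""
    n = len(d)
    return [[min(d[i][k] + d[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_shortest_paths(weights):
    """weights[i][j] is the edge weight, or math.inf when there is no edge."""
    n = len(weights)
    # Assumption: zero-cost self-loops, so powers keep shorter paths.
    d = [[0 if i == j else weights[i][j] for j in range(n)] for i in range(n)]
    covered = 1
    while covered < n:
        d = min_plus_square(d)
        covered *= 2
    return d
```

Unreachable pairs stay at `math.inf`, which corresponds to E’ having no edge for that pair.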
(Same) Parallel Algorithm • Use N^3 processors • N processors per element M^n[i,j] • N^2 processors to compute all elements of M^n • Group of N processors for M^n[i,j] perform an associative reduce in O(log(N)) time • Still takes log(N) steps to compute M^N • O(log^2(N)) with N^3 processors ⇒ in NC • [this construct may be weak?]
NL • Complexity class • Computations that can be computed using logarithmic space on a Non-Deterministic Turing Machine • Similarly L • logspace on Deterministic TM • Addition ∈ L • Certainly: L ⊆ NL
NL ⊆ NC • Theorem from Borodin: • If A is accepted by a NDTM using space S(n) ≥ log_2(n), • then there is a d>0 such that: DEPTH_A(n) ≤ d×S(n)^2. • [Depth here = circuit depth = time] • For NL • S(n)=log_2(n) ⇒ Depth(n) ≤ d×log^2(n)
Borodin Construction (Idea) • State is bounded • Can construct the graph of all states • This will only take polynomial hardware • Compute transitive closure on graph • O(log^2(N)) • Use associative reduce to extract solution • O(log(N))
Borodin States • What states can the NDTM be in? • At most s^{S(N)} values on tape • s=size of symbol set • Head of TM at most S(N) positions • q states for FSM • N locations for input tape head • Total: states=N×q×S(N)×s^{S(N)} • For S(N)=log(N) • N×q×log(N)×s^{log(N)}=qN^{(log(s)+1)}log(N) • Number of states polynomial in N
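The simplification in the state count rests on one identity, spelled out here as a short worked derivation. Logarithms are assumed to be base 2 throughout; the final inequality uses only log_2(N) ≤ N and is added to show why the count is polynomial in N.

```latex
\begin{align*}
s^{\log_2 N} &= \left(2^{\log_2 s}\right)^{\log_2 N}
              = \left(2^{\log_2 N}\right)^{\log_2 s}
              = N^{\log_2 s} \\
N \cdot q \cdot \log_2 N \cdot s^{\log_2 N}
             &= q \, N^{\log_2 s + 1} \log_2 N
              \;\le\; q \, N^{\log_2 s + 2}
\end{align*}
```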
Build Graph • Construct graph |V|=# states • M[i,j]=1 iff move from configuration i to j • If V_i is a state that corresponds to the input head being on square k • M[i,j] is “enabled” when the move from i to j is taken only if the kth input is 1, and the kth input is 1 • M[i,j] is “enabled” when the move from i to j is taken only if the kth input is 0, and the kth input is 0 • Can just be a set of ANDs setting up the initial connectivity matrix M
Transitive Closure • Transitive Closure with O(|V|^3) PEs • Still polynomial in N • |V|=N×q×log(N)×s^{log(N)}=qN^{(log(s)+1)}log(N) • O(|V|^3) ≤ O(N^{3(log(s)+2)}) • In log^2(N) time • O(log^2(|V|)) ≤ O([log(N^{(log(s)+2)})]^2) • O([log(s)+2]^2×log^2(N))=O(log^2(N))
Extract Result • OR reduce on Reachable states • Can reach an accepting state for TM? • Therefore: NL ⊆ NC
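The three stages of the construction (build the matrix from the input, take the transitive closure, OR-reduce over accepting configurations) can be put together in a toy sketch. Everything about the encoding is an assumption for illustration: configurations are plain integers, and each move is a hypothetical tuple `(src, dst, k, bit)` meaning "enabled when input bit k equals `bit`", with `k=None` for an unconditional move. A real construction would enumerate configurations from the machine description.

```python
def build_matrix(num_configs, moves, input_bits):
    """Initial connectivity matrix: an edge is present only if its input condition holds."""
    m = [[i == j for j in range(num_configs)] for i in range(num_configs)]
    for src, dst, k, bit in moves:
        if k is None or input_bits[k] == bit:
            m[src][dst] = True
    return m

def closure(m):
    """Transitive closure by repeated boolean squaring."""
    n = len(m)
    covered = 1
    while covered < n:
        m = [[any(m[i][k] and m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        covered *= 2
    return m

def accepts(num_configs, moves, input_bits, start, accepting):
    """OR reduce: is any accepting configuration reachable from the start?"""
    reach = closure(build_matrix(num_configs, moves, input_bits))
    return any(reach[start][a] for a in accepting)
```

For example, with moves `[(0, 1, 0, 1), (1, 2, 1, 0)]`, start configuration 0 and accepting set `{2}`, the input `[1, 0]` is accepted and `[1, 1]` is not.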
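Converse Holds • Borodin • If A is in DEPTH(S(n)) for S(n) ≥ log(n) • Then A is in DSPACE(S(n)) • Recursive evaluation of gate value • w/ compact stack representation • Specialized for S(n)=log(n) • If A has O(log(n))-depth circuits (NC^1), then A is in L • NC^1 ⊆ L • Know L ⊆ NL … just showed NL ⊆ NC (depth O(log^2(n)), i.e. NC^2) • So NC^1 ⊆ L ⊆ NL ⊆ NC^2 • The depth bounds differ (log vs. log^2), so this does not by itself give NL = NC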
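The recursive evaluation of a gate value can be illustrated with a small sketch. The circuit encoding (a dict from gate name to a tuple such as `("AND", a, b)`, `("OR", a, b)`, `("NOT", a)` or `("IN", index)`) and the name `eval_gate` are assumptions for illustration. The sketch only shows that the recursion depth tracks circuit depth; it relies on Python's call stack and does not implement the compact stack representation that the space bound depends on.

```python
def eval_gate(circuit, gate, inputs, depth=1):
    """Evaluate one gate by depth-first recursion.

    Returns (value, deepest_recursion_level_reached).
    """
    kind, *args = circuit[gate]
    if kind == "IN":
        return inputs[args[0]], depth
    if kind == "NOT":
        value, deepest = eval_gate(circuit, args[0], inputs, depth + 1)
        return (not value), deepest
    # Binary gate: evaluate both children, one after the other.
    left, d1 = eval_gate(circuit, args[0], inputs, depth + 1)
    right, d2 = eval_gate(circuit, args[1], inputs, depth + 1)
    value = (left and right) if kind == "AND" else (left or right)
    return value, max(d1, d2)
```

Nothing is cached between calls, so the live state at any moment is just the current root-to-gate path, whose length is bounded by the circuit depth.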
Context Free Languages • Can recognize all context free languages in NC • PDA ⊆ NC
P-Complete • There are languages that are P-Complete • i.e. if we could show these were in NC • Then we would show NC=P • E.g. TM simulation
Complexity Roundup • In NC: FA, PDA, L, NL • Unknown: P=NC (P=NL)
Physical Realism • All rely on reductions in log(N) time • With 3D space, speed of light • …there are no log(N) time reductions • Maybe notion of 3-space parallelizable? • Run in O(N^{1/3}) time • O(N) processors • Cannot talk to more than O(N) in O(N^{1/3}) time
Admin • Friday/Monday:?? • Q: requests – what’s missing? • Project: two things due end of next week • Sequential implementation • Proposed plan of attack