An O(bn²) Time Algorithm for Optimal Buffer Insertion with b Buffer Types • Authors: Zhuo Li and Weiping Shi • Presenter: Sunil Khatri • Department of Electrical Engineering, Texas A&M University, College Station, Texas 77845, USA
Outline • Introduction • O(b²n²) Algorithm • New O(bn²) Algorithm • Experimental Results • Extension • Conclusion
Introduction • Buffer insertion and sizing is one of the most effective methods for reducing interconnect delay. Saxena, et al. [TCAD 2004]
Introduction (cont.) • Modern libraries contain hundreds of different buffers with different characteristics. • Polarity, input capacitance, driving resistance, intrinsic delay, noise margin, power, area, etc. • Buffer library size has a quadratic effect on running time in traditional algorithms. • With such a large number of buffers and buffer types, fast algorithms for buffer insertion are crucial for timing closure.
Problem Formulation • Given: A routing tree, n possible buffer positions, sink capacitances and required arrival times (RAT), a buffer library, wire resistance and capacitance. • Delay model: Elmore delay for interconnect and linear delay model for buffers. • [Figure: routing tree with source s0, sinks s1 to s4, the possible buffer positions, and the buffer library]
Maximum Slack Problem • Find: Where to insert buffers so that the slack at the source Q(s0) is maximized. • $Q(s_0) = \min_{i>0} \{ \mathrm{RAT}(s_i) - \mathrm{delay}(s_0, s_i) \}$ • Example on the tree with source s0 and sinks s1 to s4: without buffers, Q(s0) = −50 ps; with 2 buffers, Q(s0) = 100 ps.
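The slack definition above can be checked with a few lines of Python. This is only a sketch: the RAT and delay values are hypothetical numbers in picoseconds, picked so that the result matches the unbuffered −50 ps case, and `source_slack` is a name introduced here for illustration.

```python
def source_slack(rat, delay):
    """Return Q(s0) = min over sinks s_i of RAT(s_i) - delay(s0, s_i)."""
    return min(rat[s] - delay[s] for s in rat)

# Hypothetical required arrival times and source-to-sink delays (ps).
rat = {"s1": 400.0, "s2": 350.0, "s3": 500.0, "s4": 450.0}
delay = {"s1": 300.0, "s2": 400.0, "s3": 380.0, "s4": 410.0}

print(source_slack(rat, delay))  # -50.0, limited by sink s2
```

The sink with the smallest RAT minus delay determines the slack at the source, which is why buffering only the critical path can already raise Q(s0).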
Previous Research • Maximum Slack • van Ginneken [ISCAS 90]: O(n²) time and space, where n is the number of buffer positions. • Lillis, Cheng and Lin [TCAS 96]: O(b²n²) time and space for b buffer types. • Shi and Li [DAC 03]: O(n log n) time for 2-pin nets, O(n log² n) time for multi-pin nets. O(n log n) space. • Minimum Buffer Cost (Area, Power, etc.) • Lillis, Cheng and Lin [TCAS 96]: pseudo-polynomial time algorithm. • Shi, Li and Alpert [ASPDAC 04]: buffer cost minimization is NP-hard if b is a variable.
Outline • Introduction • O(b²n²) Algorithm • New O(bn²) Algorithm • Experimental Results • Extension • Conclusion
Dynamic Programming • Each candidate solution of a sub-tree is represented by a (Q, C) pair, where Q is slack and C is downstream capacitance. • For any two candidates A1 and A2 of the same sub-tree, if Q(A1) ≤ Q(A2) and C(A1) ≥ C(A2), then A1 is redundant. • O(b²n²) time dynamic programming algorithm (Lillis-Cheng-Lin) • For b buffer types, the number of candidates is at most bn + 1 • For a wire, update the (Q, C) value for every candidate in O(bn) time • For a buffer position, add b new candidates in O(b²n) time • For a branch point, merge two sets of candidates in O(bn₁ + bn₂) time
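The redundancy rule above can be sketched as a small pruning routine. This is a minimal illustration, not the authors' code: `prune_redundant` is a hypothetical name, candidates are plain `(Q, C)` tuples, and the sort makes it O(N log N) rather than the linear-time list maintenance a real implementation would use.

```python
def prune_redundant(cands):
    """Drop every (Q, C) pair dominated by another pair.

    A pair is redundant if some other pair has slack Q at least as
    large and capacitance C at most as large.
    """
    # Visit candidates by increasing C; for equal C, larger Q first.
    ordered = sorted(cands, key=lambda qc: (qc[1], -qc[0]))
    kept = []
    best_q = float("-inf")
    for q, c in ordered:
        # Keep a pair only if it beats every pair with smaller or equal C.
        if q > best_q:
            kept.append((q, c))
            best_q = q
    # Return in decreasing Q and decreasing C order.
    return kept[::-1]

cands = [(21, 5), (10, 4), (19, 4), (19, 6), (15, 3), (7, 2), (6, 1)]
print(prune_redundant(cands))
# [(21, 5), (19, 4), (15, 3), (7, 2), (6, 1)]
```

In the usage line, (10, 4) is dominated by (19, 4) and (19, 6) is dominated by (21, 5), so both are dropped.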
Dynamic Programming • Each candidate solution of a sub-tree is represented by a (Q, C) pair, where Q is slack and C is downstream capacitance. • For any two candidates A1 and A2 of the same sub-tree, if Q(A1) ≤ Q(A2) and C(A1) ≥ C(A2), then A1 is redundant. • O(bn²) time dynamic programming algorithm (This paper) • For b buffer types, the number of candidates is at most bn + 1 • For a wire, update the (Q, C) value for every candidate in O(bn) time • For a buffer position, add b new candidates in O(bn) time • For a branch point, merge two sets of candidates in O(bn₁ + bn₂) time
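The wire and branch-point steps listed above can be sketched as follows. The slides do not spell out the formulas, so this assumes the standard Elmore expressions used in van Ginneken-style algorithms: a wire with lumped resistance r and capacitance c maps (Q, C) to (Q − r(c/2 + C), C + c), and a branch merge takes the minimum slack and the sum of capacitances. The names `add_wire` and `merge_branches` are hypothetical.

```python
def add_wire(cands, r_wire, c_wire):
    """Propagate (Q, C) candidates across a wire (assumed Elmore model)."""
    return [(q - r_wire * (c_wire / 2.0 + c), c + c_wire) for q, c in cands]

def merge_branches(left, right):
    """Merge the candidate lists of two sibling subtrees in linear time.

    Both inputs are assumed non-redundant and sorted in decreasing Q
    (and therefore decreasing C).
    """
    lo_l, lo_r = left[::-1], right[::-1]  # work in increasing order
    i = j = 0
    merged = []
    while i < len(lo_l) and j < len(lo_r):
        (ql, cl), (qr, cr) = lo_l[i], lo_r[j]
        merged.append((min(ql, qr), cl + cr))
        # Only advancing the side with the smaller slack can raise the
        # merged slack; on a tie, advance both sides.
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged[::-1]  # back to decreasing order

print(add_wire([(21, 5)], 2.0, 4.0))                        # [(7.0, 9.0)]
print(merge_branches([(10, 3), (4, 1)], [(8, 2), (6, 1)]))  # [(8, 5), (6, 4), (4, 2)]
```

Each loop iteration advances at least one pointer, so the merge emits at most n₁ + n₂ candidates, which is consistent with the O(bn₁ + bn₂) bound on the slide.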
Data Structure: Linked List • Use a linked list to store non-redundant candidates • Sorted in decreasing Q and decreasing C order • Each entry also contains the list of buffer positions • [Figure: linked list (Q1, C1) → (Q2, C2) → (Q3, C3); better slack toward the head, less capacitance toward the tail]
Best Candidates • For each buffer Bi, R(Bi) is the buffer driver resistance, C(Bi) is the buffer input capacitance, and t(Bi) is the buffer intrinsic delay. Label buffers in non-decreasing order of resistance: R(B1) ≤ R(B2) ≤ … ≤ R(Bb). • For each buffer type Bi • Define the best candidate αi as the candidate that maximizes slack among all candidates after Bi is inserted. • The new slack is Q(αi) − R(Bi)·C(αi) − t(Bi). • Define the new candidate βi as the candidate formed by αi with buffer type Bi. • How to find all best candidates quickly is the key question addressed in this paper.
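Written out as formulas, the two definitions on this slide read as below. The symbols α_i (best candidate) and β_i (new candidate) are assumed notation, since the symbols did not survive in the transcript; the capacitance of the new candidate is the input capacitance of the inserted buffer.

```latex
\alpha_i = \arg\max_{A} \bigl\{\, Q(A) - R(B_i)\, C(A) \,\bigr\},
\qquad
\beta_i = \bigl(\, Q(\alpha_i) - R(B_i)\, C(\alpha_i) - t(B_i),\; C(B_i) \,\bigr).
```

The intrinsic delay t(B_i) is the same for every candidate A, so it does not affect which candidate is best and can be left out of the arg max.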
Example • Three buffer types • R(B1) = 1, C(B1), t(B1) • R(B2) = 3, C(B2), t(B2) • R(B3) = 5, C(B3), t(B3) • Candidates (Q, C): (21, 5) (19, 4) (15, 3) (7, 2) (6, 1) • Insert B1: (16 − t(B1), C(B1)) (15 − t(B1), C(B1)) (12 − t(B1), C(B1)) (5 − t(B1), C(B1)) (5 − t(B1), C(B1)) • Insert B2: (6 − t(B2), C(B2)) (7 − t(B2), C(B2)) (6 − t(B2), C(B2)) (1 − t(B2), C(B2)) (3 − t(B2), C(B2)) • Insert B3: (−4 − t(B3), C(B3)) (−1 − t(B3), C(B3)) (0 − t(B3), C(B3)) (−3 − t(B3), C(B3)) (1 − t(B3), C(B3)) • Best candidate for B1 is α1 = (21, 5), and the new candidate is β1 = (16 − t(B1), C(B1)) • Best candidate for B2 is α2 = (19, 4), and the new candidate is β2 = (7 − t(B2), C(B2)) • Best candidate for B3 is α3 = (6, 1), and the new candidate is β3 = (1 − t(B3), C(B3))
Outline • Introduction • O(b²n²) Algorithm • New O(bn²) Algorithm • Experimental Results • Extension • Conclusion
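The numbers in this example can be reproduced by brute force, trying every candidate for every buffer type. This sketch leaves out the intrinsic delay t(Bi), which shifts all values for one buffer by the same amount; the function names are introduced here for illustration only.

```python
candidates = [(21, 5), (19, 4), (15, 3), (7, 2), (6, 1)]  # (Q, C) from the example
resistances = {"B1": 1, "B2": 3, "B3": 5}                 # R(Bi) from the example

def slacks_after_insert(r):
    """Q - R*C for every candidate (intrinsic delay not yet subtracted)."""
    return [q - r * c for q, c in candidates]

def best_candidate(r):
    """Brute-force search: O(N) work per buffer type."""
    vals = slacks_after_insert(r)
    k = max(range(len(vals)), key=vals.__getitem__)
    return candidates[k], vals[k]

for name, r in resistances.items():
    print(name, slacks_after_insert(r), best_candidate(r))
# B1 [16, 15, 12, 5, 5] ((21, 5), 16)
# B2 [6, 7, 6, 1, 3] ((19, 4), 7)
# B3 [-4, -1, 0, -3, 1] ((6, 1), 1)
```

Repeating this scan for each of the b buffer types over a list of up to bn + 1 candidates is what gives the O(b²n) cost per buffer position in the earlier algorithm.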
(Q, C) Plane • Non-redundant (Q, C) list is a monotonically decreasing sequence • As resistance is added, Q values change • [Figure: candidates A1 (21, 5), A2 (19, 4), A3 (15, 3), A4 (7, 2), A5 (6, 1) plotted in the (Q, C) plane]
R(B1) = 1, Q = Q − R(B1)·C • A1 moves to (21 − 5, 5)
R(B2) = 3, Q = Q − R(B2)·C • A1 moves to (21 − 15, 5)
R(B3) = 5, Q = Q − R(B3)·C • A1 moves to (21 − 25, 5)
Best Q for Each R • [Figure: as R increases, the best Q values move to the left]
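Best Candidates Are in Decreasing Order of C • Lemma 1: C(α1) ≥ C(α2) ≥ … ≥ C(αb) • Lemma 1 alone is not enough for an O(bn) algorithm to find all best candidates. • A global search is still needed • [Figure: best candidates α1, α2, α3 marked in the (Q, C) plane]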
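Convex Pruning • Convex pruning prunes candidates like A4 • [Figure: A1, A2, A3 and A5 remain; A4 is pruned]
Before Convex Pruning • [Figure: the candidate list is non-convex]
After Convex Pruning • [Figure: the remaining candidates, with best candidates α1, α2, α3 marked]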
Convex Hull • After convex pruning, the remaining list is a convex hull • Lemma 3: Best candidates must be on the convex hull • A candidate is on the convex hull if and only if there exists a resistance R such that, when R is added, this candidate gives maximum Q • Lemma 4: On the convex hull, if Ai gives maximum Q among neighboring candidates, Ai gives maximum Q among all candidates • The slope (Qi − Qj)/(Ci − Cj) between candidates Ai and Aj is the resistance value at which both give the same slack; for any larger resistance, the candidate with smaller C gives better slack • On the convex hull, slopes are in sorted order • Local optimum ⇒ global optimum
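The slope argument can be made concrete on the running example, whose hull is A1, A2, A3, A5 after A4 is pruned. The sketch below computes the crossover resistance between each pair of neighbors and then looks up the best candidate for a given R. The lookup uses `bisect` from the Python standard library purely as an illustration; it is not the linear walk used by the subroutine in the slides, and the function names are hypothetical.

```python
import bisect

hull = [(21, 5), (19, 4), (15, 3), (6, 1)]  # A1, A2, A3, A5 as (Q, C)

def crossover_resistances(hull):
    """Slope (Qi - Qj)/(Ci - Cj) between each pair of hull neighbors."""
    return [(q1 - q2) / (c1 - c2) for (q1, c1), (q2, c2) in zip(hull, hull[1:])]

thresholds = crossover_resistances(hull)
print(thresholds)  # [2.0, 4.0, 4.5], in sorted order as the slide states

def best_on_hull(r):
    """Candidate maximizing Q - r*C, found from the sorted slopes."""
    return hull[bisect.bisect_left(thresholds, r)]

print(best_on_hull(1), best_on_hull(3), best_on_hull(5))
# (21, 5) (19, 4) (6, 1)
```

The three results agree with the best candidates for R(B1) = 1, R(B2) = 3 and R(B3) = 5 in the example, and the sorted slopes are what make a local comparison with neighbors sufficient.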
Local Optimum ⇒ Global Optimum • For any R(Bi), if A2 gives better slack than A1 and A3, then A2 is the best candidate for Bi. • [Figure: convex hull with candidates A1, A2, A3, A5]
Find Convex Hull: Graham’s Scan • Since the points are sorted, Graham’s scan can perform convex pruning in linear time • [Figure: successive steps of the scan over the candidates, with axes Q and C]
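A minimal sketch of convex pruning as a Graham-style scan is given below. It assumes the input is a non-redundant (Q, C) list with strictly decreasing C, and it is written from the description on the slides rather than taken from the authors' implementation; `convex_prune` is a hypothetical name.

```python
def convex_prune(cands):
    """Keep only the candidates on the convex hull of a (Q, C) list.

    Input: non-redundant (Q, C) pairs with strictly decreasing C.
    Each candidate is pushed once and popped at most once, so the
    scan runs in linear time.
    """
    hull = []
    for q, c in cands:
        while len(hull) >= 2:
            (qa, ca), (qb, cb) = hull[-2], hull[-1]
            # Pop the middle point b if slope(a, b) >= slope(b, new):
            # then no resistance value makes b strictly the best.
            # Cross-multiplied to avoid division; C differences are positive.
            if (qa - qb) * (cb - c) >= (qb - q) * (ca - cb):
                hull.pop()
            else:
                break
        hull.append((q, c))
    return hull

cands = [(21, 5), (19, 4), (15, 3), (7, 2), (6, 1)]  # A1 .. A5
print(convex_prune(cands))  # [(21, 5), (19, 4), (15, 3), (6, 1)]
```

On the running example the scan removes A4 = (7, 2) when A5 arrives, because the slope from A3 to A4 is 8 while the slope from A4 to A5 is only 1.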
New Subroutine for Adding Buffer • At each buffer position, given the (Q, C) list N in decreasing C order and the buffer library, where R(B1) ≤ R(B2) ≤ … ≤ R(Bb): • Generate a new (Q, C) list A1, A2, … from N with convex pruning: O(bn) • Generate new candidates β1, β2, …, βb with the following loop: O(bn)
    j = 1
    for i = 1 to b do
        while Aj+1 exists and Q(Aj+1) − R(Bi)·C(Aj+1) ≥ Q(Aj) − R(Bi)·C(Aj) do
            j = j + 1
        generate new candidate βi for buffer Bi:
            Q(βi) = Q(Aj) − R(Bi)·C(Aj) − t(Bi)
            C(βi) = C(Bi)
• Sort the βi in non-increasing C order: O(b log b) • Insert the βi into the original list N: O(bn)
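The subroutine can be sketched end to end in Python. This is a hedged illustration under stated assumptions: the library is a list of hypothetical `(R, C_in, t)` tuples sorted by non-decreasing R, the pointer advances in an inner loop so that every buffer type receives a candidate, and the final sort and insertion into N are left out. Function names and the library values are made up for the example.

```python
def convex_prune(cands):
    """Graham-style scan over (Q, C) pairs with strictly decreasing C."""
    hull = []
    for q, c in cands:
        while len(hull) >= 2:
            (qa, ca), (qb, cb) = hull[-2], hull[-1]
            if (qa - qb) * (cb - c) >= (qb - q) * (ca - cb):
                hull.pop()
            else:
                break
        hull.append((q, c))
    return hull

def new_buffer_candidates(cands, library):
    """Return one new candidate (Q, C) per buffer type, in library order.

    cands:   non-redundant (Q, C) list in decreasing C order.
    library: (R, C_in, t) tuples sorted by non-decreasing R (assumed layout).
    """
    hull = convex_prune(cands)
    j = 0
    result = []
    for r, c_in, t in library:
        # Walk forward while the next hull point is at least as good.
        # j never moves backward because best candidates have
        # non-increasing C as R grows (Lemma 1).
        while j + 1 < len(hull) and \
                hull[j + 1][0] - r * hull[j + 1][1] >= hull[j][0] - r * hull[j][1]:
            j += 1
        q, c = hull[j]
        result.append((q - r * c - t, c_in))
    return result

cands = [(21, 5), (19, 4), (15, 3), (7, 2), (6, 1)]
library = [(1, 3.0, 1.0), (3, 2.0, 1.0), (5, 1.0, 1.0)]  # hypothetical values
print(new_buffer_candidates(cands, library))
# [(15.0, 3.0), (6.0, 2.0), (0.0, 1.0)]
```

The pointer j moves at most once per hull entry over the whole loop, so generating all b new candidates costs time linear in the list length plus b, rather than one full scan per buffer type.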
O(bn²) Algorithm • Dynamic programming • For b buffer types, the number of candidates is at most bn + 1 • For a wire, update the (Q, C) value for every candidate in O(bn) time • For a buffer position, add b new candidates in O(bn) time • For a branch point, merge two sets of candidates in O(bn₁ + bn₂) time • Total complexity is O(bn²).
Outline • Introduction • O(b²n²) Algorithm • New O(bn²) Algorithm • Experimental Results • Extension • Conclusion
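Speedup over O(b²n²) Algorithm • [Figure: speedup on three nets: net1 with 337 sinks, net2 with 1944 sinks, net3 with 2676 sinks]
Speedup vs. Buffer Positions • [Figure: speedup versus number of buffer positions; buffer library size: 64]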
Outline • Introduction • O(b²n²) Algorithm • New O(bn²) Algorithm • Experimental Results • Extension and Conclusion
Extension to Min Buffer Cost • Buffer cost is associated with area and power • Find a solution that satisfies the slack requirement and has minimum buffer cost • Each candidate solution is represented by a (Q, C, W) triple, where Q is slack, C is capacitance, and W is buffer cost • NP-hard in the worst case • Our algorithm can reduce the operation of adding a buffer from O(bN) to O(N), where N is the number of non-redundant candidates
Conclusion • New O(bn²) algorithm for optimal buffer insertion with b buffer types • Best candidates must be in decreasing order of C • Best candidates must be on the convex hull • Local optimum ⇒ global optimum • Applicable to cost minimization and inverting buffer types