Temporal Logic Replication for Dynamically Reconfigurable FPGA Partitioning

Wai-Kei Mak, Dept. of Computer Science and Engineering, University of South Florida
Evangeline F.Y. Young, Dept. of Computer Science and Engineering, The Chinese University of Hong Kong

Outline • Dynamically reconfigurable FPGA • Temporal partitioning = Conventional partitioning? • Temporal logic replication • What? • Why? • How? • Experimental results • Conclusions

Dynamically Reconfigurable FPGA • Store multiple contexts on chip. • Reuse logic blocks and wire segments dynamically. • The contexts stored can correspond to the multiple stages of a large circuit.

Temporal Circuit Partitioning • Temporal partitioning • multiple stages execute sequentially • Spatial partitioning • multiple components execute concurrently

Temporal Logic Replication • Can reduce buffering requirement. • Effectively utilize available slack logic capacity.

Temporal Constraints • For a net n = (v1, {v2, …, vp}), where s(v) denotes the stage of node v, • require s(v1) ≤ s(vj), j = 2, …, p, if v1 is a combinational node

Temporal Constraints (Cont’d) • require s(vj) ≤ s(v1), j = 2, …, p, if v1 is a flip-flop node
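The two constraint forms can be checked mechanically for a candidate stage assignment. The sketch below is illustrative only: the net representation, the `flip_flops` set, and the function name `violated_nets` are assumptions made for this example, and the relation in both constraints is taken to be ≤.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: a net is (driver, [sinks]).
Net = Tuple[str, List[str]]


def violated_nets(nets: List[Net], stage: Dict[str, int],
                  flip_flops: Set[str]) -> List[Net]:
    """Return every net whose temporal constraint is broken by `stage`."""
    bad = []
    for driver, sinks in nets:
        for sink in sinks:
            if driver in flip_flops:
                # Flip-flop driver: each sink must be in the same or an earlier stage.
                ok = stage[sink] <= stage[driver]
            else:
                # Combinational driver: each sink must be in the same or a later stage.
                ok = stage[driver] <= stage[sink]
            if not ok:
                bad.append((driver, sinks))
                break  # one violating sink is enough to flag the net
    return bad


nets = [("a", ["b", "c"]), ("f", ["a"])]
flip_flops = {"f"}
good = {"a": 1, "b": 1, "c": 2, "f": 2}
bad = {"a": 2, "b": 1, "c": 2, "f": 2}
print(violated_nets(nets, good, flip_flops))
print(violated_nets(nets, bad, flip_flops))
```

In the second assignment the combinational node `a` sits in stage 2 while its sink `b` sits in stage 1, so the net driven by `a` is reported.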

Temporal Partitioning with Replication Problem: Partition the given circuit into a pre-defined number of stages satisfying all temporal constraints. Objective: Minimize the buffers required between stages. Proposal: Utilize available slack logic capacity to reduce signal buffering. Solution: An effective 2-step approach.

2-Step Approach Step 1: Compute a temporal partition w/o replication. Step 2: Repeatedly identify the bottleneck stage and apply replication for that stage.

Advantages of 2-Step Approach • Will not replicate unnecessarily. • All temporal constraints are already satisfied when replicating.

Min-Area Min-Cut Replication Let stage i be the bottleneck stage. Min-Cut Replication • Compute a subset of nodes Ri in stage i for replication into stage i+1 to maximally reduce the communication cost at stage i. Min-Area Min-Cut Replication • Compute a minimum subset of nodes Ri in stage i for replication into stage i+1 to maximally reduce the communication cost at stage i.

Optimal Solution for Min-Area Min-Cut Replication Let Vi = the set of nodes in stage i. Observation 1: The min-cut replication problem can be solved by computing a minimum cut (Vi-Ri, Ri) in stage i. Observation 2: The min-area min-cut replication problem can be solved by computing a minimum cut (Vi-Ri, Ri) in stage i s.t. |Ri| is minimized.
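A minimum cut with the smallest possible Ri can be read off a single maximum flow. The note below states the standard max-flow fact behind this. It assumes the stage-i network has a source s and a sink t and that Ri lies on the sink side. The slides do not fix this orientation, so treat it as an assumption.

```latex
% Assumption: R_i is on the sink side of an s-t cut of the stage-i network.
% If (S_1, T_1) and (S_2, T_2) are both minimum s-t cuts, then so is
% (S_1 \cup S_2,\; T_1 \cap T_2). Hence a unique smallest sink side exists:
\[
  T^{*} \;=\; \bigcap_{(S,T)\ \text{minimum cut}} T
        \;=\; \{\, v : t \text{ is reachable from } v \text{ in the residual graph } G_f \,\},
\]
% where f is any maximum s-t flow. Under the assumption above,
% R_i consists of the circuit nodes in T^{*}.
```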

Example A pre-partition: Computing a minimum cut in stage 2:

Example (Cont’d) • Computed R2 = {j}

Network Modeling • Need to ensure that cut size = buffer requirement • For a net (v1, {v2, …, vp}),

The Case of Limited Slack Logic Capacity • The solution of min-area min-cut replication suffices if the slack logic capacity is sufficiently large. • Otherwise, if |Ri| exceeds the slack, Ri must be reduced. • Use a repeated max-flow min-cut heuristic to gradually reduce Ri (so the cut size is only increased gradually). • H. Yang, D.F. Wong, “Efficient Network Flow based Min-Cut Balanced Partitioning”, ICCAD’94.

Algorithm
Input: Stage area bound A.
1. Network modeling for bottleneck stage i.
2. Compute min-cut (Vi-Ri, Ri) s.t. |Ri| is minimized.
3. If |Vi+1| + |Ri| ≤ A, stop and return Ri.
4. Collapse a node in Ri with all nodes in Vi-Ri, go to 2.
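Steps 2 to 4 can be sketched as a loop around a max-flow routine. This is a minimal illustration under stated assumptions, not the authors' implementation. It takes the flow network from step 1 as given (a dict of integer capacities with source `s` and sink `t`), assumes Ri is the sink side of the cut, gives every node unit area, and collapses the lexicographically smallest node of Ri, which is an arbitrary choice. All function names are hypothetical. Collapsing is modeled by a high-capacity edge from the source to the chosen node.

```python
from collections import deque


def max_flow_residual(cap, s, t):
    """Edmonds-Karp; returns the residual capacities after a maximum s-t flow."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)  # add reverse edges
    res.setdefault(s, {})
    res.setdefault(t, {})
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return res
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push


def smallest_sink_side(res, t):
    """Nodes that can still reach t in the residual graph (minimal sink side)."""
    side, stack = {t}, [t]
    while stack:
        x = stack.pop()
        for u in res[x]:
            if u not in side and res[u].get(x, 0) > 0:
                side.add(u)
                stack.append(u)
    return side


def choose_replication_set(cap, s, t, logic_nodes, next_stage_size, area_bound):
    """Repeat min-cut and collapse until |V_{i+1}| + |R_i| <= A."""
    cap = {u: dict(nbrs) for u, nbrs in cap.items()}
    # Larger than any cut of the original network; acts as "infinite".
    big = 1 + sum(c for nbrs in cap.values() for c in nbrs.values())
    while True:
        res = max_flow_residual(cap, s, t)
        r = smallest_sink_side(res, t) & logic_nodes       # step 2
        if not r or next_stage_size + len(r) <= area_bound:
            return r                                       # step 3
        cap.setdefault(s, {})[min(r)] = big                # step 4 (arbitrary pick)


chain = {"s": {"a": 2}, "a": {"b": 1}, "b": {"c": 1}, "c": {"t": 2}}
nodes = {"a", "b", "c"}
print(choose_replication_set(chain, "s", "t", nodes, 3, 4))
print(choose_replication_set(chain, "s", "t", nodes, 3, 3))
```

With area bound 4 the first min-cut already fits and `{'c'}` is returned. With bound 3 the node `c` is collapsed into the source side, the cut grows from 1 to 2, and the returned set is empty.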

Experimental Results

Conclusions • Proposed temporal logic replication to reduce the buffering requirement in DRFPGA partitioning. • Presented an effective 2-step approach. • Formulated and optimally solved the min-area min-cut replication problem. • Extended to the case of limited slack logic capacity. • In the paper, a new timing-driven temporal partitioning algorithm is introduced to compute the pre-partition.
