This document discusses the impact of scheduling interactions and register usage on instruction-level parallelism (ILP). It highlights the benefits of reorganizing expression trees with associative and commutative operations to decrease critical path length, thereby improving loop performance. Utilizing techniques like Huffman coding, the approach aims for a balanced tree structure that preserves necessary intermediate values while optimizing overall computation efficiency. Results show varied improvements across programs, emphasizing the need for tailored strategies in compiler design.
CS 380C • Last Time • Interactions of scheduling and register usage • Today • Interactions of scheduling and instruction level parallelism
Shape of Expressions • Proebsting & Fischer assume a fixed expression tree • Hunt et al. reorganize commutative and associative operations in expression trees to • Increase ILP • Decrease critical path length • Group constants
Motivation • Long pipelines and fine-grained parallel processors (e.g., superscalar RISC, VLIW, and EDGE) benefit from instruction level parallelism. • Decreasing critical path length improves loop performance. • Grouping constants improves constant propagation.
Example • Let M denote intermediate values we need to preserve. • Let I denote associative operations whose intermediate values we do not need to preserve.
Example • What should we do to balance this tree?
Baer & Bovet: Balance Subtree Approach • Given a tree of associative and commutative operators, and other operators • Rearrange the tree to make it more balanced • Caveats • Preserve intermediate values in the expression tree that are used elsewhere • Preserve subtrees rooted by non-associative operations
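The rebalancing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Baer & Bovet's published algorithm: the `Op` class and the `operands`, `balance`, and `depth` helpers are hypothetical names, and the "split the operand list in half" strategy is a simple stand-in for their method. The sketch gathers the operands of a chain of one associative, commutative operator, stops at preserved nodes and at any other operator, and then rebuilds the chain as a balanced tree.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Op:
    """Hypothetical binary expression node; leaves are plain strings."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[str, Op]


def operands(e: Op, op: str, preserved=frozenset()) -> List[Expr]:
    """Collect the operands of the maximal chain of `op` rooted at `e`.

    A child is treated as an opaque operand if it is a leaf, uses a
    different operator, or is a preserved node (its value is used elsewhere).
    """
    out: List[Expr] = []
    for child in (e.left, e.right):
        if isinstance(child, Op) and child.op == op and child not in preserved:
            out.extend(operands(child, op, preserved))
        else:
            out.append(child)
    return out


def balance(items: List[Expr], op: str) -> Expr:
    """Rebuild a list of operands as a balanced tree by splitting it in half."""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    return Op(op, balance(items[:mid], op), balance(items[mid:], op))


def depth(e: Expr) -> int:
    """Critical path length, assuming every operator has unit latency."""
    if isinstance(e, Op):
        return 1 + max(depth(e.left), depth(e.right))
    return 0


# A left-leaning chain ((((((a+b)+c)+d)+e)+f)+g)+h.
chain: Expr = "a"
for name in "bcdefgh":
    chain = Op("+", chain, name)

balanced = balance(operands(chain, "+"), "+")
```

With eight operands and unit latencies, the chain has depth 7 and the rebalanced tree has depth 3. Passing a set of nodes as `preserved` keeps those subtrees intact, which is what leads to the problem described on the next slide: each preserved subtree may be balanced while the whole tree is not.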
Problem - unbalanced • Although each preserved node has a balanced sub-tree, the whole tree isn’t very balanced. • Note that preserved nodes with many leaves can be closer to the root.
Solution – Huffman Coding • Give constants weight 0 • Give other leaves weight 1 • Weigh interior nodes by summing their leaves • Weigh preserved nodes by summing their subtrees • Put them all in a sorted worklist • Until the worklist is a singleton: • Take the two lowest-weight nodes out of the worklist • Combine them in a subtree • Weigh this new interior node by summing its leaves, and insert it in the worklist • Guarantees an optimally balanced tree
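The worklist procedure above can be sketched with a priority queue. This is a minimal illustration, not the authors' implementation: the function name `huffman_balance`, the tuple encoding `(op, left, right)` for interior nodes, and the example weights are all assumptions made for the sketch. Preserved subtrees are passed in as opaque leaves that already carry the sum of their leaves as their weight.

```python
import heapq
from itertools import count


def huffman_balance(leaves, op="+"):
    """Combine weighted operands into a tree, lowest weights first.

    `leaves` is a list of (expr, weight) pairs: weight 0 for constants,
    1 for other leaves, and the leaf count for a preserved subtree.
    Returns (tree, total_weight); interior nodes are (op, left, right).
    """
    tie = count()  # tie-breaker so the heap never compares expressions
    worklist = [(w, next(tie), e) for e, w in leaves]
    heapq.heapify(worklist)

    while len(worklist) > 1:
        # Take the two lowest-weight nodes and combine them in a subtree.
        w1, _, e1 = heapq.heappop(worklist)
        w2, _, e2 = heapq.heappop(worklist)
        # The new interior node weighs the sum of its leaves.
        heapq.heappush(worklist, (w1 + w2, next(tie), (op, e1, e2)))

    weight, _, tree = worklist[0]
    return tree, weight


# Two constants, four ordinary leaves, and a preserved node M with 4 leaves.
example = [("2", 0), ("3", 0), ("a", 1), ("b", 1), ("c", 1), ("d", 1), ("M", 4)]
tree, weight = huffman_balance(example)
```

In this example the two constants have weight 0, so they are combined first and end up grouped in one subtree, while the heavy preserved node `M` is combined last and therefore sits directly under the root. That matches the earlier observation that preserved nodes with many leaves can be closer to the root.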
Results • Mixed • Improves a few programs by a lot, but does not improve many programs on the TRIPS simulator • Huffman minimizes the sum of the node weights in the tree (the weighted path length) • Baer and Bovet minimize the length of the critical path • In practice, they often attain the same result for expression reduction • For software fanout trees, Huffman seems to tolerate unknown latencies through the program better than Hartley and Casavant, which minimizes the length of the critical path given non-unit weights
Summary • Reorganize trees of commutative and associative operations. • Use Huffman coding to produce an overall balanced tree. • Increase ILP • Decrease critical path length • Group constants
Next Time • P. Briggs, Register Allocation via Graph Coloring, PhD dissertation, Rice University, April 1992, Chapters 1, 2, 3, 6, 7, 8 & 9 • Skim and/or cherry pick depending on your interests