Program Optimization

Program Optimization CSSE 514 - Programming Methods 4/10/01

Program Optimization
• Overview
• When to optimize
• Design levels
• Methodology
• Tradeoffs
• Errors and pitfalls
• Common sources of inefficiency
• Example: code tuning
  • Initial program
  • Logical optimizations
  • Removing common subexpressions
  • Strength reduction
  • Data structure transforms
  • Code motion
  • Lazy evaluation
• Summary
Reference: Jon Bentley, Writing Efficient Programs, Prentice-Hall, 1982

Overview
• Goal is to optimize (minimize)
  • Time
    • Runtime
    • Response time
  • Space
    • Secondary storage
    • Main memory
• Optimization strategies
  • Cost measured in effort, time, risk
  • Tradeoffs often involved
    • Space vs. time
    • Increased complexity
• Optimization difficult because
  • Many strategies, tradeoffs
  • No general algorithm
• Focus -- must decide
  • What to optimize -- space, time
  • Where to optimize

When to Optimize
• When the code doesn't give adequate time/space performance:
  • Early days of computing
  • Early days of desktop computers
  • Specialized computers (military, space)
  • Ubiquitous computing (as tiny computers are embedded in credit cards and other objects)
• When complexity can be reduced
  • If performance is not an issue, aim for clarity, simplicity, maintainability
  • Review your first-cut code to see if it can be transformed into something simpler
• Note: Solving the Year 2000 Problem can be viewed as a kind of optimization
  • Uses same strategy as code tuning: semantics-preserving source-to-source transformations
  • Uses same tools as automatic code generation and code optimization (e.g. Refine)

Design Levels
• Six Design Levels (Bentley)
  • Level 1: Program design (System structure) -- decomposition into modules
  • Level 2: Module and routine design (Intramodular structure) -- choice of data structures and algorithms
  • Level 3: Code tuning (Writing efficient code) -- source-to-source transformations
  • Level 4: Code compilation (Translation into machine code) -- compiler may outperform human
  • Level 5: Operating system interaction (System software) -- changing, tuning, bypassing operating system (OS) or database system (DBMS)
  • Level 6: Hardware -- modify or purchase: microcode, faster CPU, DB machine, array processor, floating point hardware, ASICs (Application Specific Integrated Circuits)
• Strategy -- optimize where change is possible and payoff is highest
• Note: a 10-fold (or greater) gain may be possible at each level; across six levels that compounds to a millionfold improvement

Methodology
• Design and implement for correctness and clarity
  • Include useful documentation
  • Use modularity for maintenance
• Identify performance goals, and factor these down to individual modules
• If performance unacceptable then
  • Monitor program to see where bottlenecks (time, space) are -- instrument code or apply tools
  • Revise data structures and algorithms in critical modules (level 2)
  • Consider redesign (level 1)
  • Consider solution by purchase (optimizing compiler, faster DBMS or OS, faster hardware)
• If performance still not OK then
  • Apply source-to-source transformations (level 3)
  • Retain original code as documentation
• If still not OK then
  • Try lower level designs:
    • Recode critical modules in assembly language
    • Modify/tune/bypass OS or DBMS
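The "instrument code" step can be as light as wrapping a suspect routine in a timer before reaching for a profiler. A minimal sketch, assuming Python; the names timed and build_table are hypothetical and do not come from the slides:

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def build_table(n):
    # Hypothetical stand-in for a module suspected of being a bottleneck.
    return [i * i for i in range(n)]

# Usage: table, seconds = timed(build_table, 100_000)
```

Timing individual modules this way gives the per-module numbers needed to compare against the performance goals factored down earlier.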

Tradeoffs
• Correctness and clarity go together
  • Clear => well-structured, simple
  • Clear => provable, maintainable
• Top-level strategies
  • Minimize interfaces, data flows
  • Define cohesive modules
  • Avoid overspecification (is suboptimal OK?)
  • 3-way gains possible
• Lower level strategies
  • Gains in one direction often offset by losses in other directions
  • Most optimization strategies increase complexity
  • Use correctness-preserving transforms (risk of errors due to lack of mechanical support for transformations)
  • Family of programs may be preferred approach (leave the tradeoffs to the user)

Errors and Pitfalls
• Programmers often make false assumptions:
  • Fewer lines means less code space or execution time (false!)
  • Certain operations are faster or smaller than others
    • May be dependent on language, compiler, or machine
  • Code tuning always pays off (false!)
    • Compiler may do it for you
    • Worse, your code tuning may defeat the compiler's more efficient optimizations -- you end up losing both clarity and efficiency
• A pitfall: optimizing performance as you go
  • May introduce unneeded complexity
  • Micro-optimizations may cause global optimizations to be overlooked or made impossible
  • Detracts from other goals: correctness, readability, programmer productivity
  • May have little effect in final program: better to wait until program is complete

Common Sources of Inefficiency
• I/O: keep small files in memory, avoid using intermediate files
• Formatted printing routines: may make code bigger and slower
• Floating-point operations: when floating point is implemented in software, one statement pulls in a whole library
• Paging: minimize page faults by
  • Avoiding scattered access to memory
  • Keeping control flow in small regions

Example: Code Tuning
• Traveling salesman problem
  • Input -- n points on a plane
  • Output -- a minimal-length tour -- visit each point once
• Level 2 analysis
  • Optimal, but infeasible algorithm examines all possible tours -- O(n!)
  • Feasible (but suboptimal) heuristic -- start anywhere and repeatedly advance to nearest unvisited point -- O(n^2)
  • No feasible and optimal solution is known -- choose O(n^2) solution and settle for usually near minimal results

Initial Program
• Data
• Algorithm
    begin
      for i in 1..n loop                [initialize visited]
        visited(i) := False
      end
      thisPt := n                       [start at n]
      visited(n) := True
      for i in 2..n loop
        [set closePt = nearest unvisited point to thisPt]
        [output thisPt "->" closePt]
        thisPt := closePt               [advance]
        visited(closePt) := True
      end
    end
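For readers who want to run the starting point, here is a minimal Python sketch of the same greedy algorithm, with the bracketed "nearest unvisited point" step filled in by a plain linear scan. It assumes points are (x, y) tuples and uses 0-based indices; the function names are illustrative, not the slides' own.

```python
import math

def dist(points, i, j):
    """Euclidean distance between points i and j."""
    (x1, y1), (x2, y2) = points[i], points[j]
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

def nearest_neighbor_tour(points):
    """Return the visiting order (list of indices) chosen by the greedy heuristic."""
    n = len(points)
    if n == 0:
        return []
    visited = [False] * n            # initialize visited
    this_pt = n - 1                  # start at the last point, as the slide does
    visited[this_pt] = True
    tour = [this_pt]
    for _ in range(n - 1):
        # set close_pt = nearest unvisited point to this_pt
        close_dist = math.inf
        close_pt = -1
        for j in range(n):
            if not visited[j] and dist(points, this_pt, j) < close_dist:
                close_dist = dist(points, this_pt, j)
                close_pt = j
        tour.append(close_pt)        # record the move this_pt -> close_pt
        this_pt = close_pt           # advance
        visited[close_pt] = True
    return tour
```

The two nested loops over n points are where the O(n^2) cost comes from, which is why the later transforms concentrate on the inner loop.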

Logical Optimizations
• Code for inner loop (optimize performance)
    [set closePt = nearest unvisited point]
    closeDist := maxreal
    for j in 1..n loop
      if not visited(j) and dist(thisPt, j) < closeDist then
        closeDist := dist(thisPt, j)
        closePt := j
      end
    end
• Transform 1: not visited(j) => unvisited(j)
  • Effort: change declaration, 3 statements
  • Time: reduced slightly (potentially significant)
  • Space, complexity: reduced slightly
• Transform 2: if a and b ... =>
    if a and then b ...          [Ada]
    if (a) { if (b) ... }        [Java]
  • Effort: small, local
  • Time: much reduced when t(a) << t(b) and a is often false
  • Space, complexity: small increase
• Transforms 1 & 2 are independent -- they can be applied in either order
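The payoff of Transform 2 can be seen by counting how often the expensive test runs. The sketch below is an illustration under stated assumptions: Python's and always short-circuits, so the effect is shown by swapping operand order rather than by switching operators, and dist here is a hypothetical one-dimensional stand-in that only counts its calls.

```python
calls = {"dist": 0}

def dist(a, b):
    """Expensive test (stand-in): counts how often it is evaluated."""
    calls["dist"] += 1
    return abs(a - b)

def count_dist_calls(visited, cheap_first):
    """Scan all candidates once from point 0; return how many times dist() ran."""
    calls["dist"] = 0
    close_dist = float("inf")
    for j, seen in enumerate(visited):
        if cheap_first:
            # Cheap flag test first: dist() is skipped for visited points.
            hit = (not seen) and dist(0, j) < close_dist
        else:
            # Expensive test first: dist() runs for every point.
            hit = dist(0, j) < close_dist and (not seen)
        if hit:
            close_dist = dist(0, j)   # recomputed, as in the slide's code
    return calls["dist"]
```

With eight of ten points already visited, putting the flag test first cuts the number of dist evaluations from 11 to 3 in this setup, matching the slide's condition that a is cheap and often false.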

Removing Common Subexpressions
• Result of transforms 1 & 2 on inner loop
    if unvisited(j) then
      if dist(thisPt, j) < closeDist then
        closeDist := dist(thisPt, j)
        closePt := j
      end
    end
• Transform 3: p(e) => v := e; p(v)
  • Rule:
      if expression e appears more than once
         and has no side effects
         and its variables are not altered between evaluations
      then
        compute v := e               [v is mnemonic variable]
        replace e in p by v
      end
  • Effort: may be hard to check for side effects
  • Time: decreases (small in this case)
  • Space: less code, extra word of data
  • Complexity: decreases
• Good compiler may eliminate some common subexpressions
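A small Python sketch of Transform 3 follows. The CountingDist class is a hypothetical helper, introduced only so the number of distance evaluations before and after the transform can be compared; it is not part of the slides.

```python
import math

class CountingDist:
    """Distance function that records how many times it is called."""
    def __init__(self, points):
        self.points = points
        self.calls = 0

    def __call__(self, i, j):
        self.calls += 1
        (x1, y1), (x2, y2) = self.points[i], self.points[j]
        return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

def nearest_before(dist, this_pt, candidates):
    """Before: dist(this_pt, j) appears in the test and again in the update."""
    close_dist, close_pt = math.inf, None
    for j in candidates:
        if dist(this_pt, j) < close_dist:
            close_dist = dist(this_pt, j)   # same expression evaluated again
            close_pt = j
    return close_pt

def nearest_after(dist, this_pt, candidates):
    """After: evaluate once into this_dist (v := e), then reuse it."""
    close_dist, close_pt = math.inf, None
    for j in candidates:
        this_dist = dist(this_pt, j)
        if this_dist < close_dist:
            close_dist = this_dist
            close_pt = j
    return close_pt
```

The saving is largest when the test succeeds often; when most candidates fail the test, the second evaluation rarely happens and the gain is small, as the slide notes for this case.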

Strength Reduction
• Result of transform 3 with e = dist(thisPt, j)
    if unvisited(j) then
      thisDist := dist(thisPt, j)
      if thisDist < closeDist then
        closeDist := thisDist
        closePt := j
      end
    end
• Transform 4: replace dist(i, j) = sqrt(e) by distSqrd(i, j) = e, where
    e = sqr(ptArr(i).X - ptArr(j).X) + sqr(ptArr(i).Y - ptArr(j).Y)
  • Rule: exploit algebraic identities, in this case x^2 < y^2 <=> x < y for x, y > 0
  • Effort: minimal
  • Time: greatly decreased
  • Space: slight decrease
  • Complexity: little change
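Because distances are only compared, never reported, the square root can be dropped without changing which point wins. A minimal Python sketch, assuming points are (x, y) tuples; the helper nearest is illustrative:

```python
import math

def dist(p, q):
    """Euclidean distance (uses a square root)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def dist_sqrd(p, q):
    """Squared distance: same ordering as dist, no square root."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest(points, origin, metric):
    """Index of the point that minimizes metric(origin, point)."""
    return min(range(len(points)), key=lambda j: metric(origin, points[j]))
```

Both metrics select the same nearest point; only the stored closeDist changes meaning, since it now holds a squared distance.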

Data Structure Transforms
• Transform 5: replace bitmap unvisited with integer array unvisited, where
    unvisited(1..highPt) = unvisited points and
    unvisited(highPt + 1..n) = visited points in order
  • Effort: large rewrite of many lines
  • Time: small decrease (halve inner loop cycles, remove if unvisited(j))
  • Space: large increase (maxPts words)
  • Complexity: increase but clear loop invariant -- introduces a level of indirection
• New code on the left; the bitmap code it replaces on the right
    begin
      for i in 1..n loop
        unvisited(i) := i                 <=  unvisited(i) := True
      end
      thisPt := unvisited(n)              <=  thisPt := n
      highPt := n-1                       <=  unvisited(n) := False
      while highPt > 0 loop               <=  for j in 2..n loop
        closeDist := maxreal
        for i in 1..highPt loop           <=  for j in 1..n loop
          [compare distance]              <=  if unvisited(j) then [compare distance]
        end
        thisPt := unvisited(closePt)      <=  thisPt := closePt
        swapUnvisited(closePt, highPt)
        highPt := highPt-1                <=  unvisited(closePt) := False
      end
    end
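The index-array version can be sketched in Python as follows. This is an illustration under assumptions: indices are 0-based, so the still-unvisited points occupy unvisited[0:high_pt], and the swap is written inline rather than as a separate swapUnvisited routine.

```python
def nearest_neighbor_tour(points):
    """Greedy tour using an index array partitioned into unvisited | visited."""
    n = len(points)
    if n == 0:
        return []
    unvisited = list(range(n))
    this_pt = unvisited[n - 1]       # start at the last point
    high_pt = n - 1                  # unvisited[0:high_pt] are still unvisited
    tour = [this_pt]
    while high_pt > 0:
        tx, ty = points[this_pt]
        close_dist = float("inf")
        close_idx = -1
        for i in range(high_pt):     # scans only unvisited points, no flag test
            px, py = points[unvisited[i]]
            d = (px - tx) ** 2 + (py - ty) ** 2
            if d < close_dist:
                close_dist = d
                close_idx = i
        this_pt = unvisited[close_idx]
        tour.append(this_pt)
        # Move the chosen point to the end of the unvisited region, then shrink it.
        last = high_pt - 1
        unvisited[close_idx], unvisited[last] = unvisited[last], unvisited[close_idx]
        high_pt -= 1
    return tour
```

The inner loop shrinks by one each round, so it performs roughly half as many iterations overall as a scan of all n points, which is the "halve inner loop cycles" gain listed above.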

Code Motion
• Transform 6: distSqrd(...) => inline code
  • Effort: small to large
  • Time: saves call overhead (small)
  • Space: usually increases (here it decreases)
  • Complexity: increases (less readable)
• Transform 7: move invariant expressions outside loops
  • Effort: small (but must verify invariance)
  • Time: decrease
  • Space: slight increase (none if whole statements are moved)
  • Complexity: little change (may reduce locality)
• Result of transforms 6 & 7
    thisX := ptArr(thisPt).X
    thisY := ptArr(thisPt).Y
    closeDist := maxreal
    for i in 1..highPt loop
      thisDist := sqr(ptArr(unvisited(i)).X - thisX) + sqr(ptArr(unvisited(i)).Y - thisY)
      if thisDist < closeDist then
        closePt := i
        closeDist := thisDist
      end
    end
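A Python rendering of the loop after Transforms 6 and 7 might look like the sketch below. It assumes 0-based indices and (x, y) tuples in pt_arr; the function name and return shape are illustrative.

```python
def nearest_unvisited(pt_arr, unvisited, high_pt, this_pt):
    """Return (position in unvisited, squared distance) of the nearest unvisited point."""
    # Loop-invariant lookups hoisted out of the loop (Transform 7).
    this_x, this_y = pt_arr[this_pt]
    close_dist = float("inf")
    close_pt = -1
    for i in range(high_pt):
        x, y = pt_arr[unvisited[i]]
        # Squared distance written inline instead of calling a function (Transform 6).
        this_dist = (x - this_x) ** 2 + (y - this_y) ** 2
        if this_dist < close_dist:
            close_pt = i
            close_dist = this_dist
    return close_pt, close_dist
```

Hoisting is only possible once the distance computation is inline, since the lookups of the current point's coordinates were previously hidden inside the called routine.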

Lazy Evaluation
• Transform 8: compute xDist first, yDist if needed
  • Effort: small
  • Time: large decrease
  • Space: slight increase
  • Complexity: increase
• Before
    thisDist := sqr(ptArr(unvisited(i)).X - thisX) + sqr(ptArr(unvisited(i)).Y - thisY)
    if thisDist < closeDist then ...
• After
    thisDist := sqr(...X - thisX)
    if thisDist < closeDist then                      [*]
      thisDist := thisDist + sqr(...Y - thisY)
      if thisDist < closeDist then ...
  * if xDist >= closeDist we avoid computing yDist (yDist is never negative, so the sum could not be below closeDist)
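The same idea in Python, as a sketch under the same assumptions as before (0-based indices, (x, y) tuples). The y_terms counter is a hypothetical addition, included only to make the number of avoided computations visible.

```python
def nearest_unvisited_lazy(pt_arr, unvisited, high_pt, this_pt):
    """Return (position, squared distance, number of y terms actually computed)."""
    this_x, this_y = pt_arr[this_pt]
    close_dist = float("inf")
    close_pt = -1
    y_terms = 0
    for i in range(high_pt):
        x, y = pt_arr[unvisited[i]]
        this_dist = (x - this_x) ** 2            # x term only
        if this_dist < close_dist:               # otherwise the y term cannot help
            y_terms += 1
            this_dist += (y - this_y) ** 2
            if this_dist < close_dist:
                close_pt = i
                close_dist = this_dist
    return close_pt, close_dist, y_terms
```

Once a close candidate has been found, most remaining points fail the x-only test, so the y term is computed for only a small fraction of them.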

Summary

    Transform                                          Time/n^2   Savings
    T1: remove not visited
    T2: short-circuit and                                47.0
    T3: remove common subexpression                      45.6       1.4
  * T4: remove square root                               24.2      21.4
    T5: convert boolean array to pointer array           21.2       3.0
  * T6: put proc in line (helped by T3 & T4)
  * T7: move code from loop (T7 requires T6)             14.0       7.2
  * T8: delay computing yDist (T8 requires T6)            8.2       5.8

  * Best transforms
• Note: Choice of transform analogous to choice of chess move -- one transform makes other transforms possible
