Transitioning from Algorithms to Software

Thomas Kue, Southern Arkansas University
Dr. Ernst Leiss, University of Houston
REU Summer 2011

Outline
• VMM and the Memory Hierarchy Problem
• The First Experiment
• The Improved Algorithm
• Results
• Retesting the Algorithm
• Revisiting Example: Adding Matrices
• A Basic Algorithm
• Adding Matrices: Summary
• Providing Implementation
• Results
• Conclusion

VMM and the Memory Hierarchy Problem
• Scientific computing often requires massive data sets
• Virtual Memory Manager (VMM): divides a program into ‘pages’
• Out-of-core program: a program that transfers data between main memory and hard disk while it runs
• The goal: reduce the frequency of data transfers to and from the hard disk

The First Experiment
• We start with Algorithm 1.
• Premise:
  • M is an n×n matrix, initially zero, that is larger than main memory.
  • A data item is a triple [i,j,x], where i and j (1≤i,j≤n) are row and column indices and x is a real value to be added to M[i,j].
  • Each data item is randomly assigned.
  • Because of this randomness, locality is poor.
• Algorithm analysis shows that the number of page swaps for Algorithm 1 is 9m/10, where m is the number of data items [Leiss, 2007].

Algorithm 1:
    while more input do {
        read a triple [i,j,x];
        M[i,j] := M[i,j] + x;
    }
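Algorithm 1 can be sketched in C as follows. This is a minimal illustration, not the code used in the experiment: it assumes the triples are already available as an in-memory array instead of an input stream, that M is stored as a flat row-major array, and that indices are 1-based as on the slide. The names `Triple` and `apply_triples` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One data item [i,j,x]; i and j are 1-based, as on the slide. */
typedef struct {
    size_t i, j;
    double x;
} Triple;

/* Algorithm 1: add every triple into the n-by-n matrix M.
   M is assumed to be a flat, row-major array of n*n doubles.
   Consecutive triples touch unrelated parts of M, so when M does
   not fit in main memory an update may land on a page that is
   not currently resident. */
void apply_triples(double *M, size_t n, const Triple *items, size_t m) {
    for (size_t k = 0; k < m; k++) {
        size_t i = items[k].i;
        size_t j = items[k].j;
        assert(i >= 1 && i <= n && j >= 1 && j <= n);
        M[(i - 1) * n + (j - 1)] += items[k].x;
    }
}
```

A caller would zero-initialize M, fill an array of `Triple` values, and call `apply_triples(M, n, items, m)` once.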

An Improved Algorithm
• An improved algorithm, Algorithm 1’, is proposed.
• M is divided into 10 even sections M1, …, M10 (each section is size n/10).
• A subsequence St holds all data items that fall in section Mt, for t = 2, 3, …, 10.
• Algorithm analysis shows that the number of block transfers is 9m/(5B), where B is the block size [Leiss, 2007].

Algorithm 1’:
    allocate M1 in the available main memory and initialize it to 0;
    set the sequence St to empty, for all t=2,…,10;
    while more input do {
        read a triple [i,j,x];
        if [i,j] is in M1 then
            M[i,j] := M[i,j] + x;
        else {
            determine t such that [i,j] is in Mt;
            append [i,j,x] to the sequence St;
        }
    }
    for t:=2 to 10 do {
        write Mt-1 to disk;
        allocate Mt in the available main memory and initialize it to 0;
        while more input in St do {
            read a triple [i,j,x] from St;
            M[i,j] := M[i,j] + x;
        }
    }
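The idea behind Algorithm 1’ (group the data items by section, then process one section at a time) can be sketched in C as below. Several things here are assumptions made for illustration and are not taken from the slides: the sections are taken to be contiguous bands of rows, the subsequences are held in one in-memory array rather than on disk, items for the first section are bucketed like all others instead of being applied immediately, and the names `section_of` and `apply_triples_sectioned` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t i, j;   /* 1-based row and column, as on the slide */
    double x;
} Triple;

/* 0-based section that holds 1-based row i when the n rows are cut into
   `sections` bands of ceil(n / sections) rows (an assumed layout). */
static size_t section_of(size_t i, size_t n, size_t sections) {
    size_t rows_per_section = (n + sections - 1) / sections;
    return (i - 1) / rows_per_section;
}

/* Returns 0 on success, -1 if the bucket storage cannot be allocated. */
int apply_triples_sectioned(double *M, size_t n, const Triple *items,
                            size_t m, size_t sections) {
    assert(n >= 1 && sections >= 1);
    size_t *start = calloc(sections + 1, sizeof *start);
    size_t *fill = malloc(sections * sizeof *fill);
    Triple *buckets = malloc((m > 0 ? m : 1) * sizeof *buckets);
    if (!start || !fill || !buckets) {
        free(start); free(fill); free(buckets);
        return -1;
    }

    /* Pass 1: count the items that fall in each section. */
    for (size_t k = 0; k < m; k++) {
        assert(items[k].i >= 1 && items[k].i <= n);
        assert(items[k].j >= 1 && items[k].j <= n);
        start[section_of(items[k].i, n, sections) + 1]++;
    }
    /* Prefix sums: start[t] is where subsequence t begins. */
    for (size_t t = 0; t < sections; t++) {
        start[t + 1] += start[t];
        fill[t] = start[t];
    }

    /* Pass 2: append each item to its subsequence, keeping input order. */
    for (size_t k = 0; k < m; k++) {
        size_t t = section_of(items[k].i, n, sections);
        buckets[fill[t]++] = items[k];
    }

    /* Pass 3: process one section at a time, so all updates made
       while section t is active touch only that band of M. */
    for (size_t t = 0; t < sections; t++)
        for (size_t k = start[t]; k < start[t + 1]; k++)
            M[(buckets[k].i - 1) * n + (buckets[k].j - 1)] += buckets[k].x;

    free(start); free(fill); free(buckets);
    return 0;
}
```

Calling `apply_triples_sectioned(M, n, items, m, 10)` mirrors the 10 sections on the slide. In an actual out-of-core setting the subsequences would presumably be written to and read back from disk, which this sketch does not attempt.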

Results
• The programs were written in C++ and compiled with GCC. Table 1 shows the execution times of Algorithm 1 and Algorithm 1’.
• The program for Algorithm 1’ crashed for n≥1600 due to excessive memory use.
• Translating the C++ code into both Java and C yielded similar results.
• Because n was small enough that the program ran in-core and the VMM was never invoked, we could neither confirm nor refute the claimed improvement of Algorithm 1’.
• The data from this experiment could not be used.

Algorithm 1 Retested
• After the previous experiment failed to run properly, we set out to show the real problem of the original algorithm when transitioning from algorithms to software.
• We set m = 16000 × 16000 × 100 (2.56 × 10^10 data items).
• With m held constant, every experiment run processes the same number of data items and should produce similar timings.
• Seemingly good algorithms may not yield implementations as efficient as anticipated.
• But do the proposed improved algorithms actually show improvements in implementation?

Revisiting Example: Adding Matrices
• We have two matrices, A and B, each of size n × n
• n is so large that a matrix cannot fit within main memory
• The VMM is invoked and paging occurs

A Basic Algorithm
    for i := 1 to n do
        for j := 1 to n do
            C[i,j] = A[i,j] + B[i,j]

Transitioning Into Software
• Because memory is linear, these 2-dimensional matrices must be mapped into the 1-dimensional memory
• Row Major Mapping
• Column Major Mapping

Matrix:
    A11 A12 … A1n
    A21 A22 … A2n
     .   .  …  .
    An1 An2 … Ann

Memory (row major):    A11 A12 … A1n …
Memory (column major): A11 A21 … An1 …
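The two mappings can be written as offset formulas. The C sketch below is only an illustration of those formulas, assuming 1-based indices as on the slides and an n-by-n matrix; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i,j) when the matrix is stored row by row:
   A11 A12 ... A1n A21 ... */
size_t row_major_offset(size_t i, size_t j, size_t n) {
    assert(i >= 1 && i <= n && j >= 1 && j <= n);
    return (i - 1) * n + (j - 1);
}

/* Offset of element (i,j) when the matrix is stored column by column:
   A11 A21 ... An1 A12 ... */
size_t col_major_offset(size_t i, size_t j, size_t n) {
    assert(i >= 1 && i <= n && j >= 1 && j <= n);
    return (j - 1) * n + (i - 1);
}
```

For n = 4, element A12 sits at offset 1 under row-major mapping but at offset 4 under column-major mapping, which is why the order of the loops matters once pages come into play.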

Problems Transitioning Into Software
• Assume column major mapping
• Assume one column = one page
• Assume memory can hold three pages (one from each matrix)

Our basic algorithm:
    for i := 1 to n do
        for j := 1 to n do
            C[i,j] = A[i,j] + B[i,j]

Matrix:
    A11 A12 … A1n
    A21 A22 … A2n
     .   .  …  .
    An1 An2 … Ann

Memory: A11 A21 … An1 A12 A22 … An2 …

• Total # of page swaps: 3n²

Problems Transitioning into Software
• The interaction between the algorithm and the VMM plays an important role in software performance

Modifying the algorithm:
    for j := 1 to n do
        for i := 1 to n do
            C[i,j] = A[i,j] + B[i,j]

Matrix:
    A11 A12 … A1n
    A21 A22 … A2n
     .   .  …  .
    An1 An2 … Ann

Memory: A11 A21 … An1 …

• Total # of page swaps: 3n
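The two page-swap totals can be checked with a small counting model. The sketch below is a simulation of the stated assumptions only (column major mapping, one column per page, one resident page per matrix); it does not measure a real VMM, and the function name and the `column_order` flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counts page loads for C[i,j] = A[i,j] + B[i,j] under the slide's model:
   column-major storage, one column = one page, and room for exactly one
   page of each of A, B and C. Because the three matrices are accessed in
   lockstep, a single "resident column" number describes all three.
   column_order == 0: basic algorithm (i outer, j inner).
   column_order != 0: modified algorithm (j outer, i inner). */
unsigned long count_page_swaps(size_t n, int column_order) {
    size_t resident = 0;          /* 0 means no page loaded yet */
    unsigned long swaps = 0;
    for (size_t outer = 1; outer <= n; outer++) {
        for (size_t inner = 1; inner <= n; inner++) {
            /* Column touched by this access. */
            size_t j = column_order ? outer : inner;
            if (j != resident) {
                swaps += 3;       /* one page each for A, B and C */
                resident = j;
            }
        }
    }
    return swaps;
}
```

Under this model the basic loop order yields 3n² loads and the interchanged order yields 3n, matching the totals on the two slides.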

Adding Matrices: Summary
• Using the first algorithm produces bad software with I/O complexity 3n²
• Using the second algorithm produces good software with I/O complexity 3n, a factor of n fewer page swaps than the first
• Achieving the goal:
  • Restructuring the program to reduce disk I/O

Our basic algorithm:
    for i := 1 to n do
        for j := 1 to n do
            C[i,j] = A[i,j] + B[i,j]

Modifying the algorithm:
    for j := 1 to n do
        for i := 1 to n do
            C[i,j] = A[i,j] + B[i,j]

Providing Implementation
• Both algorithms were implemented in C and compiled with GCC
• We refer to the column-traversing algorithm as Algorithm 2, and the row-traversing algorithm as Algorithm 2’.
• The C language uses row-major mapping, so the slower algorithm will be the one that traverses the matrix by columns (i.e. Algorithm 2’ will perform better than Algorithm 2)

Algorithm 2’:
    for i := 1 to n do
        for j := 1 to n do
            C[i,j] = A[i,j] + B[i,j]

Algorithm 2:
    for j := 1 to n do
        for i := 1 to n do
            C[i,j] = A[i,j] + B[i,j]
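A minimal C sketch of the two loop orders is shown below. It is not the code from the experiment: it assumes each matrix is a flat array of n*n doubles indexed in row-major fashion with 0-based indices, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Algorithm 2' (row traversal): the inner loop walks along a row, so
   consecutive accesses are adjacent in memory for a row-major array. */
void add_row_order(const double *A, const double *B, double *C, size_t n) {
    assert(A && B && C);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = A[i * n + j] + B[i * n + j];
}

/* Algorithm 2 (column traversal): the inner loop walks down a column, so
   consecutive accesses are n elements apart in a row-major array. */
void add_column_order(const double *A, const double *B, double *C, size_t n) {
    assert(A && B && C);
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            C[i * n + j] = A[i * n + j] + B[i * n + j];
}
```

Both functions compute the same C; only the order of memory accesses differs. One way to compare them would be to wrap each call with `clock()` from `<time.h>`, with n chosen large enough that the three arrays exceed main memory.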

Adding Matrices: Results
• The results show that Algorithm 2’ provides better performance for all n≥500.
• For Algorithm 2, the execution time does not grow linearly for any n (i.e. it does not perform as it should).
• Applying a loop interchange improved the performance of Algorithm 2 by a factor of 10 for n=16000.
• The results show that a good algorithm can produce poorly performing out-of-core programs.
• The results confirm that our improved algorithm also performs well in execution.

Conclusion
• Algorithms can produce poorly performing out-of-core programs.
• The performance of out-of-core programs can be improved via loop transformations using information from algorithm and dependence analysis.
• The intent is to use this information to develop a tool that automatically applies the methods discussed in this project to improve the performance of out-of-core applications.

References
• Leiss, E. (2007). A Programmer’s Companion to Algorithm Analysis. Boca Raton, FL: Chapman & Hall/CRC.
