
An analytical model for ATLAS

This presentation describes an analytical model for choosing the tile size (NB) of the matrix-multiplication kernel in the ATLAS library. It builds models of increasing complexity, starting from the requirement that the entire working set fit in L1 cache. It details the memory needed by each iteration of the nested loops and discusses the effects of a fully associative cache, optimal versus LRU replacement, and line size. The goal is to minimize data movement and maximize computational efficiency, offering insights for improving matrix multiplication algorithms.

elia

Presentation Transcript


  1. An analytical model for ATLAS

  2. Tile kernel: compute one NB × NB tile of C from a tile of A and a tile of B

     matmul(c[i:i+NB-1, j:j+NB-1], a[i:i+NB-1, k:k+NB-1], b[k:k+NB-1, j:j+NB-1])

     for (int j = 0; j < NB; j++)
         for (int i = 0; i < NB; i++)
             for (int k = 0; k < NB; k++)
                 c[i][j] += a[i][k] * b[k][j];
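The slide's kernel multiplies single tiles. A minimal sketch of how such a kernel might be driven over a whole matrix, assuming square N × N row-major C arrays with N a multiple of NB; N, matmul_tile, matmul_tiled and matmul_naive are hypothetical names chosen for illustration, not ATLAS's API:

```c
#include <assert.h>

/* Hypothetical sizes for illustration only (not ATLAS parameters):
   N is the full matrix order, NB the tile size. */
#define N  8
#define NB 4

/* Tile kernel in the slide's j-i-k loop order:
   C tile at (i0, j0) += A tile at (i0, k0) * B tile at (k0, j0). */
void matmul_tile(double c[N][N], double a[N][N], double b[N][N],
                 int i0, int j0, int k0) {
    for (int j = 0; j < NB; j++)
        for (int i = 0; i < NB; i++)
            for (int k = 0; k < NB; k++)
                c[i0 + i][j0 + j] += a[i0 + i][k0 + k] * b[k0 + k][j0 + j];
}

/* Hypothetical driver: visits every (C tile, A tile, B tile) triple once.
   Assumes N is a multiple of NB. */
void matmul_tiled(double c[N][N], double a[N][N], double b[N][N]) {
    assert(N % NB == 0);
    for (int j0 = 0; j0 < N; j0 += NB)
        for (int i0 = 0; i0 < N; i0 += NB)
            for (int k0 = 0; k0 < N; k0 += NB)
                matmul_tile(c, a, b, i0, j0, k0);
}

/* Untiled reference, used only to check the tiled version. */
void matmul_naive(double c[N][N], double a[N][N], double b[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}
```

Each kernel call touches only three NB × NB tiles, which is why the cache models that follow are stated purely in terms of NB.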

  3. Modeling for Tile Size (NB)
     • Models of increasing complexity, where C is the L1 capacity in words:
       • $3\,NB^2 \le C$: the whole working set fits in L1
       • $NB^2 + NB + 1 \le C$: fully associative cache, optimal replacement, line size of 1 word
       • Refinements: line size > 1 word, or LRU replacement
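The two inequalities translate directly into tile sizes. A minimal sketch, assuming C is given in words (elements) and NB is taken as the largest integer satisfying each bound; the function names are hypothetical:

```c
#include <assert.h>

/* Model 1 (hypothetical helper): largest NB with 3*NB^2 <= C,
   i.e. all three NB x NB tiles (A, B, C) fit in L1 at once. */
int nb_whole_working_set(long C) {
    assert(C >= 0);
    int nb = 0;
    while (3L * (nb + 1) * (nb + 1) <= C)
        nb++;
    return nb;
}

/* Model 2 (hypothetical helper): largest NB with NB^2 + NB + 1 <= C,
   for a fully associative cache, optimal replacement, 1-word lines. */
int nb_optimal_replacement(long C) {
    assert(C >= 0);
    int nb = 0;
    while ((long)(nb + 1) * (nb + 1) + (nb + 1) + 1 <= C)
        nb++;
    return nb;
}
```

For example, an assumed 32 KB L1 holding 8-byte doubles gives C = 4096 words: the first model allows NB = 36, the second NB = 63, because keeping only A, one column of B and one element of C resident frees most of the cache for A.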

  4. Explanation for LRU Model (I)

  5. Explanation for LRU Model (II)
     Each iteration of j requires:
     - $NB^2$ elements of A
     - a column of C ($NB$ elements)
     - a column of B ($NB$ elements)
     In the middle of iteration j+1, reusing the elements of A requires holding not one but two columns of B, plus one extra element of C. Thus:
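Reading the counts above as additive (an interpretation of the slide, assuming a fully associative cache with a line size of 1 word), the LRU working set sums to:

```latex
\underbrace{NB^2}_{A}
+ \underbrace{2\,NB}_{\text{two columns of } B}
+ \underbrace{NB}_{\text{column of } C}
+ \underbrace{1}_{\text{extra element of } C}
= NB^2 + 3\,NB + 1 \le C
```

For C = 4096 words this bound allows NB = 62, slightly below the NB = 63 permitted by $NB^2 + NB + 1 \le C$ under optimal replacement.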
