An Efficient Texture Cache for Programmable Vertex Shaders

An Efficient Texture Cache for Programmable Vertex Shaders • Seunghyun Cho, Chang-Hyo Yu, and Lee-Sup Kim • ISCAS 2006, pp. 3834-3837

Outline • Introduction • Vertex texture cache design • △u and △v Test (△-test) • Same Block Test (SBT) • Implementation • Simulation results • Conclusion

Introduction(1/3) • Vertex textures are sampled in the GE, and they are usually used for the terrain and water in a scene • Unlike per-pixel textures, vertex texture accesses may be spread across different cache blocks

Introduction(2/3) • Per-pixel texturing always maps several pixels to one cache block, but vertices may map to different cache blocks

Introduction(3/3) • If only one texel in a cache block is accessed, the burst transfer is wasted • Vertex textures do not have high locality over the whole area of a scene • Because of this low locality, traditional texture caches used in the RM are not applicable to vertex textures

Vertex texture cache design • According to the estimated locality, the proposed cache operates in one of two modes • Cache mode: for high locality • A burst transfer from the external memory is used in this mode • Direct mode: for low locality • Each vertex reads its single texel directly from the external memory

△-test(1/5) • A texture block that is worth caching should satisfy the inequality Cdirect × Nacc ≧ Cdirect + Nblock − 1 • Cdirect: the number of cycles required to read a texel from external memory • Nacc: the number of accessed texels in a block • Nblock: the number of texels in a block

△-test(2/5) • The minimum Nacc is calculated as the basis for comparison • The minimum Nacc that satisfies the previous inequality is determined by Cdirect and Nblock • If Nblock is 16 and Cdirect is 8, then 8 × Nacc ≧ 8 + 16 − 1 = 23, so the minimum Nacc is 3
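The break-even arithmetic above can be checked with a short sketch. The function names are hypothetical, and reading Cdirect + Nblock − 1 as the cycle cost of one burst fill is an interpretation of the inequality, not something the slides state outright.

```python
def burst_cost(c_direct: int, n_block: int) -> int:
    """Right-hand side of the inequality: Cdirect + Nblock - 1.

    ASSUMPTION: read here as the cycles needed to fill one block by burst.
    """
    return c_direct + n_block - 1


def worth_caching(c_direct: int, n_acc: int, n_block: int) -> bool:
    """True when Cdirect * Nacc >= Cdirect + Nblock - 1."""
    return c_direct * n_acc >= burst_cost(c_direct, n_block)


def min_n_acc(c_direct: int, n_block: int) -> int:
    """Smallest integer Nacc that satisfies the inequality (integer ceiling)."""
    cost = burst_cost(c_direct, n_block)
    return (cost + c_direct - 1) // c_direct


if __name__ == "__main__":
    # Slide example: Nblock = 16, Cdirect = 8
    print(min_n_acc(8, 16))
```

With the slide's values (Nblock = 16, Cdirect = 8), two accessed texels cost 16 cycles in direct mode, which is below the 23-cycle threshold, so caching only pays off from the third access onward.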

△-test(3/5) • The operation mode of the cache is determined by comparing Nest with the minimum Nacc • Nest: the estimated number of texels that will be accessed in a block of the proposed cache • △u and △v: the distance between two consecutively requested texel coordinates, measured along u and v

△-test(4/5) • When the current vertex is V3 and the previous vertex is V2, △u and △v between T3 and T2 are 4 and 4

△-test(5/5) • Nest = 16 / (4^2) = 1, Nacc = 3 • In this case Nest < Nacc, so direct mode is chosen • In general, if Nest ≧ Nacc holds, the operation mode is cache mode • Otherwise, the operation mode is direct mode
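A minimal sketch of the △-test decision follows. The slides only show the worked value Nest = 16 / (4^2), so the general form Nest = Nblock / (△u × △v) used here is an assumption, as are the function names and the clamping of a zero distance to 1.

```python
def estimate_n_est(n_block: int, du: int, dv: int) -> float:
    """Estimated number of accessed texels in a block.

    ASSUMPTION: Nest = Nblock / (du * dv). The slides give only the
    instance 16 / (4^2) = 1 for du = dv = 4.
    """
    du = max(abs(du), 1)  # ASSUMPTION: treat a zero step as a step of 1
    dv = max(abs(dv), 1)
    return n_block / (du * dv)


def delta_test(prev_uv, cur_uv, n_block: int = 16, c_direct: int = 8) -> str:
    """Return 'cache' if Nest >= minimum Nacc, else 'direct'."""
    du = cur_uv[0] - prev_uv[0]
    dv = cur_uv[1] - prev_uv[1]
    n_est = estimate_n_est(n_block, du, dv)
    # Minimum Nacc from Cdirect * Nacc >= Cdirect + Nblock - 1
    n_acc_min = (c_direct + n_block - 1 + c_direct - 1) // c_direct
    return "cache" if n_est >= n_acc_min else "direct"


if __name__ == "__main__":
    # Slide example: texel coordinates 4 apart in both u and v
    print(delta_test((0, 0), (4, 4)))
```

For the slide's case (△u = △v = 4, Nblock = 16, Cdirect = 8) the sketch selects direct mode, since Nest = 1 is below the minimum Nacc of 3.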

SBT(1/2) • The △-test can mispredict • When the current vertex is V3, the △-test requires the cache to operate in direct mode • But the next requests will access the same block

SBT(2/2) • SBT observes the accesses requested after the current request • If data requested after the current request lies in the same block, cache mode is chosen
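The look-ahead idea behind SBT can be sketched as below. The 4×4 block shape (matching Nblock = 16), the look-ahead window of two requests, and all function names are assumptions made for illustration; the slides do not specify them.

```python
def block_id(uv, block_w: int = 4, block_h: int = 4):
    """Block containing texel (u, v). ASSUMPTION: 4x4 texel blocks."""
    return (uv[0] // block_w, uv[1] // block_h)


def same_block_test(current_uv, pending, window: int = 2) -> bool:
    """True if any of the next `window` pending requests hits the current block.

    ASSUMPTION: the look-ahead depth `window` is a hypothetical parameter.
    """
    cur = block_id(current_uv)
    return any(block_id(uv) == cur for uv in pending[:window])


def choose_mode(delta_says_cache: bool, current_uv, pending) -> str:
    """Combine both tests: SBT can override a 'direct' verdict from the delta test."""
    if delta_says_cache or same_block_test(current_uv, pending):
        return "cache"
    return "direct"


if __name__ == "__main__":
    # The delta test said 'direct', but the next request lands in the same block
    print(choose_mode(False, (4, 4), [(5, 5), (9, 9)]))
```

In this sketch SBT only ever upgrades a direct-mode decision to cache mode, which mirrors the misprediction case described above.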

Implementation(1/2) • Three test patterns of vertex texture access are used for the simulations • The upper part of the figure shows the simulation results • The lower part shows the density of vertices

Implementation(2/2) • Pattern(a) • Vertices are evenly spread over the scene • Pattern(b) • Vertex density increases toward the origin • Pattern(c) • The density of vertices is uneven

Simulation results(1/2) • The average ACVT over different tessellations ※ ACVT (average cycles per vertex texel)
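As a rough illustration of the metric, ACVT can be computed as total cycles divided by the number of vertex texel accesses. The cost model below is an assumption: a direct access costs Cdirect, a cache-mode miss costs Cdirect + Nblock − 1, and a cache hit costs one cycle. The slides do not give the per-access costs used in the actual simulations.

```python
def acvt(accesses, c_direct: int = 8, n_block: int = 16, hit_cost: int = 1) -> float:
    """Average cycles per vertex texel for a list of access kinds.

    `accesses` holds strings: 'direct', 'miss', or 'hit'.
    ASSUMPTION: the per-access costs below are illustrative only.
    """
    cost = {
        "direct": c_direct,              # single texel read from external memory
        "miss": c_direct + n_block - 1,  # burst fill of a whole block
        "hit": hit_cost,                 # hypothetical one-cycle cache hit
    }
    total_cycles = sum(cost[kind] for kind in accesses)
    return total_cycles / len(accesses)


if __name__ == "__main__":
    # One burst fill, two hits in that block, then one direct read
    print(acvt(["miss", "hit", "hit", "direct"]))
```

Under this model a block that is filled once and then hit only twice averages more cycles per texel than direct reads alone, which is the situation the mode selection is meant to avoid.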

Simulation results(2/2) • The region above the dashed line indicates where the proposed cache need not use cache mode • Low-density vertices give the better results • With dense vertices, almost all accesses are served in cache mode

Conclusion • The proposed cache improves vertex texture loading performance by 27% for general test scenes • The hardware overhead added to the conventional cache is 9.6%
