An Efficient Texture Cache for Programmable Vertex Shaders
This paper presents a vertex texture cache design aimed at improving performance for programmable vertex shaders. Traditional texture caches are ineffective for vertex textures because vertex texture accesses have low locality. The proposed cache operates in two modes, selected according to the data access pattern, which it evaluates using the △-test and the Same Block Test (SBT). Simulation results show a 27% improvement in vertex texture loading performance, with a 9.6% hardware overhead relative to a conventional cache.
Presentation Transcript
An Efficient Texture Cache for Programmable Vertex Shaders • Seunghyun Cho, Chang-Hyo Yu, and Lee-Sup Kim • ISCAS 2006, pp. 3834-3837
Outline • Introduction • Vertex texture cache design • △u and △v Test (△-test) • Same Block Test (SBT) • Implementation • Simulation results • Conclusion
Introduction (1/3) • Vertex textures are sampled in the GE and are typically used for terrain and water in a scene • Unlike per-pixel textures, vertex texture accesses may be spread across different cache blocks
Introduction (2/3) • A per-pixel texture always maps several pixels to the same cache block, but vertices may map to different cache blocks
Introduction (3/3) • If only one texel in a cache block is accessed, a burst transfer of the whole block is wasted • Vertex textures do not have high locality over the whole area of a scene • Because of this low locality, traditional texture caches used in the RM are not applicable to vertex textures
Vertex texture cache design • The proposed cache operates in one of two modes, chosen according to the estimated locality • Cache mode: for high locality • A burst transfer from external memory is used in this mode • Direct mode: for low locality • Every vertex reads its texel directly from external memory
△-test (1/5) • A texture block is worth caching if it satisfies Cdirect × Nacc ≥ Cdirect + Nblock − 1 (left side: cycles to read the accessed texels one at a time; right side: cycles to burst-load the whole block) • Cdirect: the number of cycles required to read a texel from external memory • Nacc: the number of accessed texels in a block • Nblock: the number of texels in a block
△-test (2/5) • The minimum Nacc serves as the basis for comparison • The minimum Nacc that satisfies the previous inequality is determined by Cdirect and Nblock • If Nblock is 16 and Cdirect is 8, then 8 × Nacc ≥ 8 + 16 − 1 = 23, so the minimum Nacc is 3
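The arithmetic on this slide can be checked with a short sketch. The function names below are hypothetical and chosen only for illustration; the inequality itself is the one stated on the previous slide.

```python
def worth_caching(c_direct: int, n_block: int, n_acc: int) -> bool:
    """Check Cdirect * Nacc >= Cdirect + Nblock - 1 for one block."""
    return c_direct * n_acc >= c_direct + n_block - 1


def min_accesses_for_caching(c_direct: int, n_block: int) -> int:
    """Smallest integer Nacc that satisfies the inequality above."""
    # Ceiling division on integers: ceil(a / b) == -(-a // b)
    return -(-(c_direct + n_block - 1) // c_direct)


if __name__ == "__main__":
    # Values from the slide: Nblock = 16, Cdirect = 8
    print(min_accesses_for_caching(8, 16))  # 3
    print(worth_caching(8, 16, 2))          # False (16 < 23)
    print(worth_caching(8, 16, 3))          # True  (24 >= 23)
```

With two accessed texels, direct reads cost 16 cycles against 23 for the burst, so caching only pays off from the third access onward.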
△-test (3/5) • The operation mode of the cache is determined by the condition Nest ≥ Nacc • Nest: the estimated number of texels accessed in a block in the proposed cache • △u and △v: the distance between two consecutively requested texel coordinates along u and v
△-test (4/5) • When the current vertex is V3 and the previous vertex is V2, △u and △v between T3 and T2 are 4 and 4
△-test (5/5) • Nest = 16 / (4^2) = 1, Nacc = 3 • In this case Nest < Nacc, so direct mode is chosen • In general, if the condition Nest ≥ Nacc is satisfied, the operation mode is cache mode • Otherwise, the operation mode is direct mode
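A minimal sketch of the △-test decision follows. The transcript does not reproduce the formula for Nest; the form Nest = Nblock / (△u × △v) used here is an assumption that merely agrees with the slide's example, 16 / (4^2) = 1. Clamping a zero delta to 1, the function names, and the texel coordinates in the example are also assumptions made for illustration.

```python
def estimate_accesses(n_block: int, du: int, dv: int) -> float:
    """Assumed estimate: Nest = Nblock / (|du| * |dv|).

    Deltas of 0 are clamped to 1 (an assumption) to avoid division by zero.
    """
    du = max(abs(du), 1)
    dv = max(abs(dv), 1)
    return n_block / (du * dv)


def delta_test(prev_uv, curr_uv, n_block: int = 16, c_direct: int = 8) -> str:
    """Return "cache" if Nest >= minimum Nacc, otherwise "direct"."""
    du = curr_uv[0] - prev_uv[0]
    dv = curr_uv[1] - prev_uv[1]
    n_est = estimate_accesses(n_block, du, dv)
    # Minimum Nacc from Cdirect * Nacc >= Cdirect + Nblock - 1
    n_acc_min = -(-(c_direct + n_block - 1) // c_direct)
    return "cache" if n_est >= n_acc_min else "direct"


if __name__ == "__main__":
    # Hypothetical coordinates giving du = dv = 4, as in the slide's example
    print(delta_test((0, 0), (4, 4)))  # direct (Nest = 1 < 3)
    # Closely spaced requests: Nest = 16 / (1 * 2) = 8 >= 3
    print(delta_test((0, 0), (1, 2)))  # cache
```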
SBT (1/2) • The △-test can mispredict • When the current vertex is V3, the △-test requires the cache to operate in direct mode • But the next requests will access the same block
SBT (2/2) • The SBT observes the accesses requested after the current request • If data requested after the current request lies in the same block, cache mode is chosen
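The idea behind the SBT can be sketched as follows. The slides do not say how many upcoming requests are observed or how blocks are laid out, so the 4×4 block shape (matching Nblock = 16), the list of upcoming coordinates, and all names here are assumptions for illustration only.

```python
def block_of(uv, block_w: int = 4, block_h: int = 4):
    """Block index of a texel coordinate, assuming 4x4-texel blocks."""
    return (uv[0] // block_w, uv[1] // block_h)


def same_block_test(current_uv, upcoming_uvs, block_w: int = 4, block_h: int = 4) -> bool:
    """True if any upcoming request falls in the current request's block."""
    current_block = block_of(current_uv, block_w, block_h)
    return any(block_of(uv, block_w, block_h) == current_block
               for uv in upcoming_uvs)


def choose_mode(delta_says_cache: bool, current_uv, upcoming_uvs) -> str:
    """Cache mode if the delta test passes or the SBT overrides it."""
    if delta_says_cache or same_block_test(current_uv, upcoming_uvs):
        return "cache"
    return "direct"


if __name__ == "__main__":
    # The delta test said "direct", but a later request hits the same block
    print(choose_mode(False, (4, 4), [(9, 1), (5, 6)]))  # cache
    # No later request in the same block: stay in direct mode
    print(choose_mode(False, (4, 4), [(9, 1), (8, 8)]))  # direct
```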
Implementation (1/2) • Three test patterns of vertex texture access are used for the simulations • The upper part of the figure shows the simulation results • The lower part shows the density of vertices
Implementation (2/2) • Pattern (a) • Vertices are evenly spread in a scene • Pattern (b) • Vertex density increases toward the origin • Pattern (c) • The density of vertices is uneven
Simulation results (1/2) • The average ACVT over different tessellation levels ※ ACVT: average cycles per vertex texel
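To make the metric concrete, ACVT can be computed as total cycles divided by the number of vertex texels fetched. The sketch below is not the simulator used in the paper; it assumes the cycle model from the △-test slides (Cdirect per texel in direct mode, Cdirect + Nblock − 1 per block in cache mode) and ignores any latency for cache hits. The function name and input layout are hypothetical.

```python
def acvt(accesses_per_block, modes, c_direct: int = 8, n_block: int = 16) -> float:
    """Average cycles per vertex texel under the assumed cycle model.

    accesses_per_block: number of texels accessed in each block
    modes: "cache" or "direct" for each block
    """
    total_cycles = 0
    total_texels = 0
    for n_acc, mode in zip(accesses_per_block, modes):
        if mode == "cache":
            # One burst transfer loads the whole block
            total_cycles += c_direct + n_block - 1
        else:
            # Each accessed texel is read separately from external memory
            total_cycles += c_direct * n_acc
        total_texels += n_acc
    if total_texels == 0:
        raise ValueError("no texels accessed")
    return total_cycles / total_texels


if __name__ == "__main__":
    # One sparse block read directly, one dense block cached
    print(acvt([1, 4], ["direct", "cache"]))  # (8 + 23) / 5 = 6.2
    # Caching both blocks wastes a burst on the sparse one
    print(acvt([1, 4], ["cache", "cache"]))   # (23 + 23) / 5 = 9.2
```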
Simulation results (2/2) • The region above the dashed line indicates cases where the proposed cache does not need to use cache mode • Low-density vertices give the better results • With dense vertices, almost all accesses are handled in cache mode
Conclusion • The proposed cache improves vertex texture loading performance by 27% for general test scenes • The hardware overhead added to the conventional cache is 9.6%