1 / 89

POD: A Heterogeneous Many-Core Platform Using a 3D-integrated Acceleration Layer

POD: A Heterogeneous Many-Core Platform Using a 3D-integrated Acceleration Layer. Hsien-Hsin S. Lee Georgia Tech with Dong Hyuk Woo Georgia Tech Josh Fryman Intel Corporation Allan Knies Intel Research Berkeley Marsha Eng Intel Corporation. Performance Scalability Area Scalability

irina
Télécharger la présentation

POD: A Heterogeneous Many-Core Platform Using a 3D-integrated Acceleration Layer

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. POD: A Heterogeneous Many-Core Platform Using a 3D-integrated Acceleration Layer Hsien-Hsin S. Lee Georgia Tech with Dong Hyuk Woo Georgia Tech Josh Fryman Intel Corporation Allan Knies Intel Research Berkeley Marsha Eng Intel Corporation

  2. Performance Scalability • Area Scalability • How many cores on a die? • Power Scalability • How much power per core? ‘Many-Core’? Are We There Yet? • What is new? • On-Die Performance Scalability

  3. Cooler jet Pure copper Cooligy’s channel 3D Cooler Pro PS3 Grill Cooking-Aware Computing (Source: Kevin Skadron, UVa) Power Issue 1: Thermal Control

  4. Power Issue 2: Supply Google Server Farms (Oregon)

  5. Scalable Power Consumption? * The above pictures (>=8) do not represent any current or future Intel products.

  6. A Thousand Cores, Power Consumption? • UIUC: Programming Model for one thousand cores [DAC-44] • Mark Horowitz (Stanford): You will never see a one-thousand core processor [C2S2/GSRC review, Dec 2007] ? ? ? ? ?

  7. Some Known Facts on Interconnection Interconnection Network related • One main energy hog! • Alpha 21364: • 20% (25W / 125W) • MIT RAW: • 40% of individual tile power • UT TRIPS: • 25% • Mesh-based MIMD many-core? • Excessive energy in its router logic • Unpredictable communication pattern • Unfriendly to low-power techniques

  8. Many Many-Cores . . . To Name a Few P* : Homogeneous c* : Homogeneous (Energy Efficient) P+c*: Heterogeneous

  9. Power-Equivalent Performance P+c* c* P* Woo and Lee. To appear in IEEE Computer

  10. Power-Equivalent Performance per Joule P+c* c* P* Woo and Lee. To appear in IEEE Computer

  11. Woo, Fryman, Knies, Eng, and Lee. To appear in IEEE MICRO, July 2008 PODParallel-On-DemandParallel-On-Die

  12. Modern Constraints and Issues • Area & Power scalability (efficiency) • On-chip wire delay • Backward/forward binary compatibility • Extensibility • Low NRE

  13. Instruction Bus (IBUS) ORTree A “Snap-On” Broad-Purpose Accelerator Interleaved 2D Torus Links Data Return Buffer (DRB) Memory Bus (MBUS) Processing Element Memory ring Row Response Queue (RRQ) External TLB (xTLB) x86 Processor Arbiter (ARB) Memory Controller L2 Cache

  14. 3D Die Stacking, Extending Moore’s Law Analog Layer Face to Back DRAM Layer Back to Back Cache Layer Face to Face Processor Layer Heat Sink

  15. 3D-integrated POD Die 0 Die 1 You live and work here You get your beer here

  16. Host Processor Host Processor

  17. POD Array (with 8x8 PEs) Host Processor

  18. Instruction Bus (IBus) Host Processor

  19. ORTree: Status Monitoring Flag status Host Processor

  20. DRB DRB DRB DRB DRB DRB DRB DRB Data Return Buffer (DRB) Host Processor

  21. DRB 0 7 1 6 2 5 3 4 DRB DRB DRB DRB DRB DRB DRB 2D Torus P2P Links Host Processor

  22. DRB DRB DRB DRB DRB DRB DRB DRB 2D Torus P2P Links Host Processor

  23. RRQ DRB RRQ DRB RRQ DRB RRQ DRB DRB RRQ RRQ DRB RRQ DRB RRQ DRB RRQ & MBus: DMA Row Response Queue Memory Bus Host Processor

  24. DRB RRQ RRQ DRB System memory DRB RRQ RRQ DRB MC DRB RRQ MC RRQ DRB RRQ DRB RRQ DRB ARB xTLB LLC On-chip MC / Ring Memory Controller Arbitrator Host Processor

  25. RRQ RRQ D D RRQ D D RRQ D D RRQ D D RRQ D D RRQ D D RRQ D D Wire Delay Aware Design D D D D D D D EFLAGS ! MemFence IBUS

  26. Simplified PE Pipelining N S E W N S E W DEC / RF G Local SRAM X 96-bit VLIW instruction M 80 MBUS 96 IBUS

  27. Inter-PE Communication • Fully synchronized and orchestrated by the host processor • No crossbar • No contention / arbitration • No large buffer of conventional on-chip routers • Nothing unpredictable from the standpoint of programmers!

  28. Memory Organization LLC Ctrl Ring: ~2B Address Ring: ~8B RRQn-1 MCj-1 System memory PE Array RRQ Ctrl Ring: ~2B MC0 RRQ0 xTLB Data Ring: ~66B ARB Worst-case: ~78B LLC x86 Host Core

  29. Execution Model • Heterogeneous ISAs • Host Processor (e.g., x86 CISC) • For backward compatibility • RISC-type VLIW POD • For area- and power-efficiency • Host orchestrates the entire execution • Instruction Encapsulation • SendBits (x86 extension) • Followed by 96-bit POD VLIW instructions

  30. Programming Model • Example code: Reduction char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 }

  31. Programming Model (2x2 POD) • Malloc memory at the host memory space char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 }

  32. Programming Model (2x2 POD) • Initialize ai char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 0 17: 0 18: 0 19: 0

  33. Programming Model (2x2 POD) • Load the address of my PE ID char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 1 17: 2 18: 3 19: 4

  34. Programming Model (2x2 POD) • Load my PE ID char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } r15 2 r15 2 r15 2 r15 2 base_addr: 16 16: 1 17: 2 18: 3 19: 4

  35. Programming Model (2x2 POD) • Broadcast base_addr char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } r15 2 r15 3 r15 0 r15 1 base_addr: 16 16: 1 17: 2 18: 3 19: 4

  36. Programming Model (2x2 POD) • Calculate the pointer to my ai char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } r14 16 r14 16 r15 2 r15 3 r14 16 r14 16 r15 0 r15 1 base_addr: 16 16: 1 17: 2 18: 3 19: 4

  37. Programming Model (2x2 POD) • Load my ai from the host memory space char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } r14 16 r14 16 r15 18 r15 19 r14 16 r14 16 r15 16 r15 17 base_addr: 16 16: 1 17: 2 18: 3 19: 4

  38. Programming Model (2x2 POD) • Wait until it is loaded char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } r14 16 r14 16 r15 18 r15 19 r14 16 r14 16 r15 16 r15 17 base_addr: 16 16: 1 17: 2 18: 3 19: 4

  39. Programming Model (2x2 POD) • Update S char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } r1 3 r1 4 r14 16 r14 16 r15 18 r15 19 r1 1 r1 2 r14 16 r14 16 r15 16 r15 17 base_addr: 16 16: 1 17: 2 18: 3 19: 4

  40. r1 r1 r1 r1 3 1 2 4 r2 r2 r2 r2 2 1 4 3 r14 r14 r14 r14 16 16 16 16 r15 r15 r15 r15 19 16 17 18 Programming Model (2x2 POD) • 1st iteration char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 1 17: 2 18: 3 19: 4

  41. r1 r1 r1 r1 3 1 2 4 r2 r2 r2 r2 2 1 4 3 r14 r14 r14 r14 16 16 16 16 r15 r15 r15 r15 19 16 17 18 Programming Model (2x2 POD) • Transfer my ai to my east-side neighbor char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 1 17: 2 18: 3 19: 4

  42. r1 3 r1 4 r2 3 r2 4 r14 16 r14 16 r15 18 r15 19 r1 1 r1 2 r2 1 r2 2 r14 16 r14 16 r15 16 r15 17 Programming Model (2x2 POD) • Update S char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 1 17: 2 18: 3 19: 4

  43. r1 4 r1 3 r2 7 r2 7 r14 16 r14 16 r15 18 r15 19 r1 2 r1 1 r2 3 r2 3 r14 16 r14 16 r15 16 r15 17 Programming Model (2x2 POD) • Copy rowsum to r1 char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 1 17: 2 18: 3 19: 4

  44. r1 7 r1 7 r2 7 r2 7 r14 16 r14 16 r15 18 r15 19 r1 3 r1 3 r2 3 r2 3 r14 16 r14 16 r15 16 r15 17 Programming Model (2x2 POD) • 1st iteration char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 1 17: 2 18: 3 19: 4

  45. r1 r1 r2 7 r2 7 r14 16 r14 16 r15 18 r15 19 r1 r1 r2 3 r2 3 r14 16 r14 16 r15 16 r15 17 Programming Model (2x2 POD) • Transfer my rowsum to my north-side neighbor char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } 7 7 3 3 base_addr: 16 16: 1 17: 2 18: 3 19: 4

  46. r1 7 r1 7 r2 7 r2 7 r14 16 r14 16 r15 18 r15 19 r1 3 r1 3 r2 3 r2 3 r14 16 r14 16 r15 16 r15 17 Programming Model (2x2 POD) • Update S char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 1 17: 2 18: 3 19: 4

  47. r1 3 r1 3 r2 10 r2 10 r14 16 r14 16 r15 18 r15 19 r1 7 r1 7 r2 10 r2 10 r14 16 r14 16 r15 16 r15 17 Programming Model (2x2 POD) • Done! char* base_addr = (char*) malloc(npe); for ( i=0; I < npe; i++ ) { *(base_addr+i) = i+1; } $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] POD_movl( r14, base_addr ); $ASM add r15 = r15, r14 $ASM ld r1 = sys [ r15 + 0 ] POD_mfence(); $ASM add r2 = r2, r1 for ( i=0; i< (npeX-1); i++ ) { $ASM xfer.east r1 = r1 $ASM add r2 = r2, r1 } $ASM add r1 = r2, 0 for ( i=0; i< (npeY-1); i++ ) { $ASM xfer.north r1 = r1 $ASM add r2 = r2, r1 } base_addr: 16 16: 1 17: 2 18: 3 19: 4

  48. r1 3 r1 3 r2 10 r2 10 r14 16 r14 16 r15 18 r15 19 r1 7 r1 7 r2 10 r2 10 r14 16 r14 16 r15 16 r15 17 Programming Model (2x2 POD) • Broadcast the pointer to ‘result’ int result; POD_movl( r14, &result ); $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] $ASM sub r15 = r15, 0 $ASM pushmask.e $ASM st sys[ r14 + 0 ] = r2 $ASM popmask POD_mfence(); base_addr: 16 16: 1 17: 2 18: 3 19: 4 128: <- addr of ‘result’

  49. r1 3 r1 3 r2 10 r2 10 r14 128 r14 128 r15 18 r15 19 r1 7 r1 7 r2 10 r2 10 r14 128 r14 128 r15 16 r15 17 Programming Model (2x2 POD) • Only PE0 will write-back! int result; POD_movl( r14, &result ); $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] $ASM sub r15 = r15, 0 $ASM pushmask.e $ASM st sys[ r14 + 0 ] = r2 $ASM popmask POD_mfence(); base_addr: 16 128: <- addr of ‘result’ 16: 1 17: 2 18: 3 19: 4

  50. r1 3 r1 3 r2 10 r2 10 r14 128 r14 128 r15 2 r15 2 r1 7 r1 7 r2 10 r2 10 r14 128 r14 128 r15 2 r15 2 Programming Model (2x2 POD) • Load PE_ID int result; POD_movl( r14, &result ); $ASM mov r15 = MY_PE_ID_POINTER $ASM ld r15 = local [ r15 + 0 ] $ASM sub r15 = r15, 0 $ASM pushmask.e $ASM st sys[ r14 + 0 ] = r2 $ASM popmask POD_mfence(); base_addr: 16 128: <- addr of ‘result’ 16: 1 17: 2 18: 3 19: 4

More Related