Fast SoC Architecture Exploration Using Traffic Simulation Techniques


Presentation Transcript


  1. Fast SoC Architecture Exploration Using Traffic Simulation Techniques. Nadjib Mammeri, ARM

  2. Problems we are trying to solve
     - What interconnect topology should I use?
     - What arbitration and QoS schemes?
     - How should I configure my memory controller? DMC queue length? Memory width?
     - How to optimally size my interconnect/memory system and still meet my performance requirements?

  3. SoC Architecture Exploration: Current Techniques
     - Spreadsheet: not accurate, fast, cheap
     - RTL simulation: 100% accurate, slow, expensive
     - RTL emulation: accurate, fast, expensive
     - Behavioural SystemC models: accurate, fast, expensive
     - Traffic profiling: approximately accurate, fast, cheap
     Traffic profiling abstracts away some components or parts of the system and replaces them with bus transactors that can:
     - Generate realistic traffic that is statistically equivalent to SoC data flows
     - Re-use existing data flows to explore new architectures
     - Use constrained random techniques

  4. Our proposed approach
     - Iteration time of a spreadsheet with accuracy approaching RTL simulation
     [Figure: analysis methods arranged on a scale from LOW to HIGH, with typical iteration times]
     - Spreadsheet analysis (minutes/hours): mathematical formula, not dynamic
     - RTL simulation, VPE, user VIP, industry-standard VIP (minutes/hours): statistical or recorded traffic profiles; cycle time, realistic behaviour
     - Acceleration/emulation with VIP, Logic Tiles, SW (days/weeks): adding S/W and external I/F with realistic scenarios
     - Silicon/applications (months/years): observe actual behaviour

  5. How is it done?
     - When analysing performance, the content or functional intent of the data is not important, but the nature and flow of the traffic is.
     - A reduction in simulation time can be achieved by trading off the functional accuracy of end points.
     - Accuracy should be preserved in the DUT and in the interconnect because it is the performance bottleneck.
     How the simulation speed-up is achieved:
     - By giving up execution of functions within the emulated device in favour of emulating its traffic; there is no need to model its cycle-accurate behaviour
     - By replacing real data with constrained random data
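The last point on slide 5 can be illustrated with a minimal Python sketch. It is not VPE code: the function `random_write_burst`, its parameters and the 32-bit address range are assumptions made only for illustration. The data beats are random, while the shape of the burst (length, beat size, address alignment) stays within chosen constraints.

```python
import random

def random_write_burst(rng, max_beats=16, beat_sizes=(1, 2, 4, 8)):
    """Return one hypothetical write burst: random payload, constrained shape."""
    beats = rng.randint(1, max_beats)        # burst length in beats
    nbytes = rng.choice(beat_sizes)          # bytes per beat
    # Constrain the start address to be aligned to the beat size.
    addr = rng.randrange(0, 1 << 32, nbytes)
    # The data content does not matter for performance, so it is random.
    data = [rng.getrandbits(8 * nbytes) for _ in range(beats)]
    return {"addr": addr, "beats": beats, "bytes_per_beat": nbytes, "data": data}

rng = random.Random(1)                       # fixed seed for repeatable runs
burst = random_write_burst(rng)
```

Seeding the generator keeps a run repeatable, which helps when two architectures are compared against the same stimulus.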

  6. What is VPE (formerly AVIP)?
     Functional verification:
     - Complete AXI functional verification solution
     - SystemVerilog Master, Slave, Monitor
     - RTL protocol assertions
     - RTL coverage points
     Performance exploration:
     - Profile editor toolkit GUI
     - RTL profile extraction
     - RTL profile generation
     - AXI traffic characterization and analysis
     - AXI traffic replay and adaptation
     [Figure: an IEEE 1800 SystemVerilog testbench in which VPE AXI Master, AXI Slave and AXI Monitor components, driven by profile data, connect to the AXI slave and master interfaces of the DUT (a block or sub-system of user/customer IP)]

  7. Abstraction example 1
     To investigate my interconnect topology, I would keep the RTL for my interconnect and abstract away all end points (masters and slaves), replacing them with VPE masters and slaves.
     [Figure: Masters 1 to 4 and Slaves 1 and 2 around the AXI interconnect, with the end points replaced by VPE masters, slaves and monitors]

  8. Abstraction example 2
     To investigate my memory controller configurability, I would use the RTL for my interconnect and DMC and abstract away the other end points, replacing them with VPE masters and slaves.
     [Figure: Masters 1 to 4, Slave 1 and the DMC around the AXI interconnect, with the end points other than the DMC replaced by VPE masters, slaves and monitors]

  9. Traffic Profiling (1)
     - Traffic profiles statistically characterise the traffic (transactions) on an AXI connection.
     - A traffic flow is an identifiable stream of traffic (AXI transactions) between two points in a system.
     Examples:
     - When profiling at Slave 1, traffic coming from Master 2 can be identified using AxID.
     - If we know Master 1 always does 4-beat bursts, we can identify its traffic flow based on AxLEN.
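The two examples on slide 9 can be sketched in Python as follows. The transaction tuples, the ID values and the helper `split_flows` are hypothetical, and the sketch assumes that the AxID value seen at the slave maps directly to one master. AxLEN encodes the number of beats minus one, so a 4-beat burst has AxLEN equal to 3.

```python
from collections import defaultdict

# Hypothetical transactions observed at Slave 1, as (AxID, AxLEN) pairs.
observed = [(2, 7), (1, 3), (2, 0), (1, 3), (3, 15)]

def split_flows(txns, key):
    """Group transactions into flows using a caller-supplied key function."""
    flows = defaultdict(list)
    for txn in txns:
        flows[key(txn)].append(txn)
    return dict(flows)

# Example 1: identify flows by AxID (assumed here to identify the master).
by_id = split_flows(observed, key=lambda txn: txn[0])

# Example 2: identify a flow by burst length (AxLEN == 3 means 4 beats).
four_beat = [txn for txn in observed if txn[1] == 3]
```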

  10. Traffic Profiling (2)
      - A profile is associated with a connection and can have multiple flows.
      - Flows contain histograms that store statistical data on both payload and timing information.
      - Payload histograms describe traffic payload information (the control and response of a transaction, but not its data content): ADDRESS, ID, BURST, SIZE, LEN, RESP, etc.
      - Timing histograms describe traffic timing information: ITT, AWW, WW, WIL, WBL, ARW, RW, RBL, etc.
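As a rough illustration of slide 10, the Python sketch below builds one payload histogram (LEN) and one timing histogram (ITT) for a single flow, then draws new transactions from them. The recorded values and the `sample` helper are hypothetical; this is not the VPE profile format.

```python
import random
from collections import Counter

# Hypothetical recorded transactions for one flow: (LEN, ITT in cycles).
recorded = [(3, 10), (3, 12), (7, 10), (3, 10), (15, 40)]

len_hist = Counter(length for length, _ in recorded)   # payload histogram
itt_hist = Counter(itt for _, itt in recorded)         # timing histogram

def sample(hist, rng):
    """Draw one value with probability proportional to its bin count."""
    values = list(hist)
    weights = [hist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

rng = random.Random(0)
replayed = [(sample(len_hist, rng), sample(itt_hist, rng)) for _ in range(1000)]
```

Sampling each histogram independently, as done here, reproduces the per-parameter distributions but not any correlation between LEN and ITT in the recorded traffic.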

  11. AXI Timing Histograms
      - Inter-transaction timings
        - ITT: histogram parameter defining the inter-transaction timings in a flow (the time between successive transactions)
      - Intra-transaction timings
        - Flow timings: timings that describe the flow of traffic
        - Connection timings: timings that are considered properties of the connection
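A small Python sketch of the ITT histogram, assuming ITT is measured between the issue cycles of successive transactions in one flow. The reference point and the cycle numbers are assumptions; the slides do not state which event marks the start of a transaction.

```python
from collections import Counter

# Hypothetical cycle numbers at which successive transactions of one flow were issued.
issue_cycles = [100, 110, 120, 150, 160]

# ITT: the gap between each transaction and the one before it.
itt = [later - earlier for earlier, later in zip(issue_cycles, issue_cycles[1:])]

# Histogram of ITT values: bin value -> number of occurrences.
itt_hist = Counter(itt)
```

Here `itt` evaluates to `[10, 10, 30, 10]`, so the histogram has three entries in the 10-cycle bin and one in the 30-cycle bin.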

  12. AXI Intra-Transaction Timings
      - RIL: time between the handshake on the AR channel and the first read transfer on the R channel
      - RW: time between RVALID and RREADY
      - WIL: time between the handshake on the AW channel and the first write transfer on the W channel
      - WW: time between WVALID and WREADY
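The four definitions on slide 12 can be sketched in Python as below. The `ChannelTrace` record and the cycle numbers are hypothetical. The sketch assumes that a transfer happens on the cycle where VALID and READY are both high, and it treats RW and WW as the number of cycles the VALID signal waits before that transfer.

```python
from dataclasses import dataclass

@dataclass
class ChannelTrace:
    addr_handshake: int  # cycle of the AR or AW handshake
    data_valid: int      # cycle where RVALID or WVALID is first asserted
    data_transfer: int   # cycle of the first transfer on R or W (VALID and READY high)

def initial_latency(trace):
    """RIL or WIL: address handshake to first data transfer."""
    return trace.data_transfer - trace.addr_handshake

def wait_cycles(trace):
    """RW or WW under the assumption above: VALID asserted to transfer."""
    return trace.data_transfer - trace.data_valid

read = ChannelTrace(addr_handshake=100, data_valid=108, data_transfer=110)
write = ChannelTrace(addr_handshake=200, data_valid=201, data_transfer=204)

ril, rw = initial_latency(read), wait_cycles(read)
wil, ww = initial_latency(write), wait_cycles(write)
```

With these sample cycles, RIL is 10, RW is 2, WIL is 4 and WW is 3.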

  13. How accurate is it?
      - 4 hours to 4 minutes: a VPE Master executing 2M cycles of traffic profile in place of real Mali200 RTL running Proxycon/Samurai content
      - The original traffic profile captured from the real RTL is now used to drive the VPE Master
      - The VPE profile executes much faster than the real RTL but generates representative and controllable traffic
      [Figure: traffic from the real RTL alongside traffic from the VPE profile]

  14. More VPE Features
      [Figure labels: Master, Slave, Monitor]
      - AXI protocol checker
      - AXI protocol coverage
      - Traffic profile extraction
      - Transaction recording/visualisation

  15. Conclusion
      - System architects require novel techniques with short iteration times to analyze performance and fine-tune their SoCs.
      - VPE introduces a new approach that combines high-level modeling and statistical low-level random generation techniques to explore and verify IP performance.
      - Traffic profiling can be used by VPE masters and slaves to generate statistically equivalent traffic, and by VPE monitors when monitoring performance.

  16. Questions
