RTL-Synchronized Transaction Reference Models

Presentation Transcript


  1. RTL-Synchronized Transaction Reference Models. Dave Whipp, Fast-Chip Inc.

  2. Motivation • Needed Cycle Verification • Now, not 6 months later • Why build two models, when one will do • We had a working “functional” model • Don’t Chase RTL • Avoid modeling artifacts of the implementation

  3. Overview • What is Transaction Synchronization • Patterns in Transaction Synchronization • Methodology, Futures, Summary

  4. Part 1 What is Transaction Synchronization?

  5. A Functional Model

     int classify_packet(Packet packet_data, Uint32 rule_address)
     {
         int result = ITERATE;
         while (result == ITERATE) {
             RuleStruct rule;
             read_rule(&rule, rule_address);              /* fetch the next rule */
             int field = extract(rule, packet_data);      /* pick the field the rule tests */
             interpret(rule, field, &result, &rule_address);
         }
         return result;
     }

  6. “Bringup” Flow [Diagram. Labels: test.script, C-sim, csim.log, RTL-sim, rtl.log, Compare]

  7. Transaction Interactions [Diagram. Labels: Thread A, Read-Rule, Thread B, Write-Rule, Rules DB]

  8. Trace Files • A trace of the sequence of transaction steps • Each synch point has a name and a thread ID • Comments provide context (values from RTL) • Often hand-edited during debug

     Example:

     [1536] read_rule thread_A # addr=h8a34 data=h1578
     [1544] write_rule thread_B # addr=h8a34 data=h5343
     [1632] read_rule thread_A # addr=h8a34 data=h5343
     [1694] write_rule thread_B # addr=h8a34 data=hf519
     [1694] read_rule thread_A # addr=h8a34 data=hf519
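A minimal sketch of reading one line of such a trace, assuming the layout seen in the example: a bracketed number, the synch point name, the thread ID, then a `#` comment. The struct, its field names, and `parse_synch_point` are hypothetical, and treating the bracketed number as a timestamp is a guess; the slides do not say what it is.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One synch point from a trace file. All names here are hypothetical. */
struct synch_point {
    unsigned long time;   /* value inside [..]; assumed to be a timestamp */
    char name[32];        /* synch point name, e.g. "read_rule" */
    char thread[32];      /* thread ID, e.g. "thread_A" */
    char comment[96];     /* text after '#': context only, not compared */
};

/* Parse one trace line. Returns 1 on success, 0 if the line is malformed.
 * The comment is optional, so three converted fields are enough. */
int parse_synch_point(const char *line, struct synch_point *sp)
{
    memset(sp, 0, sizeof *sp);
    int n = sscanf(line, " [%lu] %31s %31[^# \t] # %95[^\n]",
                   &sp->time, sp->name, sp->thread, sp->comment);
    return n >= 3;
}
```

Only the name and thread ID would drive synchronization in this sketch; the comment text is carried along for debugging, which matches the slide's note that comments merely provide context.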

  9. “Synchronized” Flow [Diagram. Labels: test.script, C-sim, csim.log, RTL-sim, rtl.log, Compare]

  10. Simulation Kernel [Flow diagram. Labels: Read Synch, Call Synch function, [pending], [not pending], Pending Synch Points (task list), Read Stimulus]
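One possible reading of the kernel diagram, sketched in C: the kernel reads the next synch point from the trace; if a matching entry is pending in the task list it calls the registered function, otherwise it reads more stimulus and tries again. Every name below (`synch_request`, `synch_dispatch`, `run_kernel`, the fixed-size task list) is an assumption made for illustration and is not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16

typedef void (*synch_fn)(void *arg);

/* One pending synch point: a transaction waiting for the RTL to reach it. */
struct pending {
    const char *name;
    const char *thread;
    synch_fn fn;
    void *arg;
    int used;
};

static struct pending task_list[MAX_PENDING];

/* Called by the model: park a continuation until the trace reaches it. */
int synch_request(const char *name, const char *thread, synch_fn fn, void *arg)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!task_list[i].used) {
            task_list[i] = (struct pending){ name, thread, fn, arg, 1 };
            return 1;
        }
    }
    return 0; /* task list full */
}

/* Called by the kernel for each trace entry. Returns 1 if it was pending. */
int synch_dispatch(const char *name, const char *thread)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        struct pending *p = &task_list[i];
        if (p->used && strcmp(p->name, name) == 0 &&
            strcmp(p->thread, thread) == 0) {
            synch_fn fn = p->fn;
            void *arg = p->arg;
            p->used = 0;   /* free the slot first: fn may register again */
            fn(arg);
            return 1;
        }
    }
    return 0;
}

struct trace_entry { const char *name; const char *thread; };

/* Kernel loop. read_stimulus() returns 0 once the stimulus is exhausted.
 * Returns the number of trace entries that never became pending. */
int run_kernel(const struct trace_entry *trace, size_t n,
               int (*read_stimulus)(void))
{
    int unmatched = 0;
    for (size_t i = 0; i < n; i++) {
        while (!synch_dispatch(trace[i].name, trace[i].thread)) {
            if (!read_stimulus()) { unmatched++; break; }
        }
    }
    return unmatched;
}

/* Tiny demonstration: one stimulus item that waits on a read_rule synch. */
static int hits;
static int stimulus_left = 1;
static void on_read_rule(void *arg) { (void)arg; hits++; }
static int demo_stimulus(void)
{
    if (!stimulus_left) return 0;
    stimulus_left--;
    return synch_request("read_rule", "thread_A", on_read_rule, NULL);
}

int demo_kernel(void)
{
    const struct trace_entry trace[] = {
        { "read_rule",  "thread_A" },
        { "write_rule", "thread_B" },  /* nothing in the model waits for this */
    };
    return run_kernel(trace, 2, demo_stimulus);
}
```

In the demonstration the first trace entry is matched after one stimulus read, and the second is reported as unmatched because no transaction ever requested it. A real testbench would presumably treat an unmatched synch point as a mismatch between model and RTL.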

  11. Memory Access with Arbiter [Diagram. Labels: A, B, Delay (x2), Arb, Mem, Monitor]

  12. Dual Port Memory Access [Diagram. Labels: A, B, Delay (x2), Memory, Monitor A, Monitor B]

  13. A Functional Model

      int classify_packet(Packet packet_data, Uint32 rule_address)
      {
          int result = ITERATE;
          while (result == ITERATE) {
              RuleStruct rule;
              read_rule(&rule, rule_address);
              int field = extract(rule, packet_data);
              interpret(rule, field, &result, &rule_address);
          }
          return result;
      }

      [Overlay text on the slide: } int continue_read_rule () {]

  14. Refactoring • Move local variables into a “context” structure. Create an instance (on the heap, not the stack) at start of transaction – and delete at end. • Replace iterative loops with recursive functions. • For each function that requires synchronization (directly or indirectly), replace the call with a request/callback pair.

  15. “Context” Structure

      struct context {
          Packet packet_data;
          Uint32 rule_address;
          RuleStruct rule;
          int field;
          int result;
          void (*callback)(int);
      };

  16. Introduce Context Structure

      void classify_packet_request(Packet packet_data, Uint32 rule_address,
                                   void (*callback)(int))
      {
          /* Transaction state lives on the heap, not the stack. */
          struct context *cxt = calloc(1, sizeof(struct context));
          cxt->packet_data = packet_data;
          cxt->rule_address = rule_address;
          cxt->callback = callback;
          cxt->result = ITERATE;
          classify_packet_iterate(cxt);
      }

      void classify_packet_reply(struct context *cxt)
      {
          /* Copy out what is still needed, then delete the context. */
          int result = cxt->result;
          void (*callback)(int) = cxt->callback;
          free(cxt);
          callback(result);
      }

  17. Non-Recursive Implementation

      void classify_packet_iterate(struct context *cxt)
      {
          while (cxt->result == ITERATE) {
              read_rule(&cxt->rule, cxt->rule_address);
              cxt->field = extract(cxt->rule, cxt->packet_data);
              interpret(cxt->rule, cxt->field, &cxt->result, &cxt->rule_address);
          }
          classify_packet_reply(cxt);
      }

  18. Recursive Implementation

      void classify_packet_iterate(struct context *cxt)
      {
          if (cxt->result == ITERATE) {
              read_rule(&cxt->rule, cxt->rule_address);
              cxt->field = extract(cxt->rule, cxt->packet_data);
              interpret(cxt->rule, cxt->field, &cxt->result, &cxt->rule_address);
              classify_packet_iterate(cxt);
          } else {
              classify_packet_reply(cxt);
          }
      }

  19. Synchronized Implementation

      void classify_packet_iterate(struct context *cxt)
      {
          if (cxt->result == ITERATE) {
              /* Request the read; execution resumes in continue_read_rule. */
              read_rule_request(&cxt->rule, cxt->rule_address, &continue_read_rule);
          } else {
              classify_packet_reply(cxt);
          }
      }

      void continue_read_rule(struct context *cxt)
      {
          cxt->field = extract(cxt->rule, cxt->packet_data);
          interpret(cxt->rule, cxt->field, &cxt->result, &cxt->rule_address);
          classify_packet_iterate(cxt);
      }
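The request/callback refactoring can be put together into a small self-contained sketch. The function names follow the slides, but everything else is assumed for illustration: the `Packet` and `RuleStruct` layouts, the contents of `rules_db`, the inlined extract and interpret logic, the single pending-read slot, and `read_rule_synch`, which stands in for whatever the testbench calls when the trace reaches a read_rule synch point. Unlike the slide, `read_rule_request` here takes the context directly so the callback can receive it.

```c
#include <assert.h>
#include <stdlib.h>

enum { ITERATE = -1 };
typedef unsigned int Uint32;
typedef struct { unsigned char bytes[4]; } Packet;             /* hypothetical */
typedef struct { int offset, expect, result; Uint32 next; } RuleStruct;

/* Hypothetical rules database; address 0 means "no further rule". */
static RuleStruct rules_db[4] = {
    [1] = { .offset = 0, .expect = 7, .result = 100, .next = 2 },
    [2] = { .offset = 1, .expect = 9, .result = 200, .next = 0 },
};

struct context {
    Packet packet_data;
    Uint32 rule_address;
    RuleStruct rule;
    int field;
    int result;
    void (*callback)(int);
};

/* A single pending read; a fuller model would keep a task list. */
static struct {
    struct context *cxt;
    void (*cont)(struct context *);
} pending_read;

static void read_rule_request(struct context *cxt,
                              void (*cont)(struct context *))
{
    pending_read.cxt = cxt;
    pending_read.cont = cont;
}

/* Testbench side: the RTL performed the read, so do it in the model now.
 * Returns 0 if no read was pending. */
int read_rule_synch(void)
{
    if (!pending_read.cont) return 0;
    struct context *cxt = pending_read.cxt;
    void (*cont)(struct context *) = pending_read.cont;
    pending_read.cont = NULL;
    cxt->rule = rules_db[cxt->rule_address];  /* sees writes made so far */
    cont(cxt);
    return 1;
}

static void classify_packet_iterate(struct context *cxt);

static void classify_packet_reply(struct context *cxt)
{
    int result = cxt->result;
    void (*callback)(int) = cxt->callback;
    free(cxt);
    callback(result);
}

static void continue_read_rule(struct context *cxt)
{
    /* Stand-ins for extract() and interpret() from the slides. */
    cxt->field = cxt->packet_data.bytes[cxt->rule.offset];
    if (cxt->field == cxt->rule.expect) cxt->result = cxt->rule.result;
    else if (cxt->rule.next == 0)       cxt->result = 0;   /* no match */
    else                                cxt->rule_address = cxt->rule.next;
    classify_packet_iterate(cxt);
}

static void classify_packet_iterate(struct context *cxt)
{
    if (cxt->result == ITERATE) read_rule_request(cxt, continue_read_rule);
    else                        classify_packet_reply(cxt);
}

void classify_packet_request(Packet packet_data, Uint32 rule_address,
                             void (*callback)(int))
{
    struct context *cxt = calloc(1, sizeof *cxt);
    if (!cxt) abort();
    cxt->packet_data = packet_data;
    cxt->rule_address = rule_address;
    cxt->callback = callback;
    cxt->result = ITERATE;
    classify_packet_iterate(cxt);
}

static int last_result = ITERATE;
static void on_done(int result) { last_result = result; }

/* Classify one packet, releasing each read as a trace would. */
int demo_classify(int *synchs)
{
    Packet p = { { 5, 9, 0, 0 } };
    classify_packet_request(p, 1, on_done);
    *synchs = 0;
    while (read_rule_synch()) (*synchs)++;
    return last_result;
}
```

Because the rule is fetched only when the synch point fires, a write by another thread that the RTL ordered before the read would be visible to the model, which is the behaviour the trace example on slide 8 relies on.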

  20. Transaction Diagrams [Diagram. Labels: Classify Packet, Read Rule, Extract, Interpret, [iterate], [done], Rules DB, Packet Buffer]

  21. Part 2 Patterns in Transaction Synchronization

  22. Adding a Cache • Cache needn’t affect transactions • Data-RAM not modeled • Cache is coherent • Can rerun all tests, with no changes to C model • Tag RAM is an Addition, not Modification • Independent Transactions • Independent Synchronization

  23. Single Port, Cached [Diagram. Labels: A, B, Delay (x2), Arb, Cache, Tag RAM, Hit Rd/Wr, Miss Rd/Wr, Mem, Read/Write, Check ECC, Correct Errors]

  24. Cache Transaction (Read) [Transaction diagram. Labels: Read Tag, [hit], [miss], Read Data, Write Tag, Read Tag, Check ECC, Write Tag]

  25. FIFOs and Counters • Delay elements need no synchronization • But synchronization can increase locality • Some FIFOs can drop transactions • Synchronize overflow: don’t model actual size • Counters seem to need cycle-based model • We want to avoid this • Correct Synch propagates “forces” to Model
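A sketch of what "synchronize overflow: don't model actual size" could look like. The model FIFO below is an unbounded list with no notion of depth; whether a write is accepted or dropped is taken from the synch point observed in the RTL. The type and function names (`fifo_model`, `fifo_write_synch`, `fifo_pop_synch`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct node { int item; struct node *next; };

/* Model FIFO: unbounded, with no depth parameter at all. */
struct fifo_model {
    struct node *head, *tail;
    int dropped;               /* writes the RTL reported as dropped */
};

/* Outcome of a producer write, as reported by the RTL synch point. */
enum fifo_synch { FIFO_PUSH, FIFO_DROP };

/* Resolve a write using the RTL's outcome rather than a modeled size.
 * Returns 1 if the item was queued, 0 if it was dropped. */
int fifo_write_synch(struct fifo_model *f, int item, enum fifo_synch outcome)
{
    if (outcome == FIFO_DROP) {
        f->dropped++;
        return 0;
    }
    struct node *n = malloc(sizeof *n);
    if (!n) abort();
    n->item = item;
    n->next = NULL;
    if (f->tail) f->tail->next = n; else f->head = n;
    f->tail = n;
    return 1;
}

/* Resolve a pop synch point. Returns 0 if the model has nothing queued,
 * which would indicate a disagreement with the RTL. */
int fifo_pop_synch(struct fifo_model *f, int *item)
{
    struct node *n = f->head;
    if (!n) return 0;
    *item = n->item;
    f->head = n->next;
    if (!f->head) f->tail = NULL;
    free(n);
    return 1;
}
```

For example, writing items 1, 2, 3 with outcomes push, drop, push leaves 1 and 3 in the model, in that order, without the model ever knowing how deep the hardware FIFO is.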

  26. Synchronizing a FIFO [Diagram. Labels: Producer, FIFO, Consumer, Push, Pop, Drop, Force, Flow Control]

  27. FIFO Transaction Diagram [Transaction diagram. Labels: [push], [drop], Pop]

  28. FIFO Synchronization Checker [Diagram. Labels: Producer, FIFO, Consumer, Push, Pop, Drop, Checker: Queue Size Assertions]
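A sketch of a queue-size checker driven by the push, pop, and drop synch points. The rules it enforces are assumptions, not taken from the slides: a push needs free space, a pop needs a non-empty queue, and a drop is only legal when the queue is full. The `depth` field is a testbench parameter describing the RTL; the reference model itself would still not know it.

```c
#include <assert.h>

enum fifo_event { EV_PUSH, EV_POP, EV_DROP };

/* Hypothetical checker state: occupancy inferred from observed events. */
struct fifo_checker {
    int depth;        /* RTL FIFO depth, known to the testbench only */
    int count;        /* occupancy implied by the events seen so far */
    int violations;   /* number of failed queue-size assertions */
};

/* Feed one observed synch point into the checker. */
void fifo_checker_observe(struct fifo_checker *c, enum fifo_event ev)
{
    switch (ev) {
    case EV_PUSH:
        if (c->count >= c->depth) c->violations++;  /* push into full queue */
        else c->count++;
        break;
    case EV_POP:
        if (c->count == 0) c->violations++;         /* pop from empty queue */
        else c->count--;
        break;
    case EV_DROP:
        if (c->count < c->depth) c->violations++;   /* drop with space left */
        break;
    }
}
```

With a depth of 2, the sequence push, push, drop passes, while a drop observed at occupancy 1 is counted as a violation.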

  29. Counters [Diagram. Labels: Register, Client, +1, load, select, sample_en, clk, value (x2), Force, Sample, Update]

  30. Scaffolding • Permit verification of incomplete RTL • Encourage end-to-end skeletons • Implement “incorrect, but simple” algorithms • Don’t wait for complete RTL • Postpone modeling the algorithm • Use synch to avoid chasing a moving target • Remove scaffolding once RTL is complete

  31. An Algorithm Cache [Diagram. Labels: Tree Search, Result Cache, Tag Ram, Hit Rd/Wr, Miss Rd/Wr, Hit, Miss, Read Node, Node Memory]

  32. Algorithm Cache: Transactions [Transaction diagram. Labels: Read Tag, [hit], [miss], Backdoor search, Read Node, [match], [iterate], [No match]]

  33. Speculation • When hardware speculates: • Effect precedes cause • Transaction model appears incorrect • Creative accounting can sometimes help • Insert a “virtual” delay • Filter based on future events

  34. Speculation [Diagram. Labels: Read Ctrl, Read Data (x2)]

  35. Speculation [Diagram. Labels: Read Ctrl, Read Data]

  36. Speculative Reads [Diagram. Labels: Lookup (Pipe), Stage 1, Stage 2, Stage 3, Stage 4, Update Data RAM, Update Ctrl RAM, read (x2), write (x2), Delay (2 clocks)?, advance (2 clocks)?]

  37. Part 3 Methodology, Futures, Summary

  38. Verification Flow • RTL Simulation is expensive • Licenses • CPU time • Post-Processing is cheap • Stop simulations when broken • But not if bug is in test/model

  39. Methodology • Cycle-Precise Reference Comparison • Without a cycle-accurate model • Verify the System First • Bringup Flow (Functional Model) • Synchronized Flow (Transaction-Testbench) • Postpone module level testing • Use scoreboarding to identify unit testbenches • Only build unit-testbenches for stable modules

  40. Comparison with Platform-Based • System-on-Chip Methodology • Verify components first • Verify system as composition of verified units • Complex-ASIC Methodology • Verify transactions first • Verify units in context of verified transactions • An “Agile” Methodology

  41. Future Work • Performance in non-synchronized mode • Use threading to avoid fragmentation • Synchronization as basis of SW architecture • Cycle-model plug-in could provide synch • Can postpone this plug-in until tapeout • But what if we want a cycle-model earlier? • Example: up-front performance validation

  42. Summary • Cycle timing is a “Don’t Care” • Initial verification uses “Functional” model • Refactor into “Transaction” model • RTL provides cycle timing • Caches, like FIFOs, are just delay elements • “Forces” in testbench propagate to model • “Coarse-grain first” methodology

  43. Questions mailto:Dave@Whipp.name http://Dave.Whipp.name/dv
