1 / 34

designKilla: The 32-bit pipelined processor

designKilla: The 32-bit pipelined processor. Brought to you by: Victoria Farthing Dat Huynh Jerry Felker Tony Chen Supervisor: Young Cho. 32-Bit RISC Pipelined Processor.

marsha
Télécharger la présentation

designKilla: The 32-bit pipelined processor

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. designKilla:The 32-bit pipelined processor Brought to you by: Victoria Farthing Dat Huynh Jerry Felker Tony Chen Supervisor: Young Cho

  2. 32-Bit RISC Pipelined Processor • Reduced Instruction Set allows for faster execution of simple, frequently used instructions which can be combined to achieve the same result as a single, slower CISC instruction • Pipelining allows a faster clock cycle and less wasted resources

  3. Datapath Pipeline Stages • 5 Stages • Instruction Fetch • Instruction Decode • Execution • Memory Write • Write Back

  4. Unique Data Path Features • Next instruction address calculation • For basic incrementation, the address is calculated by a counter

  5. Address Jump Calculations • For address jumps, there is a 19-bit load port on the counter • The loaded address comes from an adder with multiplexed inputs • Load bit is controlled by a comparator (beq) or-ed with the absolute jump control bit

  6. Double Clocked Memory Interface • Problem: One Memory for both Instruction and Data • Solution: Double Clock! • Access the memory twice during one clock cycle

  7. Double Clocked Memory Interface Write Enable Fast Clock Clock Fetch Instruction Fetch Data Fetch Instruction Write Data • Fetches Instruction in First Cycle • Fetches or Writes Data In Second Cycle • Data is output by end of Clock Cycle

  8. Unique Data Path Features • Structural Multiplier • 16 X 16 bit • Multi-level creation: • Four 8 X 8 bit multipliers • Each containing four 4 X 4 bit multipliers • Each comprised of a cascaded network of full and half adders, built on logic gates

  9. 16-Bit Multiplier Unit • Based On Hand Multiplication • Made Up of Networkof AND Gatesand Adders

  10. Why 32  16 bit? 32bit x 32bit = 64 bits! Multiple complex changes to existing architecture would be required • Only one register can be written per clock cycle • Could hold value for next cycle or stall the pipeline • Would require pseudoinstruction as well as new hardware and multiple control signals

  11. srli 25, 25, 16 and 24, 22, 30 srli 24, 24, 16 add 24, 24, 25 and 25, 23, 31 add 24, 24, 25 and 22, 24, 30 srli 22, 22, 16 and 21, 23, 30 srli 21, 21, 16 add 6, 21, 22 slli 6, 6, 16 and 24, 24, 31 or 6, 6, 24 mult 20, 2, 4 mult 21, 4, 1 mult 22, 2, 3 mult 23, 1, 3 and 24, 20, 30 srli 24, 24, 16 and 25, 21, 31 add 25, 24, 25 and 24, 22, 31 add 25, 25, 24 and 5, 25, 31 srli 5, 5, 16 and 20, 20, 31 or 5, 5, 20 Use pseudo-code instruction mult32

  12. Improve the Multiplier • Can decrease the latency of a combinational multiplier with carry-look ahead adding methods. • Small amount of extra hardware needed, worth it if multiplier has largest latency.

  13. Other Multiplier Topologies • Shifting multiplication • Shift multiplicand several times based on multiplier bits • Add intermediate shifted values

  14. Other Multiplier Topologies • Pipelined multiplication • Store intermediate sums • Allows for faster clock cycle if traditional combinational multiplication presents the critical path

  15. Other Multiplier Topologies • Pipelined multiplication • Sequential multiplication • Useful to minimize hardware waste if multiplication is an infrequent operation • Continues to allow for faster clock cycle if traditional combinational multiplication presents the critical path

  16. Instruction Set Architecture R-Type

  17. I-Type J-Type

  18. The Assembler • Converts assembly code to binary representation Add $3,$1,$2 => 0000000001000100001100000000000 • 16-bit wide memory modules • Split into high and low bits for output 000000000100010 => High 0001100000000000 => Low

  19. Assembler Features • Allows for labels to be used in loops • Automatically calculates offsets based on label position • LABEL: add $1,$2,$3 • jmp LABEL • Resolves hazards created by pipelining • Automatically determines the appropriate number of NO-OPS to insert based on relative position of consecutive instructions

  20. Design allows for pseudo-instructions to be used Pseudo Instruction HLT Actual Instructions H1: JMP H1 NOP NOP

  21. Topic 2 Design – Compiler • Bison - Parser • A compiler compiler • A grammar generator • ------------------------- • Flex – Lexer • A Fast lexical analyzer • Tool used in pattern matching on text

  22. Compiling The C Language • Interface Lexer and Parser • Lex will feed tokens to Bison (YACC) • A grammar tree is generated

  23. Source code to run-time

  24. A simple C program void main ( void ) { int b ; int d; int x; int y = 3; int g; x = b + d; g = y + x; } Assembly Code Equivalent lwi 4, 0, 3 add 6, 1, 2 sw 3, 6, 0 add 6, 4, 3 sw 5, 6, 0 A simple program • Machine Code Instructions • Memory High • 0 0000110000000100 • 1 0000000000100010 • 2 0000000000000000 • 3 0000000000000000 • 4 0000000000000000 • 5 0000000000000000 • 6 0000000000000000 • 7 0000100011000011 • 8 0000000000000000 • 9 0000000000000000 • 10 0000000000000000 • 11 0000000010000011 • 12 0000000000000000 • 13 0000000000000000 • 14 0000000000000000 • 15 0000000000000000 • 16 0000000000000000 • 17 0000100011000101 • Memory Low • 0 0000000000000011 • 1 0011000000000000 • 2 0000000000000000 • 3 0000000000000000 • 4 0000000000000000 • 5 0000000000000000 • 6 0000000000000000 • 7 0000000000000000 • 8 0000000000000000 • 9 0000000000000000 • 10 0000000000000000 • 11 0011000000000000 • 12 0000000000000000 • 13 0000000000000000 • 14 0000000000000000 • 15 0000000000000000 • 16 0000000000000000 • 17 0000000000000000

  25. Could Use a Little Work • Currently the Processor could use a little work to improve performance. • Decreased memory latency would be largest and most direct improvement to processor. • Must optimize ALU as well as multiplier unit. • All in all, will work but not ready for commercial usage.

  26. References Computer Organization and Design: The Hardware Software Interface (2nd Ed) Patterson, David A. and Hennessy, John L. Morgan Kaufman Publishers, 1997 Introduction to Compilers http://cs.wwc.edu/~aabyan/221_2/PLBOOK/Translation.html Aaby, Anthony A., 1998 The Compiler Design Handbook Srikant, Y. N. and Shankar, Priti CRC Press, 2002

  27. THE END Questions?

More Related