Créer une présentation
Télécharger la présentation

Télécharger la présentation
## Chapter 4: Arithmetic for Computers (Part 1)

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Chapter 4: Arithmetic for Computers(Part 1)**CS 447 Jason Bakos**Notes on Project 1**• There are two different ways the following two words can be stored in a computer memory… • word1 .byte 0,1,2,3 • word2 .half 0,1 • One way is big-endian, where the word is stored in memory in its original order… • word1: • word2: • Another way is little-endian, where the word is stored in memory in reverse order… • word1: • word2: • Of course, this affects the way in which the lw instruction works…**Notes on Project 1**• MIPS uses the endian-style that the architecture underneath it uses • Intel uses little-endian, so we need to deal with that • This affects assignment 1 because the input data is stored as a series of bytes • If you use lw’s on your data set, the values will be loaded into your dest. register in reverse order • Hint: Try the lb/sb instruction • This instruction will load/store a byte from an unaligned address and perform the translation for you**Notes on Project 1**• Hint: Use SPIM’s breakpoint and single-step features to help debug your program • Also, make sure you use the registers and memory/stack displays • Hint: You may want to temporarily store your input set into a word array for sorting • Make sure you check Appendix A for additional useful instructions that I didn’t cover in class • Make sure you comment your code!**Goals of Chapter 4**• Data representation • Hardware mechanisms for performing arithmetic on data • Hardware implications on the instruction set design**Review of Binary Representation**• Binary/Hex -> Decimal conversion • Decimal -> Binary/Hex conversion • Least/Most significant bits • Highest representable number/maximum number of unique representable symbols • Two’s compliment representation • One’s compliment • Finding signed number ranges (-2n-1 to 2n-1-1) • Doing arithmetic with two’s compliment • Sign extending with load half/byte • Unsigned loads • Signed/unsigned comparison**Binary Addition/Subtraction**• Binary subtraction works exactly like addition, except the second operand is converted to two’s compliment • Overflow in signed arithmetic occurs under the following conditions:**What Happens When Overflow Occurs?**• MIPS detects overflow with an exception/interrupt • When an interrupt occurs, a branch occurs to code in the kernel at address 80000080 where special registers (BadVAddr, Status, Cause, and EPC) are used to handle the interrupt • SPIM has a simple interrupt handler built-in that deals with interrupts • We may come back to interrupts later**Review of Shift and Logical Operations**• MIPS has operations for SLL, SRL, and SRA • We covered this in the last chapter • MIPS implements bit-wise AND, OR, and XOR logical operations • These operations perform a bit-by-bit parallel logical operation on two registers • In C, use << and >> for arithmetic shifts, and &, |, ^, and ~ for bitwise and, or, xor, and NOT, respectively**Review of Logic Operations**• The three main parts of a CPU • ALU (Arithmetic and Logic Unit) • Performs all logical, arithmetic, and shift operations • CU (Control Unit) • Controls the CPU – performs load/store, branch, and instruction fetch • Registers • Physical storage locations for data**Review of Logic Operations**• In this chapter, our goal is to learn how the ALU is implemented • The ALU is entirely constructed using boolean functions as hardware building blocks • The 3 basic digital logic building blocks can be used to construct any digital logic system: AND, OR, and NOT • These functions can be directly implemented using electric circuits (wires and transistors)**Review of Logic Operations**• These “combinational” logic devices can be assembled to create a much more complex digital logic system**Review of Logic Operations**• We need another device to build an ALU… • This is called a multiplexor… it implements an if-then-else in hardware**A 1-bit ALU**• Perform logic operations in parellel and mux the output • Next, we want to include addition, so let’s build a single-bit adder • Called a full adder**Full Adder**• From the following table, we can construct the circuit for a full adder and link multiple full adders together to form a multi-bit adder • We can also add this input to our ALU • How do we give subtraction ability to our adder? • How do we detect overflow and zero results?**Chapter 4: Arithmetic for Computers(Part 2)**CS 447 Jason Bakos**Logic/Arithmetic**• From the truth table for the mux, we can use sum-of-products to derive the logic equation • With sum-of-products, for each ‘1’ row for each output, we AND together all the inputs (inverting the input 0’s), then OR all the row products • To make it simpler, let’s add “don’t cares” to the table…**Logic/Arithmetic**• This gives us the following equation • (A and (not D)) or (B and D) • We don’t need the inputs for the “don’t cares” in our partial products • This is one way to simplify our logic equation • Other ways include propositional calculus, Karnaugh Maps, and the Quine-McCluskey algorithm**Logic/Arithmetic**• Here is a (crude) digital logic design for the 2-to-1 mux • Note that multiple muxes can be assembled in stages to implement multiple-input muxes**Logic/Arithmetic**• For the adder, let’s minimize the logic using a Karnaugh Map… • For CarryOut, we need 23 entries… • We can minimize this to • CarryOut=AB+CarryInB+CarryInC**Logic/Arithmetic**• There’s no way to minimize this equation, so we need the full sum of products: • Sum=(not A)(not B)CarryIn + ABCarryIn + (not A)BCarryIn + A(not B)CarryIn**Logic/Arithmetic**• In order to implement subtraction, we can invert the B input to the adder and set CarryIn to be 1 • This can be implemented with a mux: select B or not B (call this input Binvert) • Now we can build a 1-bit ALU using an AND, OR, addition, and subtraction operation • We can perform the AND, OR, and ADD in parallel and switch the results with a 4-input mux (Operation will be our D-input) • To make the adder a subtractor, we’ll need to have to set Binvert and CarryIn to 1**Lecture 4: Arithmetic for Computers(Part 3)**CS 447 Jason Bakos**Chapter 4 Review**• So far, we’ve covered the following topics for this chapter • Binary representation of signed integers • 16 to 32 bit signed conversion • Binary addition/subtraction • Overflow detection/overflow exception handling • Shift and logical operations • Parts of the CPU • AND, OR, XOR, and inverter gates • Multiplexor (mux) and full adder • Sum-of-products logic equations (truth tables) • Logic minimization techniques • Don’t cares and Karnaugh Maps**1-bit ALU Design**• A 1-bit ALU can be constructed • Components • AND, OR, and adder • 4-to-1 mux • “Binverter” (inverter and 2-to-1 mux) • Interface • Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less • Outputs: CarryOut and Result • Digital functions are performed in parallel and the outputs are routed into a mux • The mux will also accept a Less input which we’ll accept from outside the 1-bit ALU • The select lines of the mux make up the “operation” input to the ALU**32-bit ALU**• In order to create a multi-bit ALU, array 32 1-bit ALUs • Connect the CarryOut of each bit to the CarryIn of the next bit • A and B of each 1-bit ALU will be connected to each successive bit of the 32-bit A and B • The Result outputs of each 1-bit ALU will form the 32-bit result • We need to add an SLT unit and connect the output to the least significant 1-bit ALU’s Less input • Hardwire the other “Less” inputs to 0 • We need to add an Overflow unit • We need to add a Zero detection unit**SLT Unit**• To compute SLT, we need to make sure that when the 1-bit ALU’s Operation is set to 11, a subtract operation is also being computed • With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and Result**Overflow Unit**• When doing signed arithmetic, we need to follow this table, as we covered previously… • How do we implement this in hardware?**Overflow Unit**• We need a truth table… • Since we’ll be computing the logic equation with SOP, we only need the rows where the output is 1**Zero Detection Unit**• “Or” together all the 1-bit ALU outputs – the result is the Zero output to the ALU**32-bit ALU Operation**• We need a 3-bit ALU Operation input into our 32-bit ALU • The two least significant bits can be routed into all the 1-bit ALUs internally • The most significant bit can be routed into the least significant 1-bit ALU’s CarryIn, and to Binvert of all the 1-bit ALUs**32-bit ALU Operation**• Here’s the final ALU Operation table:**32-bit ALU**• In the end, our ALU will have the following interface: • Inputs: • A and B (32 bits each) • ALU Operation (3 bits) • Outputs: • CarryOut (1 bit) • Zero (1 bit) • Result (32 bits) • Overflow (1 bit)**Carry Lookahead**• The adder architecture we previously looked at requires n*2 gate delays to compute its result (worst case) • The longest path that a digital signal must propagate through is called the “critical path” • This is WAAAYYYY too slow! • There other ways to build an adder that require lg n delay • Obviously, using SOP, we can build a circuit that will compute ANY function in 2 gate delays (2 levels of logic) • Obviously, in the case of a 64-input system, the resulting design will be too big and too complex**Carry Lookahead**• For example, we can easily see that the CarryIn for bit 1 is computed as: • c1=(a0b0)+(a0c0)+(b0c0) • c2=(a1b1)+(a1c1)+(b1c1) • Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays • c2=(a1b1)+(a1a0b0)+(a1a0c0)+(a1b0c0)+(b1a0b0)+(b1a0c0)+(b1b0c0) • I used the logical distributive law to compute this • As you can see, the CarryIn logic gets bigger and bigger for consecutive bits**Carry Lookahead**• Carry Lookahead adders are faster than ripple-carry adders • Recall: • ci+1=(aibi)+(aici)+(bici) • ci can be factored out… • ci+1=(aibi)+(ai+bi)ci • So… • c2=(a1b1)+(a1+b1)((a0b0)+(a0+b0)c0)**Carry Lookahead**• Note the repeated appearance of (aibi) and (ai+bi) • They are called generate (gi) and propagate (pi) • gi=aibi, pi=ai+bi • ci+1=gi+pici • This means if gi=1, a CarryOut is generated • If pi=1, a CarryOut is propagated from CarryIn**Carry Lookahead**• c1=g0+(p0c0) • c2=g1+(p1g0)+(p1p0c0) • c3=g2+(p2g1)+(p2p1g0)+(p2p1p0c0) • c4=g3+(p3g2)+(p3p2g1)+(p3p2p1g0)+(p3p2p1p0c0) • …This system will give us an adder with 5 gate delays but it is still too complex**Carry Lookahead**• To solve this, we’ll build our adder using 4-bit adders with carry lookahead, and connect them using “super”-propagate and generate logic • The superpropagate is only true if all the bits propagate a carry • P0=p0p1p2p3 • P1=p4p5p6p7 • P2=p8p9p10p11 • P3=p12p13p14p15**Carry Lookahead**• The supergenerate follows a similar equation: • G0=g3+(p3g2)+(p2p2g1)+(p3p2p1g0) • G1=g7+(p7g6)+(p7p6g5)+(p7p6p5g4) • G2=g11+(p11g10)+(p11p10g9)+(p11p10p9g8) • G3=g15+(p15g14)+(p15p14g13)+(p15p14p13g12) • The supergenerate and superpropagate logic for the 4-4 bit Carry Lookahead adders is contained in a Carry Lookahead Unit • This yields a worst-case delay of 7 gate delays • Reason?**Carry Lookahead**• We’ve covered all ALU functions except for the shifter • We’ll talk after the shifter later**Lecture 4: Arithmetic for Computers(Part 4)**CS 447 Jason Bakos**Binary Multiplication**• In multiplication, the first operand is called the multiplicand, and the second is called the multiplier • The result is called the product • Not counting the sign bits, if we multiply an n-bit multiplicand with a m-bit multiplier, we’ll get a n+m-bit product**Binary Multiplication**• Binary multiplication works exactly like decimal multiplication • In fact, multiply 100101 by 111001 and pretend you’re using decimal numbers**First Hardware Design for Multiplier**Note that the multiplier is not routed into the ALU**Second Hardware Design for Multiplier**• Architects realized that at the least, half of the bits in the multiplicand register were 0 • Reduce ALU to 32 bits, shift the product right instead of shifting the multiplicand left • In this case, the product is only 32 bits**Final Hardware Design for Multiplier**• Let’s combine the product register with the multiplier register… • Put the multiplier in the right half of the product register and initialize the left half with zeros – when we’re done, the product will be in the right half**Final Hardware Design for Multiplier**• For the first two designs, we need to convert the multiplicand and the multiplier must be converted to positive • The signs would need to be remembered so the product can be converted to whatever sign it needs to be • The third design will deal with signed numbers, as long as the sign bit is extended in the product register