Bits and Bytes

Bits and Bytes COMP 21000 Comp Org & Assembly Lang Topics • Why bits? • Representing information as bits • Binary / Hexadecimal • Byte representations • Numbers • Characters and strings • Instructions • Bit-level manipulations • Boolean algebra • Expressing in C

Bit techniques
INTEGER CODING RULES: Replace the "return" statement in each function with one or more lines of C code that implements the function. Your code must conform to the following style:
    int Funct(arg1, arg2, ...) {
        /* brief description of how your implementation works */
        int var1 = Expr1;
        ...
        int varM = ExprM;

        varJ = ExprJ;
        ...
        varN = ExprN;
        return ExprR;
    }

Bit techniques Each "Expr" is an expression using ONLY the following: 1. Integer constants 0 through 255 (0xFF), inclusive. You are not allowed to use big constants such as 0xffffffff. 2. Function arguments and local variables (no global variables). 3. Unary integer operations ! ~ 4. Binary integer operations & ^ | + << >> Some of the problems restrict the set of allowed operators even further. Each "Expr" may consist of multiple operators. You are not restricted to one operator per line. You are expressly forbidden to: 1. Use any control constructs such as if, do, while, for, switch, etc. 2. Define or use any macros. 3. Define any additional functions in this file. 4. Call any functions. 5. Use any other operations, such as &&, ||, -, or ?: 6. Use any form of casting. 7. Use any data type other than int. This implies that you cannot use arrays, structs, or unions.

Bit techniques
    /* pow2plus1 - returns 2^x + 1, where 0 <= x <= 31 */
    int pow2plus1(int x) {
        /* exploit ability of shifts to compute powers of 2 */
        return (1 << x) + 1;
    }

    /* pow2plus4 - returns 2^x + 4, where 0 <= x <= 31 */
    int pow2plus4(int x) {
        /* exploit ability of shifts to compute powers of 2 */
        int result = (1 << x);
        result += 4;
        return result;
    }

Bit techniques
    /*
     * bitNor - ~(x|y) using only ~ and &
     * Example: bitNor(0x6, 0x5) = 0xFFFFFFF8
     * Legal ops: ~ &
     * Max ops: 8
     * Rating: 1
     */
    int bitNor(int x, int y) {
        /* use DeMorgan's Law: ~(A & B) = (~A) | (~B) and ~(A | B) = (~A) & (~B) */
        return (~x & ~y);
    }

    /*
     * evenBits - return a word with all even-numbered bits set to 1
     * Legal ops: ! ~ & ^ | + << >>
     * Max ops: 8
     * Rating: 1
     */
    int evenBits(void) {
        int byte = 0x55;
        int word = byte | byte<<8;
        return word | word<<16;
    }

Bit techniques
    /*
     * isNegative - return 1 if x < 0, return 0 otherwise
     * Example: isNegative(-1) = 1.
     * Legal ops: ! ~ & ^ | + << >>
     * Max ops: 6
     * Rating: 2
     */
    int isNegative(int x) {
        return (x>>31) & 0x1;
    }

    /*
     * isNotEqual - return 0 if x == y, and 1 otherwise
     * Examples: isNotEqual(5,5) = 0, isNotEqual(4,5) = 1
     * Legal ops: ! ~ & ^ | + << >>
     * Max ops: 6
     * Rating: 2
     */
    int isNotEqual(int x, int y) {
        return !!(x ^ y);  // when not equal, must return 1, not some other number
    }
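One way to sanity-check puzzle solutions like these is to compare them against plain C expressions that use the operators the rules forbid. The sketch below assumes a 32-bit int; check_solutions is a hypothetical helper name, not part of the assignment.

```c
#include <assert.h>
#include <limits.h>

/* The slide solutions, restated so the block compiles on its own. */
int bitNor(int x, int y) { return (~x & ~y); }

int evenBits(void) {
    int byte = 0x55;
    int word = byte | byte << 8;
    return word | word << 16;
}

int isNegative(int x) { return (x >> 31) & 0x1; }

int isNotEqual(int x, int y) { return !!(x ^ y); }

/* Hypothetical helper: returns 1 if every solution agrees with a
 * straightforward reference expression for the inputs x and y. */
int check_solutions(int x, int y) {
    assert(sizeof(int) * CHAR_BIT == 32);   /* assumption: 32-bit int */
    return bitNor(x, y) == ~(x | y)
        && evenBits() == 0x55555555
        && isNegative(x) == (x < 0)
        && isNotEqual(x, y) == (x != y);
}
```

The mask with 0x1 in isNegative matters: whether x >> 31 leaves a single 1 bit or fills the word with copies of the sign bit, the low bit is the sign, so the result is 1 for every negative x.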

Some Other Uses for Bitvectors Representation of small sets Representation of polynomials: • Important for error correcting codes (see later slides) • Arithmetic over finite fields, say GF(2^n) • Example 0x15213 : x^16 + x^14 + x^12 + x^9 + x^4 + x + 1 Representation of graphs (intersection of sys & algo) • A ‘1’ represents the presence of an edge Representation of bitmap images, icons, cursors, … • Exclusive-or cursor patent (see later slides) Representation of Boolean expressions and logic circuits
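As an illustration of the first two uses, here is a minimal sketch of a small set stored in one word, where bit i is 1 exactly when i is a member. The type name bitset32 and the function names are hypothetical, chosen only for this example, and unsigned int is assumed to be 32 bits wide.

```c
#include <assert.h>

typedef unsigned int bitset32;   /* hypothetical name: a subset of {0, ..., 31} */

/* Insert element i by setting bit i. */
bitset32 set_add(bitset32 s, int i) {
    assert(i >= 0 && i < 32);
    return s | (1u << i);
}

/* Delete element i by clearing bit i. */
bitset32 set_remove(bitset32 s, int i) {
    assert(i >= 0 && i < 32);
    return s & ~(1u << i);
}

/* Membership test: 1 if i is in s, 0 otherwise. */
int set_contains(bitset32 s, int i) {
    assert(i >= 0 && i < 32);
    return (int)((s >> i) & 1u);
}

bitset32 set_union(bitset32 a, bitset32 b)        { return a | b; }
bitset32 set_intersection(bitset32 a, bitset32 b) { return a & b; }

/* Symmetric difference. Read as polynomials over GF(2), where bit i is
 * the coefficient of x^i, the same XOR is polynomial addition. */
bitset32 set_symdiff(bitset32 a, bitset32 b)      { return a ^ b; }
```

With this encoding, set_contains(0x15213, 16) is 1 and set_contains(0x15213, 15) is 0, matching the x^16 term and the missing x^15 term of the polynomial on the slide.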

UDP The User Datagram Protocol (UDP) is one of the core members of the Internet protocol suite, the set of network protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol (IP) network without prior communications to set up special transmission channels or data paths. The protocol was designed by David P. Reed in 1980 and formally defined in RFC 768. http://en.wikipedia.org/wiki/User_Datagram_Protocol

Transport Layer UDP: User Datagram Protocol [RFC 768] • “no frills,” “bare bones” Internet transport protocol • “best effort” service: UDP segments may be lost or delivered out of order to the app • connectionless: no handshaking between UDP sender and receiver; each UDP segment is handled independently of others Why is there a UDP? • no connection establishment (TCP: 3-way handshake) • simple: no connection state at sender, receiver (TCP keeps buffers, sequence no., ack, etc.) • small segment header (20 bytes for TCP, 8 for UDP) • no congestion control: UDP can blast away as fast as desired With UDP, applications basically talk directly to IP

Transport Layer UDP checksum Goal: detect “errors” (e.g., flipped bits) in transmitted segment Result: UDP does error detection not error correction 1. may just discard a segment with errors 2. may pass it to the application with a warning.

Transport Layer UDP checksum Goal: detect “errors” (e.g., flipped bits) in transmitted segment Sender: • treat segment contents as a sequence of 16-bit integers • checksum: 1’s complement of the 1’s complement sum of the segment contents • sender puts checksum value into UDP checksum field Receiver: • compute checksum of received segment • check if computed checksum equals checksum field value: NO - error detected; YES - no error detected. But maybe errors nonetheless? More later ….
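The sender and receiver steps can be sketched in C as follows. This is a minimal illustration over an array of 16-bit words, not the UDP routine shown on the later slides; the function names are hypothetical, and the pseudo header and odd-length padding are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1's complement sum of 16-bit words: any carry out of bit 15 is
 * added back into the low end of the running sum. */
uint16_t ones_complement_sum(const uint16_t *words, size_t count) {
    assert(words != NULL || count == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < count; ++i) {
        sum += words[i];
        sum = (sum & 0xFFFFu) + (sum >> 16);   /* end-around carry */
    }
    return (uint16_t)sum;
}

/* Sender: the checksum is the complement of the 1's complement sum. */
uint16_t internet_checksum(const uint16_t *words, size_t count) {
    return (uint16_t)~ones_complement_sum(words, count);
}

/* Receiver: recompute the checksum and compare it with the field value.
 * Returns 1 if no error is detected, 0 if an error is detected. */
int checksum_ok(const uint16_t *words, size_t count, uint16_t field) {
    return internet_checksum(words, count) == field;
}
```

Folding the carry after every addition keeps the running sum within 16 bits, so a 32-bit accumulator is always wide enough in this sketch.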

Transport Layer Internet Checksum Example
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.
sender: add two 16-bit integers

                      1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                      1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    wraparound      1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
    sum               1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
    checksum          0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

The leading 1 in the wraparound row is the carryout; adding it back in gives the sum. The checksum is the complement of the sum.

Transport Layer Internet Checksum Example
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.
receiver: add two 16-bit integers plus the checksum: should get all 1’s (why?)

                      1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                      1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    wraparound      1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
    sum               1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
    checksum          0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
    result            1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Do XOR of sum & checksum; should get all 1’s.
For an explanation of 1’s complement addition see http://mathforum.org/library/drmath/view/54379.html
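The worked example can be replayed in a few lines of C. The two words 1110011001100110 and 1101010101010101 are written as 0xE666 and 0xD555; the function names are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 16-bit values and wrap a carry out of bit 15 around into bit 0. */
uint16_t add_wraparound(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + b;
    return (uint16_t)((sum & 0xFFFFu) + (sum >> 16));
}

/* Sender side: 0xE666 + 0xD555 = 0x1BBBB, wrapped to 0xBBBC,
 * complemented to give the checksum 0x4443. */
uint16_t example_checksum(void) {
    uint16_t sum = add_wraparound(0xE666, 0xD555);
    return (uint16_t)~sum;
}

/* Receiver side: the same two words plus the checksum. */
uint16_t example_receiver_total(void) {
    uint16_t sum = add_wraparound(0xE666, 0xD555);
    uint16_t checksum = example_checksum();
    /* sum and checksum differ in every bit, so their XOR is all 1's ... */
    assert((uint16_t)(sum ^ checksum) == 0xFFFF);
    /* ... and adding them produces no carries, so the total is all 1's too. */
    return add_wraparound(sum, checksum);   /* 0xFFFF */
}
```

This answers the "why?": the checksum is the bitwise complement of the sum, so in each bit position exactly one of the two operands is 1.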

UDP checksum code (part 1)
    // make 16 bit words out of every two adjacent 8 bit words and
    // calculate the sum of all 16 bit words
    typedef unsigned short u16;
    typedef unsigned long u32;

    u16 udp_sum_calc(u16 len_udp, u16 src_addr[], u16 dest_addr[], BOOL padding, u16 buff[])
    {
        u16 prot_udp = 17;   // IP protocol number for UDP
        u16 padd = 0;
        u16 word16;
        u32 sum;

        // Find out if the length of data is even or odd. If odd,
        // add a padding byte = 0 at the end of the packet
        if ((padding & 1) == 1) {
            padd = 1;
            buff[len_udp] = 0;
        }

        // initialize sum to zero
        sum = 0;
See http://www.netfor2.com/udpsum.htm for the full code.

UDP checksum code (part 2)
        // make 16 bit words out of every two adjacent 8 bit words and
        // calculate the sum of all 16 bit words
        // (each array element holds one byte)
        for (int i = 0; i < len_udp + padd; i = i + 2) {
            word16 = ((buff[i] << 8) & 0xFF00) + (buff[i+1] & 0xFF);
            sum = sum + (unsigned long)word16;
        }

        // add the UDP pseudo header, which contains the IP source and destination addresses
        for (int i = 0; i < 4; i = i + 2) {
            word16 = ((src_addr[i] << 8) & 0xFF00) + (src_addr[i+1] & 0xFF);
            sum = sum + word16;
        }
        for (int i = 0; i < 4; i = i + 2) {
            word16 = ((dest_addr[i] << 8) & 0xFF00) + (dest_addr[i+1] & 0xFF);
            sum = sum + word16;
        }
        // overflow bits are added on the next slide
See http://www.netfor2.com/udpsum.htm for the full code.

UDP checksum code (part 3)
        // the protocol number and the length of the UDP packet
        sum = sum + prot_udp + len_udp;

        // keep only the last 16 bits of the 32 bit calculated sum and add the carries.
        // Here's where the carries are added in: it turns out you can save the
        // overflow bits and add them back at the end.
        while (sum >> 16)
            sum = (sum & 0xFFFF) + (sum >> 16);

        // Take the one's complement of sum
        sum = ~sum;

        return ((u16) sum);
    }
See http://www.netfor2.com/udpsum.htm for the full code.

XOR patent dispute
    for (y=0; y < y_height; ++y) {
        int loc = (y+y_off)*stride + x_off;
        vidmem[loc] ^= 255; /* XOR */
    }

    // another way of doing this:
    for (y=0; y < y_height; ++y) {
        int loc = (y+y_off)*stride + x_off;
        for (x=0; x < 8; ++x)
            if (vidmem[loc] & (1 << x))
                vidmem[loc] &= ~(1 << x);
            else
                vidmem[loc] |= (1 << x);
    }
Closely related is a patent which apparently covers results, not process. The exclusive-or-cursor patent is a simple example of this. There are a number of mathematically equivalent ways of expressing the xor algorithm, but it appears to be widely believed that all of them are covered by the patent. The patent apparently is understood to cover the idea of representing an onscreen cursor by complementing each "visible" pixel of the cursor with the contents of whatever is on screen. This is somewhat surprising, since it is predated by hardware implementations that do the exact same thing for text-only modes--rendering a block or underline cursor so its appearance is the same as above.
See http://nothings.org/computer/patents.html for a full discussion
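What makes XOR attractive for a cursor is that applying it twice restores the original pixels, so the screen contents never need to be saved. The sketch below wraps both loops from the slide in functions; the function names and the byte-array stand-in for video memory are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Complement one byte (8 one-bit pixels) per row, using XOR. */
void xor_cursor(unsigned char *vidmem, int stride,
                int x_off, int y_off, int y_height) {
    assert(vidmem != NULL && stride > 0);
    for (int y = 0; y < y_height; ++y) {
        int loc = (y + y_off) * stride + x_off;
        vidmem[loc] ^= 255;
    }
}

/* Same effect, written bit by bit: clear each set bit, set each clear bit. */
void toggle_cursor_bitwise(unsigned char *vidmem, int stride,
                           int x_off, int y_off, int y_height) {
    assert(vidmem != NULL && stride > 0);
    for (int y = 0; y < y_height; ++y) {
        int loc = (y + y_off) * stride + x_off;
        for (int x = 0; x < 8; ++x) {
            if (vidmem[loc] & (1 << x))
                vidmem[loc] = (unsigned char)(vidmem[loc] & ~(1 << x));
            else
                vidmem[loc] = (unsigned char)(vidmem[loc] | (1 << x));
        }
    }
}
```

Calling either function once draws the cursor and calling it again with the same arguments erases it, since (p ^ 255) ^ 255 == p for every byte p.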

More Bitvector Magic
Count the number of 1’s in a word. MIT Hackmem 169:
    int bitcount(unsigned int n) {
        unsigned int tmp;
        tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
        return ((tmp + (tmp >> 3)) & 030707070707) % 63;
    }
MIT Hackmem Count. The main idea.
1. Consider a 3-bit unsigned number as being 4a+2b+c. Example: consider the three-bit number 101. Then the unsigned number represents 1 x 2^2 + 0 x 2^1 + 1 x 2^0 = 1 x 4 + 0 x 2 + 1. Let “a” represent the leftmost bit (the 1), “b” the next bit (the 0), and “c” the rightmost bit (the rightmost 1). Then we get a x 4 + b x 2 + c, or 4a+2b+c.
2. If we shift the number right 1 bit, we get 010. If we follow the convention of using “abc” for the bits, then after the (logical) shift we get 0ab, or 0 x 2^2 + a x 2^1 + b x 2^0 = 2a+b.
3. Subtracting this shifted number (2a+b) from the original number (4a+2b+c) gives 2a+b+c.
4. If we right-shift the original 3-bit number by two bits, we go from “abc” to “00a”, which is just a x 2^0 = a.
5. If we subtract this new shifted number (a) from the result of step 3, we get 2a+b+c - a = a+b+c.
6. Since a, b, and c represent the bits (1, 0, and 1 in our example), a+b+c is just the number of 1 bits in the original number!
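The 3-bit identity is small enough to check exhaustively. A minimal sketch follows; count3 is a hypothetical name used only here.

```c
#include <assert.h>

/* Count the 1 bits in a 3-bit value n = 4a + 2b + c:
 *   n >> 1 is 2a + b, n >> 2 is a,
 *   so n - (n >> 1) - (n >> 2) = (4a+2b+c) - (2a+b) - a = a + b + c. */
unsigned int count3(unsigned int n) {
    assert(n < 8);   /* only defined for 3-bit inputs */
    return n - (n >> 1) - (n >> 2);
}
```

For the slide's example, count3(5) evaluates 5 - 2 - 1 = 2, the number of 1 bits in 101.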

More Bitvector Magic
Count the number of 1’s in a word. MIT Hackmem 169:
    int bitcount(unsigned int n) {
        unsigned int tmp;
        tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
        return ((tmp + (tmp >> 3)) & 030707070707) % 63;
    }
MIT Hackmem Count. (continued) How is this insight employed in the code?
1. The key insight is that we’re actually going to consider the parameter n to be composed of 11 groups of 3 bits (3 x 11 = 33, or one more bit than we actually have in a 32-bit number).
2. So we’re going to apply the algorithm from the previous page to each three-bit group.
3. Consider the bits 101 111 010. To apply the previous algorithm to each group we want to compute:
   a. (101 - 010 - 001) (111 - 011 - 001) (010 - 001 - 000)
   b. If we shift n right by one we get: 010 111 101. Note that the shifted number in each position has a “wrong” first bit (it’s the bit shifted in from the previous group) but the correct last two bits.
   c. We know that the first bit of each group must be 0, since within a group we’re doing a logical shift right.
   d. So we can take the result of shifting and mask out the first bit in each group.
   e. Since each group can be represented by an octal digit, the mask for each group will be 3 in octal (011). This explains the expression ((n >> 1) & 033333333333). Note that a number beginning with “0” in C is taken as an octal number.
   f. A similar explanation can be made for the next part of the expression, i.e., to shift each group twice, shift n right twice and mask with 001 (or octal 1) for each group.
Now each octal digit in tmp represents the number of 1’s in the original number at that position. In our example, the 9 bits become 010 011 001 (octal 231): two 1’s, three 1’s, and one 1.
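The per-group step can be run on the 9-bit example directly. In this sketch group_counts is the first statement of the routine pulled out into its own function, and octal_digit is a hypothetical helper for reading one 3-bit group; a 32-bit unsigned int is assumed.

```c
#include <assert.h>

/* First statement of the HAKMEM routine: afterwards every 3-bit group
 * of the result holds the number of 1 bits in the same group of n. */
unsigned int group_counts(unsigned int n) {
    return n - ((n >> 1) & 033333333333u) - ((n >> 2) & 011111111111u);
}

/* Hypothetical helper: octal digit k of v, where k = 0 is the rightmost group. */
unsigned int octal_digit(unsigned int v, int k) {
    assert(k >= 0 && k <= 10);
    return (v >> (3 * k)) & 7u;
}
```

The bits 101 111 010 are octal 572, and group_counts(0572) returns octal 231: the groups contain two, three, and one 1 bits respectively.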

More Bitvector Magic
Count the number of 1’s in a word. MIT Hackmem 169:
    int bitcount(unsigned int n) {
        unsigned int tmp;
        tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
        return ((tmp + (tmp >> 3)) & 030707070707) % 63;
    }
MIT Hackmem Count. (continued) The return statement sums these octal digits to produce the final answer. The key idea is to add adjacent pairs of octal digits together and then compute the remainder modulo 63. This is done by:
1. Right-shifting tmp by three bits,
2. adding the result to tmp itself, and
3. ANDing with a suitable mask. This step keeps every other group of 3 bits. We want this because we’ve added each 3-bit group to its right neighbor group; if we kept every group we would have counted each group twice.
4. The result is a number in which groups of six adjacent bits (starting from the LSB) contain the number of 1's among those six positions in n.
5. This number modulo 63 yields the final answer. Each 6-bit group is a base-64 digit, and since 64 leaves remainder 1 when divided by 63, the remainder of the whole number modulo 63 is the sum of its base-64 digits, which is at most 32 here.
For 64-bit numbers, we would have to add triples of octal digits and use modulus 1023. This is HACKMEM 169, as used in X11 sources. Source: MIT AI Lab memo, late 1970's.
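A simple way to gain confidence in the routine is to compare it with a bit-at-a-time loop. In the sketch below bitcount is the slide's code unchanged, while bitcount_naive and bitcount_agrees are hypothetical helpers added for the comparison; the octal masks assume a 32-bit unsigned int.

```c
#include <assert.h>
#include <limits.h>

/* HAKMEM 169, as on the slide. */
int bitcount(unsigned int n) {
    unsigned int tmp;
    tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
    return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}

/* Reference implementation: examine one bit per iteration. */
int bitcount_naive(unsigned int n) {
    int count = 0;
    while (n != 0) {
        count += (int)(n & 1u);
        n >>= 1;
    }
    return count;
}

/* Returns 1 if both implementations agree on n. */
int bitcount_agrees(unsigned int n) {
    assert(sizeof(unsigned int) * CHAR_BIT == 32);   /* assumption: 32-bit word */
    return bitcount(n) == bitcount_naive(n);
}
```

For example, bitcount(0x15213) is 7, one for each term of the polynomial x^16 + x^14 + x^12 + x^9 + x^4 + x + 1 from the earlier slide.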

Summary of the Main Points It’s All About Bits & Bytes • Numbers • Programs • Text Different Machines Follow Different Conventions for • Word size • Byte ordering • Representations Boolean Algebra is the Mathematical Basis • Basic form encodes “false” as 0, “true” as 1 • General form like bit-level operations in C • Good for representing & manipulating sets
