Bits and Bytes

Bits and Bytes COMP 21000 Comp Org & Assembly Lang Topics • Why bits? • Representing information as bits • Binary / Hexadecimal • Byte representations • Numbers • Characters and strings • Instructions • Bit-level manipulations • Boolean algebra • Expressing in C

How then do we represent data? • Each different “type” in a programming language has its own representation. • Next few slides discuss the representation for the common types: • Ints • Pointers • Floats • Strings

Alpha C Linux/Alpha A Linux C Linux/Alpha B Sun B Sun C Sun A 00 FF 00 93 6D 6D 6D 00 FF 00 3B C4 3B 3B 3B C4 3B 00 FF 00 00 93 6D 6D 00 00 FF 00 00 00 00 00 Linux/Win/OS X: little endian Solaris: big endian Representing Integers Decimal: 15213 Binary: 0011 1011 0110 1101 Hex: 3 B 6 D int A = 15213; int B = -15213; long int C = 15213; Two’s complement representation (Covered next lecture)

D4 EF A0 FF F8 FC FF FB FF BF 2C FF 01 00 00 00 Linux/Win/OS X: little endian Solaris: big endian Alpha P Representing Pointers int B = -15213; int *P = &B; Alpha Address Hex: 1 F F F F F C A 0 Binary: 0001 1111 1111 1111 1111 1111 1100 1010 0000 Sun P Sun Address Hex: E F F F F B 2 C Binary: 1110 1111 1111 1111 1111 1011 0010 1100 Linux P Linux Address Hex: B F F F F 8 D 4 Binary: 1011 1111 1111 1111 1111 1000 1101 0100 Different compilers & machines assign different locations to objects

Linux/W in/OS X F Solaris F 46 00 6D IEEE Single Precision Floating Point Representation Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000 15213 1110 1101 1011 01 (in binary) B4 B4 6D 00 46 Representing Floats Float F = 15213.0; IEEE Single Precision Floating Point Representation Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000 15213: 1110 1101 1011 01 IEEE Single Precision Floating Point Representation Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000 15213: 1110 1101 1011 01 Not same as integer representation, but consistent across machines Can see some relation to integer representation, but not obvious

31 31 35 35 32 32 31 31 33 33 00 00 Representing Strings char S[6] = "15213"; Strings in C • Represented by array of characters • Each character encoded in ASCII format • Standard 7-bit encoding of character set • Character “0” has code 0x30 • Digit i has code 0x30+i • String should be null-terminated • Final character = 0 Compatibility • Byte ordering not an issue • Text files generally platform independent • Except for different conventions of line termination character(s)! • Unix (‘\n’ = 0x0a = ^J) • Mac (‘\r’ = 0x0d = ^M) • DOS and HTTP (‘\r\n’ = 0x0d0a = ^M^J) Linux/Win/ OS X S Solaris S

ASCII Chart ASCII ‘0’ is 0x30 Reference: http://jimprice.com/jim-asc.htm

Characters Other popular encoding standards: • Unicode • 16 bit code • Allows more characters: most languages can be encoded • See http://en.wikipedia.org/wiki/Unicode • The most commonly used encodings are UTF-8, UTF-16 and the now-obsolete UCS-2 • UTF-8 • uses one byte for any ASCII character, • These bytes have the same code values in both UTF-8 and ASCII encoding, • and up to four bytes for other characters • ISO Latin-8 • 8-bit encoding standard with dotted consonants • International standard for Latin letters

Machine-Level Code Representation Encode Program as Sequence of Instructions • Each instruction is a simple operation • Arithmetic operation • Read or write memory • Conditional branch • Instructions encoded as bytes • Alpha’s, Sun’s, Mac’s use 4 byte instructions • Reduced Instruction Set Computer (RISC) • PC’s (and intel Mac’s) use variable length instructions • Complex Instruction Set Computer (CISC) • Different instruction types and encodings for different machines • Most code is not binary compatible Programs are Byte Sequences Too!

Alpha sum Solaris sum PC sum 55 81 01 00 08 90 45 89 C3 00 80 0C 02 89 E5 E0 FA 03 00 EC 30 08 8B 45 6B 09 42 5D C3 Representing Instructions int sum(int x, int y) { return x+y; } • For this example, Alpha & Sun use two 4-byte instructions • Use differing numbers of instructions in other cases • PC uses 7 instructions with lengths 1, 2, and 3 bytes • Same for NT, OSX (intel) and for Linux • NT / Linux/OSX not fully binary compatible Different processors use totally different instructions and encodings

What do we know now? A computer is a processor and RAM. • Everything else is peripheral Everything is 1’s and 0’s. • A processor executes instructions • Instructions are encoded in 1’s and 0’s • each instruction has binary representation • RAM stores 1’s and 0’s • We interpret these 1’s and 0’s based on context • Every type is interpreted from binary • integers: binary • float: special representation in binary • char: binary (ASCII encoding)

How do we get to bits? We write programs in an abstract language like C How do we get to the binary representation? Compilers & Assemblers! We write x = 5. The compiler changes it into mov 5, 0x005F The assembler changes it into: 80483c7: 8b 45 5F

Manipulating bits Since all information is in the form of bits, we must manipulate bits when programming at low levels • We did some of this with the show_bytes program. • To understand how to do more, we must understand Boolean algebra • We can manipulate bits from C; this is where C and hardware meet.

And • A&B = 1 when both A=1 and B=1 Or • A|B = 1 when either A=1 or B=1 Not • ~A = 1 when A=0 Exclusive-Or (Xor) • A^B = 1 when either A=1 or B=1, but not both Boolean Algebra Developed by George Boole in 19th Century • Algebraic representation of logic • Encode “True” as 1 and “False” as 0

A&~B ~A&B A ~B ~A B Application of Boolean Algebra Applied to Digital Systems by Claude Shannon • 1937 MIT Master’s Thesis • Reason about networks of relay switches • Encode closed switch as 1, open switch as 0 Connection when A&~B | ~A&B = A^B

Representing & Manipulating Sets Representation • Width w bit vector represents subsets of {0, …, w–1} • aj = 1 if jA 01101001{ 0, 3, 5, 6 } 76543210 01010101{ 0, 2, 4, 6 } 76543210 Operations • & Intersection 01000001 { 0, 6 } • | Union 01111101 { 0, 2, 3, 4, 5, 6 } • ^ Symmetric difference 00111100 { 2, 3, 4, 5 } • ~ Complement 10101010 { 1, 3, 5, 7 }

General Boolean Algebras Operate on Bit Vectors • Operations applied bitwise • Let x = 105 and y = 85 (think about how this is actually stored) x & y: x | y: x ^ y ~y All of the Properties of Boolean Algebra Apply 01101001 & 01010101 01000001 01101001 | 01010101 01111101 01101001 ^ 01010101 00111100 ~ 01010101 10101010 01000001 01111101 00111100 10101010

Bit-Level Operations in C Operations &, |, ~, ^ Available in C • Apply to any “integral” data type • long, int, short, char, unsigned • View arguments as bit vectors • Arguments applied bit-wise Examples (Char data type) • ~0x41 --> 0xBE ~010000012 --> 101111102 • ~0x00 --> 0xFF ~000000002 --> 111111112 • 0x69 & 0x55 --> 0x41 011010012 & 010101012 --> 010000012 • 0x69 | 0x55 --> 0x7D 011010012 | 010101012 --> 011111012

Contrast: Logic Operations in C Contrast to Logical Operators • &&, ||, ! • View 0 as “False” • Anything nonzero as “True” • Always return 0 or 1 • Early termination Examples (char data type) • c = ‘A’ then !c is !0x41 --> 0x00 • c = ‘\0’ then !c is !0x00 --> 0x01 • c = ‘A’ the !!c is !!0x41 --> 0x01 • 0x69 && 0x55 --> 0x01 • 0x69 || 0x55 --> 0x01 • p && *p++ (avoids null pointer access) • Or p && (*p = 10) also protects Works because of short-circuit evaluation in C

Bits and Bytes

Bits and Bytes

Presentation Transcript

Bits and Bytes!

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes

Bits and Bytes