Chapter 3 Numeral System and Data Representation

Chapter 3 Numeral System and Data Representation. Department of Electronic Engineering, National United University, 蕭裕弘

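Chapter Goals • Introduce different numeral systems • Explain how to convert between numeral systems • Introduce binary arithmetic • Explain the difference between analog and digital signals • Introduce the numeral systems and encodings commonly used in computer systems • Introduce the data representations commonly used in computer systems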

1. Numeral Systems • A numeral is a symbol or group of symbols that represents a number. • 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 • I, II, III, IV, V, VI, VII, VIII, IX, X, ... • A numeral system (or system of numeration) is a framework in which a set of numbers is represented by numerals in a consistent manner. • Number system: • A set of objects on which arithmetic operations can be performed. • E.g.: the real numbers, the rational numbers

Types of Numeral Systems - 1 • The unary numeral system • Every natural number is represented by a corresponding number of symbols. • E.g.: If the symbol $ is chosen, then the number seven would be represented by $$$$$$$. • The unary notation can be abbreviated by introducing different symbols for certain new values. • E.g.: if $ stands for one, % for ten and # for 100, then the number 304 can be compactly represented as ###$$$$ and number 123 as #%%$$$.

Types of Numeral Systems - 2 • The positional system: • A system in which each position has a value represented by a unique symbol or character. • The resultant value of each position is the value of its character multiplied by a power of the base of that numeral system. • The position of each character or symbol (usually called a digit), counting from the right, determines the power of the base by which that digit is multiplied.

Decimal Numeral System • Decimal is the base 10 numeral system: • The symbols 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 are used • The decimal point • The sign symbols + (plus) and − (minus) • $2674 = (2 \times 10^{3}) + (6 \times 10^{2}) + (7 \times 10^{1}) + (4 \times 10^{0})$ • $12.345 = (1 \times 10^{1}) + (2 \times 10^{0}) + (3 \times 10^{-1}) + (4 \times 10^{-2}) + (5 \times 10^{-3})$

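K Base Numeral System • The symbols 0, 1, 2, ..., K-1 are used. • $N_K = (d_p d_{p-1} \cdots d_1 d_0 . d_{-1} d_{-2} \cdots d_{-(q-1)} d_{-q})_K$ • $N_{10} = (d_p \times K^{p}) + (d_{p-1} \times K^{p-1}) + \cdots + (d_1 \times K^{1}) + (d_0 \times K^{0}) + (d_{-1} \times K^{-1}) + (d_{-2} \times K^{-2}) + \cdots + (d_{-(q-1)} \times K^{-(q-1)}) + (d_{-q} \times K^{-q})$ • $d_p$: most significant digit • $d_{-q}$: least significant digit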
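
The expansion above can be sketched in Python as follows. The function name `positional_value` and the choice to pass the digits as two lists (integer part and fractional part) are assumptions made for illustration only.

```python
def positional_value(int_digits, frac_digits, base):
    """Evaluate d_p ... d_0 . d_-1 ... d_-q in the given base K."""
    value = 0.0
    p = len(int_digits) - 1
    # Integer part: digit i (from the left) has weight K^(p - i).
    for i, d in enumerate(int_digits):
        value += d * base ** (p - i)
    # Fractional part: the j-th digit after the point has weight K^(-j).
    for j, d in enumerate(frac_digits, start=1):
        value += d * base ** (-j)
    return value


print(positional_value([2, 6, 7, 4], [], 10))   # decimal 2674
print(positional_value([1, 0], [1, 1], 2))      # binary 10.11
print(positional_value([7, 2, 3], [], 8))       # octal 723
```

Floating-point arithmetic is used here, so fractional results in bases other than 2 may carry small rounding errors.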

Binary Numeral System • The binary numeral system is a system for representing numbers in which a radix of two is used; that is, each digit in a binary numeral may have either of two different values. • Typically, the symbols 0 and 1 are used to represent binary numbers. • Owing to its relatively straightforward implementation in electronic circuitry, the binary system is used internally by virtually all modern computers.

The Octal and Hexadecimal Numeral Systems

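2. Convert Binary to and from Decimal System • Binary → Decimal • $10110_2 = 1 \times 2^{4} + 1 \times 2^{2} + 1 \times 2^{1} = 22_{10}$ • $10.11_2 = 1 \times 2^{1} + 1 \times 2^{-1} + 1 \times 2^{-2} = 2.75_{10}$ • Decimal → Binary • Integer part, by repeated division by 2: 22 ÷ 2 = 11 remainder 0; 11 ÷ 2 = 5 remainder 1; 5 ÷ 2 = 2 remainder 1; 2 ÷ 2 = 1 remainder 0; 1 ÷ 2 = 0 remainder 1. Reading the remainders from last to first gives $22_{10} = 10110_2$. • Fractional part, by repeated multiplication by 2: 0.75 × 2 = 1.50 (digit 1); 0.50 × 2 = 1.00 (digit 1). So $0.75_{10} = 0.11_2$.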

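Convert Octal to and from Decimal System • Octal → Decimal • $723_8 = 7 \times 8^{2} + 2 \times 8^{1} + 3 \times 8^{0} = 467_{10}$ • $7.23_8 = 7 \times 8^{0} + 2 \times 8^{-1} + 3 \times 8^{-2} = 7.296875_{10}$ • Decimal → Octal • Integer part, by repeated division by 8: 467 ÷ 8 = 58 remainder 3; 58 ÷ 8 = 7 remainder 2; 7 ÷ 8 = 0 remainder 7. Reading the remainders from last to first gives $467_{10} = 723_8$. • Fractional part, by repeated multiplication by 8: 0.3125 × 8 = 2.5000 (digit 2); 0.5000 × 8 = 4.0000 (digit 4). So $0.3125_{10} = 0.24_8$.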

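Convert Hexadecimal to and from Decimal System • Hexadecimal → Decimal • $AB_{16} = A \times 16^{1} + B \times 16^{0} = 171_{10}$ • $A.8_{16} = A \times 16^{0} + 8 \times 16^{-1} = 10.5_{10}$ • Decimal → Hexadecimal • By repeated division by 16: 171 ÷ 16 = 10 remainder 11 (B); 10 ÷ 16 = 0 remainder 10 (A). Reading the remainders from last to first gives $171_{10} = AB_{16}$.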
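
The repeated-division and repeated-multiplication procedures used in the conversion slides can be sketched in Python as below. The helper names `int_to_base` and `frac_to_base`, and the `max_digits` cut-off for fractions that never terminate, are assumptions made for this illustration.

```python
DIGITS = "0123456789ABCDEF"


def int_to_base(n, base):
    """Convert a non-negative integer by repeated division."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    # The remainders come out least significant digit first.
    return "".join(reversed(out))


def frac_to_base(x, base, max_digits=12):
    """Convert a fraction 0 <= x < 1 by repeated multiplication."""
    out = []
    while x > 0 and len(out) < max_digits:
        x *= base
        digit = int(x)          # the integer part is the next digit
        out.append(DIGITS[digit])
        x -= digit
    return "".join(out)


print(int_to_base(22, 2), frac_to_base(0.75, 2))
print(int_to_base(467, 8), frac_to_base(0.3125, 8))
print(int_to_base(171, 16))
```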

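Conversion among Base 2, 8, 16 • Octal → Binary (each octal digit becomes 3 bits) • $5762.13_8 = 101\ 111\ 110\ 010.001\ 011_2$ • Binary → Octal (group bits in threes from the point outward, padding with zeros) • $11\ 010\ 111.101\ 1_2 = 011\ 010\ 111.101\ 100_2 = 327.54_8$ • Hexadecimal → Binary (each hexadecimal digit becomes 4 bits) • $E8C4.B_{16} = 1110\ 1000\ 1100\ 0100.1011_2$ • Binary → Hexadecimal (group bits in fours from the point outward, padding with zeros) • $10\ 1101\ 0111\ 1010.1110\ 01_2 = 2D7A.E4_{16}$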
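
A minimal Python sketch of the grouping technique follows. The function name `regroup` and its string-based interface are assumptions for illustration. It pads the integer part on the left and the fractional part on the right, then converts each group of bits to one digit.

```python
DIGITS = "0123456789ABCDEF"


def regroup(binary, group_size):
    """Convert a binary string to base 2**group_size (3 -> octal, 4 -> hex)."""
    int_part, _, frac_part = binary.partition(".")
    # Round each length up to a multiple of group_size.
    int_len = -(-len(int_part) // group_size) * group_size
    frac_len = -(-len(frac_part) // group_size) * group_size
    int_part = int_part.zfill(int_len)          # pad on the left
    frac_part = frac_part.ljust(frac_len, "0")  # pad on the right

    def convert(chunk):
        return "".join(
            DIGITS[int(chunk[i:i + group_size], 2)]
            for i in range(0, len(chunk), group_size)
        )

    result = convert(int_part)
    if frac_part:
        result += "." + convert(frac_part)
    return result


print(regroup("11010111.1011", 3))
print(regroup("10110101111010.111001", 4))
```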

3. Binary Arithmetic - Addition • 0 + 0 = 0 • 0 + 1 = 1 • 1 + 0 = 1 • 1 + 1 = 10 (the 1 is carried) • Example: $01101_2 + 10111_2 = 100100_2$ (13 + 23 = 36) • Example: $1.01_2 + 0.11_2 = 10.00_2$ (1.25 + 0.75 = 2.00)

Binary Arithmetic - Subtraction • 0 - 0 = 0 • 0 - 1 = 1 (with borrow) • 1 - 0 = 1 • 1 - 1 = 0 • Example: $1.101_2 - 0.011_2 = 1.010_2$ (1.625 - 0.375 = 1.250) • Example: $1101110_2 - 10111_2 = 1010111_2$ (110 - 23 = 87)

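Binary Arithmetic - Multiplication • 0 * 0 = 0 • 0 * 1 = 0 • 1 * 0 = 0 • 1 * 1 = 1 • Example: $1010_2 \times 10_2$: partial products 0000 and 1010 (shifted left one place) sum to $10100_2$ (10 × 2 = 20) • Example: $1.01_2 \times 10_2$: partial products 000 and 101 (shifted left one place) sum to $10.10_2$ (1.25 × 2 = 2.50)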

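Binary Arithmetic - Division • Example: $11101001_2 \div 1001_2 = 11001_2$ remainder $1000_2$ (233 ÷ 9 = 25 remainder 8) • Long division steps: 1110 - 1001 = 0101 (quotient bit 1); bring down 1 to get 1011, 1011 - 1001 = 0010 (quotient bit 1); bring down 0 to get 0100 (quotient bit 0); bring down 0 to get 1000 (quotient bit 0); bring down 1 to get 10001, 10001 - 1001 = 1000 (quotient bit 1).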
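
The worked examples from the arithmetic slides can be cross-checked with Python's built-in integer conversion. This is only a verification sketch using `int(text, 2)` and `bin`, not an implementation of the column-by-column algorithms. The variable names are illustrative.

```python
# Parse the binary operands used in the slides.
total = int("01101", 2) + int("10111", 2)         # 13 + 23
difference = int("1101110", 2) - int("10111", 2)  # 110 - 23
product = int("1010", 2) * int("10", 2)           # 10 * 2
quotient, remainder = divmod(int("11101001", 2), int("1001", 2))  # 233 / 9

# bin() prefixes its result with "0b".
print(bin(total), bin(difference), bin(product))
print(bin(quotient), bin(remainder))
```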

4. Analog and Digital Information • Analog signal • A signal that has a continuous nature rather than a pulsed or discrete nature. • Digital signal • A signal in which discrete steps are used to represent information.

Advantages and Disadvantages of Digitization • The advantages of digitization • reliable high-speed signal transmission • quality duplication • easy manipulation and processing • The primary disadvantage of digital signals is their large size resulting in high-storage requirements.

Analog-to-Digital Conversion • The continuous signal is usually sampled at regular intervals by an analog-to-digital converter (ADC), and the value of the continuous signal in that interval is represented by a discrete value. This step is called sampling.

Why Do We Use Binary? • Modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they only have to represent one of two possible values (1 or 0, on or off).

Data and Computer • Computers are multimedia devices, dealing with many categories of information: • Numbers • Text • Images and graphics • Audio • Video

5. Representing Integer Data • In computer science, the term integer is used to refer to any data type which can represent some subset of the mathematical integers. • The most common representation of a positive integer is a string of bits, using the binary numeral system. • An 8-bit pattern xxxx xxxx (x: 0 or 1) can represent the values 0 to $2^{8} - 1 = 255$. • Four different ways to represent negative numbers in a binary numeral system: • Signed-magnitude • One's complement • Two's complement • Excess N

Signed-Magnitude Representation • In mathematics, a negative number in any base is written in the usual way, by prefixing it with a "-" sign. On a computer, however, there is no single way of representing a number's sign. • A first approach is to allocate one bit to represent the sign: • Set that bit (often the most significant bit) to 0 for a positive number. • Set that bit to 1 for a negative number. • The remaining bits in the number indicate the (positive) magnitude. • Examples (8 bits, sign bit first): 0111 1111 → +127 • 0000 0000 → +0 • 1000 0000 → -0 • 1111 1111 → -127 • Range for N bits: $-2^{N-1} + 1$ to $2^{N-1} - 1$ • Disadvantages: 1. There are two zeros, +0 and -0. 2. X - Y cannot be computed simply as X + (-Y) with ordinary binary addition.

One's Complement Representation • The 1's complement representation in binary of a positive integer is no different from the sign-magnitude representation of that integer. • The 1's complement in binary of a negative integer is obtained by subtracting its magnitude from $2^{n} - 1$, where n is the number of bits used to store the integer in binary. • Examples (8 bits): 0111 1111 → +127 • 0000 0000 → +0 • 1111 1111 → -0 • 1000 0000 → -127 • Range for N bits: $-2^{N-1} + 1$ to $2^{N-1} - 1$ • Convert -36 in a byte to 1's complement form: • Step 1: convert the magnitude of the integer to binary: $+36_{10} = 0010\ 0100_2$ • Step 2: subtract it from $1111\ 1111_2$ ($2^{8} - 1$): 1111 1111 - 0010 0100 = 1101 1011
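
The sign-magnitude and one's complement encodings can be sketched in Python as below. The function names and the default width of 8 bits are assumptions for illustration. Neither function checks that the value fits in the given width.

```python
def sign_magnitude(value, bits=8):
    """Sign bit followed by the magnitude in the remaining bits."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")


def ones_complement(value, bits=8):
    """Negative values are stored as (2^bits - 1) minus the magnitude."""
    if value >= 0:
        return format(value, f"0{bits}b")
    return format((2 ** bits - 1) - abs(value), f"0{bits}b")


print(sign_magnitude(-127), ones_complement(-127))
print(ones_complement(-36))
```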

Two's Complement Representation - 1 • With two's complement notation, all integers are represented using a fixed number of bits, with the leftmost bit given a negative weight. • E.g.: • $1001\ 0010_2 = -1 \times 2^{7} + 1 \times 2^{4} + 1 \times 2^{1} = -128 + 16 + 2 = -110_{10}$ • $1000\ 0000_2 = -1 \times 2^{7} = -128_{10}$ • $1111\ 1111_2 = -1_{10}$ • Examples (8 bits): 0111 1111 → +127 • 0111 1110 → +126 • ... • 0000 0010 → +2 • 0000 0001 → +1 • 0000 0000 → +0 • 1000 0000 → -128 • 1000 0001 → -127 • ... • 1111 1110 → -2 • 1111 1111 → -1 • Range for N bits: $-2^{N-1}$ to $2^{N-1} - 1$

Advantages of Two's Complement Representation • It is easy to negate any integer: simply complement each bit and add 1 to the result. • The leftmost bit tells you if the integer is positive (0) or negative (1). • The normal rules used in the addition of (unsigned) binary integers still work (throw away any bit carried out of the leftmost position), so only an adder circuit is needed to perform both addition and subtraction. • Convert -36 in a byte to 2's complement form: • Step 1: convert the magnitude of the integer to binary: $+36_{10} = 0010\ 0100_2$ • Step 2: complement each bit: 0010 0100 → 1101 1011 • Step 3: add 1 to the result: 1101 1011 + 1 = 1101 1100
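
The complement-and-add-1 procedure, and the negative weight of the leftmost bit, can be sketched in Python as follows. The function names and the default width of 8 bits are assumptions for illustration.

```python
def twos_complement(value, bits=8):
    """Encode an integer by complementing the magnitude and adding 1."""
    mask = (1 << bits) - 1
    if value >= 0:
        return format(value, f"0{bits}b")
    inverted = (-value) ^ mask            # complement each bit
    return format((inverted + 1) & mask, f"0{bits}b")


def decode_twos_complement(bit_string):
    """Decode by giving the leftmost bit the weight -2^(bits - 1)."""
    bits = len(bit_string)
    raw = int(bit_string, 2)
    return raw - (1 << bits) if bit_string[0] == "1" else raw


print(twos_complement(-36))
print(decode_twos_complement("10010010"))
```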

Excess-N Representation • This representation is primarily used for the exponent of floating-point numbers. • It uses a specific number N as a bias. Under excess-N, a standard number representation is shifted so that the number 0 is represented by the binary pattern for N. • For example, the Excess-3 representation with 3 bits is: 000 → -3 • 001 → -2 • 010 → -1 • 011 → 0 • 100 → +1 • 101 → +2 • 110 → +3 • 111 → +4
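
A minimal sketch of excess-N encoding in Python, assuming the stored pattern is simply the unsigned binary form of value + N. The function names are illustrative.

```python
def excess_n_encode(value, n, bits):
    """Store value + n as an unsigned pattern of the given width."""
    stored = value + n
    if not 0 <= stored < 2 ** bits:
        raise ValueError("value out of range for this width and bias")
    return format(stored, f"0{bits}b")


def excess_n_decode(bit_string, n):
    """Recover the value by subtracting the bias n."""
    return int(bit_string, 2) - n


# Excess-3 with 3 bits covers -3 .. +4.
for v in range(-3, 5):
    print(v, excess_n_encode(v, 3, 3))
```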

Comparison of Different Representations

Addition and Subtraction with Two's Complement • X - Y = X + (-Y) • Addition, 5 + (-5): • +5 → 0000 0101 • -5 → 1111 1010 + 1 → 1111 1011 • 0000 0101 (+5) + 1111 1011 (-5) = 1 0000 0000; discard the carry out of the leftmost position to get 0000 0000 (0) • Subtraction, 35 - 15 = 35 + (-15): • +35 → 0010 0011 • +15 → 0000 1111 • -15 → 1111 0000 + 1 → 1111 0001 • 0010 0011 (+35) + 1111 0001 (-15) = 1 0001 0100; discard the carry out of the leftmost position to get 0001 0100 (20)

Common Integral Data Types

Arithmetic Overflow • In a digital computer, the condition that occurs when a calculation produces a result that is larger than a given register or storage location can store or represent. • E.g.: in 8-bit 2's complement representation • Positive + Positive → Negative: 0111 1111 (+127) + 0000 0001 (+1) = 1000 0000 (-128) • Negative + Negative → Positive: 1000 0011 (-125) + 1000 0001 (-127) = 1 0000 0100; after the carry is discarded the result is 0000 0100 (+4)
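
Overflow in two's complement addition can be detected by comparing sign bits: if both operands have the same sign and the result has the opposite sign, the true sum did not fit. The sketch below assumes 8-bit operands given as bit strings. The function name `add_8bit` is illustrative.

```python
def add_8bit(a_bits, b_bits):
    """Add two 8-bit two's complement patterns.

    Returns (result_bits, overflow_flag).
    """
    raw = int(a_bits, 2) + int(b_bits, 2)
    result_bits = format(raw & 0xFF, "08b")   # discard the carry out
    same_sign_inputs = a_bits[0] == b_bits[0]
    overflow = same_sign_inputs and result_bits[0] != a_bits[0]
    return result_bits, overflow


print(add_8bit("01111111", "00000001"))  # +127 + 1
print(add_8bit("10000011", "10000001"))  # -125 + -127
print(add_8bit("00100011", "11110001"))  # +35 + -15
```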

6. Other Numeral Systems - 1 • Binary coded decimal (BCD): weighted code • 2421 code: weighted code, self-complementing code

Other Numeral Systems - 2 • 84-2-1 code: weighted code, self-complementing code • Biquinary code

Other Numeral Systems - 3 • Gray code • A code assigning to each of a contiguous set of integers, or to each member of a circular list, a word of symbols such that each two adjacent code words differ by one symbol. • There can be more than one Gray code for a given word length, but the term was first applied to a particular binary code for the non-negative integers, the binary-reflected Gray code or BRGC. • Recursive construction: $G_1 = \{0, 1\}$ and $G_{n+1} = \{0G_n, 1G_n^{ref}\}$ for $n \geq 1$, where $G_n^{ref}$ is $G_n$ in reverse order. • $G_2$: 00 = 0, 01 = 1, 11 = 2, 10 = 3 • $G_3$: 000 = 0, 001 = 1, 011 = 2, 010 = 3, 110 = 4, 111 = 5, 101 = 6, 100 = 7
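
The reflect-and-prefix construction of the binary-reflected Gray code can be sketched in Python as below. The function name `gray_code` is illustrative. The closing loop checks the defining property that adjacent code words differ in exactly one bit.

```python
def gray_code(n):
    """Return the n-bit binary-reflected Gray code as a list of strings."""
    if n == 1:
        return ["0", "1"]
    prev = gray_code(n - 1)
    # Prefix 0 to the previous code, then prefix 1 to its reflection.
    return ["0" + w for w in prev] + ["1" + w for w in reversed(prev)]


code = gray_code(3)
print(code)

# Adjacent words (including last -> first) differ in exactly one position.
for a, b in zip(code, code[1:] + code[:1]):
    assert sum(x != y for x, y in zip(a, b)) == 1
```

For the binary-reflected code, the i-th word also equals `i ^ (i >> 1)` written in binary, which avoids the recursion.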

7. Floating-Point Representations • A floating-point number is a digital representation for a number in a certain subset of the rational numbers, and is often used to approximate an arbitrary real number on a computer. • In particular, it represents an integer or fixed-point number (the significand or, informally, the mantissa) multiplied by a base (usually 2 in computers) to some integer power (the exponent). • When the base is 2, it is the binary analog of scientific notation (in base 10). • A floating-point number a can be represented by two numbers m and e, such that $a = m \times b^{e}$. • m is a p-digit number of the form ±d.ddd...ddd (each digit being an integer between 0 and b−1 inclusive). • If the leading digit of m is non-zero, then the number is said to be normalized. • Some descriptions use a separate sign bit (s, which represents −1 or +1) and require m to be positive.

IEEE Floating-Point Standard (IEEE 754) • The IEEE floating-point standard (IEEE 754) is an IEEE standard, used by many CPUs and FPUs, which • defines formats for representing floating-point numbers; • representations of special values (i.e., zero, infinity, very small values (denormal numbers), and bit combinations that don't represent a number (NaN)); • five exceptions, when they occur, and what happens when they do occur; • four rounding modes; • a set of floating-point operations that will work identically on any conforming system. • IEEE 754 specifies four formats for representing floating-point values: • single-precision (32-bit) • double-precision (64-bit) • single-extended precision (>= 43-bit, not commonly used) • double-extended precision (>= 79-bit, usually implemented with 80 bits). • Only 32-bit values are required by the standard, the others are optional.

IEEE 754 – Single-Precision • A binary floating-point number is stored in a 32-bit word. • Layout: sign S (1 bit, bit 31), exponent e (8 bits, bits 30 to 23), mantissa or fraction f (23 bits, bits 22 to 0) • For normalised numbers, Value = $s \times m \times 2^{e-127}$, where s = 1 if S = 0, s = -1 if S = 1, and m = 1.f. • The set of possible data values can be divided into the following classes: • Zeroes: Exp: 0, Fraction: 0 • Normalised numbers: Exp: 1-254 (bias 127), Fraction: any • Denormalised numbers: Exp: 0, Fraction: non zero • Infinities: Exp: 255, Fraction: 0 • NaN (Not a Number): Exp: 255, Fraction: non zero

IEEE 754 – Single-Precision - Examples • $10.5_{10} = 1010.1_2 = 1.0101_2 \times 2^{3}$: S = 0 (+), Exponent = 1000 0010 (130, since 3 = 130 - 127), Mantissa = 0101 0000 0000 0000 0000 000 • $-0.5_{10} = -0.1_2 = -1.0000_2 \times 2^{-1}$: S = 1 (-), Exponent = 0111 1110 (126, since -1 = 126 - 127), Mantissa = 0000 0000 0000 0000 0000 000
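
The single-precision fields of these examples can be inspected with Python's standard `struct` module, which packs a float into its 32-bit IEEE 754 form. The helper names `float32_fields` and `decode_normalised` are assumptions for illustration. The decoder handles normalised numbers only.

```python
import struct


def float32_fields(x):
    """Return (sign, exponent, fraction) bit strings of x as a 32-bit float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


def decode_normalised(sign, exponent, fraction):
    """Apply value = s * 1.f * 2^(e - 127) for a normalised number."""
    s = -1 if sign == "1" else 1
    m = 1 + int(fraction, 2) / 2 ** 23
    return s * m * 2 ** (int(exponent, 2) - 127)


print(float32_fields(10.5))
print(float32_fields(-0.5))
print(decode_normalised(*float32_fields(10.5)))
```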

IEEE 754 – Double-Precision • Layout: sign S (1 bit, bit 63), exponent e (11 bits, bits 62 to 52), mantissa or fraction f (52 bits, bits 51 to 0) • Value = $s \times m \times 2^{e-1023}$, where s = 1 if S = 0, s = -1 if S = 1, and m = 1.f.

Problems with Floating-Point • Floating-point numbers usually behave very similarly to the real numbers they are used to approximate. However, this can easily lead programmers into over-confidently ignoring the need for numerical analysis. • Errors in floating-point computation can include: • Rounding • Non-representable numbers: for example, the literal 0.1 cannot be represented exactly by a binary floating-point number • Rounding of arithmetic operations: for example 2/3 might yield 0.6666667 • Absorption: $1 \times 10^{15} + 1 = 1 \times 10^{15}$ (for example, in single precision) • Cancellation: subtraction between nearly equal operands • Overflow, which usually yields an infinity • Underflow • Invalid operations (such as an attempt to calculate the square root of a non-zero negative number). Invalid operations yield a result of NaN (not a number).
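
Several of these effects can be observed directly in Python, whose `float` type is IEEE 754 double precision on common platforms. Because double precision carries about 16 significant decimal digits, the absorption example below uses $10^{16}$ rather than $10^{15}$. Note also that Python's `math.sqrt` raises `ValueError` for a negative argument instead of returning NaN, so the invalid operation shown here is infinity minus infinity.

```python
import math

# Non-representable numbers: 0.1, 0.2 and 0.3 are all approximations.
not_exact = (0.1 + 0.2 == 0.3)

# Absorption: the small operand is lost entirely.
absorbed = (1e16 + 1 == 1e16)

# Cancellation: the difference exposes the earlier rounding error.
cancelled = (1.0 + 1e-15) - 1.0

# Overflow yields an infinity.
overflowed = 1e308 * 10

# Underflow: the result is too small to represent and becomes zero.
underflowed = 1e-320 / 1e10

# Invalid operation yields NaN.
invalid = float("inf") - float("inf")

print(not_exact, absorbed, cancelled)
print(overflowed, underflowed, invalid)
```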

8. The Hierarchy of Data Organization • Bit: 0 or 1 • Character: A (ASCII = 65) • Data field: John • Data record: John, 20, Male • File: John, 20, Male; Mary, 21, Female • Database: File1, file2, …

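Bits and Bytes • Bit • In a digital computer system, all data is made up of groups of bits. Each bit has the value 0 or 1. • Bit is short for binary digit. • Byte or character • Because a single bit can only represent 0 or 1, 8 bits are grouped into one byte to make data easier for people to remember and communicate, and the byte is used as the basic unit of data processing. • So that bit patterns can represent data that people understand, various encoding systems have been designed to map bit patterns to characters. • Common character encodings mostly use a whole number of bytes, such as one byte ($2^{8} = 256$) or two bytes ($2^{16} = 65536$).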

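From Data Field to Database • Data field • The smallest logical unit that gives data a meaning, made up of several characters or bytes. • Examples: a name field, an age field, a gender field. • Data record • A unit of data made up of several related data fields that describes one event or item. • Example: a student record made up of a name field, an age field and a gender field. • File • A unit of data made up of several related data records. • Example: a class file made up of the records of the students in the same class. • Database • A unit of data made up of related files. Logical relationships between the files are established by various techniques.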

Common Units for Storage Devices • Byte: 8 bits • KB: Kilobyte = $2^{10}$ bytes = 1024 bytes (KiB) • MB: Megabyte = $2^{20}$ bytes = 1,048,576 bytes (MiB) • GB: Gigabyte = $2^{30}$ bytes = 1,073,741,824 bytes (GiB) • TB: Terabyte = $2^{40}$ bytes = 1,099,511,627,776 bytes (TiB) • PB: Petabyte = $2^{50}$ bytes = 1,125,899,906,842,624 bytes (PiB) • EB: Exabyte = $2^{60}$ bytes = 1,152,921,504,606,846,976 bytes (EiB) • Order of size: Byte → KB → MB → GB → TB → PB → EB • Word: A group of one or more bytes.

9. Representing Text • To represent a text document in digital form, we need to be able to represent every possible character that may appear. • There is a finite number of characters to represent, so the general approach is to list them all and assign each a binary string. • A character set is a particular mapping between characters and binary strings.

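Common Character Encoding Systems • ASCII (American Standard Code for Information Interchange) • Code length: 8 bits (originally 7 bits) • Contents: the English letters, Arabic numerals and symbols found on a keyboard, plus some control characters. • Big-5 • Code length: 16 bits • Contents: commonly used Chinese characters and symbols. • Unicode • Code length: 16 bits (in the original design) • Contents: the written symbols in common use around the world.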

ASCII Examples • Control characters (0 ~ 31, 127) • Printable characters (32 ~ 126)

Big-5 Code • Big-5 code • An encoding system for processing Traditional Chinese characters on computers • Uses 16 bits: a high byte and a low byte

Unicode • The Unicode Standard is the universal character encoding standard used for representation of text for computer processing. • The original goal was to use a single 16-bit encoding that provides code points for more than 65,000 characters. • The Unicode Standard defines codes for characters used in all the major languages written today.
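
The mapping between characters and binary strings can be explored with Python's built-in codecs. This is only a sketch: it uses the `ascii`, `big5` and `utf-16-be` codec names from Python's standard library, and the sample characters are chosen for illustration.

```python
# ASCII: one byte per character; 'A' has code 65.
ascii_code = ord("A")
ascii_bytes = "A".encode("ascii")

# Big-5: a Traditional Chinese character occupies two bytes.
big5_bytes = "聯".encode("big5")

# Unicode in a 16-bit encoding form: two bytes for each of these characters.
utf16_latin = "A".encode("utf-16-be")
utf16_han = "聯".encode("utf-16-be")

print(ascii_code, ascii_bytes)
print(len(big5_bytes), len(utf16_latin), len(utf16_han))
```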
