Data Representation

Chapter 2 Data Representation (School of Software, Tianjin University)

2.1 DATA TYPES
Data today come in different forms such as numbers, text, images, audio, and video. People need to process all these data types. The computer industry uses the term “multimedia” to describe information that contains numbers, text, images, audio, and video.

Analog and Digital Information
• Computers are finite. Computer memory and other hardware devices have only so much room to store and manipulate data. The goal of data representation is to represent enough of the world to satisfy our computational needs and our senses of sight and sound.

• Information can be represented in one of two ways: analog or digital.
Analog data: a continuous representation, analogous to the actual information it represents.
Digital data: a discrete representation that breaks the information up into separate elements.

A mercury thermometer exemplifies analog data: the mercury continually rises and falls in direct proportion to the temperature. A digital display shows only discrete information.

2.2 DATA INSIDE THE COMPUTER
All data types from outside a computer are transformed into a uniform representation when stored in a computer and then transformed back when leaving the computer. This universal format is called a bit pattern.

BIT
A bit (binary digit) is the smallest unit of data that can be stored in a computer; it is either 0 or 1.

BIT PATTERN
A bit pattern is a sequence (sometimes called a string) of bits that can represent a symbol.

BYTE
A bit pattern of length 8 is called a byte.

Examples of bit patterns

2.3 REPRESENTING DATA
TEXT
A piece of text in any language is a sequence of symbols used to represent an idea in that language. You can represent each symbol with a bit pattern. In other words, text such as “BYTE”, which is made of four symbols, can be represented as four bit patterns, each pattern defining a single symbol.

How many bits are needed in a bit pattern to represent a symbol in a language? The length of the bit pattern that represents a symbol depends on the number of symbols used in that language. More symbols mean a longer bit pattern. The relationship is not linear; it is logarithmic. If you need n symbols, the length is log2(n) bits, rounded up to the next whole number when n is not a power of 2.

Number of Symbols    Bit Pattern Length
-----------------    ------------------
2                    1
4                    2
8                    3
16                   4
...                  ...
128                  7
256                  8
...                  ...
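The relationship between the number of symbols and the bit pattern length can be checked with a few lines of Python. This is a minimal sketch; the function name bits_needed is an illustrative choice and does not come from the slides.

```python
def bits_needed(num_symbols):
    """Smallest bit-pattern length that can distinguish num_symbols symbols."""
    if num_symbols < 1:
        raise ValueError("need at least one symbol")
    # For n >= 1, (n - 1).bit_length() equals log2(n) rounded up,
    # computed with integers only (no floating-point rounding issues).
    return (num_symbols - 1).bit_length()


for n in (2, 4, 8, 16, 128, 256):
    print(n, "symbols ->", bits_needed(n), "bits")
```

A symbol count that is not a power of 2 rounds up: 26 symbols, for example, need 5 bits, because 4 bits give only 16 patterns.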

Codes
Different sets of bit patterns have been designed to represent text symbols. Each set is called a code, and the process of representing symbols is called coding.

ASCII
The American National Standards Institute (ANSI) developed a code called the American Standard Code for Information Interchange (ASCII). This code uses 7 bits for each symbol, which means 128 (2^7) different symbols can be defined by this code.

ASCII CODE: American Standard Code for Information Interchange
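As an illustration of coding, the sketch below turns each symbol of a piece of text into its 7-bit ASCII pattern. It relies on Python's built-in ord, which returns the ASCII code for ASCII characters; the function name to_ascii_bits is an assumption made for this example.

```python
def to_ascii_bits(text):
    """Return one 7-bit ASCII pattern (as a string of 0s and 1s) per symbol."""
    patterns = []
    for symbol in text:
        code = ord(symbol)
        if code > 127:
            raise ValueError("not an ASCII symbol: " + repr(symbol))
        patterns.append(format(code, "07b"))  # pad to 7 bits
    return patterns


print(to_ascii_bits("BYTE"))
# ['1000010', '1011001', '1010100', '1000101']
```

The four symbols of “BYTE” become four bit patterns, one per symbol.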

The Unicode Character Set (a unified character encoding standard; its original design used two bytes per character)
Figure 3.6 A few characters in the Unicode character set

AUDIO
Audio is by nature analog data: it is continuous, not discrete. To store audio, it is converted to digital data through sampling, quantization, and coding, and the resulting bit patterns are stored. Common audio file formats include WAV, AU, AIFF, VQF, and MP3.
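The three steps (sampling, quantization, coding) can be sketched as follows. This is a simplified illustration, not how any particular audio format works: it assumes the signal is a function of time that returns values between -1.0 and 1.0, and the function name, sample rate, and bit depth are all hypothetical choices.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bits_per_sample):
    """Return one bit pattern per sample of signal(t), with signal(t) in [-1.0, 1.0]."""
    levels = 2 ** bits_per_sample                # number of quantization levels
    num_samples = int(duration_s * sample_rate_hz)
    patterns = []
    for i in range(num_samples):
        t = i / sample_rate_hz                   # sampling: measure at fixed time steps
        value = max(-1.0, min(1.0, signal(t)))   # keep the reading inside the expected range
        level = round((value + 1.0) / 2.0 * (levels - 1))  # quantization: nearest level
        patterns.append(format(level, "0{}b".format(bits_per_sample)))  # coding
    return patterns


# A hypothetical 1 Hz sine tone, sampled 8 times per second with 4 bits per sample
tone = lambda t: math.sin(2 * math.pi * t)
print(sample_and_quantize(tone, 1.0, 8, 4))
```

A higher sample rate and more bits per sample follow the original wave more closely, at the cost of more storage.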

IMAGES
Images today are represented in a computer by one of two methods: bitmap graphic or vector graphic.

Bitmap Graphic
In this method, an image is divided into a matrix of pixels (picture elements), where each pixel is a small dot. The size of the pixel depends on what is called the resolution. After dividing an image into pixels, each pixel is assigned a bit pattern. The size and the value of the pattern depend on the image.

To represent color images, each colored pixel is decomposed into three primary colors: red, green, and blue (RGB). Then the intensity of each color is measured, and a bit pattern (usually 8 bits) is assigned to it. In other words, each pixel has three bit patterns: one to represent the intensity of the red color, one to represent the intensity of the green color, and one to represent the intensity of the blue color. Common bitmap file formats include BMP, GIF, JPEG, PNG, TIFF, XBM, and PCX.
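A small sketch of the RGB idea: each intensity is written as an 8-bit pattern, so one pixel takes three patterns (24 bits). The function name and the 0 to 255 intensity scale are assumptions that follow from the "usually 8 bits" figure above.

```python
def pixel_to_bit_patterns(red, green, blue):
    """Return the three 8-bit patterns (red, green, blue) for one pixel."""
    for intensity in (red, green, blue):
        if not 0 <= intensity <= 255:
            raise ValueError("an 8-bit intensity must be between 0 and 255")
    return tuple(format(intensity, "08b") for intensity in (red, green, blue))


print(pixel_to_bit_patterns(255, 0, 128))
# ('11111111', '00000000', '10000000')
```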

Digitized Images

Vector Graphic
The vector graphic method does not store a bit pattern for each pixel. Instead, an image is decomposed into a combination of curves and lines, and each curve or line is represented by a mathematical formula. For example, a line may be described by the coordinates of its endpoints, and a circle may be described by the coordinates of its center and the length of its radius. The combination of these formulas is stored in a computer. When the image is to be displayed or printed, the size of the image is given to the system as an input. The system redesigns the image with the new size and uses the same formula to draw the image. In this case, each time an image is drawn, the formula is reevaluated. Examples of vector formats include WMF, PICT, EPS, SVG, SWF, and TrueType fonts.

Representing Video
To simulate motion, movies need to record (and play back) at least 12 frames per second. However, good sound quality requires 24 frames/s.
24 frames/s = 1,440 frames/minute = 86,400 frames/hour
If each frame has a resolution of 1024 x 768, there are 786,432 pixels in a frame. If the colour of each pixel is stored as 24 bits (3 bytes) of data, one frame alone requires 2,359,296 bytes (2.25 MB, taking 1 MB as 1,048,576 bytes) of memory. An hour of film then requires 203,843,174,400 bytes (194,400 MB, or nearly 190 gigabytes) of storage, just for the images.
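The storage arithmetic for uncompressed video can be reproduced with a short calculation. This is a sketch; the function name and parameter names are illustrative.

```python
def video_storage_bytes(width, height, bits_per_pixel, frames_per_second, seconds):
    """Bytes needed to store the frames of an uncompressed video (images only)."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second * seconds


one_frame = video_storage_bytes(1024, 768, 24, 1, 1)
one_hour = video_storage_bytes(1024, 768, 24, 24, 3600)
print(one_frame)            # 2359296
print(one_hour)             # 203843174400
print(one_hour / 2 ** 30)   # size in units of 2^30 bytes, a little under 190
```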

Data Compression
• It is important that we find ways to store and transmit data efficiently, which leads computer scientists to find ways to compress it.
• Data compression is a reduction in the amount of space needed to store a piece of data.
• Compression ratio is the size of the compressed data divided by the size of the original data.

• A data compression technique can be
  • lossless, which means the data can be retrieved without any loss of the original information, or
  • lossy, which means some information may be lost in the process of compaction.
• As examples, consider these three techniques:
  • keyword encoding
  • run-length encoding
  • Huffman encoding
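To make the idea of lossless compression concrete, here is a minimal sketch of run-length encoding together with the compression ratio defined earlier. The slides do not specify an encoding format, so storing each run as a (symbol, count) pair is an assumption of this example, as are the function names.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated symbol with a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original text from (symbol, run length) pairs."""
    return "".join(symbol * count for symbol, count in pairs)


def compression_ratio(compressed_size, original_size):
    """Size of the compressed data divided by the size of the original data."""
    return compressed_size / original_size


encoded = rle_encode("AAAABBBCCD")
print(encoded)                             # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded) == "AAAABBBCCD")  # True: nothing was lost
print(compression_ratio(25, 100))           # 0.25
```

Decoding returns exactly the original text, which is what makes the technique lossless. Run-length encoding only helps when the data contain long runs; on text with few repeats, the encoded form can be larger than the original.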

SUMMARY
• Numbers, text, images, audio, and video are all forms of data. Computers need to process all types of data.
• All data types are transformed into a uniform representation called a bit pattern for processing by computers.
• A bit is the smallest unit of data that can be stored in a computer.
• A bit pattern is a sequence of bits that can represent a symbol.
• A byte is 8 bits.

SUMMARY (continued)
• Coding is the process of transforming data into a bit pattern.
• ASCII is a popular code for symbols.
• Images use the bitmap graphic or vector graphic method for data representation. In the bitmap method, the image is broken up into pixels, which can then be assigned bit patterns.
• Audio data are transformed to bit patterns through sampling, quantization, and coding.
• Video data are a set of sequential images.

EXERCISES
• 2-1; 2-2
• 2-11; 2-12; 2-13; 2-14; 2-15
• 2-23; 2-24; 2-25; 2-26; 2-27
• 2-34; 2-35; 2-36; 2-37; 2-38; 2-39

Chapter 3 Number Representation (School of Software, Tianjin University)

Number System
• The decimal system is based on 10 and uses the digits 0-9.
• The binary system is based on 2 and uses the digits 0-1.
• Octal notation is based on 8 and uses the digits 0-7.
• Hexadecimal notation is based on 16 and uses the digits 0-9 and A-F.

3.1 DECIMAL AND BINARY
Two numbering systems are dominant today in the world of computers: decimal and binary.

DECIMAL SYSTEM
The decimal system is based on 10. There are ten digits in the decimal system, 0 through 9.

BINARY SYSTEM The binary system is based on 2. There are only two digits in the binary system, 0 and 1.

OCTAL NOTATION
Octal notation is based on 8. This means there are 8 symbols: 0, 1, 2, 3, 4, 5, 6, 7.

Bit Pattern    Oct Digit
-----------    ---------
000            0
001            1
010            2
011            3
100            4
101            5
110            6
111            7

Binary to octal and octal to binary transformation
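The binary-to-octal and octal-to-binary transformation works on groups of three bits, one group per octal digit. The sketch below assumes bit patterns are given as strings of 0s and 1s; the function names are illustrative.

```python
def bits_to_octal(bits):
    """Group the bits in threes (from the right) and map each group to an octal digit."""
    bits = "0" * ((-len(bits)) % 3) + bits   # pad on the left to a multiple of 3
    groups = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return "".join(str(int(group, 2)) for group in groups)


def octal_to_bits(octal_digits):
    """Replace each octal digit with its 3-bit pattern."""
    return "".join(format(int(digit, 8), "03b") for digit in octal_digits)


print(bits_to_octal("101110010"))  # 562
print(octal_to_bits("562"))        # 101110010
```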

HEXADECIMAL NOTATION
Hexadecimal notation is based on 16 (hexadec is Greek for 16). This means there are 16 symbols (hexadecimal digits): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. Each hexadecimal digit can represent 4 bits, and 4 bits can be represented by one hexadecimal digit.

Bit Pattern    Hex Digit
-----------    ---------
0000           0
0001           1
0010           2
0011           3
0100           4
0101           5
0110           6
0111           7
1000           8
1001           9
1010           A
1011           B
1100           C
1101           D
1110           E
1111           F

CONVERSION
Converting from a bit pattern to hexadecimal is done by organizing the pattern into groups of four (starting from the right) and finding the hexadecimal value for each group of 4 bits. For hexadecimal to bit pattern conversion, convert each hexadecimal digit to its 4-bit equivalent.
Hexadecimal notation is written in several formats. In the first format, you add a lowercase (or uppercase) x before the digits, for example xA34. In another format, you indicate the base of the number (16) as a subscript after the notation, for example (A34)16. A third format appends the letter H to the digits, for example A34H.

Example 1
Show the hexadecimal equivalent of the bit pattern 1100 1110 0010.
Solution
Each group of 4 bits is translated to one hexadecimal digit. The equivalent is xCE2.

Example 2
Show the hexadecimal equivalent of the bit pattern 0011100010B (the trailing B marks the number as binary).
Solution
Divide the bit pattern into 4-bit groups (from the right). In this case, add two extra 0s at the left to make the number of bits divisible by 4. So you have 0000 1110 0010, which is translated to 0E2H.

Example 3
What is the bit pattern for x24C?
Solution
Write each hexadecimal digit as its equivalent bit pattern to get 0010 0100 1100.
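The three hexadecimal examples can be reproduced with a short sketch. It assumes bit patterns and hexadecimal numbers are given as plain strings, without the x prefix or the B and H suffixes; the function names are illustrative.

```python
def bits_to_hex(bits):
    """Group the bits in fours (from the right) and map each group to a hex digit."""
    bits = bits.replace(" ", "")             # allow "1100 1110 0010" style input
    bits = "0" * ((-len(bits)) % 4) + bits   # pad on the left to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)


def hex_to_bits(hex_digits):
    """Replace each hexadecimal digit with its 4-bit pattern."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)


print(bits_to_hex("1100 1110 0010"))  # CE2
print(bits_to_hex("0011100010"))      # 0E2
print(hex_to_bits("24C"))             # 001001001100
```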

3.2 CONVERSION
BINARY TO DECIMAL CONVERSION
Start with the binary number and multiply each binary digit by its weight. Since each binary digit can be only 0 or 1, the result will be either 0 or the value of the weight. After multiplying all the digits, add the results.

Example 1
Convert the binary number 10011 to decimal.
Solution
Write out the bits and their weights. Multiply each bit by its corresponding weight and record the result. At the end, add the results to get the decimal number.

Binary      1     0     0     1     1
Weights    16     8     4     2     1
---------------------------------------
Results    16  +  0  +  0  +  2  +  1
Decimal    19
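The weighted-sum procedure translates directly into code. This is a minimal sketch with an illustrative function name; Python's built-in int(bits, 2) gives the same result, but the loop below mirrors the steps in the example.

```python
def binary_to_decimal(bits):
    """Multiply each bit by its weight (1, 2, 4, ...) and add the results."""
    total = 0
    weight = 1                      # weight of the rightmost bit
    for bit in reversed(bits):
        if bit not in ("0", "1"):
            raise ValueError("not a binary digit: " + repr(bit))
        total += int(bit) * weight  # contributes either 0 or the weight
        weight *= 2                 # the next bit to the left weighs twice as much
    return total


print(binary_to_decimal("10011"))  # 19
```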

DECIMAL TO BINARY CONVERSION
To convert from decimal to binary, use repetitive division by 2, recording the quotient and the remainder at each step.

Example 2
Convert the decimal number 35 to binary.
Solution
Write out the number at the right corner. Divide the number continuously by 2 and write the quotient and the remainder. The quotients move to the left, and the remainder is recorded under each quotient. Stop when the quotient is zero.

Quotient    0 ← 1 ← 2 ← 4 ← 8 ← 17 ← 35   (decimal)
Remainder       1   0   0   0    1    1   (binary)

The binary number is 100011.
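Repetitive division can be sketched as a loop that collects remainders. The function name is illustrative; the remainders are produced from right to left, so they are reversed at the end.

```python
def decimal_to_binary(number):
    """Divide by 2 repeatedly; the remainders, read in reverse, are the binary digits."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if number == 0:
        return "0"
    remainders = []
    while number > 0:                       # stop when the quotient is zero
        number, remainder = divmod(number, 2)
        remainders.append(str(remainder))   # rightmost bit comes out first
    return "".join(reversed(remainders))


print(decimal_to_binary(35))  # 100011
```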

3.3 INTEGER REPRESENTATION
Integers are whole numbers (i.e., numbers without a fraction). An integer can be positive or negative.
-∞ ← 0 → +∞
To use computer memory more efficiently, two broad categories of integer representation have been developed: unsigned integers and signed integers. Signed integers may also be represented in three distinct ways.

UNSIGNED INTEGERS FORMAT
An unsigned integer is an integer without a sign. Most computers define a constant called the maximum unsigned integer. An unsigned integer ranges between 0 and this constant. The maximum unsigned integer depends on the number of bits the computer allocates to store an unsigned integer.
Range: 0 ... (2^N - 1), where N is the number of bits allocated to represent one unsigned integer.

Number of Bits    Range
--------------    ------------
8                 0 ... 255
16                0 ... 65,535

Representation
Storing unsigned integers is a straightforward process, as outlined in the following steps:
1. The number is changed to binary.
2. If the number of bits is less than N, 0s are added to the left of the binary number so that there is a total of N bits.

Example 3
Store 7 in an 8-bit memory location.
Solution
First change the number to binary: 111. Add five 0s to make a total of N (8) bits: 00000111. The number is stored in the memory location.

Example 4
Store 258 in a 16-bit memory location.
Solution
First change the number to binary: 100000010. Add seven 0s to make a total of N (16) bits: 0000000100000010. The number is stored in the memory location.

Overflow
If you try to store an unsigned integer that needs more than N bits, such as 256 in an 8-bit memory location, you get a condition called overflow.

Decimal      8-bit allocation    16-bit allocation
---------    ----------------    -----------------
7            00000111            0000000000000111
234          11101010            0000000011101010
258          overflow            0000000100000010
24,760       overflow            0110000010111000
1,245,678    overflow            overflow
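Storing an unsigned integer, including the overflow check, can be sketched as below. Raising Python's OverflowError is a choice made for this illustration; real hardware typically signals or silently mishandles overflow in other ways. The function name is hypothetical.

```python
def store_unsigned(value, num_bits):
    """Return the num_bits-wide bit pattern for an unsigned integer."""
    if value < 0:
        raise ValueError("an unsigned integer cannot be negative")
    if value > 2 ** num_bits - 1:          # larger than the maximum unsigned integer
        raise OverflowError("{} does not fit in {} bits".format(value, num_bits))
    binary = format(value, "b")            # step 1: change the number to binary
    return binary.zfill(num_bits)          # step 2: add 0s on the left up to N bits


print(store_unsigned(7, 8))      # 00000111
print(store_unsigned(258, 16))   # 0000000100000010
```

Calling store_unsigned(256, 8) raises OverflowError, because the largest value that fits in 8 bits is 255.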

Interpretation
How do you interpret an unsigned binary representation in decimal? The process is simple: change the N bits from the binary system to the decimal system.

Example 5
Interpret 00101011 in decimal if the number was stored as an unsigned integer.
Solution
Using the procedure shown in Figure 3.3, the number in decimal is 43 (32 + 8 + 2 + 1).

SIGNED INTEGERS FORMAT
SIGN-AND-MAGNITUDE FORMAT
• In sign-and-magnitude representation
  • the leftmost bit defines the sign of the number.
  • If it is 0, the number is positive.
  • If it is 1, the number is negative.

Range: -(2^(N-1) - 1) ... +(2^(N-1) - 1)

Number of Bits    Range
--------------    ---------------------------------------------------
8                 -127 ... -0            +0 ... +127
16                -32,767 ... -0         +0 ... +32,767
32                -2,147,483,647 ... -0  +0 ... +2,147,483,647

There are two 0s in sign-and-magnitude representation: positive and negative. In an 8-bit allocation:
+0 → 00000000
-0 → 10000000

Representation
Storing a sign-and-magnitude integer is a straightforward process:
1. The number is changed to binary; the sign is ignored.
2. If the number of bits is less than N-1, 0s are added to the left of the number so that there is a total of N-1 bits.
3. If the number is positive, 0 is added to the left (to make it N bits). If the number is negative, 1 is added to the left (to make it N bits).

Example 6
Store +7 in an 8-bit memory location using sign-and-magnitude representation.
Solution
First change the number to binary: 111. Add four 0s to make a total of N-1 (7) bits: 0000111. Add an extra 0 at the left because the number is positive. The result is 00000111.

Example 7
Store -258 in a 16-bit memory location using sign-and-magnitude representation.
Solution
First change the number to binary: 100000010. Add six 0s to make a total of N-1 (15) bits: 000000100000010. Add an extra 1 at the left because the number is negative. The result is 1000000100000010.
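The three storage steps for sign-and-magnitude can be sketched as follows. The function name is illustrative, and raising OverflowError for a magnitude that does not fit in N-1 bits is an assumption of this example. Because Python integers have no negative zero, this sketch always stores 0 as +0.

```python
def store_sign_magnitude(value, num_bits):
    """Return the num_bits-wide sign-and-magnitude bit pattern for an integer."""
    magnitude = abs(value)                        # step 1: ignore the sign
    if magnitude > 2 ** (num_bits - 1) - 1:
        raise OverflowError("{} does not fit in {} bits".format(value, num_bits))
    magnitude_bits = format(magnitude, "b").zfill(num_bits - 1)  # step 2: pad to N-1 bits
    sign_bit = "1" if value < 0 else "0"          # step 3: add the sign bit on the left
    return sign_bit + magnitude_bits


print(store_sign_magnitude(7, 8))       # 00000111
print(store_sign_magnitude(-258, 16))   # 1000000100000010
```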

Decimal    8-bit allocation    16-bit allocation
-------    ----------------    -----------------
+7         00000111            0000000000000111
-124       11111100            1000000001111100
+258       overflow            0000000100000010
-24,760    overflow            1110000010111000

Interpretation
How do you interpret a sign-and-magnitude binary representation in decimal? The process is simple:
1. Ignore the first (leftmost) bit.
2. Change the N-1 bits from binary to decimal as shown at the beginning of the chapter.
3. Attach a + or a - sign to the number based on the leftmost bit.
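The interpretation steps can be sketched in the same way. The function name is illustrative, and the sketch assumes the pattern has at least two bits. Note that both +0 (00000000) and -0 (10000000) come back as the integer 0, since Python does not distinguish them.

```python
def interpret_sign_magnitude(bits):
    """Return the integer stored in a sign-and-magnitude bit pattern."""
    sign_bit, magnitude_bits = bits[0], bits[1:]  # step 1: set the leftmost bit aside
    magnitude = int(magnitude_bits, 2)            # step 2: N-1 bits from binary to decimal
    return -magnitude if sign_bit == "1" else magnitude  # step 3: attach the sign


print(interpret_sign_magnitude("11111100"))          # -124
print(interpret_sign_magnitude("0000000100000010"))  # 258
```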
