Management Information Systems

CLARK UNIVERSITY College of Professional and Continuing Education (COPACE) Management Information Systems Lecture 06 Archiving information

Plan • Coding of numeric information • Coding of textual information • Coding of graphical information • Archiving of information • Shannon-Fano coding • Huffman coding

Basic terms • Coding is the conversion of a message into a code, that is, into a set of symbols that can be transmitted over a communication channel.

Coding of numeric information • Binary encoding, used in computing, represents data as a sequence of two characters: 0 and 1. • These characters are called binary digits, or bits for short.

Coding of numeric information One bit can represent two values: 0 or 1 (yes or no, true or false, etc.). If the number of bits is increased to two, four different values can be represented: 00 01 10 11. Three bits can encode eight different values: 000 001 010 011 100 101 110 111.

Coding binary data The general formula is: N = 2^i, where N is the number of distinct values that can be encoded and i is the number of bits in the binary code.
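The relation between code length and number of values (N equals 2 to the power i) can be checked by listing every code of a given length. This is a minimal sketch; the function name `all_codes` is hypothetical and not part of the lecture.

```python
from itertools import product


def all_codes(i: int) -> list[str]:
    """Return every distinct binary code of length i, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=i)]


for i in (1, 2, 3):
    codes = all_codes(i)
    # The number of codes of length i always equals 2**i.
    print(i, len(codes), " ".join(codes))
```

For i = 3 the loop lists the same eight codes as the slide above, from 000 to 111.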

Coding of binary integers Principle: The integer is repeatedly divided by two until the quotient is zero or one. The remainders, written from right to left in the order they were obtained, with the last quotient as the leftmost digit, form the binary equivalent of the decimal number.

Example 19 : 2 = 9, remainder 1; 9 : 2 = 4, remainder 1; 4 : 2 = 2, remainder 0; 2 : 2 = 1, remainder 0. So, 19_10 = 10011_2.
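The repeated-division procedure can be sketched in a few lines. This version keeps dividing until the quotient reaches zero, which is equivalent to stopping at a quotient of one and using it as the leftmost digit. The function name `to_binary` is an assumption made for illustration.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient and remainder in one step
        remainders.append(str(remainder))
    # The first remainder is the rightmost digit, so reverse the list.
    return "".join(reversed(remainders))


print(to_binary(19))  # 10011
```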

Coding of binary integers • 8 bits are enough to encode the integers from 0 to 255. • 16-bit coding is used for integers from 0 to 65,535. • 24 bits cover more than 16.7 million values (16,777,216).
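The ranges on this slide follow from the bit count. As a rough sketch, the two hypothetical helpers below compute the number of bits needed for a given maximum value and the largest value that fits in a given number of bits.

```python
def bits_needed(max_value: int) -> int:
    """Smallest number of bits that can represent every integer in 0..max_value."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    # int.bit_length() is 0 for the value 0, but storing 0 still takes one bit.
    return max(1, max_value.bit_length())


def largest_value(bits: int) -> int:
    """Largest unsigned integer that fits in the given number of bits."""
    return 2 ** bits - 1


for limit in (255, 65535):
    print(limit, "needs", bits_needed(limit), "bits")
print("24 bits reach", largest_value(24))
```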

Coding of textual information • If each letter of the alphabet is matched to a certain integer, binary code can be used to encode textual information. • Eight bits are sufficient to encode 256 different characters.

Coding of textual information The American National Standards Institute (ANSI) introduced the ASCII encoding system (American Standard Code for Information Interchange).

Coding of textual information • ASCII has two encoding tables: the basic table (codes 0 - 127) and the extended table (codes 128 - 255).
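To illustrate how text becomes binary code, the sketch below maps each character of a string to its 8-bit code and back. It assumes the text uses only the basic table (codes 0 - 127); the function names and the sample string are chosen for illustration.

```python
def text_to_bits(text: str) -> list[str]:
    """Encode each character as an 8-bit string; only basic ASCII is accepted."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for codes above 127
    return [format(byte, "08b") for byte in data]


def bits_to_text(bits: list[str]) -> str:
    """Inverse of text_to_bits: turn 8-bit strings back into characters."""
    return bytes(int(b, 2) for b in bits).decode("ascii")


encoded = text_to_bits("MIS")
print(encoded)                # ['01001101', '01001001', '01010011']
print(bits_to_text(encoded))  # MIS
```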

The extended ASCII character set

Windows 1251 character set

Coding of textual information • Multiple encodings came to coexist because of the limited set of codes (256) available in an 8-bit table. • A universal character set, UNICODE, was originally based on 16-bit character encoding. • In that form it provides unique codes for 65,536 different characters. • For a long time, the transition to this system was held back by insufficient computing resources.
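The problem of coexisting 8-bit tables can be seen by encoding one letter in several ways. The sketch below uses Windows-1251 and, as an assumed second example that the lecture does not mention, KOI8-R, alongside a 16-bit Unicode encoding. All three codec names are standard Python codecs.

```python
letter = "Ж"  # Cyrillic capital letter ZHE, Unicode code point U+0416

cp1251_bytes = letter.encode("cp1251")    # Windows-1251: one byte
koi8r_bytes = letter.encode("koi8_r")     # KOI8-R (assumed example): one byte
utf16_bytes = letter.encode("utf-16-be")  # 16-bit Unicode form: two bytes

print(cp1251_bytes.hex(), koi8r_bytes.hex(), utf16_bytes.hex())

# Reading a Windows-1251 byte with the KOI8-R table gives a different letter,
# which is why text must be decoded with the table it was encoded with.
misread = cp1251_bytes.decode("koi8_r")
print(letter, "->", misread)
```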

Coding of graphical information • A graphic image is made up of tiny dots (pixels), which form a grid called a raster.

Example • Sevenfold magnification

Coding of graphical information • Pixels with only two possible colors (black and white) can be encoded by two values, 0 or 1, so only 1 bit per pixel is needed. • For black-and-white illustrations, coding with 256 shades of gray is generally accepted. How many bits do we need then?

Example

Coding of graphical information • The color image on the screen is obtained by mixing three primary colors: red, green and blue.

Coding of graphical information

Coding of graphical information • Color images are encoded by decomposing any color into its basic components. • Such a coding system is called RGB. • If 256 levels (8 bits) are used to encode each of the three color components, the system provides 16,777,216 different colors.
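The count of 16,777,216 colors follows from three components with 256 levels each. The sketch below also packs the three components into one 24-bit number. The byte order (red in the highest byte) and the function names are assumptions made for illustration.

```python
def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit components into one 24-bit integer (red highest)."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in 0..255")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(value: int) -> tuple[int, int, int]:
    """Split a 24-bit integer back into its red, green and blue components."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(256 ** 3)  # 16777216 distinct colors
orange = pack_rgb(255, 128, 0)
print(format(orange, "024b"), unpack_rgb(orange))
```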

Archiving of information • Data archiving is the process of converting the information stored in a file into a form that reduces redundancy in its representation and thus requires less storage space.

Archiving of information • Archiving (packing) is the movement of source files into an archive file in a compressed format. • Decompression (unpacking) is the process of recovering files from the archive in exactly the form they had before archiving.
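Packing and unpacking can be tried out with Python's standard `zipfile` module. The sketch below builds an archive in memory and checks that unpacking restores the files exactly. The file names and contents are invented for the example.

```python
import io
import zipfile


def pack(files: dict[str, bytes]) -> bytes:
    """Pack named byte strings into a compressed ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def unpack(archive_bytes: bytes) -> dict[str, bytes]:
    """Recover every file from the archive in its original form."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


originals = {"notes.txt": b"archiving " * 200, "data.bin": bytes(range(256))}
archive_bytes = pack(originals)
restored = unpack(archive_bytes)
print(restored == originals)  # True
```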

Archiving of information The aims: • more compact storage on disk • reduction of the time (or cost) of transmitting information through communication channels • simpler transfer of files from one computer to another • protection from unauthorised access

Archiving of information • One of the first compression methods was proposed in 1844 by Samuel Morse in the Morse code system. • Frequent characters are coded with shorter sequences.

Archiving of information • In the 1940s, Shannon, the founder of modern information theory, and independently Fano developed a universal algorithm for constructing efficient variable-length codes. A related algorithm, which produces optimal codes, was later proposed by Huffman. • The principle of these algorithms is to encode frequently occurring characters with shorter sequences of bits.

Archiving of information • In the 1970s, Lempel and Ziv proposed the LZ77 and LZ78 algorithms; LZW is a later variant published by Welch in 1984. • These algorithms find repeated sequences and replace them with numbers that refer to a dynamically generated dictionary. • Most modern archivers (WinRAR, WinZip) are based on variations of the Lempel-Ziv algorithm.
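The dictionary idea can be sketched with a small LZW-style coder. This is a teaching sketch under simplifying assumptions (codes are kept as a list of integers rather than packed into bits, and the dictionary grows without limit), not the implementation used by any particular archiver.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Replace repeated byte sequences with dictionary numbers."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes
    current = b""
    output = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate  # keep extending a known sequence
        else:
            output.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new sequence
            current = bytes([value])
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes, regenerating the same dictionary on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[:1]  # sequence defined by this very step
        else:
            raise ValueError("invalid LZW code")
        result.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(result)


message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(len(message), "bytes ->", len(codes), "codes")
print(lzw_decompress(codes) == message)  # True
```

The decompressor never receives the dictionary; it rebuilds the same entries from the code stream, which is what "dynamically generated" means here.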

Archiving of information Kc = Vc / Vr × 100%, where Kc is the compression coefficient, Vc is the volume of the compressed file, and Vr is the volume of the source file. The degree of compression depends on the archiving program, the method and the type of source file.

Archiving of information • The degree of compression for graphical, text and data files is 5-40%. • The degree of compression for executable files is 60-90%. • The degree of compression for archived files is 90-100%.
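The dependence on file type can be observed with the standard `zlib` module. The sketch assumes the coefficient is the compressed size as a percentage of the source size, which is how the percentages on this slide read. The sample text is generated from an invented word list, so the exact figures are illustrative only.

```python
import random
import zlib


def compression_coefficient(source: bytes) -> float:
    """Compressed size as a percentage of the source size (assumed definition)."""
    compressed = zlib.compress(source, level=9)
    return len(compressed) / len(source) * 100


# Build a reproducible pseudo-text from a small vocabulary.
words = ["coding", "archive", "binary", "pixel", "symbol", "channel", "message",
         "raster", "huffman", "shannon", "fano", "morse", "unicode", "ascii",
         "bit", "file"]
rng = random.Random(0)
text = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

already_packed = zlib.compress(text, level=9)

print(f"text file:     {compression_coefficient(text):.0f}%")
print(f"archived file: {compression_coefficient(already_packed):.0f}%")
```

Text shrinks considerably, while data that has already been compressed has little redundancy left and stays close to its original size.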

Archiving of information • A self-extracting archive is an executable module that can unpack the files it contains without using an archiver. • Big archive files can be divided into several volumes.

Shannon-Fano coding

1. Develop a list of probabilities or frequency counts. 2. Sort the list of symbols in decreasing order of frequency. 3. Divide the list into two parts, with the total frequency count of the left part being as close to the total of the right part as possible. 4. Assign the binary digit 0 to the left part of the list and the digit 1 to the right part. 5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has a code.
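The steps above can be sketched as a short recursive function. The frequency counts in the example are invented for illustration, ties in the sort are broken alphabetically, and when two split points are equally balanced the earlier one is taken; these choices are assumptions, not part of the lecture.

```python
def shannon_fano(frequencies: dict[str, int]) -> dict[str, str]:
    """Build Shannon-Fano codes from a table of frequency counts."""
    # Sort symbols by decreasing frequency.
    symbols = sorted(frequencies, key=lambda s: (-frequencies[s], s))
    if len(symbols) == 1:
        return {symbols[0]: "0"}
    codes = {s: "" for s in symbols}

    def split(group: list[str]) -> None:
        if len(group) < 2:
            return
        total = sum(frequencies[s] for s in group)
        running = 0
        best_index, best_diff = 1, None
        # Find the split point with the most balanced totals.
        for index in range(1, len(group)):
            running += frequencies[group[index - 1]]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_index, best_diff = index, diff
        left, right = group[:best_index], group[best_index:]
        # Left part gets 0, right part gets 1.
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        # Recurse into both halves.
        split(left)
        split(right)

    split(symbols)
    return codes


example = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
print(shannon_fano(example))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```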

Huffman coding

Huffman coding • A source generates 4 different symbols with known probabilities. • A binary tree is built from left to right by taking the two least probable symbols and combining them into an equivalent symbol whose probability equals the sum of the two. • The process is repeated until just one symbol remains. • The tree is then read backwards, from right to left, assigning different bits to different branches.
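The merging procedure can be sketched with a priority queue. The four probabilities below are assumed values, since the slide does not list them, and the choice of 0 for the first branch and 1 for the second is arbitrary. Instead of storing an explicit tree, the sketch prepends a bit to every code in a group each time two groups are merged, which gives the same result as reading the tree backwards.

```python
import heapq
from itertools import count


def huffman_codes(probabilities: dict[str, float]) -> dict[str, str]:
    """Build Huffman codes by repeatedly merging the two least probable groups."""
    if not probabilities:
        raise ValueError("at least one symbol is required")
    if len(probabilities) == 1:
        return {next(iter(probabilities)): "0"}
    tiebreak = count()  # unique counter so equal probabilities never compare dicts
    heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable group
        p1, _, group1 = heapq.heappop(heap)  # second least probable group
        # Merging adds one branch bit in front of every code in each group.
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


source = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}  # assumed probabilities
codes = huffman_codes(source)
print(codes)  # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

average_length = sum(source[s] * len(codes[s]) for s in source)
print(average_length)  # 1.75 bits per symbol, versus 2 for a fixed-length code
```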
