Data Representation

Data Representation. Art 311 Dr. J R. Parker Fall 2010. Data Representation. The basic question today is: “Given that a computer only manipulates numbers, how can we represent interesting things like images, sounds, graphics, text, video, and so on”?


Presentation Transcript


  1. Data Representation • Art 311 • Dr. J R. Parker • Fall 2010

  2. Data Representation • The basic question today is: • “Given that a computer only manipulates numbers, how can we represent interesting things like images, sounds, graphics, text, video, and so on”? • The answer differs depending on the type of data.

  3. Data Representation • This subject is basic to creating new things on a computer. • We need to become familiar with the standard methods of representing data. • We need to acquire the skill of inventing new representations for things that we ourselves invent. • So, how would you represent music (i.e., notes)? More later…

  4. Text • So, we’ve already seen how text is represented, at least briefly. Remember ASCII? • There are 95 printable characters (counting the space). • 128 characters altogether, including 33 control characters.

  5. Text • What do we need to consider when building a text representation? • Upper case/lower case • The space character has to come before the others to make sorting easy. • Non-alphanumeric characters were positioned to correspond to their shifted positions on typewriters. • The first two columns (32 positions) were reserved for control characters. • The digits 0–9 were placed so that each code is the digit's binary value prefixed with 011, making conversion to and from binary-coded decimal straightforward.
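
The layout choices listed above can be checked directly. The following is a small Python sketch (the helper name `ascii_bits` is made up for illustration) that prints the 7-bit codes of a few digits, shows that the space sorts first, and shows that upper and lower case letters differ by a fixed amount.

```python
# Inspect a few ASCII codes with Python's built-in ord() and format().
# ascii_bits is a hypothetical helper name, not part of any library.

def ascii_bits(ch):
    """Return the 7-bit ASCII code of one character as a binary string."""
    return format(ord(ch), "07b")

for ch in "059":
    # The top three bits are 011; the low four bits are the digit's value.
    print(ch, ord(ch), ascii_bits(ch))

# The space (code 32) sorts before every other printable character.
print(sorted(["b", " ", "A", "7"]))

# Upper and lower case versions of a letter are always 32 apart.
print(ord("a") - ord("A"))
```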

  6. Text • This is a typical ASCII table. • You do not need to know it. • An irritating detail is that characters are defined in a base-16 number system called HEXADECIMAL or just HEX. • Why? Allow me to explain.

  7. Hexadecimal • My feeling, after being involved with computers since 1971, is that computer guys are lazy. This is not a bad thing, but motivates much of what they do. • We are so lazy that we will spend days writing a program to do simple things that we’re bored with doing. • Much of the history of computing can be explained by the need to avoid tedious repetitive work using a computer.

  8. Hexadecimal • So, HEX: • All numbers in a computer are binary, or base 2 • 0001 is one • 0010 is two • 0100 is four • 1000 is eight • And so on. Each position is a power of 2, just as each position in a decimal number is a power of ten. • Hexadecimal numbers use base 16. • Why is this convenient? I’m getting there.

  9. Hexadecimal • Base 16 is a problem, as we would need 16 distinct characters as digits. We use the letters A, B, C, D, E, F in conjunction with our regular digits. • So 1 is still one … and 9 is still nine. • But A is ten • B is eleven • C is twelve • D is thirteen • E is fourteen • And F is fifteen • Why is this convenient? I’m getting there.

  10. Hexadecimal • Positional number systems use powers of the base. • 16⁰ is 1 • 16¹ is 16 • 16² is 256 • 16³ is 4096 • … • Why is this convenient? I’m getting there.

  11. Hexadecimal • Counting in base 16: • 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, • 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, 1C, 1D, • 1E, 1F, 20, … • so 20₁₆ = 32₁₀ • Why is this convenient? I’m getting there.

  12. Hexadecimal • Reminder: why are we doing this? So we can read computer science tables and documents. Like the ASCII table. (I’m teaching you to read!) • Converting: 12₁₆ is 1x16 + 2 = 18₁₀ • 2A1₁₆ is 2x256 + 10x16 + 1 = 673₁₀ • Why is this convenient? I’m getting there.
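
The conversion above is just the positional rule applied digit by digit. Here is a minimal Python sketch of it; the names `HEX_DIGITS` and `hex_to_decimal` are invented for this example, and Python's built-in `int(text, 16)` does the same job.

```python
# Convert a hexadecimal string to a number using the positional rule:
# multiply the running value by 16, then add the next digit.
# HEX_DIGITS and hex_to_decimal are hypothetical names for illustration.

HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(text):
    value = 0
    for digit in text.upper():
        # index() gives the digit's value: 'A' -> 10, ..., 'F' -> 15.
        value = value * 16 + HEX_DIGITS.index(digit)
    return value

print(hex_to_decimal("12"))   # 18
print(hex_to_decimal("2A1"))  # 673
```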

  13. Hexadecimal • Now, 16 is an exact power of 2 (it is 2⁴) • Each hex digit takes exactly 4 binary digits (BITS) to represent in binary. • So converting from hex to binary and back is trivially simple. • Converting hex to binary: replace each hex digit with the binary equivalent • 2 A 1 • 2A1₁₆ = 0010 1010 0001 = 001010100001₂ (= 673₁₀) • Why is this convenient? I’m getting there.

  14. Hexadecimal • Converting binary to hex: group the binary number into sets of 4 digits (bits) and convert those into hex. • So 0110101001010010101 becomes • 011 0101 0010 1001 0101 (group from the right) • 3 5 2 9 5 • 35295₁₆ = 0110101001010010101₂ • That’s why this is convenient.
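
Both directions of the hex/binary conversion can be sketched in a few lines of Python. The function names below are made up for illustration; the only library features used are `int(text, base)` and `format(value, spec)`.

```python
# Hex <-> binary by 4-bit groups. Function names are hypothetical.

def hex_to_binary(text):
    """Replace each hex digit with its 4-bit binary equivalent."""
    return " ".join(format(int(digit, 16), "04b") for digit in text)

def binary_to_hex(bits):
    """Group bits into fours from the right and convert each group."""
    bits = bits.replace(" ", "")
    # Pad with leading zeros so the length is a multiple of 4;
    # this has the same effect as grouping from the right.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

print(hex_to_binary("2A1"))             # 0010 1010 0001
print(binary_to_hex("0010 1010 0001"))  # 2A1
print(binary_to_hex("101"))             # 5
```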

  15. Hexadecimal • Yup, that’s it. • Easy conversion between Hex and Binary, and hex uses many fewer digits. • We can list binary numbers in a lot less space. • That’s why this is convenient. • Let’s move on …

  16. Text • So characters are binary numbers when stored in memory, and they are often coded using ASCII. • A string is a sequence of characters. In a file we can indicate them using quotes: “This is a string” • In memory they are placed in consecutive locations. • Two ways to do this: • Start with an indication of how many characters there are. • Terminate the string with a special character

  17. Text • 16 • T 84 84 • h 104 104 • i 105 105 • s 115 115 • (space) 32 32 • i 105 105 • s 115 115 • (space) 32 32 • a 97 97 • (space) 32 32 • s 115 115 • t 116 116 • r 114 114 • i 105 105 • n 110 110 • g 103 103 • 0 The first string begins with a count (16). The second ends with a character whose code is 0; this is a nul character, and the string is referred to as a nul-terminated string.
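
The two layouts can be imitated with Python lists of character codes. This is only a sketch of the idea (the function names are invented here); real languages store the codes in consecutive bytes of memory rather than in a list.

```python
# Two ways to mark where a string ends. Function names are hypothetical.

def counted_string(text):
    """Character count first, then the ASCII codes."""
    codes = list(text.encode("ascii"))
    return [len(codes)] + codes

def nul_terminated_string(text):
    """The ASCII codes, followed by a terminating 0 (the nul character)."""
    return list(text.encode("ascii")) + [0]

message = "This is a string"
print(counted_string(message))
print(nul_terminated_string(message))
```

Both lists have the same length here, since each layout spends one extra location: one on the count, the other on the terminator.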

  18. Dates • ISO 1987-10-12 • IBM USA 10/12/1987 • IBM Europe 12.10.1987 • Unf Julian 1987285 • Julian 87/285 • MDY 10/12/87 • YMD 87/10/12 • DMY 12/10/87 • October 12, 1987 • 12 Oct 87 • Oct 12, 1987 • Etc etc Of the text strings, dates are the hardest to deal with. There are many, many ways to display them, and many things we want to do with them.

  19. Dates • Has X passed? • Print X in a particular way • How many days since X? • How long until X? • Input X from the console Questions – why do we use date information? Shouldn’t the representation make answering the common questions simple?

  20. Dates • Has X passed? • Print X in a particular way • How many days since X? • How long until X? • Input X from the console Store year as 4 digits (avoids the Y2K problem) Do not store month as string. Hard to use that way – store as a number. Store day as number. EG 2012 01 12

  21. Dates • Has X passed? • Print X in a particular way • How many days since X? • How long until X? • Input X from the console BUT: each month has a different number of days. This makes differences hard to calculate. Days between Mar 10 and May 12?

  22. Dates • Has X passed? • Print X in a particular way • How many days since X? • How long until X? • Input X from the console Days between Mar 10 and May 12 = 63 (not counting last day)
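
Once a date is stored as numbers (year, month, day), a library can answer the common questions. A minimal sketch using Python's standard `datetime` module follows; the year 2010 is an assumption, since the slide gives only the month and day, and the helper names are made up.

```python
from datetime import date

# Helper names are hypothetical; date arithmetic is standard library.

def days_between(start, end):
    """Days from start to end, not counting the last day."""
    return (end - start).days

def has_passed(day, today):
    """True if 'day' is strictly earlier than 'today'."""
    return day < today

start = date(2010, 3, 10)  # Mar 10 (year assumed)
end = date(2010, 5, 12)    # May 12 (year assumed)

print(days_between(start, end))  # 63
print(has_passed(start, end))    # True
```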

  23. Dates • Has X passed? • Print X in a particular way • How many days since X? • How long until X? • Input X from the console The international standard ISO 8601 describes a string representation for dates and times. Two simple examples of this format are 2007-03-04 20:32:17 20070304T203217

  24. Dates • Has X passed? • Print X in a particular way • How many days since X? • How long until X? • Input X from the console both stand for the 4th of March 2007, a bit after half past eight in the evening (forgot about time) 2007-03-04 20:32:17 20070304T203217
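
As a rough illustration, both strings can be read with `datetime.strptime` from Python's standard library; the format strings below are written to match these two examples only, not the whole standard.

```python
from datetime import datetime

# Parse the two example strings; both should give the same moment.
extended = datetime.strptime("2007-03-04 20:32:17", "%Y-%m-%d %H:%M:%S")
basic = datetime.strptime("20070304T203217", "%Y%m%dT%H%M%S")

print(extended == basic)          # True
print(extended.strftime("%d %b %Y, %H:%M"))

# A side benefit of year-month-day order: sorting the text
# also sorts the dates chronologically.
print(sorted(["2010-12-01", "2007-03-04", "2010-02-15"]))
```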

  25. Dates • Has X passed? • Print X in a particular way • How many days since X? • How long until X? • Input X from the console Unix time: The number of seconds elapsed since the beginning of the year 1970. 1172960204.226908
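
With a single number of seconds, date arithmetic becomes ordinary arithmetic. The sketch below uses Python's standard `datetime` module and assumes the count is measured in UTC; the helper names are invented for illustration.

```python
from datetime import datetime, timezone

# Helper names are hypothetical.

def from_unix_time(seconds):
    """Turn a Unix time into a calendar date and time (UTC assumed)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def days_since(earlier, later):
    """Whole days between two Unix times: subtract, divide by 86400."""
    return int((later - earlier) // 86400)

stamp = 1172960204.226908
print(from_unix_time(stamp))
print(days_since(0, stamp))  # whole days since the start of 1970
```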

  26. Dates • Has X passed? • Print X in a particular way • How many days since X? • How long until X? • Input X from the console This discussion was started just to show how some simple things can become complicated. We use dates all of the time, but the millennia have made them complex rather than simple.

  27. Text • Printing – give the address of a string to the printer. It converts the numbers (characters) into electronic signals which print the characters (or draw character images onto a page) • Characters each have an image that represents them. It’s called a glyph.

  28. Glyphs • A glyph is a simple graphic. • The letter ‘B’ is drawn as: • The paper is white, and the drawn glyph consists of black ’spots’ drawn by the printer on a 2D mesh or grid. • This is a simple image – more on images later.

  29. Glyphs • The point is that, for any particular size (indicated by how many dots are on each side of the glyph image) a character glyph contains a certain percentage of black. • That can be thought of as how black the glyph is. • This allows us to create images with characters.

  30. Glyphs • For the ‘B’ on the right, there are 11 rows and 10 columns = 110 squares. • Of those, 8+5+4+4+3+6+4+4+4+5+8=55 • 55/110 = 50% • So any spot on an image to be created that is 50% black can be drawn as a ‘B’
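
The same arithmetic as a short Python sketch. The per-row counts are the ones listed above; the function name is made up for illustration.

```python
# Fraction of a glyph's grid that is black.
# blackness is a hypothetical helper name.

def blackness(black_per_row, columns):
    """black_per_row: number of black squares in each row of the glyph."""
    total_squares = len(black_per_row) * columns
    return sum(black_per_row) / total_squares

b_rows = [8, 5, 4, 4, 3, 6, 4, 4, 4, 5, 8]  # the 11 rows of the 'B'
print(sum(b_rows))             # 55
print(blackness(b_rows, 10))   # 0.5
```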

  31. Glyphs • Character rows, from darker (top) to lighter (bottom):
      .'`,^:";~
      -_+<>i!lI?
      /\|()1{}[]
      rcvunxzjft
      LCJUYXZO0Q
      oahkbdpqwm
      *WMB8&%$#@
      This is for white characters on a black background. Reverse for printing on paper.
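
One way to use such an ordering is to map each spot's grey level to a character of matching density. The Python sketch below uses a short ten-character ramp that is an assumption of this example (it is not the ordering from the slide), and it treats each pixel value as an amount of ink from 0 (none) to 255 (solid).

```python
# Draw a tiny image with characters chosen by density.
# RAMP is a hypothetical ramp, ordered from least ink to most ink.
RAMP = " .:-=+*#%@"

def to_ascii(pixels, ramp=RAMP):
    """pixels: rows of ink amounts, 0 (none) to 255 (solid)."""
    lines = []
    for row in pixels:
        # Scale 0..255 onto the ramp's index range 0..len(ramp)-1.
        chars = [ramp[value * (len(ramp) - 1) // 255] for value in row]
        lines.append("".join(chars))
    return "\n".join(lines)

picture = [
    [0, 128, 255],
    [255, 128, 0],
]
print(to_ascii(picture))
```

Reversing the ramp string gives the opposite convention (light characters on a dark background).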

  32. ASCII Images +WWWMMWWX;VBVIVVXRRRMMMWWWWWWWMMWWWWMMMBRRBRRRVi MWWWWBRMBYXVVXI+;;+IIXBWWWWWWWWWWWWWWMMBBBRBBBBMI XWWWWMVRXVt;t+=IXBRRYi=iVMWWWWWWMXVYIYVYVBBBBBBRBMBI ,MWMWBYXXRBR=.=tYVBMMWMV=+RMWWWWBXVIVRRRRViIBBRBXYMWWV MWRRItRBMMMM::+,+ttIVVMM;iBMWWMRRMMWWMBRRRYiVVtVVRBMY MWXtX=tMMMMMIt,.:=tYBMRBBIBWWMMMVIItXBMXVVI+tIiI:YRXMB WMBIiR+YtBBMXRBMMMMMMMBRYRMWMMMMWWXti,.;tItIIYYiRRBBMMV ,MWMMtIRR=,+XBBMMMMMMMMWR:;RWWBRMMWWWMBRXYXBBXYYV+VMWMBMMM VWMMi::BW, IIRMMMMWWWWX+..,I+:iMMWWWWWWMMMBBRXVXYRMMWMBMMX +WMMY::,BX :RMBMMMWWMMRYVVXMWBXVMWWWWWWWMMBRXXXXYIRMBMMBBMR =WMBBi:tt tBBRBBMMMMRYtitYYVXMWWWWWMWWMMBXXVYXVi:VMMMMMMWV RWRV+,;,, XBRRRBBBt,..:+t=+:..+RWWWWWMBBXVVVXXIiI:IVMMBMMMW IMRBRRBMB= YXRRRBBBB,..+YBBBBV...=MWWMBBRRVVXXBYXRRBXBMMBMMMB tMYitIXMi iYXXRXRBRt;.:itt=:=iXMWWWMBBRRVXXRRVRMWWMMMMMMBBB :MR:.,:Ii YVVXRBBBMMRRRBMWMMMMMMMMBBRXXXYY+VMXMWMBMMBBVVRI :RY+ iVRBBBBMMMWWWMMMMMMMBBBBRRXVt.iXYRYXMBBBYItVVt =XVY, ;tVRBBMMMWWWMMMMBMBBBBXVYt iXBMWWMRVXYXMMY RX:, ,,;+tIIVXRBBRXVVIIi=,.= iYYt+;,;iYRMMXt

  33. ASCII Images

  34. ASCII Images

  35. ASCII Line Images

  36. Pictures The pictures we have seen are rows and columns of ASCII characters. Computer images are always stored as rows and columns in that way, but the elements are not ASCII. We have a 2D grid of elements, let’s say boxes, each having a distinct colour or grey level. Like a TV image.

  37. Pictures

  38. Pictures [Figure: a pixel grid with columns numbered 1 to 10 and rows numbered 1 to 6; one pixel is labelled 4,5.] Picture elements (pixels) are identified by [row, column].

  39. Pictures Picture elements are numbers that indicate a colour or a grey level. E.g., let 0 be black and 1 be white: Letter ‘T’
      000000000000000000000000000000000000000
      000000000000111111111111111111100000000
      000000000000111111111111111111100000000
      000000000000000000011111000000000000000
      000000000000000000011111000000000000000
      000000000000000000011111000000000000000
      000000000000000000011111000000000000000
      000000000000000000011111000000000000000
      000000000000000000011111000000000000000
      000000000000000000000000000000000000000
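
A small Python sketch of the same idea: an image as rows of 0s and 1s, with pixels looked up by [row, column]. The 6 x 7 letter below is a made-up miniature, not the bitmap from the slide, and rows and columns are counted from 1 to match the grid numbering used in these slides.

```python
# A tiny black-and-white image: 0 is black, 1 is white.
# T_IMAGE, pixel and render are hypothetical names for illustration.
T_IMAGE = [
    "0000000",
    "0111110",
    "0001000",
    "0001000",
    "0001000",
    "0000000",
]

def pixel(image, row, column):
    """Value at [row, column], counting rows and columns from 1."""
    return int(image[row - 1][column - 1])

def render(image):
    """Show white pixels as '#' and black pixels as '.'."""
    return "\n".join(
        "".join("#" if value == "1" else "." for value in row)
        for row in image
    )

print(pixel(T_IMAGE, 2, 2))  # 1 (white)
print(pixel(T_IMAGE, 1, 1))  # 0 (black)
print(render(T_IMAGE))
```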

  40. Graphics (line drawings) Lines are drawn on a canvas or background of some kind. It has a size. Lines can be defined by specifying their end points, and these can be specified as pixels. So (10,10) (20,20) is a line (segment) between those two pixels. Entire objects can be drawn using these segments alone.
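
To actually draw a segment, the pixels between the two end points have to be filled in. The Python sketch below does this by simple interpolation, stepping evenly from one end to the other and rounding to the nearest pixel. This is one straightforward approach chosen for illustration, not necessarily what any particular graphics system uses, and the function name is invented.

```python
# List the pixels on a segment between two end points.
# line_pixels is a hypothetical name; the method is plain interpolation.

def line_pixels(x0, y0, x1, y1):
    # Take one step per pixel along the longer direction.
    steps = max(abs(x1 - x0), abs(y1 - y0))
    if steps == 0:
        return [(x0, y0)]
    points = []
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        points.append((x, y))
    return points

segment = line_pixels(10, 10, 20, 20)
print(len(segment))   # 11
print(segment[:3])    # [(10, 10), (11, 11), (12, 12)]
```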

  41. Graphics (line drawings)

  42. Graphics (line drawings)

  43. Sound Computer sound is a sequence of loudness measurements, recorded as electronic levels or voltages, converted into binary, and stored (in order) in a file.
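
As a sketch of what such a sequence looks like, the Python below computes loudness measurements for a pure tone instead of recording them. The sample rate (44100 per second) and the 16-bit range (up to 32767) are assumptions of this example, chosen because they match CD audio; the function name is made up.

```python
import math

# Generate the numbers a recorder would store for a pure tone.
# sample_tone is a hypothetical name for illustration.

def sample_tone(frequency, sample_rate, count, max_level=32767):
    """Return 'count' integer samples of a sine wave."""
    samples = []
    for n in range(count):
        time = n / sample_rate  # seconds since the start
        level = math.sin(2 * math.pi * frequency * time)
        samples.append(round(max_level * level))
    return samples

# The first few samples of a 440 Hz tone at 44100 samples per second.
print(sample_tone(440, 44100, 5))
```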

  44. Sound On a CD, data is read by bouncing a low-powered laser beam off the reflective coating in the disc. Light hitting a land (a flat area) is reflected back, and picked up by a photosensitive detector. Light hitting a pit is reflected back with far less intensity. The difference is read as 1’s and 0’s.

  45. Video Video is a sequence of pictures, sampled at a known rate. TV is nearly 30 pictures (frames) per second. 35 mm film is 24 frames per second. We can use any rate we like.

  46. What other things are there? ANYTHING that a computer manipulates is stored as numbers, and the scheme used to convert something into numbers is called a coding scheme. A codec is short for coder/decoder, and is software that implements the coding scheme.

  47. Code? What do you mean, ‘code’? Video is a series of images that, when displayed rapidly one after the other, give the illusion of motion, like a ‘flip book’. However, a TV image is 512x512 (just about) = 262K pixels, or 262 Kbyte at one byte per pixel. 1 second = 30 x 262K = 7.86 Mbyte. 1 minute = about 472 Mbyte. 1 hour = about 28 Gbyte. We need to compress the images, and that’s where code/decode comes in.
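
The size estimate can be reproduced with a few multiplications. The Python sketch below assumes one byte per pixel (a grey-level image), which is what the figures above imply; the function name is made up.

```python
# Size of uncompressed video. raw_video_bytes is a hypothetical name.

def raw_video_bytes(width, height, frames_per_second, seconds,
                    bytes_per_pixel=1):
    """Bytes needed to store every pixel of every frame."""
    return width * height * bytes_per_pixel * frames_per_second * seconds

print(raw_video_bytes(512, 512, 30, 1))     # 7864320     (about 7.86 Mbyte)
print(raw_video_bytes(512, 512, 30, 60))    # 471859200   (about 472 Mbyte)
print(raw_video_bytes(512, 512, 30, 3600))  # 28311552000 (about 28 Gbyte)
```

Setting `bytes_per_pixel=3` gives the figures for a colour image stored as three bytes per pixel, which are three times larger.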

  48. Code? What do you mean, ‘code’?

  49. Code? What do you mean, ‘code’? Each image can be compressed. JPEG compression can reduce size by a factor of 15 before artifacts can be seen clearly

  50. Code? What do you mean, ‘code’? In a video, we can also compress between consecutive images, because one frame usually differs only slightly from the one before it.
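
A toy version of compressing between consecutive images: store the first frame in full, then for each later frame store only the pixels that changed. The Python below is a deliberately simplified sketch with invented function names; real video codecs are far more elaborate.

```python
# Store only what changed between two frames.
# frame_delta and apply_delta are hypothetical names for illustration.

def frame_delta(previous, current):
    """List of (position, new_value) for every pixel that changed."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame and the changes."""
    frame = list(previous)
    for position, value in delta:
        frame[position] = value
    return frame

frame1 = [0, 0, 9, 9, 0, 0, 0, 0]
frame2 = [0, 0, 0, 9, 9, 0, 0, 0]  # the bright spot moved one pixel

delta = frame_delta(frame1, frame2)
print(delta)                                 # [(2, 0), (4, 9)]
print(apply_delta(frame1, delta) == frame2)  # True
```

Here two changes are stored instead of eight pixel values; the saving grows when frames are large and little moves between them.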
