

MULTIMEDIA SIGNAL PROCESSING ALGORITHMS PART II – MINIMIZATION OF THE AMOUNT OF INFORMATION TO BE PROCESSED AND BASIC ALGORITHMS







  1. MULTIMEDIA SIGNAL PROCESSING ALGORITHMS PART II – MINIMIZATION OF THE AMOUNT OF INFORMATION TO BE PROCESSED AND BASIC ALGORITHMS

  2. The second principle of biological processing SEEMS TO BE: MINIMIZATION OF THE AMOUNT OF INFORMATION TO BE PROCESSED. THAT IS, THE PROCESSING SYSTEM ELIMINATES AS MUCH INFORMATION AS POSSIBLE AND USES ONLY THE ABSOLUTELY NECESSARY MINIMUM TO ACHIEVE ITS TASKS. Why is this principle reasonable? Minimizing the information to be processed saves energy, increases speed, reduces effort and is overall a logical thing to do. This is not limited to biology but also applies to technical systems.

  3. IN PREVIOUS LECTURES THIS PRINCIPLE WAS EVIDENT SEVERAL TIMES: WE ARE ABLE TO RECOGNIZE OBJECTS BASED ON VERY MINIMAL INFORMATION. THIS MEANS THE PROCESSING SYSTEM IS ABLE TO REDUCE INFORMATION TO A MINIMUM OR, IN OTHER WORDS, TO EXTRACT THE NECESSARY MINIMUM

  4. SO WE CAN STATE THE MAIN PRINCIPLE FOR THIS COURSE: FOR EFFECTIVE MULTIMEDIA SIGNAL PROCESSING ONE HAS TO MINIMIZE THE AMOUNT OF INFORMATION PROCESSED, EXTRACTING THE ABSOLUTELY NECESSARY MINIMUM FOR THE PROCESSING TASK. HOW TO DO THIS IS NOT ALWAYS CLEAR OR EASY; WE NEED TO STUDY THIS. The second principle, as indicated before, can be statistical processing, producing results matched to the most likely signals occurring in the real world. But this principle also has to be applied correctly.

  5. NOW LET US GO TO TECHNOLOGY. ASSUME WE HAVE A COMPUTER WITH A CAMERA AND A DIGITIZER CARD AND WE WOULD LIKE TO EXTRACT VISUAL INFORMATION ABOUT THE ENVIRONMENT LIKE OUR EYES DO (OR WE HAVE MICROPHONES AND WE WOULD LIKE TO EXTRACT ACOUSTICAL INFORMATION LIKE OUR EARS DO). HOW SHOULD WE PROGRAM THE COMPUTER?

  6. Let’s think about a typical example which is already becoming popular in cameras: we would like to implement algorithms which will mark faces in pictures and recognize familiar faces. This may of course be extended to other objects and complete scenes; for example, the camera would recognize whether the picture is of a familiar building or landscape. The problem is not easy since objects can be seen under different viewpoints, lighting and times. But the input we have to the algorithm is the digitized picture

  7. IT IS A MATRIX OF NUMBERS. THE MATRIX SIZE CAN BE E.G. 256x256, OR 720x576 – TELEVISION PICTURE, 1024x768 – COMPUTER MONITOR, 1920x1080 – HIGH DEFINITION TELEVISION PICTURE. MATRIX ELEMENTS ARE USUALLY 8-BIT NUMBERS; THIS CORRESPONDS TO 256 LEVELS OF LIGHT, WHICH IS ENOUGH. COLOR PICTURES ARE DESCRIBED BY THREE SUCH MATRICES, ONE FOR EACH BASIC COLOR • WHAT IS THE PICTURE AFTER DIGITIZATION? HERE IS A PICTURE FROM A MARS LANDER AND PART OF THE MATRIX NEAR THE OBJECT

  8. WHAT WILL HAPPEN WHEN THE PICTURE RESOLUTION IS TOO SMALL? RESOLUTION WILL BE IMPAIRED AND FEWER DETAILS WILL BE VISIBLE. HERE WE SEE WHAT HAPPENS WHEN RESOLUTION IS REDUCED FROM 512x512 TO 32x32. WHAT IS THE SIZE OF ONE TV PICTURE IN BITS? 720x576x3x8 bit = about 10 Mbit
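The bit-count arithmetic above can be checked with a few lines of Python (the function name `picture_bits` is just for illustration):

```python
# Size of one digitized picture: width x height x 3 colour matrices x 8 bits.
def picture_bits(width, height, channels=3, bits_per_sample=8):
    """Return the raw size of one picture in bits."""
    return width * height * channels * bits_per_sample

tv = picture_bits(720, 576)       # standard-definition TV frame
hd = picture_bits(1920, 1080)     # high-definition frame
print(tv)                         # 9953280 bits, i.e. about 10 Mbit
```

The same function shows why high-definition frames are so much heavier: an HD frame is roughly five times larger than the SD frame quoted on the slide.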

  9. IMAGES ARE REGISTERED IN THREE BASIC COLOR COMPONENTS: RGB = RED, GREEN, BLUE. A MIXTURE OF THESE COLORS PROVIDES THE OTHER COLORS. WE HAVE TO USE THREE IMAGE MATRICES TO REPRESENT ONE COLOR PICTURE • TOPIC: COLOR PROCESSING. THE RGB REPRESENTATION IS USED FOR DISPLAY, E.G. COMPUTER MONITORS OR TELEVISION PANELS ARE DRIVEN BY R, G, B SIGNALS

  10. COLOR IMAGE AND RGB COMPONENTS

  11. WE OFTEN PERFORM A CONVERSION TO A MORE SUITABLE COLOR SPACE. TWO SUCH SPACES ARE VERY USEFUL: THE YUV SPACE AND THE HSV SPACE. YUV SPACE: Y – INTENSITY OF (WHITE) LIGHT; U, V – COLOR CHROMINANCES. TO OBTAIN THE YUV REPRESENTATION WE TAKE THE R, G, B COLOR MATRICES FOR A PICTURE AND CONVERT THEM BY ->

  12. RGB->YUV TRANSFORMATION. NOTE: Y IS THE BLACK-AND-WHITE COMPONENT, THAT IS, THE MIXTURE OF R, G, B WHICH GIVES GRADATIONS OF WHITE, FROM BLACK THROUGH GREY TO WHITE. U AND V ARE COLOR COMPONENTS WHICH DO NOT HAVE A DIRECT PHYSICAL MEANING. THUS THE INTENSITY OF LIGHT IS HERE SEPARATED FROM THE COLOR INFORMATION
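A per-pixel sketch of such a transformation (the slide's actual matrix is not reproduced in this transcript; the coefficients below are the common ITU-R BT.601 choice, which may differ from the lecture's exact numbers):

```python
def rgb_to_yuv(r, g, b):
    """Convert one pixel from RGB to YUV (assumed ITU-R BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # black-and-white intensity
    u = 0.492 * (b - y)                     # blue chrominance
    v = 0.877 * (r - y)                     # red chrominance
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse transform: the conversion is invertible, so no information is lost."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b
```

Note that a grey pixel (R = G = B) gives U = V = 0: the colour channels really do carry only the colour information, as the slide states.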

  13. AFTER THIS TRANSFORMATION INSTEAD OF THREE R,G,B MATRICES WE GET THREE MATRICES Y, U, V TRANSFORMATION IS INVERTIBLE SO ALL INFORMATION IS PRESERVED BUT NOW WE CAN PLAY A TRICK: HUMAN VISUAL PROCESSING IS MUCH LESS SENSITIVE TO COLOR INFORMATION THAN TO BLACK AND WHITE LIGHT INTENSITY INFORMATION THUS, MATRICES U,V CAN BE REDUCED IN SIZE

  14. SUBSAMPLING OF MATRICES U AND V: FOR EACH 2x2 BLOCK OF 4 ELEMENTS OF Y (Y1, Y2, Y3, Y4) ONLY ONE ELEMENT OF U AND ONE ELEMENT OF V IS TAKEN. THE KEPT ELEMENTS U AND V CAN BE E.G. THE AVERAGE VALUE OF THE ORIGINAL 4 ELEMENTS. THUS MATRICES U, V CAN BE REDUCED IN SIZE BY A FACTOR OF 4. RETURNING BACK TO RGB FORM WILL NOT CHANGE THE PICTURE VISUALLY
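A minimal sketch of this 2x2 averaging subsampling in plain Python, with nested lists standing in for a chrominance matrix:

```python
def subsample_2x2(chroma):
    """Replace each 2x2 block of a chrominance matrix by its average value,
    reducing the matrix size by a factor of 4."""
    h, w = len(chroma), len(chroma[0])
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            block = [chroma[i][j], chroma[i][j + 1],
                     chroma[i + 1][j], chroma[i + 1][j + 1]]
            row.append(sum(block) / 4)      # average of the original 4 elements
        out.append(row)
    return out

u = [[10, 12, 20, 22],
     [14, 16, 24, 26],
     [30, 32, 40, 42],
     [34, 36, 44, 46]]
print(subsample_2x2(u))   # [[13.0, 23.0], [33.0, 43.0]]
```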

  15. THE RGB->YUV TRANSFORMATION DIRECTLY USES A PROPERTY OF HUMAN VISION WHICH ALLOWS: • REDUCING THE SIZE OF COLOR IMAGES (IMPORTANT FOR COMPRESSION) • USING ONLY LIGHT INTENSITY WITHOUT COLOR INFORMATION (E.G. FOR RECOGNITION OF OBJECTS)

  16. ANOTHER TRANSFORMATION IS HSI. HSI IS MORE RELATED TO HUMAN PERCEPTION, WHERE WE CAN SEE THE SATURATION OF COLORS, THAT IS, WE CAN TELL THE "REDNESS", "BLUENESS" OF COLORS AND SO ON. TO GET THE HSI REPRESENTATION WE MAP RGB INTO H – HUE (COLOR), S – SATURATION (AMOUNT OF WHITE MIXED WITH THE COLOR), I – INTENSITY (AMOUNT OF GREY LEVEL). EQUATIONS FOR HSI FROM RGB AND VICE VERSA:
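Python's standard `colorsys` module implements the closely related HSV space; assuming HSV as a stand-in for the slide's HSI, the hue/saturation behaviour can be tried directly:

```python
import colorsys

# Pure red: hue 0, full saturation, full value.
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(h, s, v)        # 0.0 1.0 1.0

# Mixing white in (desaturating) lowers S but keeps the same hue.
h2, s2, v2 = colorsys.rgb_to_hsv(1.0, 0.5, 0.5)
print(h2, s2, v2)     # 0.0 0.5 1.0
```

This shows the point of the representation: "redness" (hue) is separated from how much white is mixed in (saturation) and from the light level itself.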

  17. BASIC ASPECTS OF THE HSI REPRESENTATION: ON THE CUBE THERE ARE SOME OTHER 'BASIC' COLORS APART FROM RGB; THE MAIN DIAGONAL IS THE AMOUNT OF WHITE. ON THE DIAMOND WE SEE THE COLORS AROUND A HEXAGON; HEIGHT IS THE AMOUNT OF WHITE AND SATURATION IS THE X-AXIS. NOTE WHERE THE I (V) AXIS, THE S AXIS AND THE HUE ANGLE ARE

  18. THE HSI TRANSFORMATION IS USEFUL SINCE WE GET A REPRESENTATION IN A COLOR SPACE WHICH CORRESPONDS TO THE PROPERTIES OF HUMAN VISION: THE INTENSITY LEVEL, THE COLOR SATURATION, AND THE COLOR ITSELF CAN BE ESTIMATED.

  19. DIGRESSION ON COLOR SENSORS. ASSUME YOU BUY A DIGITAL CAMERA WITH E.G. 5 MEGAPIXELS. WHAT DOES THIS MEAN? IT TURNS OUT THAT THE PIXEL DEFINITION IS DIFFERENT FOR DIFFERENT APPLICATIONS. TRADITIONALLY 1 PIXEL = ONE R, G, B COLOR COMBINATION, SO WE NEED 3 COLOR SENSORS FOR A CAMERA OR 3 COLOR ELEMENTS FOR A DISPLAY

  20. FOR EXAMPLE: AN LCD COMPUTER MONITOR WITH A RESOLUTION OF 1280x1024 PIXELS HAS 1280x1024 ELEMENTS FOR EACH R, G, B COLOR, THAT IS, IT HAS 1280x1024x3 DISPLAY ELEMENTS. THE DISPLAY ELEMENTS ARE CALLED SUBPIXELS; ONE PIXEL IS COMPOSED OF THREE SUBPIXELS R, G, B

  21. IN DIGITAL CAMERAS THIS IS DIFFERENT. THE SENSOR IN DIGITAL CAMERAS LOOKS LIKE THIS: EVERY COLOR SUBPIXEL COUNTS AS A "PIXEL". THE PIXELS ARE ARRANGED IN A MATRIX CALLED A BAYER SENSOR. EACH "CAMERA" PIXEL IS MADE OF 4 COLOR PIXELS: 1 RED, 2 GREEN, 1 BLUE (REMEMBER THAT MOST OF VISIBLE LIGHT IS GREEN). WE CAN NOTICE THAT A "FULL" COLOR PIXEL CAN BE MADE FROM OVERLAPPING SQUARES SHIFTED BY HALF A PIXEL

  22. SO THE E.G. 5 MILLION PIXELS IN A DIGITAL CAMERA IS NOT EXACTLY 5 MILLION IN THE DISPLAY SENSE. IT SHOULD BE DIVIDED BY 4, OR BY 2 IF WE TAKE INTERPOLATION INTO ACCOUNT. BUT THERE ARE TWO EXCEPTIONS: THERE ARE VIDEO CAMERAS WHICH HAVE 3 SEPARATE CCD SENSORS, ONE FOR EACH OF THE R, G, B COLORS
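The pixel-count arithmetic can be sketched as follows (the divisors 4 and 2 come from the slide; `effective_pixels` is an illustrative name, not a standard term):

```python
def effective_pixels(sensor_pixels, interpolated=False):
    """Bayer sensor: each full-colour pixel needs 4 colour samples
    (1 R, 2 G, 1 B), or roughly 2 when demosaicing interpolation is counted."""
    return sensor_pixels // (2 if interpolated else 4)

print(effective_pixels(5_000_000))         # 1250000 full-colour pixels
print(effective_pixels(5_000_000, True))   # 2500000 with interpolation
```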

  23. IN 3-CCD VIDEO CAMERAS THE OPTICAL SYSTEM SPLITS LIGHT INTO 3 SENSORS WHICH PICK UP THE R, G, B COLORS. THE TOTAL NUMBER OF PIXELS CORRESPONDS TO THE NUMBER OF PIXELS IN THE DISPLAY. ANOTHER EXCEPTION IS THE FOVEON SENSOR. IN FOVEON, THERE IS ONE SENSOR BUT IT MEASURES ALL 3 RGB COLORS IN ONE AREA. THIS IS BASED ON THE FACT THAT PHOTONS PENETRATE TO DIFFERENT DEPTHS IN THE SEMICONDUCTOR DEPENDING ON THEIR WAVELENGTHS www.foveon.com

  24. COMPARISON: WE CAN SEE THAT SINGLE-SENSOR DEVICES HAVE LOWER RESOLUTION THAN 3-SENSOR DEVICES OR FOVEON. BUT THEY ARE THE EASIEST TO PRODUCE, SO THE NUMBER OF THEIR COLOR PIXELS IS INCREASING ALL THE TIME AND THE RESOLUTION PROBLEM IS SOLVED.....

  25. The elimination of information based on color is an example of a much more general principle: elimination of information. The input signal is mapped to an output signal, a representation of the input which is "just good enough" for the specific task. How to produce the "good enough" representation is the essential problem to solve. Next we will show an example of representation by edges

  26. EDGE DETECTION. LINEAR FILTERING: THE AREA AROUND EVERY POINT x IN THE IMAGE MATRIX, E.G. THE 3x3 NEIGHBORHOOD

  z l m
  u x v
  n p q

IS MULTIPLIED BY THE VALUES FROM ANOTHER MATRIX AND THE RESULT IS SUMMED UP
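A minimal pure-Python sketch of this linear filtering over a 3x3 neighborhood (no border padding is applied, so only interior points are computed; `filter3x3` is an illustrative name):

```python
def filter3x3(image, kernel):
    """Multiply the 3x3 area around every interior point by the kernel
    values and sum up the products (linear filtering)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    s += image[i + di][j + dj] * kernel[di + 1][dj + 1]
            out[i][j] = s
    return out

# A 3x3 averaging (low-pass) kernel: the coefficients sum to one.
lowpass = [[1 / 9] * 3 for _ in range(3)]
flat = [[5] * 5 for _ in range(5)]
print(filter3x3(flat, lowpass)[2][2])   # ~5.0: a constant image is unchanged
```

Because the low-pass coefficients sum to one, a constant region passes through unchanged; a bandpass kernel (coefficients summing to zero) would give zero there.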

  27. DEPENDING ON THE MATRIX BY WHICH WE MULTIPLY WE HAVE SEVERAL TYPES OF FILTERS: LOW PASS – THE SUM OF THE FILTER COEFFICIENTS IS ONE; BANDPASS – THE SUM OF THE FILTER COEFFICIENTS IS ZERO; HIGHPASS – THE SUM IS BETWEEN ZERO AND ONE

  28. WE SAID THAT IN THE HUMAN VISUAL SYSTEM THE PROCESSING ELEMENTS IN THE RETINA ARE SENSITIVE TO CHANGES IN LIGHT LEVEL. THIS IS EQUIVALENT TO BANDPASS FILTERING. A SPECIAL CLASS OF BANDPASS FILTERS IS CALLED EDGE DETECTORS SINCE THEY ARE DESIGNED TO DETECT SHARP CHANGES IN IMAGE LIGHT INTENSITY

  29. LET US CONSIDER THE FOLLOWING SITUATION – A WHITE BAR ON A BLACK BACKGROUND, OR THE OPPOSITE. OUR VISUAL SYSTEM, AND WE HERE, ARE INTERESTED MOSTLY IN AREAS WHERE LIGHT IS CHANGING ITS VALUE; SHARP CHANGES IN LIGHT VALUE ARE CALLED EDGES

  30. HOWEVER, THERE IS A PROBLEM HERE: WHAT EXACTLY IS A SHARP CHANGE IN INTENSITY? THIS IS NOT WELL DEFINED. ON THE RIGHT WE SEE SOME EXAMPLES OF LIGHT CHANGE: RAMP EDGE – LIGHT INCREASING GRADUALLY; STEP EDGE – SHARP TRANSITION; NARROW LINE; ROOF EDGE. THERE COULD BE MANY MORE SUCH EXAMPLES!

  31. IN THE CONTINUOUS FUNCTION DOMAIN, EDGE DETECTION IS EQUIVALENT TO DIFFERENTIATION (THE DERIVATIVE IS ZERO if F(x,y)=const). BUT IN IMAGES WE HAVE A LIMITED NUMBER OF PIXELS SO WE CAN PERFORM ONLY APPROXIMATE DIFFERENCING

  32. EDGE DETECTORS. HERE WE HAVE TWO FILTER MATRICES FOR DIFFERENCING. NOTE THAT THE FIRST ONE WILL PROVIDE ZERO OUTPUT WHEN THERE ARE CONSTANT VALUES IN THE VERTICAL DIRECTION AND THE SECOND ONE WHEN THERE ARE CONSTANT VALUES IN THE HORIZONTAL DIRECTION

  33. NOW LET’S TAKE THE OUTPUTS OF BOTH FILTERS AND COMBINE THEM TOGETHER, FOR EXAMPLE BY THE MAGNITUDE G = sqrt(GR^2 + GC^2). THE OUTPUT WILL NOW BE QUITE INDEPENDENT OF THE DIRECTION OF EDGES. NOTE THAT THE RATIO GC/GR IS EQUIVALENT TO THE DIRECTION OF AN EDGE
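A sketch of combining two difference masks in this way. The masks below are the Prewitt pair, one common choice; the slide's exact matrices may differ:

```python
import math

# Prewitt difference masks: ROW responds to change along a row (and is zero
# on columns of constant value), COL responds to change down a column.
ROW = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
COL = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

def gradient_at(image, i, j):
    """Gradient magnitude and direction at one interior pixel."""
    gr = gc = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            gr += image[i + di][j + dj] * ROW[di + 1][dj + 1]
            gc += image[i + di][j + dj] * COL[di + 1][dj + 1]
    magnitude = math.hypot(gr, gc)   # sqrt(GR^2 + GC^2): direction-independent
    direction = math.atan2(gc, gr)   # edge orientation, from the ratio GC/GR
    return magnitude, direction

# Vertical step edge: dark left half, bright right half.
step = [[0, 0, 10, 10] for _ in range(4)]
mag, ang = gradient_at(step, 1, 1)
print(mag, ang)   # 30.0 0.0 – strong response, orientation angle zero
```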

  34. HERE WE HAVE AN EXAMPLE OF THE RESULTS: • ORIGINAL PICTURE • HORIZONTAL DETECTOR • VERTICAL DETECTOR • BOTH COMBINED. AS WE CAN SEE, THE COMBINED OUTPUT GIVES THE BORDERS OF OBJECTS, SO WE CAN RECOGNIZE THEM EVEN IF THERE IS LITTLE INFORMATION. THIS MAY CORRESPOND IN SOME WAY TO HOW THE HUMAN SYSTEM WORKS

  35. WHY DID WE USE JUST SUCH A MATRIX FOR EDGE DETECTION? THERE CAN BE MANY SUCH MATRICES USED; SOME OF THEM ARE SHOWN HERE, AND MANY OTHERS ARE KNOWN. THEY DIFFER IN THEIR PROPERTIES AND IN THEIR OPERATION UNDER NOISE, E.G. PREWITT AND SOBEL ARE GOOD

  36. IF WE TALK ABOUT OPERATION IN NOISY IMAGES, THRESHOLDING IS IMPORTANT. AFTER RUNNING A DETECTOR WE GET AN OUTPUT SIGNAL. UNFORTUNATELY THIS CAN BE CAUSED BY NOISE, NOT BY AN EDGE: EDGE DETECTORS CAN BE SENSITIVE TO NOISE. WE THRESHOLD THE OUTPUT SIGNAL: IF IT IS > THAN SOME VALUE T, IT IS CLASSIFIED AS AN EDGE
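This thresholding step can be sketched in a couple of lines over a matrix of detector responses (values and T are illustrative):

```python
def threshold_edges(gradient, t):
    """Keep only detector responses stronger than T; weaker responses
    are treated as noise and set to zero."""
    return [[1 if g > t else 0 for g in row] for row in gradient]

# Small noisy responses around a strong vertical edge in the middle column.
grad = [[2, 30, 3],
        [1, 28, 2],
        [4, 31, 1]]
print(threshold_edges(grad, 10))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```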

  37. HERE THE OPERATION OF AN EDGE DETECTOR WITH THRESHOLDING IN NOISY CONDITIONS IS SHOWN: AT A LOW NOISE LEVEL IT IS GOOD. AT A HIGHER NOISE LEVEL, SOME NOISE POINTS GET CLASSIFIED AS EDGES AND SOME EDGE POINTS ARE MISSING (WE STILL SEE A GOOD EDGE). AT A VERY HIGH NOISE LEVEL, THE DETECTOR OPERATION BREAKS DOWN COMPLETELY AND NO EDGE IS DETECTED. NOTE THAT WE CAN STILL SEE SOME EDGE IN THIS PICTURE

  38. SO IN NOISY CONDITIONS THERE ARE PROBLEMS WITH EDGE DETECTORS, BUT SOMEHOW IN HUMAN VISION THEY WORK VERY WELL – HOW??? RESEARCHERS MOTIVATED BY HUMAN VISION NOTICED THAT THE FILTERING ELEMENTS IN THE HUMAN RETINA AT THE BACK OF THE EYE ARE MORE COMPLICATED THAN THE SIMPLE DETECTORS SHOWN HERE.

  39. MOTIVATED BY OBSERVATION OF THE HUMAN SYSTEM AND SOME CONSIDERATION OF OPTIMAL NOISE ATTENUATION, A ZERO-CROSSING, OR LAPLACIAN-OF-GAUSSIAN, DETECTOR WAS DESIGNED. THIS DETECTOR IS OBTAINED BY TAKING THE SECOND DERIVATIVE OF THE GAUSSIAN CURVE. The resulting curve has the characteristic 'Mexican hat' shape. NOW IF WE TAKE THE SECOND DERIVATIVE OF THE OUTPUT, WE NOTICE THAT AN EDGE IS WHERE THE SIGNAL CROSSES ZERO!
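A 1-D sketch of this idea (sigma and the kernel radius are illustrative choices; a real detector would use the 2-D Laplacian of a Gaussian):

```python
import math

def log_kernel_1d(sigma, radius):
    """Second derivative of a Gaussian: the 1-D 'Mexican hat' profile."""
    return [(x * x / sigma**4 - 1 / sigma**2) * math.exp(-x * x / (2 * sigma**2))
            for x in range(-radius, radius + 1)]

def correlate(signal, kernel):
    """Slide the kernel over the signal (valid positions only)."""
    r = len(kernel) // 2
    return [sum(signal[i + k - r] * kernel[k] for k in range(len(kernel)))
            for i in range(r, len(signal) - r)]

def zero_crossings(signal):
    """Edge positions: indices where the filtered signal changes sign."""
    return [i for i in range(1, len(signal)) if signal[i - 1] * signal[i] < 0]

# A dark-to-bright step edge filtered with the Mexican-hat kernel:
step = [0] * 8 + [10] * 8
response = correlate(step, log_kernel_1d(1.0, 3))
print(zero_crossings(response))   # [5] – the sign change sits at the step
```

The response is positive on one side of the step and negative on the other, so the single zero crossing marks the edge position.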

  40. A ZERO-CROSSING EDGE DETECTOR WILL BE BETTER IN NOISY CONDITIONS, BUT IT IS MORE COMPLICATED SINCE IT REQUIRES MANY MORE OPERATIONS TO CALCULATE. Assuming that we have such a detector, the next problem is how to build a representation based on edges, and this is shown next

  41. LINKING EDGE POINTS TO FORM CONTOURS OF OBJECTS: WE LINK OUTPUT POINTS FROM THE EDGE DETECTOR WHEN THEIR VALUES ARE SIMILAR. SIMILARITY MEANS: THE AMPLITUDE DIFFERENCE IS SMALLER THAN SOME THRESHOLD AND THE ANGULAR DIRECTION IS SIMILAR. LINKED EDGES ARE THOUGHT TO BELONG TO THE SAME OBJECT
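A toy sketch of such linking along one scanline, assuming each edge point is an (amplitude, direction) pair; the two thresholds are illustrative values, not from the lecture:

```python
def similar(p, q, amp_t=5.0, ang_t=0.3):
    """Two edge points belong together when both their amplitudes and
    their angular directions differ by less than the thresholds."""
    (amp1, ang1), (amp2, ang2) = p, q
    return abs(amp1 - amp2) < amp_t and abs(ang1 - ang2) < ang_t

def link_scanline(points):
    """Group consecutive similar edge points into contour segments."""
    contours, current = [], [points[0]]
    for p in points[1:]:
        if similar(current[-1], p):
            current.append(p)
        else:
            contours.append(current)   # dissimilar point starts a new contour
            current = [p]
    contours.append(current)
    return contours

pts = [(30, 0.0), (31, 0.1), (32, 0.1), (80, 1.5), (81, 1.5)]
print(len(link_scanline(pts)))   # 2: a weak contour and a strong one
```

A full linker would walk the 2-D neighborhood of each edge point rather than a single scanline, but the similarity test is the same.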

  42. EXAMPLE RESULT OF EDGE LINKING: ORIGINAL PICTURE, HORIZONTAL DETECTOR, VERTICAL DETECTOR

  43. SEGMENTATION HOW TO EXTRACT OBJECTS FROM PICTURES? THIS CAN BE DONE BASED ON FEATURES SUCH AS INTENSITY OR COLOR

  44. WE CAN GROUP AREAS WITH SPECIFIC FEATURES BY LINKING THEM TOGETHER: IF TWO AREAS HAVE THE SAME FEATURE WE LINK THEM TOGETHER. SEGMENTATION ALGORITHM: START WITH SOME AREA AND DIVIDE IT INTO FOUR PARTS; CONTINUE THE DIVISION UNTIL ONLY PARTS WITH THE SPECIFIC FEATURE ARE KEPT
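A sketch of this divide-in-four (quadtree-style) segmentation, assuming a square image whose side is a power of two and an illustrative "bright" feature predicate:

```python
def split_segment(image, x, y, size, uniform, min_size=1):
    """Recursively divide a square region into four parts; keep only the
    parts where the feature predicate 'uniform' holds."""
    region = [row[x:x + size] for row in image[y:y + size]]
    if uniform(region):
        return [(x, y, size)]          # whole part has the feature: keep it
    if size <= min_size:
        return []                      # too small and still mixed: discard
    half = size // 2
    kept = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        kept += split_segment(image, x + dx, y + dy, half, uniform, min_size)
    return kept

# Feature: all pixels bright (> 100). Bottom-right quadrant is dark.
bright = lambda r: all(v > 100 for row in r for v in row)
img = [[200] * 4 for _ in range(2)] + [[200, 200, 0, 0] for _ in range(2)]
print(split_segment(img, 0, 0, 4, bright))
# [(0, 0, 2), (2, 0, 2), (0, 2, 2)] – three bright quadrants kept
```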

  45. THRESHOLDING. WE NEED TO DIFFERENTIATE BETWEEN THE 'USEFUL' DATA AND THE 'NON-USEFUL'. THRESHOLDING WORKS ON THE PRINCIPLE THAT THE USEFUL SIGNAL IS STRONGER: IF SIGNAL < T WE SET IT TO ZERO. HOW TO SELECT T?

  46. FOR THRESHOLDING, THE HISTOGRAM CAN BE USED SINCE IT OFTEN PROVIDES A VIEW OF HOW OBJECT AND BACKGROUND CAN BE SEPARATED. HOWEVER, FULLY AUTOMATIC THRESHOLDING IS DIFFICULT SINCE THE NOISE AND OBJECT LIGHT INTENSITIES MAY NOT BE COMPLETELY SEPARATED. IF THE THRESHOLD IS SELECTED HERE, WE CAN SEPARATE BACKGROUND AND OBJECT
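One simple way to pick T from the data, sketched here, is the iterative intermeans rule: place T halfway between the mean of the pixels below it and the mean of the pixels above it. This is one of several histogram-based methods, not necessarily the one meant in the lecture:

```python
def histogram_threshold(pixels, eps=0.5):
    """Iteratively place T halfway between the mean of the 'background'
    (values <= T) and the mean of the 'object' (values > T)."""
    t = sum(pixels) / len(pixels)          # start at the global mean
    while True:
        lo = [p for p in pixels if p <= t]
        hi = [p for p in pixels if p > t]
        new_t = (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2
        if abs(new_t - t) < eps:           # stop when T has settled
            return new_t
        t = new_t

# Dark background around 20, bright object around 200:
pixels = [18, 20, 22, 19, 21, 198, 200, 202]
print(histogram_threshold(pixels))   # 110.0 – well between the two modes
```

As the slide warns, this only works cleanly when the two intensity populations are actually separable; with heavy overlap any automatic T will misclassify some pixels.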

  47. FEATURE DETECTION. FEATURES ARE SMALL PARTS OF OBJECTS WHICH ARE CRITICAL FOR RECOGNITION AND REPRESENTATION

  48. HOW TO DETECT FEATURES? THIS IS QUITE A DIFFICULT PROBLEM. FEATURES ARE OFTEN COMPOSED OF SHORT LINE SEGMENTS, E.G. CORNERS: A CORNER IS COMPOSED OF TWO LINES (EDGES). WE CAN THINK OF APPLYING AN EDGE DETECTOR AND THRESHOLDING FOR FINDING FEATURES. MMSP Irek Defée

  49. FOR A COMPACT REPRESENTATION WE HAVE TO ELIMINATE ALL NON-RELEVANT SIGNAL ELEMENTS. THIS TASK IS SIMILAR TO MEDIA COMPRESSION. MEDIA COMPRESSION HAS THE GOAL OF MINIMIZING THE DESCRIPTION OF MEDIA WHILE PRESERVING PERCEPTUAL QUALITY. THIS IS ALSO IMPORTANT FOR GENERAL MULTIMEDIA SIGNAL PROCESSING SINCE IT MINIMIZES THE AMOUNT OF INFORMATION TO BE PROCESSED.

  50. A MEDIA SIGNAL IS A STREAM OF BITS. HOW TO REDUCE THE NUMBER OF BITS NEEDED FOR THE DESCRIPTION? THIS CAN BE DONE IN 2 WAYS: • MORE EFFICIENT DESCRIPTION OF THE BITSTREAM • ELIMINATING PERCEPTUALLY INSIGNIFICANT INFORMATION. Technically this is called compression of information
