
INFORMATION REPRESENTATION





  1. INFORMATION REPRESENTATION

  2. There is no known general method for representing information about objects that achieves a level of performance similar to that of biological systems. I think that there are two types of information about objects: statistical -> the distribution of features; structural -> the location of features. Both types are evaluated and combined in some as yet unknown way.

  3. Let's take a look at a very fresh example: "FujiFilm's Latest Camera Aims at Dogs, Cats" (Mar 12, 2010). FujiFilm's Finepix Z700 features a face-detection function that can recognize cat and dog faces, and it can snap a picture automatically when they look towards the camera lens. When it finds a face, a green box is drawn around it on screen and the camera automatically focuses. In auto-shooting mode it waits until the animal turns to the camera before taking a picture. It worked well with stuffed animals, but it turns out real dogs and cats can be a little trickier. FujiFilm has a list of dog and cat breeds that are easier for its technology to identify. FujiFilm says the technology can also get confused if the animal has a dark coat, large patches around its eyes, a wrinkly nose, or hair over its eyes.

  4. Cat Types Recommended for Detection / Dog Types Recommended for Detection. Detection of dogs with hair covering the eyes, nose, or entire face can be difficult. Detection of cats with hair covering the facial contour can be difficult. Detection of dogs/cats that have large patches around the eyes or nose (especially black patches) can be difficult. Detection of cats with thin faces can be difficult. Detection of blackish, dark-colored dogs/cats can be difficult.

  5. The question is how such detection algorithms are made. We do not know (it is a company secret), but we can assume that the algorithms work by identifying the locations of basic features: eyes, ears, nose, coat color. But animals also have a coat, which differs in its details yet is statistically the same for the same species. Here we can see that this dog has specific eye, nose, and mouth locations, but its fur is statistical.

  6. At our university we investigate problems in statistical and structural information about objects: How can such information be produced? How useful is it? What is the performance of a system that uses statistical information only?

  7. What is our approach? We represent features by quantized block DCT transforms, or by vectors built from the transform coefficients of neighbouring blocks. Then we form a histogram of the blocks.

  8. Our approach: We do not know how to describe the locations of blocks, so.... let's first think about GLOBAL content description, in which locations are not considered! That is, we first look at the problem in which only block STATISTICS is considered.

  9. Impact of Quantization. The distribution of DCT coefficients for a typical 8x8 DCT block shows that the higher-frequency coefficients are small. If we use strong quantization, they will be quantized to zero.

  10. Under strong quantization only the first 4x4 block of coefficients will be nonzero, which is equivalent to a 4x4 DCT transform. There is another effect too: the stronger the quantization, the smaller the number of DIFFERENT blocks. In fact, with no quantization almost every block is different. Quantization rounds the coefficients to a limited number of values.

  11. Coefficients of the 4x4 blocks. DC – zero frequency, the average light level in the block. AC – coefficients corresponding to the other frequencies. Quantization by QP: [DC] = round(DC/QP), [AC] = round(AC/QP). (In the 4x4 block the DC coefficient sits in the top-left corner and the AC coefficients fill the rest.) Higher QP -> more zeros in the block.
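The quantization rule above can be sketched in a few lines of Python; the coefficient values below are hypothetical, chosen only to show the effect:

```python
import numpy as np

def quantize_block(block, qp):
    """Quantize DCT coefficients: [C] = round(C / QP)."""
    return np.round(block / qp).astype(int)

# A hypothetical 4x4 block of DCT coefficients: a large DC value in the
# top-left corner, smaller AC values towards the higher frequencies.
block = np.array([[180.0, 21.0, 6.0, 2.0],
                  [ 15.0,  9.0, 3.0, 1.0],
                  [  5.0,  2.0, 1.0, 0.5],
                  [  2.0,  1.0, 0.4, 0.2]])

mild   = quantize_block(block, qp=4)
strong = quantize_block(block, qp=12)
# Higher QP -> more coefficients rounded to zero, fewer distinct blocks.
print(np.count_nonzero(mild), np.count_nonzero(strong))
```

NumPy rounds halves to the nearest even integer; any consistent rounding convention works for this illustration.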

  12. Here is an illustration for a picture. QP is the quantization parameter; we see that as it increases, the number of DCT patterns is strongly reduced.

  13. Now we use the following idea: let's see what the histogram of the quantized DCT blocks looks like! For example, let's find which blocks appear most often in a picture and create a histogram of, say, the first 40 patterns.
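The block-histogram idea can be sketched as follows. This is a minimal sketch: `dct2` and `pattern_histogram` are illustrative names, and a random image stands in for a real picture.

```python
import numpy as np
from collections import Counter

def dct2(b):
    # Unnormalized 2-D DCT-II built from the cosine basis matrix.
    n = b.shape[0]
    k = np.arange(n)
    c = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    return c @ b @ c.T

def pattern_histogram(img, qp, n_patterns=40, bs=4):
    """Count quantized 4x4 DCT blocks; keep the n_patterns most frequent."""
    counts = Counter()
    for y in range(0, img.shape[0] - bs + 1, bs):
        for x in range(0, img.shape[1] - bs + 1, bs):
            q = np.round(dct2(img[y:y + bs, x:x + bs]) / qp).astype(int)
            counts[q.tobytes()] += 1
    return [n for _, n in counts.most_common(n_patterns)]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
h_weak   = pattern_histogram(img, qp=8)     # many distinct patterns, flat histogram
h_strong = pattern_histogram(img, qp=2048)  # few patterns, peaked histogram
```

As the next slide notes, low quantization gives a flat histogram (most patterns occur once), while strong quantization concentrates the counts on a few patterns.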

  14. The shape of this histogram obviously depends on the quantization. If the quantization is low, the histogram will tend to be flat. If the quantization is high it will tend to have a peak.

  15. Let us see an example: histograms of two face images.

  16. The database retrieval problem based on block histograms. Assume we have a database D of pictures 1, 2, ..., i, ..., j, ..., m. We take a picture and want to check whether it is in the database, or whether similar pictures are there. Example: a database of passport photographs. In our approach we use a similarity measure between pictures based on their quantized-block histograms. The histograms are treated as vectors, and similarity is based on the following formula: B(i,j) = Σ_k | H_i(k) − H_j(k) |, i, j ∈ D.

  17. This measure is the city-block measure (a sum of absolute differences between histogram coefficients). Its minimum value is 0, reached when the two histogram vectors are identical; the closer the value is to zero, the more similar the pictures should be. Remember that the blocks are quantized, so noise and irrelevant features are removed. The question is what the performance of such a scheme is, but before we can check this we need to look at the light normalization problem.
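A minimal sketch of this similarity measure, with hypothetical pattern counts:

```python
import numpy as np

def block_similarity(h_i, h_j):
    """City-block (L1) distance between two block histograms."""
    return int(np.abs(np.asarray(h_i) - np.asarray(h_j)).sum())

# Hypothetical counts of the 4 most frequent patterns in two pictures.
h_a = np.array([40, 25, 10, 5])
h_b = np.array([38, 26, 10, 6])

print(block_similarity(h_a, h_a))  # identical histograms -> 0
print(block_similarity(h_a, h_b))  # -> 4
```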

  18. Light normalization problem. The values of the DCT coefficients depend on the light level: the higher the light level, the higher the values. If we use the same quantization for two identical pictures with different light levels, the quantized blocks will differ. The light level can be normalized. First, we calculate the average light level of a picture; averaging the DC coefficients over all its blocks gives DC_mean, the average light level of the picture.

  19. The average light level DC_all of a database is calculated in the same way, from the DC_mean values of the individual pictures. Next, the light level of each picture is rescaled by the factor DC_all / DC_mean. Rescaling ensures that the values of the coefficients in the quantized blocks will be similar.
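The normalization of slides 18-19 can be sketched as follows; the exact rescaling factor is assumed here to be DC_all / DC_mean:

```python
import numpy as np

def normalize_light(pictures_dc):
    """Rescale each picture so its average light level matches the
    database average (factor DC_all / DC_mean, assumed here)."""
    dc_mean = [np.mean(p) for p in pictures_dc]   # per-picture light level
    dc_all = np.mean(dc_mean)                     # database-wide light level
    return [np.asarray(p) * (dc_all / m) for p, m in zip(pictures_dc, dc_mean)]

# The same hypothetical scene photographed bright and dark:
bright = np.array([200.0, 210.0, 190.0])  # DC values of its blocks
dark   = bright / 2.0
norm_bright, norm_dark = normalize_light([bright, dark])
# After rescaling, both pictures have the same average light level, so the
# same quantization now produces the same quantized blocks for both.
```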

  20. The DC coefficients problem. At high quantization levels very many blocks will contain only a DC coefficient. The only information about these blocks is the DC value, i.e. the average light level in the block. But what is of interest is how the average light level changes between blocks, and we want to use this information. So we account for the information in the differences between the DC values of neighbouring blocks.

  21. DC differences between blocks. In a) we see a fragment of a picture in which the DC values of the blocks are shown. Each block has 8 neighbours, as shown in b). We calculate 9 differences between the neighbours (8 for the directions and 1 for the average over all directions), as shown in c). We then order the differences and form a vector from the first k coefficients, as shown in d) for k = 4.
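A sketch of the DC difference vector; since the slide does not state the ordering criterion, ordering by decreasing magnitude is assumed here:

```python
import numpy as np

def dc_difference_vector(dc3x3, k=4):
    """9 differences (8 neighbour directions + their average) around the
    central DC value; ordering by decreasing magnitude is assumed."""
    center = dc3x3[1, 1]
    neighbours = np.delete(dc3x3.flatten(), 4)  # the 8 surrounding DC values
    diffs = neighbours - center
    diffs = np.append(diffs, diffs.mean())      # 9th: average over directions
    return np.sort(np.abs(diffs))[::-1][:k]     # keep the first k

# Hypothetical 3x3 neighbourhood of DC values:
dc3x3 = np.array([[100.0, 102.0, 104.0],
                  [ 98.0, 100.0, 106.0],
                  [ 96.0,  95.0, 101.0]])
vec = dc_difference_vector(dc3x3, k=4)
```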

  22. Combined histogram. A combined histogram is now formed from the AC blocks and the DC vectors: H = [H_AC, α × H_DC], where α is a numerical parameter that will be optimized later. The combined histogram means that two vectors enter the minimization, summed with the weight α: B(i,j) = Σ_k | H_i(k) − H_j(k) |, i, j ∈ D.
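A sketch of the combined histogram and the resulting distance; all histogram values are hypothetical:

```python
import numpy as np

def combined_histogram(h_ac, h_dc, alpha):
    """H = [H_AC, alpha * H_DC]: concatenate the AC-pattern histogram
    with the alpha-weighted DC-difference histogram."""
    return np.concatenate([np.asarray(h_ac, float),
                           alpha * np.asarray(h_dc, float)])

def similarity(h_i, h_j):
    # The same city-block measure as before, now on combined histograms.
    return np.abs(h_i - h_j).sum()

h_i = combined_histogram([40, 25, 10], [12, 8], alpha=0.5)
h_j = combined_histogram([38, 26, 10], [10, 9], alpha=0.5)
d = similarity(h_i, h_j)
# alpha scales how much the DC part contributes to the total distance.
```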

  23. Optimization of database retrieval. The question is: how good can database retrieval based on the combined histogram be? For example, how many errors will it make? But we can also ask another question: what is the best achievable performance of this approach? Remember that we use only statistical information, but we have several parameters that can be selected: the quantization level, the size of the histograms, the parameter α for combining histograms, and the size of the DC difference vectors.

  24. Optimization procedure. We can examine this problem by taking some databases and optimizing the parameters for the best retrieval. This will show us the maximum performance. We did this for face databases using the following scheme:

  25. Evaluation of results. Given a certain classification threshold, an input face image of person A may be falsely classified as person B. If the target person is A, the fraction of images of person A that are classified as other persons is called the False Rejection Rate (FRR). The fraction of images of other persons that are classified as person A is called the False Acceptance Rate (FAR).

  26. Equal Error Rate. From the FAR and FRR, the Equal Error Rate (EER) is obtained at the point where both measures take equal values. The lower the EER, the better the system's performance, since the total error rate (the sum of FAR and FRR at the EER point) decreases. Typical EER performance is shown for two face databases.
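The FAR/FRR/EER computation can be sketched as follows, with hypothetical histogram distances for the two kinds of pairs:

```python
import numpy as np

def far_frr(genuine, impostor, thr):
    # Distances below the threshold are accepted as "same person".
    frr = np.mean(np.asarray(genuine) > thr)    # true matches rejected
    far = np.mean(np.asarray(impostor) <= thr)  # other persons accepted
    return far, frr

def equal_error_rate(genuine, impostor):
    """Sweep thresholds and return the rate where FAR and FRR meet."""
    lo, hi = min(genuine + impostor), max(genuine + impostor)
    best = min(np.linspace(lo, hi, 1000),
               key=lambda t: abs(far_frr(genuine, impostor, t)[0]
                                 - far_frr(genuine, impostor, t)[1]))
    far, frr = far_frr(genuine, impostor, best)
    return (far + frr) / 2

# Hypothetical distances for same-person and different-person pairs:
genuine  = [0.10, 0.20, 0.25, 0.30, 0.50]
impostor = [0.40, 0.60, 0.70, 0.80, 0.90]
eer = equal_error_rate(genuine, impostor)
# One genuine and one impostor pair cross the best threshold: EER = 0.2.
```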

  27. Database selection. There are two cases: 1. a database in which there is only one (standard) picture of each person; 2. a database in which there are multiple pictures of each person (and they might be very different). In case 2 the same person should be retrieved from any of their pictures, which can be difficult.

  28. Research database. The FERET database contains more than 10,000 images of more than 1,000 individuals, taken under largely varying circumstances. The images are divided into several sets to match the FERET evaluation methodology. Here we ran a test based on the sets fa and fb; each face has one picture in each set, with the fb picture taken seconds after the corresponding fa picture. The fa set (994 images) serves as the database, and the fb set (992 images) provides the key images for retrieval from fa.

  29. Evaluation of results. FERET is considered a difficult database, used in the evaluation of professional applications. The best EER result is obtained with QP_AC = 12, number of AC patterns = 400, QP_DC = 12, number of direction-vector patterns = 400, α = 0.5, and γ = 4.

  30. FERET methodology of evaluation. For FERET there is another methodology, based on calculating how many correct retrievals are obtained among the first n trials, n = 1, 2, 3, ….

  31. FERET specific evaluation method. The FERET evaluation is called the cumulative match score. Results are shown for the histogram method (red), overlaid with other known good methods. Rank means how many retrievals are made; a single retrieval (rank 1) is the most demanding.

  32. Features based on Binary Feature Vectors. For each non-border 4x4 image block there are eight blocks surrounding it. Such a 3x3 block matrix is used here to generate a Binary Feature Vector (BFV). Taking the DC coefficients as an example: the nine DC coefficients within this area form a 3x3 DC coefficient matrix. By measuring and thresholding the magnitudes of the differences between the non-center DCs and the central DC coefficient, a binary vector of length 8 is formed. Two different cases are considered here: Case 1: 0 if the difference ≤ threshold, 1 if the difference > threshold. Case 2: 0 if the difference < threshold, 1 if the difference ≥ threshold.
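A sketch of the BFV construction, assuming (as the context suggests) that it is the difference magnitudes that are compared against the threshold; the function name and threshold value are illustrative:

```python
import numpy as np

def binary_feature_vector(coeff3x3, threshold, case=1):
    """Length-8 BFV: threshold the magnitudes of the differences between
    the 8 surrounding coefficients and the central one."""
    center = coeff3x3[1, 1]
    neighbours = np.delete(coeff3x3.flatten(), 4)
    diffs = np.abs(neighbours - center)
    if case == 1:
        return (diffs > threshold).astype(int)   # 0 if <= thr, 1 if > thr
    return (diffs >= threshold).astype(int)      # 0 if < thr, 1 if >= thr

# Hypothetical 3x3 DC coefficient matrix:
dc = np.array([[10.0, 12.0, 10.0],
               [ 9.0, 10.0, 14.0],
               [10.0,  8.0, 10.0]])
bfv1 = binary_feature_vector(dc, threshold=2, case=1)
bfv2 = binary_feature_vector(dc, threshold=2, case=2)
# The two cases differ only where a difference equals the threshold.
```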

  33. DC-BFV Histogram (based on DC coefficients); AC-BFV Histogram (based on AC coefficients). Example of a DC-BFV histogram.

  34. Performance results for the FERET database. The result is quite good considering that the method uses statistical information only.

  35. How about structural information? Until now we have compared pictures based on feature histograms treated as vectors; no information about the location of features was taken into account. Thus the features could be located anywhere in the pictures and the results would be the same. The results presented are valid for faces, since we know that the pictures are faces and not random feature pictures. BUT the question is: how should we deal with structural information?
