Model Selection in Signal, Image, and Pattern Analysis


Presentation Transcript


  1. Model Selection in Signal, Image, and Pattern Analysis. Mário A. T. Figueiredo, Instituto de Telecomunicações e Departamento de Engenharia Electrotécnica e de Computadores, Instituto Superior Técnico, Universidade Técnica de Lisboa, PORTUGAL

  2. What is model selection? Some experimental data... to which we want to fit a polynomial. Question: which order? 2 is too low... 12 is too high... 4 seems OK...

  3. What is model selection? How to identify the underlying trend of the data, ignoring the noise? Too high an order gives “overfitting”, too low gives “underfitting”; an intermediate order is OK.
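
A minimal Python sketch of the kind of toy experiment behind slides 2 and 3. The true curve (a sine), the noise level, and the sample size are illustrative assumptions, not the data from the slides; the point is only that a too-low order underfits and a too-high order overfits the noise.

```python
# Fit polynomials of "too low", "about right", and "too high" order to noisy samples
# of a smooth curve, and compare the training RSS with the error against the true trend.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)   # "experimental data"
x_grid = np.linspace(0.0, 1.0, 300)
trend = np.sin(2 * np.pi * x_grid)                               # underlying trend

for k in (2, 4, 12):
    coeffs = np.polyfit(x, y, deg=k)                  # least-squares fit of order k
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)    # error on the noisy samples
    err = np.mean((np.polyval(coeffs, x_grid) - trend) ** 2)  # error vs. the trend
    print(f"order {k:2d}: training RSS = {rss:.3f}, mean error vs. trend = {err:.3f}")
```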

  4. Outline 1. What is model selection? 2. Introduction and some motivating examples 3. Bayesian model selection 4. Penalized likelihood and minimum description length (MDL) 5. The incompleteness of MDL 6. Model selection without order penalties: sparseness priors 7. Concluding remarks Signal/image/pattern analysis examples shown along the way...

  5. Motivating example. Observed data: y (with inputs x). Goal: fit some (parametric) function with unknown model parameters living in the parameter space of order k. Usual minimum (mean) squared error estimate, for fixed k: the parameter vector in the order-k space that minimizes the residual sum of squares. Note: k is not necessarily the number of parameters, just some “model order”.

  6. Motivating example. Can we use the minimized criterion for model selection? No, if the parameter spaces are nested: for any parameter vector in the order-k space there exists one in the order-(k+1) space such that the two models are equivalent. Example: for every quadratic polynomial, there is an equivalent third-order polynomial; simply set the extra coefficient to zero.

  7. Motivating example. For the polynomial fitting example, the parameter spaces are nested, so the residual sum of squares (RSS) is non-increasing with k.
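
Continuing the same assumed synthetic setup: because the polynomial model classes are nested, the minimized RSS can only stay the same or decrease as the order grows, so the minimized criterion alone cannot choose k.

```python
# The minimized RSS is non-increasing in the order k (nested parameter spaces).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

rss = []
for k in range(1, 13):
    c = np.polyfit(x, y, deg=k)
    rss.append(float(np.sum((np.polyval(c, x) - y) ** 2)))

print([round(v, 3) for v in rss])   # a (numerically) non-increasing sequence
```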

  8. Examples in signal/image/pattern analysis: contour estimation. Observed data y: an image. Inference: a spline representation of a contour, with k control points. Model selection question: how many control points? (How simple/smooth?)

  9. Examples in signal/image/pattern analysis: image segmentation. Observed data y: an image. Inference: a segmentation of the image into k regions. Model selection question: how many regions?

  10. Examples in signal/image/pattern analysis: change-point detection (or signal segmentation). Observed data y: a sequence/signal. Inference: the locations and parameters of the k segments. Model selection question: how many segments?

  11. Examples in signal/image/pattern analysis: model-based clustering (with finite mixtures). Observed data y: points (features). Inference: the parameters of the k clusters/components. Model selection question: how many clusters/components?

  12. Probabilistic formulation. Observed data: y. Unknown quantities/parameters: a vector in the order-k parameter space. Observation model (likelihood function): p(y | parameters). Maximum likelihood (ML) estimate, for fixed k: the parameter vector in the order-k space that maximizes the likelihood. With nested parameter spaces, the maximized likelihood is non-decreasing with k... thus it is useless for model selection.

  13. Bayesian formulation. Unknowns: the parameters and k. Likelihood function: p(y | parameters, k). A priori knowledge (prior): p(parameters, k). Posterior (Bayes law): proportional to the likelihood times the prior. Recall that an optimal Bayes decision minimizes the posterior expected loss, where the loss is the one incurred when we make a given decision and the truth is something else.

  14. Bayesian model selection. We don’t care about the parameters, just k. Under the usual 0/1 loss, the Bayesian model selection rule becomes maximum a posteriori (MAP) model selection: choose the k with the largest posterior probability.

  15. Bayesian model selection. MAP model selection is driven by the marginal likelihood (the likelihood with the parameters integrated out). When comparing only two models: posterior odds ratio = Bayes factor × prior odds ratio.

  16. Bayesian model selection. Difficulties with Bayesian model selection: 1. Computing the marginal likelihood, which is only feasible analytically in very simple cases; alternatives: Markov chain Monte Carlo, variational approximations, Laplace approximation (more later). 2. Choosing the prior: in general, one can’t use improper non-informative priors.

  17. Bayesian model selection: Bernoulli example. Data: a binary sequence, split into two subsets; the relevant statistic is the number of ones. Model 1 (same source): a single Bernoulli parameter for the whole sequence. Model 2 (two sources): a separate Bernoulli parameter for each subset. The models are nested. Question: do both subsets come from the same source, or not?

  18. Bayesian model selection: Bernoulli example. With non-informative priors on the Bernoulli parameters, the marginal likelihoods of both models are available in closed form.

  19. Bayesian model selection: Bernoulli example. Decision criterion: choose model 2 whenever its marginal likelihood exceeds that of model 1. A fully objective Bayes-optimal decision; no user-defined thresholds.
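
A sketch of this comparison in Python, assuming uniform Beta(1, 1) priors as the non-informative priors, for which the marginal likelihoods are closed-form Beta integrals. The two binary subsets below are made-up data, and scipy's betaln is used for numerical stability.

```python
# Marginal likelihoods for "one Bernoulli source" vs. "two Bernoulli sources",
# integrating the parameters out under uniform priors, and the resulting Bayes factor.
import numpy as np
from scipy.special import betaln

def log_marginal(ones, n):
    # log of integral_0^1 p^ones (1-p)^(n-ones) dp = log Beta(ones + 1, n - ones + 1)
    return betaln(ones + 1, n - ones + 1)

y1 = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0])   # first subset (made up)
y2 = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])   # second subset (made up)

log_m1 = log_marginal(y1.sum() + y2.sum(), y1.size + y2.size)               # same source
log_m2 = log_marginal(y1.sum(), y1.size) + log_marginal(y2.sum(), y2.size)  # two sources

log_bayes_factor = log_m2 - log_m1   # > 0 favours the two-source model
print(f"log Bayes factor (two sources vs. one) = {log_bayes_factor:.3f}")
```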

  20. Bayesian model selection: Laplace-type approximations. The marginal likelihood is in general hard to compute. Under suitable conditions (smooth prior, regularity of the likelihood, large n), a Laplace-type approximation around the k-order maximum likelihood estimate yields, with a flat prior, the Bayesian information criterion (BIC) [Schwarz, 1978]: the maximized log-likelihood minus a term that grows with the number of parameters and penalizes larger k. Order-penalized maximum likelihood.

  21. BIC: example. Back to the polynomial toy example: the BIC-selected estimate (order 3) plotted against the truth.
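
A sketch of BIC order selection on an assumed synthetic version of the toy problem (true order 3, Gaussian noise). For a least-squares fit with Gaussian noise of unknown variance, the maximized log-likelihood is -(n/2) log(RSS/n) plus a constant, so one can simply minimize n log(RSS/n) + d log(n) with d = k + 1 coefficients.

```python
# Select the polynomial order by BIC: penalized (Gaussian) maximum likelihood.
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = np.linspace(-1.0, 1.0, n)
true_coeffs = [1.0, -2.0, 0.5, 1.5]                   # an order-3 polynomial (assumed)
y = np.polyval(true_coeffs, x) + rng.normal(scale=0.3, size=n)

def bic(k):
    c = np.polyfit(x, y, deg=k)
    rss = np.sum((np.polyval(c, x) - y) ** 2)
    return n * np.log(rss / n) + (k + 1) * np.log(n)  # lower is better

scores = {k: bic(k) for k in range(1, 11)}
k_hat = min(scores, key=scores.get)
print("BIC-selected order:", k_hat)                   # typically 3 for this setup
```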

  22. Penalized likelihood criteria. General form of penalized likelihood criteria: maximized log-likelihood minus a complexity penalty. If the penalty depends only on the model dimension, we get structural-risk-minimization-type criteria, i.e., take the best model from each class (k), then select among these models. Many instances: BIC, Cp, NIC, AIC, SIC, MDL, MML, NML, ...

  23. Penalized likelihood criteria. If the complexity penalty does not depend only on k, we get regularization-type criteria: the selected model from each class (k) is not the corresponding ML estimate.

  24. The minimum description length (MDL) criterion. Introduction: the observed data are fed to an encoder, producing compressed data, which a decoder must reconstruct. Rationale: a good model yields a short code, a bad model a long code; code length measures model adequacy. Several flavors: [Rissanen 1978, 1987], [Rissanen 1996], [Wallace and Freeman, 1987].

  25. Formalizing the MDL criterion. The observed data pass through an encoder to the decoder as coded data, and the decoder must extract them. Given the parameters, the shortest codelength for the data is the negative log-likelihood (Shannon, 1948). However, both k and the parameters are unknown to the decoder, so they must be coded and sent as well. The MDL criterion minimizes the total codelength: a maximum penalized likelihood criterion.

  26. MDL criterion: coding the parameters. Real-valued parameters: how to code them with a finite number of bits? They are truncated to finite precision. Under regularity conditions, it can be shown that, at the optimal precision, MDL = BIC. This is the “standard” MDL; there are more recent/refined versions (more later).

  27. Image analysis example: contour estimation. Observed image y; unknowns: a contour description and other parameters. Observation mechanism: a statistical model for the inside of the contour, and another one for the outside. Examples: Gaussians of different means and variances; Rayleigh distributions of different variances (ultrasound images); different textures. [Figueiredo, Leitão, 1992], [Figueiredo, Leitão, Jain, 1997, 2000].

  28. Spline contour representation. The contour is the product of a matrix of periodic B-spline basis functions with the k contour control points. Fewer control points give a simpler (smoother) shape. Model selection: k = ?
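
A minimal sketch of this representation, assuming uniform periodic cubic B-splines: the sampled contour is a basis matrix times the k control points, so reducing k yields a smoother closed shape. The circular control-point layout and the helper names are illustrative.

```python
# Closed contour = (matrix of periodic B-spline basis functions) @ (k control points).
import numpy as np

def cubic_bspline(t):
    """Uniform cubic B-spline: support (-2, 2), partition of unity on integer knots."""
    t = np.abs(t)
    out = np.zeros_like(t)
    near, far = t < 1, (t >= 1) & (t < 2)
    out[near] = (4 - 6 * t[near] ** 2 + 3 * t[near] ** 3) / 6
    out[far] = (2 - t[far]) ** 3 / 6
    return out

def periodic_basis_matrix(k, n_samples=200):
    """n_samples x k matrix of periodic (wrapped) cubic B-splines; assumes k >= 4."""
    s = np.linspace(0.0, k, n_samples, endpoint=False)   # contour parameter
    B = np.empty((n_samples, k))
    for j in range(k):
        d = (s - j + k / 2) % k - k / 2                  # circular distance to knot j
        B[:, j] = cubic_bspline(d)
    return B

k = 7                                                    # number of control points
angles = np.linspace(0, 2 * np.pi, k, endpoint=False)
q = np.column_stack([np.cos(angles), np.sin(angles)])    # k x 2 control points
contour = periodic_basis_matrix(k) @ q                   # sampled closed contour
print(contour.shape)                                     # (200, 2)
```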

  29. Image analysis example: contour estimation. MDL criterion for contour estimation: each control point q has two coordinates. Another interpretation: we only want to estimate the contour with pixel precision; one can show that pixel precision for the contour corresponds to pixel precision for the control points, and the associated per-point cost, governed by the number of image pixels, can be thought of as the “natural code-length”. The estimate can be obtained by an iterative (EM-type) algorithm. [Figueiredo, Leitão, Jain, 1997, 2000].

  30. Some results (same variance, different means): contour estimates and description lengths for k = 5, 7, 9, 11, 13.

  31. The importance of the region-based model. Example with the same mean but different variances, shown from the initialization to the final contour: this contour could never be estimated with an edge-based (snake-type) approach.

  32. Example on a real (brain MR) image

  33. More examples on real medical images

  34. Poisson field segmentation. Data: a sequence of Poisson counts (e.g., astronomical data) or a Poissonian image. Model: k regions/segments of constant mean. Model selection question: k = ?

  35. Poisson field segmentation: MDL’s incompleteness. Let’s consider the standard MDL approach: 1. Given y and k, compute the ML estimate. 2. Code and send it. 3. Standard MDL: now code y according to the fitted model. But this builds codes for any possible y, while only those y leading to the transmitted estimate are possible. Incompleteness! [Rissanen, 1996]

  36. MDL’s incompleteness: Bernoulli example. Let y be a binary sequence of length n = 10 containing 3 ones; there is only one parameter. 1. The transmitter computes the ML estimate. 2. The transmitter sends it to the receiver; since the receiver knows n = 10, it suffices to send “3”. 3. Only sequences y with 3 ones now need codes. Without incompleteness removal, we would code y under the fitted Bernoulli model instead.
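
A sketch of the bit counts in this example: coding y with the fitted Bernoulli model (the incomplete code, built over all 2^n sequences) versus a uniform code over only the sequences compatible with the transmitted count of ones.

```python
# Incomplete vs. conditional codelengths for a length-10 binary sequence with 3 ones.
from math import comb, log2

n, ones = 10, 3
p_hat = ones / n                                         # ML estimate, sent as "3"

# Incomplete: -log2 of the Bernoulli likelihood at the ML estimate (n * H2(p_hat) bits).
incomplete_bits = -(ones * log2(p_hat) + (n - ones) * log2(1 - p_hat))

# Incompleteness removed: uniform code over the C(n, ones) admissible sequences.
conditional_bits = log2(comb(n, ones))

print(f"without incompleteness removal: {incomplete_bits:.2f} bits")   # about 8.81
print(f"with incompleteness removal:    {conditional_bits:.2f} bits")  # about 6.91
```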

  37. Non-incomplete MDL for Poisson segmentation. Data: a sequence of Poisson counts, split into two halves. Simplest segmentation problem: are the two halves from the same source? Model 1 (same source): a single Poisson mean. Model 2 (two sources): a separate mean for each half. The models are nested.

  38. Non-incomplete MDL for Poisson segmentation. Key fact: let y1 and y2 be two i.i.d. Poisson r.v. (any parameter); then, conditioned on the total y1 + y2, the count y1 is Binomial(y1 + y2, 1/2). Generalization 1: the same holds when each half is the sum of t i.i.d. Poisson r.v. (2t in total). Generalization 2: the conditioning argument extends further, to the multinomial setting of the next slide.
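
A quick Monte Carlo check of the key fact (with an assumed Poisson parameter and sample size): conditioned on the total, the first count is Binomial(total, 1/2), whatever the parameter.

```python
# Empirical distribution of y1 given y1 + y2 = N, for i.i.d. Poisson y1, y2.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)
lam, trials = 7.0, 200_000                       # illustrative parameter and sample size
y1 = rng.poisson(lam, trials)
y2 = rng.poisson(lam, trials)

N = 12                                           # condition on one particular total
sel = y1[(y1 + y2) == N]
empirical = np.bincount(sel, minlength=N + 1) / sel.size
theoretical = binom.pmf(np.arange(N + 1), N, 0.5)
print(np.max(np.abs(empirical - theoretical)))   # small (sampling noise only)
```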

  39. Non-incomplete MDL for Poisson segmentation. Approach: rather than the Poisson model, condition on the total count and use the multinomial [Figueiredo and Nowak, 1999, 2000]. Model 1 (same source): a uniform multinomial, i.e., no parameter! Model 2 (two sources): a multinomial with one free proportion. The models are nested.

  40. Non-incomplete MDL for Poisson segmentation. 1. Description length under model 1 (same source): in this case, there is no parameter to code. 2. Description length under model 2 (two sources), in three parts. Part 1: common to both models, so it can be dropped. Part 2: coding the parameter estimate, which is easy because the total count was already sent. Part 3: coding the data given the estimate, with attention to incompleteness: the receiver already knows the total count and the estimate, and thus also the segment sums.

  41. Non-incomplete MDL for Poisson segmentation. Part 3, with the incompleteness removed: the receiver already knows the segment sums, so the data have to be coded conditioned on these sums. Finally, the MDL criterion compares the resulting description lengths under model 1 and model 2. Can show: it coincides with the fully Bayes criterion with a uniform prior on the parameter. Consistency is easy to show via the AEP (asymptotic equipartition property).

  42. Non-incomplete MDL for Poisson segmentation: example. To segment the sequence at an arbitrary location, consider n competing models: model i = segment at location i (the n - 1 possible segmentations) plus model 0 = no segmentation. Plotting the description length against the candidate location i reveals the best model. Fully Bayes-optimal / MDL-optimal segmentation: no user-defined thresholds, no approximations.
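
A sketch of the location scan described above, assuming synthetic Poisson counts with one change point. Conditioning on the total count N and dropping the within-segment terms (common to all candidate models), the no-segmentation model gives the left-segment sum a Binomial(N, i/n) distribution, while a split at i with a uniform prior on the split proportion gives it a uniform marginal 1/(N + 1); comparing these two per location is a stand-in for the exact criterion of [Figueiredo and Nowak, 1999, 2000].

```python
# Scan all candidate change-point locations and compare "split at i" vs. "no split".
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(4)
counts = np.r_[rng.poisson(5.0, 120), rng.poisson(9.0, 80)]   # change point at 120 (assumed)
n, N = counts.size, counts.sum()

i = np.arange(1, n)                                # candidate split locations
left_sums = np.cumsum(counts)[:-1]                 # left-segment sum S_i for each i
log_no_split = binom.logpmf(left_sums, N, i / n)   # log p(S_i | N) under one source
log_split = -np.log(N + 1)                         # log marginal of S_i under a split

gain = log_split - log_no_split                    # > 0 favours splitting at i
if gain.max() > 0:
    print(f"segment at location {int(i[np.argmax(gain)])} (gain {gain.max():.1f} nats)")
else:
    print("no segmentation")
```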

  43. Non-incomplete MDL for Poisson segmentation. What about multiple change points? Simply take each segment and re-apply the criterion (recursively). Example: observed counts, the true intensity function, and the resulting estimate. [Figueiredo and Nowak, 1999, 2000]

  44. Non-incomplete MDL for Poisson image segmentation. In 2D, we look for the best (if any) segmentation into two/four rectangles. Competing models: no segmentation, all possible 4-segmentations, and all possible 2-segmentations. Same multinomial-based criterion: condition on the total count. To fully segment an image: apply the criterion recursively.

  45. Segmenting Poisson images: synthetic example (250×250 pixels/bins): true intensities, observed counts, segmentation, and intensity estimates. [Figueiredo and Nowak, 1999, 2000]

  46. Segmenting Poisson images: real example (multi-look SAR), segmented by adaptive recursive partitioning (ARP). A non-incomplete MDL criterion can also be developed for Gaussian data, but it requires the usual asymptotic approximation. [Ndili, Figueiredo, Nowak, 2001]

  47. Model-based clustering using mixtures. Observed data y: points (features). Inference: the parameters of the k clusters/components. Model selection question: how many clusters/components?

  48. What’s a mixture? A weighted combination of component densities (e.g., Gaussians), with mixing probabilities summing to one; the complete set of parameters collects the mixing probabilities and the component parameters. Given i.i.d. data: estimate the parameters, and maybe classify/cluster the data.

  49. Maximum likelihood estimate. The models are nested, so the maximized likelihood is non-decreasing with k; we need a model selection criterion, so let’s use MDL. Another difficulty: maximizing the likelihood requires the expectation-maximization (EM) algorithm (previous lecture).

  50. MDL for mixtures. Total number of parameters: the number of free mixing probabilities plus k times the number of parameters of each component. Can we just apply the standard order penalty? No! The usual regularity conditions do not hold; intuitively, not all parameters are estimated from all the data points. Mixture MDL (MMDL) criterion: importantly, the penalty depends explicitly on the mixing probabilities, not just on k. [Figueiredo, Leitão, Jain, 1999]
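
A sketch of mixture-order selection by penalized likelihood, using scikit-learn's GaussianMixture (EM) and its BIC score as a stand-in; the MMDL criterion above differs in that its penalty depends on the estimated mixing probabilities, not only on k. The three-component synthetic data are an illustrative assumption.

```python
# Fit Gaussian mixtures with k = 1..8 components via EM and pick k by BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(200, 2)),   # three well-separated clusters
    rng.normal([3.0, 0.0], 0.5, size=(200, 2)),
    rng.normal([0.0, 3.0], 0.5, size=(200, 2)),
])

scores = {}
for k in range(1, 9):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    scores[k] = gmm.bic(X)                        # lower is better

k_hat = min(scores, key=scores.get)
print("selected number of components:", k_hat)    # typically 3 here
```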
