
# Probabilistic Reasoning for Modeling Unreliable Data

Ron Tal, York University


### Presentation Transcript

1. Probabilistic Reasoning for Modeling Unreliable Data • Ron Tal, York University

2. Agenda • Modeling Uncertainty • Bayesian Reasoning • M-Estimation • Maximum Likelihood • Common Pitfall • More Advanced Models

3. Modeling Uncertainty • Why is it necessary? • The only certainty in this world is uncertainty • Often we cannot afford to, or are unable to, enumerate all variables exactly • Sometimes uncertainty stems from the limited reliability of the technology • We must make decisions with unreliable data

4. Modeling Uncertainty (cont.) • Three competing paradigms: • Non-monotonic logic • Fuzzy logic • Probability theory • Since we cannot construct a deterministic solution to many problems, we model sources of uncertainty as probability distributions

5. Bayesian Reasoning • At the core of probabilistic frameworks is Bayesian inference • Let’s define a few concepts: • $P(E \mid H)$: the probability of witnessing evidence $E$ given a hypothesis $H$ • $P(H \mid E)$: the probability of hypothesis $H$ given the evidence $E$ • $P(H)$: the probability of $H$ prior to observing $E$ • $P(E)$: the probability of observing the evidence $E$

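6. Bayesian Reasoning: Bayes’ theorem • States that:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

• This makes our life simpler: $P(H \mid E)$ is what we want to maximize; $P(E \mid H)$ is expressed in terms of our model; $P(H)$ we usually know; $P(E)$ we don’t always care about, since it does not depend on $H$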
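A minimal sketch of Bayes’ theorem over a small, discrete set of hypotheses. The helper name `posterior` and all the sensor probabilities are hypothetical, chosen only to illustrate how the likelihood, the prior, and the normalizing $P(E)$ combine.

```python
import numpy as np

def posterior(likelihood, prior):
    """Bayes' theorem over a discrete set of hypotheses H_i.

    likelihood[i] = P(E | H_i) and prior[i] = P(H_i); P(E) is obtained by
    summing P(E | H_i) P(H_i) over all hypotheses (it normalizes the result).
    """
    likelihood = np.asarray(likelihood, dtype=float)
    prior = np.asarray(prior, dtype=float)
    unnormalized = likelihood * prior   # numerator P(E | H) P(H)
    evidence = unnormalized.sum()       # denominator P(E)
    return unnormalized / evidence

# Hypothetical numbers: H_0 = "no obstacle", H_1 = "obstacle", E = "sensor fires".
# The sensor fires 90% of the time when there is an obstacle, 10% when there is not,
# and obstacles are a priori rare (20%).
post = posterior(likelihood=[0.1, 0.9], prior=[0.8, 0.2])
best = int(np.argmax(post))   # index of the maximum a posteriori hypothesis
```

Even with a fairly reliable sensor, the posterior probability of an obstacle is only about 0.69 here, because the prior is low.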

7. Bayesian Reasoning: Bayes’ theorem • If we prefer, it can also be written in terms of the joint probability $P(E, H) = P(E \mid H)\,P(H)$:

$$P(H \mid E) = \frac{P(E, H)}{P(E)}$$
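The joint form can be sketched with a hypothetical joint table whose entries are $P(E \mid H)\,P(H)$ for made-up sensor numbers (likelihoods 0.1 and 0.9, priors 0.8 and 0.2). Conditioning on an observed $E$ means taking one column and dividing by its sum, which is $P(E)$. The helper `posterior_from_joint` is an illustrative name, not an established API.

```python
import numpy as np

# Hypothetical joint distribution P(E, H).
# Rows: hypotheses H_0 ("no obstacle"), H_1 ("obstacle").
# Columns: evidence E = "sensor fires", E = "sensor silent".
joint = np.array([
    [0.08, 0.72],
    [0.18, 0.02],
])

def posterior_from_joint(joint, e):
    """P(H | E=e) = P(E=e, H) / P(E=e), where P(E=e) sums the column over H."""
    column = joint[:, e]
    return column / column.sum()

post_fires = posterior_from_joint(joint, e=0)
post_silent = posterior_from_joint(joint, e=1)
```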

8. M-Estimation • Bayesian inference gives us a powerful tool for choosing the hypothesis that best models the data • A simple example is the set of parameters of a line of best fit through noisy data • Statistical tools that achieve this are called M-estimators • The most popular choice is a special case called the Maximum Likelihood Estimator (MLE)

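9. Maximum Likelihood • Recall Bayes’ theorem:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

• The denominator $P(E)$ is merely a normalization constant, since it does not depend on $H$ • Maximum likelihood can be applied if we assume the model prior $P(H)$ is known, typically taken to be constant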

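10. Maximum Likelihood (cont.) • When the model prior is constant:

$$\arg\max_H P(H \mid E) = \arg\max_H \frac{P(E \mid H)\,P(H)}{P(E)} = \arg\max_H P(E \mid H)$$

• Thus, we can fit model parameters by maximizing the likelihood $P(E \mid H)$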
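A small grid search makes the point concrete: with a constant prior, the maximum a posteriori hypothesis coincides with the maximum likelihood one, while an informative prior moves it. The coin-flip data, the grid, and the Gaussian-shaped "fair coin" prior are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical data: 7 heads in 10 coin flips. Each hypothesis H is a bias theta.
heads, flips = 7, 10
theta = np.linspace(0.01, 0.99, 99)                          # grid of hypotheses
likelihood = theta**heads * (1 - theta) ** (flips - heads)   # P(E | H)

def map_estimate(prior):
    """Grid point maximizing the posterior P(H | E) for a given prior P(H)."""
    post = likelihood * prior
    post = post / post.sum()            # dividing by P(E) does not move the argmax
    return theta[np.argmax(post)]

theta_ml = theta[np.argmax(likelihood)]
theta_map_uniform = map_estimate(np.ones_like(theta))                      # constant prior
theta_map_fair = map_estimate(np.exp(-0.5 * ((theta - 0.5) / 0.05) ** 2))  # prior favoring a fair coin
```

Both `theta_ml` and `theta_map_uniform` land on 0.7, the observed fraction of heads; the prior concentrated near 0.5 pulls `theta_map_fair` toward a fair coin.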

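11. Maximum Likelihood (cont.) • To determine the parameters of a model from independent observations $E = \{e_1, \dots, e_N\}$, we minimize the negative log likelihood:

$$\hat{H} = \arg\min_H \left( -\sum_{i=1}^{N} \log P(e_i \mid H) \right)$$

• Because the logarithm is monotonic, this gives the same answer as maximizing the likelihood, and it lets us work with sums instead of products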
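One practical reason to avoid products: multiplying many densities underflows in floating point, while the sum of negative logs stays finite. The sketch below assumes synthetic standard-normal observations and a known $\sigma = 1$; `gaussian_pdf` and `gaussian_nll` are illustrative helpers. It also minimizes the NLL over a grid of candidate means.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=2000)   # synthetic observations e_i

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def gaussian_nll(x, mu, sigma):
    """-sum_i log p(x_i | mu, sigma), computed directly in log space."""
    return np.sum(0.5 * ((x - mu) / sigma) ** 2 + np.log(np.sqrt(2 * np.pi) * sigma))

# The raw likelihood is a product of 2000 numbers below 0.4: it underflows to 0.0.
likelihood = np.prod(gaussian_pdf(x, 0.0, 1.0))

# Minimizing the NLL over a grid of candidate means works fine.
mus = np.linspace(-1.0, 1.0, 2001)
nlls = np.array([gaussian_nll(x, m, 1.0) for m in mus])
mu_hat = mus[np.argmin(nlls)]   # lands next to the sample mean
```

For a Gaussian with known variance, the NLL minimizer is the sample mean, so `mu_hat` agrees with `x.mean()` up to the grid spacing.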

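12. Maximum Likelihood (cont.) • For a Gaussian distribution this is especially convenient. If $r_i$ is the residual of observation $e_i$ under hypothesis $H$ and the noise has standard deviation $\sigma$:

$$-\sum_{i=1}^{N} \log P(e_i \mid H) = -\sum_{i=1}^{N} \log\left( \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{r_i^2}{2\sigma^2} \right) \right)$$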

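13. Maximum Likelihood • Becoming, with $r_i$ the residual of $e_i$ under $H$ and $\sigma$ the noise standard deviation:

$$-\sum_{i=1}^{N} \log P(e_i \mid H) = \underbrace{\frac{1}{2\sigma^2} \sum_{i=1}^{N} r_i^2}_{\text{Least Squares}} + \underbrace{N \log\left(\sqrt{2\pi}\,\sigma\right)}_{\text{Constant}}$$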
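A numerical check of this decomposition, assuming a line model $y = a x + b$ with residuals $r_i = y_i - (a x_i + b)$ and a known noise level. The synthetic data and helper names are hypothetical. The Gaussian NLL written straight from the pdf matches the least-squares-plus-constant form, and the ordinary least-squares line minimizes it.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5                                   # assumed known noise standard deviation
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, size=x.size)   # synthetic line data

def residuals(a, b):
    return y - (a * x + b)                    # r_i under hypothesis H = (a, b)

def gaussian_nll(a, b):
    """-sum_i log N(r_i; 0, sigma^2), written out from the Gaussian pdf.

    (Evaluating the pdf first can underflow for very large residuals;
    the split form below does not have that problem.)
    """
    r = residuals(a, b)
    pdf = np.exp(-r**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return -np.sum(np.log(pdf))

def least_squares_plus_constant(a, b):
    """The same quantity split into (1 / 2 sigma^2) * sum r_i^2 + N log(sqrt(2 pi) sigma)."""
    r = residuals(a, b)
    return np.sum(r**2) / (2 * sigma**2) + x.size * np.log(np.sqrt(2 * np.pi) * sigma)

# Ordinary least-squares fit of the line; by the derivation it also minimizes the NLL.
A = np.column_stack([x, np.ones_like(x)])
a_hat, b_hat = np.linalg.lstsq(A, y, rcond=None)[0]
```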

14. Common Pitfall • We love Gaussian distributions • We love least squares • However, applying least squares without first reasoning probabilistically about whether the noise really is Gaussian is a common rookie mistake

15. Common Pitfall: Illustration
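The original illustration is not recoverable, but the pitfall is easy to reproduce. This sketch fits a line by least squares to hypothetical synthetic data, once with Gaussian noise only and once after corrupting four of the forty points with gross outliers.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
y_clean = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=x.size)   # Gaussian noise only
y_dirty = y_clean.copy()
y_dirty[-4:] -= 30.0                                           # four gross outliers

A = np.column_stack([x, np.ones_like(x)])
slope_clean, intercept_clean = np.linalg.lstsq(A, y_clean, rcond=None)[0]
slope_dirty, intercept_dirty = np.linalg.lstsq(A, y_dirty, rcond=None)[0]
```

On the clean data the slope comes out close to the true value of 2; the four outliers alone drag it well below 1, because squared residuals let a few distant points dominate the fit.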

16. Better Modeling • Many statistical tools are available for when the Gaussian assumption fails • Alternative assumptions include: • Good data is Gaussian, but outliers are present • The pdf can be represented as a mixture of causes • No parametric model is well suited to the job

17. Robust Statistics • Robust M-estimators assume that the data is locally Gaussian, but that outliers make traditional least squares unsuitable • Essentially, least squares gives ‘bad’ data more credibility than it deserves • The robust formulation weights the data with a robust influence function

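18. Robust Statistics (cont.) • E.g., Tukey’s biweight, whose influence function for a residual $r$ with tuning constant $c$ is:

$$\psi(r) = \begin{cases} r\left(1 - \left(\frac{r}{c}\right)^2\right)^2 & |r| \le c \\ 0 & |r| > c \end{cases}$$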
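One common way to compute a Tukey biweight M-estimate is iteratively reweighted least squares (IRLS), using the weight $w(u) = \psi(u)/u$ on scaled residuals. This is a sketch under stated assumptions, not necessarily the procedure used in the talk: the residual scale comes from the median absolute deviation (MAD / 0.6745), $c = 4.685$ is the commonly quoted tuning constant for roughly 95% efficiency under Gaussian noise, and `tukey_weights` and `robust_line_fit` are hypothetical helper names.

```python
import numpy as np

def tukey_weights(u, c=4.685):
    """IRLS weights w(u) = psi(u) / u = (1 - (u/c)^2)^2 for |u| <= c, else 0."""
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) > c] = 0.0
    return w

def robust_line_fit(x, y, c=4.685, n_iter=50):
    """Fit y ~ a*x + b with Tukey's biweight via iteratively reweighted least squares."""
    A = np.column_stack([x, np.ones_like(x)])
    params = np.linalg.lstsq(A, y, rcond=None)[0]      # start from the least-squares fit
    for _ in range(n_iter):
        r = y - A @ params
        scale = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD estimate of sigma
        if scale == 0.0:
            break                                       # exact fit; nothing left to reweight
        sw = np.sqrt(tukey_weights(r / scale, c))
        params = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]   # weighted LS step
    return params

# Hypothetical data: a line with Gaussian noise plus four gross outliers.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=x.size)
y[-4:] -= 30.0
slope, intercept = robust_line_fit(x, y)
```

Points whose scaled residual exceeds $c$ get zero weight, so once the scale estimate shrinks the outliers stop influencing the fit and the slope returns close to 2. Because the biweight is redescending, a poor starting fit can in principle lead IRLS to a different local solution.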

19. Mixture Models • Data can be represented as caused by one of several possible causes • Essentially, the pdf is a weighted sum of distributions • The Gaussian Mixture Model (GMM) is extremely powerful • Expectation-Maximization (EM) clustering is the standard estimator for fitting it
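A minimal one-dimensional sketch of fitting a GMM, $p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \sigma_k^2)$ with $\sum_k \pi_k = 1$, by EM. The E-step computes each component’s responsibility for each point; the M-step re-estimates weights, means, and variances from those soft assignments. The two-cause synthetic data, the quantile-based initialization, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, k=2, n_iter=200):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread initial means over the data
    variances = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibility of component j for point i, shape (n, k)
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood updates given the soft assignments
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances

# Hypothetical data drawn from two causes: 30% near -2, 70% near 3.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
weights, means, variances = em_gmm_1d(x)
```

In practice one would also stop on a log-likelihood convergence test and guard against components collapsing onto a single point.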

20. Non-parametric • Actual observed data is used in place of a fitted model, usually a histogram • To find the best fit between a histogram of newly observed data $p$ and the model histogram $q$, we can minimize the Bhattacharyya distance:

$$D_B(p, q) = -\ln \sum_{x} \sqrt{p(x)\, q(x)}$$
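A sketch of the Bhattacharyya distance between two histograms defined over the same bins, used to pick the closest of several candidate model histograms. The counts and the candidate names `model_a` and `model_b` are hypothetical.

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln sum_x sqrt(p(x) q(x)) for two histograms over the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()                        # counts -> probabilities
    q = q / q.sum()
    coefficient = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient, between 0 and 1
    return -np.log(coefficient)

# Hypothetical counts over five shared bins.
observed = [2, 8, 15, 8, 2]
candidates = {
    "model_a": [1, 9, 14, 9, 1],
    "model_b": [10, 8, 5, 6, 6],
}
best = min(candidates, key=lambda name: bhattacharyya_distance(observed, candidates[name]))
```

Identical histograms give a distance of 0; histograms with no overlapping bins give a coefficient of 0 and therefore an infinite distance.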

21. Non-parametric (cont.) • Very simple to use • Sometimes the most accurate • Very inefficient for high-dimensional problems
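One way to see the inefficiency: a regular histogram with $b$ bins per dimension has $b^d$ cells in $d$ dimensions. The choice of 10 bins per dimension below is an arbitrary assumption. At 10 dimensions that is $10^{10}$ cells, so even a million samples could occupy at most a $10^{-4}$ fraction of them.

```python
def histogram_cells(bins_per_dim, dims):
    """Cells in a regular histogram grid with the same number of bins per dimension."""
    return bins_per_dim ** dims

# With 10 bins per dimension, the grid grows as 10, 100, 1000, ...
for d in (1, 2, 3, 6, 10):
    print(f"{d:2d} dims -> {histogram_cells(10, d):,} cells")
```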

22. Thank You 
