
Introduction to Probability Theory in Machine Learning: A Bird View






Presentation Transcript


  1. Introduction to Probability Theory in Machine Learning: A Bird View Mohammed Nasser, Professor, Dept. of Statistics, RU, Bangladesh. Email: mnasser.ru@gmail.com

  2. Content of Our Present Lecture • Introduction • Problem of Induction and Role of Probability • Techniques of Machine Learning • Density Estimation • Data Reduction • Classification and Regression Problems • Probability in Classification and Regression • Introduction to Kernel Methods

  3. Introduction The problem of searching for patterns in data is the basic problem of science. • The extensive astronomical observations of Tycho Brahe (1546–1601) in the 16th century allowed Johannes Kepler (1571–1630) to discover the empirical laws of planetary motion, which in turn provided a springboard for the development of classical mechanics.

  4. Introduction • Darwin's (1809–1882) study of nature during the five-year voyage of HMS Beagle revolutionized biology. • The discovery of regularities in atomic spectra played a key role in the development and verification of quantum physics in the early twentieth century. Of late, the field of pattern recognition has become concerned with the automatic discovery of regularities in data through the use of computer algorithms, and with the use of these regularities to take actions such as classifying the data into different categories.

  5. Problem of Induction • The inductive inference process: • Observe a phenomenon • Construct a model of the phenomenon • Make predictions → This is more or less the definition of the natural sciences! → The goal of Machine Learning is to automate this process. → The goal of Learning Theory is to formalize it.

  6. Problem of Induction Let us suppose somehow we have measurements x1, x2, …, xn, where n is very large, e.g. n = 10^10000000. Each of x1, x2, …, xn satisfies a proposition P. Can we say that the (n+1)th, i.e. (10^10000000 + 1)th, observation satisfies P? Certainly… no.

  7. Problem of Induction Let us consider P(n) = 10^10000000 − n. The question: Is P(n) > 0? It is positive up to a very, very large number, but after that it becomes negative. What can we do now? Probabilistic framework to the rescue!
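The pitfall on this slide can be checked numerically. A minimal Python sketch, assuming the proposition has the form P(n) = N − n for some huge constant N (here N is kept small for tractability; the slide's constant is astronomically larger):

```python
# Induction pitfall: P(n) = N - n holds for a million consecutive
# cases and then fails at the very next one.
N = 10**6

def P(n):
    return N - n

# P(n) > 0 holds for every n we have observed so far...
assert all(P(n) > 0 for n in range(1, N))
# ...yet the next case falsifies it.
assert P(N) == 0 and P(N + 1) < 0
```

No finite run of confirmations, however long, settles the general claim; this is exactly the gap the probabilistic framework is brought in to handle.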

  8. Problem of Induction • What is the probability p that the sun will rise tomorrow? • p is undefined, because there has never been an experiment that tested the existence of the sun tomorrow. • p = 1, because the sun rose in all past experiments. • p = 1 − ε, where ε is the proportion of stars that explode per day. • p = (d+1)/(d+2), which is Laplace's rule derived from Bayes' rule (d = number of past days the sun rose). Conclusion: We predict that the sun will rise tomorrow with high probability, independent of the justification.

  9. The Sub-Fields of ML • Supervised Learning: Classification, Regression • Unsupervised Learning: Clustering, Density estimation, Data reduction • Reinforcement Learning

  10. Unsupervised Learning: Density Estimation What is the weight of the elephant? What is the weight/distance of the sun? What is the weight of a DNA molecule? What is the weight/size of a baby in the womb?

  11. Solution of the Classical Problem Let us suppose somehow we have measurements x1, x2, …, xn. The million-dollar question: How can we choose the optimum one among the infinitely many possible ways to combine these n observations to estimate the target, μ? What is the optimum n?
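The simplest way to combine the n observations is the sample mean, whose error shrinks like σ/√n. A minimal Python sketch with made-up values for the target μ and the noise level σ:

```python
import random
import statistics

random.seed(0)
mu, sigma = 50.0, 5.0  # hypothetical true value and measurement noise

def estimate(n):
    """Combine n noisy measurements with the simplest rule: the sample mean."""
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.fmean(xs)

# The estimate tightens as n grows (standard error sigma / sqrt(n)).
large = estimate(10000)
assert abs(large - mu) < 1.0
```

Whether the mean is in fact the optimum combination depends on the model, which is exactly the question the next slides take up.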

  12. We need the concepts: probability measures, probability distributions, … (μ: the target that we want to estimate; xi: the ith observation)

  13. Meaning of Measure On any sample space, a measure assigns a nonnegative number μ(A) to each event A and is countably additive: μ(∪An) = Σ μ(An) whenever the An are disjoint. Probability measures (total mass 1) split into: Discrete — P(A) = 1 for some A with #(A) finite or countable; Continuous — P{x} = 0 for all x, subdivided into absolutely continuous and non-A.C.

  14. Discrete Distributions On R^k we have special concepts.

  15. Continuous Distributions

  16. Different Shapes of the Models Is the sample mean appropriate for all the models?

  17. I know the population means - - - I know Pr(a<X<b) for every a and b.

  18. Approaches of Model Estimation • Bayesian vs. Non-Bayesian • Parametric, Nonparametric, or Semiparametric • CDF estimation vs. density estimation

  19. Infinite-dimensional Ignorance Generally any function space is infinite-dimensional Parametric modeling assumes our ignorance is finite-dimensional Semi-parametric modeling assumes our ignorance has two parts: one finite-dimensional and the other, infinite-dimensional Non-parametric modeling assumes our ignorance is infinite-dimensional

  20. Parametric Density Estimation
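A minimal Python sketch of parametric density estimation, assuming a Gaussian family and fitting its two parameters by maximum likelihood (the data here are simulated, not from the lecture):

```python
import math
import random

random.seed(1)
data = [random.gauss(2.0, 3.0) for _ in range(5000)]

# Parametric modeling: our "ignorance" is just the two numbers (mu, sigma).
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

def density(x):
    """Fitted Gaussian density evaluated at x."""
    z = (x - mu_hat) / sigma_hat
    return math.exp(-0.5 * z * z) / (sigma_hat * math.sqrt(2 * math.pi))

# The MLE recovers the generating parameters closely on a large sample.
assert abs(mu_hat - 2.0) < 0.2
assert abs(sigma_hat - 3.0) < 0.2
```

The payoff of the parametric assumption is that two estimated numbers pin down the whole density; the cost is bias when the family is wrong.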

  21. Nonparametric Density Estimation
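By contrast, a nonparametric estimate assumes no finite-dimensional family. A minimal kernel density estimator in Python with a Gaussian kernel and a hand-picked bandwidth (simulated data; the helper name `kde` is ours):

```python
import math
import random

random.seed(2)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

def kde(x, data, h=0.3):
    """Kernel density estimate: average of Gaussian bumps centered at the data."""
    n = len(data)
    return sum(
        math.exp(-0.5 * ((x - xi) / h) ** 2) / (h * math.sqrt(2 * math.pi))
        for xi in data
    ) / n

# Near the mode the estimate is close to the true N(0,1) density value.
true_at_0 = 1 / math.sqrt(2 * math.pi)  # ~0.3989
assert abs(kde(0.0, data) - true_at_0) < 0.08
```

No distributional family is assumed; the price is the bandwidth choice and slower convergence, especially in high dimensions (the curse of dimension, coming up shortly).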

  22. Semiparametric/Robust Density Estimation (figure: parametric model components combined with a nonparametric model)

  23. Application of Density Estimation Picture of Three Objects

  24. Distribution of three objects

  25. Curse of Dimension Courtesy: Bishop (2006)

  26. Curse of Dimension Courtesy: Bishop (2006)

  27. Unsupervised Learning: Data Reduction If the population model is multivariate normal (MVN) with high correlation, it works well.
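The MVN-with-high-correlation case can be illustrated with principal component analysis done by hand for two dimensions. A Python sketch on simulated, strongly correlated data (all numbers made up):

```python
import math
import random

random.seed(3)
# Strongly correlated 2-D data: y is approximately x plus small noise.
xs = [random.gauss(0, 1) for _ in range(1000)]
pts = [(x, x + random.gauss(0, 0.1)) for x in xs]

# Sample covariance matrix entries.
mx = sum(p[0] for p in pts) / len(pts)
my = sum(p[1] for p in pts) / len(pts)
sxx = sum((p[0] - mx) ** 2 for p in pts) / len(pts)
syy = sum((p[1] - my) ** 2 for p in pts) / len(pts)
sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / len(pts)

# Leading eigenvalue of [[sxx, sxy], [sxy, syy]] via the 2x2 closed form.
tr, det = sxx + syy, sxx * syy - sxy * sxy
lam = tr / 2 + math.sqrt(tr * tr / 4 - det)

# Fraction of variance kept by one component -- near 1 for high correlation.
explained = lam / tr
assert explained > 0.95
```

With high correlation one linear component captures almost all the variance, which is why linear data reduction works well in the MVN case; the next slide shows a curved manifold where it does not.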

  28. Unsupervised Learning: Data Reduction (figure: data lying on a one-dimensional manifold)

  29. Problem-2 Fisher's Iris Data (1936): This data set gives the measurements in cm (or mm) of the variables Sepal length, Sepal width, Petal length, Petal width, and Species (setosa, versicolor, and virginica). There are 150 observations, 50 from each species. We want to predict the class of a new observation. What is the available method to do the job? LOOK! DEPENDENT VARIABLE IS CATEGORICAL*** INDEPENDENT VARIABLES ARE CONTINUOUS***
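One of the simplest methods for this job is the nearest-centroid rule. A Python sketch on synthetic iris-like data (two of the three species, with made-up class centers; not Fisher's actual measurements):

```python
import math
import random

random.seed(4)

# Synthetic stand-in for the iris problem: 4-D measurements per flower.
def sample(center, n=50):
    return [[random.gauss(c, 0.3) for c in center] for _ in range(n)]

train = {"setosa": sample([5.0, 3.4, 1.5, 0.2]),
         "virginica": sample([6.6, 3.0, 5.5, 2.0])}

# One centroid (mean vector) per class.
centroids = {
    label: [sum(col) / len(col) for col in zip(*rows)]
    for label, rows in train.items()
}

def predict(x):
    """Nearest-centroid rule: assign x to the class with the closest mean."""
    return min(centroids, key=lambda lbl: math.dist(x, centroids[lbl]))

assert predict([5.1, 3.5, 1.4, 0.2]) == "setosa"
assert predict([6.5, 3.0, 5.8, 2.2]) == "virginica"
```

This is essentially linear discriminant analysis with equal spherical covariances; the classifiers summarized at the end of the lecture (logistic regression, SVM, trees) relax those assumptions.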

  30. Problem-3 BDHS (2004): The dependent variable is childbearing risk with two values (High Risk and Low Risk). The target is to predict the childbearing risk based on some socio-economic and demographic variables. The complete list of the variables is given in the next slide. Again we are in the situation where the dependent variable is categorical and the independent variables are mixed.

  31. Problem-4 Face Authentication (/ Identification) • Face Authentication/Verification (1:1 matching) • Face Identification/Recognition (1:N matching)

  32. Applications  Access Control www.viisage.com www.visionics.com

  33. Applications  Video Surveillance (On-line or off-line) Face Scan at Airports www.facesnap.de

  34. Why is Face Recognition Hard? Inter-class similarity (e.g. twins, father and son) vs. intra-class variability.

  35. Handwritten digit recognition We want to recognize the postal codes automatically.

  36. Problem 6: Credit Risk Analysis • Typical customer: bank. • Database: • Current clients data, including: • basic profile (income, house ownership, delinquent account, etc.) • Basic classification. • Goal: predict/decide whether to grant credit.

  37. Problem 7: Spam Email Detection, Search Engine etc traction.tractionsoftware.com www.robmillard.com

  38. Problem 9: Genome-wide data • mRNA expression data • hydrophobicity data • protein-protein interaction data • sequence data (gene, protein)

  39. Problem 10: Robot control • Goal: Control a robot in an unknown environment. • Needs both: • to explore (new places and actions) • to use acquired knowledge to gain benefits. • The learning task “controls” what is observed!

  40. Problem-11 Wisconsin Breast Cancer Database (1992): This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg. Or you can get it from (http://www.potschi.de/svmtut/breast-cancer-wisconsin.data). The variables (each scored 1–10) are: Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, and Mitoses, plus Status (benign or malignant). There are 699 observations available. Now we want to predict whether the status of a patient is benign or malignant. DEPENDENT VARIABLE IS CATEGORICAL. Independent variables???

  41. Problem 12: Data Description Després et al. pointed out that the topography of adipose tissue (AT) is considered a risk factor for cardiovascular disease. Cardiovascular diseases affect the heart and blood vessels and include shock, heart failure, heart valve disease, congenital heart disease, etc. It is important to measure the amount of intra-abdominal AT as part of the evaluation of the cardiovascular-disease risk of an individual. Adipose Tissue

  42. Data Description Problem: Computed tomography of AT is • very costly • requires irradiation of the subject • not available to many physicians. • Materials: Simple anthropometric measurements such as waist circumference, which can be obtained cheaply and easily. • Variables: • Y = amount of deep abdominal AT • X = waist circumference (in cm) • Total observations: 109 (men) • Data source: W. W. Daniel (2003). How well can we predict and estimate the deep abdominal AT from knowledge of the waist circumference? Waist Circumference
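Predicting AT from waist circumference is a simple regression problem, solvable by ordinary least squares. A Python sketch on simulated data standing in for the 109 men (the slope 3.5, intercept −216, and noise level below are made up for illustration, not Daniel's actual estimates):

```python
import random

random.seed(5)

# Synthetic stand-in: waist circumference (cm) and deep abdominal AT.
waist = [random.uniform(70, 120) for _ in range(109)]
at = [3.5 * w - 216 + random.gauss(0, 20) for w in waist]

# Ordinary least squares for simple regression: slope = Sxy / Sxx.
n = len(waist)
wbar, abar = sum(waist) / n, sum(at) / n
sxx = sum((w - wbar) ** 2 for w in waist)
sxy = sum((w - wbar) * (a - abar) for w, a in zip(waist, at))
slope = sxy / sxx
intercept = abar - slope * wbar

# OLS recovers the generating slope to within sampling error.
assert abs(slope - 3.5) < 0.5

def predict_at(w):
    """Predicted deep abdominal AT for waist circumference w (cm)."""
    return intercept + slope * w
```

"How well" the prediction works is then a question about the residual noise around this line, i.e. about the conditional distribution of Y given X.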

  43. Complex Problem 13 Hypothesis: The infant's size at birth is associated with maternal characteristics and SES. Variables: X Maternal & SES 1. Age (x1) 2. Parity (x2) 3. Gestational age (x3) 4. Mid-Upper Arm Circumference (MUAC) (x4) 5. Supplementation group (x5) 6. SES index (x6) CCA, KCCA, MR, PLS etc. give us some solutions to this complex problem.

  44. Data • Vectors: collections of features, e.g. height, weight, blood pressure, age, …; categorical variables can be mapped into vectors • Matrices: images, movies; remote sensing and satellite data (multispectral) • Strings: documents; gene sequences • Structured objects: XML documents; graphs

  45. Let Us Summarize!! Classification (reminder) Y = g(X), g: X → Y • X can be anything: • continuous (R, R^d, …) • discrete ({0,1}, {1,…,k}, …) • structured (tree, string, …) • … • Y is discrete: • {0,1} binary • {1,…,k} multi-class • tree, etc. structured

  46. Classification (reminder) Methods: Perceptron • Logistic Regression • Support Vector Machine • Decision Tree • Random Forest • Kernel trick. X can be anything: continuous (R, R^d, …), discrete ({0,1}, {1,…,k}, …), structured (tree, string, …), …

  47. Regression Y = g(X), g: X → Y • X can be anything: • continuous (R, R^d, …) • discrete ({0,1}, {1,…,k}, …) • structured (tree, string, …) • … • Y is continuous: R, R^d (not always)

  48. Regression Methods: Perceptron • Normal Regression • Support Vector Regression • GLM • Kernel trick. X can be anything: continuous (R, R^d, …), discrete ({0,1}, {1,…,k}, …), structured (tree, string, …), …
