# This chapter uses MS Excel and Weka



##### Presentation Transcript

1. This chapter uses MS Excel and Weka

2. Statistical Techniques Chapter 10

3. 10.1 Linear Regression Analysis Equation 10.1

4. 10.1 Linear Regression Analysis • A supervised technique that generalizes a set of numeric data by creating a mathematical equation relating one or more input variables to a single output variable. • With linear regression, we attempt to model the variation in a dependent variable as a linear combination of one or more independent variables. • Linear regression is appropriate when the relationship between the dependent and independent variables is nearly linear.

5. Simple Linear Regression(slope-intercept form) Equation 10.2

6. Simple Linear Regression(least squares criterion) Equation 10.3
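
The least squares criterion above has a closed-form solution in the simple (slope-intercept) case. A minimal Python sketch, not part of the chapter (which uses Excel and Weka), illustrating the computation behind Equations 10.2 and 10.3:

```python
# Least-squares fit for simple linear regression (y = b*x + a).
# The closed-form solution minimizes the sum of squared differences
# between actual and predicted output values.

def least_squares(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x   # intercept
    return b, a

# Toy data lying exactly on the line y = 2x + 1
slope, intercept = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```

With data that lie exactly on a line, the criterion recovers that line (slope 2, intercept 1); with noisy data it returns the best-fitting line in the squared-error sense.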

7. Multiple Linear Regression with Excel

8. Try to estimate the value of a building

9. A Regression Equation for the District Office Building Data

10. 10.1 Linear Regression Analysis • How accurate are the results? • Use a scatterplot diagram together with the line given by the regression equation. • Which independent variables are linearly related to the dependent variable? Use the statistics. • A coefficient of determination of 1 means no difference between the actual (in the table) and computed values of the dependent variable. (It represents the correlation between actual and computed values.) • Standard error for the estimate of the dependent variable.
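
A quick way to see what the coefficient of determination measures is to compute it directly from actual and computed values. A sketch for illustration only; the chapter reads these statistics from Excel's regression output:

```python
# Coefficient of determination (r^2) for actual vs. computed values
# of the dependent variable. r^2 = 1 means the computed values match
# the actual values exactly.

def r_squared(actual, computed):
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - c) ** 2 for a, c in zip(actual, computed))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

perfect = r_squared([3, 5, 7, 9], [3, 5, 7, 9])            # identical values
imperfect = r_squared([3, 5, 7, 9], [3.5, 4.5, 7.5, 8.5])  # small errors
```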

11. F statistic for the regression analysis • Used to establish whether the coefficient of determination is significant. • Look up the F critical value (459) from one-tailed F tables in statistics books using v1 (number of independent variables, 4) and v2 (number of instances minus number of variables, 11 − 5 = 6). • The regression equation is able to correctly determine assessed values of office buildings that are part of the training data.

12. Figure 10.1 A simple linear regression equation

13. Regression Trees

14. Figure 10.2 A generic model tree

15. Regression Trees • Essentially a decision tree whose leaf nodes hold numeric values. • The value at an individual leaf node is the numeric average of the output attribute for all instances passing through the tree to that leaf node position. • Regression trees are more accurate than linear regression when the data are nonlinear, • but they are more difficult to interpret. • Sometimes regression trees are combined with linear regression to form model trees.

16. Model Trees • Regression tree + linear regression. • Each leaf node represents a linear regression equation instead of an average value. • Model trees simplify regression trees by reducing the number of nodes in the tree. • A more complex tree means a less linear relationship between the dependent and independent variables.

17. 10.2 Logistic Regression

18. Logistic Regression • Using linear regression to model problems whose observed outcome is restricted to two values (e.g. yes/no) is seriously flawed: the value restriction placed on the output variable is not observed by the regression equation, since linear regression produces a straight line unbounded on both ends. • Therefore the linear equation must be transformed to restrict the output to [0, 1]. The regression equation can then be thought of as producing a probability of occurrence or nonoccurrence of a measured event. • Logistic regression applies a logarithmic transform.

19. Transforming the Linear Regression Model Logistic regression is a nonlinear regression technique that associates a conditional probability with each data instance. 1 denotes an observation of one class (yes); 0 denotes an observation of the other class (no). Thus p(y = 1 | x) is the conditional probability of seeing the class associated with y = 1 (yes), given the values in the feature vector x.

20. The Logistic Regression Model Determine the coefficients in (ax + c) using an iterative method that tries to minimize the sum of the logarithms of the predicted probabilities. Convergence occurs when the logarithmic summation is close to 0 or when it no longer changes from iteration to iteration. Equation 10.7
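
A reconstruction of the model, written to match the slide's linear term (ax + c); the chapter's Equation 10.7 may use different symbols:

```latex
% The linear term (ax + c) passed through the logistic (sigmoid)
% transform, which bounds the output to the interval (0, 1):
p(y = 1 \mid x) = \frac{1}{1 + e^{-(ax + c)}}

% Equivalently, the logit (log-odds) is linear in x:
\ln\!\left(\frac{p(y = 1 \mid x)}{1 - p(y = 1 \mid x)}\right) = ax + c
```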

21. Figure 10.4 The logistic regression equation

22. Logistic Regression: An Example Credit card example: the CreditCardPromotionNet file. LifeInsPromo is the output attribute; CreditCardIns and Sex are the most influential attributes.

23. Logistic Regression • Classify a new instance using logistic regression • income=35K • Credit card insurance=1 • Sex=0 • Age=39 • P(y=1|x)=0.999
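
The classification above can be sketched as the sigmoid of a weighted sum. The slide does not show the fitted coefficients, so the values below are invented placeholders; with the model's real coefficients the probability works out to the slide's 0.999:

```python
import math

# Classify a new instance with a logistic regression model.
# NOTE: the coefficients and bias below are hypothetical stand-ins,
# not the values fitted from the CreditCardPromotionNet data.

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

coef = {"income": 0.02, "ccins": 3.0, "sex": -1.5, "age": 0.05}
bias = -2.0

# The instance from the slide: income = 35K, credit card insurance = 1,
# sex = 0, age = 39
x = {"income": 35, "ccins": 1, "sex": 0, "age": 39}
z = bias + sum(coef[k] * x[k] for k in coef)
p = logistic(z)   # P(y = 1 | x)
```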

24. 10.3 Bayes Classifier • A supervised classification technique with a categorical output attribute. • All input variables are assumed independent and of equal importance. • P(H|E): the probability of hypothesis H (the dependent variable representing a predicted class) given evidence E. • P(E|H): the conditional probability of seeing evidence E given that H is true (computed from the training data). • P(H): the a priori probability, denoting the probability of H before the presentation of evidence E (computed from the training data). Equation 10.9

25. Bayes Classifier: An Example Credit card promotion data set; Sex is the output attribute

26. The Instance to be Classified Magazine Promotion = Yes Watch Promotion = Yes Life Insurance Promotion = No Credit Card Insurance = No Sex = ? Two hypotheses: sex = female, sex = male

27. Computing The Probability For Sex = Male Equation 10.10

28. Conditional Probabilities for Sex = Male P(magazine promotion = yes | sex = male) = 4/6 P(watch promotion = yes | sex = male) = 2/6 P(life insurance promotion = no | sex = male) = 4/6 P(credit card insurance = no | sex = male) = 4/6 P(E | sex =male) = (4/6) (2/6) (4/6) (4/6) = 8/81

29. The Probability for Sex=Male Given Evidence E P(sex = male | E) ≈ 0.0593 / P(E)

30. The Probability for Sex=Female Given Evidence E P(sex = female | E) ≈ 0.0281 / P(E) P(sex = male | E) > P(sex = female | E) The instance is most likely a male credit card customer
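
The arithmetic of the last three slides can be reproduced directly. One assumption: the class prior P(sex = male) = 6/10 (a ten-instance training set with six males), which is consistent with the 0.0593 numerator on the slide:

```python
# Naive Bayes numerator for the hypothesis sex = male:
# P(sex = male | E) is proportional to P(E | sex = male) * P(sex = male).

cond_male = [4/6, 2/6, 4/6, 4/6]   # the four conditional probabilities
prior_male = 6/10                   # assumed prior (not shown on the slide)

p_e_given_male = 1.0
for c in cond_male:
    p_e_given_male *= c             # product of conditionals = 8/81

# Divide by P(E) to get the actual posterior; for classification the
# numerators alone are enough to compare hypotheses.
numerator_male = p_e_given_male * prior_male
```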

31. Zero-Valued Attribute Counts A problem with Bayes arises when one of the counts is 0. To solve this, a small constant k is added to each count: n/d becomes (n + k)/(d + k·v), where v is the number of possible attribute values; k is 0.5 for an attribute with two possible values. Example: P(E | sex = male) = (3/4)(2/4)(1/4)(3/4) = 9/128 P(E | sex = male) = (3.5/5)(2.5/5)(1.5/5)(3.5/5) ≈ 0.0735 Equation 10.12
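
A sketch of the zero-count fix; the parameterization below (constant k added per count, k·v added to the denominator) is a reading of the slide that reproduces its smoothed fractions such as 3/4 → 3.5/5:

```python
# Smoothed count ratio: n/d becomes (n + k)/(d + k*v), where v is the
# number of possible values for the attribute. This keeps a single
# zero count from forcing the whole product to 0.

def smoothed(n, d, k=0.5, v=2):
    return (n + k) / (d + k * v)

raw = (3/4) * (2/4) * (1/4) * (3/4)     # = 9/128, vulnerable to zero counts
smooth = 1.0
for n, d in [(3, 4), (2, 4), (1, 4), (3, 4)]:
    smooth *= smoothed(n, d)            # (3.5/5)(2.5/5)(1.5/5)(3.5/5)
```

Note that the product of the smoothed terms evaluates to about 0.0735, close to the unsmoothed 9/128 ≈ 0.0703.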

32. Missing Data With the Bayes classifier, missing data items are simply ignored.

33. Missing Data • Example

34. Numeric Data

35. Numeric Data Probability density function (attribute values are assumed to be normally distributed), where e = the base of the natural logarithm, m = the class mean for the given numerical attribute, s = the class standard deviation for the attribute, x = the attribute value. Equation 10.13
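
Written out in the slide's own symbols, the density described here is the standard normal probability density function (a reconstruction of Equation 10.13):

```latex
% Normal density for attribute value x, with class mean m and
% class standard deviation s:
f(x) = \frac{1}{\sqrt{2\pi}\, s}\; e^{-\frac{(x - m)^2}{2 s^2}}
```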

36. Numeric Data • Magazine Promotion = Yes • Watch Promotion = Yes • Life Insurance Promotion = No • Credit Card Insurance = No • Age = 45 • Sex = ? • … • P(E | sex = male) = … × P(age = 45 | sex = male) • σ = 7.69, μ = 37, x = 45 • P(age = 45 | sex = male) = 1/(…) ≈ 0.03 • P(sex = male | E) = 0.0018/P(E) • P(sex = female | E) = 0.0016/P(E) • The instance most likely belongs to the male class
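
The density value on this slide follows from the normal probability density function with the class statistics given (mean 37, standard deviation 7.69, attribute value 45):

```python
import math

# Normal probability density for a numeric attribute value x,
# given the class mean and class standard deviation.

def normal_pdf(x, mean, sd):
    return (1.0 / (math.sqrt(2 * math.pi) * sd)) * \
        math.exp(-((x - mean) ** 2) / (2 * sd ** 2))

# P(age = 45 | sex = male) with mean 37 and sd 7.69, as on the slide
density = normal_pdf(45, 37, 7.69)   # ≈ 0.03
```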

37. 10.4 Clustering Algorithms

38. Agglomerative Clustering 1. Place each instance into a separate partition. 2. Until all instances are part of a single cluster: a. Determine the two most similar clusters. b. Merge the clusters chosen into a single cluster. 3. Choose a clustering formed by one of the step 2 iterations as a final result.
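
The three steps above can be sketched as follows. The slides leave the similarity measure unspecified, so Euclidean distance between cluster centroids stands in for it here, and this sketch stops at a requested cluster count instead of keeping every intermediate clustering:

```python
# Minimal agglomerative clustering sketch: start with singleton
# clusters, repeatedly merge the two closest (most similar) clusters.

def centroid(cluster):
    n = len(cluster)
    return [sum(p[i] for p in cluster) / n for i in range(len(cluster[0]))]

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(points, n_clusters):
    clusters = [[p] for p in points]        # step 1: one instance per cluster
    while len(clusters) > n_clusters:       # step 2: merge until done
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)        # 2a: two most similar clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # 2b: merge them
        del clusters[j]
    return clusters

# Two well-separated pairs of points collapse into two clusters
clusters = agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)], 2)
```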

39. Agglomerative Clustering: An Example

40. Agglomerative Clustering The final step of the algorithm is to choose the final clustering among all candidates (this requires heuristics). Use the similarity measure employed for creating the clusters: compare the average within-cluster similarity with the overall similarity of all instances in the dataset (domain similarity). This technique is best used to eliminate clusterings rather than to choose a final result.

41. Agglomerative Clustering The final step of the algorithm is to choose the final clustering among all candidates (this requires heuristics). Use the within-cluster similarity measure together with the within-cluster similarities of pairwise-combined clusters in the cluster set, and look for the highest similarity. This technique is best used to eliminate clusterings rather than to choose a final result.

42. Agglomerative Clustering The final step of the algorithm is to choose the final clustering among all candidates (this requires heuristics). Use the previous two techniques to eliminate some of the clusterings, feed each remaining clustering to a rule generator, and choose the clustering with the best defining rules. A fourth technique uses the Bayesian Information Criterion.