Chapter 10: Statistical Techniques
10.1 Linear Regression Analysis
Equation 10.1 (general linear regression): f(x1, x2, …, xn) = a1x1 + a2x2 + … + anxn + c
10.1 Linear Regression Analysis
• A supervised technique that generalizes a set of numeric data by creating a mathematical equation relating one or more input variables to a single output variable.
• With linear regression we attempt to model the variation in a dependent variable as a linear combination of one or more independent variables.
• Linear regression is appropriate when the relationship between the dependent and the independent variables is nearly linear.
Simple Linear Regression (slope-intercept form)
Equation 10.2: y = ax + c
Simple Linear Regression (least squares criterion)
Equation 10.3: choose a and c to minimize the sum of squared differences Σ (yi − (axi + c))²
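The least squares criterion above can be solved in closed form. The sketch below fits y = ax + c with NumPy; the x/y values are made up for illustration, not from the chapter's dataset.

```python
# Minimal least-squares fit of y = ax + c (illustrative data only).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly y = 2x

# Closed-form least squares: the a and c that minimize
# sum((y_i - (a*x_i + c))**2).
a = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
c = y.mean() - a * x.mean()

print(round(a, 3), round(c, 3))   # slope and intercept
```

Plotting the fitted line over a scatterplot of the data (as the next slide suggests) is the quickest visual check of the fit.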
10.1 Linear Regression Analysis
• How accurate are the results? Use a scatterplot diagram together with the line given by the regression formula.
• Which independent variables are linearly related to the dependent variable? Use the statistics below.
• Coefficient of determination: a value of 1 means no difference between the actual (tabled) and computed values of the dependent variable. (It represents the correlation between actual and computed values.)
• Standard error for the estimate of the dependent variable.
F statistic for the regression analysis
• Used to establish whether the coefficient of determination is significant.
• Look up the F critical value (4.59) from one-tailed F tables in statistics books, using v1 = the number of independent variables (4) and v2 = the number of instances minus the number of variables (11 − 5 = 6).
• The regression equation is able to correctly determine assessed values of office buildings that are part of the training data.
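The F statistic for a regression can be computed from the coefficient of determination and the two degrees of freedom. A minimal sketch, assuming a hypothetical r² of 0.90; v1 and v2 follow the slide's example (4 independent variables, 11 instances, 5 variables in total):

```python
# F statistic for testing whether the coefficient of determination is
# significant: F = (r^2 / v1) / ((1 - r^2) / v2).
r_squared = 0.90          # hypothetical value, for illustration only
v1 = 4                    # number of independent variables
v2 = 11 - 5               # instances minus total number of variables

f_stat = (r_squared / v1) / ((1 - r_squared) / v2)
print(round(f_stat, 2))   # compare against the tabled one-tailed F critical value
```

If the computed F exceeds the tabled critical value for (v1, v2), the coefficient of determination is judged significant.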
Regression Trees
• Essentially a decision tree whose leaf nodes hold numeric values.
• The value at an individual leaf node is the numeric average of the output attribute for all instances passing through the tree to that leaf node position.
• Regression trees are more accurate than linear regression when the data is nonlinear, but they are more difficult to interpret.
• Sometimes regression trees are combined with linear regression to form model trees.
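The leaf-node averaging described above can be sketched with a one-split tree; the data and the split threshold below are made up for illustration.

```python
# Minimal regression-tree sketch: one split on a single numeric input,
# each leaf storing the average output of the training instances it receives.
train = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0),   # (input, output)
         (7.0, 30.0), (8.0, 34.0), (9.0, 32.0)]

threshold = 5.0                                   # split: input < 5 vs >= 5
left  = [y for x, y in train if x < threshold]
right = [y for x, y in train if x >= threshold]

def predict(x):
    """Route the instance down the one-split tree to a leaf average."""
    leaf = left if x < threshold else right
    return sum(leaf) / len(leaf)

print(predict(2.5), predict(7.5))
```

A real regression tree chooses splits automatically (e.g., to minimize the variance of the outputs within each leaf); this sketch fixes the split to show only the leaf-averaging idea.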
Model Trees
• Regression tree + linear regression.
• Each leaf node represents a linear regression equation instead of an average value.
• Model trees simplify regression trees by reducing the number of nodes in the tree.
• A more complex tree means a less linear relationship between the dependent and independent variables.
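Replacing each leaf average with a fitted line gives the model tree described above. The training data and split threshold in this sketch are illustrative only.

```python
# Model-tree sketch: same one-split tree as a regression tree, but each
# leaf holds a fitted line y = a*x + c instead of an average.
def fit_line(points):
    """Least-squares fit of y = a*x + c to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    return a, my - a * mx

train = [(1.0, 2.0), (2.0, 4.1), (3.0, 6.0),      # roughly y = 2x
         (7.0, 1.0), (8.0, 0.4), (9.0, -0.1)]     # a different trend

threshold = 5.0
left_model  = fit_line([p for p in train if p[0] < threshold])
right_model = fit_line([p for p in train if p[0] >= threshold])

def predict(x):
    a, c = left_model if x < threshold else right_model
    return a * x + c

print(round(predict(2.0), 2))
```

Each leaf now captures a local linear trend, which is why model trees usually need fewer nodes than plain regression trees on piecewise-linear data.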
Figure 10.3 A model tree for the deer hunter dataset (output attribute yes)
Logistic Regression
• Using linear regression to model problems whose observed outcome is restricted to two values (e.g., yes/no) is seriously flawed: the value restriction placed on the output variable is not observed by the regression equation, since linear regression produces a straight line unbounded on both ends.
• Therefore the linear equation must be transformed to restrict the output to [0, 1]. The regression equation can then be thought of as producing a probability of occurrence or nonoccurrence of a measured event.
• Logistic regression applies a logarithmic (log-odds) transformation.
Transforming the Linear Regression Model
Logistic regression is a nonlinear regression technique that associates a conditional probability with each data instance. y = 1 denotes observation of one class (yes); y = 0 denotes observation of the other class (no). Thus p(y = 1 | x) is the conditional probability of seeing the class associated with y = 1 (yes), given the values in the feature vector x.
The Logistic Regression Model
Determine the coefficients a and c in ax + c using an iterative method that minimizes the sum of the negative logarithms of the predicted probabilities (equivalently, maximizes the likelihood of the training data). Convergence occurs when this logarithmic summation is close to 0 or when it no longer changes from iteration to iteration.
Equation 10.7: p(y = 1 | x) = e^(ax + c) / (1 + e^(ax + c))
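The iterative fitting described above can be sketched with plain gradient descent on the negative log-likelihood. The one-feature dataset below is made up, and the book's iterative method may differ in detail; this only illustrates the idea.

```python
# Logistic-regression sketch: fit a and c in p(y=1|x) = 1/(1 + e^-(a*x + c))
# by gradient descent on the negative log-likelihood (illustrative data).
import math

data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

a, c = 0.0, 0.0
lr = 0.1                              # learning rate
for _ in range(5000):
    grad_a = grad_c = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(a * x + c)))
        grad_a += (p - y) * x         # d(neg log-likelihood)/da
        grad_c += (p - y)             # d(neg log-likelihood)/dc
    a -= lr * grad_a
    c -= lr * grad_c

p_low  = 1.0 / (1.0 + math.exp(-(a * 1.0 + c)))   # should be near 0
p_high = 1.0 / (1.0 + math.exp(-(a * 6.0 + c)))   # should be near 1
print(round(p_low, 2), round(p_high, 2))
```

Notice that the fitted model pushes the predicted probabilities toward 0 for the y = 0 instances and toward 1 for the y = 1 instances, which is exactly what minimizing the sum of negative log-probabilities rewards.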
Logistic Regression: An Example
Credit card example: the CreditCardPromotionNet file. Life insurance promotion is the output attribute; credit card insurance and sex are the most influential input attributes.
Logistic Regression
Classify a new instance using logistic regression:
• Income = 35K
• Credit card insurance = 1
• Sex = 0
• Age = 39
• P(y = 1 | x) = 0.999
10.3 Bayes Classifier
• A supervised classification technique with a categorical output attribute.
• Assumes all input variables are independent and of equal importance.
• P(H | E): likelihood of H (the dependent variable representing a predicted class) given evidence E.
• P(E | H): conditional probability of evidence E given that H is true (computed from the training data).
• P(H): a priori probability, denoting the probability of H before the presentation of evidence E (computed from the training data).
Equation 10.9: P(H | E) = P(E | H) · P(H) / P(E)
Bayes Classifier: An Example
Credit card promotion data set; sex is the output attribute.
The Instance to be Classified
Magazine Promotion = Yes
Watch Promotion = Yes
Life Insurance Promotion = No
Credit Card Insurance = No
Sex = ?
Two hypotheses: sex = female, sex = male.
Computing the Probability for Sex = Male
Equation 10.10: P(sex = male | E) = P(E | sex = male) · P(sex = male) / P(E)
Conditional Probabilities for Sex = Male P(magazine promotion = yes | sex = male) = 4/6 P(watch promotion = yes | sex = male) = 2/6 P(life insurance promotion = no | sex = male) = 4/6 P(credit card insurance = no | sex = male) = 4/6 P(E | sex =male) = (4/6) (2/6) (4/6) (4/6) = 8/81
The Probability for Sex = Male Given Evidence E
P(sex = male | E) = 0.0593 / P(E)
The Probability for Sex = Female Given Evidence E
P(sex = female | E) = 0.0281 / P(E)
P(sex = male | E) > P(sex = female | E)
The instance is most likely a male credit card customer.
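The male side of the computation above can be reproduced directly from the slides' numbers. The conditional probabilities are as stated; the prior P(sex = male) = 6/10 is an assumption inferred from the /6 denominators in a 10-instance dataset.

```python
# Naive Bayes numerator for sex = male, using the slides' probabilities.
cond_male = [4/6, 2/6, 4/6, 4/6]      # P(each evidence item | sex = male)
prior_male = 6/10                     # assumed a priori probability

p_e_given_male = 1.0
for p in cond_male:
    p_e_given_male *= p               # naive independence assumption

numerator = p_e_given_male * prior_male   # = P(sex = male | E) * P(E)
print(round(numerator, 4))                # matches the slide's 0.0593
```

P(E) cancels when comparing the two hypotheses, so the larger numerator (male, 0.0593 vs. female, 0.0281) decides the classification.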
Zero-Valued Attribute Counts
A problem with the Bayes classifier arises when one of the counts is 0. To solve this, add a small constant to the numerator and denominator: each ratio n/d becomes (n + kp)/(d + k), where p is the equal prior probability of the attribute value (p = 0.5 for an attribute with two possible values) and k is a chosen constant (k = 1 below).
Example:
P(E | sex = male) = (3/4)(2/4)(1/4)(3/4) = 9/128
P(E | sex = male) = (3.5/5)(2.5/5)(1.5/5)(3.5/5) ≈ 0.0735
Equation 10.12
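The (n + kp)/(d + k) adjustment is a one-liner; the sketch below reproduces the smoothed ratios from the example, assuming k = 1 and p = 0.5 for a two-valued attribute.

```python
# Zero-count smoothing: replace each ratio n/d with (n + k*p)/(d + k).
def smoothed(n, d, k=1.0, p=0.5):
    """p = equal prior for the attribute value, k = smoothing constant."""
    return (n + k * p) / (d + k)

raw = [(3, 4), (2, 4), (1, 4), (3, 4)]        # the example's raw counts
probs = [smoothed(n, d) for n, d in raw]
print([round(q, 2) for q in probs])           # 3.5/5, 2.5/5, 1.5/5, 3.5/5
```

Because every smoothed ratio is strictly positive, a single zero count can no longer force the whole product P(E | H) to zero.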
Missing Data
With the Bayes classifier, missing data items are simply ignored.
Missing Data • Example
Numeric Data
Probability density function (attribute values are assumed to be normally distributed):
f(x) = (1 / (sqrt(2π) s)) · e^(−(x − m)² / (2s²))
where
e = the exponential function
m = the class mean for the given numeric attribute
s = the class standard deviation for the attribute
x = the attribute value
Equation 10.13
Numeric Data
• Magazine promotion = yes
• Watch promotion = yes
• Life insurance promotion = no
• Credit card insurance = no
• Age = 45
• Sex = ?
• P(E | sex = male) = … × P(age = 45 | sex = male)
• s = 7.69, m = 37, x = 45
• P(age = 45 | sex = male) ≈ 0.03
• P(sex = male | E) = 0.0018 / P(E)
• P(sex = female | E) = 0.0016 / P(E)
• The instance is classified as male.
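The density value used above follows directly from the normal probability density function, with the class mean 37, standard deviation 7.69, and attribute value 45 from the example.

```python
# Normal probability density used for numeric attributes in the Bayes
# classifier: f(x) = e^(-(x - m)^2 / (2 s^2)) / (sqrt(2*pi) * s).
import math

def normal_density(x, mean, std):
    return (math.exp(-((x - mean) ** 2) / (2 * std ** 2))
            / (math.sqrt(2 * math.pi) * std))

p = normal_density(45, 37, 7.69)   # example's values: m = 37, s = 7.69, x = 45
print(round(p, 2))                 # ~0.03, matching the slide
```

Note that this is a density, not a probability; it is used in place of the conditional probability term in the naive Bayes product.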
Agglomerative Clustering
1. Place each instance into a separate partition.
2. Until all instances are part of a single cluster:
   a. Determine the two most similar clusters.
   b. Merge the clusters chosen into a single cluster.
3. Choose a clustering formed by one of the step 2 iterations as the final result.
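The three steps above can be sketched in a few lines. This sketch assumes single-link Euclidean distance on 1-D points as the similarity measure; the data is made up for illustration.

```python
# Agglomerative clustering: start with one cluster per instance, then
# repeatedly merge the two closest clusters until one cluster remains.
def dist(c1, c2):
    """Single-link distance: closest pair of points across two clusters."""
    return min(abs(a - b) for a in c1 for b in c2)

points = [1.0, 1.5, 5.0, 5.2, 9.0]
clusters = [[p] for p in points]               # step 1: one cluster each
history = [list(map(list, clusters))]

while len(clusters) > 1:                       # step 2: merge until one cluster
    i, j = min(((i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))),
               key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
    history.append(list(map(list, clusters)))

# Step 3 chooses one clustering from history via a heuristic; here we
# just display the 3-cluster level of the hierarchy.
three = next(cs for cs in history if len(cs) == 3)
print(sorted(sorted(c) for c in three))
```

Keeping the whole merge history is what makes step 3 possible: any intermediate level of the hierarchy can be selected as the final clustering.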
Agglomerative Clustering
The final step of the algorithm is to choose the final clustering from among all candidates (this requires heuristics).
One technique: using the similarity measure that created the clusters, compare the average within-cluster similarity with the overall similarity of all instances in the dataset (the domain similarity).
This technique is best used to eliminate clusterings rather than to choose a final result.
Agglomerative Clustering
A second technique: compare each cluster's within-cluster similarity with the within-cluster similarities of pairwise-combined clusters in the cluster set, looking for the highest similarity.
This technique, too, is best used to eliminate clusterings rather than to choose a final result.
Agglomerative Clustering
A third technique: use the previous two techniques to eliminate some of the clusterings, feed each remaining clustering to a rule generator, and choose the clustering with the best defining rules.
A fourth technique: the Bayesian Information Criterion.