
Classification and Validation






Presentation Transcript


  1. Classification and Validation Stefan Bentink 1/21/2010

  2. Problem: We have objects/individuals with known class labels (Class 1, Class 2) and new objects whose labels are unknown. Fit a model (e.g. logistic regression) on the labeled objects and use it to predict the class of the unknown ones.

  3. Evaluation: How many prediction errors can we expect in future predictions? We fit a model of the form Y = βX, where X is the training data, Y the binary classification label, and β the regression coefficients. Can we look at the residuals Y − βX for evaluation? No!

  4. Evaluation: In order to estimate the prediction accuracy on new data, we need to test the model on new data. Split the samples (Class 1 and Class 2) into a training set and a test set: fit the model on the training set, then apply it to the test set and measure the prediction accuracy there.

  5. N-fold cross validation: The samples from Class 1 and Class 2 are split into n folds (1, 2, 3, …, n). In each of n rounds, one fold is held out as the test set and the remaining folds are used for training, so every sample is used for testing exactly once.
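A minimal sketch of the fold assignment in R (object names here are illustrative, not from the slides): each sample gets a random fold label, and each fold serves once as the test set.

    ## Sketch: assigning samples to n folds for cross validation
    ## (illustrative only; names are not from the slides)
    n.folds <- 4
    n.samples <- 40
    set.seed(1)
    ## random fold label for every sample, as balanced as possible
    folds <- sample(rep(1:n.folds, length.out = n.samples))
    for (i in 1:n.folds) {
      test.idx  <- which(folds == i)   ## fold i is the test set
      train.idx <- which(folds != i)   ## all other folds form the training set
      ## fit the model on train.idx, evaluate on test.idx
    }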

  6. Classification in R • Go to the R website: http://www.r-project.org • Click on CRAN • Select a mirror • Click on Packages (left menu bar) • Click on CRAN Task Views • Select the task view on classification
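Alternatively, the ctv package can install all packages of a CRAN task view from within R. A hedged sketch; the view name "MachineLearning" is an assumption (CRAN groups classification methods under its machine-learning view, but check the site for the exact current name):

    ## Sketch: installing a CRAN task view from within R
    ## (view name "MachineLearning" is an assumption; check CRAN for the current name)
    install.packages("ctv")
    library(ctv)
    install.views("MachineLearning")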

  7. Problem 3 – Tutorial 2 (Lecture 4) • Read in the data file birthwt.txt. This file contains a data set on 189 births at a US hospital. The goal is to determine which set of covariates predicts low birth weight. A curated version of the data (birthwt_new.txt) is generated by the script generateBwtNew.r. Binary response: low. Predictors: age, lwt, smoke, ht, ui, ftv, ptd, race

  8. Implement model validation • A function to fit the model • A function to predict new samples given the model • Randomly split the data into a training and a test set

  9. Multiple logistic regression model (remember from lecture 4)
     library(MASS)   ## contains function stepAIC
     bw.new <- read.delim("birthwt_new.txt")
     model <- glm(low ~ ., family = binomial(link = logit), data = bw.new)
     model.opt <- stepAIC(model)   ## stepwise model selection by AIC
     log.odds <- predict(model, newdata = bw.new)   ## linear predictor (log-odds)
     probabilities <- exp(log.odds) / (1 + exp(log.odds))
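The manual inverse-logit transform above can also be delegated to predict(): for a glm fit, type = "response" returns probabilities directly. A short equivalent sketch, using the objects defined above:

    ## Equivalent: let predict() apply the inverse logit itself
    ## (uses model and bw.new from the code above)
    probabilities.2 <- predict(model, newdata = bw.new, type = "response")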

  10. Splitting data into training and test set
      n <- nrow(bw.new)
      k <- 2
      train.test.size <- floor(n/k)
      partition <- rep(1:k, each = train.test.size)
      partition[n] <- k   ## extend to length n if n is not divisible by k
      ## randomly choose training and test set
      set.seed(123)
      s <- sample(1:n)
      training.set.1 <- s[partition == 1]
      test.set.1 <- s[partition == 2]
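A quick sanity check on the split (a sketch using the objects created above): the two index sets should be disjoint and together cover all rows.

    ## Sanity check on the random split (objects from the code above)
    length(training.set.1)                          ## about n/2 rows
    length(test.set.1)                              ## the remaining rows
    length(intersect(training.set.1, test.set.1))   ## 0 -> no overlap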

  11. Train and validate model
      bw.train <- bw.new[training.set.1, ]
      model.train.1 <- my.classify.logit(low ~ ., data = bw.train)
      bw.test <- bw.new[test.set.1, ]
      true.predict.test.1 <- my.predict.logit(model.train.1, data = bw.test)
      class.test.1 <- as.numeric(true.predict.test.1 > 0.5)
      table(class.test.1, bw.test$low)   ## confusion matrix: predicted vs. true class
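From the confusion matrix, the test-set prediction accuracy and error rate follow directly. A small sketch, assuming the objects from the code above and that both classes appear among the predictions:

    ## Prediction accuracy on the test set (objects from the code above)
    conf.mat <- table(class.test.1, bw.test$low)
    accuracy <- sum(diag(conf.mat)) / sum(conf.mat)   ## fraction of correct predictions
    error.rate <- 1 - accuracy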

  12. Function
      ## The general framework
      my.function <- function(x, y) {
        ... do something with x and y ...
        ... assign result to z ...
        return(z)
      }
      ## Example
      my.function <- function(x, y) {
        z <- x + y
        return(z)
      }

  13. Function to fit model
      ## Function to fit a logistic regression model
      ## f: formula (model specification)
      ## data: a data.frame
      my.classify.logit <- function(f, data) {
        require(MASS)
        model <- glm(f, family = binomial(link = logit), data = data)
        model.opt <- stepAIC(model)   ## optimize model by stepwise AIC
        return(model.opt)
      }

  14. Function to predict new samples given a model
      ## Function that predicts class probabilities
      ## model: a model fitted by my.classify.logit
      ## data: a data.frame with new data
      my.predict.logit <- function(model, data) {
        log.odds <- predict(model, newdata = data)              ## linear predictor (log-odds)
        probabilities <- exp(log.odds) / (1 + exp(log.odds))    ## inverse logit
        return(probabilities)
      }
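Putting the two functions together, here is a hedged sketch of an n-fold cross-validation loop for the birth-weight data; the loop structure and the error bookkeeping are illustrative, not taken from the slides.

    ## Sketch: n-fold cross validation with the two functions above
    ## (loop and bookkeeping are illustrative, not from the slides)
    bw.new <- read.delim("birthwt_new.txt")
    n <- nrow(bw.new)
    k <- 5
    set.seed(123)
    folds <- sample(rep(1:k, length.out = n))
    errors <- numeric(k)
    for (i in 1:k) {
      train <- bw.new[folds != i, ]
      test  <- bw.new[folds == i, ]
      fit   <- my.classify.logit(low ~ ., data = train)
      prob  <- my.predict.logit(fit, data = test)
      pred  <- as.numeric(prob > 0.5)
      errors[i] <- mean(pred != test$low)   ## error rate in fold i
    }
    mean(errors)   ## cross-validated error estimate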
