AVMs and CAMA The robots are taking over
What is an AVM?
• Automated valuation model
• Uses a statistical model and a large amount of property data to estimate the market value of an individual property or portfolio of properties (RMBS – remember those?!)
• A confidence level is also usually produced to indicate how accurate the valuation is
• Usually used for lending purposes

What is CAMA?
• Computer-assisted mass appraisal
• Uses a statistical model and a large amount of property data to estimate the market values of large numbers of properties
• Usually used for tax purposes
Property Analytics
• Data
  • Land Registry, Registers of Scotland
  • Surveyor reports
  • BCIS (for reinstatement valuations)
  • Royal Mail, Ordnance Survey, credit referencing companies
• Range of price estimation techniques
  • Surveyor emulation (comps search engine)
  • Multi-variate linear and non-linear regression
  • Repeat sales regression analysis
AVM accuracy
• The most commonly used benchmark in measuring the performance of the AVM is surveyor valuations, e.g. a property can be valued using both the Rightmove AVM and a surveyor:
  • 3 Badger Lane, Durham, DH1 3LN
  • Type: House
  • Style: Detached
  • Bedrooms: 3
  • Surveyor valuation: £340,000
  • AVM valuation: £324,500
• The difference between these two valuations, expressed as a proportion of the surveyor valuation, is the "error":
$$\text{error} = \frac{\text{AVM} - \text{Surveyor valuation}}{\text{Surveyor valuation}} = \frac{-\pounds 15{,}500}{\pounds 340{,}000} = -4.6\%$$
• If this measure is replicated across many properties, a spread of "errors" can be plotted...
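The error measure on this slide is simple enough to check by hand, but a short sketch makes the sign convention explicit (AVM minus surveyor, divided by the surveyor figure). The function name `avm_error` is made up for illustration; the two valuations are the ones quoted for 3 Badger Lane.

```python
def avm_error(avm_value, surveyor_value):
    """Return (error in pounds, error as a fraction of the surveyor valuation)."""
    diff = avm_value - surveyor_value
    return diff, diff / surveyor_value


# Figures from the slide: AVM 324,500 against a surveyor valuation of 340,000
diff, pct = avm_error(324_500, 340_000)
print(f"{diff:+,.0f} GBP, {pct:+.1%}")
```

A negative result means the AVM came in below the surveyor. Running the same function over a batch of properties gives the spread of "errors" that the slide goes on to plot.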
Batch valuations from Rightmove analysed by Standard & Poor's, Moody's, Fitch and DBRS
Statistical model • Use multiple regression analysis (MRA) to infer a mathematical (e.g. linear) relationship between several property attributes and the price that a dwelling might trade for • Property attributes: size, type, age, location, etc. • Mathematical relationship encapsulated in an equation which can be used to estimate price in cases where the attributes are known but the price isn’t
Different from conventional valuation • Relies on large data set – big problem in UK • Provides a valuation and an estimate of variance • Quick • Cheap • Difficult to defend • Difficult to sue an AVM
Building the model – Start with simple linear regression model
Response variable: PRICE (ave = £302,000, sd = £65,000)
Predictor variable: Monthly rent (ave = £251, sd = £63)
Frequency distribution is slightly positively skewed
Simple linear regression model
Ordinary least squares (OLS)...
$$y = b_0 + b_1 x + u$$
where:
y = estimate of the average sale price corresponding to a given value of x
x = actual value of the monthly rent
b0 = estimate of the intercept of the regression line
b1 = estimate of the gradient of the regression line
u = random component (residual error term)
Valuers are being replaced by GCSE maths!
Simple linear regression model
• Using the least squares principle (which minimises the sum of the squared differences between actual and predicted values of y) the regression line can be derived by solving for coefficients b1 and b0 using the variance of x and the covariance of x and y.
• The expression from which b1 can be calculated is
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
• For b0 the expression is
$$b_0 = \bar{y} - b_1 \bar{x}$$
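As a minimal sketch of the least squares calculation described above, the following computes b1 from the sums of cross-products and squared deviations, then b0 from the two means. The function name `ols_simple` and the six rent/price pairs are hypothetical and are not the data set behind the slides.

```python
def ols_simple(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products (numerator) and sum of squared deviations of x (denominator)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical sample: monthly rent in pounds, sale price in thousands of pounds
rent = [180, 210, 240, 260, 300, 330]
price = [240, 262, 290, 310, 345, 372]
b0, b1 = ols_simple(rent, price)
print(f"price = {b0:.2f} + {b1:.4f} * rent")
```

Dividing both sums by n - 1 would turn them into the covariance and variance mentioned in the slide; the ratio, and therefore b1, is unchanged.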
(un-standardised) coefficients
b1 = 215747/233509 = 0.9239
b0 = 302 – 0.9239 * 251 = 70.10
Both are significantly different from 0 at the 0.01% level
y = 70.10 + 0.9239x (with price y in £000s and monthly rent x in £)
So that’s £70,100 plus about £924 for every £1 of monthly rent...
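The arithmetic on this slide can be reproduced directly. The sketch below assumes, as the means suggest, that price is measured in thousands of pounds and rent in pounds; `predict_price_thousands` is a hypothetical helper name. Because it keeps b1 unrounded, the intercept comes out at about 70.09 rather than the 70.10 obtained from the rounded slope.

```python
# Sums quoted on the slide: cross-products of (x, y) over squared deviations of x
b1 = 215747 / 233509
# Means quoted on the slide: price 302 (assumed to be in thousands), monthly rent 251
b0 = 302 - b1 * 251


def predict_price_thousands(monthly_rent):
    """Estimated sale price in thousands of pounds for a given monthly rent."""
    return b0 + b1 * monthly_rent


print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")
print(f"Estimate at the mean rent: {predict_price_thousands(251):.1f} thousand")
```

The last line is a useful sanity check: a least squares line always passes through the point of means, so the estimate at the mean rent returns the mean price.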
Interpretation of the model
The mean value of the dependent variable y is a straight line on a scatter-plot as it would be the same for all values of an independent variable x
[Figure: scatter-plot of the response variable y against the predictor variable x, showing the mean sale price line and the regression line y = b0 + b1x + ui, annotated with the total variation of observed y from mean y, the regression model variation of predicted y from mean y, and the residual variation of predicted y from observed y]
Total variation (SST) of each value of y about the mean value of y is calculated by taking the sum of the squared differences between observed values of y and the mean value of y
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
where $y_i$ = sale price of property i, $\bar{y}$ = average sale price, and i = 1, … , n (where n is the number of sales)
Each point on the regression line (which slopes) varies from the mean value of y. This regression model variation (SSM) can be calculated as the sum of the squared differences between the mean value of y and the regression line.
$$SSM = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
where $\hat{y}_i$ = modelled sale price of property i
Finally, residual variation (SSR) (variation unexplained by the regression model) can be calculated as the sum of the squared differences between observed values of y and the regression line.
$$SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
We would expect the total variation to comprise variation explained by the regression model plus residual variation, i.e. SST = SSM + SSR
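The identity SST = SSM + SSR can be checked numerically. The sketch below fits a least squares line to a small hypothetical sample (rents in pounds, prices in thousands, not the slide data) and computes the three sums of squares; the function names are illustrative. The identity relies on the line being a least squares fit with an intercept.

```python
def fit_line(x, y):
    """Least squares intercept and gradient for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b1 * mean_x, b1


def variation_decomposition(x, y):
    """Return (SST, SSM, SSR) for a least squares line fitted to x and y."""
    b0, b1 = fit_line(x, y)
    mean_y = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)               # observed vs mean
    ssm = sum((yh - mean_y) ** 2 for yh in y_hat)           # modelled vs mean
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # observed vs modelled
    return sst, ssm, ssr


# Hypothetical sample
rent = [180, 210, 240, 260, 300, 330]
price = [245, 258, 294, 306, 349, 368]
sst, ssm, ssr = variation_decomposition(rent, price)
print(f"SST = {sst:.1f}, SSM = {ssm:.1f}, SSR = {ssr:.1f}, SSM + SSR = {ssm + ssr:.1f}")
```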
Model performance
As a measure of the size of the relationship between the two variables we can calculate the proportion of the total variation in the dependent variable (SST) which is explained by the model (SSM), i.e. explained variation divided by total variation. This is known as the coefficient of determination, R²:
$$R^2 = \frac{SSM}{SST}$$
R² ranges from 0 to 1, and the smaller the residual variation as a percentage of total variation, the larger the R².
The F-ratio is the model mean squares (SSM divided by the number of predictors, which is simply SSM in a one-predictor model) divided by the residual mean squares (SSR divided by its degrees of freedom). It is a measure of how much the model has improved the prediction of the outcome compared to the level of inaccuracy in the model. A good model will have a high F-ratio.
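Both statistics follow directly from the sums of squares. The sketch below uses the standard degrees of freedom (k for the model, n - k - 1 for the residuals, where k is the number of predictors); the function name and the input figures are hypothetical.

```python
def r_squared_and_f(sst, ssm, ssr, n, k=1):
    """Return (R squared, F-ratio) from sums of squares, n observations, k predictors."""
    r_squared = ssm / sst
    model_mean_square = ssm / k
    residual_mean_square = ssr / (n - k - 1)
    return r_squared, model_mean_square / residual_mean_square


# Hypothetical sums of squares for a one-predictor model fitted to 12 sales
r2, f_ratio = r_squared_and_f(sst=100.0, ssm=80.0, ssr=20.0, n=12)
print(f"R squared = {r2:.2f}, F = {f_ratio:.1f}")
```

With one predictor the model mean square equals SSM, so F reduces to SSM divided by the residual mean squares, which is how the slide states it.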
Model parameters
• Un-standardised coefficients are in the source units for the variable
• If x significantly predicts y it should have a b significantly different from zero. This is tested using a t-test, which divides the coefficient by its standard error:
$$t = \frac{b}{SE_b}$$
• For samples of 60 or more observations (plus one additional observation for each parameter to be estimated) a predictor variable with |t| >= 2.00 indicates 95% confidence that b does not equal 0 and therefore x is significant in predicting y (if |t| > 2.58 then 99% confident)
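A sketch of the t-test for the gradient in a one-predictor model follows. It uses the usual textbook standard error, the square root of the residual mean square divided by the sum of squared deviations of x. The 62 observations are generated deterministically for illustration only (60 plus one for each of the two estimated parameters) and the function name is hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, standard error of b1, t statistic) for a one-predictor OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    residual_mean_square = ssr / (n - 2)  # two parameters estimated: b0 and b1
    se_b1 = math.sqrt(residual_mean_square / sxx)
    return b1, se_b1, b1 / se_b1


# Made-up data: rent in pounds, price in thousands with a small repeating disturbance
rent = [150 + 3 * i for i in range(62)]
price = [70 + 0.9 * r + ((7 * i) % 5 - 2) for i, r in enumerate(rent)]
b1, se_b1, t_stat = slope_t_statistic(rent, price)
print(f"b1 = {b1:.4f}, SE = {se_b1:.4f}, t = {t_stat:.1f}")
print("significant at 95%:", abs(t_stat) >= 2.00, "| at 99%:", abs(t_stat) > 2.58)
```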
Model residuals Residuals (difference between observed and predicted outcomes) should be normally distributed about the predicted responses with a mean of zero. A normal P-P plot of standardised residuals is a check on normality: plotted points should follow a straight line.
When the model fit is appropriate, a scatter-plot of standardised residuals against predicted responses should be random and centred on the line of zero standardised residual
• Standardised residuals with absolute z-scores greater than 3 are outliers and therefore concerning
• If more than 1% of standardised residuals have an absolute z-score greater than 2.5, the error in the model is unacceptable
• If more than 5% of standardised residuals have an absolute z-score greater than 2, this is also evidence that the model poorly represents the data

So rent is a pretty good predictor of price. This is unsurprising as investors (buy-to-let) pay prices that bear a relationship (expressed as a yield or multiple) to the rent.
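The three rules of thumb for standardised residuals can be applied mechanically. The sketch below standardises residuals by dividing by their sample standard deviation, which is a simplification (statistics packages typically use the residual standard error). The function name and the residuals are hypothetical.

```python
import statistics


def residual_diagnostics(residuals):
    """Apply the three rules of thumb to a list of raw residuals."""
    sd = statistics.stdev(residuals)
    z = [e / sd for e in residuals]  # OLS residuals already have a mean of zero
    n = len(z)
    share_over_2 = sum(abs(v) > 2 for v in z) / n
    share_over_2_5 = sum(abs(v) > 2.5 for v in z) / n
    return {
        "outliers": sum(abs(v) > 3 for v in z),        # count beyond +/-3
        "unacceptable_error": share_over_2_5 > 0.01,   # more than 1% beyond +/-2.5
        "poor_fit": share_over_2 > 0.05,               # more than 5% beyond +/-2
    }


# Hypothetical residuals: 98 perfect predictions and two large misses
checks = residual_diagnostics([0.0] * 98 + [5.0, -5.0])
print(checks)
```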