Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods

Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods. Julie Sungsoon Hwang, Department of Geography, University of Washington; Jean-Claude Thill, Department of Geography, State University of New York at Buffalo. November 10, 2005.

Presentation Transcript


  1. Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods. Julie Sungsoon Hwang, Department of Geography, University of Washington; Jean-Claude Thill, Department of Geography, State University of New York at Buffalo. November 10, 2005. North American Meetings of the Regional Science Association International.

  2. Outline • Research objectives • Methodology: specification • Methodology: illustration • Evaluating the performance of fuzzy clustering • Conclusions

  3. Research objectives • Demonstrate the use of the fuzzy c-means (FCM) algorithm for delineating housing submarkets • Compare it with K-means • Discuss empirical characteristics of FCM in this application, in particular the choice of parameters • Cluster validity index

  4. Challenges • Are the boundaries of clusters crisp? [Figure: two scatter plots on axes X1 and X2, each showing Cluster A, Cluster B, and Cluster C; left panel: housing market in metropolitan area p; right panel: housing market in metropolitan area q]

  5. Methodology: specification

  6. Our task is to group census tracts into homogeneous housing submarkets within a metropolitan area • Using the fuzzy c-means algorithm • In order to examine whether fuzzy set-based clustering can do a better job • Implemented in 85 metropolitan areas • Most of the data sets are public (e.g. 2000 Census) • The whole procedure is automated in GIS

  7. Methodology: flow chart [Flow chart, for each metropolitan area: candidate variables (national, regional, metro, local); stepwise regression; significant variables; cluster analysis (K-means, fuzzy c-means); hard cluster layer and fuzzy cluster layers 1, 2, …, c built on the census tract layer] Legend: Uj: membership to cluster j; k: # selected variables (k ≤ m); c: # submarkets (c ≤ n)

  8. Explanatory variables for house price [Table: explanatory variables] Dependent variable: median home value of owner-occupied housing units. Sources: * National Center for Education Statistics; ** FBI annual report “Crime in the U.S. 2003”; *** CTPP: Census Transportation Planning Package

  9. Study set: 85 metropolitan areas

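  10. What is fuzzy c-means (FCM)? • Clustering method that minimizes the following objective function: $J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^m \lVert x_k - v_i \rVert^2$, where $x_k$ is the vector of data point k, 1 ≤ k ≤ n; $v_i$ is the center of cluster i, 1 ≤ i ≤ c; $u_{ik}$ is the membership degree of data point k with cluster i, in [0, 1]; and m is the fuzziness amount associated with assigning data point k to cluster i, 1 ≤ m < ∞ • Updates cluster means $v_i$ and membership degrees $u_{ik}$ until the algorithm converges: (III-3a) $v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m}$; (III-3b) $u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)} \right]^{-1}$ • Source: Bezdek 1981 [Figure: illustrative scatter plot on axes x1 and x2]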
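The alternating updates named on this slide (cluster means, then membership degrees, repeated until convergence) can be sketched in a few lines of standard-library Python. This is a minimal illustration of the textbook fuzzy c-means updates attributed to Bezdek, not the authors' GIS implementation. The function name `fcm`, its parameters, the random initialization, the max-norm stopping rule, and the six sample points are all assumptions made for the example.

```python
import math
import random

def fcm(points, c, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Return (centers, U); U[i][k] is the membership of point k in cluster i."""
    if m <= 1.0:
        raise ValueError("this sketch needs m > 1 (the exponent is 2 / (m - 1))")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    p = 2.0 / (m - 1.0)
    # Random initial fuzzy partition: the memberships of each point sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= total
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / sum(w)
                            for d in range(dim)])
        # Membership update: u_ik is proportional to dist(x_k, v_i) ** (-p).
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            dist = [math.dist(points[k], v) for v in centers]
            d_min = min(dist)
            if d_min == 0.0:
                # Point sits on a center: share full membership among those centers.
                w = [1.0 if d == 0.0 else 0.0 for d in dist]
            else:
                # Scaling by d_min keeps every ratio <= 1 and avoids overflow.
                w = [(d_min / d) ** p for d in dist]
            total = sum(w)
            for i in range(c):
                new_U[i][k] = w[i] / total
        # Largest absolute change in any membership (a max-norm on U).
        change = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if change <= tol:
            break
    return centers, U

# Six made-up 2-D points forming two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, U = fcm(data, c=2, m=2.0)
labels = [max(range(2), key=lambda i: U[i][k]) for k in range(len(data))]
```

Each column of `U` sums to 1, so a tract can belong partly to several submarkets; `labels` only picks the strongest membership for inspection.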

  11. FCM: missing elements • Optimal number of clusters c* • Optimal fuzziness amount m* [Diagram: m and c shown with FCM]

  12. Extended fuzzy c-means algorithm • Step 1: Initialize the parameters related to fuzzy partitioning: c = 2 (2 ≤ c ≤ cmax), m = 1 (1 ≤ m ≤ mmax), where c is an integer and m is a real number; fix minc, the increment of m (0 < minc ≤ 0.1); fix the cut-off threshold L; choose a validity index v • Step 2: Given c and m, initialize U(0) so that it is a fuzzy matrix. Then at step l, l = 0, 1, 2, …: • Step 3: Calculate the c fuzzy cluster centers {vi(l)} with (III-3a) and U(l) • Step 4: Update U(l+1) using (III-3b) and {vi(l)} • Step 5: Compare U(l) to U(l+1) in a convenient matrix norm; if || U(l+1) – U(l) || ≤ L, go to Step 6; otherwise return to Step 3 • Step 6: Compute the validity index for the given c and m • Step 7: If c < cmax, then set c ← c + 1 and go to Step 3; otherwise go to Step 8 • Step 8: If m < mmax, then set m ← m + minc and go to Step 3; otherwise go to Step 9 • Step 9: Obtain the optimal validity index, the optimal number of clusters c*, and the optimal fuzziness exponent m*; the optimal fuzzy partition U is obtained given c* and m*
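The nine steps above amount to a grid search: run FCM for every (c, m) pair, score each result with a validity index, and keep the best pair. The sketch below is one way to write that loop in standard-library Python, under several assumptions that are not in the slides: the names `fcm`, `xie_beni`, and `extended_fcm` are hypothetical, the Xie-Beni index is used in its usual form (smaller is better), U is re-initialized for every (c, m) pair, and m starts at 1.1 rather than 1 because the standard membership update divides by (m - 1).

```python
import math
import random

def fcm(points, c, m, tol=1e-4, max_iter=200, seed=0):
    """Plain fuzzy c-means (Steps 2-5); returns (centers, U)."""
    rng = random.Random(seed)
    n, dim, p = len(points), len(points[0]), 2.0 / (m - 1.0)
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):  # Step 2: make each point's memberships sum to 1
        total = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= total
    centers = []
    for _ in range(max_iter):
        centers = []
        for i in range(c):  # Step 3: weighted cluster centers
            w = [U[i][k] ** m for k in range(n)]
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / sum(w)
                            for d in range(dim)])
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):  # Step 4: membership update
            dist = [math.dist(points[k], v) for v in centers]
            d_min = min(dist)
            if d_min == 0.0:
                w = [1.0 if d == 0.0 else 0.0 for d in dist]
            else:
                w = [(d_min / d) ** p for d in dist]  # ratios <= 1, no overflow
            for i in range(c):
                new_U[i][k] = w[i] / sum(w)
        change = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if change <= tol:  # Step 5: stop when U barely changes
            break
    return centers, U

def xie_beni(points, centers, U):
    """Xie-Beni index: compactness over separation (smaller is better)."""
    n = len(points)
    compact = sum(U[i][k] ** 2 * math.dist(points[k], v) ** 2
                  for i, v in enumerate(centers) for k in range(n))
    sep = min(math.dist(a, b) ** 2
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return math.inf if sep == 0.0 else compact / (n * sep)

def extended_fcm(points, c_max, m_max, m_inc=0.1, m_start=1.1):
    """Grid search over c and m (Steps 6-9); returns (index, c*, m*, U)."""
    best = None
    n_steps = int(round((m_max - m_start) / m_inc))
    for s in range(n_steps + 1):              # Step 8: sweep m
        m = round(m_start + s * m_inc, 10)
        for c in range(2, c_max + 1):         # Step 7: sweep c
            centers, U = fcm(points, c, m)
            score = xie_beni(points, centers, U)  # Step 6
            if best is None or score < best[0]:
                best = (score, c, m, U)       # Step 9: keep the best so far
    return best

# Six made-up 2-D points forming two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
score, c_star, m_star, U_star = extended_fcm(data, c_max=3, m_max=1.5)
```

Stepping m by an integer counter instead of repeated addition avoids drift from floating-point rounding when the increment is 0.1.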

  13. Cluster validity indices • Partition coefficient • Partition entropy • SVi index, where w is set to 2 in this study • Xie-Beni index
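The formulas for these indices did not survive in the transcript. As a hedged illustration, the sketch below uses the commonly cited definitions of three of them: the partition coefficient (mean squared membership), the partition entropy (mean membership entropy), and the Xie-Beni index (compactness divided by separation, with squared memberships). Whether the study used exactly these forms is an assumption, and the SVi index is left out because its definition cannot be recovered from the slide. The function names and the tiny two-point example are made up.

```python
import math

def partition_coefficient(U):
    """Mean squared membership: 1 for a crisp partition, 1/c when fully fuzzy."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n

def partition_entropy(U):
    """Mean membership entropy: 0 for a crisp partition, ln(c) when fully fuzzy."""
    n = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / n

def xie_beni(points, centers, U):
    """Compactness over separation; smaller values mean tighter, better-separated clusters."""
    n = len(points)
    compact = sum(U[i][k] ** 2 * math.dist(points[k], v) ** 2
                  for i, v in enumerate(centers) for k in range(n))
    sep = min(math.dist(a, b) ** 2
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return math.inf if sep == 0.0 else compact / (n * sep)

# Two made-up points, each sitting exactly on one of two cluster centers.
points = [(0.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
crisp = [[1.0, 0.0], [0.0, 1.0]]   # U[i][k]: membership of point k in cluster i
fuzzy = [[0.5, 0.5], [0.5, 0.5]]

pc_crisp = partition_coefficient(crisp)     # 1.0
pc_fuzzy = partition_coefficient(fuzzy)     # 0.5
pe_fuzzy = partition_entropy(fuzzy)         # ln(2), about 0.693
xb_crisp = xie_beni(points, centers, crisp) # 0.0
xb_fuzzy = xie_beni(points, centers, fuzzy) # 0.25
```

The partition coefficient and partition entropy look only at U, while the Xie-Beni index also uses the geometry of the data, which is one reason it can discriminate between values of c that the first two cannot.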

  14. Determining c* and m* • Selected validity indices are calibrated over the study set • Xie-Beni index is recommended as a validity index • Average m* is 1.38

  15. Histogram of m* for FCM

  16. Methodology: illustration

  17. Median home value of Buffalo, NY

  18. Dimensionality of Buffalo housing market • Hedonic regression equation of median home value in Buffalo, NY • Adjusted R² = 84.3%

  19. Optimal number of housing submarkets c* and optimal fuzziness amount m*, Buffalo, NY • Values in the cells represent the Xie-Beni index given c and m

  20. Buffalo housing submarkets (c* = 3; m* = 1.3) [Maps: Membership to Cluster 1; Membership to Cluster 2; Membership to Cluster 3; Defuzzified Clusters]
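The slide does not say how the defuzzified map was produced. A common rule, assumed here, is to assign each tract to the cluster in which its membership is highest. The sketch below applies that rule to a made-up 3-cluster membership matrix; `defuzzify` is a hypothetical helper name, and the numbers are not the Buffalo results.

```python
def defuzzify(U):
    """Return (labels, strengths): the top cluster per point and its membership."""
    n_clusters, n_points = len(U), len(U[0])
    labels, strengths = [], []
    for k in range(n_points):
        # Index of the cluster with the largest membership for point k.
        best = max(range(n_clusters), key=lambda i: U[i][k])
        labels.append(best)
        strengths.append(U[best][k])
    return labels, strengths

# Made-up memberships: rows are clusters, columns are four census tracts.
U = [
    [0.80, 0.10, 0.30, 0.05],
    [0.15, 0.70, 0.30, 0.05],
    [0.05, 0.20, 0.40, 0.90],
]
labels, strengths = defuzzify(U)
# labels    -> [0, 1, 2, 2]
# strengths -> [0.8, 0.7, 0.4, 0.9]
```

Keeping the winning membership alongside the label shows which tracts are borderline: the third tract is assigned to cluster 2 with a membership of only 0.4.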

  21. Evaluating the performance of fuzzy clustering

  22. Compare FCM with K-means (KM) • Compare the sum of squared errors derived from KM (m = 1) and FCM (m = m*) given c* • Fuzzy clustering outperforms crisp clustering
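The slide does not state how the sum of squared errors is computed for FCM. One plausible reading, assumed here, is the membership-weighted sum used as the FCM objective, which with m = 1 and 0/1 memberships is exactly the K-means sum of squared errors. The sketch below computes that quantity; `weighted_sse`, the four points, the centers, and both membership matrices are made up for illustration and are not the study's data.

```python
import math

def weighted_sse(points, centers, U, m):
    """Sum over clusters i and points k of U[i][k] ** m * squared distance to center i."""
    return sum(U[i][k] ** m * math.dist(points[k], v) ** 2
               for i, v in enumerate(centers)
               for k in range(len(points)))

# Four made-up points on a line and two assumed cluster centers.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
centers = [(1.0, 0.0), (11.0, 0.0)]

# Crisp (K-means style) memberships with m = 1 give the ordinary SSE.
crisp_U = [[1.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
km_sse = weighted_sse(points, centers, crisp_U, m=1.0)   # 4.0

# Fuzzy memberships with an assumed exponent of 1.3.
fuzzy_U = [[0.9, 0.9, 0.1, 0.1],
           [0.1, 0.1, 0.9, 0.9]]
fcm_sse = weighted_sse(points, centers, fuzzy_U, m=1.3)
```

In a real comparison the centers and memberships would come from running K-means and FCM on the same tracts with the same c*.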

  23. Conclusions • Fuzzy set theory provides a mechanism for handling the uncertainty involved in classification tasks • The fuzzy c-means algorithm is of practical use in delineating housing submarkets • Fuzzy set theory needs further attention in social science fields • More work on the choice of parameters is needed
