
Examining Activity Patterns Using Fuzzy Clustering

by D De Silva, University of Calgary, and JD Hunt, University of Calgary. PROCESSUS Second International Colloquium, Toronto ON, Canada, June 2005.


Presentation Transcript


  1. Examining Activity Patterns Using Fuzzy Clustering • by D De Silva, University of Calgary, and JD Hunt, University of Calgary • PROCESSUS Second International Colloquium, Toronto ON, Canada, June 2005

  2. Overview • Introduction • Data • Method • Preliminary Results • Conclusions

  3. Introduction • Context • Use of activity-based transport models is increasing • Need for grouping into segments • At present, grouping seems largely based on received wisdom • Motivations • Opportunity in Calgary • Large Household Activity Diary Survey • Interest in activity-based model development • Willingness to explore the issue of grouping • Increase understanding of activity patterns resulting from behavioral processes

  4. Introduction • Previous work • Fair amount of work drawing in essence on three basic elements • Data interpretation • Similarity or Dissimilarity Measures • Pattern Recognition Algorithms

  5. Introduction • Previous work (Contd.) • Data Interpretation • Some used time slices in 5 to 15 minute intervals (Recker et al; Wilson) • Others disagreed and used the number of stops made (Pas) • Similarity or Dissimilarity Measures • Similarity Matrix (Pas; Wilson; Ma) • Sequential Alignment Method (Wilson; Jun Ma) • Walsh-Hadamard transformation, a Fourier-type analysis (Recker et al) • Pattern Recognition Algorithms • All have used crisp clustering methods

  6. Introduction • Previous work (Contd.) • Groups with similar activities • Pas: 12 groups based on the number of non-home stops • Recker: 7 groups based on socio-economic data • Wilson: 8 groups similar to Recker • Applications • To model inter-shopping duration (Bhat) • Micro-simulation of activity patterns (Kitamura et al; Kulkarni et al) • Extension: the work described here • Time Slices • Sequential Alignment Method • Fuzzy Clustering

  7. Data • Household Activity Survey (HAS) • 24-hour diary • Fall of 2001 • Sample size • 8,400 households overall • 5,900 on weekdays • 15-minute intervals • activity • location • Activities in 19 categories • Locations • X,Y • Home, Work, Travel, Other • All household members

  8. Activities Covered in HAS • Travel (A) • Pick Up Someone (B) • Drop Off Someone (C) • Work (D) • School / Homework (E) • Shopping (F) • Daycare (G) • Social (H) • Eating (J) • Entertainment / Leisure (K) • Medical / Financial (L) • Exercise (M) • Religious / Civic (N) • Sleeping (O) • Household Chores (P) • Park / Un-park Vehicle (X) • Work-Travel, e.g. Taxi Driver (Y) • Out-of-Town (Z)

  9. Example Sequence • Activity Sequence of • 30 min Sleep • 15 min Eat • 30 min Travel • 1 hr Work • O O J A A D D D D
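The encoding on this slide can be sketched as follows. This is a minimal illustration, assuming each diary episode is given as an activity letter plus a duration in minutes; the helper name `encode_sequence` is hypothetical and not part of the survey tooling.

```python
# Hypothetical helper: expand (activity code, minutes) pairs into one
# character per 15-minute time slice, using the HAS letter codes.
SLICE_MINUTES = 15


def encode_sequence(episodes):
    """Return a string with one activity letter per 15-minute slice."""
    letters = []
    for code, minutes in episodes:
        if minutes % SLICE_MINUTES != 0:
            raise ValueError("durations must be multiples of 15 minutes")
        # A 30-minute episode contributes two slices, and so on.
        letters.extend(code * (minutes // SLICE_MINUTES))
    return "".join(letters)


# 30 min sleep (O), 15 min eat (J), 30 min travel (A), 1 hr work (D)
example = [("O", 30), ("J", 15), ("A", 30), ("D", 60)]
print(" ".join(encode_sequence(example)))  # O O J A A D D D D
```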

  10. Initial Sample for Testing • Covered in this presentation • 75 persons • 50 households • Just activity type and weekdays (not location & weekends) • Later consider: • Full sample • Weekends and weekdays • Location types as a further dimension

  11. Method • Data Set (Time Slices) → Sequential Alignment Method (ClustalG software) → Dissimilarity Matrix → Fuzzy Clustering (S-Plus software) → Fuzzy Cluster Memberships → Groups of Similar Activity Patterns • Cluster Center Interpretation • Socio-Economic Variable Distribution • Fuzzy Weighted Frequency Distributions

  12. Sequential Alignment Method (SAM) • Alignment methods were first used in molecular biology for DNA matching • Activity-travel patterns are intrinsically sequential • SAM evaluates sequences of characters • Global Alignment (whole sequence) • Local Alignment (short sequence within the entire sequence) • Simplest case is pairwise alignment

  13. Sequential Alignment Method • Pairwise Alignment • Two Character Sequences • ID 1: O O J A A D D D D • ID 2: O O O J A D D D O • Elementary operations until equal • Insertions and Deletions (Indel) • Gaps • Gap insertion and extension penalties • Global Alignment: Needleman and Wunsch algorithm, minimizing the distance or maximizing the similarity • ID 1: - O O J A A D D D D - • ID 2: O O O J A - D D D - O • Similarity Score = 70 • Fewer operations → more similar pair
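A global pairwise alignment of the two example sequences can be sketched as below. This is a minimal Needleman-Wunsch implementation under stated assumptions: the match, mismatch and gap values are illustrative integers, and a single linear gap cost stands in for the separate gap opening and extension penalties. It is not the ClustalG scoring scheme, so it will not reproduce the similarity score of 70 on the slide.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment with a linear gap cost (illustrative scores only).

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


s, id1, id2 = needleman_wunsch("OOJAADDDD", "OOOJADDDO")
print(s)
print(" ".join(id1))
print(" ".join(id2))
```

Integer scores keep the traceback comparisons exact. With fractional penalties such as 1 and 0.1, storing a pointer matrix during the fill step would be a safer design than re-deriving the path from equality tests.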

  14. Sequential Alignment Method • Gap Opening and Extension Penalties • Role of gap penalty • High Value • Alignment compressed • Literally limited to matches, avoiding gapping • Resembles main activities at their relative times • Recommended values 8 and 3 (Wilson) • Low Value • Identification of similar activities displaced during the day • Better pairwise comparison • Little similarity to the actual activity pattern • Recommended values 1 and 0.1 (Wilson) • Tested and accepted the recommendation of Low Value for transportation research (Wilson)

  15. Sequential Alignment Method • Multiple Alignment • Extension of pairwise alignment to N dimensions • Computational demand becomes enormous beyond 10 sequences of reasonable length • Approximation method based on the pairwise alignment results • Use of ClustalG software by Wilson

  16. Sequential Alignment Method • Output is a Dissimilarity Matrix

  17. Fuzzy Clustering • Partition Clustering Method • Number of clusters k is specified up front • The objects (activity patterns) are not assigned to a particular cluster but are given a membership between 0 and 1 in every cluster • Uses S-Plus software (Kaufman procedure) • Dissimilarity matrix is the input

  18. Fuzzy Clustering • Minimize Objective Function (Kaufman)
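The equation on this slide did not survive in the transcript. As a hedged reconstruction, assuming "Kaufman" refers to the FANNY procedure of Kaufman and Rousseeuw, the objective is usually written as below. The notation is an assumption, not the slide's own: u_{iv} is the membership of object i in cluster v, d(i,j) is the dissimilarity between objects i and j, n is the number of objects and k the number of clusters.

```latex
\min_{u}\; C \;=\; \sum_{v=1}^{k}
  \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)}
       {2 \sum_{j=1}^{n} u_{jv}^{2}}
\qquad \text{subject to} \qquad
u_{iv} \ge 0, \quad \sum_{v=1}^{k} u_{iv} = 1 \;\; \text{for each } i .
```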

  19. Fuzzy Clustering • Number of clusters ? • An Open question – To be determined as part of research • Two quality indices from S-Plus • Dunn’s Coefficient • Average Silhouette Value with Shadow plot

  20. Fuzzy Clustering • Dunn’s Coefficient • Fk always lies in the range [1/k, 1] • Fk = 1/k: entirely fuzzy clustering • Fk = 1: crisp clustering
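Dunn's partition coefficient is commonly defined as the mean over objects of the sum of squared memberships, Fk = (1/n) Σ_i Σ_v u_iv². The sketch below computes it from a membership matrix, assuming one row per person and one column per cluster with each row summing to 1. The function names are hypothetical, and the normalized variant, which rescales Fk from [1/k, 1] to [0, 1], is an addition for illustration rather than something reported on the slides.

```python
def dunn_partition_coefficient(memberships):
    """Mean over objects of the sum of squared cluster memberships."""
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


def normalized_dunn(fk, k):
    """Rescale Fk from [1/k, 1] to [0, 1]: 0 is entirely fuzzy, 1 is crisp."""
    return (k * fk - 1) / (k - 1)


# Two illustrative extremes for k = 3 clusters.
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
fuzzy = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
print(dunn_partition_coefficient(crisp))   # upper bound, 1
print(dunn_partition_coefficient(fuzzy))   # lower bound, about 1/k
```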

  21. Fuzzy Clustering • Average Silhouette Value (ASV) with Shadow plot • Strength of classification to the nearest crisp cluster compared to the next best cluster • Width of bar • 1: well classified • 0: between two clusters • < 0: badly classified (lies near the next best cluster) • Average value gives an approximation to the best number of clusters • ASV must be higher than 0.25
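The silhouette calculation can be sketched directly from a dissimilarity matrix. This is a minimal illustration using the standard silhouette definition, assuming each person is first assigned to the cluster with the highest membership (the "nearest crisp cluster") and that at least two clusters are present. The function names and the toy data are hypothetical.

```python
def nearest_crisp(memberships):
    """Assign each object to the cluster with its highest membership."""
    return [row.index(max(row)) for row in memberships]


def silhouette_values(d, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every object.

    d is a symmetric dissimilarity matrix; labels are crisp cluster ids.
    Objects alone in their cluster get 0 by convention.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    values = []
    for i in range(n):
        own = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            values.append(0.0)
            continue
        a = sum(own) / len(own)  # mean dissimilarity within own cluster
        b = min(                 # mean dissimilarity to the next best cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


# Toy example: objects 0,1 are close, objects 2,3 are close.
d = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 1],
    [9, 9, 1, 0],
]
u = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
s = silhouette_values(d, nearest_crisp(u))
asv = sum(s) / len(s)
print(asv > 0.25)  # True
```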

  22. Cluster Center Interpretation • Distributions of socio-economic variables • Basis for grouping in subsequent modeling • Person characteristics: • Age • Gender • Person type category from survey • Employment status • Household characteristics, attributed to persons: • Only income so far • Household structure later • Fuzzy weighted frequency distributions • Need for eventual crisp assignment • Potentially use logit to assign cluster membership values • Calibrate ‘utility functions’ for clusters with person characteristics • Use Monte Carlo to select a specific cluster in each case
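The proposed logit and Monte Carlo step could look roughly like the sketch below. It assumes a multinomial logit form, with one utility value per cluster already computed from person characteristics; the utilities shown are made-up numbers, not calibrated results, and the function names are hypothetical.

```python
import math
import random


def logit_probabilities(utilities):
    """Multinomial logit: P(v) = exp(U_v) / sum over w of exp(U_w)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def draw_cluster(probabilities, rng):
    """Monte Carlo selection of one cluster index from the probabilities."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1  # guard against rounding at the top end


# Illustrative utilities for one person and three clusters (assumed values).
probs = logit_probabilities([0.2, 1.0, -0.5])
rng = random.Random(2005)  # fixed seed so the draw is repeatable
print(probs, draw_cluster(probs, rng))
```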

  23. Cluster Center Interpretation • Fuzzy Weighted Frequency Distributions • The bar for a category in a cluster's histogram is the percentage of people in that category across the entire sample, weighted by their membership in that cluster
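One plausible reading of this definition is sketched below: for a given cluster, each person contributes their membership value to the bar for their category, and the bars are expressed as percentages of the cluster's total membership weight. The normalization is an assumption, since the slide does not spell it out, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def fuzzy_weighted_distribution(categories, memberships, cluster):
    """Membership-weighted percentage of each category for one cluster.

    categories[i] is person i's category (e.g. gender or age band);
    memberships[i][cluster] is that person's membership in the cluster.
    """
    totals = defaultdict(float)
    for category, row in zip(categories, memberships):
        totals[category] += row[cluster]
    weight_sum = sum(totals.values())
    return {cat: 100.0 * w / weight_sum for cat, w in totals.items()}


# Three illustrative persons, two clusters.
gender = ["F", "M", "F"]
u = [[0.8, 0.2], [0.1, 0.9], [0.6, 0.4]]
print(fuzzy_weighted_distribution(gender, u, cluster=0))
```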

  24. Results • Sequential Alignment • Low vs High Gap Penalty Results • Cluster plot for 3 clusters (panels: Low Gap, High Gap)

  25. Results • Shadow Plot (panels: Low Gap, High Gap) • Use low gap penalty, consistent with recommendation (1 and 0.1)

  26. Results • Number of Clusters • Clustal plot helps to show the potential range for the number of clusters

  27. Results • Number of Clusters • Potential range 2 to 5

  28. Results • Number of Clusters (k) • k = 2 • Fk = 0.60, ASV = 0.42

  29. Results • Number of Clusters (k) • k = 3 • Fk = 0.43, ASV = 0.40

  30. Results • Number of Clusters (k) • k = 4 • Fk = 0.34, ASV = 0.32

  31. Results • Number of Clusters (k) • k = 5 • Fk = 0.28, ASV = 0.20

  32. Results • Number of Clusters (k) ? • Use 3 clusters for testing • Expect different for total sample
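The ASV screen stated earlier (ASV must be higher than 0.25) can be applied to the reported indices as a small worked example. The values are copied from the preceding results slides; the function name is hypothetical.

```python
# Quality indices as reported on the slides for the 75-person test sample.
reported = {
    2: {"Fk": 0.60, "ASV": 0.42},
    3: {"Fk": 0.43, "ASV": 0.40},
    4: {"Fk": 0.34, "ASV": 0.32},
    5: {"Fk": 0.28, "ASV": 0.20},
}
ASV_THRESHOLD = 0.25


def acceptable_k(indices, threshold=ASV_THRESHOLD):
    """Return the cluster counts whose average silhouette passes the screen."""
    return [k for k, v in sorted(indices.items()) if v["ASV"] > threshold]


print(acceptable_k(reported))  # [2, 3, 4]
```

The screen only rules out k = 5. It does not by itself single out k = 3, which the slide adopts for testing.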

  33. Fuzzy Cluster Memberships • Output of S-Plus software • HH2701 has almost equal memberships in all three clusters

  34. Results • Fuzzy weighted frequency distribution

  35. Results • Cluster Interpretation • Crisp presentation

  36. Results • Cluster Interpretation: each cluster tends to contain more of the following • Cluster 1 • Students aged 5 to 15 • Mainly KEJS and youths • Cluster 2 • Females • Seniors and other adults in age range 66-70 • Retired home makers and volunteers • Cluster 3 • Males • 100% adult workers • Age 40’s • Majority adult workers not needing a car for work • Expect different results for the total sample

  37. Conclusions • Method seems to work well in identifying the clusters as intended, with no hurdles • Fuzzy clustering better indicates strength of membership • Best to have multiple measures of clustering “quality” when choosing the number of clusters • Still work in progress • Results not complete, shown just as an example • But the essential elements of the analysis process are set

  38. Conclusions • Future Work • Proceeding to the full sample of 8,400 households, including weekends • Expanding to the location dimension • Calibrate logit model for allocation to clusters • Consider household structure

  39. Thank You ?
