Data Mining Application: CART

didina

Presentation Transcript


  1. Data Mining Application: CART

  2. CART: • Binary recursive decision tree program from Salford Systems • www.salford-systems.com • 30-day evaluation copy from http://www.salford-systems.com/evals/cartreg.html • Company: Villanova University • Department: Computer Sciences

  3. CART Binary Recursive Trees • One target variable • Splits the data into classes of the target variable (the number of classes is a settable input parameter) • Many predictor variables • At each recursion, CART chooses one yes-no (binary) question based on a single predictor variable • Various splitting criteria are available; the default, Gini, measures how well a rule separates the classes in the parent node
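
A minimal Python sketch of the recursion described on slide 3, assuming numeric predictors (0/1 indicators are treated as numbers) and the Gini criterion. It illustrates the general technique only and is not Salford's implementation: the function names, the `max_depth`/`min_size` stopping rules, and the midpoint thresholds are our own choices, and a full CART run also prunes the tree, which this sketch omits.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Try every predictor j and threshold t; return the yes-no question
    'row[j] <= t?' with the largest drop in weighted Gini impurity,
    as (decrease, j, t), or None if no question is possible."""
    n = len(labels)
    parent = gini(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[0]:
                best = (parent - child, j, t)
    return best

def grow(rows, labels, depth=0, max_depth=3, min_size=2):
    """Recursively split until a node is pure, too small, or too deep (hypothetical limits)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_size or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None or split[0] <= 0.0:
        return {"leaf": majority}
    _, j, t = split
    yes = [i for i, r in enumerate(rows) if r[j] <= t]
    no = [i for i, r in enumerate(rows) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "yes": grow([rows[i] for i in yes], [labels[i] for i in yes], depth + 1, max_depth, min_size),
        "no": grow([rows[i] for i in no], [labels[i] for i in no], depth + 1, max_depth, min_size),
    }

def predict(tree, row):
    """Follow the yes-no questions from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["yes"] if row[tree["feature"]] <= tree["threshold"] else tree["no"]
    return tree["leaf"]

# Made-up toy data: two predictors per member, target = segment 1, 2 or 3.
rows = [[0, 2], [0, 3], [1, 8], [1, 9], [1, 1]]
labels = [1, 1, 3, 3, 2]
tree = grow(rows, labels)
print([predict(tree, r) for r in rows])  # → [1, 1, 3, 3, 2]
```

Each call to `grow` asks exactly one question on one predictor, which is the "one yes-no question per recursion" idea from the slide; the exhaustive search over thresholds is the simplest way to find the best one.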

  4. CART Tutorial • We have defined three market segments, numbered 1, 2, and 3. They represent “profitability”, broadly defined as “how much money did we make from this person in the last year”. • We are interested in questions that distinguish these segments, so we know how to target future marketing better.

  5. CART Gym Data Tutorial: Variables • SEGMENT Member's market segment (coded 1, 2, or 3) • ANYRAQT Racquetball usage (binary indicator coded 0, 1) • TANNING Number of visits to tanning salon • PERSTRN Personal trainer (binary indicator coded 0, 1) • ONAER Number of on-peak aerobics classes attended • OFFAER Number of off-peak aerobics classes attended • ANYPOOL Pool usage (binary indicator coded 0, 1) • CLASSES Number of classes taken • NSUPPS Number of supplements/vitamins/frozen dinners purchased • SMALLBUS Small business discount (binary indicator coded 0, 1) • OFFER Terms of offer • FIT Fitness score • NFAMMEN Number of family members • HOME Home ownership (binary indicator coded 0, 1)
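
The variable types above determine which yes-no questions CART can ask. As a hedged sketch (the `candidate_questions` helper and the toy values are our own, not part of CART's interface): an ordered variable such as TANNING allows one question of the form "TANNING <= t?" per gap between consecutive distinct observed values, so each 0/1 indicator allows exactly one question. The slide does not say how OFFER is coded; an unordered (nominal) variable would instead be split by dividing its categories into two groups, which this sketch does not cover.

```python
def candidate_questions(name, observed):
    """List the 'name <= t?' questions an ordered variable allows:
    one per gap between consecutive distinct observed values."""
    values = sorted(set(observed))
    return [f"{name} <= {(lo + hi) / 2}?" for lo, hi in zip(values, values[1:])]

# Hypothetical values for a few members, for illustration only.
toy = {
    "PERSTRN": [0, 1, 1, 0],   # binary indicator: one question
    "TANNING": [0, 3, 3, 7],   # count: one question per gap
    "CLASSES": [2, 2, 2, 2],   # a constant column offers no question
}
for name, column in toy.items():
    print(name, candidate_questions(name, column))
```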

  6. Potential Data Sources • CART uses data in the Systat for Windows format, extension .syd. (Systat is a widely used statistical package.) www.spssscience.com/systat • The downloaded version includes a dynamic link to a program called DBMS/COPY, which lets you use other data formats such as ASCII and Excel. www.conceptual.com/dbmscopy.htm

  7. Summary: CART • Good for generating decision trees; offers many alternatives and a great deal of information • The rules it creates, and the resulting data, can be used as input to additional tools • Produces far more output than you will want to read if you don’t know what you’re looking for
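
To illustrate reusing a tree's rules in other tools, here is a small Python sketch that flattens a binary tree into one IF-THEN rule per leaf. The nested-dict tree format, the `tree_to_rules` name, and the example tree are all hypothetical: the splits on PERSTRN and CLASSES are invented for illustration and are not output from the tutorial data.

```python
def tree_to_rules(node, conditions=()):
    """Return one 'IF ... THEN SEGMENT = k' rule per leaf, built from the
    questions answered along the root-to-leaf path."""
    if "leaf" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN SEGMENT = {node['leaf']}"]
    var, t = node["var"], node["threshold"]
    return (tree_to_rules(node["yes"], conditions + (f"{var} <= {t}",))
            + tree_to_rules(node["no"], conditions + (f"{var} > {t}",)))

# Invented tree, not a CART result.
tree = {
    "var": "PERSTRN", "threshold": 0.5,
    "yes": {"leaf": 1},
    "no": {"var": "CLASSES", "threshold": 3.5,
           "yes": {"leaf": 2}, "no": {"leaf": 3}},
}
for rule in tree_to_rules(tree):
    print(rule)
```

Because each rule is a plain string, the list can be written to a text file or spreadsheet and used by whatever tool comes next.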

  8. CART Assignment: • Three pieces: • Download and install it • Work through the tutorial yourself and write a brief report • Analyze a new data set and answer some questions about it • I am in the process of getting descriptions for the sample data in the download and will prepare questions based on one of those data sets • Alternatively, if you have data in an appropriate format, you may use your own data and questions
