GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland

Presentation Transcript


  1. GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland

  2. Data Mining in a Nutshell
Knowledge discovery in databases (KDD) was initially defined as the ‘non-trivial extraction of implicit, previously unknown, and potentially useful information from data’ [Frawley, Piatetsky-Shapiro, Matheus, 1991]. A revised version of this definition states that ‘KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data’ [Fayyad, Piatetsky-Shapiro, Smyth, 1996]. According to this definition, data mining is the step in the KDD process that applies computational techniques (i.e., data mining algorithms implemented as computer programs) to actually find patterns in the data. In this sense, data mining is the central step of the KDD process. The other steps are concerned with preparing data for data mining and with evaluating the discovered patterns, i.e., the results of data mining.

I Data. The input to a data mining algorithm is most commonly a single flat table comprising a number of fields (columns) and records (rows). In general, each row represents an object and each column represents a property of objects.

II Typical data mining tasks.
- Classification and regression: the task is to predict the value of one field (the class) from the other fields. If the class is continuous, the task is called regression; if the class is discrete, it is called classification.
- Clustering is concerned with grouping objects into classes of similar objects. A cluster is a collection of objects that are similar to each other and dissimilar to objects in other clusters.
- Association analysis is the discovery of association rules. Association rules specify correlations between frequent item sets.
- Data characterisation summarises the general characteristics or features of a target class of data; this class is typically collected by a database query.
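
The counting that underlies association analysis on a flat table can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the table is a list of dicts (one dict per row/object, keyed by column name), the records and the field names Gender, Age and BigSpender are hypothetical, and support and confidence are the commonly used association-rule measures, not a description of how GUHA or LISp Miner evaluates rules.

```python
from typing import Dict, List, Tuple

# A record (row) maps column names (properties) to values.
Record = Dict[str, str]


def matches(record: Record, items: Dict[str, str]) -> bool:
    """Return True if the record has every (column, value) pair in items."""
    return all(record.get(col) == val for col, val in items.items())


def support_confidence(
    table: List[Record], antecedent: Dict[str, str], consequent: Dict[str, str]
) -> Tuple[float, float]:
    """Support and confidence of the rule antecedent -> consequent."""
    n = len(table)
    n_ante = sum(1 for r in table if matches(r, antecedent))
    n_both = sum(1 for r in table if matches(r, antecedent) and matches(r, consequent))
    # Support: share of all rows that satisfy both sides of the rule.
    support = n_both / n if n else 0.0
    # Confidence: share of antecedent rows that also satisfy the consequent.
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical flat table: each row is an object, each column a property.
table = [
    {"Gender": "Female", "Age": ">52", "BigSpender": "Yes"},
    {"Gender": "Female", "Age": ">52", "BigSpender": "Yes"},
    {"Gender": "Female", "Age": ">52", "BigSpender": "No"},
    {"Gender": "Male", "Age": "<=52", "BigSpender": "No"},
]

s, c = support_confidence(table, {"Gender": "Female", "Age": ">52"}, {"BigSpender": "Yes"})
print(f"support={s:.2f}, confidence={c:.2f}")  # prints support=0.50, confidence=0.67
```

Two of the four rows satisfy both sides (support 0.50), and two of the three rows matching the antecedent also match the consequent (confidence 0.67).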

  3. - Outlier detection is concerned with finding data objects that do not fit the general behaviour or model of the data; such objects are called outliers.
- Evolution analysis describes and models regularities or trends for objects whose behaviour changes over time.

III Outputs of data mining procedures can be
- Equations, e.g. TotalSpent = 189.5275 × Age + 7146.89 [€]
- Decision trees, e.g. a tree that splits on Income (≤ 100.000 € vs > 100.000 €), with further tests such as Age and leaves such as Yes
- Predictive rules of the form IF conjunction of conditions THEN conclusion, e.g. IF Income ≤ 100.000 € and Gender = Male THEN not a Big Spender
- Association rules, e.g. {Gender = ‘Female’, Age = ‘>52’} → {Big Spender = ‘Yes’}
- Distance and similarity measures
- Probabilistic models, e.g. Bayesian networks
(For more details see Saso Dzeroski’s Relational Data Mining.)
--------------------------------------------------
Our aim is to study in detail a particular data mining method called GUHA and its computer implementation, LISp Miner. This approach is essentially association analysis; however, classification, clustering and outlier detection tasks can also be carried out with it.
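
To show how two of these output forms act as executable models, the slide's equation and predictive rule can be written as Python functions and applied to a single record. This is an illustrative sketch only: the Customer class and the function names are hypothetical, 100.000 € is read as one hundred thousand euros, and the rule function returns None for records its conditions do not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    # Hypothetical record type for one row of the flat table.
    age: float
    income: float  # in euros
    gender: str


def predicted_total_spent(c: Customer) -> float:
    # Equation output: TotalSpent = 189.5275 x Age + 7146.89 [€]
    return 189.5275 * c.age + 7146.89


def big_spender_by_rule(c: Customer) -> Optional[bool]:
    # Predictive rule output:
    # IF Income <= 100.000 € and Gender = Male THEN not a Big Spender.
    if c.income <= 100_000 and c.gender == "Male":
        return False
    # The rule draws no conclusion about records outside its conditions.
    return None


c = Customer(age=40, income=80_000, gender="Male")
print(round(predicted_total_spent(c), 2))  # prints 14727.99
print(big_spender_by_rule(c))  # prints False
```

Returning None instead of True for uncovered records reflects that an IF-THEN rule only constrains the records that satisfy its conditions.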