GUHA method in Data Mining
Esko Turunen
Tampere University of Technology, Tampere, Finland


Data Mining in a Nutshell

Knowledge discovery in databases (KDD) was initially defined as the ‘non-trivial extraction of implicit, previously unknown, and potentially useful information from data’ [Frawley, Piatetsky-Shapiro, Matheus, 1991]. A revised version of this definition states that ‘KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data’ [Fayyad, Piatetsky-Shapiro, Smyth, 1996]. According to this definition, data mining is the step in the KDD process that applies computational techniques (i.e., data mining algorithms implemented as computer programs) to actually find patterns in the data. In this sense, data mining is the central step of the KDD process. The other steps prepare the data for data mining and evaluate the discovered patterns, i.e. the results of data mining.

I Data. The input to a data mining algorithm is most commonly a single flat table comprising a number of fields (columns) and records (rows). In general, each row represents an object and the columns represent properties of the objects.

II Typical data mining tasks.
- Classification and regression: the task is to predict the value of one field (the class) from the other fields. If the class is continuous, the task is called regression; if the class is discrete, the task is called classification.
- Clustering is concerned with grouping objects into classes of similar objects. A cluster is a collection of objects that are similar to each other and dissimilar to objects in other clusters.
- Association analysis is the discovery of association rules. Association rules specify correlations between frequent item sets.
- Data characterisation summarises the general characteristics or features of a target class of data; this class is typically collected by a database query.

- Outlier detection is concerned with finding data objects that do not fit the general behaviour or model of the data; these are called outliers.
- Evolution analysis describes and models regularities or trends for objects whose behaviour changes over time.

III Outputs of data mining procedures can be
- Equations, e.g. TotalSpent = 189.5275 × Age + 7146.89 [€]
- Decision trees, e.g. a tree that first tests Income (≤ 100.000 € or > 100.000 €) and then Age (≤ 58 or > 58), with leaves Yes and No
- Predictive rules of the form IF Conjunction of conditions THEN Conclusion, e.g. IF Income ≤ 100.000 € and Gender = Male THEN not a Big Spender
- Association rules, e.g. {Gender = ‘Female’, Age = ‘>52’} → {Big Spender = ‘Yes’}
- Distance and similarity measures
- Probabilistic models, e.g. Bayesian networks
(For more details see Saso Dzeroski’s Relational Data Mining)

Our aim is to study in detail a particular data mining method called GUHA and its computer implementation called LISp Miner. This approach is essentially association analysis; however, classification, clustering and outlier detection tasks can also be carried out with this method.
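To make the flat-table input and the association-rule output concrete, the following is a minimal Python sketch, not the GUHA procedure or LISp Miner itself. It evaluates the example rule {Gender = ‘Female’, Age = ‘>52’} → {Big Spender = ‘Yes’} on a small table. The six records, the field name `BigSpender`, and the helper names `matches` and `rule_stats` are hypothetical and chosen only for illustration. Support and confidence are the standard measures for association rules; they are assumed here and are not taken from the slides.

```python
# Hypothetical flat table: each dict is a record (row), each key a field (column).
records = [
    {"Gender": "Female", "Age": ">52", "BigSpender": "Yes"},
    {"Gender": "Female", "Age": ">52", "BigSpender": "Yes"},
    {"Gender": "Female", "Age": ">52", "BigSpender": "No"},
    {"Gender": "Female", "Age": "<=52", "BigSpender": "No"},
    {"Gender": "Male", "Age": ">52", "BigSpender": "Yes"},
    {"Gender": "Male", "Age": "<=52", "BigSpender": "No"},
]


def matches(record, conditions):
    """Return True if the record satisfies every field = value condition."""
    return all(record.get(field) == value for field, value in conditions.items())


def rule_stats(records, antecedent, consequent):
    """Compute (support, confidence) of the rule antecedent -> consequent.

    support    = share of all records satisfying antecedent and consequent
    confidence = share of antecedent records that also satisfy the consequent
    """
    n_antecedent = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(
        1 for r in records if matches(r, antecedent) and matches(r, consequent)
    )
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(
    records,
    antecedent={"Gender": "Female", "Age": ">52"},
    consequent={"BigSpender": "Yes"},
)
print(f"support = {support:.3f}, confidence = {confidence:.3f}")
```

In this toy table three records satisfy the antecedent and two of them are big spenders, so the rule holds in two of six records overall and in two of three records where the antecedent applies.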
