Course on Data Mining (581550-4): Seminar Meetings

Seminar schedule: 02.11., 09.11., 16.11., 23.11., 30.11. (M = seminar by Mika, P = seminar by Pirjo)
Ass. Rules, Clustering: P, P
Episodes, KDD Process: M, P
Text Mining, Home Exam: M

Today 16.11.2001
• R. Feldman, M. Fresko, H. Hirsh, et al.: "Knowledge Management: A Text Mining Approach", Proc. of the 2nd Int'l Conf. on Practical Aspects of Knowledge Management (PAKM'98), 1998
• B. Lent, R. Agrawal, R. Srikant: "Discovering Trends in Text Databases", Proc. of the 3rd Int'l Conference on Knowledge Discovery and Data Mining (KDD'97), 1997

Good to read as background
• Both papers refer to the Agrawal and Srikant paper we had last week: Rakesh Agrawal and Ramakrishnan Srikant: "Mining Sequential Patterns", Proc. of the Int'l Conference on Data Engineering, 1995.

Knowledge Management: A Text Mining Approach
R. Feldman, M. Fresko, H. Hirsh, et al.
Bar-Ilan University and Instinct Software, Israel; Rutgers University, USA; LIA-EPFL, Switzerland
Published in PAKM'98 (Int'l Conf. on Practical Aspects of Knowledge Management)
Data Mining course Autumn 2001/University of Helsinki
Summary by Mika Klemettinen

KM: A Text Mining Approach
• Basic idea (see selected phases on the next slides):
1. Get the input data in SGML (or XML) format, and select only the contents of the desired elements (title, abstract, etc.)
2. Do linguistic preprocessing:
2.1 Term extraction (use linguistic software for this)
2.2 Term generation (combine adjacent terms into morpho-syntactic patterns such as "noun-noun", "adj.-noun", etc., by calculating association coefficients)
2.3 Term filtering (select only the top M most frequent terms)
3. Create taxonomies (there is a tool for this)
4. Generate associations (the generation can be constrained)
5. Visualize/explore the results
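Steps 2.2 and 2.3 can be sketched roughly as follows. This is a minimal illustration, not the paper's method. The slides do not name the association coefficient, so pointwise mutual information (PMI) stands in for it. The sketch also assumes tokenized, lower-cased input and does not check part-of-speech patterns. The function names `generate_terms` and `filter_terms` and the thresholds `min_pmi` and `min_count` are hypothetical.

```python
import math
from collections import Counter


def generate_terms(docs, min_pmi=1.0, min_count=2):
    """Step 2.2 sketch: propose two-word terms from adjacent tokens.

    Assumption: PMI stands in for the unspecified association coefficient,
    and no part-of-speech patterns (noun-noun, adj.-noun) are checked.
    """
    words, pairs = Counter(), Counter()
    for tokens in docs:
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    n_words, n_pairs = sum(words.values()), sum(pairs.values())
    terms = {}
    for (w1, w2), count in pairs.items():
        if count < min_count:
            continue  # too rare to trust the coefficient
        p_pair = count / n_pairs
        p_indep = (words[w1] / n_words) * (words[w2] / n_words)
        if math.log2(p_pair / p_indep) >= min_pmi:
            terms[f"{w1} {w2}"] = count
    return terms


def filter_terms(docs, terms, top_m):
    """Step 2.3 sketch: keep the top_m most frequent single- and multi-word terms."""
    freq = Counter()
    for tokens in docs:
        freq.update(tokens)
    freq.update(terms)  # adds the counts of the generated multi-word terms
    return [term for term, _ in freq.most_common(top_m)]


docs = [
    ["data", "mining", "is", "fun"],
    ["data", "mining", "is", "hard"],
    ["the", "cat"],
]
terms = generate_terms(docs)
print(terms)  # {'data mining': 2, 'mining is': 2}
print(filter_terms(docs, terms, top_m=5))
```

The example accepts "mining is" as well as "data mining". This shows why the paper restricts candidates to morpho-syntactic patterns such as "noun-noun" before it scores them.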

2.1: Term Extraction

3: Taxonomy Construction

4: Association Rule Generation
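Step 4 ("Generate associations", which can be constrained) might look like the sketch below, treating each document as a set of the filtered terms. This is a simplified illustration under several assumptions. Rules have one term on each side. Support and confidence are computed as in standard association rule mining. `rhs_filter` is a hypothetical example of a constraint that limits which terms may appear on the right-hand side. The function name, the thresholds, and the example terms are illustrative only.

```python
from collections import Counter
from itertools import permutations


def term_rules(doc_terms, min_support=0.4, min_confidence=0.6, rhs_filter=None):
    """Sketch: rules "X => Y" between single terms that co-occur in documents.

    doc_terms: one set of (already filtered) terms per document.
    rhs_filter: optional set of allowed consequents, a hypothetical example
    of constraining the rule generation.
    """
    n_docs = len(doc_terms)
    single, pair = Counter(), Counter()
    for terms in doc_terms:
        single.update(terms)
        pair.update(permutations(sorted(terms), 2))  # both (x, y) and (y, x)
    rules = []
    for (x, y), count in pair.items():
        if rhs_filter is not None and y not in rhs_filter:
            continue
        support = count / n_docs
        confidence = count / single[x]
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


docs = [
    {"knowledge management", "text mining"},
    {"knowledge management", "text mining", "taxonomy"},
    {"text mining", "taxonomy"},
    {"knowledge management"},
]
for x, y, sup, conf in term_rules(docs):
    print(f"{x} => {y}  support={sup:.2f}  confidence={conf:.2f}")
```

Passing `rhs_filter={"text mining"}` keeps only the rules that predict "text mining". This is one simple way to let a user steer the rule generation toward the terms they care about.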

5.1: Visualization/Exploration

5.2: Visualization/Exploration

Discovering Trends in Text Databases
Brian Lent, Rakesh Agrawal and Ramakrishnan Srikant
IBM Almaden Research Center, USA
Published in KDD'97
Data Mining course Autumn 2001/University of Helsinki
Summary by Mika Klemettinen

Discovering Trends in Text Databases
• Basic ideas:
  • Identify frequent phrases using sequential pattern mining (see the slides and summaries from the Agrawal et al. paper "Mining Sequential Patterns" (MSP))
  • Generate histories of phrases
  • Find phrases that satisfy a specified trend
• Definitions:
  • Phrase: a phrase p is ⟨(w1)(w2) … (wn)⟩, where each wi is a word
  • 1-phrase: ⟨(IBM)(data)(mining)⟩
  • 2-phrase: ⟨⟨(IBM)(data)(mining)⟩ ⟨(Anderson)(Consulting)(decision)(support)⟩⟩
  • Itemset, sequence, is contained, etc.: as in the MSP paper
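The "history" of a phrase and the trend test can be illustrated with the simplified sketch below. It is not the authors' implementation, and it makes several assumptions. Each document is a list of words. A phrase is contained when its words appear in order, with no gap limits. The history is the phrase's support (the fraction of documents containing it) in each time partition. `rises_every_period` is a hypothetical stand-in for a shape query and does not use SDL syntax. All function names and data are illustrative.

```python
def contains_phrase(words, phrase):
    """True if the words of `phrase` occur in `words` in the same order.

    Simplification: any number of other words may lie in between
    (no gap constraints, one word per itemset).
    """
    remaining = iter(words)
    # `w in remaining` advances the iterator, so the word order is enforced.
    return all(w in remaining for w in phrase)


def phrase_history(partitions, phrase):
    """Support of `phrase` (fraction of documents containing it) per time partition."""
    history = []
    for docs in partitions:
        hits = sum(contains_phrase(doc, phrase) for doc in docs)
        history.append(hits / len(docs) if docs else 0.0)
    return history


def rises_every_period(history):
    """Hypothetical trend test (not SDL): support strictly increases each period."""
    return all(later > earlier for earlier, later in zip(history, history[1:]))


partitions = [
    [["ibm", "sales"], ["new", "patents"]],                      # oldest period
    [["ibm", "data", "mining"], ["database", "tools"]],
    [["data", "mining", "methods"], ["ibm", "data", "mining"]],  # newest period
]
history = phrase_history(partitions, ("data", "mining"))
print(history)                      # [0.0, 0.5, 1.0]
print(rises_every_period(history))  # True
```

In the paper, the frequent phrases come from sequential pattern mining rather than from checking one given phrase. The sketch only shows what a history is and how a trend condition can be tested against it.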

Discovering Trends in Text Databases
• Gaps: minimum and maximum gaps between adjacent words identify relations between words/phrases inside sentences/paragraphs, between words/phrases in different paragraphs, between words/phrases in different sections, etc.
  • Sentence boundary: 1 000
  • Paragraph boundary: 100 000
  • Section boundary: 10 000 000
• Phases:
  • Partition the data/documents based on their time stamps and create the phrases for each partition (Lent et al. use patent documents as their data)
  • Select the frequent phrases and save their frequencies
  • Define shape queries using SDL (Shape Definition Language)
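The boundary values above can be read as jumps in word positions. Under that reading, a maximum gap below 1 000 confines a match to a single sentence, and a maximum gap below 100 000 confines it to a single paragraph. The sketch below assumes this reading. It assumes each boundary adds its value to the running position, and it uses plain nested lists for sections, paragraphs, and sentences. The names `word_positions` and `contains_with_gaps` are hypothetical, and the search is a simple backtracking check, not the paper's mining algorithm.

```python
# Boundary values from the slide, read here (an assumption) as position
# offsets inserted between consecutive sentences, paragraphs and sections.
SENTENCE_GAP = 1_000
PARAGRAPH_GAP = 100_000
SECTION_GAP = 10_000_000


def word_positions(sections):
    """Number words so that boundaries show up as large position jumps.

    sections -> paragraphs -> sentences -> words (nested lists).
    Returns a list of (position, word) pairs in increasing position order.
    """
    out, pos = [], 0
    for s_i, section in enumerate(sections):
        if s_i:
            pos += SECTION_GAP
        for p_i, paragraph in enumerate(section):
            if p_i:
                pos += PARAGRAPH_GAP
            for t_i, sentence in enumerate(paragraph):
                if t_i:
                    pos += SENTENCE_GAP
                for word in sentence:
                    pos += 1
                    out.append((pos, word))
    return out


def contains_with_gaps(positions, phrase, min_gap=1, max_gap=SENTENCE_GAP - 1):
    """True if the phrase's words occur in order and every gap between
    adjacent matched words lies in [min_gap, max_gap] (min_gap >= 1).
    The default max_gap keeps the whole match inside one sentence."""
    def match(k, last_pos):
        if k == len(phrase):
            return True
        for pos, word in positions:
            if k > 0:
                gap = pos - last_pos
                if gap < min_gap:
                    continue  # at or before the previous word, or too close
                if gap > max_gap:
                    break  # positions only grow, so nothing later fits
            if word == phrase[k] and match(k + 1, pos):
                return True  # otherwise keep trying later occurrences
        return False

    return match(0, None)


doc = [  # one section with two paragraphs
    [
        [["ibm", "announced", "data", "mining"], ["new", "tools"]],
        [["data", "mining", "grows"]],
    ]
]
positions = word_positions(doc)
print(contains_with_gaps(positions, ("mining", "tools")))  # False: different sentences
print(contains_with_gaps(positions, ("mining", "tools"), max_gap=PARAGRAPH_GAP - 1))  # True
```

Using one number line for all boundary levels means that the same minimum/maximum gap test covers the sentence, paragraph, and section levels. Only the value of `max_gap` changes.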

Discovering Trends in Text Databases
